Introduction
The power system consists of remotely located generating units whose bulk power is transferred through high-voltage transmission networks to city centers and then distributed to consumers through low-voltage distribution networks [1]. Electric transmission networks are susceptible to different kinds of faults, which can occur at any location on the interconnected high-voltage lines and result in full or partial disconnection of the transmission lines. Therefore, fast and accurate identification of transmission line faults and their locations is of great importance, both to prevent cascading failures that lead to blackouts and to secure the uninterrupted transfer of bulk electric energy over long distances [2]. Heating due to high temperatures, insulation breakdown, sudden load changes, switching actions, and lines breaking due to natural calamities can lead to permanent faults [3]. Once a permanent fault occurs, it must be classified (to assess the extent of repair needed) and located for quick maintenance. There are five types of permanent faults in a power system: Single-line-to-ground (SLG), Double-line-to-ground (DLG), Line-to-line (LL), Line-to-line-to-line-to-ground (LLLG), and Line-to-line-to-line (LLL) faults. Traditionally, manual inspection or model-based methods, namely impedance-based and traveling-wave-based methods, were used to locate faults in power systems [4], [5]. However, impedance-based methods require complex mathematical modeling, domain expertise, time, and several assumptions in their implementation, making them inaccurate and less reliable for modern power systems [6]. Although traveling-wave-based methods do not require complex mathematical modeling, they require manual operation or the installation of costly devices at each end of the transmission lines. Further, they are not suitable for tapped lines, such as three-terminal lines, or for lines with inserted loads and sources [4], [7], [8].
The availability of synchronized measurement devices and data loggers has paved the way for learning-based methods, such as Machine Learning (ML) models, for power system fault diagnosis, leading to considerable research on data-driven power system monitoring, control, and maintenance [9]. Several works on ML-based power system fault detection, classification, and localization have been reported in the literature. Various models, including Logistic Regression (LR), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Neural Network (NN), Naïve Bayes (NB), Decision Tree (DT), Gaussian Process Regression (GPR), and ensemble methods, have been proposed for power system fault diagnosis [5].
LR has been used as a base classifier for fault detection and classification [10]. KNN has been applied as a classifier for fault detection in microgrids [11], fault classification in two-bus systems [12] and in the IEEE 9 Bus system [13], and faulty zone detection in the IEEE 9 Bus system [14]. NB has been incorporated as a base classifier for fault classification in two-bus systems [15] and for microgrid fault detection and classification [16]. DT has been used as a base model for fault detection and classification [17], [18] and for faulty zone detection [14] in two-bus systems. A regression tree-based approach was implemented in [19] for fault localization in the IEEE 39 Bus System. SVM, a prevalent classification and regression model, has been used extensively for power system fault diagnosis. SVM was applied for high-impedance fault detection in the IEEE 13 Bus System [20] and in distribution feeders [21], and it has been found to be an excellent classifier for fault classification in different transmission networks [16], [22], [23], [24], [25]. Similarly, [19], [26], [27], and [28] have all used SVM regression (SVR) for fault localization in various power networks. GPR has also been employed for fault localization [29], [30], [31]. Interest in ensemble methods is growing, with numerous works published on one or more ensemble methods. Random Forest (RF) has been used for fault classification [19], classification and localization [32], and fault location detection [14]. Other ensemble models, such as Bagging, Boosting, and AdaBoost, have been used for fault classification in transmission and distribution networks [13], [33] and for fault classification and faulty branch identification [34]. Many studies on fault classification and faulty line/zone detection have demonstrated outstanding performance on the power system networks studied.
However, fewer studies on fault localization using ML regression are available in the literature. Moreover, power system fault data, comprising line currents and voltages measured at different buses or network nodes, exhibit some degree of correlation despite variations in fault attributes. Bayesian Ridge regression has been reported to be well suited to datasets with multicollinearity [35]. Therefore, this paper proposes it for the localization of transmission network faults and compares it with the Extra Trees (ET) regressor and other candidate ML regression models. XGBoost is a relatively recent and advanced ML ensemble technique, and Bayesian ridge regression and ET have been used in other fields in recent years; however, none of them has yet been applied to power system fault localization. Furthermore, the exceptional performance of RF in power system fault classification has motivated the exploration of other ensemble techniques, such as ET and XGBoost, which have not been widely used in power system fault diagnosis. XGBoost has been noted as a fast ensemble technique owing to its multithreaded parallel computing [36]. Compared with RF, ET has been found to perform equally well with lower complexity [37].
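One common way to quantify the multicollinearity mentioned above is the variance inflation factor (VIF). The sketch below is not part of the paper's method: it computes VIFs with NumPy on synthetic, hypothetical data standing in for correlated bus measurements, and the helper name `variance_inflation_factors` is illustrative.

```python
import numpy as np


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """Return the VIF of each column of X (shape: n_samples x n_features).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an ordinary least-squares
    regression (with intercept) of column j on all the other columns.
    """
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # intercept + other columns
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        r_squared = 1.0 - residual.var() / target.var()
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs


# Hypothetical stand-in for bus measurements: three columns share a common
# fault-driven component; the fourth column is independent of the others.
rng = np.random.default_rng(0)
common = rng.normal(size=500)
X = np.column_stack([
    common + 0.1 * rng.normal(size=500),
    0.8 * common + 0.1 * rng.normal(size=500),
    -0.5 * common + 0.1 * rng.normal(size=500),
    rng.normal(size=500),
])
print(np.round(variance_inflation_factors(X), 1))  # large for the first three, about 1 for the last
```

A VIF above roughly 5 to 10 is a common rule-of-thumb sign of problematic multicollinearity, the situation in which a ridge-type penalty such as the one in Bayesian ridge regression tends to help.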
The integration of solar and wind energy-based electrical power plants into distribution and transmission networks has been driven by the scarcity of conventional energy resources and their adverse environmental impacts [38]. These renewable energy plants can vary widely in capacity, from small-scale units rated in kilowatts to large-scale installations rated in megawatts. Typically, large-scale renewable energy generating units are integrated at the transmission level, while smaller units are connected to distribution networks. Over the past decades, numerous large-scale renewable energy sources (RES) have been integrated into grids worldwide [39], and many more are expected to follow. This integration reduces reliance on conventional coal-based power plants, thereby reducing greenhouse gas emissions [40], [41], [42]. While RES-based power plants offer environmental benefits, their integration poses challenges to grid operation. The complexity of power systems increases with RES integration, heightening system vulnerability [42]. Ensuring stability and effective power management in RES integrated networks, balancing supply and demand, and setting protection devices appropriately become particularly challenging. Although power flow in transmission lines is typically unidirectional, integrating RES at different locations introduces tapping points and enables bidirectional power flow in the lines [43]. Additionally, RES can feed current into faults, which increases the fault currents in the lines [44]. Consequently, the signature of a fault occurring at a given location changes with the inclusion of a new RES unit, even if the fault attributes remain the same. However, only limited work on ML-based power system fault diagnosis with RES integration has been reported in the literature [45].
Existing works include Support Vector Data Description-based faulty region identification for distributed energy resources (DER) integration [46] and SLG fault detection for varying levels of DER penetrations in distribution networks [9]. A fault classification study has also been conducted for distribution networks with two distributed generation (DG) units using a convolutional neural network (CNN) [47]. Recently, a faulty line identification approach for RES integrated system utilizing a deep learning framework, with CNN layers for feature extraction from voltage and current waveforms, has been proposed [8].
Based on the above literature survey, it can be deduced that ML-based fault classifiers perform satisfactorily for conventional power system networks [5]. However, the performance of these classification schemes has not been sufficiently tested on RES integrated power systems. Furthermore, studies have yet to explore how the integration of a new RES into a conventional power network affects the performance of these ML classifiers. Therefore, given the growing trend of RES integration into transmission networks [38], there is a pressing need to analyze the performance of candidate ML models after RES integration before implementing them in actual power systems. Additionally, most fault localization schemes aim to identify faulty lines or sections of transmission and distribution networks using classification approaches [14], [34]. However, pinpointing the exact fault location on a line using ML regressors is more valuable for expediting the maintenance of transmission networks. Notably, there is no existing literature on identifying the precise fault location on transmission lines in RES integrated transmission systems using either ML-based classification or regression techniques. Moreover, the power output of solar PV-based RES varies widely with weather conditions, particularly with temperature and irradiance fluctuations throughout the day and year [48]. Consequently, the power fed into the grid fluctuates, leading to variations in fault current levels [49]. Therefore, temperature and irradiance variations must be considered when analyzing transmission networks with integrated solar PV-based RES [50]. Nevertheless, the existing literature on ML-based power system fault diagnosis overlooks these critical issues.
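To illustrate why temperature and irradiance matter, the function below uses a common first-order PV approximation, P = P_rated × (G / 1000) × (1 + γ(T_cell - 25)). This is not the MATLAB/Simulink PV model used in the paper; the linear form and the temperature coefficient γ = -0.004 per °C (a typical crystalline-silicon value) are assumptions for illustration, and `pv_power_mw` is a hypothetical name.

```python
def pv_power_mw(rated_mw: float, irradiance_w_m2: float, cell_temp_c: float,
                gamma_per_c: float = -0.004) -> float:
    """Approximate PV plant output (MW) with a simple linear derating model.

    Assumes output scales linearly with irradiance relative to standard test
    conditions (1000 W/m^2, 25 degC) and is derated by an assumed temperature
    coefficient gamma_per_c for each degree above 25 degC.
    """
    g_stc, t_stc = 1000.0, 25.0
    power = rated_mw * (irradiance_w_m2 / g_stc) * (1.0 + gamma_per_c * (cell_temp_c - t_stc))
    return max(power, 0.0)  # output cannot be negative


# A hypothetical 30 MW plant at standard conditions and on a hazy, hot afternoon.
print(pv_power_mw(30, 1000, 25))
print(pv_power_mw(30, 500, 45))
```

Even under this crude model, halving the irradiance and raising the cell temperature by 20 °C cuts the output of a 30 MW plant to about 13.8 MW, which changes the current the plant can contribute to a fault.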
The integration of RES into an existing transmission network can significantly change the power system topology and fault characteristics, depending on the size of the added RES unit [43]. Large fluctuations in power generation from RES can result in substantial deviations of fault currents from their normal values [49], potentially leading to higher misclassification rates of ML models and significant errors in fault location estimation. However, to date, no reported study in the literature examines the performance of ML models for transmission line fault diagnosis following the integration of new RES of varying sizes. Therefore, further research is needed to investigate how ML models utilized for fault diagnosis in existing power systems are affected by the integration of new RES of diverse sizes. Additionally, the literature review suggests that only limited studies are available on ML-based fault diagnosis of RES integrated power systems. Furthermore, these studies assumed that RES had been integrated long ago and that sufficient fault data representing diverse fault variations were available for training ML models. However, this is not true when a new RES is integrated into a real-world power network. Transmission line faults statistically occur infrequently in real-world power systems [51]. Gathering diverse fault data, including various fault locations, types, and attributes, typically requires several years. Therefore, there is a very low probability of different types of faults occurring with significant variations in fault attributes shortly after RES integration. Hence, it is crucial to analyze the performance of ML models while considering these practical issues to ensure uninterrupted power transfer through lines, meeting current and future needs of power system protection and maintenance [43]. 
Therefore, this paper aims to fill this research gap by analyzing a power system in which the system topology changes due to the integration of RES of varying sizes, and fault data for the altered system is unavailable for ML-based transmission line fault classification and localization. An adaptability analysis of the considered ML models for power system fault diagnosis post RES integrations has also been conducted. This analysis will assist in selecting appropriate ML models based on their learning capabilities for the changed system topology under the practical condition of minimal fault data availability over time. The main objectives and contributions of the research work presented in this paper are as follows:
To analyze and compare the performance of XGBoost, Extra Trees, and Bayesian ridge regression with that of other candidate ML models for classifying and locating power system faults, considering diverse fault attributes and the impact of temperature and irradiance on power generation from Solar PV RES of different sizes.
To analyze and compare the impact of RES integrations on transmission line fault classification and localization performance of various ML models.
To test the adaptability of the ML models for fault classification and localization after RES integration, considering real-world scenarios of fault data availability over time, in order to identify models capable of rapid learning from minimal new fault data after RES integration.
The remaining sections of the paper are organized as follows: Section II describes the test system and the ML models used in the study. Section III presents the methodology adopted to conduct the proposed study and the scenarios examined. Section IV describes in detail the fault data generation for the standard IEEE 9 bus system and the IEEE 9 bus system with RES integration, including the various fault attributes considered in creating the fault database. Section V presents the results and discussion of transmission line fault classification and localization for all three scenarios, together with an analysis of the adaptability trends exhibited by the machine learning classifiers and regressors as fault data become incrementally available after RES integration. Finally, Section VI states the conclusions of this study.
An Overview of the Test System and ML Models
This section describes the test system and the machine learning models chosen for the proposed analysis. It also discusses the rationale for selecting the transmission network and RES integration cases, which were considered for studying the impact and adaptability performance of the ML models. Additionally, this section provides an overview of the key features of the selected ML models, along with the values of the hyperparameters used for training the models in the proposed study.
A. IEEE 9 Bus System
Numerous ML-based fault diagnosis analyses, including fault localization studies, have been conducted in the power system literature on simple networks, such as two-bus transmission lines of varying voltage levels and lengths. However, standard IEEE transmission and distribution systems are preferred because they represent real power systems. Several studies on power system fault diagnosis have used standard IEEE systems, including the IEEE 9 Bus [14], 14 Bus [52], 39 Bus [53], and 68 Bus [54] transmission networks and the IEEE 4 Bus [55], 13 Bus [56], 33 Bus [57], and 34 Bus [58] distribution networks. Furthermore, combined transmission and distribution networks such as the IEEE 123 Bus system [59] have been widely referenced. Generally, these studies were conducted on large systems but considered only two or three lines of the network. The standard IEEE 9 Bus System comprises six transmission lines. Therefore, instead of selecting one or two transmission lines from a larger transmission network, the IEEE 9 Bus system was chosen so that the impact of RES integration could be analyzed on the complete network. The IEEE 9 bus system comprises nine buses, three synchronous generating units, six transmission lines, three two-winding transformers, and three loads [60], [61]. The system operates at 50 Hz, and the lengths of the six interconnected transmission lines are as follows: Line 4–5 is 116.798 km, Line 4–6 is 115.131 km, Line 7–5 is 211.955 km, Line 7–8 is 98.907 km, Line 8–9 is 138.603 km, and Line 9–6 is 235.579 km. The single-line diagram of the IEEE 9 Bus system is shown in Figure 1.
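For the localization experiments it can be convenient to keep the line data in a small lookup table. The sketch below stores the six line lengths listed above and converts a fractional fault position into kilometres from the first-named bus; the names `LINE_LENGTHS_KM` and `distance_from_first_bus_km` and the "first-named bus" convention are hypothetical, not taken from the paper.

```python
# Line lengths (km) of the IEEE 9 bus system as listed in the text,
# keyed by (first bus, second bus) in the order the text names them.
LINE_LENGTHS_KM = {
    (4, 5): 116.798,
    (4, 6): 115.131,
    (7, 5): 211.955,
    (7, 8): 98.907,
    (8, 9): 138.603,
    (9, 6): 235.579,
}


def distance_from_first_bus_km(line: tuple, fraction: float) -> float:
    """Convert a fault position given as a fraction of line length into km."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    return fraction * LINE_LENGTHS_KM[line]


total_km = sum(LINE_LENGTHS_KM.values())
print(round(total_km, 3))  # 916.973
print(distance_from_first_bus_km((9, 6), 0.5))  # midpoint of the longest line
```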
B. RES Placement and Size Selection
The increasing integration of large-scale RES into existing transmission and distribution networks worldwide is driven by the pursuit of green and clean energy. Many countries plan to increase their RES-based power generation to 50-70% of their total generation capacity within the next few decades [62]. According to the International Renewable Energy Agency (IRENA) 2021 report, global solar PV generation capacity alone has reached 849 GW, making it a significant source of future energy generation. Consequently, the integration of large-scale solar plants into power grids is growing worldwide. Therefore, this paper analyzes the impact of solar-based RES integration into transmission networks on ML-based fault diagnosis.
Optimal placement and sizing analysis is essential before integrating RES into the power grid to ensure voltage stability, grid reinforcement, minimized power loss, reduced on-peak operation cost, and an improved load factor [39]. The optimal placement (bus location) and size (generation capacity in MW) of RES for a given transmission network can be determined using several methodologies available in the literature. The RES integrated into the IEEE 9 Bus system under study have been placed according to the optimal placement and maximum allowable sizing analysis in [62]. Based on the Lyapunov exponent estimation analysis presented in [62], bus number 7 (Bus 7) is identified as the best location for RES integration, followed by bus number 5 (Bus 5). The assumed maximum allowable RES size at these locations is 30 MW [62]. However, the integrated RES may be of the maximum allowed size or smaller. Therefore, this paper integrates RES of three sizes into the power system: 10 MW, 20 MW, and 30 MW. Sizes smaller than the maximum allowable penetration are considered to reflect practical constraints, such as installation time and land availability at the optimal placement point. With two optimal locations and three possible RES sizes at each, there are nine possible combinations for RES integration into the IEEE 9 bus system. To avoid redundancy in the results, six integration cases are considered. An independent analysis was conducted for each of the six RES integration cases to evaluate the performance and adaptability of different machine learning models in classifying and localizing faults, and thereby to gain insight into the impact of RES integration. The integration of solar PV plant-based RES of three different sizes at Bus 7 and Bus 5 defines the following cases, which are used in the rest of the study:
Case 1: 10 MW at Bus 7
Case 2: 10 MW at Bus 7 and 10 MW at Bus 5
Case 3: 20 MW at Bus 7 and 10 MW at Bus 5
Case 4: 20 MW at Bus 7 and 20 MW at Bus 5
Case 5: 30 MW at Bus 7 and 20 MW at Bus 5
Case 6: 30 MW at Bus 7 and 30 MW at Bus 5
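The case definitions above are easy to encode as data, which also makes the bookkeeping explicit. The sketch below (hypothetical variable names) enumerates the 3 × 3 size combinations for the two buses and records the six studied cases, showing that their total RES capacity rises in 10 MW steps from 10 MW to 60 MW while no single bus exceeds the 30 MW limit.

```python
from itertools import product

SIZES_MW = (10, 20, 30)
# All size combinations when RES is installed at both buses: (Bus 7 MW, Bus 5 MW).
BUS_PAIRS = list(product(SIZES_MW, repeat=2))

# The six studied cases, as {bus number: RES size in MW}.
CASES = {
    1: {7: 10},
    2: {7: 10, 5: 10},
    3: {7: 20, 5: 10},
    4: {7: 20, 5: 20},
    5: {7: 30, 5: 20},
    6: {7: 30, 5: 30},
}

total_mw = {case: sum(buses.values()) for case, buses in CASES.items()}
print(len(BUS_PAIRS))  # 9
print(total_mw)        # {1: 10, 2: 20, 3: 30, 4: 40, 5: 50, 6: 60}
```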
C. Machine Learning Models Selection
Various ML models have been used in the literature for fault diagnosis of transmission and distribution networks, as discussed in the literature review in the Introduction. Many of these models have been found to perform exceptionally well. However, only a few works address fault diagnosis of RES integrated systems, and none of them has been tested for fault diagnosis after RES integration under realistic constraints on power system data availability. In this study, the best-performing models for power system fault diagnosis and a few base models used in the literature are selected for performance testing after RES integration. The models included in this study for the classification of transmission line faults are LR, KNN, SVM, Multilayer Perceptron (MLP), Gaussian NB, DT, RF, AdaBoost, Bagging, and Ridge Regression (used as a classifier). Similarly, the regression models used for fault localization are SVR, Regression Tree (RT), Bagging, RF, and KNN Regression (KNR). KNN is simple, easy to understand, and robust to outliers; however, its performance deteriorates for high-dimensional data. SVM is suitable for non-linear problems when the kernel function is properly selected, and it has good generalization capability and is robust to outliers. However, its performance is highly dependent on hyperparameter tuning, which is challenging and computationally expensive, making SVM unsuitable for large datasets with many features.
DTs are fast, efficient, and easy to interpret, but they are prone to overfitting and have high variance. Bagging is less prone to overfitting because it reduces variance; however, depending on the data, it may have high bias. RF gives high accuracy, works well for large datasets, and is robust to overfitting; however, like other ensemble methods, it works like a black box, so it is difficult to know how it arrives at its predictions [5]. Further, the proposed study includes a few other models that have been reported to improve performance in different engineering applications and are potentially suitable for power system fault diagnosis. Hence, XGBoost and ET models are used for fault classification, and Bayesian Ridge regression (BRR) and ET regressors (ETR) are tested for fault localization.
Like RF, extremely randomized trees, called extra trees (ET), have both classification and regression variants. ET differs slightly from other ensemble methods, such as RF and bagging: in ET, many unpruned DTs are grown on the whole training dataset and split thresholds are drawn at random, whereas in RF and bagging each DT is grown on a bootstrap sample of the training dataset. The final prediction of the ET classifier is obtained by majority voting over all DTs, and that of the ET regressor by averaging the outputs of all DTs. The ET model has several advantages, such as reduced bias and variance, and is faster than many ensemble methods. However, it is not easy to interpret and may overfit [37]. XGBoost supports multithreaded parallel computing, which makes it faster than other ensemble methods. XGBoost is fast, gives high accuracy, and includes regularization that helps prevent overfitting, but it is not easy to interpret. It is well suited to the large datasets that are common in actual power systems. Also, it does not require normalization of data obtained from PMUs or fault data loggers, which makes it convenient for power system applications [36]. Bayesian ridge regression is relatively less prone to overfitting and deals effectively with multicollinearity; however, its performance depends strongly on the appropriate selection of priors. Thus, Bayesian ridge regression is well suited to datasets with multicollinearity, which is very common in power system data, and is therefore a promising candidate for power system applications [35].
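To make the ET/RF distinction and the BRR behavior concrete, here is a small sketch using scikit-learn's `ExtraTreesRegressor`, `RandomForestRegressor`, and `BayesianRidge` on synthetic, hypothetical data with three nearly collinear features. It is not the paper's configuration: the hyperparameters of Table 2 are not reproduced, and library defaults are used apart from `n_estimators` and `random_state`.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(42)
n = 400
# Hypothetical target: a normalized fault distance in [0, 1], scaled to 0-100.
base = rng.uniform(0.0, 1.0, size=n)
y = 100.0 * base + rng.normal(scale=1.0, size=n)
# Three nearly identical (strongly collinear) features derived from the same quantity.
X = np.column_stack([base + 0.01 * rng.normal(size=n) for _ in range(3)])

# ExtraTrees: by default no bootstrap (every tree sees the whole training set),
# and split thresholds are drawn at random for the candidate features.
et = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X, y)
# RandomForest: by default each tree is grown on a bootstrap sample.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
# BayesianRidge: Gaussian prior on the weights; the weight and noise precisions
# (lambda_, alpha_) are estimated from the data rather than fixed by hand.
brr = BayesianRidge().fit(X, y)

print(et.bootstrap, rf.bootstrap)  # False True
print(np.round(brr.coef_, 1))      # weights spread over the collinear columns; sum near 100
print(round(brr.score(X, y), 3))   # training R^2
```

Because the three features carry almost the same information, individual BRR weights are not well identified, but their sum stays close to the true slope of 100; this is the kind of stability under multicollinearity the text refers to.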
The ML models were trained and tested in Jupyter Notebook using the scikit-learn library [63]. The models used in the present study were tuned for best performance by systematically investigating a wide range of hyperparameter values for every model presented in the paper. The critical parameters used to train the classification and regression models are given in Table 1 and Table 2, respectively. The nomenclature of the model parameters in Table 1 and Table 2 follows the documentation of the scikit-learn application programming interface (API) [63].
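A minimal sketch of how the classifier set listed above might be instantiated with scikit-learn. The Table 1 hyperparameters are not reproduced here, so library defaults (plus a fixed `random_state` and a raised `max_iter`) serve as placeholders, and `build_classifiers` is a hypothetical helper. XGBoost lives in the separate `xgboost` package, so it is added only if that package is installed.

```python
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              ExtraTreesClassifier, RandomForestClassifier)
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_classifiers(random_state: int = 0) -> dict:
    """Return untrained classifiers keyed by the abbreviations used in the text.

    Hyperparameters are placeholders, not the tuned values of Table 1.
    """
    models = {
        "LR": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(),
        "SVM": SVC(),
        "MLP": MLPClassifier(max_iter=500, random_state=random_state),
        "GNB": GaussianNB(),
        "DT": DecisionTreeClassifier(random_state=random_state),
        "RF": RandomForestClassifier(random_state=random_state),
        "AdaBoost": AdaBoostClassifier(random_state=random_state),
        "Bagging": BaggingClassifier(random_state=random_state),
        "Ridge": RidgeClassifier(),  # ridge regression used as a classifier
        "ET": ExtraTreesClassifier(random_state=random_state),
    }
    try:
        from xgboost import XGBClassifier  # separate package, not part of scikit-learn
        models["XGBoost"] = XGBClassifier(random_state=random_state)
    except ImportError:
        pass  # XGBoost unavailable; the scikit-learn models are still returned
    return models


print(sorted(build_classifiers()))
```

Note that `XGBClassifier` expects integer class labels 0 to K-1, so string fault labels would first need encoding, for example with `sklearn.preprocessing.LabelEncoder`. The regressors (SVR, RT, Bagging, RF, KNR, ETR, BRR) could be collected the same way.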
Proposed Methodology
This section describes the methodology adopted to accomplish the objectives outlined in the Introduction. Figure 2 illustrates the proposed study of the performance and adaptability of ML models for classifying and localizing faults in conventional transmission networks after RES integration. The integration of RES changes the fault characteristics of the power system, which raises the question of whether ML models trained on conventional power system fault data remain effective after RES integration. Such analyses are notably absent from the existing literature, and the growing interest in smart grids with green power generation underlines their importance. The methodology presented in this section is intended to help power system engineers choose a suitable ML model for fault diagnosis after RES integration. Fault diagnosis in power systems involves two fundamental steps:
Fault Classification: This involves identifying the type of fault among five possible fault types (SLG, DLG, LL, LLL, LLLG), along with the phases of the faulty transmission line.
Fault Localization: This entails determining the precise location of the fault on the identified faulty transmission line.
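The classification step above distinguishes the five fault types together with the faulted phases, which gives the 11 faulty-phase combinations mentioned later in the data generation section. The sketch below enumerates them; the phase-label format ("A", "B", "C", with "G" for ground) and the helper `fault_type` are hypothetical conventions, not the paper's labels.

```python
# Faulty-phase labels grouped by fault type (hypothetical naming convention).
FAULT_TYPES = {
    "SLG": ("AG", "BG", "CG"),
    "LL": ("AB", "BC", "CA"),
    "DLG": ("ABG", "BCG", "CAG"),
    "LLL": ("ABC",),
    "LLLG": ("ABCG",),
}
LABELS = [label for combos in FAULT_TYPES.values() for label in combos]


def fault_type(label: str) -> str:
    """Infer the fault type from a phase label: phase count plus ground involvement."""
    n_phases = sum(ch in "ABC" for ch in label)
    grounded = label.endswith("G")
    mapping = {(1, True): "SLG", (2, False): "LL", (2, True): "DLG",
               (3, False): "LLL", (3, True): "LLLG"}
    return mapping[(n_phases, grounded)]


print(len(LABELS))          # 11
print(fault_type("BCG"))    # DLG
```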
The methodology adopted for the proposed research commences with developing a simulation model for the standard IEEE 9 bus system, which represents the conventional power system in this study and is detailed in the previous section. Initially, fault data for the IEEE 9 bus system is generated using the MATLAB Simulink environment, incorporating various practical variations in power system fault attributes to create a diverse fault database. Subsequently, in Scenario 1, the selected ML classifier and regressor models are trained and tested using this conventional power system fault database to assess their comparative fault diagnosis performances. As the fault database encompasses all fault types and adequate fault locations, satisfactory fault classification and localization performances of ML models were achieved, establishing it as a benchmark performance for Scenarios 2 and 3.
Given the unavailability of new fault data samples for all fault types and locations immediately after RES integration, the primary objective of Scenario 2 analysis is to evaluate the performance of previously trained ML models post RES integration. This analysis investigates whether ML models trained with conventional power system fault data will continue to accurately identify fault types and estimate fault locations after RES integrations. Solar PV plants of three different MW ratings have been integrated into the IEEE 9 Bus system at two optimal bus locations. Similar to conventional power systems, fault data for all six RES integration cases has been separately generated using MATLAB Simulink, considering temperature and irradiance effects. The performance of previously trained ML models is then assessed using new fault samples from RES integrated systems.
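Scenario 2 amounts to training once on conventional data and then testing the unchanged model on each RES case. The sketch below shows that protocol with scikit-learn; `scenario2_accuracy` is a hypothetical helper, and the data are synthetic: the "RES cases" are simply scaled copies of the conventional features, a crude stand-in for the altered fault signatures rather than the paper's simulated data.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


def scenario2_accuracy(model, X_conv, y_conv, res_cases):
    """Fit once on conventional fault data; test the unchanged model on each RES case.

    res_cases maps a case name to an (X, y) pair of post-RES fault samples.
    No post-RES data is used for training, mirroring Scenario 2.
    """
    fitted = clone(model).fit(X_conv, y_conv)
    return {name: accuracy_score(y, fitted.predict(X)) for name, (X, y) in res_cases.items()}


# Synthetic, hypothetical data: 36 features and 5 classes, as placeholders.
X_conv, y_conv = make_classification(n_samples=600, n_features=36, n_informative=10,
                                     n_classes=5, random_state=0)
res_cases = {f"scale={s}": (X_conv * s, y_conv) for s in (1.0, 1.2, 1.5)}
accuracies = scenario2_accuracy(RandomForestClassifier(random_state=0),
                                X_conv, y_conv, res_cases)
print(accuracies)
```

The unscaled case reproduces the training data, so its accuracy is near 1; the scaled cases show how a model fitted to one system can degrade once the feature distribution shifts.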
In adaptability testing, ML models are evaluated for their learning ability by incrementally training them with a growing amount of fault data from RES integrated systems, in conjunction with conventional power system fault data. The primary objective of Scenario 3 is to understand the learning trend of the studied ML models. Thus, in Scenario 3, training of each ML model begins by incorporating 0.25% of the new fault data samples from the RES integrated fault database, and this percentage is subsequently increased to 0.5%, 1%, 2%, and so forth until the model’s performance matches that observed for the conventional power system. This analysis can reveal the fastest-learning classification and localization models, which are suitable for transmission networks that change dynamically due to ongoing RES integration. The model exhibiting the highest classification accuracy with the minimum amount of new training data is deemed the most adaptable classifier. Similarly, the regression model demonstrating the lowest mean absolute percentage error (MAPE) with the least new training data is regarded as the most adaptable regression model for fault localization.
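The incremental procedure above can be sketched as a loop that retrains on conventional data plus a growing, nested share of new post-RES samples and tracks MAPE = (1/n) Σ |y_i - ŷ_i| / |y_i|. Several details are assumptions: the schedule is taken to double at each step (0.25%, 0.5%, 1%, 2%, ...) and end at 100%, a fixed `target_mape` stands in for "matches the conventional-system performance", the helper names are hypothetical, and the data are synthetic. Note that scikit-learn's `mean_absolute_percentage_error` returns a fraction (0.05 means 5%).

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.neighbors import KNeighborsRegressor


def fraction_schedule(start=0.25, stop=100.0):
    """Yield training percentages 0.25, 0.5, 1, 2, ... (assumed doubling), ending at 100."""
    pct = start
    while pct < stop:
        yield pct
        pct *= 2
    yield stop


def adaptability_curve(model, X_conv, y_conv, X_pool, y_pool, X_test, y_test,
                       target_mape, seed=0):
    """Retrain on conventional data plus a growing share of new post-RES data.

    Stops as soon as the test MAPE reaches target_mape (a fraction: 0.05 = 5%).
    Returns a list of (percentage, n_new_samples, mape) tuples.
    """
    order = np.random.default_rng(seed).permutation(len(X_pool))
    curve = []
    for pct in fraction_schedule():
        k = max(1, int(round(len(X_pool) * pct / 100)))
        idx = order[:k]  # nested subsets: each step keeps the earlier samples
        X_train = np.vstack([X_conv, X_pool[idx]])
        y_train = np.concatenate([y_conv, y_pool[idx]])
        fitted = clone(model).fit(X_train, y_train)
        mape = mean_absolute_percentage_error(y_test, fitted.predict(X_test))
        curve.append((pct, k, mape))
        if mape <= target_mape:
            break
    return curve


# Synthetic, hypothetical data: fault distance d (km) -> two features; RES shifts the gain.
rng = np.random.default_rng(1)


def simulate(n, gain):
    d = rng.uniform(1.0, 100.0, size=n)
    X = np.column_stack([gain / d, gain * np.log(d)]) + 0.01 * rng.normal(size=(n, 2))
    return X, d


X_conv, y_conv = simulate(300, gain=1.0)   # conventional system
X_new, y_new = simulate(400, gain=1.3)     # post-RES system (hypothetical shift)
curve = adaptability_curve(KNeighborsRegressor(), X_conv, y_conv,
                           X_new[:200], y_new[:200], X_new[200:], y_new[200:],
                           target_mape=0.05)
for pct, k, mape in curve:
    print(f"{pct:>6}% new data ({k:3d} samples): MAPE = {100 * mape:.1f}%")
```

The same loop works for classifiers by swapping MAPE for accuracy and reversing the stopping comparison; the number of new samples at the stopping step is then a simple measure of adaptability.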
Implementation of the Proposed Analysis:
To investigate the impact of RES integration on the performance of ML models for power system fault classification and localization, the proposed study has been divided into three major scenarios, as illustrated in Figure 2 and detailed in Table 3. Firstly, in Scenario 1, fault classification and localization for the conventional power system were conducted using various ML models, and the performance of the ML models proposed in the paper was compared with that of the ML models used in the literature for power system fault diagnosis.
Secondly, in Scenario 2, the impact of RES integration on fault classification and localization has been analyzed and compared for various ML models. As explained in the previous section, the considered ML models are tested for fault classification and localization after RES of different sizes are integrated into the power system. This scenario is crucial because no fault data are available to train ML models immediately after a new RES integration. Thus, the ML models are trained only with standard IEEE 9 bus fault data. Analysis of this scenario helps in choosing an ML model suited to real-world power system conditions.
Thirdly, in Scenario 3, the learning ability of the ML models for fault classification and localization has been tested and compared as fault data for the RES integrated system become available over time. In real-world systems, fault data for an RES integrated power system are collected gradually. Hence, the ML models for fault classification and localization have been trained with a gradually increasing amount of RES integrated power system fault data together with conventional power system fault data. Lastly, the adaptability of all ML models has been analyzed to identify the model that learns fastest with the least available data. The training and testing datasets for the scenarios mentioned above are listed in Table 3.
The implementation of the proposed analysis is depicted in the flow chart provided in Figure 3. Furthermore, all significant stages of the study, including fault data generation, fault classification, and localization for all three scenarios, are elaborated upon with separate flow charts. Figure 4 illustrates the steps followed in fault data generation for both the standard IEEE 9 Bus system and the solar PV plant-based RES integrated system, including the number of fault data samples generated for the study. The procedures for fault classification and localization in each scenario are outlined in Figures 5 and 6, respectively.
Figure 4. Block diagram outlining the procedural steps for data generation and database formation.
Figure 5. Flowchart illustrating the procedural steps for fault classification in various scenarios.
Figure 6. Flowchart illustrating the procedural steps for fault localization in various scenarios.
Data Generation
Collecting sufficient labeled data from real-world power systems for fault studies is difficult for the following reasons: 1) the power system usually operates normally and failures are uncommon, so healthy data make up most of the collected data while fault data are much smaller in scale; 2) the fault signature of even one type of fault occurring at the same location can change significantly with changes in fault attribute values, so collecting and labeling fault data under all fault attributes of a real-world power system is challenging. Moreover, real-world power system data are generally unavailable to researchers. However, high-end simulation software such as MATLAB is explicitly designed to study various engineering systems, including power system networks [64]. Such electromagnetic transient (EMT)-based software uses advanced mathematical tools to model power system components, so the steady-state and transient responses of a simulated power system are close to those of the actual system. Researchers therefore commonly use simulated data for research and development in power system related fields. All components of the ‘IEEE-9 Bus System’ and the ‘RES-integrated IEEE 9 BUS system’ have been modeled in MATLAB/Simulink to generate fault data. The fault databases used in the study have been created by simulating actual power system fault conditions on the IEEE 9 Bus system model. Faults were generated on all six lines of the IEEE-9 Bus system and the RES-integrated IEEE-9 Bus system to form the fault database. Further, faults were generated for all five fault types, covering their 11 faulty-phase combinations, at ten different locations on each transmission line, while varying real-world power system fault conditions. When forming the fault database for the RES integrated systems, temperature and irradiance variations were also included for all six RES integration cases.
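The fault sweep described above (six lines, 11 faulty-phase combinations, ten locations per line) can be enumerated before the attribute variations of Table 4 are applied. In the sketch below the evenly spaced positions at 5%, 15%, ..., 95% of line length and all names are hypothetical, since the exact locations are not listed here; each base case would then be repeated for the fault resistance and inception angle values and, for the RES systems, the temperature and irradiance conditions.

```python
from itertools import product

LINES = ["4-5", "4-6", "7-5", "7-8", "8-9", "9-6"]
PHASE_COMBOS = ["AG", "BG", "CG", "AB", "BC", "CA",
                "ABG", "BCG", "CAG", "ABC", "ABCG"]
# Hypothetical: ten evenly spaced positions at 5%, 15%, ..., 95% of line length.
LOCATIONS_PCT = [5 + 10 * i for i in range(10)]

# One entry per (line, faulty phases, location) before attribute variations.
base_cases = list(product(LINES, PHASE_COMBOS, LOCATIONS_PCT))
print(len(base_cases))  # 660
print(base_cases[0])    # ('4-5', 'AG', 5)
```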
To create the fault database, fault occurrences were simulated on all six transmission lines of the network, considering the various fault attributes that affect the nature of the fault current. The fault current depends on factors such as the faulty phase(s), fault type, fault resistance, fault inception angle, and fault distance from the buses. On the occurrence of a fault, the three-phase voltages and currents were recorded at six buses. These 36 voltage and current signals (6 buses × 3 phases × 2 quantities), which serve as features for the ML models, were collected at a sampling frequency of 1.6 kHz. Each fault record spans three cycles: one pre-fault cycle and two post-fault cycles. Figure 4 describes the fault dataset, i.e., the steps followed in fault data generation and database formation. It also states the number of fault data samples collected for the IEEE 9 bus system and for each size/case of the RES-integrated system. In total, 15,840 fault data samples were generated for the IEEE 9 bus system and 4,224 for each size of the RES-integrated IEEE 9 bus system, giving 25,344 fault data samples across all six RES integration cases.
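The size of the fault database follows from the grid of simulated fault attributes. The Python sketch below enumerates such a grid with `itertools.product`. The line names, the 11 phase combinations, and the ten locations per line come from the text; the location convention (midpoint of each of the ten sections) and the resistance and inception-angle values passed to `fault_grid` are hypothetical placeholders for Table 4, which is not reproduced here. It also checks the reported sample counts, assuming each attribute combination is simulated exactly once.

```python
from itertools import product

# Six transmission lines of the IEEE 9 Bus system, named as in the text.
LINES = ["4-5", "4-6", "7-5", "7-8", "8-9", "9-6"]
# The 11 faulty-phase combinations (R, Y, B = phases; G = ground).
PHASES = ["RG", "YG", "BG", "RYG", "YBG", "RBG", "RY", "YB", "RB", "RYBG", "RYB"]
# Ten fault locations per line as fractions of line length.
# Hypothetical convention: the midpoint of each of the ten equal sections.
LOCATIONS = [(k + 0.5) / 10 for k in range(10)]


def fault_grid(resistances, inception_angles):
    """Yield one simulation case per combination of fault attributes.

    `resistances` (ohm) and `inception_angles` (degrees) are placeholders
    for the Table 4 values, which are not reproduced here.
    """
    for line, phases, loc, r_f, fia in product(
        LINES, PHASES, LOCATIONS, resistances, inception_angles
    ):
        yield {"line": line, "phases": phases, "location": loc,
               "r_fault_ohm": r_f, "fia_deg": fia}


# Cases per (resistance, FIA) setting: 6 lines * 11 combinations * 10 locations.
base_cases = len(LINES) * len(PHASES) * len(LOCATIONS)
# Reported conventional total / base cases = resistance-FIA settings per case,
# assuming every combination is simulated exactly once.
settings_per_case, remainder = divmod(15840, base_cases)
# Six RES integration cases with 4,224 samples each.
res_total = 6 * 4224

print(base_cases, settings_per_case, remainder, res_total)  # 660 24 0 25344
```

For example, the hypothetical call `fault_grid([0.1, 10.0], [0, 90])` yields 660 × 2 × 2 = 2,640 cases; each dictionary would parameterize one simulation run.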
A. Conventional Power System Fault Data
The existing transmission networks, equipped with protection devices and fault data recorders, have been operational for decades [14]. Thus, an ample amount of fault data with significant diversity is available for these networks. Therefore, fault data for the standard IEEE 9 Bus system has been generated by considering all types of faults and variability of various fault attributes to form the conventional power system fault database. The values of different fault attributes for which data has been generated are shown in Table 4.
Fault Attributes:
Various attributes may affect the nature of the fault current when a fault occurs. Since the fault current’s signature is crucial for fault diagnosis, we consider every attribute that may alter the post-fault current when generating the fault data.
1) Faulty Phase and Fault Types
Five types of permanent faults may occur in power systems: SLG, DLG, LL, LLLG, and LLL. Depending on the faulted phases, these occurrences can be classified into 11 faulty-phase combinations, i.e., RG, YG, BG, RYG, YBG, RBG, RY, YB, RB, RYBG, and RYB, where R, Y, and B denote the transmission line phases and G denotes ground. Hence, this paper considers all possible combinations of fault occurrences during fault data generation.
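The mapping from the 11 faulty-phase combinations to the five fault types is mechanical: the number of faulted phases and the presence of a ground connection determine the type. A minimal Python sketch of that mapping follows; the function name `fault_type` is illustrative only. With 1,440 samples per combination (15,840 / 11), it reproduces the per-type sample counts reported for Scenario 1.

```python
from collections import Counter

PHASE_COMBINATIONS = ["RG", "YG", "BG", "RYG", "YBG", "RBG",
                      "RY", "YB", "RB", "RYBG", "RYB"]

# (number of faulted phases, involves ground) -> fault type
TYPE_BY_SIGNATURE = {(1, True): "SLG", (2, True): "DLG", (2, False): "LL",
                     (3, True): "LLLG", (3, False): "LLL"}


def fault_type(combo: str) -> str:
    """Map a faulty-phase label such as 'RYG' to one of the five fault types."""
    grounded = combo.endswith("G")
    phases = combo[:-1] if grounded else combo
    # Reject labels with unknown or repeated phase letters.
    if not phases or set(phases) - set("RYB") or len(set(phases)) != len(phases):
        raise ValueError(f"invalid phase combination: {combo!r}")
    key = (len(phases), grounded)
    if key not in TYPE_BY_SIGNATURE:  # e.g. a single phase without ground
        raise ValueError(f"not a fault combination: {combo!r}")
    return TYPE_BY_SIGNATURE[key]


per_type = Counter(fault_type(c) for c in PHASE_COMBINATIONS)
samples_per_type = {t: 1440 * n for t, n in per_type.items()}
print(dict(per_type), samples_per_type)
```

SLG, DLG, and LL each cover three phase combinations (4,320 samples), while LLLG and LLL cover one each (1,440 samples).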
2) Fault Resistance
Fault resistance is the opposition to current flow introduced by the material with which the lines come into contact when the fault occurs. For ground faults, this fault resistance is determined by the earth resistivity, while for LL or LLL faults, it is the resistance of the material through which two lines/cables are shorted, such as wood or an animal. Earth resistivity is an electrical characteristic of the ground and is very important when calculating the zero-sequence impedance of a transmission line. Its value varies significantly with soil type; for example, the earth resistivity of peat soil ranges from 200 to 1200 Ω·m and is typically taken as
3) Fault Inception Angle
The electrical angle at which a fault occurs is known as the fault inception angle (FIA); it varies from 0° to 360°. The FIA determines the level of the transient that follows the fault, and different fault types experience their maximum transients at different FIAs [65]. Thus, fault data has been generated for various values of the fault inception angle in this paper, as given in Table 4.
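In a time-domain simulation, an FIA is realized by delaying the fault instant by the corresponding fraction of a cycle, t = θ / (360° · f). The Python sketch below converts an FIA into a fault instant and a sample index at the 1.6 kHz sampling rate, and computes the length of a three-cycle record. The nominal frequency is not stated in this section, so it is a parameter here (50 Hz and 60 Hz are used only for illustration), and measuring the angle from a reference zero crossing at `cycle_start_s` is an assumption.

```python
def inception_time(fia_deg: float, f_hz: float, cycle_start_s: float = 0.0) -> float:
    """Fault instant (s) for a given FIA, measured from a reference
    zero crossing of the reference-phase voltage at cycle_start_s."""
    if not 0.0 <= fia_deg < 360.0:
        raise ValueError("FIA must lie in [0, 360) degrees")
    return cycle_start_s + (fia_deg / 360.0) / f_hz


def inception_sample(fia_deg: float, f_hz: float, fs_hz: float = 1600.0,
                     cycle_start_s: float = 0.0) -> int:
    """Nearest sample index of the fault instant at sampling rate fs_hz."""
    return round(inception_time(fia_deg, f_hz, cycle_start_s) * fs_hz)


def record_length(f_hz: float, fs_hz: float = 1600.0, cycles: int = 3) -> int:
    """Samples in a record of `cycles` cycles (one pre-fault + two post-fault)."""
    return round(cycles * fs_hz / f_hz)


print(inception_time(90, 50), inception_sample(90, 50), record_length(50))
```

At an assumed 50 Hz, a 90° FIA corresponds to a 5 ms delay (sample 8), and a three-cycle record holds 96 samples per channel; at 60 Hz it would hold 80.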
4) Fault Distance
In real-world power systems, a fault may occur at any point along a line; however, to include sufficient diversity in the fault data for ML training, all six lines of the IEEE 9 Bus system have been divided into ten equal sections, and faults were applied at these ten locations.
B. RES Integrated Fault Data
Solar PV plants rated at 10, 20, and 30MW were simulated separately in MATLAB to obtain their power generation under varying temperature and irradiance values. The power generated by these solar plants is tabulated in Table 5. Solar PV plants of varying sizes were then integrated at Buses 7 and 5, as explained in subsection B of Section II. The integrated solar PV plant has been modeled considering standard temperature and irradiance variations throughout the day and year [66]. The key factors influencing a solar plant’s power output are solar irradiance, operating temperature, tilt angle, and load matching for maximum power; among these, temperature and solar irradiance have the greatest influence. Power output is higher at lower temperatures, while efficiency drops considerably at very high temperatures; therefore, electrical load adjustment and removal of excess heat are needed for optimum power generation. The power output of a PV plant is almost linearly proportional to solar irradiance [67].
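The two dominant dependencies described above can be captured by a common simplified PV model: output scales linearly with irradiance relative to standard test conditions (1000 W/m², 25 °C) and is derated linearly with cell temperature. The Python sketch below is that textbook approximation, not the Simulink PV model used in this study; the temperature coefficient of −0.4 %/°C is an assumed value typical of crystalline silicon.

```python
def pv_power_mw(rated_mw: float, irradiance_w_m2: float, cell_temp_c: float,
                gamma_per_c: float = -0.004) -> float:
    """Approximate PV plant output (MW).

    Linear in irradiance relative to STC (1000 W/m^2, 25 degC) with a linear
    temperature derating; gamma_per_c = -0.004 /degC is an assumed typical
    coefficient, not a value from this study.
    """
    if irradiance_w_m2 <= 0:
        return 0.0  # no generation at night or under zero irradiance
    derate = 1.0 + gamma_per_c * (cell_temp_c - 25.0)
    return max(0.0, rated_mw * (irradiance_w_m2 / 1000.0) * derate)


for g, t in [(1000, 25), (500, 25), (1000, 50)]:
    print(g, t, pv_power_mw(30, g, t))
```

Halving the irradiance halves the output, while raising the cell temperature from 25 °C to 50 °C reduces a 30MW plant’s output to about 27MW under the assumed coefficient.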
The IEEE 9 Bus System with RES integration is shown in Figure 7. For fault data generation for the RES-integrated system, the RES is first placed on Bus 7 and then on Bus 5 in six different combinations: Case 1, Bus 7-10MW; Case 2, Bus 7-10MW and Bus 5-10MW; Case 3, Bus 7-20MW and Bus 5-10MW; Case 4, Bus 7-20MW and Bus 5-20MW; Case 5, Bus 7-30MW and Bus 5-20MW; and Case 6, Bus 7-30MW and Bus 5-30MW. As explained earlier, these six combinations have been treated as separate cases in this study to examine the impact of RES integration of various sizes on ML model performance. Fault data has been generated for one case (size combination) at a time. The values of the fault attributes at which fault data has been generated for the RES-integrated system are listed in Table 6. The buses for RES placement were selected according to the optimal RES placement study on the IEEE 9 Bus system [62].
For the RES-integrated system, fault data has been recorded for lines 4-5, 7-8, and 7-5, which are directly connected to the RES-integrated buses. The other three lines, 8-9, 4-6, and 9-6, are not directly connected to an RES. Among these, only line 8–9 was considered, as the remaining lines are assumed to be affected similarly.
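Which lines count as “directly connected” to an RES follows from the network topology and the placement in each case. The Python sketch below encodes the six cases as a hypothetical `CASES` dictionary (numbered in the order listed above) and the six lines as bus pairs, then lists the lines with a terminal at an RES-integrated bus.

```python
# RES placements per case: {bus: rating in MW}, numbered in the order listed.
CASES = {
    1: {7: 10},
    2: {7: 10, 5: 10},
    3: {7: 20, 5: 10},
    4: {7: 20, 5: 20},
    5: {7: 30, 5: 20},
    6: {7: 30, 5: 30},
}
# The six transmission lines of the IEEE 9 Bus system as (bus, bus) pairs.
LINES = [(4, 5), (4, 6), (7, 5), (7, 8), (8, 9), (9, 6)]


def lines_at_res_buses(case: int) -> list:
    """Lines with at least one terminal at an RES-integrated bus."""
    res_buses = set(CASES[case])
    return [line for line in LINES if res_buses & set(line)]


def total_res_mw(case: int) -> int:
    """Total integrated RES capacity (MW) for a case."""
    return sum(CASES[case].values())


for case in CASES:
    print(case, total_res_mw(case), lines_at_res_buses(case))
```

Case 1 has two lines adjacent to an RES bus (7-5 and 7-8), while Cases 2 to 6 all have three (4-5, 7-5, and 7-8). Adding line 8–9 gives the four lines studied, and the total capacity rises from 10MW to the 60MW maximum.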
Results and Discussion
To achieve the objectives of this paper, the study has been designed around three scenarios, as shown in Figure 2. Complete information on the training and testing data used to evaluate the ML models in each scenario is provided in Table 3. The performance of ML classifiers and regressors for fault classification and localization has been analyzed in each scenario. In this section, we report and discuss the testing accuracy obtained for fault classification and the Mean Absolute Percentage Error (MAPE) for fault location estimation. To compute the MAPE, we first calculated the Mean Absolute Error (MAE) of the estimated fault location, i.e., the mean absolute difference between the actual and estimated fault locations. This value was then divided by the total length of the line and multiplied by 100, so that MAPE values are comparable regardless of line length.
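The error measure described above is MAPE = 100 · (1/N) Σ |dᵢ − d̂ᵢ| / L, where L is the line length. Note that it normalizes by line length rather than by each actual distance dᵢ, as the conventional MAPE definition does. A minimal Python sketch, with distances and line length in any consistent unit:

```python
def line_normalized_mape(actual, estimated, line_length):
    """Mean absolute location error as a percentage of line length.

    actual, estimated: fault distances from the reference bus, in the same
    unit as line_length (e.g., km).
    """
    if len(actual) != len(estimated) or not actual:
        raise ValueError("need equally sized, non-empty sequences")
    if line_length <= 0:
        raise ValueError("line_length must be positive")
    mae = sum(abs(a - e) for a, e in zip(actual, estimated)) / len(actual)
    return 100.0 * mae / line_length


print(line_normalized_mape([10, 50], [12, 47], 100))
```

For a hypothetical 100 km line with errors of 2 km and 3 km, the MAE is 2.5 km and the line-normalized MAPE is 2.5%.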
A. Conventional Power System (Scenario 1)
In Scenario 1, we used the conventional power system fault database to evaluate the fault diagnosis performance of the selected ML classifier and regressor models. Since the conventional power system has been operational for a significant period, ample fault data representing all fault-type signatures is available for training. A total of 15,840 fault data samples were generated for the conventional power system: 4,320 each for SLG, DLG, and LL faults and 1,440 each for LLLG and LLL faults. The fault data was split into 80% for training and 20% for testing. The results obtained for fault classification and localization are discussed below.
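Because LLLG and LLL faults have a third as many samples as the other types, an 80/20 split is commonly stratified by class so that each fault type keeps its share in both sets. The text does not say whether the split was stratified, so the following standard-library Python sketch is an assumption; scikit-learn’s `train_test_split(..., test_size=0.2, stratify=y)` performs the equivalent operation.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) with test_frac of each class held out."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    train, test = [], []
    for y in sorted(by_class):  # fixed class order keeps the split reproducible
        idx = by_class[y][:]
        rng.shuffle(idx)
        n_test = round(test_frac * len(idx))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# Per-type sample counts reported for the conventional database (15,840 total).
COUNTS = {"SLG": 4320, "DLG": 4320, "LL": 4320, "LLLG": 1440, "LLL": 1440}
labels = [t for t, n in COUNTS.items() for _ in range(n)]
train_idx, test_idx = stratified_split(labels)
test_per_type = Counter(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), dict(test_per_type))
```

This holds out 3,168 samples (864 each for SLG, DLG, and LL; 288 each for LLLG and LLL) and keeps 12,672 for training.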
1) Fault Classification
When several ML classification models were tested for conventional power system fault classification, the f1-scores obtained for the various fault types were as listed in Table 7. Additionally, Figure 8 presents the percentage testing accuracy for fault classification as bar plots. Ensemble methods such as ET, Bagging, RF, and XGBoost demonstrated remarkable performance, whereas AdaBoost did not. The classification accuracy of the RF, ET, Bagging, and XGBoost algorithms is 100% for the conventional power system, as sufficient fault data is available. DT performed strongly overall but misclassified some LLL faults. The KNN model also exhibited satisfactory performance. However, SVM struggled to classify non-ground faults compared with ground faults.
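The per-fault-type f1-score combines precision and recall for each class, F1 = 2TP / (2TP + FP + FN), so a model can have high overall accuracy yet a low f1-score on a rare class such as LLL. A minimal standard-library Python sketch follows; scikit-learn’s `f1_score(y_true, y_pred, average=None)` returns the same per-class values. The toy labels are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for every label in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("length mismatch")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


# Hypothetical toy example: one LL fault misclassified as LLL.
scores = per_class_f1(["LL", "LL", "LLL", "LLL"], ["LL", "LLL", "LLL", "LLL"])
print(scores)
```

Here LL has F1 = 2/3 (one miss) and LLL has F1 = 0.8 (one false alarm).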
2) Fault Localization
Similarly, several ML regression models have been evaluated for fault localization on all six transmission lines. In this evaluation, 80% of the fault data for each line was allocated for training, and the remaining 20% was used for testing. The location estimation MAPE of the studied models is listed in Table 8. Among the studied models, Bayesian Ridge Regression (BRR) demonstrated the best performance, while RF, Bagging, and RT also performed well. However, KNR and SVR performed worse than the other models in this scenario.
The performance of SVM was unsatisfactory for both classification and localization, primarily because of its heavy reliance on feature selection and transformation techniques. Multi-class classification poses additional challenges for SVM algorithms [68]. In this study, raw voltage and current values were used without any feature engineering, which may have led to SVM’s poor performance, especially in identifying LL and LLL faults, which are non-ground faults. Conversely, other models such as RF, Bagging, ET, and BRR performed notably well even without feature engineering, suggesting that these models depend less on such techniques for effective fault diagnosis.
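Kernel SVMs are sensitive to feature scale because their kernels depend on distances between samples, and raw voltages and currents can differ by orders of magnitude. The simplest such transformation is z-score standardization, fitted on the training set only. This step was not used in the study; the standard-library Python sketch below only illustrates what a minimal transformation would look like. In scikit-learn, the equivalent is a `Pipeline` of `StandardScaler` followed by `SVC`.

```python
from statistics import fmean, pstdev


def fit_standardizer(rows):
    """Per-feature mean and standard deviation from training rows."""
    cols = list(zip(*rows))
    means = [fmean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # guard constant features
    return means, stds


def transform(rows, means, stds):
    """Apply training-set statistics to any rows (train or test)."""
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


# Hypothetical rows: [current feature, voltage feature] on very different scales.
train = [[1.0, 100.0], [3.0, 300.0]]
means, stds = fit_standardizer(train)
print(transform(train, means, stds), transform([[2.0, 400.0]], means, stds))
```

After scaling, both features contribute comparably to the kernel distance instead of the larger-magnitude feature dominating.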
B. Impact Analysis of RES Integration (Scenario 2)
In Scenario 2, we analyzed the ML models for fault classification and localization after the integration of RES into the conventional power system. However, the available fault data pertains solely to the conventional power system, so training the ML models with RES-integrated power system fault data is not feasible. This limitation is common in real-world power systems undergoing the integration of new RES worldwide. Therefore, investigating this scenario is pivotal to identifying a model capable of effective fault diagnosis after RES integration. In Scenario 2, the ML models were trained exclusively on the fault database of the standard IEEE 9 bus system. RES may be integrated into transmission networks in varying sizes, typically below the maximum allowable integration level. Previous studies on optimal RES placement on the IEEE 9 Bus System [62] considered 30 MW RES integration at an individual bus, with a total maximum allowed capacity of 60 MW. Hence, in this study, RES integration started at 10 MW at one bus and extended up to 30 MW at each of two buses. A detailed description of the training and testing fault datasets is provided in Table 3.
1) Fault Classification
The testing accuracies obtained for fault classification in Scenario 2 are illustrated in Figure 9. When ML algorithms were trained on conventional power system fault data and then tested on RES-integrated power system faults, the classification accuracy degraded significantly compared with Scenario 1. There are two possible ways to integrate a new RES into a transmission network: the incoming RES can be added at a bus where one or more RES units are already connected, or at a new bus with no previously connected RES units. In the latter case, the number of RES-integrated buses increases, and so does the number of transmission lines directly connected to RES-integrating buses, which degrades the fault diagnosis performance of the pre-trained ML models. As depicted in Figure 9, the classification accuracy of all ML models decreases as the level of RES integration increases. Further, accuracy drops substantially from Case 1 (10MW at Bus 7) to Case 2 (10MW each at Bus 7 and Bus 5), as the number of lines directly connected to RES-integrating buses increases. From Case 2 onwards, however, the number of lines directly connected to RES integration points does not change, and the further degradation in accuracy becomes negligible for models such as KNN, XGBoost, and RF and remains small for others such as ET, Bagging, and SVM. Therefore, the classification accuracy for Case 1 (one RES) is higher than that for Case 2 (two RES) and the subsequent cases, since the fault current level and distribution change significantly with each new RES integration. The f1-scores for all fault types were poor and hence are not listed.
The KNN classifier performed better than all other classifiers tested in this study. Consequently, the KNN model is deemed suitable for unknown scenarios, as it can discern fault patterns and extend the knowledge acquired from conventional power system faults to untrained scenarios, such as the integration of RES of unknown size. Because the KNN model classifies by similarity to stored samples, it performs better on the altered system than the other investigated models [69]. The testing accuracy of models such as DT, ET, and Bagging decreased as the fault current level increased with RES size. Conversely, AdaBoost, Logistic Regression, Ridge, and Gaussian NB consistently displayed poor classification accuracy in the RES integration cases, rendering them unsuitable for ML-based fault diagnosis following new RES integrations. The classification accuracy of RF, KNN, and XGBoost declines with each new RES integration at a different bus. SVM outperformed all classifiers except KNN in this scenario, indicating its strong generalization ability. However, its performance was inconsistent in Scenario 1, suggesting that it should be combined with feature engineering for power system fault diagnosis.
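The similarity principle behind KNN can be seen in a minimal implementation: no parameters are fitted, and a query is labeled by a majority vote among its k nearest stored samples. New fault records can therefore be added simply by appending them to the stored set. The Python sketch below uses Euclidean distance and hypothetical two-feature toy data; the study’s models operate on the 36 voltage and current features.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, x, k=3):
    """Label x by majority vote among its k nearest stored samples."""
    # Rank stored samples by Euclidean distance to the query point.
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters of fault signatures.
train_x = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
train_y = ["SLG"] * 3 + ["LLL"] * 3
print(knn_predict(train_x, train_y, [0.5, 0.5]), knn_predict(train_x, train_y, [10.5, 10.5]))
```

Because predictions depend only on stored samples and a distance, a query whose signature shifts moderately after RES integration can still land near the correct cluster.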
2) Fault Localization
Fault localization in Scenario 2 was assessed for four of the six transmission lines. Lines directly connected to buses with integrated RES, namely Line 4-5, Line 7-5, and Line 7-8, were chosen for localization analysis. Assuming a similar effect on the other three lines not directly connected to RES, only Line 8–9 was included among them. Before localization, the faulty line was identified to determine where the fault occurred. Figure 10 presents a column plot of the MAPE of the ML regression models for fault location estimation on the analyzed lines, illustrating the impact of increasing RES size.
Fault location estimation MAPE after RES integration for lines 4-5, 7-5, 7-8, and 8–9 of Scenario 2.
For most models, such as Bagging, RF, and ET, MAPE increased with the size of the integrated RES. Similar to classification, KNR outperformed the other models in localization, displaying the lowest MAPE across all lines and all cases of varying RES size. SVR also performed better than the remaining regression models, and its location estimation remained unaffected by changes in RES integration size. However, the BRR model failed to fit this scenario adequately, so its performance has not been reported. Consequently, it cannot be relied upon in scenarios involving network topology changes, such as RES integration, when no fault data for the changed network is available.
The integration of RES plants at any bus alters the fault current level of the network, resulting in inferior localization performance compared with Scenario 1. Further, RES, being a variable power source, introduces additional fluctuations in the fault current level because of its weather-dependent power generation. A smaller impact would be expected if a generation source with fixed output were added instead, and ML models would also learn such cases better than those involving fluctuating power sources.
C. Adaptability Analysis After RES Integration (Scenario 3)
The performance of the ML models for fault classification and localization without RES-integrated training data was discussed in Scenario 2. Since fault data for the RES-integrated system was unavailable there, Scenario 3 analyzes the performance of ML models trained with minimal fault data collected from RES-integrated systems of varying sizes. ML algorithms require sufficient data covering all attribute variabilities to learn effectively; however, data from the system as changed by RES integration is limited. Consequently, training was conducted using both conventional power system fault data and the available fault samples from the RES-integrated system. Whereas the training set remained the same for all test cases in Scenario 2, in Scenario 3 each training set contained the available fault data samples from the RES-integrated system of the same size as the one being tested. The ML models thus had only limited samples from which to learn the fault patterns of the altered network topology and transfer their learning to fault diagnosis of the updated system. Therefore, this scenario analyzes the adaptability of the models, aiding in the selection of appropriate models based on data availability and network topology.
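The Scenario 3 training sets can be sketched as the full conventional database plus a random fraction of the RES-case samples. In the Python sketch below, the function name `mixed_training_set` is hypothetical, the percentage is taken relative to the RES-case database (4,224 samples), and the remaining RES-case samples are treated as the test pool. These are assumptions; Table 3 gives the study’s actual composition.

```python
import random


def mixed_training_set(conventional, res_case, fdi_percent, seed=0):
    """Return (train, held_out_res).

    train = all conventional samples + a random fdi_percent % of res_case;
    held_out_res = the remaining RES-case samples (assumed test pool).
    """
    if not 0.0 <= fdi_percent <= 100.0:
        raise ValueError("fdi_percent must be in [0, 100]")
    order = list(range(len(res_case)))
    random.Random(seed).shuffle(order)
    n_new = round(len(res_case) * fdi_percent / 100.0)
    chosen, rest = order[:n_new], order[n_new:]
    train = list(conventional) + [res_case[i] for i in chosen]
    held_out = [res_case[i] for i in rest]
    return train, held_out


# Placeholder samples standing in for real fault records.
conventional = [("conv", i) for i in range(15840)]
res_case = [("res", i) for i in range(4224)]
for fdi in (0.25, 0.5, 1, 2, 4, 5, 8):
    train, held_out = mixed_training_set(conventional, res_case, fdi)
    print(fdi, len(train) - len(conventional), len(held_out))
```

Under these assumptions, 5% inclusion adds 211 RES-case samples to the 15,840 conventional ones, and 8% adds 338.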
1) Fault Classification
When the models were tested on transferring their learning from conventional power system fault diagnosis to RES-integrated system fault diagnosis, using only 5% of the RES-integrated system fault data for training, many models matched their Scenario 1 performance. ET, Bagging, RF, and XGBoost achieved fault classification accuracies comparable to those observed in Scenario 1, as shown in Table 9. This underscores their strong learning ability, as they can adapt to network changes with minimal training data. In contrast, although KNN’s performance improved compared with Scenario 2, it failed to reach its Scenario 1 baseline because of inadequate training data, highlighting the limits of KNN’s learning capacity despite its generalization ability in changed scenarios. MLP, AdaBoost, and Gaussian NB performed poorly, as in Scenario 1, whereas Logistic Regression, Ridge, and DT performed well. Since every case included fault samples from the corresponding RES integration case in training, the classification accuracy of all well-performing classifiers improved notably compared with Scenario 2.
The f1-scores of the studied classifiers trained with 5% RES-integrated fault data inclusion closely resembled those obtained in Scenario 1 and are therefore not included in Table 9. Table 10 details the f1-scores for all fault types with 0.75% RES-integrated fault data inclusion, for RES integration Cases 1 and 6. These f1-scores show that SVM has difficulty classifying non-ground faults, i.e., LL and LLL faults, although it performs well on LLLG faults. Further, XGBoost, Random Forest (RF), and Extra Trees (ET) achieved testing accuracies above 90% with only 0.75% inclusion of RES-integrated fault data. Moreover, ET outperformed RF and XGBoost for RES integration Case 1.
The learning ability of the ML classifiers in Scenario 3 has been analyzed by varying the percentage of RES-integrated system fault data included in training: 0.25%, 0.5%, 1%, 2%, 4%, and 8%. The improvement in classification testing accuracy as the percentage of new fault data in training increases is plotted in Figure 11 using radar charts. A radar chart, also referred to as a spider plot or star plot, is a two-dimensional graphical method for representing multivariate data in which each axis corresponds to a variable with the same data length [70]; the arrangement and angles of the axes usually carry no specific information. In Figure 11, the six axes represent the fault data inclusions (FDIs) of the RES-integrated system used in training the ML models, and the radial distance from the center represents the classification accuracy in percent.
Learning trends of ML classifiers for different sizes of RES using radar chart: Testing accuracy (%) vs Percentage of new fault data inclusions (FDI) post RES integrations.
In the adaptability test for RES integration Case 1, which involves adding only a 10MW RES at Bus 7, ET and XGBoost achieved nearly 100% accuracy with only 0.5% new fault data inclusion (FDI) in their training, while DT and Bagging reached matching classification performance with 1% new FDI. From Case 2 onwards, where RES is added at both selected buses, RF, ET, and XGBoost achieved nearly 100% accuracy with just 1% new FDI in their training. The plots also indicate that the performance of the other models improves as the amount of RES-integrated FDI in their training data increases. Nevertheless, RF, ET, and XGBoost are particularly adept at quickly learning and adapting to new scenarios with minimal data from the changed topology. From Case 2 onwards, Bagging’s classification accuracy reached nearly 100% with 2% new FDI, while DT reached almost 100% accuracy with 4% new FDI in training. The performance of the LR and Ridge classifiers also improved as more new fault data was included in their training; however, their accuracy reached only 94% for LR and 98% for Ridge even after adding 12% new fault data, which is not shown in the plots. Combining predictions, parallel computing capabilities, and robustness to overfitting are key factors contributing to the adaptability of the RF and ET models. Additionally, the gradient boosting and regularization features of XGBoost further enhance its capacity to adapt quickly to new fault data for fault classification [71], [72]. The SVM, AdaBoost, Gaussian NB, and MLP classifiers showed little improvement; therefore, their results are not included in the plots.
2) Fault Localization
For all six RES integration cases, the Mean Absolute Percentage Error (MAPE) of the ML models for fault localization with 8% RES-integrated fault data in training is listed in Table 11. The results indicate that the BRR model performs best, consistent with its performance in Scenario 1. Bagging, RF, RT, and ET performed satisfactorily, whereas the performance of KNR and SVR remained poor. Thus, the BRR model learns most effectively from minimal data under such structural changes in the power network.
The adaptability of the ML models in locating transmission line faults was analyzed by gradually increasing the percentage of new fault data samples mixed with conventional power system fault data for training. All regression models were trained for all RES integration cases with 2%, 4%, 8%, and 12% RES-integrated fault data samples included. The aim was to identify which regression model learns fastest after RES integration and to understand how the size (MW) of the integrated RES affects the adaptability of the studied regression models. The test results for Bayesian ridge regression, Bagging, RF, ET, and RT are presented in Figures 12 to 15 for all six cases using radar charts. In Figures 12 to 15, the axes represent the percentage FDIs of the RES-integrated system used in training the ML models, while the distance from the center represents the MAPE of fault localization. The learning trends of the studied models for the considered transmission lines (MAPE versus percentage of new FDI, Figures 12 to 15) show that BRR exhibits the best adaptability among the models.
Case wise adaptability trend for Line 4–5 using radar chart: MAPE vs. Percentage fault data inclusion.
Case wise adaptability trend for Line 7–5 using radar chart: MAPE vs. Percentage fault data inclusion.
Case wise adaptability trend for Line 7–8 using radar chart: MAPE vs. Percentage fault data inclusion.
Case wise adaptability trend for Line 8–9 using radar chart: MAPE vs. Percentage fault data inclusion.
The MAPE for BRR drops considerably with only 2% new fault data included in its training set, to a level much lower than that of the other models; therefore, a separate plot of the BRR adaptability trend is presented in Figure 16. BRR reached performance saturation with only 2% new data, whereas Bagging, RF, and ET saturated at 8% inclusion of new fault data in their training sets. KNR showed only minor improvement even after including 12% new fault data in its training and is therefore not shown in the plots. Thus, the Bayesian ridge regressor demonstrates efficient transfer of learning with minimal inclusion of new data.
Learning trends of the Bayesian ridge regression model for different RES sizes: MAPE vs available fault data.
D. Result Analysis
A comprehensive analysis of the proposed scenarios has been conducted using fault datasets from the IEEE 9 Bus system and the RES-integrated IEEE 9 Bus system. The outcomes demonstrate that ML models perform effectively for power system fault classification and localization when sufficient data is available. This research therefore establishes that ML-based automatic fault diagnosis of transmission networks can be achieved using voltage and current measurements at transmission line buses. The tests conducted in Scenario 1 revealed that the DT, RF, ET, Bagging, and XGBoost models achieved nearly 100% accuracy for conventional power system fault classification, and the KNN classifier performed satisfactorily. Additionally, BRR emerged as superior for fault localization compared with the Bagging, RF, DT, ET, and SVR regression models. KNN, however, predicts the average of its nearest neighbors’ target values in regression tasks, so its performance suffers in localization. The performance of SVM was unsatisfactory for both classification and localization, which can be mainly ascribed to its heavy dependence on feature selection and transformation techniques, with multi-class classification introducing further challenges.
In Scenario 2, when ML models trained on conventional power system fault data were tested on RES-integrated system faults, significant degradation in fault classification and localization performance was observed across all ML models. This study confirms that even integrating a 10MW RES plant into an IEEE 9 bus system with a total load of 315MW and 115MVAR brings about significant changes in the fault signature. Additionally, the classification accuracy of all ML models diminishes as the level of RES integration increases, with a marked decrease when the number of lines directly connected to RES-integrating buses grows, i.e., when the incoming RES is added at a new bus. The KNN classification model gave relatively better fault classification results, making it a reliable average classifier for unseen event scenarios. The accuracy of the KNN classifier remains relatively stable owing to its reliance on the similarity principle, and its non-parametric approach, which requires no training phase, makes it suitable for changed systems [12]. Hence, without proper study, ML models cannot be entirely relied upon for power system fault diagnosis under such structural changes.
Finally, in Scenario 3, as the availability of fault data increased gradually, the ML models regained their initial performance, as demonstrated in Scenario 1. Extensive analysis in Scenario 3 identified XGBoost, RF, and ET as the fastest-learning classification models, as shown by the trend plots of testing accuracy versus data availability. KNN improved compared with Scenario 2 but did not reach its Scenario 1 baseline with the limited training data available. XGBoost’s intrinsic capability for incremental learning supports its results. In incremental learning, models are updated with new data while retaining the knowledge learned from previous data [73]. XGBoost supports incremental learning primarily because of its boosting algorithm and the nature of its implementation: since it builds trees sequentially, the model can be updated with new data. XGBoost is also highly optimized for speed and efficiency [36]. The design and implementation of XGBoost thus make it well suited for incremental learning tasks, allowing it to adapt efficiently to new data while leveraging the knowledge gained in previous iterations [73]. Therefore, it is suitable for classifying transmission line faults after RES integration with gradually increasing fault data availability. Similarly, Bayesian ridge regression emerged as the fastest-learning model for fault localization, as indicated by the trend plots of MAPE versus data availability. BRR is based on a Bayesian approach to linear regression, in which prior beliefs about the model parameters are updated as new data becomes available. This incremental updating of beliefs aligns well with the incremental learning required after RES integration.
BRR naturally incorporates regularization to prevent overfitting. As the posterior distribution of the parameters is updated with new data, the regularization adapts to the changing characteristics of the dataset, keeping the model regularized even during incremental learning. Overall, the Bayesian framework, efficient computation, regularization, and memory efficiency make BRR well suited for incremental learning tasks [74]. Thus, the proposed BRR model can be used for automatic localization of transmission network faults from voltage and current measurements at transmission line buses after RES integration, with gradually increasing fault data availability.
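The belief-updating argument can be made concrete with conjugate Bayesian linear regression. The posterior obtained from conventional data becomes the prior for a few RES samples, and the sequential result equals a batch fit on all the data. The numpy sketch below fixes the prior precision α and noise precision β as assumptions. scikit-learn’s `BayesianRidge` additionally estimates both precisions from the data, so this is a simplified illustration, not the paper’s model, and it uses synthetic data rather than fault records.

```python
import numpy as np


def posterior_update(m_prev, P_prev, X, y, beta):
    """One conjugate Bayesian linear-regression update.

    m_prev, P_prev: prior mean and prior precision matrix of the weights;
    beta: assumed known noise precision. Returns posterior mean and precision.
    """
    P = P_prev + beta * X.T @ X
    m = np.linalg.solve(P, P_prev @ m_prev + beta * X.T @ y)
    return m, P


rng = np.random.default_rng(0)
d, alpha, beta = 3, 1.0, 25.0          # assumed prior and noise precisions
w_true = np.array([0.5, -1.0, 2.0])    # synthetic "true" weights
X_old = rng.normal(size=(200, d))      # stands in for conventional data
y_old = X_old @ w_true + rng.normal(scale=0.2, size=200)
X_new = rng.normal(size=(10, d))       # stands in for a few RES samples
y_new = X_new @ w_true + rng.normal(scale=0.2, size=10)

m0, P0 = np.zeros(d), alpha * np.eye(d)
m1, P1 = posterior_update(m0, P0, X_old, y_old, beta)  # learn from old data
m2, P2 = posterior_update(m1, P1, X_new, y_new, beta)  # old posterior = new prior
m_batch, P_batch = posterior_update(
    m0, P0, np.vstack([X_old, X_new]), np.concatenate([y_old, y_new]), beta)
print(np.round(m2, 3), np.allclose(m2, m_batch))
```

The update needs only the previous mean and precision matrix, not the old samples, which is the memory-efficiency property mentioned above.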
Conclusion and Future Work
This paper presents a comprehensive three-part performance analysis of potential ML models for the classification and location estimation of transmission network faults. First, several baseline and proposed models were tested for the classification and localization of conventional transmission network faults. Among the models tested for fault classification, the proposed XGBoost and ET ensemble methods and RF achieved 100% accuracy. However, XGBoost, with its multithreaded parallel computing, is faster than RF and does not require normalization of the data obtained from PMUs or fault data loggers; hence, it is suitable for transmission line fault diagnosis. The Bayesian Ridge Regression proposed in the paper outperformed the other models in the fault localization analysis.
Next, the impact analysis of RES integrations of different sizes on these models showed that the classification accuracy of all tested classifiers except KNN falls into the 40-50% range, making them unsuitable for such scenarios. The fault localization results of all regressors, including KNR, are also significantly affected by new RES integration, and the degradation increases with the size of the integrated RES. KNN’s accuracy falls from 98.86% in Scenario 1 to 90.67% with 10MW RES integration at Bus 7 and to 80.39% with 30MW RES integration at both optimal buses. Thus, the KNN model can be relied upon in scenarios where the system topology has changed and no fault data is available for the changed transmission network.
Finally, the adaptability of these models was tested for the practical case of fault data becoming available gradually over time, with RES integration of different sizes. XGBoost, ET, and RF performed best, giving 100% classification accuracy with just 0.5% of the new scenario’s fault samples in their training for 10MW RES integration at Bus 7, and with 2% new fault samples for the maximum allowed 30MW RES placed at each of Buses 7 and 5. Similarly, Bayesian ridge regression outperformed the other regression models in adaptability testing, requiring only 2% of the changed network’s fault samples to achieve 1% MAPE. Although it could not fit the changed topology when no fault data was available for it, it learns the changed fault patterns very quickly once a few samples become available and can hence be used for practical power system fault localization.
Similar studies can be performed for ML-based fault diagnosis of distribution systems, microgrids, etc., under DG integration scenarios. Future work will focus on advanced neural network-based transfer learning. Further, the adopted methodology can also be applied to other engineering systems, e.g., fault diagnosis of electric motors undergoing structural changes due to the integration of new system components or ratings.