Nomenclature
DNN: Deep neural network
DRL: Deep reinforcement learning
DDPG: Deep deterministic policy gradient
SAC: Soft actor-critic
Introduction
With the rapid growth of clean energy sources (e.g., wind turbines and PV panels) and their integration into the grid, hybrid AC-DC networks are becoming popular in modern power systems. Due to the integration of AC/DC sources and AC/DC loads, DC subgrids have recently been developed and tied to the AC bus via one or multiple bidirectional DC/AC interlinking converters [1]. In a subgrid, primary energy resources and local loads are commonly connected to the electrical network via DC-DC/AC-DC converters [2].
Power converters consist of interconnected individual circuit components (e.g., resistors, capacitors, inductors, diodes, and switching devices), making their design quite complex and prone to inefficiencies. To increase the energy conversion efficiency and stability in modern power systems, power converters are often optimally designed to achieve different operation goals [3]–[6]. For instance, the authors in [4] have proposed a novel soft-switching integrated boost DC-DC converter for PV power systems to reduce loss and improve power conversion efficiency. The authors in [5] have developed an optimization-based design method for high-efficiency power converters that considers the power factor and limitations of current ripples. The authors in [6] have developed a multi-objective design for buck converters to improve the overall performance of the output LC filter.
To achieve these design goals, many optimization algorithms have also been proposed and widely used for specific problems, including single- and multi-objective problems [7], [8]. The simulated annealing algorithm has been utilized for parameter identification of an energetic hysteresis model [9]. A population-based search algorithm, the bee colony optimization algorithm, has been used for a Sheppard-Taylor power factor correction converter [10]. A genetic algorithm and a particle swarm optimization algorithm are proposed in [11] for minimizing the total power loss of a resonant converter. A novel optimal design for medium-frequency transformers using a multi-objective genetic algorithm has been presented in [12] to determine the best transformer for a given power converter topology.
However, these optimization algorithms [8]–[12] are typically model-based: they require a detailed converter model, which may be impractical to derive for complicated power converter topologies. Several recent studies have therefore applied self-learning methods, e.g., reinforcement learning (RL), to decision-making problems in power converter design [13]–[18]. In RL, no model of the dynamic environment is required; the learning agent evaluates its selected actions based on penalties or rewards obtained by interacting with the dynamic environment [19], [20]. A Markov decision process (MDP) model and a deep Q network-based optimization model have been developed in [15] for the stabilization of power converters. A Q-learning algorithm has been applied in [16] to train an agent offline for a DC-DC converter; the trained agent then provides control decisions online during real-time operation to reduce power losses. A deep deterministic policy gradient (DDPG) algorithm has been applied in [17] to minimize power losses in dual active bridge (DAB) converters. A DDPG-based optimization model has been proposed in [18] to adjust the active disturbance rejection controller for the voltage regulation of DC-DC buck converters.
Most of these studies have solved optimal parameter design problems using Q-learning, deep Q-learning [15], [16], or DDPG [17], [18]. In these methods, hyperparameters are often difficult to tune toward a global optimum due to brittle convergence properties and high sample complexity [21], [22]. Q-learning and deep Q network-based models also struggle to handle large or continuous action spaces [23], [24]. In addition, optimal parameter design for power converters requires thousands of simulations during the optimization process, which often takes many hours at high computational cost. Therefore, an approximation model that replaces dynamic simulation modeling is needed to quickly map the input information onto the output measurement.
To overcome the above-mentioned problems, a soft actor-critic (SAC)-based optimization algorithm is proposed in this study to optimize the design parameters of power converters. First, a DNN-based surrogate model is developed and is then trained by an offline process. Then, the trained agent can interact with the surrogate model to quickly estimate the power efficiency, without requiring any simulation. This can significantly reduce the learning time for optimal parameter design of power converters. Furthermore, the proposed SAC-based optimization model inherits several desirable traits, such as the ability to handle continuous state and action spaces, accelerate and stabilize the learning process, and prevent trapping in local optima by using entropy-regularized RL. The performance of the proposed method is also evaluated using over ten DC-DC power converter topologies. Finally, the performance of the proposed method is compared with different parameter design algorithms for power converters. The major contributions of the paper are as follows.
An SAC-based optimization strategy is proposed to optimize the parameter design for power converters using a deep neural network-based surrogate model. The use of entropy-regularized reinforcement learning results in an accelerated and stabilized learning process.
Deep neural network-based surrogate models are developed to replace dynamic simulation models. This greatly reduces the learning time for the DRL agent to find the optimal solution.
Implementation of the state-of-the-art RL algorithm can achieve a better cumulative reward and can determine better solutions for the parameter design problems. The proposed method also shows good performance with over ten test topologies.
The rest of the paper is organized as follows. Section II discusses the development of a DNN-based surrogate model for power converters. Section III presents the proposed DRL-based optimization algorithm for component sizing of power converters. Section IV shows the effectiveness of the proposed method with various power converter topologies. Finally, conclusions are drawn in Section V.
Surrogate Model for Power Converters
A. Physical Model and Surrogate Model
1) Physical/Dynamic Model
Figure 1(a) shows a popular power converter topology used in power systems (i.e., the buck converter). This topology consists of a DC voltage source, two switches, a capacitor, and a resistor. Figure 1(b) describes a block diagram for calculating the power efficiency from the input/output voltage, input/output current, and power losses. In the design process of power converters, the values of the input parameters are varied within a predefined boundary and the power efficiency is measured by carrying out Simulink simulations. However, this process might require hundreds of thousands of simulations for a single set of user-desired outputs. Therefore, we collect the simulation data and use it to train a surrogate model to reduce the design cycle of power converters, as shown in Figure 2.
2) Surrogate Model
A surrogate model is used when the outcomes of a model cannot be measured directly or each simulation takes a long time to complete [25], [26]. In such cases, the surrogate model is developed to mimic the behavior of the physical or simulation model as closely as possible [26]. This greatly reduces the time spent optimizing a design or analyzing the sensitivity of a model that would otherwise require thousands of simulations.
Figure 2 shows the process of developing a surrogate model. This process is divided into two main stages: (i) dataset collection and (ii) training and testing of the surrogate model. The dataset is generated by performing simulations with different setting values of the input parameters and recording the resulting power efficiency.
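For concreteness, a minimal Python sketch of this collection stage is given below; the parameter names, bounds, and the `run_simulink_sim` helper (standing in for a call into the Simulink model, e.g., via the MATLAB Engine API) are illustrative assumptions rather than the exact setup used here.

```python
import numpy as np

# Hypothetical setting-value boundaries for a buck converter;
# the actual boundaries are those listed in Table 1.
PARAM_BOUNDS = {
    "switching_frequency": (50e3, 500e3),  # Hz
    "inductance": (1e-5, 1e-3),            # H
    "capacitance": (1e-6, 1e-3),           # F
    # ... remaining parameters from Table 1
}

def run_simulink_sim(params):
    """Placeholder for a Simulink run returning the measured power efficiency."""
    raise NotImplementedError

def collect_dataset(n_samples=30_000, seed=0):
    rng = np.random.default_rng(seed)
    names = list(PARAM_BOUNDS)
    X = np.empty((n_samples, len(names)))
    y = np.empty(n_samples)
    for i in range(n_samples):
        # Sample each input parameter uniformly within its predefined boundary
        sample = {k: rng.uniform(*PARAM_BOUNDS[k]) for k in names}
        X[i] = [sample[k] for k in names]
        y[i] = run_simulink_sim(sample)  # power efficiency in [0, 1]
    return X, y
```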
B. DNN-Based Surrogate Models for Power Converters
In this study, a DNN-based surrogate model is used to replace the simulation model of power converters. Firstly, a large number of simulations are performed using MATLAB/Simulink with different combinations of input parameters, such as resistance, inductance, switching frequency, and so on. This dataset is then split into training and test sets for training the surrogate model. After the training process, the surrogate model can provide an estimation of the power efficiency that is close to the Simulink simulation results using the same input information. The training and testing process of the DNN-based surrogate models of the power converter is summarized in Figure 3.
Table 1 shows ten common parameters for a typical power converter (e.g., a buck converter). This table provides setting-value boundaries for different components such as inductors, capacitors, resistors, and switching components. The number of converter parameters determines the number of nodes in the input layer, i.e., the size of the input layer. For instance, 10 input parameters are considered for a buck converter, so the input layer size is 10 nodes. The different values of the input parameters are fed into the input layer of the DNN-based surrogate model. The output of the surrogate model represents the power efficiency of the converter, so the output size is always 1. To determine the optimal parameters for the DNN, a sensitivity analysis was performed with many different network configurations. The selected configuration not only avoids underfitting and overfitting but also minimizes the training/testing loss. Detailed parameters for the DNN-based surrogate model are tabulated in Table 2.
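As an illustration, a minimal PyTorch sketch of such a surrogate model follows; the hidden-layer widths and activation functions are assumptions standing in for the configuration selected in Table 2.

```python
import torch
import torch.nn as nn

class SurrogateModel(nn.Module):
    """Maps converter input parameters to the predicted power efficiency."""
    def __init__(self, n_params=10, hidden=(128, 128)):  # widths assumed
        super().__init__()
        layers, in_dim = [], n_params
        for h in hidden:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers += [nn.Linear(in_dim, 1), nn.Sigmoid()]  # efficiency in (0, 1)
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x).squeeze(-1)

model = SurrogateModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
# One training step on a minibatch (X_b, y_b) of parameters and efficiencies:
#   optimizer.zero_grad()
#   loss = loss_fn(model(X_b), y_b)
#   loss.backward()
#   optimizer.step()
```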
DRL-Based Component Sizing of Power Converters
In this section, a detailed DRL-based optimization model is presented for optimal component sizing of the power converter. Power converter design involves two major steps: (i) designing the converter topology and (ii) optimal component sizing for a given topology. This study mainly focuses on step (ii), the component sizing problem. Since a power converter is composed of various components, different component sizes might significantly affect the performance of the power converter (e.g., its power efficiency). Optimal component sizing is therefore an important task: determining suitable sizes for all components of a power converter for a given design objective, such as reducing design cost or increasing power efficiency.
A. Component Sizing As a Markov Decision Process
Here, we formulate the component sizing problem of a power converter as an MDP, which consists of four essential elements: state, action, state transition, and reward, as follows [19], [20].
State ($s_t$): A state represents the setting values of the input parameters of the power converter, as shown in (1). The state parameters are normalized to aid the convergence and stability of the training process, as given in (2).

Action ($a_t$): The agent observes the system state $s_t$ at time $t$ and selects an action representing the change in the input design parameters, as shown in (3).

State transition: The transition from time $t$ to $t+1$ is $s_{t+1} = f(s_t, a_t)$; it depends mainly on the selected action.

Reward ($r_t$): The reward represents the improvement in power efficiency of the current state over the initial state, as shown in (4); the learning agent attempts to maximize the cumulative reward during the training process. The reward at each state is quickly evaluated using the surrogate model ($SM$).

\begin{align*} s_{t}&=\left[P_{m,t}^{Set},\ldots,P_{M,t}^{Set}\right]\quad \forall t\in T\tag{1}\\ s_{t}[k]&=\frac{s_{t}[k]-s_{t}^{min}[k]}{s_{t}^{max}[k]-s_{t}^{min}[k]}\quad \forall k\in K,~t\in T\tag{2}\\ a_{t}&=\left\{\Delta P_{m,t}^{Set},\ldots,\Delta P_{M,t}^{Set}\right\}\tag{3}\\ r_{t}&=SM\left(s_{t}+a_{t}\right)-SM\left(s_{initial}\right)\quad \forall t\in T\tag{4}\end{align*}
A SAC-based optimization model is implemented to determine the optimal component sizes of the power converter.
B. SAC-Based Optimization Model
In this study, soft actor-critic (SAC) is used to optimally determine the component sizing of power converters due to its fast learning and stable training process. The main idea behind SAC is to use a modified RL objective by adding entropy regularization, as given in (5) [22], [27].\begin{equation*} J(\theta)=\sum_{t\in T} \mathbb{E}_{(s_{t},a_{t})\sim \rho_{\pi_{\theta}}}\left[r(s_{t},a_{t})+\alpha \mathcal{H}(\pi_{\theta}(\cdot\vert s_{t}))\right]\tag{5}\end{equation*}
where $\mathcal{H}$ is the entropy measure and $\alpha$ is a weight factor (or temperature parameter) representing the importance of the entropy term.
The entropy term indicates how unpredictable a random variable is: the higher the entropy, the less predictable its value. High entropy is therefore important to explicitly encourage exploration, and the policy assigns equal probability to actions with the same Q-values. This ensures that the learning agent does not collapse into repeatedly selecting the same action during the learning process.
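For intuition, the differential entropy of a one-dimensional Gaussian policy is $\mathcal{H}=\tfrac{1}{2}\log(2\pi e\sigma^{2})$; the short check below (with assumed standard deviations) confirms that a wider, more exploratory action distribution carries higher entropy and is therefore favored by the objective in (5).

```python
import numpy as np

def gaussian_entropy(sigma):
    # Differential entropy of N(mu, sigma^2): 0.5 * log(2 * pi * e * sigma^2)
    return 0.5 * np.log(2 * np.pi * np.e * sigma ** 2)

print(gaussian_entropy(0.1))  # narrow policy, low entropy  (about -0.88)
print(gaussian_entropy(1.0))  # wide policy, high entropy   (about  1.42)
```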
In SAC, the learning agent concurrently learns a policy network $\pi_{\theta}$ and two Q networks $Q_{\phi_{1}}$ and $Q_{\phi_{2}}$, together with their target networks $Q_{\phi_{tar,1}}$ and $Q_{\phi_{tar,2}}$. The Q networks are updated by minimizing the mean squared Bellman error in (6), where the target value $y$ in (7) combines the reward, the minimum of the two target Q networks, and the entropy term. Each Q network is updated by one step of gradient descent over a minibatch $B$ using (8).\begin{align*} L\left(\phi_{i},\mathcal{D}\right)&=\mathbb{E}_{(s,a,r,s',d)\sim\mathcal{D}}\left[\left(Q_{\phi_{i}}\left(s,a\right)-y\left(r,s',d\right)\right)^{2}\right]\tag{6}\\ y\left(r,s',d\right)&=r+\gamma\left(1-d\right)\Bigl(\min_{i=1,2}Q_{\phi_{tar,i}}\left(s',\tilde{a}'\right)-\alpha\log\pi_{\theta}\left(\tilde{a}'\vert s'\right)\Bigr)\tag{7}\\ G_{\phi}&=\nabla_{\phi_{i}}\frac{1}{\vert B\vert}\sum_{sample~in~B}\left(Q_{\phi_{i}}\left(s,a\right)-y\left(r,s',d\right)\right)^{2}\quad i=1,2\tag{8}\end{align*}
The learning agent tries to maximize the sum of the expected future return and the expected future entropy. This helps the learning agent accelerate the learning process and avoid becoming trapped in local optima. The policy network is updated by one step of gradient ascent using (9). Finally, the target networks are updated by a soft copy, as given in (10), so that they change slowly, ensuring a stable learning process.\begin{align*} G_{\theta}&=\nabla_{\theta}\frac{1}{\vert B\vert}\sum_{sample~in~B}\Bigl(\min_{i=1,2}Q_{\phi_{i}}\left(s,\tilde{a}_{\theta}(s)\right)-\alpha\log\pi_{\theta}\left(\tilde{a}_{\theta}(s)\vert s\right)\Bigr)\tag{9}\\ \phi_{tar,i}&\leftarrow\rho\phi_{tar,i}+\left(1-\rho\right)\phi_{i}\quad i=1,2\tag{10}\end{align*}
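A condensed PyTorch sketch of one SAC update following (6)–(10) is shown below; the network objects, optimizers, and the `policy.sample` method (returning an action and its log-probability) are assumed to exist and are not part of the original text.

```python
import torch

def sac_update(batch, policy, q1, q2, q1_tar, q2_tar,
               q_opt, pi_opt, alpha=0.2, gamma=0.99, rho=0.995):
    s, a, r, s2, d = batch  # minibatch B sampled from replay memory D

    # --- Q-network update, Eqs. (6)-(8) ---
    with torch.no_grad():
        a2, logp2 = policy.sample(s2)                       # a' ~ pi_theta(.|s')
        q_min = torch.min(q1_tar(s2, a2), q2_tar(s2, a2))   # min of target Qs
        y = r + gamma * (1 - d) * (q_min - alpha * logp2)   # target, Eq. (7)
    q_loss = ((q1(s, a) - y) ** 2).mean() + ((q2(s, a) - y) ** 2).mean()
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()

    # --- Policy update, one step of gradient ascent on Eq. (9) ---
    a_new, logp = policy.sample(s)
    pi_loss = (alpha * logp - torch.min(q1(s, a_new), q2(s, a_new))).mean()
    pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()

    # --- Soft update of the target networks, Eq. (10) ---
    with torch.no_grad():
        for tar, src in ((q1_tar, q1), (q2_tar, q2)):
            for p_t, p in zip(tar.parameters(), src.parameters()):
                p_t.mul_(rho).add_((1 - rho) * p)
```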
The optimization algorithm and learning process are presented in detail in the following section.
C. Optimization Algorithm and Learning Processes
The optimization process for maximizing power efficiency is shown in Figure 4, with two major learning stages: (i) training a surrogate model and (ii) optimal parameter design of the power converter.
Firstly, a DNN-based surrogate model is trained to mimic the behavior of a given power converter using a large dataset. This dataset is collected by carrying out Simulink simulations in which input combinations are randomly selected and the simulation outcome (i.e., power efficiency) is measured. The DNN with optimal weights is then used to predict the power efficiency during the optimal component sizing of the power converter.
A DRL agent is developed to optimize the component sizes of a power converter using the soft actor-critic algorithm. First, five deep neural networks are initialized for the learning process: a policy network, two Q networks, and two target networks. A replay memory $\mathcal{D}$ is also initialized to store experience tuples, from which minibatches are sampled during training.
After training with a large number of episodes, the learning agent can use policy parameters to determine the optimal setting values for components of power converters with any initial input parameters.
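Putting these pieces together, a hedged sketch of the outer learning loop is given below, reusing names from the earlier sketches; the episode horizon, batch size, replay capacity, and the `to_tensors` helper are assumptions.

```python
import random
from collections import deque
import torch

memory = deque(maxlen=100_000)           # replay memory D (capacity assumed)
steps_per_episode, batch_size = 50, 256  # assumed values

for episode in range(500):               # 500 episodes, as in Figs. 9-10
    env.s = env.s_init.copy()            # reset to the initial design
    s = env.normalize(env.s)
    for t in range(steps_per_episode):
        with torch.no_grad():
            a, _ = policy.sample(torch.as_tensor(s, dtype=torch.float32))
        s2, r = env.step(a.numpy())
        done = float(t == steps_per_episode - 1)
        memory.append((s, a.numpy(), r, s2, done))
        s = s2
        if len(memory) >= batch_size:
            batch = to_tensors(random.sample(memory, batch_size))  # helper assumed
            sac_update(batch, policy, q1, q2, q1_tar, q2_tar, q_opt, pi_opt)
```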
Simulation Results
A. Surrogate Model: Training and Testing Processes
In this section, the detailed training and evaluation process of a DNN-based surrogate model is presented for a specific case (i.e., buck topology). The same process is performed for ten other common topologies, including boost, buck-boost, LLC, DAB, flyback, forward, push-pull, half-bridge, full-bridge, and switched-capacitor converters.
1) Buck Converter
To show the performance of the proposed surrogate model, a simple buck converter is used in this section. As mentioned earlier, a large dataset is required to train and test the surrogate model. In this study, the dataset is generated by carrying out Simulink simulations with different setting values of input information and measuring power efficiency in each case. We consider ten different parameters for the buck converter, as given in Table 1. Therefore, the input and output size of the DNN-based surrogate model are as follows.
Input layer: 10 nodes representing 10 input parameters
Output layer: 1 node representing the power efficiency
The other parameters of the DNN are selected as given in Table 2. Firstly, a sensitivity analysis is performed to determine a suitable size for the dataset. The DNN-based surrogate model is trained with datasets of different sizes, and the average error on the test set is summarized in Figure 5. It can be observed that the performance of the surrogate model improves with a larger dataset. However, generating a large dataset is time-consuming, so a trade-off between the dataset size and the prediction error is considered. A suitable dataset size is about 30,000 data points, at which the error of the proposed model is within an acceptable range. Furthermore, with a larger dataset (i.e., over 30,000 points), the performance of the surrogate model does not improve significantly. Thus, a dataset of 30,000 points is used for all DNN-based surrogate models in this study. The detailed dataset for a buck converter is shown in Figure 6. With random combinations of the 10 input parameters, the power efficiency ranges from 0.35 to 1.00.
The dataset is split into training, validation, and test sets comprising 70%, 20%, and 10% of the data, respectively. Figure 7 shows the training process for the DNN-based surrogate model. It can be observed that the training and validation losses converge after 500 training epochs, as shown in Figure 7(a). The well-trained parameters of the DNN are saved for power efficiency prediction. To illustrate the accuracy of the proposed surrogate model, 100 data points are randomly drawn from the test set and fed to the model. The actual and predicted power efficiencies are shown in Figure 7(b), and the detailed error for each data sample is shown in Figure 7(c). It can be observed that the proposed DNN-based surrogate model estimates the output with acceptable errors. The average error for the 100 random test samples is less than 0.005, where the error of each sample and the average error over a test set are determined by (11) and (12), respectively.\begin{align*} e_{k}&=\left|\eta_{act,k}-\eta_{pred,k}\right|\tag{11}\\ e_{avg}&=\frac{1}{N}\sum_{k=1}^{N} \left|\eta_{act,k}-\eta_{pred,k}\right|\tag{12}\end{align*}
where $\eta_{act,k}$ is the actual power efficiency obtained from the Simulink simulation, $\eta_{pred,k}$ is the power efficiency predicted by the surrogate model, and $e_{avg}$ is the average prediction error over $N$ test samples.
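The two error measures in (11) and (12) reduce to a few lines of NumPy, as sketched below.

```python
import numpy as np

def prediction_errors(eta_act, eta_pred):
    e = np.abs(np.asarray(eta_act) - np.asarray(eta_pred))  # per-sample error, Eq. (11)
    return e, e.mean()                                      # average error, Eq. (12)

# Example over N = 100 random test samples:
# e_k, e_avg = prediction_errors(eta_actual, eta_predicted)
```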
Finally, the performance of the proposed surrogate model is summarized in Figure 8. Figure 8(a) shows the prediction error over the entire available dataset, while Figure 8(b) shows the prediction error on unseen data only (i.e., the test sets). It can be observed that most test data have a rather small prediction error, within [−0.03, 0.03].
2) Other Power Converters
As mentioned earlier, the proposed method is tested with 11 well-known converter topologies. Table 3 summarizes the prediction error of the DNN-based surrogate models for the buck, boost, buck-boost, LLC, DAB, flyback, forward, push-pull, half-bridge, full-bridge, and switched-capacitor converters. It can be observed that all proposed surrogate models provide good performance; most predictions on the test data have small errors, between 0 and 0.03.
B. DRL-Based Component Sizing of Power Converters
In the previous section, DNN-based surrogate models were developed and tested with various power converter topologies, both to predict power efficiency with high accuracy and to reduce the learning time of the optimal component sizing process. This section determines the optimal size of the components in power converters using different deep reinforcement learning algorithms to maximize power efficiency.
1) Three Major Input Parameters
First, the number of parameters to be optimized needs to be specified. In the first test case, only three parameters are selected for the optimization process: switching frequency, inductance, and capacitance. To show the effectiveness of the proposed method, the performance of both the DDPG-based and SAC-based optimization models is presented. The training process with both algorithms is shown in Figure 9 for a buck converter. Figure 9(a) shows the total episode reward and Figure 9(b) shows the average rewards over 500 training episodes. It can be observed that both algorithms converge to a close value of the total episode reward after 500 training episodes.
Since both algorithms converge to a similar total reward, the optimal solutions found by the two algorithms are also similar. In addition, the brute force method is used to search for the optimal parameters by performing Simulink simulations with a large number of input combinations. The optimal parameters determined by the different methods are as follows:
Brute force: Optimal parameters [F, L, C] are [128889, 0.0001, 0.0005] and the optimal efficiency is 0.975592.
DDPG: Optimal parameters [F, L, C] are [158964.0, 0.0001, 0.0005] and the optimal efficiency is 0.9774.
SAC: Optimal parameters [F, L, C] are [148964.0, 0.0001, 0.000185] and optimal power efficiency is 0.9775.
Other parameters are kept constant at their default values as follows:
MOSFET resistance ($R_{ds}$): 5e-3 Ω
Diode voltage ($V_{d}$): 0.8 V
Rising time ($t_{r}$): 4e-9 s
Falling time ($t_{f}$): 6e-9 s
Input voltage ($V_{in}$): 48 V
Output voltage ($V_{out}$): 5 V
Output power ($P_{out}$): 20 W
Based on the comparison of the simulation results of the three methods, it can be observed that the proposed method finds optimal parameters with better power efficiency.
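For reference, the brute-force baseline amounts to an exhaustive sweep over input combinations; a minimal sketch is shown below, where the grid resolution is assumed and `evaluate` stands in for either a Simulink run or the surrogate model.

```python
import itertools
import numpy as np

def brute_force_search(evaluate, bounds, n_steps=20):
    """Sweep a uniform grid over [F, L, C] and keep the highest efficiency."""
    grids = [np.linspace(lo, hi, n_steps) for lo, hi in bounds]
    best_eff, best_params = -np.inf, None
    for params in itertools.product(*grids):
        eff = evaluate(params)            # Simulink run or surrogate prediction
        if eff > best_eff:
            best_eff, best_params = eff, params
    return best_params, best_eff
```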
2) Increased Number of Input Parameters
Unlike the previous case, only three parameters are now kept constant according to the user requirements: input voltage, output voltage, and output power at 48 V, 5 V, and 20 W, respectively. All remaining parameters are optimally determined using the different optimization algorithms in this section. The training process is shown in Figure 10 with 500 episodes. Figure 10(a) shows the total episode rewards, while Figure 10(b) shows the average value of the rewards over the learning process. It can be observed that both algorithms converge to high total rewards; however, the SAC-based optimization model finds an optimal solution that is considerably better than that of the DDPG method.
Table 4 summarizes the results obtained from the three methods, with detailed setting values for the parameters. The proposed method and the brute force simulation determine better parameters, with higher power efficiency, than the DDPG-based model. However, the brute force method requires a very large number of simulations, which incurs a high computational cost.
Similar to the previous section, the proposed method is also tested with different power converter topologies. The optimal solutions for over ten different topologies are presented in Table 5. It can be observed that the proposed method can find optimal solutions close to those of the brute force method, where a large number of simulations are carried out with random input combinations.
3) Computation Cost Comparison
Table 6 summarizes the computation cost of determining the optimal parameters for power converters. In most studies on optimal parameter design available in the literature, the optimization model measures the response of the dynamic environment by performing an actual simulation, which considerably prolongs the learning process. In Table 6, the buck converter is presented as an example, with detailed information on the offline/online learning processes. It can be seen that an offline learning process is required to train the surrogate model, which may take about one day. However, this is a one-time offline cost, and the well-trained surrogate model then accelerates the online learning process for component sizing (~30 minutes). Without a surrogate model, the learning agent must carry out a large number of simulations, which significantly increases the online learning time (~18 hours). Moreover, this online learning process is repeated every time the user requirements change (e.g., input voltage, output voltage, output power).
Conclusion
In this paper, a soft actor-critic (SAC)-based optimization strategy using a deep neural network-based surrogate model is proposed to optimize the design parameters of power converters. Firstly, a DNN-based surrogate model is trained for each power converter with a large dataset. After the offline training process, the online training time is significantly reduced to only about 30 minutes for any change in user settings, whereas without the surrogate model it may take up to 18 hours to determine the optimal solution after each change. In addition, the comparison of different optimization algorithms shows that the SAC-based model helps the learning agent converge quickly to the optimal solution. The results obtained using the SAC-based optimization model are acceptable and similar to those of the brute force method, i.e., carrying out a large number of simulations with random input combinations.
ACKNOWLEDGMENT
The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.