I. Introduction
The amount of data that organizations must handle is growing exponentially as technologies such as the Internet of Things (IoT) and Artificial Intelligence (AI) evolve. As a result, organizations require faster and more dependable infrastructure to process and analyze this data. This need has encouraged research in High Performance Computing (HPC) using various standards and Application Programming Interfaces (APIs), such as the Message Passing Interface (MPI), Open Multi-Processing (OpenMP), and the Compute Unified Device Architecture (CUDA). Many applications of HPC have been investigated over the last few decades. Power system analysis has emerged as an attractive application for HPC because the analysis becomes more computationally intensive as the size of the electrical network grows. Power system analysis includes power flow, economic dispatch, contingency analysis, and static and dynamic state estimation, among other problems.

Considerable research has been done to apply HPC to the aforementioned problems. D. F. Rodriguez et al. [1] presented an algorithm based on the Newton-Raphson (NR) method to execute load flows using low-cost Embedded Computers (ECs). To reduce the convergence time of the NR method, the computation of the matrix, the Jacobian matrix, and the injected power flows was vectorized so that these tasks execute in parallel on a Graphics Processing Unit (GPU). In [2], a novel method for parallelizing multiple Newton-Raphson power flow computations on the CPU with GPU acceleration was proposed. X. Su et al. [3] developed a sophisticated GPU-CPU parallel power flow approach that adopts vectorized parallelization and sparse techniques. GPU-based power flow analysis was carried out in [4] and [5]. The authors of [6] implemented a parallel power flow based on OpenMP and tested the algorithm on 118-, 200-, 300-, and 1354-bus networks, achieving a speedup of three times over the sequential power flow computation.
In [7], a parallel power flow algorithm was implemented using OpenMP on the CPU and GPU for networks with 4 to 2383 buses. The maximum speedups achieved were 45.2 for the Gauss-Seidel method and 17.8 for the Newton-Raphson method.