Power Text Data Preprocessing of Power Grid Infrastructure Project based on Skip-gram Model | IEEE Conference Publication | IEEE Xplore


Abstract:

A power grid infrastructure project is a large-scale, long-duration undertaking that often involves many parties. It produces a large amount of data, which serves as an important primary data source for the operation, maintenance, and asset management systems of power supply enterprises. However, manual analysis cannot cope with unstructured natural-language text data or with nonstandard semi-structured tabular data. To address this issue, the different forms of data arising in power grid infrastructure projects are first analyzed in depth. Then, a data cleaning technique is used to eliminate noise from the original low-quality data. Finally, a skip-gram model is built to convert the text data into word embedding vectors. The preprocessed data retains contextual semantic information, which makes it more suitable for data mining. Extensive simulation experiments demonstrate the effectiveness of the proposed method.
Date of Conference: 12-14 May 2023
Date Added to IEEE Xplore: 10 July 2023
Conference Location: Hefei, China

I. Introduction

Power grid infrastructure projects typically have several distinguishing features [1], e.g., large scale, long duration, complex technology, and cascaded stages. Throughout the design, construction, and acceptance check processes, a vast amount of multi-format power transmission project data is generated from different sources. This data falls broadly into two categories [2]. (1) Structured Data: This kind of data is collected from design drawings, equipment nameplates, and closeout drawings. It consists of multi-class environment attribute data and multi-dimensional geographic information data at different scales. Under unified design standards, this data is used for the digital loading and visual expression of the physical characteristics and functional properties of the power transmission project. (2) Semi-structured and Unstructured Data: This kind of data is usually acquired from design specifications, equipment test reports, equipment lists, etc. Stored in Excel, Word, PDF, and other formats, the text data contains useful information about the power grid topology, assets, and equipment. It also exists in the inspection, dispatching, and finance systems, e.g., as manufacturer and project cost records. Compared with structured data, this kind of data lacks unified design standards or formats, so it is difficult to store in computer systems. In practice, it is frequently used by service personnel in their work.
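As a rough illustration of the preprocessing outlined in the abstract (noise removal followed by skip-gram modeling), the sketch below cleans a few free-text records and generates the (center word, context word) pairs on which a skip-gram model is trained. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the sample records, and the window size are hypothetical, and the cleaning rules assume whitespace-delimited ASCII text (Chinese-language project documents would need a word segmenter instead). Training the embedding vectors from these pairs is not shown.

```python
import re


def clean_record(text: str) -> str:
    """Normalize one raw record: lowercase, drop symbols, collapse whitespace."""
    text = text.lower()
    # Assumption: anything other than ASCII letters, digits, and whitespace is noise.
    text = re.sub(r"[^0-9a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def clean_corpus(records):
    """Clean every record, dropping empty records and exact duplicates (order kept)."""
    seen = set()
    cleaned = []
    for record in records:
        norm = clean_record(record)
        if norm and norm not in seen:
            seen.add(norm)
            cleaned.append(norm.split())  # whitespace tokenization
    return cleaned


def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical records of the kind found in test reports and equipment lists.
raw_records = [
    "  Transformer  test report: PASSED!! ",
    "transformer test report passed",  # duplicate once cleaned
    "",                                # empty record
    "110kV line tower foundation inspection",
]

corpus = clean_corpus(raw_records)
pairs = [pair for sentence in corpus for pair in skipgram_pairs(sentence, window=1)]
```

Each pair asks the model to predict a context word from its center word, which is how the learned vectors come to carry the contextual semantic information mentioned in the abstract. A larger window captures broader topical context at the cost of more training pairs.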

References

[1] X. Peng, D. Deng, S. Cheng et al., "Key technologies of electric power big data and its application prospects in smart grid," Proceedings of the CSEE, vol. 35, no. 3, pp. 503-511, 2015.
[2] M. Fang and L. Hu, "High-efficiency large-scale power grid engineering data analysis based on preprocessing iterative method," International Electronic Elements, vol. 30, no. 8, pp. 171-175, 2022.
[3] D. He, N. Kumar, S. Zeadally et al., "Efficient and privacy-preserving data aggregation scheme for smart grid against internal adversaries," IEEE Transactions on Smart Grid, vol. 8, no. 5, pp. 2411-2419, 2017.
[4] S. Liu, S. Zhang, L. Zheng et al., "Fine analysis of power network state estimation based on big data," Power Systems and Big Data, vol. 23, no. 7, pp. 9-15, 2020.
[5] B. Xu, J. Su, X. Zhang et al., "Exploration of early warning and decision-making of power grid infrastructure projects based on big data analysis," in Management innovation practice of China's power enterprises, pp. 466-468, 2020.
[6] Y. Zeng, X. N. Li, and X. Z. Liu, "Research on parallelization clustering algorithm for power communication big data," Application of Electronic Technique, vol. 44, no. 5, pp. 1-4+24, 2018.
[7] S. P. Liang, "Design and implementation of power equipment operation data analysis system based on big data," dissertation, North China Electric Power University, 2018.
[8] A. Lazaridou, N. T. Pham, and M. Baroni, "Combining language and vision with a multimodal skip-gram model," arXiv preprint, 2015.
[9] S. Bauskar, V. Badole, P. Jain et al., "Natural language processing based hybrid model for detecting fake news using content-based features and social features," International Journal of Information Engineering and Electronic Business, vol. 10, no. 4, 2019.
[10] T. A. Patel, M. Puppala, R. O. Ogunti et al., "Correlating mammographic and pathologic findings in clinical decision support using natural language processing and data mining methods," Cancer, vol. 123, no. 1, pp. 114-121, 2017.
[11] P. Wang, J. Xu, B. Xu et al., "Semantic clustering and convolutional neural network for short text categorization," in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, vol. 2, pp. 352-357, 2015.
[12] M. Goudjil, M. Koudil, M. Bedda et al., "A novel active learning method using SVM for text classification," International Journal of Automation and Computing, vol. 15, no. 3, pp. 290-298, 2018.