Power Text Data Preprocessing of Power Grid Infrastructure Project based on Skip-gram Model | IEEE Conference Publication | IEEE Xplore

Abstract:

The power grid infrastructure project is a large-scale, long-duration undertaking that often involves multiple participating parties. It produces a large amount of data, which serves as an important original data source for the operation and maintenance and asset management systems of power supply enterprises. However, manual analysis cannot effectively handle unstructured natural-language text data or nonstandard semi-structured tabular data. To address this issue, an in-depth analysis of the project's data in its different forms is first conducted. Then, a data cleaning technique is used to eliminate noise in the original low-quality data. Finally, a skip-gram model is built to convert the text data into word embedding vectors. The preprocessed data contains contextual semantic information and is therefore more suitable for data mining. Extensive simulation experiments demonstrate the effectiveness of the proposed method.
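The cleaning and skip-gram steps summarized above can be illustrated with a minimal, stdlib-only Python sketch. The paper does not state its cleaning rules or training settings, so everything here is an assumption: `clean_text` simply replaces characters other than CJK ideographs, ASCII letters, digits, and whitespace; tokens are assumed to be whitespace-separated (Chinese text would first need a word segmenter); and training uses skip-gram with negative sampling, drawing negatives uniformly rather than from the smoothed unigram distribution used by word2vec. All function names and hyperparameters are hypothetical.

```python
import math
import random
import re


def clean_text(text):
    """Hypothetical noise filter: keep CJK ideographs, ASCII letters/digits and
    whitespace, replace everything else with a space, then collapse whitespace."""
    text = re.sub(r"[^\u4e00-\u9fffA-Za-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def skipgram_pairs(token_ids, window):
    """Return every (center, context) pair within +/- `window` positions."""
    pairs = []
    for i, center in enumerate(token_ids):
        lo, hi = max(0, i - window), min(len(token_ids), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, token_ids[j]))
    return pairs


def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-x))


def train_sgns(sentences, dim=16, window=2, negatives=3, epochs=50, lr=0.05, seed=0):
    """Skip-gram with negative sampling; returns {word: input vector}."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    index = {w: k for k, w in enumerate(vocab)}
    # Input (center-word) vectors start small and random; output vectors start at zero.
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]
    pairs = [p for s in sentences for p in skipgram_pairs([index[w] for w in s], window)]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for center, context in pairs:
            v = w_in[center]
            grad_v = [0.0] * dim
            # One observed context word (label 1) plus uniformly sampled negatives (label 0).
            targets = [(context, 1.0)] + [(rng.randrange(len(vocab)), 0.0) for _ in range(negatives)]
            for t, label in targets:
                u = w_out[t]
                # Gradient-ascent step on log sigmoid(+/- u.v).
                g = lr * (label - sigmoid(sum(a * b for a, b in zip(v, u))))
                for k in range(dim):
                    grad_v[k] += g * u[k]
                    u[k] += g * v[k]
            # Apply the accumulated center-word update after all targets are scored.
            for k in range(dim):
                v[k] += grad_v[k]
    return {w: w_in[index[w]] for w in vocab}
```

For example, `train_sgns([clean_text(s).split() for s in reports])` returns one vector per vocabulary word, where `reports` is any list of raw text strings. For real corpora, a library implementation such as gensim's `Word2Vec` with `sg=1` would be the practical choice; the explicit loop is shown only to make the skip-gram objective visible.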
Date of Conference: 12-14 May 2023
Date Added to IEEE Xplore: 10 July 2023
Conference Location: Hefei, China
State Grid Zhejiang Electric Power Co., Ltd., China
State Grid Zhejiang Electric Power Co., Ltd., China
State Grid Economic and Technological Research Institute Co., Ltd., China
Southeast University, China

I. Introduction

A power grid infrastructure project typically exhibits several characteristics [1], e.g., large scale, long duration, complex technology, and cascaded stages. Throughout the design, construction, and acceptance check processes, a vast amount of multi-format power transmission project data is generated from different sources. These data can be broadly divided into two categories [2]. (1) Structured Data: This type of data is collected from design drawings, equipment nameplates, and closeout drawings. It consists of multi-class environmental attribute data and multi-dimensional geographic information data at different scales. Because it conforms to unified design standards, this data is used for the digital loading and visual expression of the physical characteristics and functional properties of the power transmission project. (2) Semi-structured and Unstructured Data: This type of data is usually acquired from design specifications, equipment test reports, equipment lists, etc. Stored in EXCEL, WORD, PDF, and other formats, this text data contains useful information related to the power grid topology, assets, and equipment. Such data also exists in the inspection, dispatching, and finance systems, e.g., manufacturer and project cost information. Unlike structured data, this kind of data lacks unified design standards or formats, which makes it difficult to store in computer systems. In practice, it is frequently used by service personnel in their work.
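Before semi-structured sheets such as the equipment lists described above can feed a text model, each row has to be turned into a token sequence. The sketch below assumes the sheet has already been exported from EXCEL to CSV and that nonstandard column names can be mapped through a hand-written alias table; `HEADER_ALIASES`, `normalize_header`, and `rows_to_sentences` are hypothetical names used for illustration, not part of the original method.

```python
import csv
import io

# Hypothetical alias table: nonstandard sheets often name the same field differently.
HEADER_ALIASES = {
    "mfr": "manufacturer",
    "maker": "manufacturer",
    "manufacturer": "manufacturer",
    "device": "equipment",
    "equipment": "equipment",
    "cost": "project_cost",
    "project cost": "project_cost",
}


def normalize_header(name):
    """Lower-case, collapse internal whitespace, then map through the alias table."""
    key = " ".join(name.strip().lower().split())
    return HEADER_ALIASES.get(key, key.replace(" ", "_"))


def rows_to_sentences(csv_text):
    """Turn each non-empty table row into a flat 'field value ...' token list."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [normalize_header(h) for h in next(reader, [])]
    sentences = []
    for row in reader:
        tokens = []
        for field, value in zip(header, row):
            value = value.strip()
            if value:  # skip blank cells, which are common in hand-filled sheets
                tokens.extend([field] + value.lower().split())
        if tokens:  # a fully blank row produces no sentence
            sentences.append(tokens)
    return sentences
```

Prefixing each value with its normalized field name keeps some of the table's structure in the token stream, so a context-window model can associate values with the column they came from.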


References

References are not available for this document.