I. Overview of Pre-Trained Language Models
Pre-trained language models are currently the dominant approach to natural language modeling. In early 2018, Peters et al. proposed the ELMo model, which introduced a pre-training stage built on the idea of a bidirectional language model. Subsequently, Radford et al. proposed the GPT model, which adopted the Transformer as its basic architecture and was pre-trained on very large-scale unsupervised text data, achieving significant performance improvements on natural language generation tasks.

Yu Tongrui and Jin Ran (2020) describe pre-training technology as designing the network structure in advance and feeding encoded data into that structure for training, so as to improve the generalization ability of the model. Pre-training was originally proposed for problems in the image field (e.g., ResNet and VGG); because of its strong results, related techniques were later carried over to NLP. Viewed in historical order, the development of pre-training technology falls into two main stages: the traditional pre-training stage based on probability and statistics, and the pre-training stage based on deep learning.

Chen Deguang, Ma Jinlin, et al. (2021) note that both neural-network pre-training and traditional pre-training require pre-processing of the corpus. Specifically, pre-processing consists of cleaning the raw corpus (removing blanks, invalid tags, symbols, and stop words; document segmentation; basic error correction; encoding conversion; etc.), word segmentation (needed only for languages such as Chinese that are written without explicit word delimiters), and normalization, so that the corpus is transformed into a form the machine can recognize and process.
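To make these pre-processing steps concrete, the snippet below is a minimal Python sketch of such a cleaning pipeline, not the pipeline used by the cited authors. The `preprocess` function and the small `STOP_WORDS` set are illustrative names introduced here; a real system would use full language-specific stop-word lists and, for languages like Chinese, a dedicated word segmenter in place of the whitespace split.

```python
import re
import unicodedata

# Illustrative stop-word list; a real pipeline would load a
# language-specific list.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in"}


def preprocess(raw_text: str) -> list[str]:
    """Minimal corpus-cleaning sketch following the steps described above:
    cleaning, normalization, symbol removal, tokenization, stop-word removal."""
    # 1. Cleaning: strip HTML-like tags and collapse redundant whitespace.
    text = re.sub(r"<[^>]+>", " ", raw_text)
    text = re.sub(r"\s+", " ", text).strip()

    # 2. Encoding conversion / standardization: Unicode normalization
    #    and lowercasing.
    text = unicodedata.normalize("NFKC", text).lower()

    # 3. Symbol removal: keep only word characters and spaces.
    text = re.sub(r"[^\w\s]", " ", text)

    # 4. Tokenization: whitespace split works for space-delimited languages;
    #    for languages without delimiters (e.g. Chinese), a segmenter such as
    #    jieba would replace this step.
    tokens = text.split()

    # 5. Stop-word removal.
    return [tok for tok in tokens if tok not in STOP_WORDS]


if __name__ == "__main__":
    sample = "<p>The ELMo   model was proposed in 2018!</p>"
    print(preprocess(sample))  # -> ['elmo', 'model', 'was', 'proposed', '2018']
```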