I. Introduction
Large language models (LLMs), trained on massive amounts of data, exhibit strong language understanding and generation capabilities [1]. Built on deep learning techniques, these models can comprehend and produce human language, driving significant advances in machine translation, question-answering systems, and text summarization. In 2017, Vaswani et al. at Google proposed the Transformer, a model based on the self-attention mechanism that handles long-range dependencies more efficiently [1]. The Transformer's debut laid the groundwork for large-scale pre-trained language models. In 2018, Devlin et al. introduced BERT (Bidirectional Encoder Representations from Transformers), a pre-trained language representation model built on the Transformer architecture that captures linguistic context more effectively through bidirectional training [2]. BERT not only achieved unprecedented results across a wide range of NLP tasks but also laid the foundation for subsequent, more advanced research and model development. Since the emergence of BERT and its variants, large-scale language models have become essential tools in both research and practical applications within natural language processing.
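As a brief illustration of the mechanism referenced above, the scaled dot-product attention at the core of the Transformer [1] can be written as

Attention(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,

where Q, K, and V denote the query, key, and value matrices derived from the input tokens and d_k is the key dimension. Because every token attends to every other token in a single step, dependencies between distant positions can be modeled without the sequential propagation required by recurrent architectures.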