Abstract:
The use of deep learning techniques to build transformer language models such as SciBERT and GPT-3 has transformed the natural language technology (NLT) landscape. These new NLTs are being used in speech-to-text and text-to-speech conversion, automated text classification, sentiment analysis, topic modeling, text summarization, and cognitive assistants. While Earth science has no shortage of unstructured data, such as journal and conference papers, few efforts have focused on harnessing NLTs for knowledge extraction and for supporting the scientific process. This paper surveys the use of language models across different sciences. BERT-E, a new Earth science-specific language model, is presented. BERT-E is generated using a transfer learning approach: a language model already trained for general science (SciBERT) is fine-tuned using abstracts and full text extracted from various Earth science-related articles. A downstream keyword classification application is used for evaluation, and the use of BERT-E shows improved performance. The need to develop a robust set of benchmarks for evaluating language models such as BERT-E is discussed. Finally, example applications are presented to inspire additional ideas for applications using domain-specific language models.
Date of Conference: 17-22 July 2022
Date Added to IEEE Xplore: 28 September 2022
Author affiliations: NASA/MSFC; The University of Alabama in Huntsville