
Language Model for Earth Science: Exploring Potential Downstream Applications as well as Current Challenges



Abstract:

The use of deep learning techniques to build transformer language models such as SciBERT and GPT3 has transformed the natural language technology (NLT) landscape. These new NLTs are being used in speech-to-text and text-to-speech conversion, automated text classification, sentiment analysis, topic modeling, text summarization, and cognitive assistants. While Earth science has no shortage of unstructured data such as journal and conference papers, little effort has focused on harnessing NLTs for knowledge extraction and supporting the scientific process. This paper surveys the use of language models in different science areas. BERT-E, a new Earth science-specific language model, is presented. BERT-E is generated using a transfer learning approach: a language model that has already been trained for general science (SciBERT) is fine-tuned using abstracts and full text extracted from various Earth science-related articles. A downstream keyword classification application is used for evaluation, and the use of BERT-E shows improved performance. The need to develop a robust set of benchmarks for evaluating language models such as BERT-E is discussed. Finally, example applications are presented to inspire additional ideas for applications using domain-specific language models.
Date of Conference: 17-22 July 2022
Date Added to IEEE Xplore: 28 September 2022
Conference Location: Kuala Lumpur, Malaysia

1. Introduction

Unstructured (mainly text) data remains an untapped resource with tremendous potential insights. Recent advances in technology have transformed the Natural Language Technology (NLT) landscape, specifically the use of deep learning techniques to build transformer language models such as SciBERT [1] and GPT3 [2]. These new NLTs are being adopted in private industry to improve operations and services [3]. Most of these uses focus on improving customer service. Industry applications focused on the operational use of these technologies include translation from speech to text and vice versa, automated text classification, sentiment analysis of feedback, comments, and financial reports, topic modeling, text summarization, personalization, and cognitive assistants (contextual chatbots). These technologies also serve as the basis for insight engines, which combine search capabilities with artificial intelligence to deliver actionable insights from the full spectrum of content and data sourced within and external to an enterprise [3].

Earth science has no shortage of unstructured data: almost all knowledge within Earth science is published as journal or conference papers. However, limited effort has focused on harnessing this potential resource for knowledge extraction and supporting the scientific process. This paper examines four aspects of language models in Earth science. First, it surveys the use of language models in different science areas to provide context. Second, it describes BERT-E, an Earth science-specific language model. Third, the challenges of developing robust benchmarks for evaluating language models such as BERT-E are presented. Finally, the paper explores the use of BERT-E in some ongoing prototyping efforts and future applications to support the scientific process.
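BERT-E is produced by continuing the pretraining of SciBERT on Earth science text, which in BERT-style models is driven by a masked-language-modeling objective: roughly 15% of input tokens are selected, and of those, 80% are replaced with a [MASK] token, 10% with a random vocabulary token, and 10% left unchanged. The masking step can be sketched in plain Python as below; the toy word-level vocabulary and sample sentence are illustrative only (real pipelines use a subword tokenizer such as WordPiece and libraries like Hugging Face Transformers):

```python
import random

MASK = "[MASK]"
# Illustrative stand-in vocabulary for random-token replacement.
TOY_VOCAB = ["ocean", "cloud", "aerosol", "flux", "albedo"]

def mask_tokens(tokens, p=0.15, rng=None):
    """Apply BERT-style masking for masked language modeling.

    Each token is independently selected with probability ``p``.
    A selected token becomes [MASK] 80% of the time, a random
    vocabulary token 10% of the time, and stays unchanged 10%
    of the time. Returns the corrupted sequence plus a mapping
    from position to the original token the model must predict.
    """
    rng = rng or random.Random(0)
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < p:
            targets[i] = tok  # prediction target is the original token
            r = rng.random()
            if r < 0.8:
                corrupted.append(MASK)
            elif r < 0.9:
                corrupted.append(rng.choice(TOY_VOCAB))
            else:
                corrupted.append(tok)
        else:
            corrupted.append(tok)
    return corrupted, targets

tokens = "sea surface temperature anomalies drive convection".split()
corrupted, targets = mask_tokens(tokens, rng=random.Random(42))
```

During continued pretraining, the model is trained to recover the tokens in `targets` from the corrupted sequence; because only the input corpus changes (Earth science abstracts and full text instead of general scientific text), the SciBERT weights and vocabulary can be reused directly as the starting point.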

