1. Introduction
Unstructured data, mainly text, remains an untapped resource with tremendous potential for insight. Recent technological advances have transformed the Natural Language Technology (NLT) landscape, specifically the use of deep learning techniques to build transformer language models such as SciBERT [1] and GPT-3 [2]. These new NLTs are being adopted in private industry to improve operations and services [3], with most uses focused on improving customer service. Operational industry applications include speech-to-text and text-to-speech translation, automated text classification, sentiment analysis of feedback, comments, and financial reports, topic modeling, text summarization, personalization, and cognitive assistants (contextual chatbots). These technologies also serve as the basis for insight engines, which combine search capabilities with artificial intelligence to deliver actionable insights from the full spectrum of content and data sourced within and outside an enterprise [3].

Earth science has no shortage of unstructured data: almost all knowledge within Earth science is published as journal or conference papers. However, limited effort has focused on harnessing this potential resource for knowledge extraction and for supporting the scientific process.

This paper examines four aspects of language models in Earth science. First, it surveys the use of language models across different science areas to provide context. Second, it describes BERT-E, an Earth science-specific language model. Third, it presents the challenges of developing robust benchmarks for evaluating language models such as BERT-E. Finally, it explores the use of BERT-E in ongoing prototyping efforts and future applications to support the scientific process.