
ES2Vec: Earth Science Metadata Keyword Assignment using Domain-Specific Word Embeddings


Abstract:

Earth science metadata keyword assignment is a challenging problem. Dataset curators select appropriate keywords from the Global Change Master Directory (GCMD) set of keywords. These keywords are an integral part of search and discovery for the datasets, so selecting them well is crucial to making the datasets discoverable. Using machine learning techniques, we provide users with automated keyword suggestions that complement manual selection. We trained a machine learning model that leverages the semantic embedding ability of Word2Vec models to process dataset abstracts and suggest relevant keywords. We also describe a user interface tool we built to assist data curators in assigning these keywords.
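The abstract states only that the model uses Word2Vec embeddings to process abstracts and suggest keywords; it does not give the architecture. As a minimal sketch of one common way such embeddings can drive suggestions (not necessarily the authors' method), the code below averages word vectors for an abstract and for each candidate keyword, then ranks keywords by cosine similarity. The embedding table `EMBEDDINGS`, its 3-dimensional values, the helper names, and the candidate keyword strings are all hypothetical stand-ins for a trained Word2Vec model and the real GCMD keyword set.

```python
import math
import re

# Hypothetical toy embeddings standing in for a trained Word2Vec model.
# The values are invented for illustration; a real model would learn
# higher-dimensional vectors from Earth science text.
EMBEDDINGS = {
    "sea":           [0.9, 0.1, 0.0],
    "ocean":         [0.9, 0.0, 0.1],
    "surface":       [0.7, 0.3, 0.1],
    "temperature":   [0.8, 0.2, 0.1],
    "rainfall":      [0.1, 0.9, 0.2],
    "precipitation": [0.1, 0.95, 0.1],
    "dust":          [0.1, 0.1, 0.8],
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def mean_vector(tokens):
    """Average the vectors of in-vocabulary tokens; None if no token is known."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def suggest_keywords(abstract, keywords, top_k=3):
    """Rank candidate keywords by similarity to the abstract's mean embedding."""
    doc_vec = mean_vector(tokenize(abstract))
    if doc_vec is None:
        return []  # nothing in the abstract is covered by the vocabulary
    scored = []
    for kw in keywords:
        kw_vec = mean_vector(tokenize(kw))
        if kw_vec is not None:
            scored.append((cosine(doc_vec, kw_vec), kw))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [kw for _, kw in scored[:top_k]]


if __name__ == "__main__":
    abstract = "Daily ocean surface temperature fields derived from satellite radiometers."
    candidates = ["SEA SURFACE TEMPERATURE", "PRECIPITATION", "DUST"]
    print(suggest_keywords(abstract, candidates, top_k=1))  # ['SEA SURFACE TEMPERATURE']
```

In this sketch, words that do not appear in the embedding table (such as "satellite") are simply ignored. A trained Word2Vec model would cover far more of the vocabulary, and a curator-facing tool would likely show a ranked list rather than a single best match.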
Published in: 2020 SoutheastCon
Date of Conference: 28-29 March 2020
Date Added to IEEE Xplore: 13 November 2020
Conference Location: Raleigh, NC, USA

I. Introduction

NASA's growing collection of Earth science datasets is described by metadata records stored in a catalog called the Common Metadata Repository (CMR) [1]. The CMR leverages the Global Change Master Directory (GCMD) [2] science keyword taxonomy, a hierarchical set of controlled Earth science keywords. GCMD keywords help ensure that Earth science data, services, and variables are described in a consistent and comprehensive manner [3]. Data providers and curators currently assign these science keywords manually, based on their knowledge of the dataset abstracts in the corresponding metadata records. In practice, a team of people assigns keywords to each metadata record to the best of their knowledge of the data. This manual process is labor intensive and prone to human error and inconsistency, and those errors and inconsistencies propagate into the search and discovery of the datasets. Because science keywords are vital to data discovery, a reliable way to assign them to datasets is needed.

This work is funded by NASA-IMPACT.

