I. Introduction
NASA's growing collection of Earth science datasets is described by metadata records stored in a catalog called the Common Metadata Repository (CMR) [1]. The CMR leverages the Global Change Master Directory (GCMD) [2] science keyword taxonomy, a hierarchical set of controlled Earth science keywords. GCMD keywords help ensure that Earth science data, services, and variables are described in a consistent and comprehensive manner [3]. These science keywords are assigned manually: a team of data providers and curators tags each metadata record based on their knowledge of the data and of the dataset abstract in that record. Manual assignment is labor intensive and prone to human error and inconsistency, and these errors and inconsistencies propagate into the search and discovery of the datasets. Because science keywords are vital to data discovery, there is a need for a reliable way to assign keywords to datasets.
This work is funded by NASA-IMPACT.