Abstract:
The ability of word embeddings to identify shared semantic regularities between word pair categories such as capital-country has led to the use of analogies as a method of validating word embedding models. Further research has shown that, of the full breadth of possible analogy categories, only a limited set can be solved accurately by current analogy equations executed against word embeddings trained on generalized, non-domain-specific text corpora. As most, if not all, domain-specific scientific analogy pairs belong to problematic analogy categories (i.e., the lexicographical and the encyclopedic), we examine the degree to which a domain-specific text corpus and vocabulary improve analogy predictions from word embeddings. Our findings demonstrate that, compared with analogy-based tests performed against general word embeddings, domain-specific word embeddings yield more accurate predictions in exactly those analogy categories that are both highly problematic and the location of domain knowledge.
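The abstract does not state which analogy equations were evaluated. As an illustration only, the sketch below shows one widely used formulation, the vector-offset method (often called 3CosAdd), which answers "a is to b as c is to ?" by finding the word whose vector is closest in cosine similarity to b - a + c. The function name `solve_analogy` and the four-dimensional toy vectors are hypothetical and are not taken from the paper.

```python
import numpy as np

def solve_analogy(a, b, c, vectors, topn=1):
    """Rank candidate fourth terms for 'a : b :: c : ?' by cosine
    similarity to the offset vector b - a + c (vector-offset method)."""
    # Normalize every vector to unit length so dot products are cosines.
    unit = {w: v / np.linalg.norm(v) for w, v in vectors.items()}
    target = unit[b] - unit[a] + unit[c]
    target = target / np.linalg.norm(target)
    # The three query words are excluded from the candidates, as is customary.
    scores = {w: float(v @ target) for w, v in unit.items() if w not in (a, b, c)}
    return sorted(scores, key=scores.get, reverse=True)[:topn]

# Hypothetical toy embeddings: three "nation" axes plus one "is a country" axis.
toy_vectors = {
    "paris":   np.array([1.0, 0.0, 0.0, 0.0]),
    "france":  np.array([1.0, 0.0, 0.0, 1.0]),
    "rome":    np.array([0.0, 1.0, 0.0, 0.0]),
    "italy":   np.array([0.0, 1.0, 0.0, 1.0]),
    "berlin":  np.array([0.0, 0.0, 1.0, 0.0]),
    "germany": np.array([0.0, 0.0, 1.0, 1.0]),
}

print(solve_analogy("paris", "france", "rome", toy_vectors))  # ['italy']
```

With real embeddings, the accuracy of this kind of query is what varies across analogy categories: the capital-country offset above is nearly constant by construction, whereas lexicographic and encyclopedic relations may not correspond to a single consistent offset.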
Published in: 2020 SoutheastCon
Date of Conference: 28-29 March 2020
Date Added to IEEE Xplore: 13 November 2020
- IEEE Keywords
- Vocabulary
- Filtering
- Cooling
- Semantics
- Scattering
- Mathematical model
- Carbon
- Index Terms
- Word Embedding
- Domain-specific Word
- Domain-specific Word Embeddings
- Cognitive Domains
- Text Data
- Word Pairs
- Lexicographic
- Use Of Analogues
- General Words
- Word Embedding Model
- Prediction Accuracy
- Syntactic
- Percentage Points
- Linear Equation
- Improvement In Accuracy
- Green Technology
- Latent Space
- Semantic Similarity
- Earth Science
- Use Of Words
- Parallel Questions
- Fourth Term
- General Text
- Compound Words
- Tokenized
- Top Predictors
- Corpus Of Articles
- Words In The Corpus
- Parallel Relationship
- Related Categories