Abstract:
The ability of word embeddings to capture shared semantic regularities between word-pair categories such as capital-country has led to the use of analogies as a method of validating word embedding models. Further research has shown that, of the full breadth of possible analogy categories, only a limited subset is accurately recoverable by current analogy equations when executed against word embeddings trained on generalized, non-domain-specific text corpora. As most, if not all, domain-specific scientific analogy pairs belong to the problematic analogy categories (i.e., the lexicographical and the encyclopedic), we examine the degree to which a domain-specific text corpus and vocabulary improve analogy predictions from word embeddings. Our findings demonstrate that, in comparison to analogy-based tests performed against general word embeddings, predictions by domain-specific word embeddings outperform in exactly those analogy categories that are both highly problematic and the locus of domain knowledge.
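The analogy equations referenced in the abstract are, canonically, the vector-offset (3CosAdd) formulation of Mikolov et al.: "a is to a* as b is to ?" is answered by the vocabulary word whose vector is most cosine-similar to a* - a + b. The sketch below illustrates that formulation only; the embedding table, toy vectors, and words are hypothetical placeholders, not the paper's models or data.

import numpy as np

# Hypothetical word-to-vector mapping; in practice these would come
# from a trained word embedding model (general or domain-specific).
embeddings = {
    "paris":  np.array([0.9, 0.1, 0.0]),
    "france": np.array([0.8, 0.3, 0.1]),
    "rome":   np.array([0.7, 0.2, 0.9]),
    "italy":  np.array([0.6, 0.4, 1.0]),
}

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(a, a_star, b):
    """Answer 'a is to a_star as b is to ?' via argmax cos(x, a_star - a + b),
    excluding the three query words, as is standard in analogy evaluation."""
    target = embeddings[a_star] - embeddings[a] + embeddings[b]
    candidates = (w for w in embeddings if w not in {a, a_star, b})
    return max(candidates, key=lambda w: cosine(embeddings[w], target))

# Capital-country example: "paris is to france as rome is to ?"
print(analogy("paris", "france", "rome"))  # expected: "italy"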
Published in: 2020 SoutheastCon
Date of Conference: 28-29 March 2020
Date Added to IEEE Xplore: 13 November 2020