I. Introduction
Recent research on deep learning methods that learn a distributed representation vector for each word has gained popularity in Natural Language Processing (NLP). The technique used to learn this distributed representation vector is referred to as word embedding. More precisely, word embedding is a technique for learning continuous low-dimensional vector-space representations of words by leveraging contextual information from large corpora. Word2Vec [2] is a neural network model that learns to predict the center word from its surrounding (context) words in a sentence. It places words with similar meanings close to each other in the continuous vector space. Despite its simplicity, the model has produced high-quality results in NLP tasks such as language modeling, text understanding, and machine translation. Other popular representations range from the simple bag-of-words (BOW) model and its term-frequency-based variants [3] to language-model-based methods [4]-[6], topic models [7], [8], denoising autoencoders and their variants [9], [10], and distributed vector representations [11]-[13]. Another prominent line of work on word embedding includes GloVe [14] and fastText [15].

However, despite the rising popularity of word embeddings in different natural language processing tasks, they often fail to capture the emotional semantics that words convey. To overcome this challenge and incorporate affective information into the word representation, Tang et al. [1] proposed Sentiment-Specific Word Embeddings (SSWE), which encode both positive/negative sentiment and syntactic contextual information in a vector space. Their work demonstrates that incorporating sentiment labels at the word level is more effective for sentiment-related tasks than other word embeddings. Yet, it focuses only on binary labels, which weakens its ability to generalize to other tasks. Yu et al. [16] instead proposed to refine pre-trained word embeddings with a sentiment lexicon, observing improved results. Felbo et al. [17] achieved good results on affect tasks by training a two-layer bidirectional Long Short-Term Memory (bi-LSTM) model, named DeepMoji, to predict the emoji of an input document using a huge dataset of 1.2 billion tweets. However, collecting billions of tweets is expensive and time-consuming for researchers. Emo2vec [18] is a word-level representation that encodes emotional semantics into fixed-sized, real-valued vectors. Labutov and Lipson [19] proposed a method that takes an existing embedding and labeled data as input and produces an embedding in the same space, but with better predictive performance on the supervised task.
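The limitation described above can be seen with a minimal sketch of context-only embedding training. The sketch below uses the gensim library and a toy corpus (both are assumptions for illustration, not the setup of any cited work): a CBOW Word2Vec model is trained, and because sentiment-opposite words such as "good" and "bad" occur in nearly identical contexts, a purely contextual model may place them close together, which is the behavior that motivates sentiment-specific embeddings.

# Minimal illustrative sketch; assumes the gensim library and a toy corpus.
from gensim.models import Word2Vec

# A tiny tokenized corpus; real systems train on large collections such as tweets.
corpus = [
    ["the", "movie", "was", "good", "and", "the", "acting", "was", "great"],
    ["the", "movie", "was", "bad", "and", "the", "acting", "was", "terrible"],
    ["a", "good", "plot", "with", "great", "performances"],
    ["a", "bad", "plot", "with", "terrible", "performances"],
]

# CBOW (sg=0): predict the center word from its surrounding context words.
model = Word2Vec(sentences=corpus, vector_size=50, window=2,
                 min_count=1, sg=0, epochs=200, seed=1)

# Words that share contexts receive similar vectors; here "good" and "bad"
# appear in almost identical contexts, so their vectors may end up close,
# illustrating why context alone does not capture sentiment polarity.
print(model.wv.most_similar("good", topn=3))
print(model.wv.similarity("good", "bad"))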