An annotated corpus for Turkish sentiment analysis at sentence level


Abstract:

With the rapid growth of unstructured data accessible via the web, managing these data and discovering hidden information in huge datasets has become a necessary task. Consequently, text mining, which can be defined as gleaning important information from natural language text, has emerged. In this study, to facilitate information management for aspect-based sentiment analysis, we construct a Turkish sentiment corpus that consists of user reviews and is annotated semi-automatically. For each word, the corpus records its root form, its usage (aspect/multi-aspect/seedsentiment/absent), its part-of-speech (POS) tag, and its polarity. The Turkish hotel review dataset used in this study, which contains 1000 reviews and 5364 sentences, was crawled from a web source. The system takes the reviews together with aspect and seedsentiment lists and returns the annotated corpus as JSON data structures. This paper thus provides a ready-to-use dataset for developing aspect-based sentiment analysis applications and, by emitting JSON, makes the dataset easy to use from Java applications.
Date of Conference: 16-17 September 2017
Date Added to IEEE Xplore: 02 November 2017
Conference Location: Malatya, Turkey

I. Introduction

In recent years, with the proliferation of online review websites, users' preferences and reviews have come to exert a powerful influence on customers and companies [1]. While customers benefit from these reviews in their decision-making, companies use the feedback to develop their own branded products [2]. Moreover, many real-world applications make effective use of this influence [3]. Unfortunately, as the number of reviews grows, it is becoming difficult to extract information from these unstructured data manually [4]. There is therefore a need for an efficient document representation framework that can annotate the words in reviews for the purpose of sentiment analysis. Text mining is a suitable approach to managing unstructured data and retrieving the information we want.
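
To make the word-level annotation described in the abstract more concrete, the following is a minimal Python sketch of how one annotated sentence might be serialized to JSON. It is not the authors' implementation. The function name `annotate_sentence`, the JSON field names, the POS labels, and the example sentence are all hypothetical, and root forms and POS tags are assumed to come from an upstream morphological analyzer. The multi-aspect category is left out because the abstract does not define it.

```python
import json


def annotate_sentence(tokens, aspects, seed_sentiments):
    """Label each token as aspect, seedsentiment, or absent.

    tokens: list of (surface, root, pos) tuples, assumed to be produced
            upstream by a morphological analyzer (not shown here).
    aspects: set of aspect root forms (hypothetical list).
    seed_sentiments: dict mapping a seed word's root to its polarity.
    """
    annotated = []
    for surface, root, pos in tokens:
        if root in aspects:
            usage, polarity = "aspect", None
        elif root in seed_sentiments:
            usage, polarity = "seedsentiment", seed_sentiments[root]
        else:
            usage, polarity = "absent", None
        annotated.append({
            "word": surface,
            "root": root,
            "pos": pos,
            "usage": usage,
            "polarity": polarity,
        })
    return annotated


# Illustrative input only: "Odalar temizdi" ("The rooms were clean").
tokens = [("Odalar", "oda", "Noun"), ("temizdi", "temiz", "Adj")]
aspects = {"oda"}
seed_sentiments = {"temiz": "positive"}

record = {
    "sentence": "Odalar temizdi",
    "words": annotate_sentence(tokens, aspects, seed_sentiments),
}

# ensure_ascii=False keeps Turkish characters readable in the output.
print(json.dumps(record, ensure_ascii=False, indent=2))
```

A JSON record of this general shape can be read by any standard JSON parser, which is consistent with the abstract's stated goal of making the corpus easy to consume from Java applications.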
