
Automatic classification of documents by formality


Abstract:

This paper addresses the task of classifying documents into formal or informal style. We studied the main characteristics of each style in order to choose features that allowed us to train classifiers that can distinguish between the two styles. We built our data set by collecting documents for both styles, from different sources. We tested several classification algorithms, namely Decision Trees, Naïve Bayes, and Support Vector Machines, to choose the classifier that leads to the best classification results. We performed attribute selection in order to determine the contribution of each feature to our model.
Date of Conference: 21-23 August 2010
Date Added to IEEE Xplore: 30 September 2010
Conference Location: Beijing, China

1. Introduction

The need to identify and interpret differences in the linguistic style of texts, such as between formal and informal styles, has grown as more and more people use the Internet as a main resource for their research. Several factors affect formality, including words and expressions as well as syntactic features. Vocabulary choice is perhaps the biggest style marker. Generally speaking, longer words and verbs of Latin origin are formal, while phrasal verbs and idioms are informal. Many expressions also have formal and informal equivalents that can be used in writing.
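
As a rough illustration of the vocabulary markers mentioned above, the following minimal sketch computes two surface features for a text: average word length and a count of phrasal verbs. It is not the feature set used in the paper. The function name, the short phrasal-verb list, and the two sample sentences are all hypothetical and chosen only for this example.

```python
import re

# Hypothetical, deliberately tiny list of phrasal verbs (illustration only).
PHRASAL_VERBS = ("find out", "put up with", "give up", "look into")


def formality_markers(text):
    """Return two rough style markers for a text (assumed features)."""
    lowered = text.lower()
    # Treat every run of ASCII letters as one word.
    words = re.findall(r"[a-z]+", lowered)
    avg_len = sum(len(w) for w in words) / len(words) if words else 0.0
    # Naive substring counting; a real system would need proper parsing.
    phrasal = sum(lowered.count(pv) for pv in PHRASAL_VERBS)
    return {"avg_word_length": avg_len, "phrasal_verbs": phrasal}


formal = formality_markers("The committee will investigate the discrepancy.")
informal = formality_markers("We will look into it and find out what went wrong.")
```

Under these assumptions, the formal sample has the higher average word length and the informal sample contains the phrasal verbs. Values like these could serve as inputs to a classifier, but on their own they are only weak indicators of style.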
