ITSMatch: Improved Safe Semi-Supervised Text Classification Under Class Distribution Mismatch


Abstract:

Deep semi-supervised learning (SSL) moves deep learning out of the lab, where labeled data is expensive, and into real-world commercial applications. Today, deep SSL is widely applied in commercial artificial intelligence technologies. In practice, however, there may be a distribution mismatch between the labeled and unlabeled datasets, which is a key issue that degrades deep SSL performance. Some recent studies handle out-of-distribution (OOD) data by removing it outright or by uniformly reducing its weight, which ignores the potential value of OOD data. To address this issue, we propose ITSMatch, a simple, safe, and effective SSL method for text classification that recycles OOD data near the labeled domain to make fuller use of the available data. Specifically, weighted adversarial domain adaptation is applied to OOD data to project it into the space of labeled and in-distribution (ID) data, and its recoverability is accurately quantified by the transferable score. ITSMatch unifies mainstream methods, including pseudo-label generation and consistency regularization on unlabeled data and its augmented data. In addition, we perform metric learning on labeled data and pseudo-labeled ID data to fully capture the features of the sample space. Experimental results on the AG News and Yelp datasets demonstrate that ITSMatch outperforms the baseline methods, including TextSMatch, MixText, UDA, and BERT. This semi-supervised text classification method can be applied to the analysis of product reviews on e-commerce platforms to improve customers’ online shopping experience.
Published in: IEEE Transactions on Consumer Electronics (Volume: 70, Issue: 2, May 2024)
Page(s): 4729-4740
Date of Publication: 13 October 2023


I. Introduction

Text classification technology, powered by artificial intelligence, has been widely used in intelligent electronic systems. Sentiment analysis is a fundamental application of text classification that analyzes users’ emotional tendencies towards specific products, services, or events by recognizing and classifying emotions. Afzaal et al. [1] proposed a framework for sentiment classification of tourism reviews. Kim et al. [2] introduced a system that classifies music based on users’ emotional responses. Similarly, Chatterjee et al. [3] noted the widespread application of speech emotion recognition (SER) in the consumer field and introduced a comprehensive method for analyzing sentiment in human speech. Prabhakar et al. [4] introduced a multi-channel CNN-BLSTM architecture to enhance traditional speech emotion recognition techniques. Text classification can also be employed for user intention recognition, in which user comments, feedback, or social media posts are analyzed to discern users’ intentions. Pal et al. [5] analyzed the factors influencing users’ decisions to purchase Internet of Things (IoT) consumer electronic devices by classifying text from dialogues. The growing popularity and development of consumer electronics underscore the increasing demand for intelligent systems that can effectively process and classify text.
