Abstract:
Cyber grooming is a pressing problem worldwide, and many reports strongly suggest that it must be tackled urgently to protect children from sexual exploitation. In this study, we propose an effective method for sexual predator identification in online chats based on two-stage classification. The first stage distinguishes predatory conversations from normal ones, while the second stage distinguishes the predator from the victim within a single predatory conversation. Finally, the set of unique predators is derived from the second-stage results. For this task, we investigate several machine learning classifiers, including Naive Bayes, Support Vector Machine, Neural Network, Logistic Regression, Random Forest, K-Nearest Neighbors, and Decision Tree, with Bag of Words features under several different term weighting methods. We also propose two ensemble techniques to improve classification. The experimental results on the PAN12 dataset show that our best method, which uses a soft-voting ensemble for the first stage and a Naive Bayes classifier for the second stage, obtains an F0.5-score of 0.9348, which would rank first in the PAN12 competition.
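The two-stage flow and the F0.5 metric described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `soft_vote`, `identify_predators`, and `f_beta` are hypothetical, each model is assumed to be a plain callable that returns a positive-class probability for a text, the 0.5 decision threshold is an assumption, and scoring each author's concatenated messages in the second stage is one plausible reading of the abstract.

```python
from collections import defaultdict
from typing import Callable, List, Sequence, Set, Tuple

# A message is (author_id, text); a conversation is a list of messages.
Message = Tuple[str, str]
Conversation = List[Message]
Model = Callable[[str], float]  # assumed: returns P(positive class | text)


def soft_vote(probabilities: Sequence[float], threshold: float = 0.5) -> bool:
    """Soft voting: average the models' probabilities, then threshold."""
    return sum(probabilities) / len(probabilities) >= threshold


def identify_predators(
    conversations: Sequence[Conversation],
    stage1_models: Sequence[Model],
    stage2_model: Model,
) -> Set[str]:
    """Return the set of unique author ids flagged as predators."""
    predators: Set[str] = set()
    for conversation in conversations:
        # Stage 1: is the conversation as a whole predatory?
        full_text = " ".join(text for _, text in conversation)
        if not soft_vote([model(full_text) for model in stage1_models]):
            continue
        # Stage 2: within a predatory conversation, score each author.
        texts_by_author = defaultdict(list)
        for author, text in conversation:
            texts_by_author[author].append(text)
        for author, texts in texts_by_author.items():
            if stage2_model(" ".join(texts)) >= 0.5:
                predators.add(author)
    return predators


def f_beta(predicted: Set[str], actual: Set[str], beta: float = 0.5) -> float:
    """F-beta score; beta = 0.5 weights precision more heavily than recall."""
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

In practice the callables would wrap trained classifiers over Bag of Words features; here they are left abstract so that only the control flow and the metric are shown.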
Date of Conference: 29-30 April 2020
Date Added to IEEE Xplore: 04 June 2020