I. Introduction
Sparsely labeled classification arises in many real-world applications, such as content-based image retrieval, online web-page recommendation, object identification, and text categorization. In these settings, unlabeled instances are abundant but labeled ones are expensive to obtain. Manually labeling training data for a machine learning algorithm is tedious and time-consuming, and in some cases impractical (e.g., online web-page recommendation). Accordingly, an important challenge for large-scale text categorization is how to reduce the number of labeled documents required to build a reliable text classifier.