A Framework for Semisupervised Feature Generation and Its Applications in Biomedical Literature Mining

Abstract:

Feature representation is essential to machine learning and text mining. In this paper, we present a feature coupling generalization (FCG) framework for generating new features from unlabeled data. It selects two special types of features, i.e., example-distinguishing features (EDFs) and class-distinguishing features (CDFs), from the original feature set, and then generalizes EDFs into higher-level features based on their coupling degrees with CDFs in unlabeled data. The advantage is that EDFs with extreme sparsity in labeled data can be enriched by their co-occurrences with CDFs in unlabeled data, so that the performance of these low-frequency features can be greatly boosted and new information from unlabeled data can be incorporated. We apply this approach to three tasks in biomedical literature mining: gene named entity recognition (NER), protein-protein interaction extraction (PPIE), and text classification (TC) for gene ontology (GO) annotation. New features are generated from over 20 GB of unlabeled PubMed abstracts. The experimental results on BioCreative 2, the AIMED corpus, and the TREC 2005 Genomics Track show that 1) FCG makes good use of the sparse features ignored by supervised learning; 2) it improves the performance of supervised baselines by 7.8 percent, 5.0 percent, and 5.8 percent, respectively, in the three tasks; and 3) our methods achieve F-scores of 89.1 and 64.5 and a normalized utility of 60.1 on the three benchmark data sets.
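To make the idea of a coupling degree concrete, the following is a minimal sketch rather than the method used in the paper. The abstract does not define how coupling is measured, so the sketch assumes a simple conditional co-occurrence frequency: the fraction of unlabeled documents containing an EDF that also contain a given CDF. The function name `coupling_degrees` and the toy documents are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def coupling_degrees(
    unlabeled_docs: List[Set[str]],
    edfs: Set[str],
    cdfs: Set[str],
) -> Dict[Tuple[str, str], float]:
    """Estimate how strongly each EDF co-occurs with each CDF.

    ASSUMPTION: coupling is measured as count(EDF and CDF) / count(EDF)
    over unlabeled documents. The paper may use a different measure.
    """
    edf_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for doc in unlabeled_docs:
        doc_cdfs = doc & cdfs
        for edf in doc & edfs:
            edf_counts[edf] += 1
            for cdf in doc_cdfs:
                pair_counts[(edf, cdf)] += 1
    # Each (EDF, CDF) pair becomes one real-valued, higher-level feature.
    return {
        (edf, cdf): pair_counts[(edf, cdf)] / edf_counts[edf]
        for edf in edf_counts
        for cdf in cdfs
    }


# Toy unlabeled corpus: each document is a set of the features it contains.
docs = [
    {"BRCA1", "gene"},
    {"BRCA1", "gene", "expression"},
    {"BRCA1", "patient"},
    {"aspirin", "patient"},
]
degrees = coupling_degrees(docs, edfs={"BRCA1", "aspirin"}, cdfs={"gene", "patient"})
for pair in sorted(degrees):
    print(pair, round(degrees[pair], 3))
```

In this toy setting a rare token such as "BRCA1", which may appear only once in labeled data, receives a dense description through its co-occurrence with the indicator "gene" in unlabeled text.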

Published in: IEEE/ACM Transactions on Computational Biology and Bioinformatics (Volume: 8, Issue: 2, March-April 2011)

Page(s): 294 - 307

Date of Publication: 30 September 2010

PubMed ID: 20876938

DOI: 10.1109/TCBB.2010.99

Contents

1 Introduction

With the exponential growth of biomedical literature, such as that indexed in MEDLINE, developing automatic text mining tools has become essential for people to find information more accurately and efficiently. Biomedical text mining (BioTM) [12] has become an active area in data mining. Its fundamental tasks, such as named entity recognition (NER), protein-protein interaction extraction (PPIE), and text classification (TC), have attracted a lot of research interest in various domains including bioinformatics, natural language processing (NLP), and machine learning (ML). Although these tasks focus on extracting information of different formats, e.g., entities, relations, or documents, classical methods usually treat them as the classification of text snippets, and the methodologies have a lot in common. In traditional methods, each example is represented by a feature vector in which each element is generated by a Boolean function indicating whether a word, n-gram, or lexical pattern appears in the current example, and these features are then integrated in a supervised learning framework.
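The Boolean feature representation described above can be sketched as follows. This is an illustrative example only: the helper names `extract_features` and `to_boolean_vector`, the whitespace tokenization, and the small feature index are assumptions made for the sketch, not details taken from the paper.

```python
from typing import List, Set


def extract_features(tokens: List[str], max_n: int = 2) -> Set[str]:
    """Collect every word n-gram (n = 1..max_n) that occurs in the snippet."""
    found: Set[str] = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            found.add(" ".join(tokens[i:i + n]))
    return found


def to_boolean_vector(tokens: List[str], feature_index: List[str]) -> List[int]:
    """Return 1 where the indexed feature appears in the snippet, else 0."""
    present = extract_features(tokens)
    return [1 if feature in present else 0 for feature in feature_index]


# Hypothetical feature index; a real system would build it from training data.
feature_index = ["interacts", "interacts with", "binds", "protein"]
snippet = "BRCA1 interacts with BARD1".split()
vector = to_boolean_vector(snippet, feature_index)
print(vector)
```

Vectors of this kind would then be passed to a supervised classifier. Most entries are zero for any given snippet, which is the sparsity the abstract refers to.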

Cites in Papers - IEEE (5)

Han Hu, Yonggang Wen, Tat-Seng Chua, Xuelong Li, "Toward Scalable Systems for Big Data Analytics: A Technology Tutorial", IEEE Access, vol.2, pp.652-687, 2014.

Dinesh Kumar A., Shahul Hammed, Hanah Ayisha V Hyder Ali, Vigneshwar Manokar, "Detecting diseased images by segmentation and classification based on semi - supervised learning", 2012 12th International Conference on Hybrid Intelligent Systems (HIS), pp.549-554, 2012.

Yanhua Wang, Zhihao Yang, Hongfei Lin, Yanpeng Li, "A syntactic rule-based method for automatic pathway information extraction from biomedical literature", 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops, pp.626-633, 2012.

Jian Wang, Qian Xu, Hongfei Lin, Zhihao Yang, Yanpeng Li, "Combining labeled and unlabeled data for biomedical event extraction", 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops, pp.594-601, 2012.

Hong Huang, Hailiang Feng, "Gene Classification Using Parameter-Free Semi-Supervised Manifold Learning", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.9, no.3, pp.818-827, 2012.

Cites in Papers - Other Publishers (7)

Wenhui Hu, Xueyang Liu, Yu Huang, Yu Wang, Minghui Zhang, Hui Zhao, Algorithms and Architectures for Parallel Processing, vol.12453, pp.603, 2020.

Francesco Corea, An Introduction to Data, vol.50, pp.1, 2019.

Francesco Corea, Big Data Analytics: A Management Perspective, vol.21, pp.1, 2016.

Weigang Hou, Pengxing Guo, Lei Guo, Big Data Computing and Communications, vol.9196, pp.103, 2015.

Min Chen, Shiwen Mao, Yin Zhang, Victor C. M. Leung, "Big Data Applications", Big Data, pp.59, 2014.

Y. Mao, K. Van Auken, D. Li, C. N. Arighi, P. McQuilton, G. T. Hayman, S. Tweedie, M. L. Schaeffer, S. J. F. Laulederkind, S.-J. Wang, J. Gobeill, P. Ruch, A. T. Luu, J.-j. Kim, J.-H. Chiang, Y.-D. Chen, C.-J. Yang, H. Liu, D. Zhu, Y. Li, H. Yu, E. Emadzadeh, G. Gonzalez, J.-M. Chen, H.-J. Dai, Z. Lu, "Overview of the gene ontology task at BioCreative IV", Database, vol.2014, no.0, pp.bau086, 2014.

Y. Li, H. Yu, "A robust data-driven approach for gene ontology annotation", Database, vol.2014, no.0, pp.bau113, 2014.
