Protein-protein interaction abstract identification with contextual bag of words

Richard Tzong Han Tsai, Hsieh Chuan Hung, Hong Jie Dai, Yi Wen Lin

Research output: Contribution to journal, Conference article, peer-review


Background: This paper is concerned with the identification of biomedical abstracts related to protein-protein interactions. We propose a novel feature representation scheme, contextual-bag-of-words, to exploit protein name information.

Results: Our method outperforms well-known methods that use protein name information as additional features. We further improve performance by extracting reliable and informative instances from unlabeled and likely positive data to provide additional training data. We employ F-measure and the area under a receiver operating characteristic curve (AUC) to measure the classification and ranking abilities, respectively. Our final model achieves an F-measure of 80.34% and an AUC score of 88.06%, which are higher than those of the top-ranking system in BioCreAtIvE-II by 2.34% and 2.52%, respectively.

Conclusions: These results show the effectiveness of our contextual-bag-of-words scheme and suggest that our system could serve as an efficient preprocessing tool for modern PPI database curation.

Original language: English
Pages (from-to): 7.1-7.24
Journal: CEUR Workshop Proceedings
State: Published - 2007
Event: 2nd International Symposium on Languages in Biology and Medicine, LBM 2007 - Singapore, Singapore
Duration: 6 Dec 2007 to 7 Dec 2007

