Protein-protein interaction abstract identification with contextual bag of words

Richard Tzong Han Tsai, Hsieh Chuan Hung, Hong Jie Dai, Yi Wen Lin

Research output: Contribution to journal, Conference article, peer-reviewed


Background: This paper is concerned with the identification of biomedical abstracts related to protein-protein interactions. We propose a novel feature representation scheme, contextual-bag-of-words, to exploit protein name information. Results: Our method outperforms well-known methods that use protein name information as additional features. We further improve performance by extracting reliable and informative instances from unlabeled and likely positive data to provide additional training data. We employ F-measure and the area under a receiver operating characteristic curve (AUC) to measure the classification and ranking abilities, respectively. Our final model achieves an F-measure of 80.34% and an AUC score of 88.06%, which are higher than those of the top-ranking system in BioCreAtIvE-II by 2.34% and 2.52%, respectively. Conclusions: These results show the effectiveness of our contextual-bag-of-words scheme and suggest that our system could serve as an efficient preprocessing tool for modern PPI database curation.
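As an illustration of the two evaluation measures named in the abstract, the following minimal sketch computes an F-measure (here the balanced F1, which is an assumption about the variant used) and an AUC from a toy set of labels and classifier scores. The function names, the toy data, and the 0.5 decision threshold are all hypothetical and are not taken from the paper; the AUC is computed with the standard pairwise-ranking definition rather than any particular toolkit.

```python
def f_measure(y_true, y_pred):
    """Balanced F-measure (F1) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Area under the ROC curve: the fraction of (positive, negative) pairs
    in which the positive instance receives the higher score (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative instance")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = PPI-relevant abstract, 0 = not relevant.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

# Classification view: threshold the scores (0.5 is an arbitrary choice).
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]

print("F-measure:", f_measure(toy_labels, toy_predictions))
print("AUC:", auc(toy_labels, toy_scores))
```

The F-measure depends on the chosen threshold and so reflects classification ability, while the AUC depends only on the ordering of the scores and so reflects ranking ability, which matches the way the abstract pairs the two measures.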

Pages (from-to): 7.1-7.24
Journal: CEUR Workshop Proceedings
Publication status: Published - 2007
Event: 2nd International Symposium on Languages in Biology and Medicine, LBM 2007 - Singapore, Singapore
Duration: 6 Dec 2007 to 7 Dec 2007

