TY - JOUR
T1 - Protein-protein interaction abstract identification with contextual bag of words
AU - Tsai, Richard Tzong Han
AU - Hung, Hsieh Chuan
AU - Dai, Hong Jie
AU - Lin, Yi Wen
PY - 2007
Y1 - 2007
N2 - Background: This paper is concerned with the identification of biomedical abstracts related to protein-protein interactions. We propose a novel feature representation scheme, contextual-bag-of-words, to exploit protein name information. Results: Our method outperforms well-known methods that use protein name information as additional features. We further improve performance by extracting reliable and informative instances from unlabeled and likely positive data to provide additional training data. We employ F-measure and the area under a receiver operating characteristic curve (AUC) to measure the classification and ranking abilities, respectively. Our final model achieves an F-measure of 80.34% and an AUC score of 88.06%, which are higher than those of the top-ranking system in BioCreAtIvE-II by 2.34% and 2.52%, respectively. Conclusions: These results show the effectiveness of our contextual-bag-of-words scheme and suggest that our system could serve as an efficient preprocessing tool for modern PPI database curation.
UR - http://www.scopus.com/inward/record.url?scp=84879969057&partnerID=8YFLogxK
M3 - Conference paper
AN - SCOPUS:84879969057
SN - 1613-0073
VL - 319
SP - 7.1-7.24
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 2nd International Symposium on Languages in Biology and Medicine, LBM 2007
Y2 - 6 December 2007 through 7 December 2007
ER -