TY - JOUR
T1 - Multistage gene normalization and SVM-based ranking for protein interactor extraction in full-text articles
AU - Dai, Hong Jie
AU - Lai, Po Ting
AU - Tsai, Richard Tzong Han
PY - 2010
Y1 - 2010
N2 - The interactor normalization task (INT) is to identify genes that play the interactor role in protein-protein interactions (PPIs), to map these genes to unique IDs, and to rank them according to their normalized confidence. INT has two subtasks: gene normalization (GN) and interactor ranking. The main difficulties of INT GN are identifying genes across species and using full papers instead of abstracts. To tackle these problems, we developed a multistage GN algorithm and a ranking method, which exploit information in different parts of a paper. Our system achieved a promising AUC of 0.43471. Using the multistage GN algorithm, we have been able to improve system performance (AUC) by 1.719 percent compared to a one-stage GN algorithm. Our experimental results also show that with full text, versus abstract only, INT AUC performance was 22.6 percent higher.
KW - Data mining
KW - feature evaluation and selection
KW - mining methods and algorithms
KW - scientific databases
KW - text mining
UR - http://www.scopus.com/inward/record.url?scp=77955464970&partnerID=8YFLogxK
U2 - 10.1109/TCBB.2010.45
DO - 10.1109/TCBB.2010.45
M3 - Journal article
C2 - 20479501
AN - SCOPUS:77955464970
SN - 1545-5963
VL - 7
SP - 412
EP - 420
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 3
M1 - 5467043
ER -