TY - JOUR
T1 - Using contextual information to clarify cross-species gene normalization ambiguity
AU - Tsai, Richard Tzong Han
AU - Lai, Po Ting
N1 - Funding Information:
This research was supported in part by the National Science Council under grants NSC 97-2218-E-155-001 and NSC 96-2752-E-001-001, and by the thematic program of Academia Sinica under grant AS95ASIA02.
PY - 2010/3
Y1 - 2010/3
N2 - The goal of Gene Normalization (GN) is to identify the unique database IDs of genes and proteins mentioned in biomedical literature. A major difficulty in GN comes from the ambiguity of gene names. That is, the same gene name can refer to different database IDs depending on the species in question. In this paper, we introduce a method to exploit contextual information in an abstract, like tissue type, chromosome location, etc., to tackle this problem. Using this technique, we have been able to improve system performance (F-score) by 14.3% on the BioCreAtIvE-II GN task test set. We also examined our method on a full-text dataset with cross-species genes. The experimental results show a promising performance (AUC) of 42.94%. Our experimental results also show that with full text, versus abstract only, the system performance was 12.24% higher.
AB - The goal of Gene Normalization (GN) is to identify the unique database IDs of genes and proteins mentioned in biomedical literature. A major difficulty in GN comes from the ambiguity of gene names. That is, the same gene name can refer to different database IDs depending on the species in question. In this paper, we introduce a method to exploit contextual information in an abstract, like tissue type, chromosome location, etc., to tackle this problem. Using this technique, we have been able to improve system performance (F-score) by 14.3% on the BioCreAtIvE-II GN task test set. We also examined our method on a full-text dataset with cross-species genes. The experimental results show a promising performance (AUC) of 42.94%. Our experimental results also show that with full text, versus abstract only, the system performance was 12.24% higher.
KW - Gene normalization
KW - Natural language processing
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=77953242769&partnerID=8YFLogxK
U2 - 10.1142/S0218194010004694
DO - 10.1142/S0218194010004694
M3 - Journal article
AN - SCOPUS:77953242769
SN - 0218-1940
VL - 20
SP - 197
EP - 214
JO - International Journal of Software Engineering and Knowledge Engineering
JF - International Journal of Software Engineering and Knowledge Engineering
IS - 2
ER -