Using contextual information to clarify cross-species gene normalization ambiguity

Research output: Contribution to journal › Article › peer-review

Abstract

The goal of gene normalization (GN) is to identify the unique database IDs of genes and proteins mentioned in biomedical literature. A major difficulty in GN is the ambiguity of gene names: the same gene name can map to different database IDs depending on the species in question. In this paper, we introduce a method that exploits contextual information in an abstract, such as tissue type and chromosome location, to tackle this problem. Using this technique, we improved system performance (F-score) by 14.3% on the BioCreAtIvE-II GN task test set. We also evaluated our method on a full-text dataset with cross-species genes, where it achieved a promising performance (AUC) of 42.94%. Our experimental results also show that system performance was 12.24% higher with full text than with abstracts only.
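To make the idea of context-based disambiguation concrete, the following is a minimal illustrative sketch, not the method from the paper. It assumes each candidate database entry comes with a set of context terms (species, tissue, chromosome location) and simply counts how many of them appear in the abstract. The gene name `GENEX`, the `HYPO:` identifiers, and the names `Candidate`, `rank_candidates`, and `disambiguate` are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    """One possible database entry for an ambiguous gene mention (hypothetical schema)."""
    gene_id: str
    species: str
    context_terms: FrozenSet[str]  # lowercase terms such as tissue or chromosome location


def rank_candidates(abstract: str, candidates: List[Candidate]) -> List[Tuple[int, Candidate]]:
    """Score each candidate by how many of its context terms occur in the abstract."""
    text = abstract.lower()
    scored = []
    for cand in candidates:
        # Plain substring matching: an assumption made for brevity. It would
        # wrongly match "chromosome 1" inside "chromosome 11", so a real
        # system would need token-aware matching.
        score = sum(1 for term in cand.context_terms if term in text)
        scored.append((score, cand))
    # sorted() is stable, so candidates with equal scores keep their input order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def disambiguate(abstract: str, candidates: List[Candidate]) -> Optional[str]:
    """Return the best-supported gene ID, or None if there is no evidence or a tie."""
    ranked = rank_candidates(abstract, candidates)
    if not ranked or ranked[0][0] == 0:
        return None
    if len(ranked) > 1 and ranked[0][0] == ranked[1][0]:
        return None  # ambiguous: the context does not separate the top candidates
    return ranked[0][1].gene_id


# Hypothetical candidates for the same gene name in two species.
candidates = [
    Candidate("HYPO:HUMAN-1", "human", frozenset({"human", "chromosome 17"})),
    Candidate("HYPO:MOUSE-1", "mouse", frozenset({"mouse", "chromosome 11", "liver"})),
]
abstract = "GENEX is expressed in mouse liver and maps to chromosome 11."
best_id = disambiguate(abstract, candidates)
```

Returning `None` on a tie or on zero evidence is a deliberate choice in this sketch: it leaves the mention unresolved rather than guessing a species.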

Original language: English
Pages (from-to): 197-214
Number of pages: 18
Journal: International Journal of Software Engineering and Knowledge Engineering
Volume: 20
Issue number: 2
Publication status: Published - Mar 2010
