A supervised learning approach to biological question answering

Ryan T.K. Lin, Justin Liang Te Chiu, Hong Jie Dai, Richard Tzong Han Tsai, Min Yuh Day, Wen Lian Hsu

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

Biologists rely on keyword-based search engines to retrieve superficially relevant papers, from which they must manually filter out irrelevant information. Question answering (QA) systems can offer a more efficient and user-friendly way of retrieving such information. This paper makes two contributions. First, a factoid QA system is developed that employs a named entity recognition module to extract answer candidates and a linear model to rank them. The linear model uses various semantic features, such as named entity types and semantic roles. To tune the weights of the features used by the model, a novel supervised learning algorithm that needs only small amounts of training data is provided. Second, a QA system may assign the same score to several answers, making evaluation unfair. To solve this problem, an efficient formula for the mean average reciprocal rank (MARR) is proposed that reduces the complexity of its computation. After employing all effective semantic features, our system achieves a top-1 MARR of 74.11% and a top-5 MARR of 76.68%. Compared with the baseline system, the top-1 and top-5 MARR increase by 9.5% and 7.1%, respectively. In addition, the experimental results on the test set show that our ranking method, which achieves a top-1 MARR of 55.58% and a top-5 MARR of 66.99%, significantly surpasses traditional BM25 and simple voting, by an average of 35.23% and 36.64%, respectively.
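
The abstract does not state the MARR formula itself, so the following is only a minimal sketch of one plausible reading, not the authors' formulation. It assumes each question has exactly one correct candidate and that tied candidates are equally likely to appear in any order, so the reciprocal rank is averaged over the ranks the tie group occupies instead of enumerating every permutation. The function names and the `(scores, correct_index)` input layout are hypothetical.

```python
from fractions import Fraction


def average_reciprocal_rank(scores, correct_index, top_k):
    """Expected reciprocal rank of the correct candidate under random tie-breaking.

    Assumption (not from the paper): one correct candidate per question, and
    candidates sharing its score are ordered uniformly at random.
    """
    target = scores[correct_index]
    higher = sum(1 for s in scores if s > target)   # candidates ranked strictly above
    tied = sum(1 for s in scores if s == target)    # tie group size, including the correct one
    # The correct candidate lands on each rank in higher+1 .. higher+tied with
    # probability 1/tied; ranks beyond top_k contribute zero.
    total = sum(
        (Fraction(1, r) for r in range(higher + 1, higher + tied + 1) if r <= top_k),
        Fraction(0),
    )
    return total / tied


def mean_average_reciprocal_rank(questions, top_k):
    """Mean of the per-question values over (scores, correct_index) pairs."""
    values = [average_reciprocal_rank(s, c, top_k) for s, c in questions]
    return sum(values, Fraction(0)) / len(values)


# Hypothetical toy data: two questions, the first with a two-way tie at the top.
questions = [([0.9, 0.9, 0.5], 0), ([0.9, 0.5], 1)]
top1 = mean_average_reciprocal_rank(questions, top_k=1)
top5 = mean_average_reciprocal_rank(questions, top_k=5)
```

Under these assumptions the cost per question is linear in the number of candidates, whereas averaging over all orderings of a tie group grows factorially with the group size.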

Original language: English
Pages (from-to): 271-281
Number of pages: 11
Journal: Integrated Computer-Aided Engineering
Volume: 16
Issue number: 3
State: Published - 2009
