Cross-domain opinion word identification with query-by-committee active learning

Yi Lin Tsai, Richard Tzong Han Tsai, Chuang Hua Chueh, Sen Chia Chang

Research output: Contribution to journal, Article, peer-reviewed

9 Citations (Scopus)

Abstract

Opinion word identification (OWI) is an important task in opinion mining. OWI requires finding the exact positions of opinion word mentions, and supervised learning approaches can locate such mentions with high accuracy. Constructing an OWI system for a new domain, however, requires annotating enough data to represent that domain’s characteristics. Because extensively annotating every new domain is costly, making the best use of existing annotated data is an important challenge for mention-based OWI systems. In this work, we propose a cross-domain OWI system. A query-by-committee (QBC) active learning scheme selects a controlled amount of new-domain data for manual annotation, and the newly annotated data complement the existing annotated data from the original domain. We compile three annotated datasets, one for each of three different domains, and conduct domain adaptation experiments on all six domain pairs. Our experiments show that adding only 1,000 newly annotated sentences from the new domain to the existing annotated data lets our system achieve nearly the same level of accuracy as a system trained on 10,000 annotated new-domain sentences. Our system with the QBC active learning scheme also outperforms the same system with a random selection scheme.
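
The abstract does not say how committee disagreement is measured, so the following is only a minimal sketch of one common QBC variant, assuming token-level vote entropy averaged over each sentence. The names `vote_entropy`, `sentence_disagreement`, `select_for_annotation`, and `lexicon_tagger`, the `OP`/`O` label set, and the toy lexicons are all hypothetical and are not taken from the paper; a real committee would consist of trained taggers rather than lexicon lookups.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

# A committee member maps a tokenized sentence to one label per token.
Tagger = Callable[[Sequence[str]], List[str]]


def vote_entropy(votes: Sequence[str]) -> float:
    """Entropy of the committee's label votes for a single token."""
    total = len(votes)
    counts = Counter(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def sentence_disagreement(sentence: Sequence[str], committee: Sequence[Tagger]) -> float:
    """Mean token-level vote entropy; 0 means every member agrees everywhere."""
    if not sentence:
        return 0.0
    predictions = [tagger(sentence) for tagger in committee]
    per_token = [
        vote_entropy([pred[i] for pred in predictions])
        for i in range(len(sentence))
    ]
    return sum(per_token) / len(per_token)


def select_for_annotation(pool, committee, budget):
    """Return the `budget` unlabeled sentences the committee disagrees on most."""
    ranked = sorted(
        pool,
        key=lambda s: sentence_disagreement(s, committee),
        reverse=True,
    )
    return ranked[:budget]


def lexicon_tagger(lexicon):
    """Toy stand-in for a trained model (assumption): tags lexicon words as OP."""
    def tag(sentence):
        return ["OP" if tok.lower() in lexicon else "O" for tok in sentence]
    return tag


# Hypothetical committee and unlabeled new-domain pool, for illustration only.
committee = [
    lexicon_tagger({"great", "poor"}),
    lexicon_tagger({"great", "poor", "sharp"}),
    lexicon_tagger({"great"}),
]
pool = [
    ["the", "screen", "is", "great"],
    ["battery", "life", "is", "poor"],
    ["a", "sharp", "lens"],
]
to_annotate = select_for_annotation(pool, committee, budget=1)
```

In a full active learning loop, the selected sentences would be annotated manually, added to the original-domain training data, and the committee retrained before the next round. The random selection baseline mentioned in the abstract corresponds to replacing the ranking with a random sample of the same size.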
