TY - JOUR
T1 - Surveying biomedical relation extraction
T2 - a critical examination of current datasets and the proposal of a new resource
AU - Huang, Ming Siang
AU - Han, Jen Chieh
AU - Lin, Pei Yen
AU - You, Yu Ting
AU - Tsai, Richard Tzong Han
AU - Hsu, Wen Lian
N1 - Publisher Copyright: © The Author(s) 2024. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
PY - 2024/5/1
Y1 - 2024/5/1
N2 - Natural language processing (NLP) has become an essential technique in various fields, offering a wide range of possibilities for analyzing data and developing diverse NLP tasks. In the biomedical domain, understanding the complex relationships between compounds and proteins is critical, especially in the context of signal transduction and biochemical pathways. Among these relationships, protein–protein interactions (PPIs) are of particular interest, given their potential to trigger a variety of biological reactions. To improve the ability to predict PPI events, we propose the protein event detection dataset (PEDD), which comprises 6823 abstracts, 39 488 sentences and 182 937 gene pairs. Our PEDD dataset has been utilized in the AI CUP Biomedical Paper Analysis competition, where systems are challenged to predict 12 different relation types. In this paper, we review the state-of-the-art relation extraction research and provide an overview of the PEDD’s compilation process. Furthermore, we present the results of the PPI extraction competition and evaluate several language models’ performances on the PEDD. This paper’s outcomes will provide a valuable roadmap for future studies on protein event detection in NLP. By addressing this critical challenge, we hope to enable breakthroughs in drug discovery and enhance our understanding of the molecular mechanisms underlying various diseases.
KW - natural language processing
KW - protein–protein interaction
KW - relation extraction
UR - http://www.scopus.com/inward/record.url?scp=85190491167&partnerID=8YFLogxK
U2 - 10.1093/bib/bbae132
DO - 10.1093/bib/bbae132
M3 - Review article
C2 - 38609331
AN - SCOPUS:85190491167
SN - 1467-5463
VL - 25
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
IS - 3
M1 - bbae132
ER -