TY - JOUR
T1 - Automatic Extraction of Medication Mentions from Tweets - Overview of the BioCreative VII Shared Task 3 Competition
AU - Weissenbacher, Davy
AU - O'Connor, Karen
AU - Rawal, Siddharth
AU - Zhang, Yu
AU - Tsai, Richard Tzong Han
AU - Miller, Timothy
AU - Xu, Dongfang
AU - Anderson, Carol
AU - Liu, Bo
AU - Han, Qing
AU - Zhang, Jinfeng
AU - Kulev, Igor
AU - Köprü, Berkay
AU - Rodriguez-Esteban, Raul
AU - Ozkirimli, Elif
AU - Ayach, Ammer
AU - Roller, Roland
AU - Piccolo, Stephen
AU - Han, Peijin
AU - Vydiswaran, V. G. Vinod
AU - Tekumalla, Ramya
AU - Banda, Juan M.
AU - Bagherzadeh, Parsa
AU - Bergler, Sabine
AU - Silva, João F.
AU - Almeida, Tiago
AU - Martinez, Paloma
AU - Rivera-Zavala, Renzo
AU - Wang, Chen Kai
AU - Dai, Hong Jie
AU - Robles Hernandez, Luis Alberto
AU - Gonzalez-Hernandez, Graciela
N1 - Publisher Copyright: © 2023 The Author(s). Published by Oxford University Press.
PY - 2023
Y1 - 2023
AB - This study presents the outcomes of the shared task competition BioCreative VII (Task 3) focusing on the extraction of medication names from a Twitter user's publicly available tweets (the user's 'timeline'). In general, detecting health-related tweets is notoriously challenging for natural language processing tools. The main challenge, aside from the informality of the language used, is that people tweet about any and all topics, and most of their tweets are not related to health. Thus, finding those tweets in a user's timeline that mention specific health-related concepts such as medications requires addressing extreme class imbalance. Task 3 called for detecting tweets in a user's timeline that mention a medication name and, for each detected mention, extracting its span. The organizers made available a corpus consisting of 182 049 tweets publicly posted by 212 Twitter users with all medication mentions manually annotated. The corpus exhibits the natural distribution of positive tweets, with only 442 tweets (0.2%) mentioning a medication. This task was an opportunity for participants to evaluate methods that are robust to class imbalance beyond simple lexical matching. A total of 65 teams registered, and 16 teams submitted a system run. This study summarizes the corpus created by the organizers and the approaches taken by the participating teams for this challenge. The corpus is freely available at https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-3/. The methods and the results of the competing systems are analyzed with a focus on the approaches taken for learning from class-imbalanced data.
UR - http://www.scopus.com/inward/record.url?scp=85153271418&partnerID=8YFLogxK
U2 - 10.1093/database/baac108
DO - 10.1093/database/baac108
M3 - Journal article
AN - SCOPUS:85153271418
SN - 1758-0463
VL - 2023
JO - Database : the journal of biological databases and curation
JF - Database : the journal of biological databases and curation
M1 - baac108
ER -