TY - GEN
T1 - MingOfficial
T2 - 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023
AU - Chen, You Jun
AU - Hsieh, Hsin Yi
AU - Lin, Yu Tung
AU - Tian, Yingtao
AU - Chan, Bert Wang Chak
AU - Liu, Yu Sin
AU - Lin, Yi Hsuan
AU - Tsai, Richard Tzong Han
N1 - Publisher Copyright: © 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
N2 - In Chinese studies, understanding the nuanced traits of historical figures, often not explicitly evident in biographical data, has been a key interest. However, identifying these traits can be challenging due to the need for domain expertise, specialist knowledge, and context-specific insights, making the process time-consuming and difficult to scale. Our focus on studying officials from China's Ming Dynasty is no exception. To tackle this challenge, we propose MingOfficial, a large-scale multi-modal dataset consisting of both structured (career records, annotated personnel types) and text (historical texts) data for 13,031 officials. We further couple the dataset with a graph neural network (GNN) to combine both modalities in order to allow investigation of social structures and provide features to boost downstream tasks. Experiments show that our proposed MingOfficial could enable exploratory analysis of official identities, and also significantly boost performance in tasks such as identifying nuanced identities (e.g. civil officials holding military power) from 24.6% to 98.2% F1 score on a holdout test set. By making MingOfficial publicly available at https://data.depositar.io/en/dataset/ming_official as both a dataset and an interactive tool, we aim to stimulate further research into the role of social context and representation learning in identifying individual characteristics, and hope to provide inspiration for computational approaches in other fields beyond Chinese studies.
UR - http://www.scopus.com/inward/record.url?scp=85184811593&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85184811593
T3 - EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings
SP - 4380
EP - 4401
BT - EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings
A2 - Bouamor, Houda
A2 - Pino, Juan
A2 - Bali, Kalika
PB - Association for Computational Linguistics (ACL)
Y2 - 6 December 2023 through 10 December 2023
ER -