Abstract
This paper describes the NYCU-NLP system design for the multi-author writing style analysis tasks of the PAN Lab at CLEF 2024. We propose a unified architecture that integrates transformer-based models with similarity adjustments to identify author switches within a multi-author document. We first fine-tune the RoBERTa, DeBERTa and ERNIE transformers to detect differences in writing style between two given paragraphs. The output prediction is then determined by an ensemble mechanism, and similarity adjustments are applied to further improve multi-author analysis performance. The experimental data comprise three difficulty levels that reflect simultaneous changes of authorship and topic. Our submission achieved macro F1-scores of 0.964, 0.857 and 0.863 for the easy, medium and hard levels, respectively, ranking first on the hard level and second on the medium level out of 16 and 17 participating teams, respectively.
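The abstract does not say how the ensemble combines the three models or how the similarity adjustment is applied. The following is therefore only a minimal sketch under stated assumptions: soft voting (averaging the per-model probabilities of an author change) and a cosine-similarity override on paragraph embeddings. The function names, the thresholds `sim_high` and `sim_low`, and the decision rule are all hypothetical illustrations, not the authors' method.

```python
from math import sqrt
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def ensemble_probability(model_probs: Sequence[float]) -> float:
    """Assumed ensemble rule: average the per-model probabilities of an author change."""
    return sum(model_probs) / len(model_probs)


def predict_change(
    model_probs: Sequence[float],
    emb_a: Sequence[float],
    emb_b: Sequence[float],
    threshold: float = 0.5,   # hypothetical decision threshold
    sim_high: float = 0.95,   # hypothetical: near-identical style, force "no change"
    sim_low: float = 0.05,    # hypothetical: very dissimilar style, force "change"
) -> int:
    """Return 1 if an author switch is predicted between two paragraphs, else 0."""
    sim = cosine_similarity(emb_a, emb_b)
    if sim >= sim_high:
        return 0  # similarity adjustment overrides the ensemble
    if sim <= sim_low:
        return 1  # similarity adjustment overrides the ensemble
    return int(ensemble_probability(model_probs) >= threshold)
```

In this sketch, `model_probs` would hold one probability each from the fine-tuned RoBERTa, DeBERTa and ERNIE classifiers for a paragraph pair, and `emb_a` and `emb_b` would be embeddings of the two paragraphs. The override only fires at extreme similarity values, so the ensemble decides most pairs.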
Original language | English |
---|---|
Pages (from-to) | 2716-2721 |
Number of pages | 6 |
Journal | CEUR Workshop Proceedings |
Volume | 3740 |
State | Published - 2024 |
Event | 25th Working Notes of the Conference and Labs of the Evaluation Forum, CLEF 2024 - Grenoble, France; Duration: 9 Sep 2024 → 12 Sep 2024 |
Keywords
- Authorship Analysis
- Embedding Similarity
- Plagiarism Detection
- Pre-trained Language Models