CoNLL#: Fine-grained Error Analysis and a Corrected Test Set for CoNLL-03 English

Andrew Rueda; Elena Alvarez Mellado; Constantine Lignos

Conference proceeding

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp. 3718-3728

International Conference on Computational Linguistics, Language Resources and Evaluation

01/01/2024

Handle: https://hdl.handle.net/10192/79015

Subjects: Computer Science; Computer Science, Artificial Intelligence; Computer Science, Interdisciplinary Applications; Language & Linguistics; Linguistics; Science & Technology; Social Sciences; Technology

Abstract

Modern named entity recognition (NER) systems have steadily improved in performance in the age of larger and more powerful neural models. However, over the past several years, the state of the art has seemingly hit another plateau on the CoNLL-03 English benchmark dataset. In this paper, we perform a deep dive into the test outputs of the highest-performing NER models, conducting a fine-grained evaluation of their performance by introducing new document-level annotations on the test set. We go beyond F1 scores by categorizing errors in order to interpret the true state of the art for NER and guide future work. We review previous attempts at correcting the various flaws of the test set and introduce CoNLL#, a new corrected version of the test set that addresses its systematic and most prevalent errors, allowing for low-noise, interpretable error analysis.

Details

Title: CoNLL#: Fine-grained Error Analysis and a Corrected Test Set for CoNLL-03 English
Creators: Andrew Rueda - Brandeis University
Elena Alvarez Mellado - UNED, School of Computer Science, NLP & IR Group, Madrid, Spain
Constantine Lignos - Brandeis University, Michtom School of Computer Science, Waltham, MA 02454, USA
Contributors: N Calzolari (Editor)
M Y Kan (Editor)
Hoste (Editor)
A Lenci (Editor)
S Sakti (Editor)
N Xue (Editor)
Publication Details: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp. 3718-3728
Series: International Conference on Computational Linguistics, Language Resources and Evaluation
Publisher: Association for Computational Linguistics (ACL)
Number of pages: 11
Identifiers: 9924588646001921
Academic Unit: Michtom School of Computer Science; Benjamin and Mae Volen National Center for Complex Systems; Interdepartmental Program in Linguistics and Computational Linguistics
Language: English
Resource Type: Conference proceeding
