MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition

David Ifeoluwa Adelani; Graham Neubig; Sebastian Ruder; Shruti Rijhwani; Michael Beukman; Chester Palen-Michel; Constantine Lignos; Jesujoba O Alabi; Shamsuddeen H Muhammad; Peter Nabende; Cheikh M. Bamba Dione; Andiswa Bukula; Rooweither Mabuya; Bonaventure F. P Dossou; Blessing Sibanda; Happy Buzaaba; Jonathan Mukiibi; Godson Kalipe; Derguene Mbaye; Amelia Taylor; Fatoumata Kabore; Chris Chinenye Emezue; Anuoluwapo Aremu; Perez Ogayo; Catherine Gitau; Edwin Munkoh-Buabeng; Victoire M Koagne; Allahsera Auguste Tapo; Tebogo Macucwa; Vukosi Marivate; Elvis Mboning; Tajuddeen Gwadabe; Tosin Adewumi; Orevaoghene Ahia; Joyce Nakatumba-Nabende; Neo L Mokono; Ignatius Ezeani; Chiamaka Chukwuneke; Mofetoluwa Adeyemi; Gilles Q Hacheme; Idris Abdulmumin; Odunayo Ogundepo; Oreen Yousuf; Tatiana Moteu Ngoli; Dietrich Klakow

Preprint

arXiv

10/22/2022

Abstract

Computer Science - Computation and Language

African languages are spoken by over a billion people, but are underrepresented in NLP research and development. The challenges impeding progress include the limited availability of annotated datasets, as well as a lack of understanding of the settings where current methods are effective. In this paper, we make progress towards solutions for these challenges, focusing on the task of named entity recognition (NER). We create the largest human-annotated NER dataset for 20 African languages, and we study the behavior of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, demonstrating that the choice of source language significantly affects performance. We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points across 20 languages compared to using English. Our results highlight the need for benchmark datasets and models that cover typologically diverse African languages.

Details

Publisher: arXiv
Identifiers: 9924170086101921
Academic Unit: Michtom School of Computer Science; Benjamin and Mae Volen National Center for Complex Systems
Language: English
Resource Type: Preprint
