Author: "Mohamed, Shafie Abdi" / Language: undetermined - Searchworks@Jio Institute Digital Library Search Results

1. MasakhaNEWS: News Topic Classification for African languages

Author: Adelani, David Ifeoluwa; Masiak, Marek; Azime, Israel Abebe; Alabi, Jesujoba Oluwadara; Tonja, Atnafu Lambebo; Mwase, Christine; Ogundepo, Odunayo; Dossou, Bonaventure F. P.; Oladipo, Akintunde; Nixdorf, Doreen; Emezue, Chris Chinenye; al-azzawi, Sana Sabah; Sibanda, Blessing K.; David, Davis; Ndolela, Lolwethu; Mukiibi, Jonathan; Ajayi, Tunde Oluwaseyi; Ngoli, Tatiana Moteu; Odhiambo, Brian; Owodunni, Abraham Toluwase; Obiefuna, Nnaemeka C.; Muhammad, Shamsuddeen Hassan; Abdullahi, Saheed Salahudeen; Yigezu, Mesay Gemeda; Gwadabe, Tajuddeen; Abdulmumin, Idris; Bame, Mahlet Taye; Awoyomi, Oluwabusayo Olufunke; Shode, Iyanuoluwa; Adelani, Tolulope Anu; Kailani, Habiba Abdulganiy; Omotayo, Abdul-Hakeem; Adeeko, Adetola; Abeeb, Afolabi; Aremu, Anuoluwapo; Samuel, Olanrewaju; Siro, Clemencia; Kimotho, Wangari; Ogbu, Onyekachi Raphael; Mbonu, Chinedu E.; Chukwuneke, Chiamaka I.; Fanijo, Samuel; Ojo, Jessica; Awosan, Oyinkansola F.; Guge, Tadesse Kebede; Sari, Sakayo Toadoum; Nyatsine, Pamela; Sidume, Freedmore; Yousuf, Oreen; Oduwole, Mardiyyah; Kimanuka, Ussen; Tshinu, Kanda Patrick; Diko, Thina; Nxakama, Siyanda; Johar, Abdulmejid Tuni; Gebre, Sinodos; Mohamed, Muhidin; Mohamed, Shafie Abdi; Hassan, Fuad Mire; Mehamed, Moges Ahmed; Ngabire, Evrard; and Stenetorp, Pontus
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: African languages are severely under-represented in NLP research due to a lack of datasets covering several NLP tasks. While there are individual language-specific datasets that are being expanded to different tasks, only a handful of NLP tasks (e.g. named entity recognition and machine translation) have standardized benchmark datasets covering several geographically and typologically diverse African languages. In this paper, we develop MasakhaNEWS, a new benchmark dataset for news topic classification covering 16 languages widely spoken in Africa. We provide an evaluation of baseline models by training classical machine learning models and fine-tuning several language models. Furthermore, we explore several alternatives to full fine-tuning of language models that are better suited for zero-shot and few-shot learning, such as cross-lingual parameter-efficient fine-tuning (like MAD-X), pattern exploiting training (PET), prompting language models (like ChatGPT), and prompt-free sentence transformer fine-tuning (SetFit and Cohere Embedding API). Our evaluation in the zero-shot setting shows the potential of prompting ChatGPT for news topic classification in low-resource African languages, achieving an average performance of 70 F1 points without leveraging additional supervision like MAD-X. In the few-shot setting, we show that with as few as 10 examples per label, we achieved more than 90% (i.e. 86.0 F1 points) of the performance of full supervised training (92.6 F1 points) leveraging the PET approach.
Comment: Accepted to AfricaNLP Workshop @ICLR 2023 (non-archival)
Published: 2023
