
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models

Authors:
Adelani, David Ifeoluwa
Ojo, Jessica
Azime, Israel Abebe
Zhuang, Jian Yun
Alabi, Jesujoba O.
He, Xuanli
Ochieng, Millicent
Hooker, Sara
Bukula, Andiswa
Lee, En-Shiun Annie
Chukwuneke, Chiamaka
Buzaaba, Happy
Sibanda, Blessing
Kalipe, Godson
Mukiibi, Jonathan
Kabongo, Salomon
Yuehgoh, Foutse
Setaka, Mmasibidi
Ndolela, Lolwethu
Odu, Nkiruka
Mabuya, Rooweither
Muhammad, Shamsuddeen Hassan
Osei, Salomey
Samb, Sokhar
Guge, Tadesse Kebede
Stenetorp, Pontus
Publication Year:
2024

Abstract

Despite the widespread adoption of large language models (LLMs), their remarkable capabilities remain limited to a few high-resource languages. Additionally, many low-resource languages (e.g., African languages) are often evaluated only on basic text classification tasks due to the lack of appropriate or comprehensive benchmarks outside of high-resource languages. In this paper, we introduce IrokoBench -- a human-translated benchmark dataset for 16 typologically-diverse low-resource African languages covering three tasks: natural language inference (AfriXNLI), mathematical reasoning (AfriMGSM), and multiple-choice knowledge-based QA (AfriMMLU). We use IrokoBench to evaluate zero-shot, few-shot, and translate-test settings (where test sets are translated into English) across ten open and four proprietary LLMs. Our evaluation reveals a significant performance gap between high-resource languages (such as English and French) and low-resource African languages. We also observe a significant performance gap between open and proprietary models, with the best-performing open model, Aya-101, reaching only 58% of the performance of the best-performing proprietary model, GPT-4o. Machine-translating the test sets into English before evaluation helped to close the gap for larger, English-centric models such as LLaMa 3 70B. These findings suggest that more effort is needed to develop and adapt LLMs for African languages.

Comment: Under review
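As a rough illustration of the three evaluation settings named in the abstract (zero-shot, few-shot, and translate-test), the sketch below shows how such an evaluation loop over multiple-choice items like those in AfriMMLU might be organised. All names here (MCQItem, query_llm, translate_to_english) are hypothetical placeholders, not the authors' released evaluation harness.

```python
# Minimal sketch of zero-shot / few-shot / translate-test evaluation on
# multiple-choice items. Model and translation calls are stubbed out.
from dataclasses import dataclass


@dataclass
class MCQItem:
    question: str              # question text in the African language
    choices: list[str]         # answer options, shown as A, B, C, ...
    answer: str                # gold label, e.g. "B"


def query_llm(prompt: str) -> str:
    """Placeholder for a call to an open or proprietary LLM."""
    raise NotImplementedError


def translate_to_english(text: str) -> str:
    """Placeholder for the machine-translation step in the translate-test setting."""
    raise NotImplementedError


def build_prompt(item: MCQItem, fewshot: tuple[MCQItem, ...] = ()) -> str:
    """Format a multiple-choice prompt, optionally prepending few-shot exemplars."""
    blocks = []
    for ex in (*fewshot, item):
        opts = "\n".join(f"{chr(65 + i)}. {c}" for i, c in enumerate(ex.choices))
        label = ex.answer if ex is not item else ""
        blocks.append(f"Question: {ex.question}\n{opts}\nAnswer: {label}".rstrip() +
                      ("" if ex is not item else "\nAnswer:"))
    return "\n\n".join(blocks)


def evaluate(items: list[MCQItem], setting: str,
             fewshot: tuple[MCQItem, ...] = ()) -> float:
    """Return accuracy under 'zero-shot', 'few-shot', or 'translate-test'."""
    correct = 0
    for item in items:
        if setting == "translate-test":
            # Translate the test item into English before prompting the model.
            item = MCQItem(
                question=translate_to_english(item.question),
                choices=[translate_to_english(c) for c in item.choices],
                answer=item.answer,
            )
        prompt = build_prompt(item, fewshot if setting == "few-shot" else ())
        prediction = query_llm(prompt).strip()[:1].upper()
        correct += prediction == item.answer
    return correct / len(items)
```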

Details

Database:
arXiv
Publication Type:
Report
Accession number:
edsarx.2406.03368
Document Type:
Working Paper