
OLMoE: Open Mixture-of-Experts Language Models

Authors :
Muennighoff, Niklas
Soldaini, Luca
Groeneveld, Dirk
Lo, Kyle
Morrison, Jacob
Min, Sewon
Shi, Weijia
Walsh, Pete
Tafjord, Oyvind
Lambert, Nathan
Gu, Yuling
Arora, Shane
Bhagia, Akshita
Schwenk, Dustin
Wadden, David
Wettig, Alexander
Hui, Binyuan
Dettmers, Tim
Kiela, Douwe
Farhadi, Ali
Smith, Noah A.
Koh, Pang Wei
Singh, Amanpreet
Hajishirzi, Hannaneh
Publication Year :
2024

Abstract

We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses only 1B per input token. We pretrain it on 5 trillion tokens and further adapt it to create OLMoE-1B-7B-Instruct. Our models outperform all available models with similar active parameters, even surpassing larger ones like Llama2-13B-Chat and DeepSeekMoE-16B. We present various experiments on MoE training, analyze routing in our model showing high specialization, and open-source all aspects of our work: model weights, training data, code, and logs.

Comment: 61 pages (24 main), 36 figures, 14 tables
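The abstract's distinction between total parameters (7B) and active parameters per token (1B) comes from sparse expert routing: a learned router sends each token to only a few expert feed-forward networks, so most weights sit idle for any given token. The sketch below is a generic top-k MoE layer in PyTorch intended only to illustrate this idea; it is not OLMoE's actual implementation (which is open-sourced by the authors), and the dimensions and the 64-expert / top-8 configuration are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Illustrative sparse Mixture-of-Experts feed-forward layer.

    Hyperparameters (64 experts, top-8 routing, hidden sizes) are assumed
    for the sketch and are not taken from the OLMoE paper.
    """

    def __init__(self, d_model=1024, d_hidden=2048, n_experts=64, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x):
        # x: (n_tokens, d_model)
        logits = self.router(x)                                # (n_tokens, n_experts)
        probs = F.softmax(logits, dim=-1)
        topk_probs, topk_idx = probs.topk(self.top_k, dim=-1)  # pick k experts per token
        topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # Find which tokens routed to expert e, and at which of their k slots.
            token_rows, slots = (topk_idx == e).nonzero(as_tuple=True)
            if token_rows.numel() == 0:
                continue  # expert receives no tokens this step
            gate = topk_probs[token_rows, slots].unsqueeze(-1)
            out[token_rows] += gate * expert(x[token_rows])
        return out
```

In such a layer, only the k selected experts run per token, so the compute (active parameters) scales with k rather than with the total expert count; the routing analysis mentioned in the abstract studies which experts the router selects and how specialized they become.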

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2409.02060
Document Type :
Working Paper