1. jina-embeddings-v3: Multilingual Embeddings With Task LoRA
- Authors
Saba Sturua, Isabelle Mohr, Mohammad Kalim Akram, Michael Günther, Bo Wang, Markus Krimmel, Feng Wang, Georgios Mastrapas, Andreas Koukounas, Nan Wang, and Han Xiao
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, 68T50, I.2.7
- Abstract
We introduce jina-embeddings-v3, a novel text embedding model with 570 million parameters that achieves state-of-the-art performance on multilingual data and long-context retrieval tasks, supporting context lengths of up to 8192 tokens. The model includes a set of task-specific Low-Rank Adaptation (LoRA) adapters to generate high-quality embeddings for query-document retrieval, clustering, classification, and text matching. Evaluation on the MTEB benchmark shows that jina-embeddings-v3 outperforms the latest proprietary embeddings from OpenAI and Cohere on English tasks, and outperforms multilingual-e5-large-instruct across all multilingual tasks. With a default output dimension of 1024, users can flexibly reduce the embedding dimensions to as low as 32 without compromising performance, enabled by Matryoshka Representation Learning.
- Comment
20 pages, pp. 11-13 references, pp. 14-20 appendix and experiment tables
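The dimension reduction mentioned above can be illustrated with a minimal sketch. This is a hypothetical example, not the official jina-embeddings-v3 API: with Matryoshka Representation Learning, the leading components of an embedding carry the most information, so a 1024-dimensional vector can simply be truncated to a smaller prefix and re-normalized for cosine-similarity use.

```python
import numpy as np

def truncate_embedding(vec, dim=32):
    """Hypothetical helper: keep the first `dim` components of an
    MRL-trained embedding and L2-normalize the result so cosine
    similarity still works on the shortened vector."""
    v = np.asarray(vec, dtype=np.float32)[:dim]
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# Stand-in for a model output (the real model emits 1024-d vectors).
full = np.random.randn(1024)
small = truncate_embedding(full, dim=32)  # 32-d, unit length
```

The re-normalization step matters: after dropping components, the vector's length shrinks, so it must be rescaled before comparing embeddings with dot products.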
- Published
2024