Open Sentence Embeddings for Portuguese with the Serafim PT* encoders family

Authors :
Gomes, Luís
Branco, António
Silva, João
Rodrigues, João
Santos, Rodrigo
Publication Year :
2024

Abstract

Sentence encoders encode the semantics of their input, enabling key downstream applications such as classification, clustering, or retrieval. In this paper, we present Serafim PT*, a family of open-source sentence encoders for Portuguese of various sizes, suited to different hardware/compute budgets. Each model exhibits state-of-the-art performance and is made openly available under a permissive license, allowing its use for both commercial and research purposes. Besides the sentence encoders, this paper contributes a systematic study and lessons learned concerning the selection criteria of learning objectives and parameters that support top-performing encoders.

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2407.19527
Document Type :
Working Paper