Automated categorization of pre-trained models for software engineering: A case study with a Hugging Face dataset

Authors :: Di Sipio, Claudio; Rubei, Riccardo; Di Rocco, Juri; Di Ruscio, Davide; Nguyen, Phuong T.
Publication Year :: 2024
Abstract: Software engineering (SE) activities have been revolutionized by the advent of pre-trained models (PTMs), defined as large machine learning (ML) models that can be fine-tuned to perform specific SE tasks. However, users with limited expertise may need help to select the appropriate model for their current task. To tackle the issue, the Hugging Face (HF) platform simplifies the use of PTMs by collecting, storing, and curating several models. Nevertheless, the platform currently lacks a comprehensive categorization of PTMs designed specifically for SE, i.e., the existing tags are more suited to generic ML categories. This paper introduces an approach to address this gap by enabling the automatic classification of PTMs for SE tasks. First, we utilize a public dump of HF to extract PTMs information, including model documentation and associated tags. Then, we employ a semi-automated method to identify SE tasks and their corresponding PTMs from existing literature. The approach involves creating an initial mapping between HF tags and specific SE tasks, using a similarity-based strategy to identify PTMs with relevant tags. The evaluation shows that model cards are informative enough to classify PTMs considering the pipeline tag. Moreover, we provide a mapping between SE tasks and stored PTMs by relying on model names.
Comment: Accepted at The International Conference on Evaluation and Assessment in Software Engineering (EASE), 2024 edition

Subjects :: Computer Science - Software Engineering

Details

Database :: arXiv
Publication Type :: Report
Accession number :: edsarx.2405.13185
Document Type :: Working Paper
