Corpus CEFALA-1: Base de dados audiovisual de locutores para estudos de biometria, fonética e fonologia / Corpus CEFALA-1: Audiovisual Database of Speakers for Biometric, Phonetic and Phonology Studies
- Source :
- Revista de Estudos da Linguagem, Vol 27, Iss 1, Pp 191-212 (2019)
- Publication Year :
- 2019
- Publisher :
- Faculdade de Letras da UFMG, 2019.
Abstract
- Abstract : Human speech has been studied in several fields, ranging from biometrics to phonetics and phonology. Research in these fields relies on speech samples to obtain results and validate hypotheses. To that end, samples from different speakers and with different content are stored in audio files and organized into databases. Such databases support the continuity, practicality and reliability of research by eliminating the difficult and time-consuming step of data collection, and they allow consistent comparisons between different studies. However, freely accessible databases in Portuguese, or databases recorded in a controlled environment, are rarely found. The objective of this work was therefore to build a free, public database of Brazilian Portuguese, named Corpus CEFALA-1. The database comprises 104 speakers, who followed a specific protocol for collecting audiovisual speech samples recorded in a studio. The paper presents the methods used to process, segment and organize the speech samples, along with statistical analyses, an application to biometric verification and preliminary phonetic-phonological analyses of the corpus. Keywords : corpus of speakers; biometry; phonetics and phonology; audiovisual database.
- Subjects :
- lcsh:Language and Literature
0209 industrial biotechnology
Linguistics and Language
Biometrics
corpus of speakers
biometry
02 engineering and technology
Lingua franca
Language and Linguistics
Education
020901 industrial engineering & automation
fonética e fonologia
lcsh:P1-1091
Brazilian Portuguese
phonetics and phonology
biometria
0202 electrical engineering, electronic engineering, information engineering
corpus de locutores
Statistical analysis
Database
Free access
Phonology
audiovisual database
lcsh:Philology. Linguistics
base de dados audiovisual
lcsh:P
020201 artificial intelligence & image processing
Audiovisual speech
Psychology
Details
- ISSN :
- 2237-2083 and 0104-0588
- Volume :
- 27
- Database :
- OpenAIRE
- Journal :
- REVISTA DE ESTUDOS DA LINGUAGEM
- Accession number :
- edsair.doi.dedup.....eaf9851cb5a4cc4ad9fc79e62489e2b6