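Mūsdienu latgaliešu valodas runas korpusa izveide mazāk lietoto valodu dokumentēšanas kontekstā [Creation of the Contemporary Latgalian Speech Corpus in the context of documenting lesser-used languages].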

Authors :: Juško-Štekele, Angelika; Kļavinska, Antra
Source :: Letonica; 2022, Issue 47, p226-242, 17p
Publication Year :: 2022
Abstract: According to UNESCO data, in 2013 Latgalian, with 150,000 users, was recognised as one of the world’s endangered and vulnerable languages: all generations still use its oral form, but the sustainability of the language is seriously jeopardised because the number of young users is decreasing. Pursuant to the EU directives and recommendations for the preservation, research and development of regional and endangered languages, as well as the Guidelines for the State Language Policy 2021–2027 regarding the development, online publication and accessibility of varied text corpora, in 2020 a group of researchers at the Rēzekne Academy of Technologies, within the State Research Programme project Digital Resources of Humanities: Integration and Development (No. VPP-IZM-DH-2020/1-0001), started work on the Contemporary Latgalian Speech Corpus (MuLaR), intended for the documentation, research, study and acquisition of Latgalian. The aim of the article is to identify and analyse the issues that are important in the process of creating MuLaR, applying referential analysis of the scientific literature and comparative methodology.
In turn, applying the analytical-synthetic method and drawing on the experience accumulated by the corpus creators, an initial model of the corpus architectonics and technological solutions was developed. It covers the following issues: ensuring a representative Latgalian speech corpus, bearing in mind the territorial distribution of Latgalian language communities and the diversity of Latgalian patois; the most appropriate methods for documenting natural, spontaneous language, namely the collection of new data and the use of existing recordings (interviews, TV and radio broadcasts, field research data collections) and other databases (reiti.rta.lv); understanding metadata; ethical aspects of the speech corpus; transcription (software, and conventions to reveal the features of spoken text as accurately as possible); and the creation of an accessible, easy-to-use open-access platform, drawing on the experience of creating oral speech corpora for lesser-used languages and dialects in other countries. The article sets out the main challenges for corpus development after the initial validation of the corpus data, including the possibilities for morphological tagging of the corpus. [ABSTRACT FROM AUTHOR]

Subjects :: WEB accessibility
ENDANGERED languages
LANGUAGE policy
REGIONAL development
SPEECH
LANGUAGE acquisition
RADIO programs
SELF-disclosure

Details

Language :: Latvian
ISSN :: 14073110
Issue :: 47
Database :: Complementary Index
Journal :: Letonica
Publication Type :: Academic Journal
Accession number :: 160243494
Full Text :: https://doi.org/10.35539/LTNC.2022.0047.A.J.S.A.K.226.243

