1,747 results for "6121 Languages"
Search Results
2. Demonstrating and checking understanding – Bodily-visual resources in action formation and ascription
- Author
-
Anna-Kaisa Jokipohja, Tampere University, and Language Studies
- Subjects
Linguistics and Language, Artificial Intelligence, 6121 Languages, Language and Linguistics
- Abstract
This article uses multimodal conversation analysis to investigate bodily-visual resources in action formation and ascription. The focus is on next actions that bring understanding to the interactional surface either through demonstrating or checking understanding. The data come from instructional cooking and gardening activities organized for adult immigrants. The analysis shows how bodily-visual resources systematically couple with confident actions of demonstrating and tentative actions of checking understanding, and are oriented to as distinguishing between them. Checking understanding is characterized by immobility of bodily-visual actions, whereas demonstrating understanding is characterized by a continuous flow of bodily-visual actions. The analysis contributes to research on embodiment in action ascription, and more specifically the role of embodiment in showing understanding and in dealing with trouble in understanding. The article also introduces a novel way to use graphic transcripts to represent the temporal progression of bodily-visual actions and spoken turns. The data are in Finnish with English translations.
- Published
- 2023
3. Interactional functions of therapists’ reformulations in a group session involving French-speaking children with autism spectrum disorder
- Author
-
Wiklund, Mari, Määttä, Simo K., Department of Languages, Faculty Common Matters (Faculty of Arts), Helsinki Inequality Initiative (INEQ), and Translation Studies
- Subjects
reformulations, Linguistics and Language, vuorovaikutus, conversation analysis, French, keskustelunanalyysi, autism, interaction, autism spectrum disorder, autismikirjo, Language and Linguistics, Speech and Hearing, uudelleenmuotoilu, autismi, 6121 Languages, ranska
- Abstract
Background: In this article, we analyze a group therapy session involving four 11- to 13-year-old French-speaking boys with autism spectrum disorder (ASD) and their two female therapists. We focus on speaking turns in which the therapists reformulate the contents of a preceding turn produced by a child. Method: Methodologically, the study is based mainly on conversation analysis. Results: The analyses show that the therapists clearly aim to achieve meaningful learning outcomes with regard to the topic of conversation, and the reformulations constitute an essential tool in this process. Most often, reformulations are used to provide a more compact and more effective formulation of the turn in terms of the topic of conversation. Sometimes, a reformulation is used to assist a speaker who is experiencing problems with the formulation of their utterances. The reformulations also often include signs of approval and constitute positive feedback for the children. In some contexts, for example, in the case of turns including sensitive content, reformulations can constitute a strategy of avoiding repetition.
- Published
- 2023
4. Helping a language learner gain self-confidence and awareness through advising
- Author
-
Hiney, Grainne and Language Centre
- Subjects
6121 Languages, General Medicine
- Abstract
Learners can develop both their autonomy and reflection skills by becoming involved in advising in language learning. Advising is a relatively new field in language teaching and learning, though the value of advising is increasingly recognised. Using narrative inquiry, this short reflective paper describes an online advising session conducted with a Finnish student in the University of Helsinki, Finland. The session focused on building self-confidence and awareness. Analysis of the session indicated that the student experienced immediate short-term benefit, with strong potential for future long-term benefit, both personally and regarding language learning.
- Published
- 2023
5. When and how to revise?
- Author
-
Annamari Korhonen, Tampere University, and Language Studies
- Subjects
Linguistics and Language, Communication, 6121 Languages, Language and Linguistics
- Abstract
The translation production team that consists of a translator and a reviser can be investigated as a specific kind of (sub)system of socially distributed cognition, a cognitive dyad; this system is defined as only including the translation professionals who are directly involved in the drafting of the translation. Based on interviews with translation professionals, I argue that this fine-tuned cognitive dyad gets its form not only as a result of its participants’ characteristics, but also under the influence of other factors, some of which vary from one project to the next, leading to the flexible formation of the reviser’s task in particular. The three most important project-specific influencing factors are the text genre, the translator’s experience and competence, and the client’s needs and requirements. While genre and the client’s needs and requirements seem to have a markedly similar impact, mainly influencing the internal task configuration of the cognitive dyad, the translator’s experience and competence often leads to non-revision. Trust is an important element in this process.
- Published
- 2022
6. Topic metaphors in European languages
- Author
-
Granvik, Anton and Spanish Philology
- Subjects
6121 Languages
- Abstract
This paper deals with the semantic notion “topic”, understood broadly as the notion of ‘aboutness’ as in “to talk about stars,” and describes its various forms of expression across ten European languages. The aim is to explore and characterise how topic is construed, that is, which underlying conceptualisations are involved in the metaphorical expressions used to refer to this notion. The description is based on ten parallel versions of A. St. Exupéry’s famous novel Le Petit Prince. The analysis highlights the most salient metaphors that are found in European languages and points at some differences in the topic markers used across languages. The most revealing conceptualisation of topic involves what I have called the revolving metaphor, in which the topic is seen as the centre around which our discussion or thoughts revolve, as in my thoughts revolved around her. The paper ends with a tentative discussion of to what degree the different topic metaphors can be considered dead or alive.
- Published
- 2022
7. A Surprise in the Past: The Historical Origins of the Catalan go-past
- Author
-
Silvio Cruschina, Anna Kocher, Department of Languages, and Italian Philology
- Subjects
Linguistics and Language, Sicilian, Old catalan, Verb de moviment, Language and Linguistics, Català antic, Past perfective, Mirativitat, Mirativity, Sicilià, Motion verb, 6121 Languages, Grammaticalization, Perfet de passat, Gramaticalització
- Abstract
Crosslinguistically, the development of the verb go into a future tense is a common path of grammaticalization. In contrast, the past meaning of the go-periphrasis in Catalan is unexpected. Detges (2004) claims that the process of grammaticalization of the Catalan periphrastic perfect went from inchoative to foregrounding to past. We compare data from the Corpus informatitzat del Català antic with modern Sicilian, where a similar go-periphrasis is used with a foregrounding function that resembles that of Old Catalan. This comparison confirms a foregrounding usage but fails to support the origin in an inchoative usage. We propose that the grammaticalization from movement to foregrounding does not require an intermediate inchoative stage, but that it rather results from a modal implicature of surprise and unexpectedness that was associated with the construction. Indeed, the function of go to foreground and express surprise or noteworthiness can be inferentially viewed as movement away from the speaker's expectations. Under this usage, Catalan go-periphrasis was employed to refer to 'surprising' events that took place in the past. Once this additional meaning was lost, the reference to the past was generalized beyond the implicature.
- Published
- 2022
8. Temporal Labels and Specifications in Monolingual English Dictionaries
- Author
-
Norri, Juhani, Tampere University, and Language Studies
- Subjects
6121 Languages, Language and Linguistics
- Abstract
The article examines the temporal labels and other specifications of time affixed to twenty-five words in monolingual dictionaries of English. The selection of works studied includes learners’, collegiate, and general-purpose dictionaries, both British and American. In addition, the treatment of the lexemes in the Oxford English Dictionary is noted. The analysis reveals some clear differences between the different types of dictionaries in the overall propensity to furnish temporal labels and other specifications of time. The terminology employed to convey such information varies from one group of dictionaries to another. There is also plenty of variation between the individual volumes inside each group. The target audience of the works examined varies, which explains some of the differences in the treatment of particular lexemes. In general, Osselton’s calls for more consistent terminology in the labelling of old words, presented several decades ago, are still valid. The differences between the labels are not always clear, and the explanations in the front matter of the dictionary may be lacking or unhelpful.
- Published
- 2022
9. Possessive and Caritive in Nivkh
- Author
-
Gruzdeva, Ekaterina, Helsinki Institute of Sustainability Science (HELSUS), Faculty Common Matters (Faculty of Arts), Department of Languages, General Linguistics, and Helsinki Collegium for Advanced Studies
- Subjects
6121 Languages
- Abstract
This paper focuses on the various encoding strategies and other relevant issues pertaining to affirmative and negative predicative possession in the two Amuric languages, Amur Nivkh and Sakhalin Nivkh (Nighvng). One of the goals is to demonstrate that possessive negation serves as a major strategy for expressing caritive semantics (= non-involvement). Constructions rendering possessive and caritive meanings represent one of the basic and rather frequent types of clauses in Nivkh, and they are also interconnected both semantically and grammatically with other clause types. Some of the constructions present in Nivkh have analogues in the neighboring languages, while others are specific to Nivkh. The present paper, based on the author’s field materials, represents the first specific study of this topic.
- Published
- 2022
10. Introduction : Excessive Language in Public Discourse
- Author
-
Buchart, Mélanie, Granvik, Anton, Lenk, Hartmut, Department of Languages, Faculty Common Matters (Faculty of Arts), Germanic Philology, French Language and Culture, and Spanish Philology
- Subjects
public discourse, hate speech, 6121 Languages, media communication
- Abstract
Non
- Published
- 2023
11. Bootstrapping Multi-word Expressions : Reusing MWE-descriptions in machine translation applications of other language pairs
- Author
-
Hurskainen, Arvi and Department of Languages
- Subjects
6121 Languages, 113 Computer and information sciences
- Abstract
Experience has shown that word-by-word translation between two languages is a utopia. Although languages are composed of words, the meanings that the words convey cannot be translated to another language by simply translating the individual words. Languages contain many types of word clusters that must be handled as special units, each cluster type requiring its special ways of treatment. Because the compilation of multi-word expressions (MWE) is tedious and time-consuming work, there is motivation to re-use the MWEs in other applications, including the translation between other language pairs. It is evident that only part of the MWEs in one application can be directly converted to another application. The reason is that a large part of MWEs is dependent on the language pair for which they have been identified. The translation between closely related languages requires fewer MWEs than the translation between very different languages. In this report I will investigate whether the MWEs in English to Finnish machine translation can be converted into MWEs in Finnish to English machine translation. It is obvious that only part of the cases are still MWEs after conversion. For example, if an MWE is a many-to-one type, meaning that a word cluster in source language maps to a single word in target language, after conversion it is not an MWE any more. Because the list of MWEs in the current translation system is in the form of Constraint Grammar rules, the conversion process is far from trivial. I will show the conversion process in detail.
- Published
- 2023
12. The Pipeline for Publishing Resources in the Language Bank of Finland
- Author
-
Dieckmann, Ute, Lennes, Mietta, Piitulainen, Jussi, Niemi, Jyrki, Axelson, Erik, Jauhiainen, Tommi, Linden, Krister, Erjavec, Tomaž, Eskevich, Maria, Department of Digital Humanities, Language Technology, and Centre of Excellence in Ancient Near Eastern Empires (ANEE)
- Subjects
6121 Languages, 113 Computer and information sciences
- Published
- 2023
13. Unsupervised Feature Selection for Effective Parallel Corpus Filtering
- Author
-
Aulamo, Mikko, De Gibert Bonet, Ona, Virpioja, Sami, Tiedemann, Jörg, Nurminen, Mary, Brenner, Judith, Koponen, Maarit, et al., Department of Digital Humanities, Language Technology, and Mind and Matter
- Subjects
6121 Languages, 113 Computer and information sciences
- Abstract
This work presents an unsupervised method of selecting filters and threshold values for the OpusFilter parallel corpus cleaning toolbox. The method clusters sentence pairs into noisy and clean categories and uses the features of the noisy cluster center as filtering parameters. Our approach utilizes feature importance analysis to disregard filters that do not differentiate between clean and noisy data. A randomly sampled subset of a given corpus is used for filter selection and ineffective filters are not run for the full corpus. We use a set of automatic evaluation metrics to assess the quality of translation models trained with data filtered by our method and data filtered with OpusFilter’s default parameters. The trained models cover English-German and English-Ukrainian in both directions. The proposed method outperforms the default parameters in all translation directions for almost all evaluation metrics.
- Published
- 2023
14. Sull’uso del passato remoto e prossimo nei rinvii anaforici dei testi scientifici, giuridici e tecnici
- Author
-
Ilpo Kempas and Avdelningen för språk
- Subjects
passato prossimo ,Varia ,italian kieli ,education ,juridinen teksti ,aikamuodot ,lingua italiana ,tekninen teksti ,grammatica ,kielioppi ,tempi verbali ,testo scientifico, giuridico e tecnico ,anaforiset viittaukset ,General Earth and Planetary Sciences ,6121 Languages ,tieteellinen teksti ,passato remoto ,rinvii anaforici ,General Environmental Science
- Abstract
The article examines the choice between the passato remoto (PR) and the passato prossimo (PP) in anaphoric references in scientific, legal, and technical texts between 1500 and 2000. Tuscan-based standard Italian is characterized by the grammaticalization of the PP as an aoristic tense and by its expansion into the semantic field previously occupied by the PR. The use of these two tenses in anaphoric references was investigated empirically by collecting all the examples of PR (N=146; later a further 24, i.e. 170 in total) used in eight predetermined set phrases (with dire and vedere) available through Google at the time of observation, together with an identical number of examples of PP, in order to uncover possible differences between the two tenses in relation to the documents’ years of publication. In addition, the total numbers of PP examples were observed in order to compare them with those of PR. The results reveal that the changes that have taken place in the language in general are also reflected in the use of both tenses, albeit with a delay. The period ≤1800, above all the nineteenth century itself, emerges as the “golden age” of the PR, while the twentieth century sees a substantial expansion of the PP. Overall, the PP proves predominant in the anaphoric references studied (N=857, 85.4%).
- Published
- 2022
15. Scripted or spontaneous? Two approaches to audio describing visual art in museums
- Author
-
Betta Sofia Saari, Maija I Hirvonen, Tampere University, and Language Studies
- Subjects
Linguistics and Language, 6121 Languages
- Abstract
We report on a comparative analysis of two approaches to live audio-describing (AD) visual art in museums: the first case is a tour with scripted AD (the guide reads written descriptions out loud), and the second case is spontaneous AD (AD is intertwined with the guide’s talk). As previous studies have mostly analyzed pre-recorded AD, our aim was to describe how AD occurs in and as direct interaction between a museum guide and visitors, and how interaction affects the art experience of the (blind and partially sighted) visitors. Data were collected from two authentic settings in which groups of blind, partially sighted, and sighted people visited art museums on guided tours. The data consist of video recordings of the tours and retrospective interviews with visitors. The analysis revealed how the interactive constitution of the tour and the AD format enables or disables the visitors’ participation in experiencing visual art. Most importantly, we show how AD-enriched interaction between the guide and visitors facilitates joint meaning-making about vision and art, in which visually disabled visitors actively participate with multifaceted communicative practices and resources. Our study contributes to the research on (live) AD, demonstrating the role of interaction in the process.
- Published
- 2022
16. Political Engagement of the Russian Speakers in Finland
- Author
-
Ekaterina Protassova, Department of Languages, Faculty Common Matters (Faculty of Arts), and Russian language and Literature (Foreign Language)
- Subjects
6121 Languages, General Medicine
- Abstract
Russian-speaking immigrants in Finland, like many other immigrants in the world, are reluctant to express their opinions on politics. They do not consider themselves competent enough to have the right to make a judgment in a situation in which they have not taken part and which they cannot view completely on their own. Gradually, immigrants who were born in various countries are becoming increasingly aware of their place in their new society, but they still feel they cannot fully trust their leaders. This article examines the attitude of Russian speakers to the Finnish elections and the ongoing war in Ukraine as presented in media and social media, interviews, and essays. It is not easy to determine whether they are less involved than young Finns or whether this is a generational matter. The conclusion points out the difficulties in adapting to a political system different from that of the country of origin and illustrates the spectrum of opinions among the immigrants of the first and second generation who live in Finland and use Russian among other languages in their everyday life. Russian-language media continue to have a significant influence on Russian speakers, even though second-generation representatives rely less on these sources of information.
- Published
- 2022
17. Mediation in FL learning
- Author
-
Kaisa Koskinen, Tuija Kinnunen, Department of Languages, Tampere University, and Language Studies
- Subjects
6121 Languages
- Abstract
In this conceptual paper we look at the concept of mediation in foreign language learning from a translation studies perspective. Through an analysis of the most important European language teaching policy document, namely the Common European Framework of Reference for Languages (CEFR), we will study the conceptualizations of mediation and translation in the CEFR and identify elements that are important with respect to understanding translatoriality and its role in the framework. We argue that a narrow concept of translation goes against CEFR’s explicit aims of mediation. We therefore propose that the concept of translatoriality might be used instead to help teachers and learners orient to a wide variety of translatorial mediation practices while still also benefitting from well-established and widely studied strategies of professional translation and interpreting. Further collaboration between translation and interpreting trainers and foreign language teachers will be needed, as well as fieldwork research on best classroom practices, and a solid and shared conceptual basis will enhance the possibilities of combining the accumulating findings collected through fieldwork.
- Published
- 2022
18. Prosodic features in Finnish-speaking adults with Parkinson’s disease
- Author
-
Nelly Penttilä, Lauri Tavi, Marianne Hyppönen, Katariina Rontu, Leena Rantala, Stefan Werner, Tampere University, and Welfare Sciences
- Subjects
Speech and Hearing, Linguistics and Language, 515 Psychology, 6121 Languages, Language and Linguistics
- Abstract
The aim of this study was to assess prosodic features in Finnish speakers with (n=16) and without (n=20) Parkinson’s disease (PD), as there are no published studies to date of prosodic features in Finnish speakers with PD. Chosen metrics were articulation rate (syllables/second), pitch (mean F0) and pitch variability (standard deviation F0), energy proportion below 1kHz (epb1kHz), normalized pairwise variability index (nPVI), and a novel syllabic prosody index (SPI). Four statistically significant results were found: 1) energy was distributed more to lower frequencies in speakers with PD compared to control speakers, 2) male PD speakers had higher pitch and 3) higher syllabic prosody index compared to control males, and 4) female PD speakers had narrower pitch variability than controls. In this study, PD was manifested as less emphatic and breathier voice. Interestingly, male PD speakers’ dysprosody was manifested as an effortful speaking style, whereas female PD speakers exhibited dysprosody with a monotonous speaking style. A novel syllable-based prosody index was found to be a potential tool in analyzing prosody in disordered speech.
- Published
- 2022
19. Kinbank : A global database of kinship terminology
- Author
-
Sam Passmore, Wolfgang Barth, Simon J. Greenhill, Kyla Quinn, Catherine Sheard, Paraskevi Argyriou, Joshua Birchall, Claire Bowern, Jasmine Calladine, Angarika Deb, Anouk Diederen, Niklas P. Metsäranta, Luis Henrique Araujo, Rhiannon Schembri, Jo Hickey-Hall, Terhi Honkola, Alice Mitchell, Lucy Poole, Péter M. Rácz, Sean G. Roberts, Robert M. Ross, Ewan Thomas-Colquhoun, Nicholas Evans, Fiona M. Jordan, Department of Finnish, Finno-Ugrian and Scandinavian Studies, Kinura, and Redhead, D.
- Subjects
Multidisciplinary, 6121 Languages, 113 Computer and information sciences
- Abstract
Publisher Copyright: © 2023 Passmore et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. For a single species, human kinship organization is both remarkably diverse and strikingly organized. Kinship terminology is the structured vocabulary used to classify, refer to, and address relatives and family. Diversity in kinship terminology has been analyzed by anthropologists for over 150 years, although recurrent patterning across cultures remains incompletely explained. Despite the wealth of kinship data in the anthropological record, comparative studies of kinship terminology are hindered by data accessibility. Here we present Kinbank, a new database of 210,903 kinterms from a global sample of 1,229 spoken languages. Using open-access and transparent data provenance, Kinbank offers an extensible resource for kinship terminology, enabling researchers to explore the rich diversity of human family organization and to test longstanding hypotheses about the origins and drivers of recurrent patterns. We illustrate our contribution with two examples. We demonstrate strong gender bias in the phonological structure of parent terms across 1,022 languages, and we show that there is no evidence for a coevolutionary relationship between cross-cousin marriage and bifurcate-merging terminology in Bantu languages. Analysing kinship data is notoriously challenging; Kinbank aims to eliminate data accessibility issues from that challenge and provide a platform to build an interdisciplinary understanding of kinship.
- Published
- 2023
20. Dialect Representation Learning with Neural Dialect-to-Standard Normalization
- Author
-
Kuparinen, Olli Vilhelm, Scherrer, Yves, Scherrer, Yves, Jauhiainen, Tommi, Ljubešić, Nikola, Nakov, Preslav, Tiedemann, Jörg, Zampieri, Marcos, Language Technology, Department of Digital Humanities, and Faculty Common Matters (Faculty of Arts)
- Subjects
6121 Languages, 113 Computer and information sciences
- Abstract
Language label tokens are often used in multilingual neural language modeling and sequence-to-sequence learning to enhance the performance of such models. An additional product of the technique is that the models learn representations of the language tokens, which in turn reflect the relationships between the languages. In this paper, we study the learned representations of dialects produced by neural dialect-to-standard normalization models. We use two large datasets of typologically different languages, namely Finnish and Norwegian, and evaluate the learned representations against traditional dialect divisions of both languages. We find that the inferred dialect embeddings correlate well with the traditional dialects. The methodology could be further used in noisier settings to find new insights into language variation.
- Published
- 2023
21. Murreviikko - A Dialectologically Annotated and Normalized Dataset of Finnish Tweets
- Author
-
Kuparinen, Olli Vilhelm, Scherrer, Yves, Jauhiainen, Tommi, Ljubešić, Nikola, Nakov, Preslav, Tiedemann, Jörg, Zampieri, Marcos, Language Technology, and Department of Digital Humanities
- Subjects
6121 Languages, 113 Computer and information sciences
- Abstract
This paper presents Murreviikko, a dataset of dialectal Finnish tweets which have been dialectologically annotated and manually normalized to a standard form. The dataset can be used as a test set for dialect identification and dialect-to-standard normalization, for instance. We evaluate the dataset on the normalization task, comparing an existing normalization model built on a spoken dialect corpus and three newly trained models with different architectures. We find that there are significant differences in normalization difficulty between the dialects, and that a character-level statistical machine translation model performs best on the Murreviikko tweet dataset.
- Published
- 2023
22. Lemmatization Experiments on Two Low-Resourced Languages : Low Saxon and Occitan
- Author
-
Miletic Haddad, Aleksandra, Siewert, Janine, Scherrer, Yves, Jauhiainen, Tommi, Ljubešić, Nikola, Nakov, Preslav, Tiedemann, Jörg, Zampieri, Marcos, Language Technology, Department of Digital Humanities, and Digital Humanities
- Subjects
6121 Languages, 113 Computer and information sciences
- Abstract
We present lemmatization experiments on the unstandardized low-resourced languages Low Saxon and Occitan using two machine-learning-based approaches represented by MaChAmp and Stanza. We show different ways to increase training data by leveraging historical corpora, small amounts of gold data and dictionary information, and discuss the usefulness of this additional data. In the results, we find some differences in the performance of the models depending on the language. This variation is likely to be partly due to differences in the corpora we used, such as the amount of internal variation. However, we also observe common tendencies, for instance that sequential models trained only on gold-annotated data often yield the best overall performance and generalize better to unknown tokens.
- Published
- 2023
23. Automatic text simplification of Russian texts using control tokens
- Author
-
Dmitrieva, Anna, Piskorski, Jakub, Marcińczuk, Michał, Nakov, Preslav, et al., Department of Digital Humanities, and Language Technology
- Subjects
6121 Languages, 113 Computer and information sciences
- Abstract
This paper describes the research on the possibilities to control automatic text simplification with special tokens that allow modifying the length, paraphrasing degree, syntactic complexity, and the CEFR (Common European Framework of Reference) grade level of the output texts, i.e. the level of language proficiency a non-native speaker would need to understand them. The project is focused on Russian texts and aims to continue and broaden the existing research on controlled Russian text simplification. It is done by exploring available datasets for monolingual Russian machine translation (paraphrasing and simplification), experimenting with various model architectures, and adding control tokens that have not been used on Russian texts previously.
- Published
- 2023
24. TV series as disseminators of emerging vocabulary : Non-codified expressions in the TV Corpus
- Author
-
Daniela Landert, Tanja Säily, Mika Hämäläinen, Department of Languages, and Language Technology
- Subjects
word formation ,corpus linguistics ,lexicography ,fiction ,6121 Languages ,non-codified lexis - Abstract
This study presents a method for identifying words that appear in corpus data earlier than their first date of attestation in dictionaries. We demonstrate the application of this method based on a large diachronic corpus, the TV Corpus, and the Oxford English Dictionary (OED). Combining automatic extraction of candidate terms from the TV Corpus with comprehensive manual analysis and verification, the method identifies 32 words that were used in TV series before their first attestation in the OED. We present a detailed discussion of these words, analysing their distribution across decades and genres of the TV Corpus, their origins, semantic domains and word-formation processes. We also present extracts with their first uses in the TV Corpus and analyse how the words were presented to the large and anonymous mass audience. Our study shows that the method we present is suitable for identifying early attestations of words in large corpora, even though in the case of the TV Corpus, a great deal of manual analysis and verification is needed. In addition, we argue that TV series and other types of fictional texts are an important resource for studying the coinage and spread of terms, due to their function and the fact that they address a mass audience.
- Published
- 2023
25. Detection and attribution of quotes in Finnish news media: BERT vs. rule-based approach
- Author
-
Janicki, Maciej, Kanner, Antti, Mäkelä, Eetu, Alumäe, Tanel, Fishel, Mark, Department of Digital Humanities, Digital Humanities, Helsinki Computational History Group, Human Sciences – Computing Interaction, Department of Finnish, Finno-Ugrian and Scandinavian Studies, and Language Technology
- Subjects
NoDaLiDa 2023 ,6121 Languages ,113 Computer and information sciences - Abstract
We approach the problem of recognition and attribution of quotes in Finnish news media. Solving this task would create possibilities for large-scale analysis of media with respect to the presence and styles of presentation of different voices and opinions. We describe the annotation of a corpus of media texts, numbering around 1500 articles, with quote attribution and coreference information. Further, we compare two methods for automatic quote recognition: a rule-based one operating on dependency trees and a machine learning one built on top of the BERT language model. We conclude that BERT provides more promising results even with little training data, achieving 95% F-score on direct quote recognition and 84% for indirect quotes. Finally, we discuss open problems and further associated tasks, especially the necessity of resolving speaker mentions to entity references.
- Published
- 2023
26. Slav-NER : the 4th Cross-lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across Slavic languages
- Author
-
Yangarber, Roman, Piskorski, Jakub, Dmitrieva, Anna, Marcińczuk, Michał, Přibáň, Pavel, Rybak, Piotr, Steinberger, Josef, Piskorski, Jakub, Marcińczuk, Michał, Nakov, Preslav, et al., Department of Digital Humanities, and Language Technology
- Subjects
6121 Languages ,113 Computer and information sciences - Abstract
This paper describes Slav-NER: the 4th Multilingual Named Entity Challenge in Slavic languages. The tasks involve recognizing mentions of named entities in Web documents, normalization of the names, and cross-lingual linking. This version of the Challenge covers three languages and five entity types. It is organized as part of the 9th Slavic Natural Language Processing Workshop, co-located with the EACL 2023 Conference. Seven teams registered and three participated actively in the competition. Performance for the named entity recognition and normalization tasks reached 90% F1 measure, much higher than reported in the first edition of the Challenge, but similar to the results reported in the latest edition. Performance for the entity linking task for individual languages reached the range of 72-80% F1 measure. Detailed evaluation information is available on the Shared Task web page.
- Published
- 2023
27. Findings of the VarDial Evaluation Campaign 2023
- Author
-
Aepli, Noëmi, Çöltekin, Çağrı, Van Der Goot, Rob, Jauhiainen, Tommi, Kazzaz, Mourhaf, Ljubešić, Nikola, North, Kai, Plank, Barbara, Scherrer, Yves, Zampieri, Marcos, Scherrer, Yves, Jauhiainen, Tommi, Ljubešić, Nikola, Nakov, Preslav, Tiedemann, Jörg, Zampieri, Marcos, Department of Digital Humanities, Language Technology, Centre of Excellence in Ancient Near Eastern Empires (ANEE), Faculty Common Matters (Faculty of Arts), and University of Zurich
- Subjects
FOS: Computer and information sciences ,Computer Science - Computation and Language ,10105 Institute of Computational Linguistics ,410 Linguistics ,6121 Languages ,000 Computer science, knowledge & systems ,113 Computer and information sciences ,Computation and Language (cs.CL) - Abstract
This report presents the results of the shared tasks organized as part of the VarDial Evaluation Campaign 2023. The campaign is part of the tenth workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with EACL 2023. Three separate shared tasks were included this year: Slot and intent detection for low-resource language varieties (SID4LR), Discriminating Between Similar Languages -- True Labels (DSL-TL), and Discriminating Between Similar Languages -- Speech (DSL-S). All three tasks were organized for the first time this year.
- Published
- 2023
28. LexSIC : A quick vocabulary test for Sicilian
- Author
-
Kupisch, Tanja, Arona, Sebastiano, Besler, Alexandra, Cruschina, Silvio, Ferin, Maria, Gyllstad, Henrik, Venagli, Ilaria, Department of Languages, and Italian Philology
- Subjects
6121 Languages - Published
- 2023
29. Assigning Finnish words with inflection codes
- Author
-
Hurskainen, Arvi and Department of Languages
- Subjects
6121 Languages ,113 Computer and information sciences - Abstract
Rule-based description of the language such as Finnish requires a careful encoding of all inflecting words. This includes verbs, nouns, adjectives, pronouns, and numerals. The Institute for the Languages of Finland (KOTUS) has worked for several years for classifying Finnish words into individual inflection classes. The classes are marked with combinations of letters and numbers. The classification of nominals starts from 1 and continues until 51. Many of these classes have also sub-classes, and they are marked with an additional code, using capital letters, such as A, B, C etc. The classification of verbs starts from 52 and continues until 73. In addition, KOTUS marks word classes of the words. Here I mark nominals with numbers preceded by the letter N, and the verbs with the letter V. Also verbs have sub-classes, marked with capital letters. The KOTUS list does not mark the inflection classes in compound words. The user should find the inflection code from the last word of the compound word. The wordlist has currently more than 100,000 words, and the number is increasing over the years. It is not exhaustive, but it covers a substantial number of words occurring in speech and writing. More than half of the words are compound words, and they are without inflection codes. The list is defective in that it does not include information on the front/back inflection patterns. The vowels of the word stem define whether the word should have a front or back vowel inflection. Particularly problematic are compound words, because the last member of the compound defines the inflection pattern, and it is not easy to define computationally where the boundary is. In this report I describe how the word list can be enhanced, so that it also has information on front/back inflection. Because back vowel inflection is more common, I have defined it as default and left without marking. Only front vowel inflection is marked. 
I also show how front/back inflection codes can be added to such compound words that are not in the KOTUS list.
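The front/back distinction described in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: it assumes the simplified textbook rule of Finnish vowel harmony (a word containing any of a, o, u takes back-vowel endings, otherwise front-vowel endings), and it assumes the compound boundary is already known and written as "|", which the abstract notes is the hard part to find computationally. The function names are hypothetical.

```python
# Simplified vowel-harmony check; loanwords and other exceptions are ignored.
BACK_VOWELS = set("aou")


def needs_front_marking(last_member: str) -> bool:
    """Return True if the word should carry the (non-default) front-vowel mark.

    Back-vowel inflection is treated as the unmarked default, as in the abstract.
    """
    return not any(ch in BACK_VOWELS for ch in last_member.lower())


def compound_needs_front_marking(compound: str, boundary: str = "|") -> bool:
    """Decide on the last member only; the boundary symbol is an assumption."""
    return needs_front_marking(compound.split(boundary)[-1])


print(needs_front_marking("talo"))              # back vowels present
print(needs_front_marking("kylä"))              # only front vowels
print(compound_needs_front_marking("työ|maa"))  # last member decides
```

Taking only the last member mirrors the statement that the last word of a compound defines the inflection pattern; a real system would also need a way to locate the boundary.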
- Published
- 2023
30. Spelling correctness as a witness of changing documentary culture in Tuscia (eighth–ninth centuries)
- Author
-
Timo Korkiakangas, Department of Languages, and Latin and Literature of Rome
- Subjects
digital linguistics ,History ,Latin ,diplomatics ,Arts and Humanities (miscellaneous) ,spelling variation ,Early Middle Ages ,Geography, Planning and Development ,documentary culture ,6121 Languages ,Early Medieval ,615 History and Archaeology - Abstract
This paper discusses the evolution of documentary culture in early medieval Tuscia by quantitatively examining the Latin spelling of charter scribes in relation to the following factors: time, the distinction between the formulaic and non-formulaic parts of the document, the scribe’s domicile, the scribe’s professional status, and the document type. The paper asks what the spelling of charters tells us about administrative and socio-cultural changes in charter production and in scribal education. The research data is 997 charters from the Late Latin Charter Treebank, and the approach that of philological corpus linguistics.
- Published
- 2023
31. Early Subacute White Matter Hyperintensities and Recovery of Language After Stroke
- Author
-
Veronika Vadinova, Aleksi J. Sihvonen, Kimberley L. Garden, Laura Ziraldo, Tracy Roxbury, Kate O’Brien, David A. Copland, Katie L. McMahon, Sonia L. E. Brownsett, Centre of Excellence in Music, Mind, Body and Brain, Department of Psychology and Logopedics, Music, Ageing and Rehabilitation Team, Cognitive Brain Research Unit, and Brain, Music and Learning
- Subjects
language ,3112 Neurosciences ,leukoaraiosis ,6121 Languages ,General Medicine ,cerebral small vessel diseases ,white matter hyperintensities ,stroke ,aphasia - Abstract
Background: White matter hyperintensities (WMH) are considered to contribute to diminished brain reserve, negatively impacting on stroke recovery. While WMH identified in the chronic phase after stroke have been associated with post-stroke aphasia, the contribution of premorbid WMH to the early recovery of language across production and comprehension has not been investigated. Objective: To investigate the relationship between premorbid WMH severity and longitudinal comprehension and production outcomes in aphasia, after controlling for stroke lesion variables. Methods: Longitudinal behavioral data from individuals with a left-hemisphere stroke were included at the early subacute (n = 37) and chronic (n = 28) stage. Spoken language comprehension and production abilities were assessed at both timepoints using word and sentence-level tasks. Magnetic resonance imaging (MRI) was performed at the early subacute stage to derive stroke lesion variables (volume and proportion damage to critical regions) and WMH severity rating. Results: The presence of severe WMH explained an additional 18% and 25% variance in early subacute (t = −3.00, p = .004) and chronic (t = −3.60, p = .001) language comprehension abilities respectively, after controlling for stroke lesion variables. WMH did not predict additional variance of language production scores. Conclusions: Subacute clinical MRI can be used to improve prognoses of recovery of aphasia after stroke. We demonstrate that severe early subacute WMH add to the prediction of impaired longitudinal language recovery in comprehension, but not production. This emphasizes the need to consider different domains of language when investigating novel neurobiological predictors of aphasia recovery.
- Published
- 2023
32. Rule-based language technology for African languages
- Author
-
Hurskainen, Arvi, Hurskainen, Arvi, Koskenniemi, Kimmo, Pirinen, Tommi, and Department of Languages
- Subjects
6121 Languages ,113 Computer and information sciences - Abstract
Africa is such a language area, where rule-based language technology could have a strong influence on the status of local languages. As statistical and neural approaches require large masses of text for training the language model, rule-based methods can be applied also to languages with no traditional language resources. The development of language technology systems for minor languages would not only provide useful tools for language users. It would also contribute to the elevated status of those languages and thus help in maintaining those languages to be alive. The chapter looks at the current situation in Africa particularly from the viewpoint of rule-based language technology.
- Published
- 2023
33. SALAMA
- Author
-
Hurskainen, Arvi, Hurskainen, Arvi, Koskenniemi, Kimmo, Pirinen, Tommi, and Department of Languages
- Subjects
6121 Languages ,113 Computer and information sciences - Abstract
This chapter describes the language processing environment called Salama. It is a purely rule-based approach with the exception that it also allows for out-of-vocabulary guessing. Salama makes use of such development environments as TWOL and CG2, and its aim is principally to test the possibilities and limits of purely rule-based language technology for constructing various user applications, including machine translation between linguistically different and structurally complex languages. The basic components were developed using finite-state methods in morphological analysis and constraint grammar in disambiguation and syntactic mapping. Constraint grammar was used also in other tasks, where context-sensitive disambiguation was needed. Also, such languages as Perl and Beta were extensively used. Several user applications were developed, and they are described here briefly.
- Published
- 2023
34. Multi-word expressions in SALAMA
- Author
-
Hurskainen, Arvi, Hurskainen, Arvi, Koskenniemi, Kimmo, Nieminen, Tommi, and Department of Languages
- Subjects
6121 Languages ,113 Computer and information sciences - Abstract
Multi-word expressions (MWE) are clusters of words that cannot be disambiguated, syntactically analysed, or translated into another language on the word-by-word basis. There are several types of MWEs, starting from fixed clusters to inflecting clusters and to non-concatenative clusters. It also depends on the type of the target language which clusters should be treated as MWEs. Closely related languages with similar syntax require less MWEs than morphologically and syntactically very different languages. Because of the complexity in handling MWEs, it is necessary that MWEs are isolated in such a development environment, which allows for writing constraint rules for rule application. In the Salama system, I have used Constraint Grammar for isolating MWEs. It would be possible to describe fixed clusters also in the morphological lexicon. Yet it is easier to handle all MWEs in the same isolation environment. Some need constraining while others do not need.
- Published
- 2023
35. Finite-state description, developing mental awareness
- Author
-
Rueter, Jack and Department of Digital Humanities
- Subjects
Võro language ,Komi-Zyrian language ,regular morphology ,6121 Languages ,finite-state morphology ,Lushootseed language ,Skolt Saami language ,Moksha language ,Erzya language - Abstract
In this article, we approach finite-state description practices that must be instilled in the developer. Thoughts are presented accompanied by reference to concrete experiences with different languages and their description. We contend that finite-state description of languages leads to development in the describer-developer. This presupposes regular interaction with developers of upstream and downstream technologies. And as more languages are described, the developer learns what to choose as a starting point, hopefully with the help of a researcher, research documentation or native speaker well versed in the workings of the language. We maintain that finite-state work should serve more than one purpose or audience, and that, as linguists, we should be raising the bar by applying the knowledge of research to description, so that our understanding of the linguistic phenomena can be attested by others or proven false. We are providing a methodology for repeatable experimentation and rule making. We see that each language provides something unique, while sharing some recognizable features with other languages. We stress the necessity to avoid generating characters from epsilons and offer examples where it is possible to write rules that reduce characters to epsilons instead. We also stress the need to describe the predictable infinite set of all native phenomena, whereas the unknown and random qualities introduced through language contact cannot form a foundation for our descriptions. Finally, we call for a playful approach to phenomena in a language, because that might bring us closer to how a child would learn the language – through repetition, mistakes and self-correction.
- Published
- 2023
36. Päikkinommâtotkee hundârušmeh : Kost láá Viestâr-Aanaar anarâš päikkinoomah?
- Author
-
Valtonen, Taarna Aura Inari, Past Present Sustainability (PAES), and Forskningsprogrammet för ekosystem och miljö
- Subjects
6121 Languages - Published
- 2023
37. HFST Training Environment and Recent Additions
- Author
-
Axelson, Erik, Hardwick, Sam, Linden, Krister, and Department of Digital Humanities
- Subjects
6121 Languages ,113 Computer and information sciences - Abstract
HFST - the Helsinki Finite-State Technology toolkit was launched in 2009 (Lindén & al, 2009) and has since been used for developing a number of rule-based morphologies for processing natural language. To promote the uptake of the toolkit a training environment for linguists to learn how to use HFST has been designed in Jupyter. This paper presents an overview of the training environment and some of the recent features that have been added to HFST to keep the run-time size of the transducer reasonably small despite exceptions and negative constraints that need to be added during practical FST development.
- Published
- 2023
38. Is epileptiform activity related to developmental language disorder? Findings from the HelSLI study
- Author
-
Hanna-Reetta Lajunen, Marja Laasonen, Pekka Lahti-Nuuttila, Miika Leminen, Sini Smolander, Sari Kunnari, Eva Arkkila, Leena Lauronen, Kliinisen neurofysiologian yksikkö, HUS Helsinki and Uusimaa Hospital District, Faculty Common Matters (Faculty of Education), HUS Head and Neck Center, Korva-, nenä- ja kurkkutautien klinikka, Helsinki University Hospital Area, Department of Psychology and Logopedics, Department of Neurosciences, HUS Medical Imaging Center, Clinicum, and BioMag Laboratory
- Subjects
Neurology ,Pregnancy ,Physiology (medical) ,Smoking ,3112 Neurosciences ,Interictal epileptiform discharge (IED) ,6121 Languages ,Spike ,Neurology (clinical) ,EEG ,Developmental language disorder (DLD) - Abstract
Publisher Copyright: © 2023 International Federation of Clinical Neurophysiology Objective: To study if interictal epileptiform discharges (IEDs) are associated with language performance or pre-/perinatal factors in children with developmental language disorder (DLD). Methods: We recorded routine EEG in wake and sleep in 205 children aged 2.9–7.1 years with DLD, without neurologic diseases or intellectual disability. We examined the language performance of the children and collected data on pre-/perinatal factors. Results: Interictal epileptiform discharges were not associated with lower language performance. Children with so-called “rolandic”, i.e. centrotemporoparietal, IEDs had better language skills, but age explained this association. Most pre-/perinatal factors evaluated did not increase the risk of rolandic IEDs, except for maternal smoking (OR 4.4, 95% CI 1.4–14). We did not find electrical status epilepticus during slow-wave sleep (ESES)/spike-and-wave activation in sleep (SWAS) in any children. Conclusions: Interictal epileptiform discharges are not associated with lower language performance, and ESES/SWAS is not common in children with DLD. Significance: Routine EEGs do not bring additional information about language performance in children with DLD who do not have any neurologic diseases, seizures, intellectual disability, or regression of language development.
- Published
- 2023
39. I am a plurilingual speaker, but can I teach plurilingual speakers? : Contradictions in student teacher discourses on plurilingualism in Spain, Slovenia, and Finland
- Author
-
Llompart, Júlia, Dražnik, Tjaša, Bergroth, Mari, Björklund, Siv, Björklund, Mikaela, Department of Education, and Diversity, multilingualism and social justice in education
- Subjects
516 Educational sciences ,6121 Languages - Abstract
This study investigates student teacher discourses on plurilingualism in four European initial teacher education (ITE) institutions located in Spain, Slovenia, and Finland. As a part of the European project called Linguistically sensitive teaching in all classrooms, we collected student group reflections using reflection instruments based on SWOT-analysis. Data from 173 student teachers enrolled in initial teacher education at four universities located in Barcelona, Ljubljana, Vaasa and Jyväskylä are explored using qualitative analysis. By analyzing the strengths, weaknesses, opportunities, and threats expressed by student teachers we could identify certain contradictions regarding plurilingualism and the use of plurilingual pedagogies. These contradictions relate to the positioning as “being a plurilingual speaker” and “becoming a teacher dealing with plurilingualism”. We discuss the similarities and differences between student voices in the light of the wider linguistic landscapes in the three countries and the four universities.
- Published
- 2023
40. Supporting Multilingual Learning in Educational Contexts
- Author
-
Otwinowska, Agnieszka, Bergroth, Mari, Zyzik, Eve, Björklund, Siv, Björklund, Mikaela, Department of Education, and Diversity, multilingualism and social justice in education
- Subjects
516 Educational sciences ,6121 Languages - Abstract
The chapter addresses how multilingual learning can be supported in educational contexts. We argue that all children need support for their languages and opportunities to become familiar with linguistic diversity. We briefly define multilingualism and highlight selected linguistic and cognitive features of multilingual children. Then we zoom in on educational solutions in three very different contexts, two in the EU (Poland and Finland), one in the US (California). With the provided contextual background we discuss some of the challenges that learners might experience at school depending on how support for multilingual learning is implemented in a given context. Finally, we argue that supporting multilingual learning can be enhanced in everyday practices and discuss solutions for supporting multilingual learning from the perspective of teachers and teacher training.
- Published
- 2023
41. Celebrating 30 Years of the Nordic Journal of African Studies
- Author
-
Hurskainen, Arvi, Lodhi, Abdulaziz Y., Fanego Palat, Axel, Crane, Thera Marie, Katto, Jonna, Department of Languages, African and Middle Eastern languages, Helsinki Institute of Sustainability Science (HELSUS), and Faculty of Arts
- Subjects
5141 Sociology ,6121 Languages - Abstract
With its first issue appearing in the spring of 1992, the Nordic Journal of African Studies recently reached its 30-year publication milestone. In celebration of the journal's history, and looking forward to its future, we offer this brief collection of memories about the journal's history, written by the journal's founders and several of its editors-in-chief.
- Published
- 2023
42. Exploring language norms in podcasts distributed by a public service broadcaster in the Finland-Swedish mediascape
- Author
-
Levälahti, Minna, Henricson, Sofie, Huhtamäki, Martina, Lindström, Jan, De Ridder, Reglindis, Department of Finnish, Finno-Ugrian and Scandinavian Studies, Faculty Common Matters (Faculty of Arts), and Scandinavian languages
- Subjects
Pluricentric languages ,podcasts ,6121 Languages ,Finland Swedish ,Language norms - Abstract
Swedish has been described as a pluricentric language with a dominant variety in Sweden and a non-dominant variety in Finland, finlandssvenska [Finland Swedish], where it enjoys official status, but is only spoken by a relatively small minority of the population. Both national varieties of Swedish have their own national spoken standards, which are used, for instance, by the public service broadcaster Svenska Yle in Finland in its TV and radio news. Podcasts are a fairly recent addition to the mediascape. The questions we address in this chapter include whether podcasts, nowadays also part of public service broadcasting, are changing the conceptions of the Finland-Swedish spoken standard, and how podcasters position themselves in relation to standards in Sweden and Finland, but also to more regional and local varieties. Our analysis considers the opinions of both media producers and consumers collected through interviews and a web survey. The results suggest there is more openness towards linguistic diversity in audio media, while at the same time the spoken standard of the non-dominant variety, Finland Swedish, continues to be relevant and is generally highly esteemed.
- Published
- 2023
43. Tokenisation in rule-based machine translation
- Author
-
Hurskainen, Arvi and Department of Languages
- Subjects
6121 Languages ,113 Computer and information sciences - Abstract
Tokenisation is a process, where text is converted into such form, where each item is separated from the rest of the text. Words, for example, are such items, and they must be separated from punctuation marks and diacritics. The most convenient way to do this is to add an empty space on both sides of the item. Tokenisation applies also to diacritics and punctuation marks, and each of them must be separated using empty spaces. It is then easy to verticalize the text, so that the morphological analysis can be performed for each item. In rule-based language technology, we retain the words in their inflected forms. However, we do two operations for them. We rewrite contracted word-forms, used in English, into non-contracted forms. We also convert upper-case letters into lower case, placing an asterisk '*' in front of each converted letter, so that they can later be converted back to upper case.
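The steps listed in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's system: the contraction table is a hypothetical three-entry stand-in, the function names are invented for the example, and a literal asterisk in the input would be ambiguous with the case marker.

```python
import re

# Hypothetical, deliberately tiny contraction table (an assumption for this sketch).
CONTRACTIONS = {"don't": "do not", "it's": "it is", "can't": "can not"}


def expand_contractions(text: str) -> str:
    """Rewrite known contracted forms, keeping an initial capital."""
    def repl(match):
        word = match.group(0)
        expanded = CONTRACTIONS.get(word.lower())
        if expanded is None:
            return word
        if word[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded
    return re.sub(r"[A-Za-z]+'[A-Za-z]+", repl, text)


def mark_case(text: str) -> str:
    """Lower-case each upper-case letter and put '*' in front of it."""
    return "".join("*" + ch.lower() if ch.isupper() else ch for ch in text)


def restore_case(text: str) -> str:
    """Undo mark_case: '*x' becomes 'X'."""
    return re.sub(r"\*(.)", lambda m: m.group(1).upper(), text)


def tokenise(text: str) -> list[str]:
    text = mark_case(expand_contractions(text))
    # Surround every character that is not a word character, whitespace,
    # or the '*' marker with spaces, then split on whitespace.
    text = re.sub(r"([^\w\s*])", r" \1 ", text)
    return text.split()


tokens = tokenise("Don't stop, Anna!")
print("\n".join(tokens))  # verticalised text, one item per line
```

Joining the tokens with newlines gives the verticalised form on which a morphological analyser could then run item by item.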
- Published
- 2023
44. Code mixing in Tanzanian Parliament discussions
- Author
-
Hurskainen, Arvi and Department of Languages
- Subjects
6121 Languages - Abstract
Code mixing is a common phenomenon in human communication. It is particularly prevalent in situations, where people with different mother tongues communicate and where they do not share one such language, where all of them are fluent. Code mixing occurs also in such situations, where participants share a common language, but where code mixing, and even code switching, is used for emphasizing a status or a particular reference group. In this report I make a statistical survey of code mixing in Tanzanian Parliament discussions from the years 2004-2006. The language used in discussions is Swahili, and the intervening language is English. Two types of code mixing can be observed in Parliament discussions. One type is such where the whole word is English without inflecting prefixes or suffixes of Swahili. Another type are such words, where the stem is English, but the inflecting affixes are Swahili. In transcriptions of the parliamentary discussions, the latter types of words are usually marked with a dash between the stem and the inflection morphemes. However, there are several cases, where the dash is missing. This is probably due to inaccuracy in transcription, and I consider both types identical and treat them as one group.
- Published
- 2023
45. Idiosyncratic frequency as a measure of derivation vs. inflection
- Author
-
Copot, Maria, Mickus, Timothee, Bonami, Olivier, Department of Digital Humanities, and Language Technology
- Subjects
Linguistics and Language ,Modeling and Simulation ,6121 Languages ,Computer Science Applications - Abstract
There is ongoing discussion about how to conceptualize the nature of the distinction between inflection and derivation. A common approach relies on qualitative differences in the semantic relationship between inflectionally versus derivationally related words: inflection yields ways to discuss the same concept in different syntactic contexts, while derivation gives rise to words for related concepts. This differential can be expected to manifest in the predictability of word frequency between words that are related derivationally or inflectionally: predicting the token frequency of a word based on information about its base form or about related words should be easier when the two words are in an inflectional relationship, rather than a derivational one. We compare prediction error magnitude for statistical models of token frequency based on distributional and frequency information of inflectionally or derivationally related words in French. The results conform to expectations: it is easier to predict the frequency of a word from properties of an inflectionally related word than from those of a derivationally related word. Prediction error provides a quantitative, continuous method to explore differences between individual processes and differences yielded by employing different predicting information, which in turn can be used to draw conclusions about the nature and manifestation of the inflection–derivation distinction.
- Published
- 2023
46. Reflexively speaking: Metadiscourse in English as a Lingua Franca
- Author
-
Anna Mauranen, Faculty of Arts, and English Philology
- Subjects
Metadiscourse ,English as a lingua Franca ,6121 Languages ,Discourse reflexivity - Abstract
Publisher Copyright: © 2023 the author(s), published by Walter de Gruyter GmbH, Berlin/Boston. All rights reserved. Reflexive language - the capacity of language to speak about itself - is unique to human languages; yet little is known of its use in actual dialogue. Fundamental features of language are manifest in dialogic speech and in lingua francas. Both are taken on board in this book, which radically widens our conception of reflexivity in discourse. Reflexivity, or metadiscourse, is central to successful communication. It is also vital in understanding academic argumentation, essential to academic self-understanding, and at the same time it has wide applications.
- Published
- 2023
47. On the verbs of ingestion and partitive function in Erzya
- Author
-
Rueter, Jack and Department of Digital Humanities
- Subjects
Uralic languages ,verbs of ingestion ,ablative case ,dual-object ,Universal Dependencies ,partitive function ,6121 Languages ,finite-state morphology ,Erzya language ,syntax ,ablative adjunct - Published
- 2023
48. Work engagement and its antecedents in remote work: A person-centered view
- Author
-
Anne Mäkikangas, Soile Juutinen, Jaana-Piia Mäkiniemi, Kirsi Sjöblom, Atte Oksanen, Tampere University, Unit of Social Research, Language Studies, and Welfare Sciences
- Subjects
515 Psychology ,5141 Sociology ,6121 Languages ,Applied Psychology
- Published
- 2022
49. The effect of a dyslexia-specific Cyrillic font, LexiaD, on reading speed: further exploration in adolescents with and without dyslexia
- Author
-
Svetlana Alexeeva, Vladislav Zubov, Alena Konina, and Department of Languages
- Subjects
PERCEPTION ,READERS ,Russian ,CHILDREN ,FREQUENCY ,eye tracker ,printed text ,TEXT ,dyslexia ,BENEFIT ,6121 Languages ,font ,EYE-MOVEMENTS ,Applied Psychology - Abstract
The current study aims to test the assumption that a specially designed Cyrillic font, LexiaD, can assist adolescents with persistent reading problems and facilitate their reading experience. LexiaD was compared with the widely used Arial font. Two groups of adolescents with dyslexia (N = 34) and without dyslexia (N = 28) silently read 144 sentences from the Russian Sentence Corpus (Laurinavichyute et al., 2019), some of which were presented in LexiaD, and others in Arial, while their eye movements were recorded. LexiaD did not show the desired effect for adolescents at the beginning of the experiment: Arial outperformed it in reading speed in both participant groups. By the end of the experiment, LexiaD outperformed Arial, and although the speed of the higher-level cognitive processing (e.g., lexical access) in both fonts did not differ significantly, the feature extraction was found to be better in LexiaD than in Arial. Thus, we found some positive effect of LexiaD when participants with and without dyslexia got accustomed to it. A follow-up study with an explicit exposure session is needed to confirm this conclusion.
- Published
- 2022
50. Storytelling beyond body text in popular science books: a multimodal analysis
- Author
-
Mikko T. Virtanen, Department of Finnish, Finno-Ugrian and Scandinavian Studies, and Finnish Language
- Subjects
Linguistics and Language ,6122 Literature studies ,Communication ,6121 Languages - Abstract
This article examines multimodal storytelling performed outside the main body text in book-length popular science. By employing a case study approach, the study examines three kinds of visually detached stories: i) fragmentary case stories of lay experience, ii) fictional stories of prehistorical lifeworld and iii) scientists’ personal stories with everyday photography. The article shows that detached stories are digressions from the locally, or also globally, dominant ways of making meaning. These digressions can be modality shifts (from verbal to multimodal) but also more complex shifts that engage with cultural boundaries of (popular) science. Furthermore, the article complements prior studies of photographic storytelling by discussing the affordances of everyday photography in public occasions. The analytic framework combines Systemic-Functional genre studies, Social Semiotic and related models of visual design as well as approaches to factuality and fictionality in Narrative Studies. The data consists of contemporary Finnish popular science books representing the natural sciences.
- Published
- 2022