Search Results
2. Syllable Theory and Diachronic Phonology: Vocalism and Consonantism in Turkic Languages
- Author
-
Zeinep Bazarbayeva, Nazgul Ospangaziyeva, Akshay Zhalalova, Kulpash Koptleuova, and Ainur Karshigayeva
- Abstract
Languages that have complex syllable patterns also share linguistic features with each other. These features can be identified through the diachronic paths developed by these syllable patterns. This study aimed to show the universality of syllabemes in Kazakh and other languages, focusing on questions such as the evolution of syllables in the Turkic languages, whether a syllable can be called universal in Turkic languages, and whether the CV-type syllable can be called universal. The study used a qualitative research design to reconstruct linguistic forms in the Turkic languages. This approach is highly valuable for diachronic phonology, which studies existing models of phonological structures and retrospectively determines the proto-language model characteristic of modern languages. This method helps to restore the phonological system of a proto-language by bringing together a synchronous slice of one language or different synchronous slices of several related languages. The method is comparative and typological, and focused on both ancient and modern languages, including Bulgarian, Chuvash, Yakut (ancient) and New Turkic languages such as Azerbaijani, Gagauz, Uzbek, Turkmen, Kazakh and Tatar. The data revealed the dynamism of the Turkic languages, showing that they constantly changed, developed, and improved. A comparative analysis of morphemes in closely related languages was also carried out to make an etymological reconstruction. The results suggest that highly complex syllable structure is a linguistic type distinct from but sharing some characteristics with other proposed holistic phonological types, including stress-timed and consonantal languages. The study contributes to understanding the syllable theory in diachronic development of syllable patterns and syllable structures.
- Published
- 2024
3. Languages for All: Reclaim Your Joy! 2024 Report of the Central States Conference on the Teaching of Foreign Languages
- Author
-
Central States Conference on the Teaching of Foreign Languages (CSCTFL), Cassandra Glynn, and Allison Spenader
- Abstract
The 2024 Central States Conference was held in Minneapolis, Minnesota on March 14-16. This year's theme, Reclaim Your Joy!, reflects the choice we make every day as educators to bring the joy of acquiring a language to our students. Even though the last several years have been fraught with challenges, we are finding ways to bring back the joy into our professional lives through collaborating with our colleagues at conferences as well as engaging with our students and families. Researcher and author, Brené Brown, notes that joy "comes to us in moments -- often ordinary moments." Educators were able to find joy in the ordinary moments of networking at the conference, attending the Friendly Luncheon, honoring the winners of awards, grants and scholarships, and participating in workshops and sessions. The 2024 keynote speech was given by Ryan Smith of Reno, Nevada. Thirteen workshops were offered this year, in addition to the Central States Leadership Academy and the Central States Extension Workshop. We were excited to bring back the Language Immersion Workshops, sponsored by Xperitas, where language educators were able to participate in full-day immersive excursions in the Minneapolis-St. Paul area. More than 170 sessions were offered focusing on joy, proficiency, social justice, classroom activities, teaching strategies, curriculum development, assessment, intercultural competence, advocacy, and technology. Presenters came from over 25 states across the country to share their expertise and knowledge. Additionally, attendees were able to find joy in the Health and Wellness sessions that were offered in order to find the balance between learning and self-care. The Central States Conference Report 2024, Reclaim Your Joy! is a call to language educators to take back what they love the most in educating students on the language, culture and heritage of the languages they teach.
- Published
- 2024
4. An Ethnolinguistic Analysis of Jewellery Names Common in Turkic Languages
- Author
-
Gulsara Kozhakhmetova, Saule Tazhibayeva, Gulgaysha Sagidolda, Lyazzat Beisenbayeva, and Nurgul Abeshova
- Abstract
The jewellery names and the ethnic identity of the Kazakh culture are lexically correlated, as is clearly evident from various ethnolinguistic analyses of jewellery vocabulary. This study aimed to analyze some common jewellery names such as jüzük (ring), biläzük (bracelet), sïr?a (earring), moncak (necklace, beads), tügma (button), belba?, qur, qadis (belt) and jewellery for braids common in Turkic languages. This linguistic journey attempted to uncover the meaning of these jewellery names in different Turkic languages and identify their functions and distinctive features through a comparative method. A qualitative research design with an ethnocultural approach was used to understand the ethnogenetic and cultural aspects of these jewellery names from 26 Turkic languages. The content analysis method was used to categorize them according to their origin and cultural significance. The findings revealed that the Turkic jewellery was of different types, and known by several names in different ancient Turkic languages. It also had sacred power, brought wealth and fertility, possessed healing properties and protected people from evil spirits. This study would help to expand knowledge about the traditional culture of the Turkic peoples.
- Published
- 2024
5. Systematization of the Teaching of the Turkic Languages in Higher Education
- Author
-
Arailym Sarbalina, Zharkynbike Suleimenova, Kunipa Ashinova, Zhaidarkul Belassarova, Balkiya Kassym, and Aiman Koblanova
- Abstract
This study aims to analyze the factors influencing the typology of Turkic words to examine the specifics of the way students learn Turkic languages in higher education institutions. Hypothetico-deductive, survey, and comparative methods were used for the study. Results showed that the learners have trouble constructing oral discourse and do not achieve coherence and cohesion in the field of communication they produce, as they do not integrate the parts into the whole under a structured system. Results also suggest that the systematization of Turkic words and expressions forms the trunk of the numerous Eastern Mediterranean and Asian languages, where most of them are very similar. It is concluded that categorizing words and expressions systematically by teachers through linguistic and pedagogical activities can contribute to one's learning and teaching of Turkic languages.
- Published
- 2024
6. WHAT ABOUT FOREIGN LANGUAGES.
- Author
-
Minnesota State Dept. of Education, St. Paul. and GARTNER, JUDITH
- Abstract
THE IMPORTANCE OF LEARNING A FOREIGN LANGUAGE IS STRESSED IN A BRIEF BROCHURE DESIGNED FOR STUDENTS OF ALL LEVELS, PARENTS, TEACHERS, COUNSELORS, AND ADMINISTRATORS. INFORMATION IS GIVEN ON WHEN TO BEGIN A LANGUAGE, THE IMPORTANCE OF BEING ABLE TO SPEAK A LANGUAGE, USES FOR A FOREIGN LANGUAGE AT HOME, LANGUAGE JOB OPPORTUNITIES, AND LANGUAGE LEARNING AND THE NATIONAL INTEREST. A BIBLIOGRAPHY SUGGESTS SOURCES FOR FURTHER INFORMATION. (AF)
- Published
- 2024
7. The Manifestation of Mythical Cognition in Toponyms: On the Material of the Turkic Languages
- Author
-
Sadirova Kulzat Kanievna, Zhazykova Raushan Balgalievna, Yessenova Kalbike Umirbaevna, Sapina Sabira Minataevna, Mirov Mukhtar Orynbasaruly, and Abdirova Sholpan Gaidarovna
- Abstract
In linguistics, onomastics is the science that studies the history and origin of toponyms, along with their structural aspects. This study aimed to determine the origin of toponyms by comparing their linguistic and ethnocultural, as well as mythical, information. A qualitative research design guided this study. A few toponyms were identified through a random sampling method, including Yrgyz (Irgiz), Burkanbulak, Esik (Yssyk), Auliesu, Zhaiyk (Ural) and Zher-Su, which were collected from etymological, explanatory and mythological dictionaries and collections of mythical texts. The etymological and component analysis methods were applied to study these names. The criteria to select these toponyms were that all should be names of rivers or hydronyms, since river-water was a mythological symbol (the source of life, death and disorder); and that they should occur not only in one language, but in several related languages. The findings revealed that the archetype of each word conformed to phonetic changes. There were also structural connections between these words. Besides, each word had symbolic connotations. This study would provide useful insights about ethnocultural and mythical information of these words and help in a broader understanding of the cultural characteristics.
- Published
- 2024
8. 'Languages Are Not the Barriers': Learning Together through Multilingual Cross-Curricular Poetry in the ESL Classroom
- Author
-
Eyad Kalthoum
- Abstract
The evolving linguistic landscape in 21st century classrooms necessitates a re-evaluation of pedagogical approaches, exploring the potential of multilingual writing techniques within TESOL settings. This article draws on my self-study as a TESOL educator navigating contexts and shifting from an English-only approach in the classroom to an openness of language(s) approach (Ortega, 2019). Following Hamilton's (2018) case study approach, I investigate the feasibility of implementing a multilingual pedagogy in an international school in Toronto and explore its influence on students, teachers, and the learning process across the domains of (CMLA) (Prasad & Lory, 2020). For this paper, I focus on data that highlight and reflect the impact of multilingual pedagogy on students, teachers, and the teaching/learning process. I performed a qualitative thematic analysis and found that multilingual pedagogies benefited students on many levels. I conclude with a personal reflection on both the affordances and challenges of implementing multilingual pedagogies.
- Published
- 2024
9. Vademecum of Artificial Intelligence Tools Applied to the Teaching of Languages
- Author
-
Belén Mateos-Blanco, Eva Álvarez-Ramos, Leyre Alejaldre-Biel, and Milagrosa Parrado-Collantes
- Abstract
A qualitative documentary research of software and multimedia artificial intelligences was chosen to enable the ontological understanding of the object of study that aims to explore the potential for learning through Artificial Intelligence (AI). Besides, it is shown how AI, as a novel component within the digital education landscape, contributes to educational technological capital. In light of our research area, the focus is on determining how AI can be harnessed to enhance the development of communicative competence. This entails identifying and categorizing AI tools relevant to language didactics. The sample size for this study is 120 AI applications, sourcing and compiling data on AI tools from specialized websites. This sample serves as a paradigm justifying a mixed-method study capable of combining quantitative and qualitative data. The variables supporting this study consider four characteristics related to the specific typify of the digital tool in the pedagogical context. The first aspect pertains to the classification of generative AI tools with potential educational use. AI tools enrich and enhance the dimensions of learning, underscoring the urgent need for literacy in this technology. The second variable was linked to the Pedagogical Competences of Teachers and the areas. The category in language education pertains to educators' ability to create, adapt, and employ digital resources that enhance language teaching and learning. The third and fourth identify the skills related to language learning. In the context of learning environments with AI tools, it is essential to contemplate the role of linguistic competence as subordinate to communicative competence.
- Published
- 2024
10. Mapping School-Level Language Policies across Multilingual Secondary Schools in England: An Ecology of English, Modern Languages and Community Languages Policies
- Author
-
Karen Forbes and Nicola Morea
- Abstract
Language plays a crucial role in education; yet, while issues of language are undoubtedly relevant to all teachers, school-level language policies, which aim to provide explicit guidance underpinned by a clear set of principles, are too often conspicuous by their absence. In a range of educational contexts around the world it has been found that where such policies do exist, they are frequently fragmented and underpinned by monolingual ideologies that do not reflect the linguistic diversity of schools today. The aim of this study, therefore, is to map the provision of school-level policies from a representative sample of secondary schools in England (n = 998) and explore the extent to which they address (either implicitly or explicitly) the following dimensions of language: (a) English, both as the language of instruction and in relation to support for English as an additional language (EAL) learners; (b) modern languages in the curriculum; and (c) other home or community languages. Drawing on an ecologically informed approach, where these three dimensions of language are conceptualised as systems, analysis was conducted to identify areas of divergence and (potential for) intersection. Findings suggest that policies relating to languages, where they exist, are largely compartmentalised and tensions emerged between the various systems. However, we also note several promising points of intersection which indicate that there is scope for developing cohesive and holistic languages policies at a whole-school level.
- Published
- 2024
11. On the Universality of the Subject Preference in the Acquisition of Relative Clauses across Languages
- Author
-
Nozomi Tanaka, Elaine Lau, and Alan L. F. Lee
- Abstract
Subject relative clauses (RCs) have been shown to be acquired earlier, comprehended more accurately, and produced more easily than object RCs by children. While this subject preference is often claimed to be a universal tendency, it has largely been investigated piecemeal and with low-powered experiments. To address these issues, this meta-analysis follows an established and rigorous scientific method to test the generalizability of the subject preference in RC acquisition by evaluating the collective evidence. While the results show a significant crosslinguistic subject preference, there is a large amount of heterogeneity in the data. The manifestation of this subject preference may not be uniform across languages, depending on typological properties such as language headedness, RC headedness, and main clause similarity. The true impact of these features, however, requires research on more typologically diverse languages.
- Published
- 2024
12. The Languages of China
- Author
-
RAMSEY, S. ROBERT
- Published
- 2024
13. Individual Word and Phrase Frequency Effects in Collocational Processing: Evidence from Typologically Different Languages, English and Turkish
- Author
-
Dogus Öksüz, Vaclav Brezina, Padraic Monaghan, and Patrick Rebuschat
- Abstract
Collocations are understood to be integral building blocks of language processing, alongside individual words, but thus far evidence for the psychological reality of collocations has tended to be confined to English. In contrast to English, Turkish is an agglutinating language, utilizing productive morphology to convey complex meanings using a single word. Given this, we expected Turkish speakers to be less sensitive to phrasal frequencies than English speakers. In Study 1, we conducted a corpus analysis of translation-equivalent adjective-noun collocations (e.g., front door) and found differences between the two languages in frequency counts. In Study 2, we conducted a reaction time experiment to determine the sensitivity of native speakers of English and Turkish to the frequency of adjectives, nouns, and whole collocations. Turkish speakers were less sensitive to whole-phrase frequencies, as predicted, indicating that collocations are processed less holistically in Turkish than English. Both groups demonstrated that processing collocations involves combining information about individual words and phrases. Taken together, we show that speakers are sensitive to frequency information at multiple grain sizes that are attuned to the typology of different languages.
- Published
- 2024
14. Teacher Perspectives on the Introduction of Linguistics in the Languages Classroom: Evidence from a Co-Creation Project on French, German and Spanish
- Author
-
Michelle Sheehan, Anna D. Havinga, Jonathan R. Kasstan, Sascha Stollhans, Alice Corr, and Peter Gillman
- Abstract
Linguistics is conspicuously absent from language teaching in UK schools. A-level cultural topics cover a range of themes such as cyber-society, cultural heritage and multiculturalism, but the approach taken to these topics is not informed by linguistics. In previous work, we have argued that this is an unfortunate omission not only because linguistics is appealing to many language students and perceived by them to be useful, but also because the existing cultural topics could be significantly enriched by the inclusion of the critical/analytical study of language itself. In this paper, we provide concrete examples of how linguistics can be integrated into the existing A-level curriculum for Modern Foreign Languages (MFL) in England and Wales. Reporting on a project in which teachers trialled linguistics materials co-created by us (a group of academics) and experienced languages teachers, we present evidence that linguistics materials are perceived to be both highly novel and nonetheless compatible with the existing A-level curriculum. Data from questionnaires and semi-structured interviews with participating teachers also show that: (i) these new materials can be taught with little or no prior experience of linguistics; and (ii) adding linguistics materials to the curriculum leads to significant impacts on teacher and pupil attitudes towards language(s). Despite some challenges, which we also discuss, the results highlight again the great potential of linguistics as a component of language teaching and the contribution that it can make to the enrichment of the discipline.
- Published
- 2024
15. Different Languages, Different Mathematics Learning
- Author
-
Margarida César and Ricardo Machado
- Abstract
Culture shapes pupils' mathematical learning, their performances and life trajectories of participation (César, 2013a, 2013b). It also contributes to the senses they attribute to mathematical learning (Bakhtin, 1929/1981). Using collaborative work and interempowerment mechanisms facilitates knowledge appropriation (César, 2009). This is particularly important for pupils participating in minority cultures, socially undervalued and whose L1 is not the instruction language. Bi-univocal culture mediation (César, 2017b) is important regarding empowerment. We used an instrument to evaluate pupils' abilities and competencies (IACC), conceived by the "Interaction and Knowledge" (IK) team (Machado, 2014), and other mathematical tasks. The goal we address is to trace the differences between their approaches to problems, mathematical reasoning and solving strategies used by pupils whose L1 is ideographic (Creole, Cape Verde) or phonetic (Portuguese). We developed an intrinsic case study (Stake, 1995). The main participants are the pupils from almost 600 classes (all over Portugal and Cape Verde) who participated in the IK. The analysis of some examples illustrates that L1 shapes pupils' approaches to problems, mathematical reasoning and solving strategies. This evidence plays an important role in their access to school achievement and in teachers' understanding about how they can promote pupils' mathematical learning.
- Published
- 2024
16. Multi-Lingual Development & Programming Languages Interoperability: An Empirical Study
- Author
-
Cherny-Shahar, Tsvi and Yehudai, Amiram
- Subjects
Computer Science - Programming Languages
- Abstract
As part of a research on a novel in-process multiprogramming-language interoperability system, this study investigates the interoperability and usage of multiple programming languages within a large dataset of GitHub projects and Stack Overflow Q&A. It addresses existing multi-lingual development practices and interactions between programming languages, focusing on in-process multi-programming language interoperability. The research examines a dataset of 414,486 GitHub repositories, 22,156,001 Stack Overflow questions from 2008-2021 and 173 interoperability tools. The paper's contributions include a comprehensive dataset, large-scale analysis, and insights into the prevalence, dominant languages, interoperability tools, and related issues in multi-language programming. The paper presents the research results, shows that C is a central pillar in programming language interoperability, and outlines simple interoperability guidelines. These findings and guidelines contribute to our multi-programming language interoperability system research, also laying the groundwork for other systems and tools by suggesting key features for future interoperability tools. Comment: 26 pages, includes supplement
- Published
- 2024
17. The Equivalence Problem of E-Pattern Languages with Regular Constraints is Undecidable
- Author
-
Nowotka, Dirk and Wiedenhöft, Max
- Subjects
Computer Science - Formal Languages and Automata Theory; Computer Science - Computational Complexity; Mathematics - Combinatorics; 68R15; F.4.3
- Abstract
Patterns are words with terminals and variables. The language of a pattern is the set of words obtained by uniformly substituting all variables with words that contain only terminals. Regular constraints restrict valid substitutions of variables by associating with each variable a regular language representable by, e.g., finite automata. Pattern languages with regular constraints contain only words in which each variable is substituted according to a set of regular constraints. We consider the membership, inclusion, and equivalence problems for erasing and non-erasing pattern languages with regular constraints. Our main result shows that the erasing equivalence problem, one of the most prominent open problems in the realm of patterns, becomes undecidable if regular constraints are allowed in addition to variable equality. Comment: 13 pages with references, 1 table, accepted and published at CIAA 2024. arXiv admin note: substantial text overlap with arXiv:2411.06904
- Published
- 2024
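The definition in the abstract above (uniform substitution of variables, optionally restricted by a regular language per variable) can be illustrated with a small membership check. This is only a brute-force sketch and not the authors' construction: the function name `matches`, the convention that uppercase symbols are variables, and the use of Python regular expressions to stand in for regular constraints are all assumptions made here for illustration.

```python
import re


def matches(pattern, word, constraints=None, erasing=True):
    """Return True if `word` belongs to the language of `pattern`.

    pattern:     sequence of symbols; uppercase strings are treated as
                 variables, everything else as terminals (assumed convention).
    constraints: optional dict mapping a variable to a regular expression
                 that the word substituted for it must fully match.
    erasing:     if True, a variable may be replaced by the empty word.
    """
    constraints = constraints or {}
    min_len = 0 if erasing else 1

    def go(i, pos, subst):
        if i == len(pattern):
            return pos == len(word)
        sym = pattern[i]
        if not sym.isupper():  # terminal: must appear literally
            return word.startswith(sym, pos) and go(i + 1, pos + len(sym), subst)
        if sym in subst:  # bound variable: reuse the same substitution
            val = subst[sym]
            return word.startswith(val, pos) and go(i + 1, pos + len(val), subst)
        regex = constraints.get(sym)
        for end in range(pos + min_len, len(word) + 1):
            val = word[pos:end]
            if regex is not None and re.fullmatch(regex, val) is None:
                continue  # candidate violates the variable's constraint
            subst[sym] = val
            if go(i + 1, end, subst):
                return True
            del subst[sym]  # backtrack and try a longer candidate
        return False

    return go(0, 0, {})
```

For example, `matches(["X", "a", "X"], "bab")` holds with X replaced by `b`, while adding the constraint `{"X": "c*"}` rules that substitution out. The search is exponential in the worst case; it only demonstrates the membership question, not the inclusion or equivalence problems the paper studies.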
18. The Equivalence Problem of E-Pattern Languages with Length Constraints is Undecidable
- Author
-
Nowotka, Dirk and Wiedenhöft, Max
- Subjects
Computer Science - Formal Languages and Automata Theory; Computer Science - Computational Complexity; Mathematics - Combinatorics; 68R15; F.4.3
- Abstract
Patterns are words with terminals and variables. The language of a pattern is the set of words obtained by uniformly substituting all variables with words that contain only terminals. Length constraints restrict valid substitutions of variables by associating the variables of a pattern with a system (or disjunction of systems) of linear Diophantine inequalities. Pattern languages with length constraints contain only words in which all variables are substituted with words whose lengths fulfill such a given set of length constraints. We consider membership, inclusion, and equivalence problems for erasing and non-erasing pattern languages with length constraints. Our main result shows that the erasing equivalence problem, one of the most prominent open problems in the realm of patterns, becomes undecidable if length constraints are allowed in addition to variable equality. Additionally, it is shown that the terminal-free inclusion problem, another prominent open problem in the realm of patterns, is also undecidable in this setting. It is also shown that considering regular constraints, i.e., associating variables also with regular languages as additional restrictions together with length constraints for valid substitutions, results in undecidability of the non-erasing equivalence problem. This sets a first upper bound on constraints to obtain undecidability in this case, as this problem is trivially decidable in the case of no constraints and as it has unknown decidability if only regular or only length constraints are considered. Comment: 32 pages including appendix, 2 tables, submitted to CPM 2025
- Published
- 2024
19. Topoi of automata I: Four topoi of automata and regular languages
- Author
-
Hora, Ryuya
- Subjects
Computer Science - Formal Languages and Automata Theory; Mathematics - Category Theory; Mathematics - Logic; 18F10, 68Q70, 20M35, 18B20
- Abstract
Both topos theory and automata theory are known for their multi-faceted nature and relationship with topology, algebra, logic, and category theory. This paper aims to clarify the topos-theoretic aspects of automata theory, particularly demonstrating through two main theorems how regular (and non-regular) languages arise in topos-theoretic calculation. First, it is shown that the four different notions of automata form four types of Grothendieck topoi, illustrating how the technical details of automata theory are described by topos theory. Second, we observe that the four characterizations of regular languages (DFA, Myhill-Nerode theorem, finite monoids, profinite words) provide Morita-equivalent definitions of a single Boolean-ringed topos, situating this within the context of Olivia Caramello's 'Toposes as Bridges.' This paper also serves as a preparation for follow-up papers, which deal with the relationship between hyperconnected geometric morphisms and algebraic/geometric aspects of formal language theory. Comment: 16 pages, comments welcome, v2: a reference is added
- Published
- 2024
20. Scheduling Languages: A Past, Present, and Future Taxonomy
- Author
-
Hall, Mary, Oancea, Cosmin, Elster, Anne C., Rasch, Ari, Joshi, Sameeran, Tavakkoli, Amir Mohammad, and Schulze, Richard
- Subjects
Computer Science - Programming Languages; Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Performance
- Abstract
Scheduling languages express to a compiler a sequence of optimizations to apply. Compilers that support a scheduling language interface allow exploration of compiler optimizations, i.e., exploratory compilers. While scheduling languages have become a common feature of tools for expert users, the proliferation of these languages without unifying common features may be confusing to users. Moreover, we recognize a need to organize the compiler developer community around common exploratory compiler infrastructure, and future advances to address, for example, data layout and data movement. To support a broader set of users may require raising the level of abstraction. This paper provides a taxonomy of scheduling languages, first discussing their origins in iterative compilation and autotuning, noting the common features and how they are used in existing frameworks, and then calling for changes to increase their utility and portability.
- Published
- 2024
21. On Quantum Programming Languages
- Author
-
Valiron, Benoît
- Subjects
Computer Science - Logic in Computer Science; Computer Science - Programming Languages
- Abstract
This thesis (Habilitation à diriger des recherches) presents some of my research contributions since my Ph.D. defense in 2008. I have had the chance to participate in the development of quantum programming languages since their early developments: the presentation aims to present my point of view on the evolution of the subject, my contributions, and the current research trends in the community. The target audience is a graduate student interested in pointers to the field of quantum programming languages. Comment: 127 pages. French "Habilitation à diriger des recherches" (HDR), presented on September 24, 2024 at Université Paris Saclay. On Hal repository: tel-04740855
- Published
- 2024
22. Expressivity of Linear Temporal Logic for Pomset Languages of Higher Dimensional Automata
- Author
-
Clement, Emily, Erlich, Enzo, and Ledent, Jérémy
- Subjects
Computer Science - Formal Languages and Automata Theory
- Abstract
Temporal logics are a powerful tool to specify properties of computational systems. For concurrent programs, Higher Dimensional Automata (HDA) are a very expressive model of non-interleaving concurrency. HDA recognize languages of partially ordered multisets, or pomsets. Recent work has shown that Monadic Second Order (MSO) logic is as expressive as HDA for pomset languages. In this paper, we investigate the class of pomset languages that are definable in First Order (FO) logic. As expected, this is a strict subclass of MSO-definable languages. In the case of words, Kamp's theorem states that FO is as expressive as Linear Temporal Logic (LTL). Our aim is to prove a variant of Kamp's theorem for pomset languages. Thus, we define a temporal logic called Sparse Pomset Temporal Logic (SPTL), and show that it is equivalent to FO, when there is no autoconcurrency. Comment: 18 pages + references + 2 pages appendix, submitted to FoSSaCS 2025
- Published
- 2024
23. Subsequence Matching and Analysis Problems for Formal Languages
- Author
-
Fazekas, Szilárd Zsolt, Koß, Tore, Manea, Florin, Mercaş, Robert, and Specht, Timo
- Subjects
Computer Science - Formal Languages and Automata Theory; Computer Science - Data Structures and Algorithms; 68Q45; F.4.3; F.2.2
- Abstract
In this paper, we study a series of algorithmic problems related to the subsequences occurring in the strings of a given language, under the assumption that this language is succinctly represented by a grammar generating it, or an automaton accepting it. In particular, we focus on the following problems: Given a string $w$ and a language $L$, does there exist a word of $L$ which has $w$ as a subsequence? Do all words of $L$ have $w$ as a subsequence? Given an integer $k$ alongside $L$, does there exist a word of $L$ which has all strings of length $k$, over the alphabet of $L$, as subsequences? Do all words of $L$ have all strings of length $k$ as subsequences? For the last two problems, efficient algorithms were already presented in [Adamson et al., ISAAC 2023] for the case when $L$ is a regular language, and efficient solutions can be easily obtained for the first two problems. We extend that work as follows: we give sufficient conditions on the class of input languages, under which these problems are decidable; we provide efficient algorithms for all these problems in the case when the input language is context-free; we show that all problems are undecidable for context-sensitive languages. Finally, we provide a series of initial results related to a class of languages that strictly includes the regular languages and is strictly included in the class of context-sensitive languages, but is incomparable to the class of context-free languages; these results deviate significantly from those reported for language classes from the Chomsky hierarchy. Comment: Abstract to be published in the proceedings of ISAAC 2024
- Published
- 2024
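The four questions posed in the abstract above can be made concrete for the simplest possible input, a language given as an explicit finite list of words. That representation is an assumption made here purely for illustration; the paper works with grammars and automata, and its algorithms are not reproduced. The sketch uses the standard greedy "arch" scan: a word contains every string of length $k$ over an alphabet as a subsequence exactly when it can be cut into at least $k$ consecutive blocks that each contain every letter. All function names are hypothetical.

```python
def is_subsequence(w, word):
    """True if w can be obtained from word by deleting letters."""
    it = iter(word)
    return all(ch in it for ch in w)  # each `in` consumes the iterator


def universality_index(word, alphabet):
    """Largest k such that every length-k string over `alphabet`
    is a subsequence of `word` (greedy arch factorization)."""
    alphabet = set(alphabet)
    seen, arches = set(), 0
    for ch in word:
        if ch in alphabet:
            seen.add(ch)
            if seen == alphabet:  # one full arch completed
                arches += 1
                seen = set()
    return arches


def some_word_has(language, w):
    """Does some word of the (finite) language have w as a subsequence?"""
    return any(is_subsequence(w, word) for word in language)


def all_words_have(language, w):
    """Do all words of the (finite) language have w as a subsequence?"""
    return all(is_subsequence(w, word) for word in language)


def some_word_k_universal(language, alphabet, k):
    return any(universality_index(word, alphabet) >= k for word in language)


def all_words_k_universal(language, alphabet, k):
    return all(universality_index(word, alphabet) >= k for word in language)
```

For instance, with the finite language `["abba", "aabb"]` over the alphabet `{a, b}`, some word contains every string of length 2 as a subsequence (`abba`), but not all do, because `aabb` does not contain `ba`.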
24. Leroy: Library Learning for Imperative Programming Languages
- Author
-
Bellur, Abhiram, Alghamdi, Razan, Workneh, Kidus, and Izraelevitz, Joseph
- Subjects
Computer Science - Programming Languages
- Abstract
Library learning is the process of building a library of common functionalities from a given set of programs. Typically, this process is applied in the context of aiding program synthesis: concise functions can help the synthesizer produce modularized code that is smaller in size. Previous work has focused on functional Lisp-like languages, as their regularity makes them more amenable to extracting repetitive structures. Our work introduces Leroy, which extends existing library learning techniques to imperative higher-level programming languages, with the goal of facilitating reusability and ease of maintenance. Leroy wraps the existing Stitch framework for library learning and converts imperative programs into a Lisp-like format using the AST. Our solution uses Stitch to do a top-down, corpus-guided extraction of repetitive expressions. Further, we prune abstractions that cannot be implemented in the programming language and convert the best abstractions back to the original language. We implement our technique in a tool for a subset of the Python programming language and evaluate it on a large corpus of programs. Leroy achieves a compression ratio of 1.04x of the original code base, with a slight expansion when the library is included. Additionally, we show that our technique prunes invalid abstractions. Comment: Presented at the 5th Intl. Wkshp. on Human Aspects of Types and Reasoning Assistants (HATRA). Pasadena, CA, USA. 2024
- Published
- 2024
25. It's Not Easy Being Green: On the Energy Efficiency of Programming Languages
- Author
-
van Kempen, Nicolas, Kwon, Hyuk-Je, Nguyen, Dung Tuan, and Berger, Emery D.
- Subjects
Computer Science - Programming Languages, Computer Science - Performance
- Abstract
Does the choice of programming language affect energy consumption? Previous highly visible studies have established associations between certain programming languages and energy consumption. A causal misinterpretation of this work has led academics and industry leaders to use or support certain languages based on their claimed impact on energy consumption. This paper tackles this causal question directly. It first corrects and improves the measurement methodology used by prior work. It then develops a detailed causal model capturing the complex relationship between programming language choice and energy consumption. This model identifies and incorporates several critical but previously overlooked factors that affect energy usage. These factors, such as distinguishing programming languages from their implementations, the impact of the application implementations themselves, the number of active cores, and memory activity, can significantly skew energy consumption measurements if not accounted for. We show -- via empirical experiments, improved methodology, and careful examination of anomalies -- that when these factors are controlled for, notable discrepancies in prior work vanish. Our analysis suggests that the choice of programming language implementation has no significant impact on energy consumption beyond execution time., Comment: 18 pages
- Published
- 2024
26. Exploring Error Types in Formal Languages Among Students of Upper Secondary Education
- Author
-
Schmellenkamp, Marko, Stanglmair, Dennis, Michaeli, Tilman, and Zeume, Thomas
- Subjects
Computer Science - Computers and Society, Computer Science - Formal Languages and Automata Theory
- Abstract
Foundations of formal languages, as a subfield of theoretical computer science, are part of typical upper secondary education curricula. There is very little research on the potential difficulties that students at this level have with this subject. In this paper, we report on an exploratory study of errors in formal languages among upper secondary education students. We collect the data by posing exercises in an intelligent tutoring system and analyzing student input. Our results suggest a) instances of non-functional understanding of concepts such as the empty word or a grammar as a substitution system; b) strategic problems such as lack of foresight when deriving a word or confounding formal specifications with real-world knowledge on certain aspects; and c) various syntactic problems. These findings can serve as a starting point for a broader understanding of how and why students struggle with this topic.
- Published
- 2024
27. Various Types of Comet Languages and their Application in External Contextual Grammars
- Author
-
Ködding, Marvin and Truthe, Bianca
- Subjects
Computer Science - Formal Languages and Automata Theory, F.4.2, F.4.3
- Abstract
In this paper, we continue the research on the power of contextual grammars with selection languages from subfamilies of the family of regular languages. We investigate various comet-like types of languages and compare such language families to some other subregular families of languages (finite, monoidal, nilpotent, combinational, (symmetric) definite, ordered, non-counting, power-separating, suffix-closed, commutative, circular, or union-free languages). Further, we compare the language families defined by these types for the selection with each other and with the families of the hierarchy obtained for external contextual grammars. In this way, we extend the existing hierarchy by new language families., Comment: In Proceedings NCMA 2024, arXiv:2409.06120
- Published
- 2024
- Full Text
- View/download PDF
28. Operational State Complexity of Block Languages
- Author
-
Duarte, Guilherme, Moreira, Nelma, Prigioniero, Luca, and Reis, Rogério
- Subjects
Computer Science - Formal Languages and Automata Theory
- Abstract
In this paper we consider block languages, namely sets of words having the same length, and study the deterministic and nondeterministic state complexity of several operations on these languages. Being a subclass of finite languages, the upper bounds of operational state complexity known for finite languages apply for block languages as well. However, in several cases, smaller values were found. Block languages can be represented as bitmaps, which are a good tool to study their minimal finite automata and their operations, as we illustrate here., Comment: In Proceedings NCMA 2024, arXiv:2409.06120
- Published
- 2024
- Full Text
- View/download PDF
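The abstract above says block languages (sets of words that all have the same length) can be represented as bitmaps. The sketch below shows one plausible encoding, assuming the words of the given length are enumerated in lexicographic order and bit i is set when the i-th word belongs to the language. This ordering and the names `to_bitmap`, `bitmap_union`, and `bitmap_complement` are assumptions for illustration; the listing does not state the paper's exact construction.

```python
from itertools import product


def to_bitmap(words, alphabet, length):
    """Encode a block language as a list of 0/1 flags, one per word of
    the given length, in lexicographic order of `alphabet`."""
    all_words = ["".join(p) for p in product(alphabet, repeat=length)]
    index = {w: i for i, w in enumerate(all_words)}
    bits = [0] * len(all_words)
    for w in words:
        bits[index[w]] = 1  # raises KeyError if w has the wrong length or letters
    return bits


def bitmap_union(b1, b2):
    """Union of two block languages with the same alphabet and length."""
    return [x | y for x, y in zip(b1, b2)]


def bitmap_complement(b):
    """Complement within the block (all words of that length)."""
    return [1 - x for x in b]
```

With this encoding, Boolean operations on block languages of equal length reduce to bitwise operations on their bitmaps.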
29. Benchmarking LLM Code Generation for Audio Programming with Visual Dataflow Languages
- Author
-
Zhang, William, Leon, Maria, Xu, Ryan, Cardenas, Adrian, Wissink, Amelia, Martin, Hanna, Srikanth, Maya, Dorogi, Kaya, Valadez, Christian, Perez, Pedro, Grijalva, Citlalli, Zhang, Corey, and Santolucito, Mark
- Subjects
Computer Science - Software Engineering, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Programming Languages
- Abstract
Node-based programming languages are increasingly popular in media arts coding domains. These languages are designed to be accessible to users with limited coding experience, allowing them to achieve creative output without an extensive programming background. Using LLM-based code generation to further lower the barrier to creative output is an exciting opportunity. However, the best strategy for code generation for visual node-based programming languages is still an open question. In particular, such languages have multiple levels of representation in text, each of which may be used for code generation. In this work, we explore the performance of LLM code generation in audio programming tasks in visual programming languages at multiple levels of representation. We explore code generation through metaprogramming code representations for these languages (i.e., coding the language using a different high-level text-based programming language), as well as through direct node generation with JSON. We evaluate code generated in this way for two visual languages for audio programming on a benchmark set of coding problems. We measure both correctness and complexity of the generated code. We find that metaprogramming results in more semantically correct generated code, given that the code is well-formed (i.e., is syntactically correct and runs). We also find that prompting for richer metaprogramming using randomness and loops led to more complex code.
- Published
- 2024
30. Random Graph Generation in Context-Free Graph Languages
- Author
-
Vastarini, Federico and Plump, Detlef
- Subjects
Computer Science - Logic in Computer Science, Computer Science - Formal Languages and Automata Theory
- Abstract
We present a method for generating random hypergraphs in context-free hypergraph languages. It is obtained by adapting Mairson's generation algorithm for context-free string grammars to the setting of hyperedge replacement grammars. Our main results are that for non-ambiguous hyperedge replacement grammars, the method generates hypergraphs uniformly at random and in quadratic time. We illustrate our approach by a running example of a hyperedge replacement grammar generating term graphs., Comment: In Proceedings DCM 2023, arXiv:2409.19298
- Published
- 2024
- Full Text
- View/download PDF
31. A Bigger Picture of Early Literacy and Biliteracy Acquisition in Abugidas: Perspectives from Asian and African Languages
- Author
-
Jialin Lai, Juan F. Quinonez-Beltran, and R. Malatesha Joshi
- Abstract
Given the overwhelmingly "Anglocentric" or "alphabetocentric" science of reading, the current review aims to add to the science of reading acquisition from the perspective of the abugidic writing system, which differs from the well-researched alphabetic writing system in multiple dimensions of orthographic complexity proposed by Daniels and Share (2018), such as linguistic distance, spatial arrangement and non-linearity, and omission of phonological elements. The abugidic writing system comprises scripts in which each base consonant symbol denotes a consonant with an inherent vowel (/a/), and it has billions of users in South Asia (e.g., India, Nepal, Sri Lanka), Southeast Asia (e.g., Thailand, Laos, Cambodia), East Asia (parts of China), and Africa (Ethiopia and Eritrea). The current review describes the orthographic features of Indic (Brahmi-derived) and Ethiopic (Ge'ez) scripts within the abugidic writing system and synthesizes existing findings on the literacy acquisition patterns specific to each script. Further, we elaborate on the multilingual and biscriptal language and literacy environment that characterizes abugida-writing societies and discuss the theoretical implications of considering multilingualism and biscriptality as an inseparable sociolinguistic factor when understanding literacy acquisition in the abugidic writing system in particular and literacy in general.
- Published
- 2024
- Full Text
- View/download PDF
32. Deegen: A JIT-Capable VM Generator for Dynamic Languages
- Author
-
Xu, Haoran and Kjolstad, Fredrik
- Subjects
Computer Science - Programming Languages
- Abstract
Building a high-performance JIT-capable VM for a dynamic language has traditionally required a tremendous amount of time, money, and expertise. We present Deegen, a meta-compiler that allows users to generate a high-performance JIT-capable VM for their own language at an engineering cost similar to writing a simple interpreter. Deegen takes in the execution semantics of the bytecodes implemented as C++ functions, and automatically generates a two-tier VM execution engine with a state-of-the-art interpreter, a state-of-the-art baseline JIT, and the tier-switching logic that connects them into a self-adaptive system. We are the first to demonstrate the automatic generation of a JIT compiler, and the automatic generation of an interpreter that outperforms the state of the art. Our performance comes from a long list of optimizations supported by Deegen, including bytecode specialization and quickening, register pinning, tag register optimization, call inline caching, generic inline caching, JIT polymorphic IC, JIT IC inline slab, type-check removal and strength reduction, type-based slow-path extraction and outlining, JIT hot-cold code splitting, and JIT OSR-entry. These optimizations are either employed automatically, or guided by the language implementer through intuitive APIs. As a result, the disassembly of the Deegen-generated interpreter, baseline JIT, and the generated JIT code rivals the assembly code hand-written by experts in state-of-the-art VMs. We implement LuaJIT Remake (LJR), a standard-compliant Lua 5.1 VM, using Deegen. Across 44 benchmarks, LJR's interpreter is on average 179% faster than the official PUC Lua interpreter, and 31% faster than LuaJIT's interpreter. LJR's baseline JIT has negligible startup delay, and its execution performance is on average 360% faster than PUC Lua and only 33% slower (but faster on 13/44 benchmarks) than LuaJIT's optimizing JIT.
- Published
- 2024
33. Geodesic languages for rational subsets and conjugates in virtually free groups
- Author
-
Carvalho, André and Silva, Pedro V.
- Subjects
Mathematics - Group Theory, Computer Science - Formal Languages and Automata Theory
- Abstract
We prove that a subset of a virtually free group is rational if and only if the language of geodesic words representing its elements (in any generating set) is rational and that the language of geodesics representing conjugates of elements in a rational subset of a virtually free group is context-free. As a corollary, the doubly generalized conjugacy problem is decidable for rational subsets of finitely generated virtually free groups: there is an algorithm taking as input two rational subsets $K_1$ and $K_2$ of a virtually free group that decides whether there is one element of $K_1$ conjugate to an element of $K_2$. For free groups, we prove that the same problem is decidable with rational constraints on the set of conjugators., Comment: 16 pages, comments are welcome
- Published
- 2024
34. Crux, a Precise Verifier for Rust and Other Languages
- Author
-
Pernsteiner, Stuart, Diatchki, Iavor S., Dockins, Robert, Dodds, Mike, Hendrix, Joe, Ravich, Tristan, Redmond, Patrick, Scott, Ryan, and Tomb, Aaron
- Subjects
Computer Science - Programming Languages
- Abstract
We present Crux, a cross-language verification tool for Rust and C/LLVM. Crux targets bounded, intricate pieces of code that are difficult for humans to get right: for example, cryptographic modules and serializer / deserializer pairs. Crux builds on the same framework as the mature SAW-Cryptol toolchain, but Crux provides an interface where proofs are phrased as symbolic unit tests. Crux is designed for use in production environments, and has already seen use in industry. In this paper, we focus on Crux-MIR, our verification tool for Rust. Crux-MIR provides a bit-precise model of safe and unsafe Rust which can be used to check both inline properties about Rust code, and extensional equality to executable specifications written in Cryptol or in the hacspec dialect of Rust. Notably, Crux-MIR supports compositional reasoning, which is necessary to scale to even moderately complex proofs. We demonstrate Crux-MIR by verifying the Ring library implementations of SHA1 and SHA2 against pre-existing functional specifications. Crux is available at https://crux.galois.com.
- Published
- 2024
35. A Qualitative Analysis of Pearson-Assured Accreditation Processes in Schools of Foreign Languages in Türkiye
- Author
-
Sebahat Çakirlar and Demet Yayli
- Abstract
The purpose of this study is to find out the views of a group of instructors and administrators of Schools of Foreign Languages (SFLs) on the Pearson Assured (PA) accreditation process in terms of the quality management of their institutions. To achieve this purpose, we employed document review and semi-structured interviews. Accordingly, we analyzed the data using both document analysis and content analysis. The document analysis showed that the PA accreditation provides basic quality measurements with examples so that institutions can present their quality performances with evidence in order to ensure that the requirements under several headings are met. The qualitative content analysis of the verbal data captured in interviews revealed a change in the participants' views over time. Despite the partially negative opinions on the process held before and during accreditation, and despite the high workload and immense amount of time required to provide the necessary evidence, the participants generally had favorable opinions of PA accreditation and stated their wishes for it to continue, believing that the accreditation process contributes to the quality management of their institutions as a whole.
- Published
- 2024
36. BPE and morphologically segmented phrase based statistical machine translation system for Indian languages to resource constrained language Bodo
- Author
-
Narzary, Sanjib, Brahma, Maharaj, Nandi, Sukumar, and Som, Bidisha
- Published
- 2024
- Full Text
- View/download PDF
37. Theory languages in designing artificial intelligence
- Author
-
Saariluoma, Pertti and Karvonen, Antero
- Published
- 2024
- Full Text
- View/download PDF
38. Languages and social cohesion: A transdisciplinary literature review
- Published
- 2024
39. RAN and Two Languages: A Meta-Analysis of the RAN-Reading Relationship in Bilingual Children
- Author
-
Victoria Kishchak, Anna Ewert, Paulina Halczak, Pawel Kleka, and Marcin Szczerbinski
- Abstract
RAN (Rapid Automatized Naming) is known to be a robust predictor of reading development in different languages. Much less is known about RAN predictive power in bilingual contexts. This is the first meta-analysis of research with bilingual children, assessing the strength of the RAN-reading relationship both within and across languages. It also explored the moderators that may affect this relationship. The search identified 38 published studies of bilingual children with 47 samples, 313 effect sizes and 5312 participants. Analyses of random-effects models with robust variance estimation revealed weak-to-moderate overall effect sizes of RAN and reading concurrently (r = -0.39) and longitudinally (r = -0.38). Moderator analyses of concurrent correlations revealed that RAN correlated more strongly with reading fluency (r = -0.56) than accuracy (r = -0.38). Alphanumeric RAN tasks (digits r = -0.39, letters r = -0.42) showed stronger associations with reading than non-alphanumeric RAN tasks (objects r = -0.38, colors r = -0.25). RAN-reading correlation was statistically significant both within and across languages. It was somewhat weaker when the two skills were measured in different languages (rL1RAN-L2 reading = -0.34, rL2RAN-L1 reading = -0.36) compared to when they were measured in the same language (rL1 = -0.40, rL2 = -0.44), though those differences failed to reach statistical significance. In addition, the type of bilingualism was found to be a potential moderator of the RAN-reading relationship longitudinally, with its magnitude being the strongest in simultaneous bilinguals. Our results suggest that, as a predictor, RAN taps into general, language-independent processes underlying reading.
- Published
- 2024
- Full Text
- View/download PDF
40. Universal and Language-Specific Connected Speech Characteristics of Bilingual Speakers with Alzheimer's Disease: Insights from Case Studies of Structurally Distinct Languages
- Author
-
Manaswita Dutta, Tina M. D. Mello, Yesi Cheng, Niladri Sekhar Dash, Ranita Nandi, Aparna Dutt, and Arpita Bose
- Abstract
Purpose: Connected speech analysis has been effectively utilized for the diagnosis and disease monitoring of individuals with Alzheimer's disease (AD). Existing research has been conducted mostly in monolingual English speakers with a noticeable lack of evidence from bilinguals and non-English speakers, particularly in non-European languages. Using a case study approach, we characterized connected speech profiles of two Bengali--English bilingual speakers with AD to determine the universal features of language impairments in both languages, identify language-specific differences between the languages, and explore language impairment characteristics of the participants with AD in relation to their bilingual language experience. Method: Participants included two Bengali--English bilingual speakers with AD and a group of age-, gender-, education-, and language-matched neurologically healthy controls. Connected speech samples were collected in first language (L1; Bengali) and second language (L2; English) using a novel storytelling task (i.e., Frog, Where Are You?). These samples were analyzed using an augmented quantitative production analysis and correct information unit analyses for productivity, fluency, syntactic and morphosyntactic features, and lexical and semantic characteristics. Results: Irrespective of the language, AD impacted speech productivity (speech rate and fluency) and semantic characteristics in both languages. Unique language-specific differences were noted on syntactic measures (reduced sentence length in Bengali), lexical distribution (fewer pronouns and absence of reduplication in Bengali), and inflectional properties (no difficulties with noun or verb inflections in Bengali). Among the two participants with AD, the individual who showed lower proficiency and usage in L2 (English) demonstrated reduced syntactic complexity and morphosyntactic richness in English. 
Conclusions: Evidence from these case studies suggests that language impairment features in AD are not universal across languages, particularly in comparison to impairments typically associated with language breakdowns in English. This study underscores the importance of establishing connected speech profiles in AD for non--English-speaking populations, especially for structurally different languages. This would in turn lead to the development of language-specific markers that can facilitate early detection of language deterioration and aid in improving diagnosis of AD in individuals belonging to underserved linguistically diverse populations.
- Published
- 2024
- Full Text
- View/download PDF
41. Distribution of Reconfiguration Languages maintaining Tree-like Communication Topology
- Author
-
Hausmann, Daniel, Lehaut, Mathieu, and Piterman, Nir
- Subjects
Computer Science - Formal Languages and Automata Theory
- Abstract
We study how to distribute trace languages in a setting where processes communicate via reconfigurable communication channels. That is, the different processes can connect and disconnect from channels at run time. We restrict attention to communication via tree-like communication architectures. These allow channels to connect more than two processes in a way that maintains an underlying spanning tree and keeps communication continuous on the tree. We make the reconfiguration explicit in the language allowing both a centralized automaton as well as the distributed processes to share relevant information about the current communication configuration. We show that Zielonka's seminal result regarding distribution of regular languages for asynchronous automata can be generalized in this setting, incorporating both reconfiguration and more than binary tree architectures.
- Published
- 2024
42. Unsafe Impedance: Safe Languages and Safe by Design Software
- Author
-
Barney, Lee and Neto, Adolfo
- Subjects
Computer Science - Programming Languages
- Abstract
In December 2023, security agencies from five countries in North America, Europe, and the South Pacific produced a document encouraging senior executives in all software producing organizations to take responsibility for and oversight of the security of the software their organizations produce. In February 2024, the White House released a cybersecurity outline, highlighting the December document. In this work we review the safe languages listed in these documents, and compare the safety of those languages with Erlang and Elixir, two BEAM languages. These security agencies' declaration of some languages as safe is necessary but insufficient to make wise decisions regarding what language to use when creating code. We propose an additional way of looking at languages and the ease with which unsafe code can be written and used. We call this new perspective unsafe impedance. We then go on to use unsafe impedance to examine nine languages that are considered to be safe. Finally, we suggest that business processes include what we refer to as an Unsafe Acceptance Process. This Unsafe Acceptance Process can be used as part of the memory safe roadmaps suggested by these agencies. Unsafe Acceptance Processes can aid organizations in their production of safe by design software., Comment: Accepted for Erlang Workshop 2024
- Published
- 2024
43. On state complexity for subword-closed languages
- Author
-
Guyot, Jérôme
- Subjects
Computer Science - Formal Languages and Automata Theory
- Abstract
This paper investigates the state complexities of subword-closed and superword-closed languages, comparing them to regular languages. We focus on the square root operator and the substitution operator. We establish an exponential lower bound for superword-closed languages for the k-th root. For subword-closed languages we analyze in detail a specific instance of the square root problem for which a quadratic complexity is proven. For the substitution operator, we show an exponential lower bound for the general substitution. We then find some conditions for which we prove a quadratic upper bound.
- Published
- 2024
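As a small illustration of the closure property named in the abstract above, the sketch below checks whether a finite set of words is subword-closed. It assumes "subword" means scattered subword (a subsequence), which is one common reading; the abstract itself does not spell out the definition, and the function names are hypothetical. The brute-force enumeration is exponential in word length and is meant only for tiny examples, not as a reflection of the state-complexity results.

```python
from itertools import combinations


def subwords(w):
    """All scattered subwords (subsequences) of w, including '' and w."""
    n = len(w)
    return {
        "".join(w[i] for i in idx)
        for r in range(n + 1)
        for idx in combinations(range(n), r)
    }


def downward_closure(words):
    """Smallest subword-closed set containing every word in `words`."""
    closed = set()
    for w in words:
        closed |= subwords(w)
    return closed


def is_subword_closed(words):
    """True if the finite set already contains all subwords of its members."""
    words = set(words)
    return downward_closure(words) == words
```

For instance, `{"ab"}` is not subword-closed, but its downward closure `{"", "a", "b", "ab"}` is.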
44. Toward Programming Languages for Reasoning: Humans, Symbolic Systems, and AI Agents
- Author
-
Marron, Mark
- Subjects
Computer Science - Programming Languages, Computer Science - Software Engineering
- Abstract
Integration, composition, mechanization, and AI-assisted development are the driving themes in the future of software development. At their core these concepts are rooted in the increasingly important role of computing in our world, the desire to deliver functionality faster, with higher quality, and to empower more people to benefit from programmatic automation. These themes, and how they impact the human developers driving them, are the foundations for the next generation of programming languages. At first glance the needs of mechanization tools, AI agents, and human developers along with the various goals around development velocity, software quality, and software democratization are a broad and seemingly diverse set of needs. However, at their core is a single challenge that, once resolved, enables us to make radical progress in all of these areas. Our hypothesis is that, fundamentally, software development is a problem of reasoning about code and semantics. This is true for human developers implementing a feature, symbolic tools building models of application behavior, and even for language-based AI agents as they perform tasks. While the particular aspects of reasoning that each agent struggles with vary to some degree, they share many common themes and, surprisingly, most mainstream languages extensively employ (anti)features that make this task harder or infeasible! This paper proposes a novel approach to this challenge -- instead of new language features or logical constructs that add more complexity to what is already a problem of complexity, we propose radical simplification in the form of the Bosque platform and language.
- Published
- 2024
- Full Text
- View/download PDF
45. Regular Expressions with Backreferences on Multiple Context-Free Languages, and the Closed-Star Condition
- Author
-
Nogami, Taisei and Terauchi, Tachio
- Subjects
Computer Science - Formal Languages and Automata Theory
- Abstract
Backreference is a well-known practical extension of regular expressions, and most modern programming languages, such as Java, Python, and JavaScript, support regular expressions with backreferences (rewb) in their standard libraries for string processing. A difficulty of backreference is non-regularity: unlike some other extensions, backreference strictly enhances the expressive power of regular expressions and thus rewbs can describe non-regular (in fact, even non-context-free) languages. In this paper, we investigate the expressive power of rewbs by comparing rewbs to multiple context-free languages (MCFL) and parallel multiple context-free languages (PMCFL). First, we prove that the language class of rewbs is a proper subclass of unary-PMCFLs. The class of unary-PMCFLs coincides with that of EDT0L languages, and our result strictly improves the known upper bound of rewbs. However, we also show that the language class of rewbs is not contained in that of MCFLs even when restricted to rewbs with only one capturing group and no captured references. Therefore, in general, the parallelism seems essential for rewbs. Backed by these results, we define a novel syntactic condition on rewbs that we call closed-star and observe that it provides an upper bound on the number of times a rewb references the same captured string. The closed-star condition allows dispensing with the parallelism: that is, we prove that the language class of closed-star rewbs falls inside the class of unary-MCFLs, which is equivalent to that of EDT0L systems of finite index. Furthermore, as additional evidence for the robustness of the condition, we show that the language class of closed-star rewbs also falls inside the class of nonerasing stack languages (NESL)., Comment: 26 pages
- Published
- 2024
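The non-regularity mentioned in the abstract above can be seen directly with Python's standard `re` module, which supports backreferences. The sketch below is illustrative only and is not drawn from the paper: the pattern `([ab]+)\1` matches exactly the strings of the form ww for a non-empty w over {a, b}, a well-known example of a language that is neither regular nor context-free. The helper name `is_square` is hypothetical.

```python
import re

# \1 must repeat exactly the text captured by group 1,
# so a full match forces the string to have the shape w + w.
SQUARE = re.compile(r"([ab]+)\1")


def is_square(s):
    """True if s == w + w for some non-empty w over {a, b}."""
    return SQUARE.fullmatch(s) is not None


if __name__ == "__main__":
    for candidate in ["abab", "abba", "aa", "a"]:
        print(candidate, is_square(candidate))
```

Using `fullmatch` rather than `search` matters here: `search` would also accept any string that merely contains a square somewhere inside it.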
46. Evaluating LLM-driven User-Intent Formalization for Verification-Aware Languages
- Author
-
Lahiri, Shuvendu K.
- Subjects
Computer Science - Programming Languages, Computer Science - Machine Learning, Computer Science - Software Engineering, D.2.1, F.4.1, I.2.2
- Abstract
Verification-aware programming languages such as Dafny and F* provide means to formally specify and prove properties of a program. Although the problem of checking an implementation against a specification can be defined mechanically, there is no algorithmic way of ensuring the correctness of the user-intent formalization for programs, expressed as a formal specification. This is because intent or requirement is expressed informally in natural language and the specification is a formal artefact. Despite this, large language models (LLMs) have recently made tremendous strides in bridging the gap between informal intent and formal program implementations, driven in large part by benchmarks and automated metrics for evaluation. Recent work has proposed a framework for evaluating the user-intent formalization problem for mainstream programming languages [endres-fse24]. However, such an approach does not readily extend to verification-aware languages that support rich specifications (using quantifiers and ghost variables) that cannot be evaluated through dynamic execution. Previous work also required generating program mutants using LLMs to create the benchmark. We advocate an alternate, perhaps simpler approach of symbolically testing specifications to provide an intuitive metric for evaluating the quality of specifications for verification-aware languages. We demonstrate that our automated metric agrees closely with a human-labeled dataset of Dafny specifications for the popular MBPP code-generation benchmark, yet demonstrates cases where the human labeling is not perfect. We also outline formal verification challenges that need to be addressed to apply the technique more widely.
We believe our work provides a stepping stone to enable the establishment of a benchmark and research agenda for the problem of user-intent formalization for programs., Comment: Proceedings of the 24th Conference on Formal Methods in Computer Aided Design (FMCAD 2024)
- Published
- 2024
- Full Text
- View/download PDF
47. Compilation Quotient (CQ): A Metric for the Compilation Hardness of Programming Languages
- Author
-
Szabo, Vince, Winterer, Dominik, and Su, Zhendong
- Subjects
Computer Science - Programming Languages, Computer Science - Software Engineering
- Abstract
Today's programmers can choose from an exceptional range of programming languages, each with its own traits, purpose, and complexity. A key aspect of a language's complexity is how hard it is to compile programs in the language. While most programmers have an intuition about compilation hardness for different programming languages, no metric exists to quantify it. We introduce the compilation quotient (CQ), a metric to quantify the compilation hardness of compiled programming languages. The key idea is to measure the compilation success rates of programs sampled from context-free grammars. To this end, we fairly sample over 12 million programs in total. CQ ranges between 0 and 100, where 0 indicates that no programs compile, and 100 means that all programs compile. Our findings on 12 popular compiled programming languages show high variation in CQ. C has a CQ of 48.11, C++ has 0.60, Java has 0.27 and Haskell has 0.13. Strikingly, Rust's CQ is nearly 0, and for C, even a large fraction of very sizable programs compile. We believe CQ can help understand the differences of compiled programming languages better and help language designers.
- Published
- 2024
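The abstract above describes CQ as a value between 0 and 100 derived from the compilation success rate of sampled programs. A rough sketch of that idea follows, assuming CQ is simply the percentage of samples that compile; the paper's exact definition (for example, any weighting by program size) is not given in this listing. The names `compilation_quotient` and `python_compiles` are hypothetical, and Python's built-in `compile()` stands in for a real compiler purely for illustration.

```python
def python_compiles(source):
    """Stand-in compiler check: True if Python can byte-compile `source`."""
    try:
        compile(source, "<sample>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False


def compilation_quotient(samples, compiles=python_compiles):
    """Percentage (0 to 100) of sampled programs accepted by `compiles`.

    Assumption: a plain success rate, not the paper's exact metric.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one sampled program")
    passed = sum(1 for src in samples if compiles(src))
    return 100.0 * passed / len(samples)
```

Passing a different `compiles` callable (for example, one that shells out to a C or Rust compiler) would adapt the same calculation to another language.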
48. Synthetic Programming Elicitation for Text-to-Code in Very Low-Resource Programming and Formal Languages
- Author
-
Mora, Federico, Wong, Justin, Lepe, Haley, Bhatia, Sahil, Elmaaroufi, Karim, Varghese, George, Gonzalez, Joseph E., Polgreen, Elizabeth, and Seshia, Sanjit A.
- Subjects
Computer Science - Programming Languages, Computer Science - Machine Learning
- Abstract
Recent advances in large language models (LLMs) for code applications have demonstrated remarkable zero-shot fluency and instruction following on challenging code-related tasks ranging from test case generation to self-repair. Unsurprisingly, however, models struggle to compose syntactically valid programs in programming languages unrepresented in pre-training, referred to as very low-resource Programming Languages (VLPLs). VLPLs appear in crucial settings, including domain-specific languages for internal tools, tool-chains for legacy languages, and formal verification frameworks. Inspired by a technique called natural programming elicitation, we propose designing an intermediate language that LLMs "naturally" know how to use and which can be automatically compiled to a target VLPL. When LLMs generate code that lies outside of this intermediate language, we use compiler techniques to repair the code into programs in the intermediate language. Overall, we introduce synthetic programming elicitation and compilation (SPEAC), an approach that enables LLMs to generate syntactically valid code even for VLPLs. We empirically evaluate the performance of SPEAC in a case study for the UCLID5 formal verification language and find that, compared to existing retrieval and fine-tuning baselines, SPEAC produces syntactically correct programs more frequently and without sacrificing semantic correctness. Comment: 14 pages, 6 figures, 1 table
- Published
- 2024
49. Multilingual speech recognition initiative for African languages
- Author
-
Abdou Mohamed, Naira, Allak, Anass, Gaanoun, Kamel, Benelallam, Imade, Erraji, Zakarya, and Bahafid, Abdessalam
- Published
- 2024
50. On categories associated with crisp deterministic automata with fuzzy rough outputs and fuzzy rough languages
- Author
-
Kumari, Mausam, Yadav, Vijay K., Ruhela, Shainky, and Tiwari, S. P.
- Published
- 2024