Author: "Bouma, Gosse" / Language: undetermined - Searchworks@Jio Institute Digital Library Search Results

1. Hyper-X: A Unified Hypernetwork for Multi-Task Multilingual Transfer

Author: Üstün, Ahmet, Bisazza, Arianna, Bouma, Gosse, van Noord, Gertjan, and Ruder, Sebastian
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
Abstract: Massively multilingual models are promising for transfer learning across tasks and languages. However, existing methods are unable to fully leverage training data when it is available in different task-language combinations. To exploit such heterogeneous supervision, we propose Hyper-X, a single hypernetwork that unifies multi-task and multilingual learning with efficient adaptation. This model generates weights for adapter modules conditioned on both tasks and language embeddings. By learning to combine task and language-specific knowledge, our model enables zero-shot transfer for unseen languages and task-language combinations. Our experiments on a diverse set of languages demonstrate that Hyper-X achieves the best or competitive gain when a mixture of multiple resources is available, while being on par with strong baselines in the standard scenario. Hyper-X is also considerably more efficient in terms of parameters and resources compared to methods that train separate adapters. Finally, Hyper-X consistently produces strong results in few-shot scenarios for new languages, showing the versatility of our approach beyond zero-shot transfer.
Comment: Accepted at EMNLP 2022 (Main Conference)
Published: 2022
Full Text: View/download PDF

2. POS tagging, lemmatization and dependency parsing of West Frisian

Author: Heeringa, Wilbert, Bouma, Gosse, Hofman, Martha, Drenth, Eduard, Wijffels, Jan, and Van de Velde, Hans
Subjects: FOS: Computer and information sciences, Computer Science - Computation and Language, Statistics - Machine Learning, Machine Learning (stat.ML), J.5, Computation and Language (cs.CL), 68U15
Abstract: We present a lemmatizer/POS tagger/dependency parser for West Frisian using a corpus of 44,714 words in 3,126 sentences that were annotated according to the guidelines of Universal Dependencies version 2. POS tags were assigned to words by using a Dutch POS tagger that was applied to a literal word-by-word translation, or to sentences of a Dutch parallel text. Best results were obtained when using literal translations that were created by using the Frisian translation program Oersetter. Morphological and syntactic annotations were generated on the basis of a literal Dutch translation as well. The performance of the lemmatizer/tagger/annotator when it was trained using default parameters was compared to the performance that was obtained when using the parameter values that were used for training the LassySmall UD 2.5 corpus. A significant improvement was found for 'lemma'. The Frisian lemmatizer/POS tagger/dependency parser is released as a web app and as a web service.
Comment: 6 pages, 2 figures, 6 tables
Published: 2021
Full Text: View/download PDF

3. Corpus-Evidence for True Long-Distance Dependencies in Dutch

Author: Bouma, Gosse
Published: 2018
Full Text: View/download PDF

4. Variation and change in the use of hesitation markers in Germanic languages

Author: Wieling, Martijn, Grieve, Jack, Bouma, Gosse, Fruehwald, Josef, Coleman, John, and Liberman, Mark
Abstract: In this study, we investigate crosslinguistic patterns in the alternation between UM, a hesitation marker consisting of a neutral vowel followed by a final labial nasal, and UH, a hesitation marker consisting of a neutral vowel in an open syllable. Based on a quantitative analysis of a range of spoken and written corpora, we identify clear and consistent patterns of change in the use of these forms in various Germanic languages (English, Dutch, German, Norwegian, Danish, Faroese) and dialects (American English, British English), with the use of UM increasing over time relative to the use of UH. We also find that this pattern of change is generally led by women and more educated speakers. Finally, we propose a series of possible explanations for this surprising change in hesitation marker usage that is currently taking place across Germanic languages.
Published: 2016
Full Text: View/download PDF

5. The IMIX Demonstrator: an Information Search Assistant for the Medical Domain

Author: Hofs, D.H.W., van Schooten, B.W., op den Akker, Hendrikus J.A., van den Bosch, Antal, and Bouma, Gosse
Subjects: World Wide Web, EWI-20808, Computer science, Human–computer interaction, IR-80541, Speech input, Information Retrieval, Question answering, Dialog box, Domain (software engineering), METIS-289631
Abstract: In the course of the IMIX project a system was developed to demonstrate how the research performed in the various subprojects could contribute to the development of practical multimodal question answering dialog systems. This chapter describes the IMIX Demonstrator, an information search assistant for the medical domain. The functionalities and the architecture of the system are described, as well as its role in the IMIX project.
Published: 2011

6. Vidiam: Corpus-based Development of a Dialogue Manager for Multimodal Question Answering

Author: van Schooten, B.W., op den Akker, Hendrikus J.A., van den Bosch, Antal, and Bouma, Gosse
Subjects: Typology, Multi-modal dialogue management, Computer science, iterative question answering, Base (topology), Dialogue management, METIS-286266, Development (topology), EWI-20809, Question answering, Corpus based, IR-80467, Artificial intelligence, Natural language processing
Abstract: This chapter describes the Vidiam project, which covered the development of a dialogue management system for multimodal question answering (QA) dialogues, as carried out in the IMIX project. The approach followed was data-driven, i.e., corpus-based. Since research in QA dialogue of multimodal information retrieval is still new, no suitable corpora were available to base a system on. This chapter reports on the collection and analysis of three QA dialogue corpora, involving textual follow-up utterances, multimodal follow-up questions, and speech dialogues. Based on the data, a dialogue act typology was created, which helps translate user utterances to practical interactive QA strategies.
Published: 2011
Full Text: View/download PDF

7. Evaluation of the NLP Components of the OVIS2 Spoken Dialogue System

Author: van Zanten, Gert Veldhuijzen, Bouma, Gosse, Sima'an, Khalil, van Noord, Gertjan, and Bonnema, Remko
Subjects: H.5.2, FOS: Computer and information sciences, Computer Science - Computation and Language, H.5.1, I.2.7, H.4.0, Computation and Language (cs.CL)
Abstract: The NWO Priority Programme Language and Speech Technology is a 5-year research programme aiming at the development of spoken language information systems. In the Programme, two alternative natural language processing (NLP) modules are developed in parallel: a grammar-based (conventional, rule-based) module and a data-oriented (memory-based, stochastic, DOP) module. In order to compare the NLP modules, a formal evaluation has been carried out three years after the start of the Programme. This paper describes the evaluation procedure and the evaluation results. The grammar-based component performs much better than the data-oriented one in this comparison.
Comment: Proceedings of CLIN 99
Published: 1999
Full Text: View/download PDF

8. Semi-Automatic Construction of Text-to-SQL Data for Domain Transfer

Author: Li, Tianyi, Steedman, Mark, Li, Sujian, Oepen, Stephan, Sagae, Kenji, Tsarfaty, Reut, Bouma, Gosse, Seddah, Djamé, and Zeman, Daniel
Subjects: SQL, Information retrieval, Parsing, Computer science, Information systems: database management, Construct (python library), Asset (computer security), Pipeline (software), Domain (software engineering), Annotation, Multiple time dimensions
Abstract: Strong and affordable in-domain data is a desirable asset when transferring trained semantic parsers to novel domains. As previous methods for semi-automatically constructing such data cannot handle the complexity of realistic SQL queries, we propose to construct SQL queries via context-dependent sampling, and introduce the concept of topic. Along with our SQL query construction method, we propose a novel pipeline of semi-automatic Text-to-SQL dataset construction that covers the broad space of SQL queries. We show that the created dataset is comparable with expert annotation along multiple dimensions, and is capable of improving domain transfer performance for SOTA semantic parsers.
Published: 2021
Full Text: View/download PDF

8 results on '"Bouma, Gosse"'