Author: "Castilho, Sheila" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Castilho, Sheila"' showing total 267 results

1. How Much Data is Enough Data? Fine-Tuning Large Language Models for In-House Translation: Performance Evaluation Across Multiple Dataset Sizes

Author: Vieira, Inacio, Allred, Will, Lankford, Séamus, Castilho, Sheila, and Way, Andy
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Decoder-only LLMs have shown impressive performance in MT due to their ability to learn from extensive datasets and generate high-quality translations. However, LLMs often struggle with the nuances and style required for organisation-specific translation. In this study, we explore the effectiveness of fine-tuning Large Language Models (LLMs), particularly Llama 3 8B Instruct, leveraging translation memories (TMs), as a valuable resource to enhance accuracy and efficiency. We investigate the impact of fine-tuning the Llama 3 model using TMs from a specific organisation in the software sector. Our experiments cover five translation directions across languages of varying resource levels (English to Brazilian Portuguese, Czech, German, Finnish, and Korean). We analyse diverse sizes of training datasets (1k to 207k segments) to evaluate their influence on translation quality. We fine-tune separate models for each training set and evaluate their performance based on automatic metrics, BLEU, chrF++, TER, and COMET. Our findings reveal improvement in translation performance with larger datasets across all metrics. On average, BLEU and COMET scores increase by 13 and 25 points, respectively, on the largest training set against the baseline model. Notably, there is a performance deterioration in comparison with the baseline model when fine-tuning on only 1k and 2k examples; however, we observe a substantial improvement as the training dataset size increases. The study highlights the potential of integrating TMs with LLMs to create bespoke translation models tailored to the specific needs of businesses, thus enhancing translation quality and reducing turn-around times. This approach offers a valuable insight for organisations seeking to leverage TMs and LLMs for optimal translation outcomes, especially in narrower domains.
Published: 2024

2. Deep Dive Machine Translation

Author: Skadiņa, Inguna, Vasiḷjevs, Andrejs, Pinnis, Mārcis, Bērziņš, Aivars, Aranberri, Nora, Bogaert, Joachim Van den, O’Connor, Sally, García-Martínez, Mercedes, Goenaga, Iakes, Hajič, Jan, Herranz, Manuel, Lieske, Christian, Popel, Martin, Popović, Maja, Castilho, Sheila, Gaspari, Federico, Rosa, Rudolf, Superbo, Riccardo, Way, Andy, Sonntag, Daniel, Editor-in-Chief, Rehm, Georg, editor, and Way, Andy, editor
Published: 2023
Full Text: View/download PDF

3. A Set of Recommendations for Assessing Human-Machine Parity in Language Translation

Author: Läubli, Samuel, Castilho, Sheila, Neubig, Graham, Sennrich, Rico, Shen, Qinlan, and Toral, Antonio
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: The quality of machine translation has increased remarkably over the past years, to the degree that it was found to be indistinguishable from professional human translation in a number of empirical investigations. We reassess Hassan et al.'s 2018 investigation into Chinese to English news translation, showing that the finding of human-machine parity was owed to weaknesses in the evaluation design - which is currently considered best practice in the field. We show that the professional human translations contained significantly fewer errors, and that perceived quality in human evaluation depends on the choice of raters, the availability of linguistic context, and the creation of reference translations. Our results call for revisiting current best practices to assess strong machine translation systems in general and human-machine parity in particular, for which we offer a set of recommendations based on our empirical findings.
Published: 2020
Full Text: View/download PDF

4. Deep Dive Machine Translation

Author: Skadiņa, Inguna, primary, Vasiḷjevs, Andrejs, additional, Pinnis, Mārcis, additional, Bērziņš, Aivars, additional, Aranberri, Nora, additional, Bogaert, Joachim Van den, additional, O’Connor, Sally, additional, García-Martínez, Mercedes, additional, Goenaga, Iakes, additional, Hajič, Jan, additional, Herranz, Manuel, additional, Lieske, Christian, additional, Popel, Martin, additional, Popović, Maja, additional, Castilho, Sheila, additional, Gaspari, Federico, additional, Rosa, Rudolf, additional, Superbo, Riccardo, additional, and Way, Andy, additional
Published: 2023
Full Text: View/download PDF

5. Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation

Author: Toral, Antonio, Castilho, Sheila, Hu, Ke, and Way, Andy
Subjects: Computer Science - Computation and Language
Abstract: We reassess a recent study (Hassan et al., 2018) that claimed that machine translation (MT) has reached human parity for the translation of news from Chinese into English, using pairwise ranking and considering three variables that were not taken into account in that previous study: the language in which the source side of the test set was originally written, the translation proficiency of the evaluators, and the provision of inter-sentential context. If we consider only original source text (i.e. not translated from another language, or translationese), then we find evidence showing that human parity has not been achieved. We compare the judgments of professional translators against those of non-experts and discover that those of the experts result in higher inter-annotator agreement and better discrimination between human and machine translations. In addition, we analyse the human translations of the test set and identify important translation issues. Finally, based on these findings, we provide a set of recommendations for future human evaluations of MT. Comment: WMT 2018
Published: 2018

6. Perceptions of Educators on MTQA Curriculum and Instruction

Author: Cavalheiro Camargo, João Lucas, Castilho, Sheila, and Moorkens, Joss
Abstract: This paper reports the results of a survey aimed at identifying and exploring the attitudes and recommendations of machine translation quality assessment (MTQA) educators. Drawing upon elements from the literature on MTQA teaching, the survey explores themes that may pose a challenge or lead to successful implementation of human evaluation, as the literature shows that there has not been enough design and reporting. Results show educators’ awareness of the topic, awareness stemming from the recommendations of the literature on MT evaluation, and reports new challenges and issues.
Published: 2024

7. Re-thinking Machine Translation Post-Editing Guidelines

Author: Moorkens, Joss, Castilho, Sheila, Gaspari, Federico, Toral, Antonio, Popović, Maja, and Rico Pérez, Celia
Abstract: Machine Translation Post-Editing (MTPE) is a challenging task. It frequently creates tension between what the industry expects in terms of quality and what translators are willing to deliver as an end product. Conventional approaches to MTPE take as a point of departure the distinction between light and full MTPE, but the division gets blurred when implemented in an actual MTPE project where translators find difficulties in differentiating between essential and preferential changes. At the time MTPE guidelines were designed, the role of the human translator in the MT process was perceived as ancillary, a view inherited from the first days of MT research aiming at the so-called "Fully Automatic High Quality Machine Translation" (FAHQMT). My proposal challenges the traditional division of MTPE levels and presents a new way of looking at MTPE guidelines. In view of the latest developments in neural machine translation and the higher quality level of its output, it is my contention that the traditional division of MTPE levels is no longer valid. In this contribution I advance a proposal for redefining MTPE guidelines in the framework of an ecosystem specifically designed for this purpose. Depto. de Estudios Románicos, Franceses, Italianos y Traducción, Fac. de Filología, Instituto Universitario de Lenguas Modernas y Traductores (IULMyT)
Published: 2024

8. On Education and Training in Translation Quality Assessment

Author: Doherty, Stephen, Moorkens, Joss, Gaspari, Federico, Castilho, Sheila, Way, Andy, Editor-in-Chief, Moorkens, Joss, editor, Castilho, Sheila, editor, Gaspari, Federico, editor, and Doherty, Stephen, editor
Published: 2018
Full Text: View/download PDF

9. Introduction

Author: Moorkens, Joss, Castilho, Sheila, Gaspari, Federico, Doherty, Stephen, Way, Andy, Editor-in-Chief, Moorkens, Joss, editor, Castilho, Sheila, editor, Gaspari, Federico, editor, and Doherty, Stephen, editor
Published: 2018
Full Text: View/download PDF

10. Approaches to Human and Machine Translation Quality Assessment

Author: Castilho, Sheila, Doherty, Stephen, Gaspari, Federico, Moorkens, Joss, Way, Andy, Editor-in-Chief, Moorkens, Joss, editor, Castilho, Sheila, editor, Gaspari, Federico, editor, and Doherty, Stephen, editor
Published: 2018
Full Text: View/download PDF

11. Proposal for a Triple Bottom Line for Translation Automation and Sustainability

Author: Moorkens, Joss, primary, Castilho, Sheila, additional, Gaspari, Federico, additional, Toral, Antonio, additional, and Popović, Maja, additional
Published: 2024
Full Text: View/download PDF

12. Editors' foreword to the special issue on human factors in neural machine translation

Author: Castilho, Sheila, Gaspari, Federico, Moorkens, Joss, Popović, Maja, and Toral, Antonio
Published: 2019
Full Text: View/download PDF

13. Evaluating MT for massive open online courses: A multifaceted comparison between PBSMT and NMT systems

Author: Castilho, Sheila, Moorkens, Joss, Gaspari, Federico, Sennrich, Rico, Way, Andy, and Georgakopoulou, Panayota
Published: 2018

14. Chan Sin-Wai (ed): The human factor in machine translation: Routledge Studies in Translation Technology, Routledge, 1st edition (2018), xii+256pp, ISBN 781-13-855-1213 (Hardback), 978-13-151-4753-6 (e-book)

Author: Castilho, Sheila
Published: 2019
Full Text: View/download PDF

15. Deep dive machine translation

Author: Rehm, Georg, Way, Andy, Skadiņa, Inguna, Vasiḷjevs, Andrejs, Pinnis, Mārcis, Bērziņš, Aivars, Aranberri, Nora, Van den Bogaert, Joachim, O'Connor, Sally, García-Martínez, Mercedes, Goenaga, Iakes, Hajič, Jan, Herranz, Manuel, Lieske, Christian, Popel, Martin, Popović, Maja, Castilho, Sheila, Gaspari, Federico, Rosa, Rudolf, and Superbo, Riccardo
Abstract: Machine Translation (MT) is one of the oldest language technologies having been researched for more than 70 years. However, it is only during the last decade that it has been widely accepted by the general public, to the point where in many cases it has become an indispensable tool for the global community, supporting communication between nations and lowering language barriers. Still, there remain major gaps in the technology that need addressing before it can be successfully applied in under-resourced settings, can understand context and use world knowledge. This chapter provides an overview of the current state-of-the-art in the field of MT, offers technical and scientific forecasting for 2030, and provides recommendations for the advancement of MT as a critical technology if the goal of digital language equality in Europe is to be achieved.
Published: 2023
Full Text: View/download PDF

16. Do online machine translation systems care for context? What about a GPT model?

Author: Castilho, Sheila, Mallon, Clodagh, Meister, Rahel, and Yue, Shengya
Abstract: This paper addresses the challenges of evaluating document-level machine translation (MT) in the context of recent advances in context-aware neural machine translation (NMT). It investigates how well online MT systems deal with six context-related issues, namely lexical ambiguity, grammatical gender, grammatical number, reference, ellipsis, and terminology, when a larger context span containing the solution for those issues is given as input. Results are compared to the translation outputs from the online ChatGPT. Our results show that, while the change of punctuation in the input yields great variability in the output translations, the context position does not seem to have a great impact. Moreover, the GPT model seems to outperform the NMT systems but performs poorly for Irish.
Published: 2023

17. Results of WMT23 Metrics Shared Task: Metrics Might Be Guilty but References Are Not Innocent

Author: Freitag, Markus, primary, Mathur, Nitika, additional, Lo, Chi-kiu, additional, Avramidis, Eleftherios, additional, Rei, Ricardo, additional, Thompson, Brian, additional, Kocmi, Tom, additional, Blain, Frederic, additional, Deutsch, Daniel, additional, Stewart, Craig, additional, Zerva, Chrysoula, additional, Castilho, Sheila, additional, Lavie, Alon, additional, and Foster, George, additional
Published: 2023
Full Text: View/download PDF

18. Sharing high-quality language resources in the legal domain to develop neural machine translation for under-resourced European languages

Author: Bago, Petra, Castilho, Sheila, Celeste, Edoardo, Dunne, Jane, Gaspari, Federico, Gíslason, Níels Rúnar, Kåsen, Andre, Klubička, Filip, Kristmannsson, Gauti, McHugh, Helen, Moran, Róisín, Ní Loinsigh, Órla, Olsen, Jon Arild, Parra Escartín, Carla, Ramesh, Akshai, Resende, Natalia, Sheridan, Páraic, and Way, Andy
Subjects: evaluation, Language resource, neural machine translation (MT), Language resources, low-resource languages, legal translation, Machine translating, Law, low-resource language
Abstract: This article reports some of the main achievements of the European Union-funded PRINCIPLE project in collecting high-quality language resources (LRs) in the legal domain for four under-resourced European languages: Croatian, Irish, Norwegian, and Icelandic. After illustrating the significance of this work for developing translation technologies in the context of the European Union and the European Economic Area, the article outlines the main steps of data collection, curation, and sharing of the LRs gathered with the support of public and private data contributors. This is followed by a description of the development pipeline and key features of the state-of-the-art, bespoke neural machine translation (MT) engines for the legal domain that were built using this data. The MT systems were evaluated with a combination of automatic and human methods to validate the quality of the LRs collected in the project, and the high-quality LRs were subsequently shared with the wider community via the ELRC-SHARE repository. The main challenges encountered in this work are discussed, emphasising the importance and the key benefits of sharing high-quality digital LRs.
Published: 2022

19. Reproducing a manual evaluation of the simplicity of text simplification system outputs

Author: Popović, Maja, Castilho, Sheila, Huidrom, Rudali, and Belz, Anya
Subjects: Computational linguistics
Abstract: In this paper we describe our reproduction study of the human evaluation of text simplicity reported by Nisioi et al. (2017). The work was carried out as part of the ReproGen Shared Task 2022 on Reproducibility of Evaluations in NLG. Our aim was to repeat the evaluation of simplicity for nine automatic text simplification systems with a different set of evaluators. We describe our experimental design together with the known aspects of the original experimental design and present the results from both studies. Pearson correlation between the original and reproduction scores is moderate to high (0.776). Inter-annotator agreement in the reproduction study is lower (0.40) than in the original study (0.66). We discuss challenges arising from the unavailability of certain aspects of the original set-up, and make several suggestions as to how reproduction of similar evaluations can be made easier in future.
Published: 2022

20. Reproducing a manual evaluation of simplicity in text simplification system outputs

Author: Popović, Maja, Huidrom, Rudali, Castilho, Sheila, Belz, Anya, Shaikh, Samira, Ferreira, Thiago, and Stent, Amanda
Subjects: Computational linguistics
Abstract: In this paper we describe our reproduction study of the human evaluation of text simplicity reported by Nisioi et al. (2017). The work was carried out as part of the ReproGen Shared Task 2022 on Reproducibility of Evaluations in NLG. Our aim was to repeat the evaluation of simplicity for nine automatic text simplification systems with a different set of evaluators. We describe our experimental design together with the known aspects of the original experimental design and present the results from both studies. Pearson correlation between the original and reproduction scores is moderate to high (0.776). Inter-annotator agreement in the reproduction study is lower (0.40) than in the original study (0.66). We discuss challenges arising from the unavailability of certain aspects of the original set-up, and make several suggestions as to how reproduction of similar evaluations can be made easier in future.
Published: 2022

21. How much context span is enough? Examining context-related issues for document-level MT

Author: Castilho, Sheila
Subjects: Translating and interpreting, document-level, context span, Computational linguistics, Linguistics, Machine translating, Language
Abstract: This paper analyses how much context span is necessary to solve different context-related issues, namely, reference, ellipsis, gender, number, lexical ambiguity, and terminology when translating from English into Portuguese. We use the DELA corpus, which consists of 60 documents and six different domains (subtitles, literary, news, reviews, medical, and legislation). We find that the shortest context span to disambiguate issues can appear in different positions in the document including preceding, following, global, world knowledge; and that the average length depends on the issue types as well as the domain. Additionally, we show that the standard approach of relying on only two preceding sentences as context might not be enough depending on the domain and issue types.
Published: 2022

22. Achievements of the PRINCIPLE project: promoting MT for Croatian, Icelandic, Irish and Norwegian

Author: Bago, Petra, Castilho, Sheila, Dunne, Jane, Gaspari, Federico, Kåsen, Andre, Kristmannsson, Gauti, Olsen, Jon Arild, Resende, Natalia, Gíslason, Níels Rúnar, Sheridan, Dana D., Sheridan, Páraic, Tinsley, John, and Way, Andy
Subjects: Artificial intelligence, Translating and interpreting, Language
Abstract: This paper provides an overview of the main achievements of the completed PRINCIPLE project, a 2-year action funded by the European Commission under the Connecting Europe Facility (CEF) programme. PRINCIPLE focused on collecting high-quality language resources for Croatian, Icelandic, Irish and Norwegian, which are severely low-resource languages, especially for building effective machine translation (MT) systems. We report the achievements of the project, primarily, in terms of the large amounts of data collected for all four low-resource languages and of promoting the uptake of neural MT (NMT) for these languages.
Published: 2022

23. Reproducing a manual evaluation of simplicity in text simplification system outputs

Author: Shaikh, Samira, Ferreira, Thiago, Stent, Amanda, Popović, Maja, Huidrom, Rudali, Castilho, Sheila, and Belz, Anya
Abstract: In this paper we describe our reproduction study of the human evaluation of text simplicity reported by Nisioi et al. (2017). The work was carried out as part of the ReproGen Shared Task 2022 on Reproducibility of Evaluations in NLG. Our aim was to repeat the evaluation of simplicity for nine automatic text simplification systems with a different set of evaluators. We describe our experimental design together with the known aspects of the original experimental design and present the results from both studies. Pearson correlation between the original and reproduction scores is moderate to high (0.776). Inter-annotator agreement in the reproduction study is lower (0.40) than in the original study (0.66). We discuss challenges arising from the unavailability of certain aspects of the original set-up, and make several suggestions as to how reproduction of similar evaluations can be made easier in future.
Published: 2022

24. Post-Editese in Literary Translations

Author: Castilho, Sheila, primary and Resende, Natália, additional
Published: 2022
Full Text: View/download PDF

25. DELA Corpus - A Document-Level Corpus Annotated with Context-Related Issues

Author: Castilho, Sheila, Cavalheiro Camargo, João Lucas, Menezes, Miguel, and Way, Andy
Subjects: Translating and interpreting, Computational linguistics, Machine translating, machine translation evaluation, document-level MT, corpus, annotation, Language
Abstract: Recently, the Machine Translation (MT) community has become more interested in document-level evaluation especially in light of reactions to claims of "human parity", since examining the quality at the level of the document rather than at the sentence level allows for the assessment of suprasentential context, providing a more reliable evaluation. This paper presents a document-level corpus annotated in English with context-aware issues that arise when translating from English into Brazilian Portuguese, namely ellipsis, gender, lexical ambiguity, number, reference, and terminology, with six different domains. The corpus can be used as a challenge test set for evaluation and as a training/testing corpus for MT as well as for deep linguistic analysis of context issues. To the best of our knowledge, this is the first corpus of its kind.
Published: 2021

26. Post-editese in Literary Translations

Author: Castilho, Sheila, primary and Resende, Natalia, additional
Published: 2021
Full Text: View/download PDF

27. Towards document-level human MT evaluation: on the issues of annotator agreement, effort and misevaluation

Author: Castilho, Sheila
Subjects: Translating and interpreting, machine translation evaluation, document-level MT evaluation, human evaluation, Machine translating
Abstract: Document-level human evaluation of machine translation (MT) has been raising interest in the community. However, little is known about the issues of using document-level methodologies to assess MT quality. In this article, we compare the inter-annotator agreement (IAA) scores, the effort to assess the quality in different document-level methodologies, and the issue of misevaluation when sentences are evaluated out of context.
Published: 2021

28. The human factor in machine translation Routledge Studies in Translation Technology, 1st edition Chan Sin-Wai

Author: Castilho, Sheila
Published: 2019
Full Text: View/download PDF

29. On the same page? Comparing inter-annotator agreement in sentence and document level human machine translation evaluation

Author: Castilho, Sheila
Subjects: Translating and interpreting, Machine translating
Abstract: Document-level evaluation of machine translation has raised interest in the community especially since responses to the claims of “human parity” (Toral et al., 2018; Läubli et al., 2018) with document-level human evaluations have been published. Yet, little is known about best practices regarding human evaluation of machine translation at the document-level. This paper presents a comparison of the differences in inter-annotator agreement between quality assessments using sentence and document-level set-ups. We report results of the agreement between professional translators for fluency and adequacy scales, error annotation, and pair-wise ranking, along with the effort needed to perform the different tasks. To the best of our knowledge, this is the first study of its kind.
Published: 2020

30. A human evaluation of English-Irish statistical and neural machine translation

Author: Dowling, Meghan, Castilho, Sheila, Moorkens, Joss, Lynn, Teresa, and Way, Andy
Subjects: Translating and interpreting, neural machine translation, statistical machine translation, human evaluation, machine translation post-editing, Irish language, Computational linguistics, Machine translating
Abstract: With official status in both Ireland and the EU, there is a need for high-quality English-Irish (EN-GA) machine translation (MT) systems which are suitable for use in a professional translation environment. While we have seen recent research on improving both statistical MT and neural MT for the EN-GA pair, the results of such systems have always been reported using automatic evaluation metrics. This paper provides the first human evaluation study of EN-GA MT using professional translators and in-domain (public administration) data for a more accurate depiction of the translation quality available via MT.
Published: 2020

31. On context span needed for machine translation evaluation

Author: Castilho, Sheila, Popović, Maja, and Way, Andy
Subjects: Translating and interpreting, MT evaluation, document-level MT evaluation, human evaluation, Machine translating
Abstract: Despite increasing efforts to improve evaluation of machine translation (MT) by going beyond the sentence level to the document level, the definition of what exactly constitutes a “document level” is still not clear. This work deals with the context span necessary for a more reliable MT evaluation. We report results from a series of surveys involving three domains and 18 target languages designed to identify the necessary context span as well as issues related to it. Our findings indicate that, despite the fact that some issues and spans are strongly dependent on domain and on the target language, a number of common patterns can be observed so that general guidelines for context-aware MT evaluation can be drawn.
Published: 2020

32. Document-level machine translation evaluation project: methodology, effort and inter-annotator agreement

Author: Castilho, Sheila
Subjects: Translating and interpreting, machine translation evaluation, human evaluation of machine translation, Machine translating
Abstract: Recently, document-level (doc-level) human evaluation of machine translation (MT) has raised interest in the community after a few attempts have disproved claims of “human parity” (Toral et al., 2018; Läubli et al., 2018). However, little is still known about best practices regarding doc-level human evaluation. This project aims to identify methodologies to better cope with i) the current state-of-the-art (SOTA) human metrics, ii) a possible complexity when assigning a single score to a text consisting of ‘good’ and ‘bad’ sentences, iii) a possible tiredness bias in doc-level set-ups, and iv) the difference in inter-annotator agreement (IAA) between sentence and doc-level set-ups.
Published: 2020

33. A Set of Recommendations for Assessing Human–Machine Parity in Language Translation

Author: Läubli, Samuel; https://orcid.org/0000-0001-5362-4106, Castilho, Sheila, Neubig, Graham, Sennrich, Rico; https://orcid.org/0000-0002-1438-4741, Shen, Qinlan, and Toral, Antonio
Abstract: The quality of machine translation has increased remarkably over the past years, to the degree that it was found to be indistinguishable from professional human translation in a number of empirical investigations. We reassess Hassan et al.'s 2018 investigation into Chinese to English news translation, showing that the finding of human–machine parity was owed to weaknesses in the evaluation design—which is currently considered best practice in the field. We show that the professional human translations contained significantly fewer errors, and that perceived quality in human evaluation depends on the choice of raters, the availability of linguistic context, and the creation of reference translations. Our results call for revisiting current best practices to assess strong machine translation systems in general and human–machine parity in particular, for which we offer a set of recommendations based on our empirical findings.
Published: 2020

34. A Set of Recommendations for Assessing Human–Machine Parity in Language Translation

Author: Läubli, Samuel, primary, Castilho, Sheila, additional, Neubig, Graham, additional, Sennrich, Rico, additional, Shen, Qinlan, additional, and Toral, Antonio, additional
Published: 2020
Full Text: View/download PDF

35. What Level of Quality Can Neural Machine Translation Attain on Literary Text?

Author: Toral, Antonio, Way, Andy, Moorkens, Joss, Castilho, Sheila, Gaspari, Federico, and Doherty, Stephen
Subjects: Languages & linguistics, Phrase, Machine translation, Computer science, Engineering and technology, Pairwise ranking, Electrical engineering, electronic engineering, information engineering, Psychology and cognitive sciences, Quality (business), Set (psychology), Literature translation, Neural machine translation, Phrase-based statistical machine translation, Principles to practice, Humanities and the arts, Translation quality assessment, Artificial intelligence & image processing, Text types, Artificial intelligence, Machine translating, Natural language processing
Abstract: Given the rise of the new neural approach to machine translation (NMT) and its promising performance on different text types, we assess the translation quality it can attain on what is perceived to be the greatest challenge for MT: literary text. Specifically, we target novels, arguably the most popular type of literary text. We build a literary-adapted NMT system for the English-to-Catalan translation direction and evaluate it against a system pertaining to the previous dominant paradigm in MT: statistical phrase-based MT (PBSMT). To this end, for the first time we train MT systems, both NMT and PBSMT, on large amounts of literary text (over 100 million words) and evaluate them on a set of 12 widely known novels spanning from the 1920s to the present day. According to the BLEU automatic evaluation metric, NMT is significantly better than PBSMT (p
Published: 2018
Full Text: View/download PDF

36. What influences post-editese features? A preliminary study

Author: Castilho, Sheila, Resende, Natália, and Mitkov, Ruslan
Subjects: Translating and interpreting, Translationese, Post-editese, Machine translating
Abstract: While a number of studies have shown evidence of translationese phenomena, that is, statistical differences between original texts and translated texts (Gellerstam, 1986), results of studies searching for translationese features in post-edited texts (what has been called “posteditese” (Daems et al., 2017)) have presented mixed results. This paper reports a preliminary study aimed at identifying the presence of post-editese features in machine-translated post-edited texts and at understanding how they differ from translationese features. We test the influence of factors such as post-editing (PE) levels (full vs. light), translation proficiency (professionals vs. students) and text domain (news vs. literary). Results show evidence of post-editese features, especially in light PE texts and in certain domains
Published: 2019

37. Challenge Test Sets for MT Evaluation -- tutorial at MT Summit XVII

Author: Popovic, Maja and Castilho, Sheila
Published: 2019
Full Text: View/download PDF

38. Large-scale machine translation evaluation of the iADAATPA project

Author: Castilho, Sheila, Resende, Natália, Gaspari, Federico, Way, Andy, O'Dowd, Tony, Mazur, Marek, Herranz, Manuel, Helle, Alex, Ramirez-Sanchez, Gema, Sanchez-Cartagena, Victor, Pinnis, Marcis, Sics, Valters, Forcada, Mikel, Tinsley, John, Shterionov, Dimitar, and Rici, Celia
Subjects: Machine translating
Abstract: This paper reports the results of an in-depth evaluation of 34 state-of-the-art domain-adapted machine translation (MT) systems that were built by four leading MT companies as part of the EU-funded iADAATPA project. These systems support a wide variety of languages for several domains. The evaluation combined automatic metrics and human methods, namely assessments of adequacy, fluency, and comparative ranking. The paper also discusses the most effective techniques to build domain-adapted MT systems for the relevant language combinations and domains.
Published: 2019

39. Are ambiguous conjunctions problematic for machine translation?

Author: Mitkov, Ruslan, Angelova, Galia, Popović, Maja, and Castilho, Sheila
Abstract: The translation of ambiguous words still poses challenges for machine translation. In this work, we carry out a systematic quantitative analysis regarding the ability of different machine translation systems to disambiguate the source language conjunctions “but” and “and”. We evaluate specialised test sets focused on the translation of these two conjunctions. The test sets contain source languages that do not distinguish different variants of the given conjunction, whereas the target languages do. In total, we evaluate the conjunction “but” on 20 translation outputs, and the conjunction “and” on 10. All machine translation systems almost perfectly recognise one variant of the target conjunction, especially for the source conjunction “but”. The other target variant, however, represents a challenge for machine translation systems, with accuracy varying from 50% to 95% for “but” and from 20% to 57% for “and”. The major error for all systems is replacing the correct target variant with the opposite one.
Published: 2019
Full Text: View/download PDF

40. Large-scale machine translation evaluation of the iADAATPA project

Author: Forcada, Mikel, Way, Andy, Tinsley, John, Shterionov, Dimitar, Rici, Celia, Gaspari, Federico, Castilho, Sheila, Resende, Natália, O'Dowd, Tony, Mazur, Marek, Herranz, Manuel, Helle, Alex, Ramirez-Sanchez, Gema, Sanchez-Cartagena, Victor, Pinnis, Marcis, and Sics, Valters
Abstract: This paper reports the results of an in-depth evaluation of 34 state-of-the-art domain-adapted machine translation (MT) systems that were built by four leading MT companies as part of the EU-funded iADAATPA project. These systems support a wide variety of languages for several domains. The evaluation combined automatic metrics and human methods, namely assessments of adequacy, fluency, and comparative ranking. The paper also discusses the most effective techniques to build domain-adapted MT systems for the relevant language combinations and domains.
Published: 2019

41. Evaluating MT for massive open online courses

Author: Castilho, Sheila, Moorkens, Joss, Gaspari, Federico, Sennrich, Rico, Way, Andy, and Georgakopoulou, Panayota
Abstract: This article reports a multifaceted comparison between statistical and neural machine translation (MT) systems that were developed for translation of data from massive open online courses (MOOCs). The study uses four language pairs: English to German, Greek, Portuguese, and Russian. Translation quality is evaluated using automatic metrics and human evaluation, carried out by professional translators. Results show that neural MT is preferred in side-by-side ranking, and is found to contain fewer overall errors. Results are less clear-cut for some error categories, and for temporal and technical post-editing effort. In addition, results are reported based on sentence length, showing advantages and disadvantages depending on the particular language pair and MT paradigm.
Published: 2018
Full Text: View/download PDF

42. Reading Comprehension of Machine Translation Output: What Makes for a Better Read?

Author: Castilho, Sheila, Guerberof Arenas, Ana, Pérez-Ortiz, Juan Antonio, Sánchez-Martínez, Felipe, Esplá-Gomis, Miquel, and Popović, Maja
Subjects: Machine Translation, Lenguajes y Sistemas Informáticos, Machine translating
Abstract: This paper reports on a pilot experiment that compares two different machine translation (MT) paradigms in reading comprehension tests. To explore a suitable methodology, we set up a pilot experiment with a group of six users (with English, Spanish and Simplified Chinese languages) using an English Language Testing System (IELTS), and an eye-tracker. The users were asked to read three texts in their native language: either the original English text (for the English speakers) or the machine-translated text (for the Spanish and Simplified Chinese speakers). The original texts were machine-translated via two MT systems: neural (NMT) and statistical (SMT). The users were also asked to rank satisfaction statements on a 3-point scale after reading each text and answering the respective comprehension questions. After all tasks were completed, a post-task retrospective interview took place to gather qualitative data. The findings suggest that the users from the target languages completed more tasks in less time with a higher level of satisfaction when using translations from the NMT system. This research was supported by the Edge Research Fellowship programme that has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 713567, and by the ADAPT Centre for Digital Content Technology, funded under the SFI Research Centres Programme (Grant 13/RC/2106) and co-funded under the European Regional Development Fund.
Published: 2018

43. Translation Crowdsourcing: Creating a Multilingual Corpus of Online Educational Content

Author: Sosoni, Vilelmini, Kermanidis, Katia Lida, Stasimioti, Maria, Naskos, Thanasis, Takoulidou, Eirini, Zaanen, Menno Van, Castilho, Sheila, Georgakopoulou, Panayota, Kordoni, Valia, Egg, Markus, Calzolari, Nicoletta (Conference chair), Choukri, Khalid, Cieri, Christopher, Declerck, Thierry, Goggi, Sara, Hasida, Koiti, Isahara, Hitoshi, Maegaard, Bente, Mariani, Joseph, Mazo, Hélène, Moreno, Asuncion, Odijk, Jan, Piperidis, Stelios, Tokunaga, Takenobu, and Cognitive Science & AI
Published: 2018

44. Project PiPeNovel: Pilot on Post-editing Novels

Author: Toral, Antonio, Wieling, Martijn, Castilho, Sheila, Moorkens, Joss, and Way, Andy
Subjects: Machine Translation, Lenguajes y Sistemas Informáticos
Abstract: Given (i) the rise of a new paradigm to machine translation based on neural networks that results in more fluent and less literal output than previous models and (ii) the maturity of machine-assisted translation via post-editing in industry, project PiPeNovel studies the feasibility of the post-editing workflow for literary text conducting experiments with professional literary translators. PiPeNovel is funded by the European Association for Machine Translation through its 2015 sponsorship of activities programme. The ADAPT Centre at Dublin City University is funded under the Science Foundation Ireland Research Centres Programme (Grant 13/RC/2106).
Published: 2018

45. What level of quality can neural machine translation attain on literary text?

Author: Moorkens, Joss, Castilho, Sheila, Gaspari, Federico, Doherty, S, Toral, Antonio, and Way, Andy
Abstract: Given the rise of the new neural approach to machine translation (NMT) and its promising performance on different text types, we assess the translation quality it can attain on what is perceived to be the greatest challenge for MT: literary text. Specifically, we target novels, arguably the most popular type of literary text. We build a literary-adapted NMT system for the English-to-Catalan translation direction and evaluate it against a system pertaining to the previous dominant paradigm in MT: statistical phrase-based MT (PBSMT). To this end, for the first time we train MT systems, both NMT and PBSMT, on large amounts of literary text (over 100 million words) and evaluate them on a set of 12 widely known novels spanning from the 1920s to the present day. According to the BLEU automatic evaluation metric, NMT is significantly better than PBSMT (p
Published: 2018
Full Text: View/download PDF

46. Quality expectations of machine translation

Author: Moorkens, Joss, Castilho, Sheila, Gaspari, Federico, Doherty, Stephen, and Way, Andy
Abstract: Machine Translation (MT) is being deployed for a range of use-cases by millions of people on a daily basis. There should, therefore, be no doubt as to the utility of MT. However, not everyone is convinced that MT can be useful, especially as a productivity enhancer for human translators. In this chapter, I address this issue, describing how MT is currently deployed, how its output is evaluated and how this could be enhanced, especially as MT quality itself improves. Central to these issues is the acceptance that there is no longer a single ‘gold standard’ measure of quality, such that the situation in which MT is deployed needs to be borne in mind, especially with respect to the expected ‘shelf-life’ of the translation itself.
Published: 2018
Full Text: View/download PDF

47. Improving machine translation of educational content via crowdsourcing

Author: McCrae, John P., Chiarcos, Christian, Declerck, Thierry, Gracia, Jorge, Klimek, Bettina, Behnke, Maximiliana, Miceli Barone, Antonio Valerio, Sennrich, Rico, Sosoni, Vilelmini, Naskos, Thanasis, Takoulidou, Eirini, Stasimioti, Maria, Menno, van Zaanen, Castilho, Sheila, Gaspari, Federico, Georgakopoulou, Panayota, Kordoni, Valia, Egg, Markus, and Kermanidis, Katia Lida
Abstract: The limited availability of in-domain training data is a major issue in the training of application-specific neural machine translation models. Professional outsourcing of bilingual data collections is costly and often not feasible. In this paper we analyze the influence of using crowdsourcing as a scalable way to obtain translations of target in-domain data having in mind that the translations can be of a lower quality. We apply crowdsourcing with carefully designed quality controls to create parallel corpora for the educational domain by collecting translations of texts from MOOCs from English to eleven languages, which we then use to fine-tune neural machine translation models previously trained on general-domain data. The results from our research indicate that crowdsourced data collected with proper quality controls consistently yields performance gains over general-domain baseline systems, and systems fine-tuned with pre-existing in-domain corpora.
Published: 2018

48. Reading comprehension of machine translation output: what makes for a better read?

Author: Pérez-Ortiz, Juan Antonio, Sánchez-Martínez, Felipe, Esplá-Gomis, Miquel, Popović, Maja, Castilho, Sheila, and Guerberof Arenas, Ana
Abstract: This paper reports on a pilot experiment that compares two different machine translation (MT) paradigms in reading comprehension tests. To explore a suitable methodology, we set up a pilot experiment with a group of six users (with English, Spanish and Simplified Chinese languages) using an English Language Testing System (IELTS), and an eye-tracker. The users were asked to read three texts in their native language: either the original English text (for the English speakers) or the machine-translated text (for the Spanish and Simplified Chinese speakers). The original texts were machine-translated via two MT systems: neural (NMT) and statistical (SMT). The users were also asked to rank satisfaction statements on a 3-point scale after reading each text and answering the respective comprehension questions. After all tasks were completed, a post-task retrospective interview took place to gather qualitative data. The findings suggest that the users from the target languages completed more tasks in less time with a higher level of satisfaction when using translations from the NMT system.
Published: 2018

49. Translation dictation vs. post-editing with cloud-based voice recognition: a pilot experiment

Author: Zapata, Julián, Castilho, Sheila, Moorkens, Joss, Yamada, Masaru, and Seligman, Mark
Subjects: Machine translating
Abstract: In this paper, we report on a pilot mixed-methods experiment investigating the effects on productivity and on the translator experience of integrating machine translation (MT) postediting (PE) with voice recognition (VR) and translation dictation (TD). The experiment was performed with a sample of native Spanish participants. In the quantitative phase of the experiment, they performed four tasks under four different conditions, namely (1) conventional TD; (2) PE in dictation mode; (3) TD with VR; and (4) PE with VR (PEVR). In the follow-on qualitative phase, the participants filled out an online survey, providing details of their perceptions of the task and of PEVR in general. Our results suggest that PEVR may be a usable way to add MT to a translation workflow, with some caveats. When asked about their experience with the tasks, our participants preferred translation without the ‘constraint’ of MT, though the quantitative results show that PE tasks were generally more efficient. This paper provides a brief overview of past work exploring VR for from-scratch translation and PE purposes, describes our pilot experiment in detail, presents an overview and analysis of the data collected, and outlines avenues for future work.
Published: 2017

50. A Comparative Quality Evaluation of PBSMT and NMT using Professional Translators

Author: Castilho, Sheila, Moorkens, Joss, Gaspari, Federico, Sennrich, Rico, Sosoni, Vilelmini, Georgakopoulou, Panayota, Lohar, Pintu, Way, Andy, Miceli Barone, Antonio Valerio, Gialama, Maria, Kurohashi, Sadao, and Fung, Pascale
Subjects: Machine translating
Abstract: This paper reports on a comparative evaluation of phrase-based statistical machine translation (PBSMT) and neural machine translation (NMT) for four language pairs, using the PET interface to compare educational domain output from both systems using a variety of metrics, including automatic evaluation as well as human rankings of adequacy and fluency, error-type markup, and post-editing (technical and temporal) effort, performed by professional translators. Our results show a preference for NMT in side-by-side ranking for all language pairs, texts, and segment lengths. In addition, perceived fluency is improved and annotated errors are fewer in the NMT output. Results are mixed for perceived adequacy and for errors of omission, addition, and mistranslation. Despite far fewer segments requiring post-editing, document-level post-editing performance was not found to have significantly improved in NMT compared to PBSMT. This evaluation was conducted as part of the TraMOOC project, which aims to create a replicable semi-automated methodology for high-quality machine translation of educational data.
Published: 2017
