25 results for "Maillette de Buy Wenniger, Gideon"
Search Results
2. Labeling hierarchical phrase-based models without linguistic resources
- Author
-
Maillette de Buy Wenniger, Gideon and Sima’an, Khalil
- Published
- 2015
- Full Text
- View/download PDF
3. SChuBERT: Scholarly Document Chunks with BERT-encoding boost Citation Count Prediction
- Author
-
van Dongen, Thomas, Maillette de Buy Wenniger, Gideon, Schomaker, Lambert, and Chandrasekaran, Muthu Kumar
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Information retrieval, Computer science, I.2.7, Proxy (statistics), Document processing, Citation, Computation and Language (cs.CL), General Literature - Reference (e.g., dictionaries, encyclopedias, glossaries), Machine Learning (cs.LG) - Abstract
Predicting the number of citations of scholarly documents is an upcoming task in scholarly document processing. Besides the intrinsic merit of this information, it also has a wider use as an imperfect proxy for quality which has the advantage of being cheaply available for large volumes of scholarly documents. Previous work has dealt with number of citations prediction with relatively small training data sets, or larger datasets but with short, incomplete input text. In this work we leverage the open access ACL Anthology collection in combination with the Semantic Scholar bibliometric database to create a large corpus of scholarly documents with associated citation information and we propose a new citation prediction model called SChuBERT. In our experiments we compare SChuBERT with several state-of-the-art citation prediction models and show that it outperforms previous methods by a large margin. We also show the merit of using more training data and longer input for number of citations prediction.
Comment: Published at the First Workshop on Scholarly Document Processing, at EMNLP 2020. Minor corrections were made to the workshop version, including the addition of color to Figures 1 and 2.
- Published
- 2020
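The chunk-based architecture that SChuBERT's abstract describes can be illustrated compactly. Below is a minimal PyTorch sketch, assuming per-chunk BERT embeddings have been precomputed offline; the GRU aggregator, hidden size, and log-transformed citation target are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

class ChunkCitationPredictor(nn.Module):
    """Hypothetical SChuBERT-style model: aggregate precomputed per-chunk
    BERT embeddings with a GRU and regress a scalar citation score."""
    def __init__(self, emb_dim=768, hidden_dim=256):
        super().__init__()
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, chunk_embs):           # (batch, n_chunks, emb_dim)
        _, final_state = self.gru(chunk_embs)
        return self.head(final_state[-1]).squeeze(-1)  # e.g. log(1 + citations)

# Usage sketch: train with MSE against log-scaled citation counts.
model = ChunkCitationPredictor()
fake_docs = torch.randn(2, 12, 768)          # 2 documents, 12 chunks each
print(model(fake_docs).shape)                # torch.Size([2])
```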
4. SChuBERT: Scholarly Document Chunks with BERT-encoding boost Citation Count Prediction
- Author
-
van Dongen, Thomas, Chandrasekaran, Muthu Kumar, Maillette de Buy Wenniger, Gideon, and Schomaker, Lambert
- Abstract
Predicting the number of citations of scholarly documents is an upcoming task in scholarly document processing. Besides the intrinsic merit of this information, it also has a wider use as an imperfect proxy for quality which has the advantage of being cheaply available for large volumes of scholarly documents. Previous work has dealt with number of citations prediction with relatively small training data sets, or larger datasets but with short, incomplete input text. In this work we leverage the open access ACL Anthology collection in combination with the Semantic Scholar bibliometric database to create a large corpus of scholarly documents with associated citation information and we propose a new citation prediction model called SChuBERT. In our experiments we compare SChuBERT with several state-of-the-art citation prediction models and show that it outperforms previous methods by a large margin. We also show the merit of using more training data and longer input for number of citations prediction.
- Published
- 2020
5. No padding please: efficient neural handwriting recognition
- Author
-
Maillette de Buy Wenniger, Gideon, Schomaker, Lambert, and Way, Andy
- Abstract
Neural handwriting recognition (NHR) is the recognition of handwritten text with deep learning models, such as multi-dimensional long short-term memory (MDLSTM) recurrent neural networks. Models with MDLSTM layers have achieved state-of-the-art results on handwritten text recognition tasks. While multi-directional MDLSTM layers have an unbeaten ability to capture the complete context in all directions, this strength limits the possibilities for parallelization, and therefore comes at a high computational cost. In this work we develop methods to create efficient MDLSTM-based models for NHR, particularly a method aimed at eliminating the computation waste that results from padding. This proposed method, called example-packing, replaces wasteful stacking of padded examples with efficient tiling in a 2-dimensional grid. For word-based NHR this yields a speed improvement of factor 6.6 over an already efficient baseline of minimal padding for each batch separately. For line-based NHR the savings are more modest, but still significant. In addition to example-packing, we propose: 1) a technique to optimize parallelization for dynamic graph definition frameworks including PyTorch, using convolutions with grouping, 2) a method for parallelization across GPUs for variable-length example batches. All our techniques are thoroughly tested on our own PyTorch re-implementation of MDLSTM-based NHR models. A thorough evaluation on the IAM dataset shows that our models perform similarly to earlier implementations of state-of-the-art models. Our efficient NHR model and some of the reusable techniques discussed with it offer ways to realize relatively efficient models for the omnipresent scenario of variable-length inputs in deep learning.
- Published
- 2020
- Full Text
- View/download PDF
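The example-packing idea above replaces per-batch padding with tiling of variable-width images onto shared canvases. A minimal first-fit sketch, assuming equal-height word images laid out in rows (the paper packs a full 2-dimensional grid and also handles unpacking the network outputs):

```python
import numpy as np

def pack_examples(images, canvas_width):
    """Tile equal-height, variable-width images left-to-right on shared
    canvases instead of padding each image to the widest one in the batch.
    Assumes every image is narrower than canvas_width."""
    height = images[0].shape[0]
    canvases, positions = [], []      # positions: (canvas_index, x_start, width)
    canvas = np.zeros((height, canvas_width), dtype=images[0].dtype)
    x = 0
    for img in images:
        w = img.shape[1]
        if x + w > canvas_width:      # row full: start a fresh canvas
            canvases.append(canvas)
            canvas = np.zeros((height, canvas_width), dtype=img.dtype)
            x = 0
        canvas[:, x:x + w] = img
        positions.append((len(canvases), x, w))
        x += w
    canvases.append(canvas)
    return np.stack(canvases), positions  # positions allow outputs to be unpacked
```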
6. Investigating query expansion and coreference resolution in question answering on BERT
- Author
-
Bhattacharjee, Santanu, Haque, Rejwanul, Maillette de Buy Wenniger, Gideon, and Way, Andy
- Abstract
The Bidirectional Encoder Representations from Transformers (BERT) model produces state-of-the-art results in many question answering (QA) datasets, including the Stanford Question Answering Dataset (SQuAD). This paper presents a query expansion (QE) method that identifies good terms from input questions, extracts synonyms for the good terms using a widely-used language resource, WordNet, and selects the most relevant synonyms from the list of extracted synonyms. The paper also introduces a novel QE method that produces many alternative sequences for a given input question using same-language machine translation (MT). Furthermore, we use a coreference resolution (CR) technique to identify anaphors or cataphors in paragraphs and substitute them with the original referents. We found that the QA system with this simple CR technique significantly outperforms the BERT baseline in a QA task. We also found that our best-performing QA system is the one that applies these three preprocessing methods (two QE methods and the CR method) together to BERT, which produces an excellent F1 score (89.8 F1 points) in a QA task. Further, we present a comparative analysis of the performance of the BERT QA models taking a variety of criteria into account, and demonstrate our findings in the answer span prediction task.
- Published
- 2020
- Full Text
- View/download PDF
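The WordNet-based query expansion step in the abstract above maps directly onto NLTK's WordNet interface. A minimal sketch, assuming the 'good terms' have already been identified and skipping the paper's synonym-relevance selection:

```python
from nltk.corpus import wordnet  # requires a one-time nltk.download('wordnet')

def expand_question(question, good_terms):
    """Append WordNet synonyms of selected question terms to the query."""
    synonyms = set()
    for term in good_terms:
        for synset in wordnet.synsets(term):
            for lemma in synset.lemmas():
                name = lemma.name().replace("_", " ")
                if name.lower() != term.lower():
                    synonyms.add(name)
    return question + " " + " ".join(sorted(synonyms))

# Example: expand_question("What hue is the sky?", ["hue"])
# might append "chromaticity", "shade", "tint", ...
```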
7. SChuBERT: Scholarly Document Chunks with BERT-encoding boost Citation Count Prediction
- Author
-
Chandrasekaran, Muthu Kumar, van Dongen, Thomas, Maillette de Buy Wenniger, Gideon, and Schomaker, Lambert
- Abstract
Predicting the number of citations of scholarly documents is an upcoming task in scholarly document processing. Besides the intrinsic merit of this information, it also has a wider use as an imperfect proxy for quality which has the advantage of being cheaply available for large volumes of scholarly documents. Previous work has dealt with number of citations prediction with relatively small training data sets, or larger datasets but with short, incomplete input text. In this work we leverage the open access ACL Anthology collection in combination with the Semantic Scholar bibliometric database to create a large corpus of scholarly documents with associated citation information and we propose a new citation prediction model called SChuBERT. In our experiments we compare SChuBERT with several state-of-the-art citation prediction models and show that it outperforms previous methods by a large margin. We also show the merit of using more training data and longer input for number of citations prediction.
- Published
- 2020
8. Improved feature decay algorithms for statistical machine translation
- Author
-
Poncelas, Alberto, Maillette de Buy Wenniger, Gideon, and Way, Andy
- Published
- 2020
- Full Text
- View/download PDF
9. SChuBERT: Scholarly Document Chunks with BERT-encoding boost Citation Count Prediction.
- Author
-
van Dongen, Thomas, Maillette de Buy Wenniger, Gideon, and Schomaker, Lambert
- Published
- 2020
- Full Text
- View/download PDF
10. Structure-Tags Improve Text Classification for Scholarly Document Quality Prediction
- Author
-
Maillette de Buy Wenniger, Gideon, van Dongen, Thomas, Aedmaa, Eleri, Kruitbosch, Herbert Teun, Valentijn, Edwin A., and Schomaker, Lambert
- Published
- 2020
- Full Text
- View/download PDF
11. Improved feature decay algorithms for statistical machine translation.
- Author
-
Poncelas, Alberto, Maillette de Buy Wenniger, Gideon, and Way, Andy
- Subjects
MACHINE translating, ALGORITHMS, TASK performance - Abstract
In machine-learning applications, data selection is of crucial importance if good runtime performance is to be achieved. In a scenario where the test set is accessible when the model is being built, training instances can be selected so they are the most relevant for the test set. Feature Decay Algorithms (FDA) are a technique for data selection that has demonstrated excellent performance in a number of tasks. This method maximizes the diversity of the n-grams in the training set by devaluing those that have already been included. We focus on this method to undertake deeper research on how to select better training data instances. We give an overview of FDA and propose improvements in terms of speed and quality. Using German-to-English parallel data, we first develop a novel approach that decreases the execution time of FDA when multiple computation units are available. In addition, we obtain improvements in translation quality by extending FDA using information from the parallel corpus that is generally ignored.
- Published
- 2022
- Full Text
- View/download PDF
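The FDA procedure summarized in this abstract is, at its core, a greedy loop in which each test-set n-gram loses value every time an already-selected sentence covers it. A deliberately unoptimized sketch (production implementations use priority queues instead of rescanning the pool; the exponential decay form and length normalization are illustrative assumptions):

```python
from collections import Counter

def fda_select(test_ngrams, candidates, decay=0.5, k=1000):
    """Greedy FDA-style selection. `candidates` is a list of
    (sentence, ngrams_in_sentence) pairs; `test_ngrams` is a set."""
    covered = Counter()               # times each test n-gram was already covered
    selected = []
    pool = list(candidates)
    for _ in range(min(k, len(pool))):
        best_i, best_score = 0, float("-inf")
        for i, (sent, ngrams) in enumerate(pool):
            # The value of an n-gram decays with each previous occurrence.
            score = sum(decay ** covered[g] for g in ngrams if g in test_ngrams)
            score /= max(len(sent.split()), 1)     # length normalization
            if score > best_score:
                best_i, best_score = i, score
        sent, ngrams = pool.pop(best_i)
        selected.append(sent)
        covered.update(g for g in ngrams if g in test_ngrams)
    return selected
```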
12. Combining SMT and NMT back-translated data for efficient NMT
- Author
-
Poncelas, Alberto, Popović, Maja, Shterionov, Dimitar, Maillette de Buy Wenniger, Gideon, and Way, Andy
- Abstract
Neural Machine Translation (NMT) models achieve their best performance when large sets of parallel data are used for training. Consequently, techniques for augmenting the training set have become popular recently. One of these methods is back-translation (Sennrich et al., 2016), which consists of generating synthetic sentences by translating a set of monolingual, target-language sentences using a Machine Translation (MT) model. Generally, NMT models are used for back-translation. In this work, we analyze the performance of models when the training data is extended with synthetic data using different MT approaches. In particular, we investigate back-translated data generated not only by NMT but also by Statistical Machine Translation (SMT) models and combinations of both. The results reveal that the models achieve the best performance when the training set is augmented with back-translated data created by merging different MT approaches.
- Published
- 2019
- Full Text
- View/download PDF
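Back-translation, as used here, reduces to a short pipeline: translate monolingual target-language sentences into the source language with any target-to-source MT system, then mix the synthetic pairs with authentic data. A sketch in which `mt_tgt2src` is a hypothetical callable wrapping whatever trained model (SMT, NMT, or a combination, as the paper compares) is available:

```python
def back_translate(monolingual_tgt, mt_tgt2src):
    """Create synthetic (source, target) pairs from target-language text."""
    return [(mt_tgt2src(sentence), sentence) for sentence in monolingual_tgt]

def build_training_set(authentic_pairs, synthetic_pairs):
    # The simplest combination is plain concatenation; mixing ratios or
    # tagging of synthetic data are common variations.
    return authentic_pairs + synthetic_pairs
```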
13. Transductive data-selection algorithms for fine-tuning neural machine translation
- Author
-
Poncelas, Alberto, Maillette de Buy Wenniger, Gideon, and Way, Andy
- Abstract
Machine Translation models are trained to translate a variety of documents from one language into another. However, models specifically trained for particular characteristics of the documents tend to perform better. Fine-tuning is a technique for adapting an NMT model to some domain. In this work, we want to use this technique to adapt the model to a given test set. In particular, we use transductive data selection algorithms, which take advantage of the information in the test set to retrieve sentences from a larger parallel set.
- Published
- 2019
14. Adaptation of machine translation models with back-translated data using transductive data selection methods
- Author
-
Poncelas, Alberto, Maillette de Buy Wenniger, Gideon, and Way, Andy
- Abstract
Data selection has proven its merit for improving Neural Machine Translation (NMT) when applied to authentic data. But the benefit of using synthetic data in NMT training, produced by the popular back-translation technique, raises the question of whether data selection could also be useful for synthetic data. In this work we use Infrequent n-gram Recovery (INR) and Feature Decay Algorithms (FDA), two transductive data selection methods, to obtain subsets of sentences from synthetic data. These methods ensure that the selected sentences share n-grams with the test set so the NMT model can be adapted to translate it. Performing data selection on back-translated data creates new challenges, as the source side may contain noise originating from the model used in the back-translation. Hence, finding n-grams present in the test set becomes more difficult. Despite that, in our work we show that adapting a model with a selection of synthetic data is a useful approach.
- Published
- 2019
15. Data Selection with Feature Decay Algorithms Using an Approximated Target Side
- Author
-
Way, Andy, Poncelas, Alberto, Maillette de Buy Wenniger, Gideon, Turchi, Marco, Niehues, Jan, and Federico, Marcello
- Subjects
FOS: Computer and information sciences, Computer Science - Computation and Language, Machine Translation, Statistical Machine Translation, Neural Machine Translation, Machine translating, Computation and Language (cs.CL) - Abstract
Data selection techniques applied to neural machine translation (NMT) aim to increase the performance of a model by retrieving a subset of sentences for use as training data. Among the possible data selection techniques are transductive learning methods, which select the data based on the test set, i.e. the document to be translated. A limitation of these methods to date is that using the source-side test set does not by itself guarantee that sentences are selected with correct translations, or translations that are suitable given the test-set domain. Some corpora, such as subtitle corpora, may contain parallel sentences with inaccurate translations caused by localization or length restrictions. In order to try to fix this problem, in this paper we propose to use an approximated target side in addition to the source side when selecting suitable sentence pairs for training a model. This approximated target side is built by pre-translating the source side. In this work, we explore the performance of this general idea for one specific data selection approach called Feature Decay Algorithms (FDA). We train German-English NMT models on data selected by using the test set (source), the approximated target side, and a mixture of both. Our findings reveal that models built using a combination of outputs of FDA (using the test set and an approximated target side) perform better than those solely using the test set. We obtain a statistically significant improvement of more than 1.5 BLEU points over a model trained with all data, and more than 0.5 BLEU points over a strong FDA baseline that uses source-side information only.
- Published
- 2018
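The approximated-target-side idea above amounts to running FDA twice and merging the selections. A sketch in which `pretranslate` (an MT system) and `fda_indices` (an FDA run that returns indices into the corpus) are hypothetical stand-ins:

```python
def select_with_approximated_target(test_src, pretranslate, corpus, fda_indices, k):
    """Union of an FDA selection seeded by the source-side test set and one
    seeded by its pre-translation (the approximated target side).
    `corpus` is a list of (source, target) sentence pairs."""
    approx_tgt = [pretranslate(s) for s in test_src]
    idx_src = fda_indices(seed=test_src, side=[src for src, _ in corpus], k=k)
    idx_tgt = fda_indices(seed=approx_tgt, side=[tgt for _, tgt in corpus], k=k)
    keep = sorted(set(idx_src) | set(idx_tgt))
    return [corpus[i] for i in keep]
```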
16. No Padding Please: Efficient Neural Handwriting Recognition
- Author
-
Maillette de Buy Wenniger, Gideon, Schomaker, Lambert, and Way, Andy
- Published
- 2019
- Full Text
- View/download PDF
17. Investigating Backtranslation in Neural Machine Translation
- Author
-
Poncelas, Alberto, Shterionov, Dimitar, Way, Andy, Maillette de Buy Wenniger, Gideon, and Passban, Peyman
- Abstract
A prerequisite for training corpus-based machine translation (MT) systems – either Statistical MT (SMT) or Neural MT (NMT) – is the availability of high-quality parallel data. This is arguably more important today than ever before, as NMT has been shown in many studies to outperform SMT, but mostly when large parallel corpora are available; in cases where data is limited, SMT can still outperform NMT. Recently researchers have shown that back-translating monolingual data can be used to create synthetic parallel corpora, which in turn can be used in combination with authentic parallel data to train a high-quality NMT system. Given that large collections of new parallel text become available only quite rarely, back-translation has become the norm when building state-of-the-art NMT systems, especially in resource-poor scenarios. However, we assert that there are many unknown factors regarding the actual effects of back-translated data on the translation capabilities of an NMT model. Accordingly, in this work we investigate how using back-translated data as a training corpus – both as a separate standalone dataset as well as combined with human-generated parallel data – affects the performance of an NMT model. We use incrementally larger amounts of back-translated data to train a range of NMT systems for German-to-English, and analyse the resulting translation performance.
- Published
- 2018
18. Feature Decay Algorithms for Neural Machine Translation
- Author
-
Poncelas, Alberto, Maillette de Buy Wenniger, Gideon, and Way, Andy
- Abstract
Neural Machine Translation (NMT) systems require a lot of data to be competitive. For this reason, data selection techniques are used only for fine-tuning systems that have been trained with larger amounts of data. In this work we aim to use Feature Decay Algorithms (FDA) data selection techniques not only to fine-tune a system but also to build a complete system with less data. Our findings reveal that it is possible to find a subset of sentence pairs that, when used for training a German-English NMT system, outperforms the full training corpus by 1.11 BLEU points.
- Published
- 2018
19. Data selection with feature decay algorithms using an approximated target side
- Author
-
Turchi, Marco, Niehues, Jan, Federico, Marcello, Way, Andy, Poncelas, Alberto, and Maillette de Buy Wenniger, Gideon
- Abstract
Data selection techniques applied to neural machine translation (NMT) aim to increase the performance of a model by retrieving a subset of sentences for use as training data. Among the possible data selection techniques are transductive learning methods, which select the data based on the test set, i.e. the document to be translated. A limitation of these methods to date is that using the source-side test set does not by itself guarantee that sentences are selected with correct translations, or translations that are suitable given the test-set domain. Some corpora, such as subtitle corpora, may contain parallel sentences with inaccurate translations caused by localization or length restrictions. In order to try to fix this problem, in this paper we propose to use an approximated target side in addition to the source side when selecting suitable sentence pairs for training a model. This approximated target side is built by pre-translating the source side. In this work, we explore the performance of this general idea for one specific data selection approach called Feature Decay Algorithms (FDA). We train German-English NMT models on data selected by using the test set (source), the approximated target side, and a mixture of both. Our findings reveal that models built using a combination of outputs of FDA (using the test set and an approximated target side) perform better than those solely using the test set. We obtain a statistically significant improvement of more than 1.5 BLEU points over a model trained with all data, and more than 0.5 BLEU points over a strong FDA baseline that uses source-side information only.
- Published
- 2018
20. Data selection with feature decay algorithms using an approximated target side
- Author
-
Poncelas, Alberto, Maillette de Buy Wenniger, Gideon, and Way, Andy
- Abstract
Data selection techniques applied to neural machine translation (NMT) aim to increase the performance of a model by retrieving a subset of sentences for use as training data. Among the possible data selection techniques are transductive learning methods, which select the data based on the test set, i.e. the document to be translated. A limitation of these methods to date is that using the source-side test set does not by itself guarantee that sentences are selected with correct translations, or translations that are suitable given the test-set domain. Some corpora, such as subtitle corpora, may contain parallel sentences with inaccurate translations caused by localization or length restrictions. In order to try to fix this problem, in this paper we propose to use an approximated target side in addition to the source side when selecting suitable sentence pairs for training a model. This approximated target side is built by pre-translating the source side. In this work, we explore the performance of this general idea for one specific data selection approach called Feature Decay Algorithms (FDA). We train German-English NMT models on data selected by using the test set (source), the approximated target side, and a mixture of both. Our findings reveal that models built using a combination of outputs of FDA (using the test set and an approximated target side) perform better than those solely using the test set. We obtain a statistically significant improvement of more than 1.5 BLEU points over a model trained with all data, and more than 0.5 BLEU points over a strong FDA baseline that uses source-side information only.
- Published
- 2018
21. Applying N-gram alignment entropy to improve feature decay algorithms
- Author
-
Poncelas, Alberto, Maillette de Buy Wenniger, Gideon, and Way, Andy
- Abstract
Data Selection is a popular step in Machine Translation pipelines. Feature Decay Algorithms (FDA) is a technique for data selection that has shown good performance in several tasks. FDA aims to maximize the coverage of n-grams in the test set. However, intuitively, more ambiguous n-grams require more training examples in order to adequately estimate their translation probabilities. This ambiguity can be measured by alignment entropy. In this paper we propose two methods for calculating the alignment entropies for n-grams of any size, which can be used to improve the performance of FDA. We evaluate substituting the n-gram-specific entropy values computed by these methods for the parameters of both the exponential and linear decay factors of FDA. The experiments conducted on German-to-English and Czech-to-English translation demonstrate that the use of alignment entropies can lead to an increase in the quality of the results of FDA.
- Published
- 2017
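The alignment entropy used here is the Shannon entropy of an n-gram's observed translation distribution, estimated from word-aligned parallel data. A minimal sketch, assuming the aligned corpus has already been reduced to (source n-gram, target phrase) pairs:

```python
import math
from collections import Counter, defaultdict

def alignment_entropies(aligned_pairs):
    """Map each source n-gram to the entropy of its translation distribution;
    higher entropy = more ambiguous n-gram, needing more training examples."""
    translations = defaultdict(Counter)
    for src_ngram, tgt_phrase in aligned_pairs:
        translations[src_ngram][tgt_phrase] += 1
    entropies = {}
    for src_ngram, counts in translations.items():
        total = sum(counts.values())
        entropies[src_ngram] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropies
```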
22. Elastic-substitution decoding for hierarchical SMT: efficiency, richer search and double labels
- Author
-
Maillette de Buy Wenniger, Gideon, Sima'an, Khalil, and Way, Andy
- Abstract
Elastic-substitution decoding (ESD), first introduced by Chiang (2010), can be important for obtaining good results when applying labels to enrich hierarchical statistical machine translation (SMT). However, an efficient implementation is essential for scalable application. We describe how to achieve this, contributing essential details that were missing in the original exposition. We compare ESD to strict matching and show its superiority for both reordering and syntactic labels. To overcome the sub-optimal performance due to the late evaluation of features marking label substitution types, we increase the diversity of the rules explored during cube pruning initialization with respect to their labels. This approach gives significant improvements over basic ESD and performs favorably compared to extending the search by increasing the cube pruning pop-limit. Finally, we look at combining multiple labels. The combination of reordering labels and target-side boundary-tags yields a significant improvement in terms of the word-order-sensitive metrics Kendall reordering score and METEOR. This confirms our intuition that the combination of reordering labels and syntactic labels can yield improvements over either label by itself, despite increased sparsity.
- Published
- 2017
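The core move of elastic-substitution decoding is that a rule whose label does not match a gap's label is not discarded, as under strict matching, but charged through label-substitution features evaluated during decoding. A toy scoring sketch (the feature names and weight lookup are illustrative assumptions, not the paper's feature set):

```python
def label_substitution_score(gap_label, rule_label, weights):
    """Return the log-linear feature contribution for filling a gap labeled
    gap_label with a rule rooted in rule_label."""
    if gap_label == rule_label:
        return weights.get("label_match", 0.0)
    # A specific substitution feature if tuned, otherwise a generic mismatch cost.
    return weights.get(("subst", gap_label, rule_label),
                       weights.get("label_mismatch", -1.0))
```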
23. Applying N-gram Alignment Entropy to Improve Feature Decay Algorithms
- Author
-
Poncelas, Alberto, Maillette de Buy Wenniger, Gideon, and Way, Andy
- Published
- 2017
- Full Text
- View/download PDF
24. Bilingual Markov Reordering Labels for Hierarchical SMT
- Author
-
Maillette de Buy Wenniger, Gideon and Sima'an, Khalil
- Published
- 2014
- Full Text
- View/download PDF
25. Improving transductive data selection algorithms for machine translation
- Author
-
Poncelas, Alberto, Way, Andy, and Maillette de Buy Wenniger, Gideon
- Subjects
Machine learning, Computational linguistics, Machine translating - Abstract
In this work, we study different ways of improving Machine Translation models by using the subset of the training data that is most relevant to the test set. This is achieved by using Transductive Algorithms (TA) for data selection. In particular, we explore two methods: Infrequent N-gram Recovery (INR) and Feature Decay Algorithms (FDA). Statistical Machine Translation (SMT) models do not always perform better when more data are used for training. Using these techniques to extract the training sentences leads to a better performance of the models for translating a particular test set than using the complete training dataset. Neural Machine Translation (NMT) models can outperform SMT models, but they require more data to achieve their best performance. In this thesis, we explore how INR and FDA can also be beneficial for improving NMT models with just a fraction of the available data. On top of that, we propose several improvements to these data-selection methods by exploiting information on the target side. First, we use the alignment between words on the source and target sides to modify the selection criteria of these methods. Sentences containing n-grams that are more difficult to translate should be promoted so that more occurrences of these n-grams are selected. Another proposed extension is to select sentences based not on the test set but on an MT-generated approximated translation, so that the target side of the sentences is considered in the selection criteria. Finally, target-language sentences can be translated into the source language so that INR and FDA have more candidates from which to select sentences.
- Published
- 2019
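Of the two transductive methods studied in the thesis, Infrequent N-gram Recovery admits an especially short sketch: a candidate sentence is kept when it contains test-set n-grams that are still under-represented in the selection so far. The single pass and fixed threshold below are simplifying assumptions:

```python
from collections import Counter

def inr_select(test_ngrams, candidates, threshold=40):
    """INR-style selection. `candidates` is a list of
    (sentence, ngrams_in_sentence) pairs; `test_ngrams` is a set."""
    covered = Counter()
    selected = []
    for sent, ngrams in candidates:
        gain = [g for g in ngrams
                if g in test_ngrams and covered[g] < threshold]
        if gain:                      # sentence recovers an infrequent n-gram
            selected.append(sent)
            covered.update(gain)
    return selected
```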