An autoencoder with bilingual sparse features for improved statistical machine translation
- Source :
- ICASSP
- Publication Year :
- 2014
- Publisher :
- IEEE, 2014.
- Abstract :
- Though sparse features have produced significant gains over traditional dense features in statistical machine translation, careful feature selection and feature engineering are necessary to avoid overfitting during optimization. However, many sparse features overlap heavily with each other; that is, they cover the same or similar information about translational equivalence from slightly different points of view, and they overfit easily because only very few training samples are available for the given bilingual stochastic context-free grammar (SCFG) rules. We propose a natural autoencoder that maps all the discrete and overlapping sparse features for each SCFG rule into a continuous vector, so that the information encoded in the sparse feature vectors becomes a dense vector that may benefit from more samples during training and avoid overfitting. Our experiments showed that, for a statistical machine translation system with 33 million bilingual SCFG rules, the autoencoder generalizes much better than sparse features alone under the same optimization framework.
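The mapping described in the abstract, from a rule's discrete sparse features to one continuous vector, can be illustrated with a toy sketch. The record gives no architectural details, so everything below is an assumption made for illustration: a single tanh hidden layer, a sigmoid decoder, cross-entropy reconstruction loss, plain SGD, and the hypothetical names `make_autoencoder`, `encode`, `decode`, and `train_step`. The eight-feature toy rules are invented and are not data from the paper.

```python
import math
import random

def make_autoencoder(num_features, hidden_dim, seed=0):
    """Create small random encoder and decoder weight matrices (assumed layout)."""
    rng = random.Random(seed)
    enc = [[rng.uniform(-0.1, 0.1) for _ in range(num_features)]
           for _ in range(hidden_dim)]
    dec = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
           for _ in range(num_features)]
    return enc, dec

def encode(active, enc):
    """Map a sparse binary vector, given as indices of firing features, to a dense vector."""
    return [math.tanh(sum(row[j] for j in active)) for row in enc]

def decode(hidden, dec):
    """Reconstruct a firing probability for every sparse feature."""
    return [1.0 / (1.0 + math.exp(-sum(v * h for v, h in zip(row, hidden))))
            for row in dec]

def train_step(active, enc, dec, lr=0.1):
    """One SGD step on the cross-entropy reconstruction loss; returns the loss before the update."""
    active = set(active)
    hidden = encode(active, enc)
    recon = decode(hidden, dec)
    eps = 1e-12  # clamp to keep log() finite
    loss = 0.0
    for j, y in enumerate(recon):
        y = min(max(y, eps), 1.0 - eps)
        loss -= math.log(y) if j in active else math.log(1.0 - y)
    # Gradient of the loss with respect to the decoder pre-activations.
    dz = [y - (1.0 if j in active else 0.0) for j, y in enumerate(recon)]
    # Back-propagate to the hidden layer before the decoder weights change.
    dh = [sum(dec[j][k] * dz[j] for j in range(len(dec)))
          for k in range(len(hidden))]
    for j, row in enumerate(dec):
        for k, h in enumerate(hidden):
            row[k] -= lr * dz[j] * h
    for k, row in enumerate(enc):
        da = dh[k] * (1.0 - hidden[k] ** 2)  # derivative of tanh
        for j in active:  # only firing features receive a gradient
            row[j] -= lr * da
    return loss

# Invented toy "rules": each is the set of sparse features that fire for it.
# Rules 0 and 1 share features 0 and 1; rules 2 and 3 share features 4 and 5.
rules = [[0, 1, 2], [0, 1, 3], [4, 5, 6], [4, 5, 7]]
enc, dec = make_autoencoder(num_features=8, hidden_dim=2)
first = sum(train_step(r, enc, dec) for r in rules)
for _ in range(200):
    last = sum(train_step(r, enc, dec) for r in rules)
dense = encode(rules[0], enc)  # dense vector standing in for rule 0's sparse features
```

In this sketch, rules that share overlapping features contribute gradients to the same hidden units, which is one way to read the abstract's remark that a dense vector "may enjoy more samples during training"; how the paper integrates the dense vector into the translation system is not stated in this record.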
- Subjects :
- Feature engineering
Machine translation
Computer science
Feature vector
Feature selection
Pattern recognition
Overfitting
Machine learning
Autoencoder
Synchronous context-free grammar
Artificial intelligence
Equivalence (formal languages)
- Database :
- OpenAIRE
- Journal :
- 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Accession number :
- edsair.doi...........ce42e8493f896d4effa9f84db0e8c59d
- Full Text :
- https://doi.org/10.1109/icassp.2014.6854978