Revisiting NMT for normalization of early English letters
- Source :
- LaTeCH@NAACL-HLT
- Publication Year :
- 2019
- Publisher :
- The Association for Computational Linguistics, 2019.
Abstract
- This paper studies the use of NMT (neural machine translation) as a normalization method for an early English letter corpus. The corpus has previously been normalized, leaving only the less frequent deviant forms unnormalized. The paper compares several approaches to improving the normalization of these remaining forms. Adding features to the training data is found to be unhelpful, but filtering the top candidates produced by the NMT model with a lexicographical resource, combined with lemmatization, improves results.
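The candidate-filtering idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy lexicon, the toy lemmatizer, and the rule of accepting the first candidate whose surface form or lemma appears in the lexicon are all assumptions made for illustration.

```python
from typing import AbstractSet, Callable, Iterable, Optional


def filter_candidates(
    candidates: Iterable[str],
    lexicon: AbstractSet[str],
    lemmatize: Callable[[str], str],
) -> Optional[str]:
    """Return the first candidate whose form or lemma is in the lexicon.

    `candidates` is assumed to be ordered best-first, as an n-best list
    from an NMT model would be. Returns None if nothing is attested.
    """
    for candidate in candidates:
        if candidate in lexicon or lemmatize(candidate) in lexicon:
            return candidate
    return None


def toy_lemmatize(word: str) -> str:
    # Hypothetical stand-in for a real lemmatizer: strips one final "s".
    return word[:-1] if word.endswith("s") else word


# Hypothetical lexicon standing in for the lexicographical resource.
LEXICON = {"letter", "have", "good"}

# Hypothetical n-best output for one historical spelling.
nbest = ["lettres", "letters", "letter"]
print(filter_candidates(nbest, LEXICON, toy_lemmatize))
```

In this sketch the top-ranked candidate is skipped because neither it nor its lemma is attested, and the second candidate is accepted through its lemma. What to do when no candidate passes the filter (for example, falling back to the top candidate) is left to the caller.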
- Subjects :
- Normalization (statistics)
Training set
Machine translation
Computer science
business.industry
Lemmatisation
education
02 engineering and technology
010501 environmental sciences
computer.software_genre
Lexicographical order
01 natural sciences
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
6121 Languages
Artificial intelligence
business
computer
Natural language processing
0105 earth and related environmental sciences
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- LaTeCH@NAACL-HLT
- Accession number :
- edsair.doi.dedup.....6a14db640c901874a7d0abe07ca6b56a