Searching for Discriminative Metadata of Heterogenous Corpora
- Source :
- Fourteenth International Workshop on Treebanks and Linguistic Theories (TLT14), Dec 2015, Warsaw, Poland. pp. 72-82
- Publication Year :
- 2015
- Publisher :
- HAL CCSD, 2015.
Abstract
- In this paper, we use machine learning techniques for part-of-speech tagging and parsing to explore the specificities of a highly heterogeneous corpus. The corpus is a treebank of Old French made up of texts that differ with respect to several types of metadata: production date, form (verse/prose), domain, and dialect. We conduct experiments to determine which of these metadata are the most discriminative and to induce a general methodology.
- Subjects :
- [INFO.INFO-TT] Computer Science [cs]/Document and Text Processing
[SHS.LANGUE] Humanities and Social Sciences/Linguistics
dependency parsing
machine learning
[INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]
Old French
ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
heterogeneous corpus exploration
[INFO.INFO-HC] Computer Science [cs]/Human-Computer Interaction [cs.HC]
POS labelling
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- Fourteenth International Workshop on Treebanks and Linguistic Theories (TLT14), Dec 2015, Warsaw, Poland. pp. 72-82
- Accession number :
- edsair.dedup.wf.001..60f5d330c45ea51657e5b7b4e580bfc3