1. The linguistic construal of disciplinarity: A data-mining approach using register features.
- Author
- Teich, Elke; Degaetano-Ortlieb, Stefania; Fankhauser, Peter; Kermes, Hannah; Lapshinova-Koltunski, Ekaterina
- Subjects
- *LANGUAGE & languages, *LINGUISTICS, *MATHEMATICAL models, *RESEARCH funding, *SCIENCE, *DATA mining, *THEORY
- Abstract
- We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question of whether these disciplines develop a distinctive language use, both individually and collectively, over the given time period. The data set is the English Scientific Text Corpus (SciTex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based methods of feature extraction (various aggregated part-of-speech-based features, n-grams, lexico-grammatical patterns) with automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools. [ABSTRACT FROM AUTHOR]
- Published
- 2016