The importance of sampling frames in representative historical corpora : a case study of Parisian theater

Authors :
Angus B. Grieve-Smith
Source :
CogniTextes, Vol 19 (2019)
Publication Year :
2019
Publisher :
Association Française de Linguistique Cognitive, 2019.

Abstract

Cognitive linguistics makes specific claims about language use, and corpora are our most powerful tool to test those claims. Representative sampling (Laplace 1814) is a technique that allows us to study smaller, more manageable corpora, and generalize our results to a broader sampling frame. For a sampled corpus to be relevant to our research questions, its sampling frame must have an understandable connection to the subject of our research question.

In my dissertation study (Grieve-Smith 2009) I tested the type frequency hypothesis of analogical extension (Bybee 1995) using the FRANTEXT corpus (CNRTL 2018). In this study I test the theatrical texts in FRANTEXT from 1800-1815 against the new Digital Parisian Stage corpus, sampled from Wicks (1950 et seq.), a catalog of every play that premiered in Paris in the nineteenth century. Declarative sentence negations in the Digital Parisian Stage corpus occurred with ne … pas in 73.9% of tokens, while in FRANTEXT they only occurred with ne … pas in 50.5% of tokens. This shows that FRANTEXT is biased in favor of elite literary language. To properly test usage-based theories of language change we will need a representative corpus covering a century or more.
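
The sampling idea described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study: the frame entries (`play_0001` and so on), the sample size, the seed, and the negation labels are all hypothetical placeholders, and the abstract does not say which sampling design was applied to the catalog. The sketch draws a simple random sample without replacement from a frame, then computes the share of tokens carrying a given label.

```python
import random


def sample_frame(frame, k, seed=0):
    """Draw a simple random sample of k entries, without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    return rng.sample(frame, k)


def proportion(tokens, target):
    """Return the share of tokens whose label equals target."""
    if not tokens:
        raise ValueError("cannot compute a proportion over zero tokens")
    return sum(1 for t in tokens if t == target) / len(tokens)


# Hypothetical sampling frame: placeholder identifiers standing in for
# catalog entries (not real titles or counts from the catalog).
frame = [f"play_{i:04d}" for i in range(1, 1001)]
sample = sample_frame(frame, 30, seed=42)

# Hypothetical annotations of negation tokens found in the sampled texts.
tokens = ["ne...pas", "ne...point", "ne...pas", "ne alone"]
share = proportion(tokens, "ne...pas")
```

Because every entry in the frame has the same chance of being drawn, a proportion measured on the sample can be generalized to the frame, which is the property a corpus assembled without a defined frame lacks.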

Details

Language :
English, French
ISSN :
1958-5322
Volume :
19
Database :
Directory of Open Access Journals
Journal :
CogniTextes
Publication Type :
Academic Journal
Accession number :
edsdoj.9362880801bd4c26b690e8657f393058
Document Type :
article
Full Text :
https://doi.org/10.4000/cognitextes.1671