Assessing the Evolution of Written Language through Data Mining in Large Corpora

Authors :
Cuauhtemoc Garcia-Garcia
Source :
ProQuest LLC. 2016. Ph.D. Dissertation, Stanford University.
Publication Year :
2016

Abstract

Across the centuries, the question of the origin of language has captivated the human imagination. Many theories have been proposed to address fundamental questions such as: Where do languages come from? How do they evolve? What are the societal drivers of this change? Historically, one of the biggest challenges in addressing these questions has been a lack of large-scale empirical data, which has made it difficult to rigorously test hypotheses and correlate language change with cultural patterns. My dissertation research analyzes how written Spanish and Portuguese in the Americas diverged from their European counterparts, focusing on within-language shifts. Methodologically, I carry out this research by data mining in large digitized corpora (~300,000 documents ranging from the twelfth century to today) as well as performing close reading and contextual analysis of selected material. This interdisciplinary research was carried out as a collaboration between the Division of Literatures, Cultures, and Languages, the Department of Biology, and the Stanford Libraries. The breadth of my project requires close attention to historical and social context, which may often drive the written language changes that we trace computationally. For instance, Portugal imposed a 300-year prohibition of the printing press and universities in Brazil, whereas these institutions were introduced into Hispanic America soon after the arrival of Columbus. Accordingly, I found that written Spanish changed relatively "smoothly", reflecting a continuous assimilation of changes in the spoken form, while comparable changes appear much more "abruptly" in written Brazilian Portuguese.
This work includes an extensive study of personal pronoun evolution in Portuguese, in which I found that the past prohibition of the printing press, coupled with shifts in literary movements reflecting increased national sentiment, was a major driver of pronoun shift in nineteenth-century Brazilian Portuguese (Chapters 1 and 2). In Chapter 3, I analyze the evolution of personal pronouns in Spanish and show how regional writing styles affected the overall pronoun shift. Finally, I hope that this research will shed light on the main complex socioeconomic and geographical factors that led to the divergent evolution of written language between the Americas and the Iberian Peninsula. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by Telephone (800) 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]

Details

Language :
English
ISBN :
979-83-575-0839-3
ISSN :
3575-0839
Database :
ERIC
Journal :
ProQuest LLC
Publication Type :
Dissertation/ Thesis
Accession number :
ED649700
Document Type :
Dissertations/Theses - Doctoral Dissertations