4 results on '"historical corpus"'
Search Results
2. Creation of an annotated corpus of Old and Middle Hungarian court records and private correspondence.
- Author
Novák, Attila, Gugán, Katalin, Varga, Mónika, and Dömötör, Adrienne
- Subjects
*NATIVE language, *METADATA, *SOCIOLINGUISTIC research, *ANNOTATIONS, *COURT records
- Abstract
The paper introduces a novel annotated corpus of Old and Middle Hungarian (16th–18th centuries), the texts of which were selected to approximate the vernacular of the given historical periods as closely as possible. The corpus consists of testimonies of witnesses in trials and samples of private correspondence. The texts are not only analyzed morphologically; each file also contains metadata intended to facilitate sociolinguistic research. The texts were segmented into clauses, manually normalized, and morphosyntactically annotated using an annotation system consisting of the PurePos PoS tagger and the Hungarian morphological analyzer HuMor, which was originally developed for Modern Hungarian but adapted to analyze Old and Middle Hungarian morphological constructions. The automatically disambiguated morphological annotation was manually checked and corrected using a web-based manual disambiguation interface. The normalization process and the manual validation of the annotation required extensive teamwork and provided continuous feedback for the refinement of the computational morphology and the iterative retraining of the statistical models of the tagger. The paper discusses some of the typical problems that occurred during the normalization procedure and their tentative solutions. We also describe the automatic annotation tools, the process of semi-automatic disambiguation, and the query interface, a special function of which also makes correction of the annotation possible. Displaying the original, the normalized, and the parsed versions of the selected texts, the beta version of the first fully normalized and annotated historical corpus of Hungarian is freely accessible at http://tmk.nytud.hu/. [ABSTRACT FROM AUTHOR]
- Published
- 2018
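3. Limitaciones en el uso de corpus diacrónicos del español. Nuevas aportaciones desde el proyecto de investigación Post Scriptum [Limitations in the use of diachronic corpora of Spanish. New contributions from the Post Scriptum research project]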
- Author
Vaamonde Dos Santos, Gael and Repositório da Universidade de Lisboa
- Subjects
Diachronic corpus, corpus histórico, Corpus annotation, cartas privadas, Private letters, Ordinary writing, historical corpus, TEI-XML, anotación de corpus
- Abstract
The purpose of this paper is two-fold. On the one hand, it explains some limitations of the computerized diachronic corpora currently available for research on Spanish. On the other hand, it presents the Post Scriptum research project, which aims to publish a collection of private letters written in Spanish and Portuguese during the Early Modern period. Post Scriptum can provide solutions to the shortcomings of other similar corpora, making it a suitable resource for research in historical linguistics and an ideal complement to the existing large corpora.
- Published
- 2015
4. Post Scriptum: Archivo Digital de Escritura Cotidiana [Post Scriptum: A Digital Archive of Ordinary Writings]
- Author
Vaamonde, Gael, Costa, Ana L., Marquilhas, Rita, Clara Pinto, Pratas, Fernanda, and Repositório da Universidade de Lisboa
- Subjects
Diachronic linguistics, Cartas, Portuguese, Linguistica diacrónica, edición digital, Lingüística diacrónica, Corpus annotation, cartas privadas, Anotación de corpus, Spanish, digital edition, TEI-XML, Português, Portugués, Español, corpus histórico, Letters, digital humanities, private letters, historical corpus, humanidades digitales
- Abstract
Post Scriptum: A Digital Archive of Ordinary Writings (P.S.) is a research project that aims to recover and publish private letters written in Spain and Portugal during the Early Modern period. These letters, most of them unpublished, were produced by a wide range of authors from different social backgrounds: men or women, adults or children, masters or servants, thieves, soldiers, artisans, priests, politicians, and other kinds of social agents. Their letters survived by exception, when their lives crossed paths with the means of prosecution used by the Inquisition and by various civil and ecclesiastical courts, institutions that routinely used private correspondence as evidence of the crimes being tried. These written sources often present an (almost) oral rhetoric, dealing with everyday matters that until now had been studied only through isolated cases. This paper explains the methodology used in the digital edition of the documents for their availability online; it also presents the process of modernizing the texts and their subsequent morphological (POS) and syntactic annotation. The final aim is to develop an annotated diachronic corpus that can serve as an electronic resource for the linguistic and historical study of Spanish and Portuguese.
- Published
- 2014