Learner Corpus Anonymization in the Age of GDPR: Insights from the Creation of a Learner Corpus of Swedish

Authors :: Megyesi, Beáta
Granstedt, Lena
Johansson, Sofia
Prentice, Julia
Rosen, Dan
Schenström, Carl-Johan
Sundberg, Gunlög
Wiren, Mats
Volodina, Elena
Publication Year :: 2018
Publisher :: Stockholms universitet, Svenska/Nordiska språk, 2018.
Abstract: This paper reports on the status of learner corpus anonymization for the ongoing research infrastructure project SweLL. The main aim of the project is to deliver, and make available for research, a well-annotated corpus of essays written by second language (L2) learners of Swedish. As practice shows, annotating learner texts is a sensitive process that demands many compromises between ethical and legal demands on the one hand and research and technical demands on the other. Below is a concise description of the current status of pseudonymization of language learner data to ensure the anonymity of the learners, with numerous examples of the above-mentioned compromises.

Subjects :: General Language Studies and Linguistics
Jämförande språkvetenskap och allmän lingvistik
Språkteknologi (språkvetenskaplig databehandling)
Language Technology (Computational Linguistics)
