1. Bewerten in Blogkommentaren : Mehrebenenannotation sprachlichen Bewertens
- Author
- Trevisan, Bianka and Jakobs, Eva-Maria
- Subjects
multi-level annotation, Text Mining, sentiment analysis, Mehrebenenannotation, Sprachwissenschaft, Linguistik, Blog, text analysis, ddc:400, NLP, Textanalyse, World Wide Web 2.0
- Abstract
In the past, linguistic text analysis was performed manually. Today, new methodological approaches allow researchers to work more efficiently. One such approach is the use of Text Mining methods and techniques. Text Mining originates in computer science and has been used, among other things, for frequency and co-occurrence analysis. The subject of this work is the combination of linguistic evaluation theory and Text Mining. The linguistic evaluation theory of Sandig (1979) describes evaluation as a linguistic act in which individual evaluation criteria are verbalized by linguistic expressions. Verbal evaluation draws on an inventory of evaluative expressions, which may vary with the text type, for example the blog comment. Compared with text types oriented towards genuinely written language (e.g. journalistic texts), such text types are characterized by specific linguistic phenomena such as deviations from the standard, interactive units in the sense of Zifonun et al. (1997), or onomatopoeic expressions. These phenomena have to be taken into account in method development. Against this background, the following research questions arise: What must a machine (and thus the automation) accomplish to meet these challenges? Which evaluation-related linguistic phenomena must be considered in the automation? How must Text Mining methods and techniques be developed further to meet these challenges? The aim of this work is to develop a first theoretical-methodological approach for the automatic analysis of verbal evaluations in blog comments by means of Text Mining, taking text type-specific linguistic phenomena into account. For this purpose, existing manual and automatic methods of text analysis are adapted and optimized. The methodology is developed using the example of blog comments, based on a corpus dealing with mobile communication systems.
The methodology development follows a two-stage procedure consisting of a pre-study and a main study. The subject of the pre-study is the evaluation of Text Mining methods and the identification of empirical problems through manual and automatic analysis of blog comments; automatic analysis tools are evaluated using the text analysis software PASW Modeler and MySQL. To determine the empirical problems, frequency, co-occurrence and sentiment analyses are carried out. The identified empirical, text type-related problems are classified and used as implications for the main study. The subject of the main study is the development of an approach for identifying opinion-indicating expressions in German blog comments. The database consists of two sub-corpora collected according to defined criteria. The sub-corpora are analyzed with methods and tools from corpus linguistics and Natural Language Processing; the analysis is carried out by means of linguistic multi-level annotation (in EXMARaLDA). The final result is a fine-grained, feature-based, linguistic multi-level annotation model, which is evaluated by inter-annotator agreement. Using the multi-level annotation model, the sub-corpora are finally annotated by five annotators and a gold standard is derived.
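The abstract does not state which agreement coefficient was used to evaluate the annotation model. As an illustration only, the following minimal sketch computes Fleiss' kappa, a common choice when a fixed number of annotators (here five) assign one category to each item. The function name, the labels "pos", "neg" and "neu", and the sample data are hypothetical and are not taken from the work itself.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels (one per annotator).

    Assumption: every item is labelled by the same number of annotators.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of labels")

    # Count how often each category was chosen for each item.
    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing annotator pairs, averaged over items.
    per_item = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_observed = sum(per_item) / n_items

    # Expected agreement from the overall category proportions.
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_expected = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())

    if p_expected == 1.0:
        # Only one category was ever used; kappa is undefined, treat as full agreement.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical example: three blog-comment tokens, five annotators each.
sample = [
    ["pos", "pos", "pos", "pos", "neg"],
    ["neg", "neg", "neg", "neg", "neg"],
    ["neu", "pos", "neu", "neu", "neu"],
]
print(round(fleiss_kappa(sample), 3))
```

In a multi-level annotation setting, such a coefficient would typically be computed separately for each annotation level, since the category inventories differ between levels.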
- Published
- 2014