1. A Bridge to Somewhere: Coded Corpora, Text Mining, and the Future 'Beyond' Political Science.
- Author
- Shulman, Stuart W.
- Subjects
- *TEXT mining, *ANNOTATIONS, *CODING theory, *INFORMATION retrieval, *RESEARCH methodology, *POLITICAL science
- Abstract
- Coded corpora are used for basic and applied research in social and computational sciences. Yet the manual annotation of collections of text (the coding) is often conducted in an ad hoc, inconsistent, non-replicable, and unreliable manner. It is not that there are no techniques or that tools do not exist. Rather, it is that they tend to be hidden away in small niche subfields where knowledge of them is limited to a small research community. As a result, many researchers code text for a variety of reasons, but it remains very difficult to share these annotations with other researchers or to work on them with researchers from other disciplines. This paper argues that what are needed are more universal annotation metrics, a standard lexicon, and widely shared, semi-automated coding tools that make the work of humans more useful, fungible, and durable. Ideally, these tools would be interoperable, or combined in a single system. The new system would allow humans to create annotations and allow other experts to efficiently validate and adjudicate their work. At a deeper level, this calls for better codified approaches to training and deploying coders, an annotation science subfield, so that a more coherent and collaborative research community can form around this promising methodological domain. Unpublished Manuscript. [ABSTRACT FROM AUTHOR]
- Published
- 2007