10 results for '"Biemann, Chris"'
Search Results
2. Applying Semantic Parsing to Question Answering Over Linked Data: Addressing the Lexical Gap
- Author
-
Hakimov, Sherzod, Unger, Christina, Walter, Sebastian, Cimiano, Philipp, Biemann, Chris, Handschuh, Siegfried, Freitas, Andre, Meziane, Farid, and Metais, Elisabeth
- Subjects
Vocabulary, Parsing, Information retrieval, Computer science, Parse tree, QALD, Linked data, Ontology (information science), Lexical item, Semantic parsing, Question answering, Artificial intelligence, CCG, Natural language, Natural language processing
- Abstract
Question answering over linked data has emerged in recent years as an important research topic, aiming to provide natural language access to the growing body of linked open data on the Web. In this paper we focus on analyzing the lexical gap that arises as a challenge for any such question answering system. The lexical gap refers to the mismatch between the vocabulary used in a user question and the vocabulary used in the relevant dataset. We implement a semantic parsing approach and evaluate it on the QALD-4 benchmark, showing that the performance of such an approach suffers from training data sparseness. Its performance can, however, be substantially improved if the right lexical knowledge is available. To show this, we model a set of lexical entries by hand to quantify the number of entries that would be needed. Further, we analyze whether a state-of-the-art tool for inducing ontology lexica from corpora can derive these lexical entries automatically. We conclude that further research and investments are needed to derive such lexical knowledge automatically or semi-automatically.
- Published
- 2015
3. Lemonade: A Web Assistant for Creating and Debugging Ontology Lexica
- Author
-
Rico, Mariano, Unger, Christina, Biemann, Chris, Handschuh, Siegfried, Freitas, Andre, Meziane, Farid, and Metais, Elisabeth
- Subjects
World Wide Web, Debugging, Computer science, Process (engineering), Grammatical Framework, Ontology (information science), Lexicon, Porting, Sweetening
- Abstract
The current state of the art in providing lexicalizations for ontologies is the lemon model. Based on experiences in creating a lemon lexicon for the DBpedia ontology in English and subsequently porting it to Spanish and German, we show that creating ontology lexica is a time consuming, often tedious and also error-prone process. As a remedy, this paper introduces Lemonade, an assistant that facilitates the creation of lexica and helps users in spotting errors and inconsistencies in the created lexical entries, thereby ‘sweetening’ the otherwise ‘bitter’ lemon.
- Published
- 2015
4. Towards a Network Model of the Coreness of Texts: An Experiment in Classifying Latin Texts Using the TTLab Latin Tagger
- Author
-
Mehler, Alexander, vor der Brück, Tim, Gleim, Rüdiger, Geelhaar, Tim, and Biemann, Chris
- Subjects
Conditional random field, Structure (mathematical logic), Computer science, Representation (arts), Lexicon, Task (project management), Dynamics (music), Medieval Latin, Preprocessor, Artificial intelligence, Natural language processing
- Abstract
The analysis of longitudinal corpora of historical texts requires the integrated development of tools for automatically preprocessing these texts and for building representation models of their genre- and register-related dynamics. In this chapter we present such a joint endeavor that ranges from resource formation via preprocessing to network-based text representation and classification. We start by presenting the so-called TTLab Latin Tagger (TLT), which preprocesses texts of classical and medieval Latin. Its lexical resource in the form of the Frankfurt Latin Lexicon (FLL) is also briefly introduced. As a first test case for showing the expressiveness of these resources, we perform a tripartite classification task of authorship attribution, genre detection and a combination thereof. To this end, we introduce a novel text representation model that explores the core structure (the so-called coreness) of lexical network representations of texts. Our experiment shows the expressiveness of this representation format and, indirectly, of our Latin preprocessor.
- Published
- 2014
5. Using Linked Data to Evaluate the Impact of Research and Development in Europe: A Structural Equation Model
- Author
-
Zaveri, Amrapali, Vissoci, Joao Ricardo Nickenig, Daraio, Cinzia, Pietrobon, Ricardo, Alani, Harith, Kagal, Lalana, Fokoue, Achille, Groth, Paul, Biemann, Chris, Parreira, Josiane Xavier, Aroyo, Lora, Noy, Natasha, Welty, Chris, and Janowicz, Krzysztof
- Subjects
Operations research, Computer science, Linked data, Latent variable, Citation impact, Structural equation modeling, Open data, Health care, Econometrics, Factorial analysis, European Union
- Abstract
Europe has a high impact on the global biomedical literature, having contributed a growing number of research articles and a significant citation impact. However, the impact of research and development generated by European countries on economic, educational and healthcare performance is poorly understood. The recent Linking Open Data (LOD) project has made many data sources publicly available in human-readable formats. In this paper, we demonstrate the utility of LOD in assessing the impact of research and development (R&D) on the economic, educational and healthcare performance in Europe. We extract relevant variables from two LOD datasets, namely World Bank and Eurostat. We analyze the data for 20 out of the 27 European countries over a span of 10 years (1999 to 2009). We use a structural equation modeling (SEM) approach to quantify the impact of R&D on the different measures. We perform different exploratory and confirmatory factorial analysis evaluations, which give rise to four latent variables that are included in the model: (i) research and development (R&D), (ii) economic performance (EcoP), (iii) educational performance (EduP), and (iv) healthcare performance (HCareP) of the European countries. Our results indicate the importance of R&D to the overall development of European educational and healthcare performance (directly) and economic performance (indirectly). The results also show the practical applicability of LOD to estimate this impact.
- Published
- 2013
6. Ord i Dag: Mining Norwegian Daily Newswire.
- Author
-
Salakoski, Tapio, Ginter, Filip, Pyysalo, Sampo, Pahikkala, Tapio, Eiken, Unni Cathrine, Liseth, Anja Therese, Witschel, Hans Friedrich, Richter, Matthias, and Biemann, Chris
- Abstract
We present Ord i Dag, a new service that displays today's most important keywords, extracted fully automatically from Norwegian online newspapers. By describing the complete process, we provide a fully disclosed method for media monitoring and news summarization. For keyword extraction, a reference corpus serves as background knowledge about average language use, which is contrasted with the current day's word frequencies. Having detected the most prominent keywords of a day, we introduce several intuitive ways of grouping and displaying them. We conclude with a discussion of possible applications. The service is currently available for Norwegian and German; as only shallow language-specific processing is needed, it can easily be set up for other languages.
- Published
- 2006
7. Disentangling from Babylonian Confusion - Unsupervised Language Identification.
- Author
-
Gelbukh, Alexander, Biemann, Chris, and Teresniak, Sven
- Abstract
This work presents an unsupervised solution to language identification. The method sorts multilingual text corpora, sentence by sentence, into the languages they contain, and makes no assumptions about the number or size of the monolingual fractions. Evaluations on 7-lingual and bilingual corpora show that the quality of classification is comparable to supervised approaches, and that the method works almost error-free from 100 sentences per language upward.
- Published
- 2005
8. A rule-based approach to implicit emotion detection in text
- Author
-
Udochukwu, Orizu, He, Yulan, Biemann, Chris, Handschuh, Siegfried, Freitas, André, et al.
- Subjects
Matching (statistics), Computer science, Pipeline (computing), Emotion detection, Rule-based system, Lexicon, Task (project management), Margin (machine learning), Classifier (linguistics), Artificial intelligence, Natural language processing
- Abstract
Most research in the area of emotion detection in written text has focused on detecting explicit expressions of emotions. In this paper, we present a rule-based pipeline approach, based on the OCC Model, for detecting implicit emotions in written text without emotion-bearing words. We have evaluated our approach on three different datasets with five emotion categories. Our results show that the proposed approach outperforms the lexicon matching method consistently across all three datasets by a large margin of 17–30% in F-measure and gives competitive performance compared to a supervised classifier. In particular, when dealing with formal text that follows grammatical rules strictly, our approach gives an average F-measure of 82.7% on “Happy”, “Angry-Disgust” and “Sad”, even outperforming the supervised baseline by nearly 17% in F-measure. Our preliminary results show the feasibility of the approach for the task of implicit emotion detection in written text.
- Published
- 2015
9. Improving Supervised Classification Using Information Extraction
- Author
-
Pivovarova, Lidia, Yangarber, Roman, Du, Mian, Pierce, Matthew, Biemann, Chris, Handschuh, Siegfried, Freitas, André, Meziane, Farid, and Métais, Elisabeth
- Subjects
Computer science, Supervised learning, Machine learning, Information extraction, Industry sector, Classification methods, Data mining, Artificial intelligence, Classifier (UML), Statistical classifier
- Abstract
We explore supervised learning for multi-class, multi-label text classification, focusing on real-world settings where the distribution of labels changes dynamically over time. We use the PULS Information Extraction system to collect information about the distribution of class labels over named entities found in text. We then combine a knowledge-based rote classifier with statistical classifiers to obtain better performance than either classification method alone. The resulting classifier yields a significant improvement in macro-averaged F-measure compared to the state of the art, while maintaining a comparable micro-averaged F-measure.
- Published
- 2015
10. Statistical Machine Translation of Subtitles: From OpenSubtitles to TED
- Author
-
Volk, Martin, Müller, Mathias, Gurevych, Iryna, Biemann, Chris, and Zesch, Torsten
- Subjects
Machine translation, Computer science, Machine translation software usability, Linguistics, Example-based machine translation, Rule-based machine translation, Subtitle, Artificial intelligence, Evaluation of machine translation, Sentence, Natural language processing, BLEU, Word order
- Abstract
In this paper, we describe how the differences between two subtitle corpora, OpenSubtitles and TED, influence machine translation quality. In particular, we investigate whether statistical machine translation systems built on them can be used interchangeably. Our results show that OpenSubtitles and TED contain very different kinds of subtitles that warrant a subclassification of the genre. In addition, we take a closer look at the translation of questions as a sentence type with special word order. Interestingly, we found the BLEU scores for questions to be higher than for random sentences.
- Published
- 2013
Discovery Service for Jio Institute Digital Library