41 results for "Wikification"
Search Results
2. A step further towards a consensus on linking tweets to Wikipedia.
- Author
-
Nait-Hamoud, Mohamed Cherif, Lahfa, Fedoua, and Ennaji, Abdellatif
- Abstract
The study of contemporary tweet-based Entity Linking (EL) systems reveals a lack of a standard definition and a consensus on the task. Specifically, identifying what should be annotated in texts remains a recurring question. This prevents proper design and fair evaluation of EL systems. To tackle this issue, the present paper introduces a set of rules intended to define the EL task for tweets. We evaluated the effectiveness of the proposed rules by developing TELS, an end-to-end supervised system that links tweets to Wikipedia. The experiments conducted on five publicly available datasets show that our system outperforms the baselines with an improvement, in terms of overall macro F1-score (micro F1-score), ranging from 25.04% (7.32%) up to 35.36% (42.03%). Moreover, feature analysis reveals that when the annotation is not limited to very few entity types, the proposed rules capture annotators' tacit agreements from the datasets more effectively. Consequently, the proposed rules constitute a step further towards a consensus on the EL task. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
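The macro and micro F1 improvements quoted in the TELS abstract above are the two standard ways of averaging F1 over entity labels: micro-averaging pools all mention-level decisions, while macro-averaging weights every label equally. A minimal sketch of the difference, using invented gold and predicted labels rather than anything from the paper:

```python
# Illustrative only: macro vs. micro F1 as reported for entity-linking systems.
# The gold/predicted labels below are hypothetical stand-ins for per-mention
# entity annotations; they do not come from the TELS paper.
from sklearn.metrics import f1_score

gold = ["Paris", "Paris", "Paris", "Paris_Hilton", "NIL", "Apple_Inc."]
pred = ["Paris", "Paris", "Paris_Hilton", "Paris_Hilton", "NIL", "Apple_Inc."]

micro = f1_score(gold, pred, average="micro")  # pools all decisions; frequent entities dominate
macro = f1_score(gold, pred, average="macro")  # per-label F1, then averaged; rare entities count equally
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```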
3. Text2Storyline: Generating Enriched Storylines from Text
- Author
-
Gonçalves, Francisco, Campos, Ricardo, Jorge, Alípio, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Kamps, Jaap, editor, Goeuriot, Lorraine, editor, Crestani, Fabio, editor, Maistro, Maria, editor, Joho, Hideo, editor, Davis, Brian, editor, Gurrin, Cathal, editor, Kruschwitz, Udo, editor, and Caputo, Annalina, editor
- Published
- 2023
- Full Text
- View/download PDF
4. Context-enhanced concept disambiguation in Wikification
- Author
-
Mozhgan Saeidi, Kaveh Mahdaviani, Evangelos Milios, and Norbert Zeh
- Subjects
Wikification, Word sense disambiguation, Text coherence, Wikipedia, Representation learning, Cybernetics, Q300-390, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Wikification is a method to automatically enrich a text with links to Wikipedia as a knowledge base. One step in Wikification is detecting ambiguous mentions, and another is disambiguating those mentions. In this paper, we work on the mention disambiguation problem. Some state-of-the-art disambiguation approaches divide the long input document text into non-overlapping windows. Then, for each ambiguous mention, they pick the sense most similar to the chosen meaning of the key-entity (a word that helps disambiguate other words of the text). Partitioning the input into disjoint windows means that the most appropriate key-entity to disambiguate a given mention may be in an adjacent window. The disjoint windows negatively affect the accuracy of these methods. This work presents CACW (Context-Aware Concept Wikifier), a knowledge-based approach to produce the correct meaning for ambiguous mentions in the document. CACW incorporates two algorithms: the first uses co-occurring mentions in consecutive windows to augment the available contextual information to find the correct sense; the second ranks senses based on their context relevancy. We also define a new metric for disambiguation to measure the coherence of the whole text document. Comparing our approach with state-of-the-art methods shows the effectiveness of our method in terms of text coherence in the English Wikification task. We observed a 10-20 percent improvement in the F1 measure compared to state-of-the-art techniques.
- Published
- 2023
- Full Text
- View/download PDF
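Result 4 above describes scoring candidate senses against key-entity context that may spill into neighbouring text windows. The sketch below illustrates only that general idea; it is not the CACW algorithm, and the window layout, relatedness function, and candidate inventory are all placeholder assumptions:

```python
# Rough sketch of window-based sense scoring with context borrowed from
# neighbouring windows, in the spirit of the idea summarised above.
from typing import Callable, Dict, List

def score_senses(windows: List[List[str]],      # mentions grouped into consecutive windows
                 idx: int,                       # index of the window holding the target mention
                 candidate_senses: List[str],    # candidate Wikipedia titles for the mention
                 relatedness: Callable[[str, str], float],
                 use_neighbours: bool = True) -> Dict[str, float]:
    """Score each candidate sense by its total relatedness to context mentions."""
    context = list(windows[idx])
    if use_neighbours:                           # widen the context past the disjoint window
        if idx > 0:
            context += windows[idx - 1]
        if idx + 1 < len(windows):
            context += windows[idx + 1]
    return {sense: sum(relatedness(sense, m) for m in context)
            for sense in candidate_senses}

def toy_rel(sense: str, mention: str) -> float:
    # Stand-in for a real link-based or embedding-based relatedness function.
    return 1.0 if mention.lower() in sense.lower() else 0.0

windows = [["Java", "coffee", "island"], ["Indonesia", "plantation"]]
print(score_senses(windows, 0, ["Java_(island)", "Java_(programming_language)"], toy_rel))
```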
5. Graph Representation Learning in Document Wikification
- Author
-
Saeidi, Mozhgan, Milios, Evangelos, Zeh, Norbert, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Barney Smith, Elisa H., editor, and Pal, Umapada, editor
- Published
- 2021
- Full Text
- View/download PDF
6. Classification and Visualization of Travel Blog Entries Based on Types of Tourism
- Author
-
Shibata, Naoki, Shinoda, Hiroto, Nanba, Hidetsugu, Ishino, Aya, Takezawa, Toshiyuki, Neidhardt, Julia, editor, and Wörndl, Wolfgang, editor
- Published
- 2020
- Full Text
- View/download PDF
7. Computationally Efficient Context-Free Named Entity Disambiguation with Wikipedia.
- Author
-
Simos, Michael Angelos and Makris, Christos
- Subjects
ARTIFICIAL neural networks, ARTIFICIAL intelligence, MOBILE apps, ELECTRONIC encyclopedias, MODERN languages, ARTIFICIAL languages, NATURAL language processing
- Abstract
The induction of the semantics of unstructured text corpora is a crucial task for modern natural language processing and artificial intelligence applications. The Named Entity Disambiguation task comprises the extraction of Named Entities and their linking to an appropriate representation from a concept ontology based on the available information. This work introduces novel methodologies, leveraging domain knowledge extraction from Wikipedia in a simple yet highly effective approach. In addition, we introduce a fuzzy logic model with a strong focus on computational efficiency. We also present a new measure, decisive in both methods for the entity linking selection and the quantification of the confidence of the produced entity links, namely the relative commonness measure. The experimental results of our approach on established datasets revealed state-of-the-art accuracy and run-time performance in the domain of fast, context-free Wikification, by relying on an offline pre-processing stage on the corpus of Wikipedia. The methods introduced can be leveraged as stand-alone NED methodologies, propitious for applications on mobile devices, or in the context of vastly reducing the complexity of deep neural network approaches as a first context-free layer. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
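The "relative commonness" measure is central to result 7, but the abstract does not spell out its formula. The sketch below shows the standard commonness prior used throughout the Wikification literature plus one plausible "relative" normalisation; treat the relative_commonness() definition as an assumption for illustration, not the paper's definition:

```python
# Sketch of the standard "commonness" prior, P(entity | mention), plus one
# possible way to make it relative to the best candidate for the mention.
from collections import Counter
from typing import Dict

def commonness(anchor_counts: Counter) -> Dict[str, float]:
    """Fraction of anchor-text occurrences linking to each candidate page."""
    total = sum(anchor_counts.values())
    return {entity: n / total for entity, n in anchor_counts.items()}

def relative_commonness(anchor_counts: Counter) -> Dict[str, float]:
    """Commonness of each candidate divided by the commonness of the top candidate (assumed form)."""
    cm = commonness(anchor_counts)
    top = max(cm.values())
    return {entity: p / top for entity, p in cm.items()}

# Hypothetical anchor statistics for the surface form "apple":
counts = Counter({"Apple_Inc.": 700, "Apple": 250, "Apple_Records": 50})
print(commonness(counts))
print(relative_commonness(counts))
```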
8. Japanese-Chinese Cross-Language Entity Linking Adapting to User’s Language Ability
- Author
-
Kimura, Fuminori, Zhou, Jialiang, Maeda, Akira, Ao, Sio-Iong, editor, Kim, Haeng Kon, editor, Castillo, Oscar, editor, Chan, Alan Hoi-Shou, editor, and Katagiri, Hideki, editor
- Published
- 2018
- Full Text
- View/download PDF
9. Computationally Efficient Context-Free Named Entity Disambiguation with Wikipedia
- Author
-
Michael Angelos Simos and Christos Makris
- Subjects
named entity disambiguation, text annotation, context-free Wikification, word sense disambiguation, ontologies, Wikification, Information technology, T58.5-58.64
- Abstract
The induction of the semantics of unstructured text corpora is a crucial task for modern natural language processing and artificial intelligence applications. The Named Entity Disambiguation task comprises the extraction of Named Entities and their linking to an appropriate representation from a concept ontology based on the available information. This work introduces novel methodologies, leveraging domain knowledge extraction from Wikipedia in a simple yet highly effective approach. In addition, we introduce a fuzzy logic model with a strong focus on computational efficiency. We also present a new measure, decisive in both methods for the entity linking selection and the quantification of the confidence of the produced entity links, namely the relative commonness measure. The experimental results of our approach on established datasets revealed state-of-the-art accuracy and run-time performance in the domain of fast, context-free Wikification, by relying on an offline pre-processing stage on the corpus of Wikipedia. The methods introduced can be leveraged as stand-alone NED methodologies, propitious for applications on mobile devices, or in the context of vastly reducing the complexity of deep neural network approaches as a first context-free layer.
- Published
- 2022
- Full Text
- View/download PDF
10. Scalable Disambiguation System Capturing Individualities of Mentions
- Author
-
Mai, Tiep, Shi, Bichen, Nicholson, Patrick K., Ajwani, Deepak, Sala, Alessandra, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Gracia, Jorge, editor, Bond, Francis, editor, McCrae, John P., editor, Buitelaar, Paul, editor, Chiarcos, Christian, editor, and Hellmann, Sebastian, editor
- Published
- 2017
- Full Text
- View/download PDF
11. Wikifying software artifacts.
- Author
-
Nassif, Mathieu and Robillard, Martin P.
- Abstract
Context: The computational linguistics community has developed tools, called wikifiers, to identify links to Wikipedia articles from free-form text. Software engineering research can leverage wikifiers to add semantic information to software artifacts. However, no empirically-grounded basis exists to choose an effective wikifier and to configure it for the software domain, on which wikifiers were not specifically trained. Objective: We conducted a study to guide the selection of a wikifier and its configuration for applications in the software domain, and to measure what performance can be expected of wikifiers. Method: We applied six wikifiers, with multiple configurations, to a sample of 500 Stack Overflow posts. We manually annotated the 41 124 articles identified by the wikifiers as correct or not to compare their precision and recall. Results: Each wikifier, in turn, achieved the highest precision, between 13% and 82%, for different thresholds of recall, from 60% to 5%. However, filtering the wikifiers’ output with a whitelist can considerably improve the precision above 79% for recall up to 30%, and above 47% for recall up to 60%. Conclusions: Results reported in each wikifier’s original article cannot be generalized to software-specific documents. Given that no wikifier performs universally better than all others, we provide empirically grounded insights to select a wikifier for different scenarios, and suggest ways to further improve their performance for the software domain. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
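Result 11 evaluates off-the-shelf wikifiers on Stack Overflow posts and reports that filtering their output through a whitelist considerably improves precision. Here is a minimal sketch of that post-processing and of the precision/recall bookkeeping, with invented links and judgements rather than the study's data or tooling:

```python
# Illustrative whitelist filtering of wikifier output, evaluated against
# manual correctness judgements. All values below are made up.
from typing import Dict, List, Set

def filter_links(links: List[str], whitelist: Set[str]) -> List[str]:
    return [article for article in links if article in whitelist]

def precision_recall(predicted: List[str], correct: Dict[str, bool]) -> tuple:
    """correct maps every article the wikifier proposed (pre-filtering) to a manual judgement."""
    tp = sum(1 for a in predicted if correct.get(a, False))
    total_correct = sum(correct.values())
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / total_correct if total_correct else 0.0
    return precision, recall

judgements = {"Java_(programming_language)": True, "Coffee": False, "Stack_Overflow": True}
raw = list(judgements)                                  # everything the wikifier proposed
kept = filter_links(raw, whitelist={"Java_(programming_language)", "Stack_Overflow"})
print(precision_recall(raw, judgements), precision_recall(kept, judgements))
```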
12. Encyclopaedic question answering
- Author
-
Dornescu, Iustin, Orasan, Constantin, Mitkov, Ruslan, and Navigli, Roberto
- Subjects
025.04, question answering, semantic architecture, semantic question answering, named entity disambiguation, Natural Language Processing, question interpretation, question decomposition, wikification, wikipedia
- Abstract
Open-domain question answering (QA) is an established NLP task which enables users to search for specific pieces of information in large collections of texts. Instead of using keyword-based queries and a standard information retrieval engine, QA systems allow the use of natural language questions and return the exact answer (or a list of plausible answers) with supporting snippets of text. In the past decade, open-domain QA research has been dominated by evaluation fora such as TREC and CLEF, where shallow techniques relying on information redundancy have achieved very good performance. However, this performance is generally limited to simple factoid and definition questions because the answer is usually explicitly present in the document collection. Current approaches are much less successful in finding implicit answers and are difficult to adapt to more complex question types which are likely to be posed by users. In order to advance the field of QA, this thesis proposes a shift in focus from simple factoid questions to encyclopaedic questions: list questions composed of several constraints. These questions have more than one correct answer which usually cannot be extracted from one small snippet of text. To correctly interpret the question, systems need to combine classic knowledge-based approaches with advanced NLP techniques. To find and extract answers, systems need to aggregate atomic facts from heterogeneous sources as opposed to simply relying on keyword-based similarity. Encyclopaedic questions promote QA systems which use basic reasoning, making them more robust and easier to extend with new types of constraints and new types of questions. A novel semantic architecture is proposed which represents a paradigm shift in open-domain QA system design, using semantic concepts and knowledge representation instead of words and information retrieval. The architecture consists of two phases, analysis – responsible for interpreting questions and finding answers, and feedback – responsible for interacting with the user. This architecture provides the basis for EQUAL, a semantic QA system developed as part of the thesis, which uses Wikipedia as a source of world knowledge and employs simple forms of open-domain inference to answer encyclopaedic questions. EQUAL combines the output of a syntactic parser with semantic information from Wikipedia to analyse questions. To address natural language ambiguity, the system builds several formal interpretations containing the constraints specified by the user and addresses each interpretation in parallel. To find answers, the system then tests these constraints individually for each candidate answer, considering information from different documents and/or sources. The correctness of an answer is not proved using a logical formalism, instead a confidence-based measure is employed. This measure reflects the validation of constraints from raw natural language, automatically extracted entities, relations and available structured and semi-structured knowledge from Wikipedia and the Semantic Web. When searching for and validating answers, EQUAL uses the Wikipedia link graph to find relevant information. This method achieves good precision and allows only pages of a certain type to be considered, but is affected by the incompleteness of the existing markup targeted towards human readers. In order to address this, a semantic analysis module which disambiguates entities is developed to enrich Wikipedia articles with additional links to other pages.
The module increases recall, enabling the system to rely more on the link structure of Wikipedia than on word-based similarity between pages. It also allows authoritative information from different sources to be linked to the encyclopaedia, further enhancing the coverage of the system. The viability of the proposed approach was evaluated in an independent setting by participating in two competitions at CLEF 2008 and 2009. In both competitions, EQUAL outperformed standard textual QA systems as well as semi-automatic approaches. Having established a feasible way forward for the design of open-domain QA systems, future work will attempt to further improve performance to take advantage of recent advances in information extraction and knowledge representation, as well as by experimenting with formal reasoning and inferencing capabilities.
- Published
- 2012
13. OTNEL: A Distributed Online Deep Learning Semantic Annotation Methodology.
- Author
-
Makris, Christos and Simos, Michael Angelos
- Subjects
DEEP learning, SEMANTICS, ARTIFICIAL intelligence, INFORMATION retrieval, DATA mining
- Abstract
Semantic representation of unstructured text is crucial in modern artificial intelligence and information retrieval applications. The semantic information extraction process from an unstructured text fragment to a corresponding representation from a concept ontology is known as named entity disambiguation. In this work, we introduce a distributed, supervised deep learning methodology employing a long short-term memory-based deep learning architecture model for entity linking with Wikipedia. In the context of a frequently changing online world, we introduce and study the domain of online training named entity disambiguation, featuring on-the-fly adaptation to underlying knowledge changes. Our novel methodology evaluates polysemous anchor mentions with sense compatibility based on thematic segmentation of the Wikipedia knowledge graph representation. We aim at both robust performance and high entity-linking accuracy results. The introduced modeling process efficiently addresses conceptualization, formalization, and computational challenges for the online training entity-linking task. The novel online training concept can be exploited for wider adoption, as it is considerably beneficial for targeted topic, online global context consensus for entity disambiguation. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
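Result 13 describes an LSTM-based model that scores how compatible each candidate entity is with the context around an anchor mention. The PyTorch sketch below only gestures at that kind of architecture; the layer sizes, vocabularies, and dot-product scoring head are assumptions, not the OTNEL model itself:

```python
# Hypothetical LSTM context encoder with a dot-product entity scorer.
import torch
import torch.nn as nn

class SenseCompatibilityScorer(nn.Module):
    def __init__(self, vocab_size=10_000, n_entities=5_000, dim=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.entity_emb = nn.Embedding(n_entities, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)

    def forward(self, context_ids, candidate_ids):
        # context_ids: (batch, seq_len) word ids around the anchor mention
        # candidate_ids: (batch, n_candidates) candidate entity ids
        _, (h, _) = self.encoder(self.word_emb(context_ids))   # h: (1, batch, dim)
        context_vec = h.squeeze(0)                             # (batch, dim)
        cands = self.entity_emb(candidate_ids)                 # (batch, n_cand, dim)
        return torch.bmm(cands, context_vec.unsqueeze(2)).squeeze(2)  # (batch, n_cand)

scorer = SenseCompatibilityScorer()
scores = scorer(torch.randint(0, 10_000, (2, 20)), torch.randint(0, 5_000, (2, 4)))
print(scores.shape)  # torch.Size([2, 4])
```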
14. Entity Linking for Mathematical Expressions in Scientific Documents
- Author
-
Kristianto, Giovanni Yoko, Topić, Goran, Aizawa, Akiko, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Morishima, Atsuyuki, editor, Rauber, Andreas, editor, and Liew, Chern Li, editor
- Published
- 2016
- Full Text
- View/download PDF
15. Wikifying Novel Words to Mixtures of Wikipedia Senses by Structured Sparse Coding
- Author
-
Pintér, Balázs, Vörös, Gyula, Szabó, Zoltán, Lőrincz, András, Kacprzyk, Janusz, Series editor, Fred, Ana, editor, and De Marsico, Maria, editor
- Published
- 2015
- Full Text
- View/download PDF
16. Entity Linking for Vietnamese Tweets
- Author
-
Van, Duy K., Huynh, Huy M., Nguyen, Hien T., Vo, Vinh T., Kacprzyk, Janusz, Series editor, Nguyen, Viet-Ha, editor, Le, Anh-Cuong, editor, and Huynh, Van-Nam, editor
- Published
- 2015
- Full Text
- View/download PDF
17. Combining Heuristics and Learning for Entity Linking
- Author
-
Nguyen, Hien T., Akan, Ozgur, Series editor, Bellavista, Paolo, Series editor, Cao, Jiannong, Series editor, Coulson, Geoff, Series editor, Dressler, Falko, Series editor, Ferrari, Domenico, Series editor, Gerla, Mario, Series editor, Kobayashi, Hisashi, Series editor, Palazzo, Sergio, Series editor, Sahni, Sartaj, Series editor, Shen, Xuemin (Sherman), Series editor, Stan, Mircea, Series editor, Jia, Xiaohua, Series editor, Zomaya, Albert, Series editor, Vinh, Phan Cong, editor, Alagar, Vangalur, editor, Vassev, Emil, editor, and Khare, Ashish, editor
- Published
- 2014
- Full Text
- View/download PDF
18. Review on Wikification methods.
- Author
-
Szymański, Julian and Naruszewicz, Maciej
- Subjects
CONCEPTS, ANNOTATIONS, RESEMBLANCE (Philosophy)
- Abstract
The paper reviews methods for the automatic annotation of texts with Wikipedia entries. The process, called Wikification, aims at building references between concepts identified in the text and Wikipedia articles. Wikification finds many applications, especially in text representation, where it enables one to capture the semantic similarity of documents. It can also be considered automatic tagging of the text. We describe typical approaches to Wikification and identify their advantages and disadvantages. The main obstacle to wide usage of Wikification is the lack of open-source frameworks that enable researchers to work cooperatively on the problem. Also problematic is the lack of a unified platform for evaluating the results proposed by different approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
19. OTNEL: A Distributed Online Deep Learning Semantic Annotation Methodology
- Author
-
Christos Makris and Michael Angelos Simos
- Subjects
named entity disambiguation, text annotation, word sense disambiguation, ontologies, Wikification, neural networks, Technology
- Abstract
Semantic representation of unstructured text is crucial in modern artificial intelligence and information retrieval applications. The semantic information extraction process from an unstructured text fragment to a corresponding representation from a concept ontology is known as named entity disambiguation. In this work, we introduce a distributed, supervised deep learning methodology employing a long short-term memory-based deep learning architecture model for entity linking with Wikipedia. In the context of a frequently changing online world, we introduce and study the domain of online training named entity disambiguation, featuring on-the-fly adaptation to underlying knowledge changes. Our novel methodology evaluates polysemous anchor mentions with sense compatibility based on thematic segmentation of the Wikipedia knowledge graph representation. We aim at both robust performance and high entity-linking accuracy results. The introduced modeling process efficiently addresses conceptualization, formalization, and computational challenges for the online training entity-linking task. The novel online training concept can be exploited for wider adoption, as it is considerably beneficial for targeted topic, online global context consensus for entity disambiguation.
- Published
- 2020
- Full Text
- View/download PDF
20. Text Semantic Annotation: A Distributed Methodology Based on Community Coherence
- Author
-
Christos Makris, Georgios Pispirigos, and Michael Angelos Simos
- Subjects
text annotation, word sense disambiguation, ontologies, Wikification, community detection, Louvain algorithm, Industrial engineering. Management engineering, T55.4-60.8, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Text annotation is the process of identifying the sense of a textual segment within a given context to a corresponding entity on a concept ontology. As the bag of words paradigm’s limitations become increasingly discernible in modern applications, several information retrieval and artificial intelligence tasks are shifting to semantic representations for addressing the inherent natural language polysemy and homonymy challenges. With extensive application in a broad range of scientific fields, such as digital marketing, bioinformatics, chemical engineering, neuroscience, and social sciences, community detection has attracted great scientific interest. Focusing on linguistics, by aiming to identify groups of densely interconnected subgroups of semantic ontologies, community detection application has proven beneficial in terms of disambiguation improvement and ontology enhancement. In this paper we introduce a novel distributed supervised knowledge-based methodology employing community detection algorithms for text annotation with Wikipedia Entities, establishing the unprecedented concept of community Coherence as a metric for local contextual coherence compatibility. Our experimental evaluation revealed that deeper inference of relatedness and local entity community coherence in the Wikipedia graph bears substantial improvements overall via a focus on accuracy amelioration of less common annotations. The proposed methodology is propitious for wider adoption, attaining robust disambiguation performance.
- Published
- 2020
- Full Text
- View/download PDF
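Result 20 builds on community detection (the subject terms mention the Louvain algorithm) over the Wikipedia graph and introduces a "community coherence" signal. As a rough illustration of the underlying idea only, the sketch below detects Louvain communities in a tiny invented entity graph and scores candidates by how many context entities share their community; it is not the paper's metric:

```python
# Toy community-based coherence scoring; requires networkx >= 2.8 for
# louvain_communities. The entity graph is invented for illustration.
import networkx as nx
from networkx.algorithms import community

G = nx.Graph()
G.add_edges_from([
    ("Java_(island)", "Indonesia"), ("Indonesia", "Jakarta"),
    ("Java_(programming_language)", "Sun_Microsystems"),
    ("Sun_Microsystems", "Oracle_Corporation"),
])

communities = community.louvain_communities(G, seed=42)
membership = {node: i for i, comm in enumerate(communities) for node in comm}

def community_coherence(candidate: str, context_entities: list) -> float:
    """Fraction of context entities that fall in the candidate's community."""
    comm = membership.get(candidate)
    hits = sum(1 for e in context_entities if membership.get(e) == comm)
    return hits / len(context_entities) if context_entities else 0.0

context = ["Indonesia", "Jakarta"]
for cand in ["Java_(island)", "Java_(programming_language)"]:
    print(cand, community_coherence(cand, context))
```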
21. Thai Wikipedia Link Suggestion Framework
- Author
-
Rungsawang, Arnon, Siangkhio, Sompop, Surarerk, Athasit, Manaskasemsak, Bundit, Park, James J. (Jong Hyuk), editor, Barolli, Leonard, editor, Xhafa, Fatos, editor, and Jeong, Hwa Young, editor
- Published
- 2013
- Full Text
- View/download PDF
22. Generation of Hypertext for Web-Based Learning Based on Wikification
- Author
-
Lui, Andrew Kwok-Fai, Ng, Vanessa Sin-Chun, Tsang, Eddy K. M., Ho, Alex C. H., Kwan, Reggie, editor, McNaught, Carmel, editor, Tsang, Philip, editor, Wang, Fu Lee, editor, and Li, Kam Cheong, editor
- Published
- 2011
- Full Text
- View/download PDF
23. Semantic Web Technology and Recommender Systems.
- Author
-
Kotis, Konstantinos and Spiliotopoulos, Dimitris
- Subjects
Information technology industries, Context-Aware Recommender System, Kendall Correlation Coefficient, OWL/RDF, ParlTech, Pearson Correlation Coefficient, RDF, SBVR, Spearman Correlation Coefficient, Twitter, Wikification, associative mining, big data analytics, blockchain, classification, clustering, cold start, cryptocurrency, cultural informatics, cultural space, cultural spaces, data catalog, dataset recommender, decision support, digital parliament, digital transformation, disruptive technologies, domain knowledge graph, fact type, informal document to SBVR, interactive information retrieval, job matching, journey planning, keyword search, knowledge graphs, knowledge-driven processes, legal tech, linked data, machine learning, migrants, mobility, named entity disambiguation, natural language, natural language query, neural networks, ontologies, ontology, operational rules, parliamentary administrators, parliamentary hype cycle, personalization, preferences, process mining, reasoning, recommendation, recommendation system, recommendation systems, recommender systems, refugees, resource efficiency, semantic trajectories, semantic web, sentiment analysis, social data, social media analysis, social media analytics, spatial datasets, technology framework, terrorism, text annotation, trending topics, user experience, user influence, user modeling, word sense disambiguation
- Abstract
Summary: In this book (Volume I), 13 papers have been published on different topics within the broad research areas of the Semantic Web and recommender systems. These papers have been carefully selected based on peer review by several respected reviewers organized by MDPI's BDCC journal. This issue has attracted well-known international research teams, whom we would like to thank for their work.
24. Big data and sentiment analysis to highlight decision behaviours: a case study for student population.
- Author
-
Troisi, Orlando, Grimaldi, Mara, Loia, Francesca, and Maione, Gennaro
- Subjects
BEHAVIORAL assessment, SCHOOL admission, MOTIVATION (Psychology), SOCIAL media, SELF-efficacy, DECISION making, COMMUNICATION, DATA analytics, EMOTIONS, STUDENT attitudes, ELECTRONIC publications, HEALTH facility design & construction, ALGORITHMS, PUBLIC opinion, DATA mining, PSYCHOLOGICAL factors
- Abstract
Starting from the assumption that the factors orienting University choice are heterogeneous and multidimensional, the study explores students' motivations in higher education. To this aim, a big data analysis has been performed through 'TalkWalker', a tool based on the algorithms developed in the context of Social Data Intelligence, which allows understanding the sentiment of a group of people regarding a specific theme. The data have been extracted by drawing on posts published from anywhere in the world over a 12-month period across many online sources. According to the findings, the main variable capable of influencing the choice of University is the training offer, followed by physical structure, work opportunities, prestige, affordability, communication, organisation, and environmental sustainability. The study establishes an innovative research agenda for further studies by proposing the elaboration of a systems- and process-based view for higher education. However, it presents the limitation of a superficial investigation, determined by the analysis of a large amount of data. Therefore, for future research, it might be appropriate to apply a different technique to make a comparison and to check whether the size of the considered sample and the depth of the analysis technique affect the results and the consequent considerations. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
25. Semantic Annotation of Documents Based on Wikipedia Concepts.
- Author
-
Brank, Janez, Leban, Gregor, and Grobelnik, Marko
- Subjects
SEMANTIC computing, ARTIFICIAL neural networks, ARTIFICIAL intelligence software, MACHINE learning
- Published
- 2018
26. Ranking Web Search Results Exploiting Wikipedia.
- Author
-
Kanavos, Andreas, Makris, Christos, Plegas, Yannis, and Theodoridis, Evangelos
- Subjects
INTERNET searching, SEARCH engines, WORLD Wide Web, WEBSITES, SEMANTIC Web, DATA mining
- Abstract
It is widely known that search engines are the dominant tools for finding information on the web. In most cases, these engines return web page references in a global ranking, taking into account either the importance of the web site or the relevance of the web pages to the identified topic. In this paper, we focus on the problem of determining distinct thematic groups in the search results that existing engines provide. We additionally address the problem of dynamically adapting their ranking according to user selections, incorporating user judgments as implicitly registered in their selection of relevant documents. Our system exploits a state-of-the-art semantic web data mining technique that identifies semantic entities of Wikipedia in order to group the result set into different topic groups, according to the various meanings of the provided query. Moreover, we propose a novel probabilistic network scheme that employs the aforementioned topic identification method in order to modify the ranking of results as users select documents. We evaluated our implemented prototype in practice with extensive experiments on the ClueWeb09 dataset using the TREC 2009, 2010, 2011 and 2012 Web Tracks, where we observed improved retrieval performance compared to current state-of-the-art re-ranking methods. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
27. Computationally Efficient Context-Free Named Entity Disambiguation with Wikipedia
- Author
-
Christos Makris and Michael Angelos Simos
- Subjects
named entity disambiguation, text annotation, context-free Wikification, word sense disambiguation, ontologies, Wikification, fast Wikification, artificial intelligence, machine learning, Information Systems
- Abstract
The induction of the semantics of unstructured text corpora is a crucial task for modern natural language processing and artificial intelligence applications. The Named Entity Disambiguation task comprises the extraction of Named Entities and their linking to an appropriate representation from a concept ontology based on the available information. This work introduces novel methodologies, leveraging domain knowledge extraction from Wikipedia in a simple yet highly effective approach. In addition, we introduce a fuzzy logic model with a strong focus on computational efficiency. We also present a new measure, decisive in both methods for the entity linking selection and the quantification of the confidence of the produced entity links, namely the relative commonness measure. The experimental results of our approach on established datasets revealed state-of-the-art accuracy and run-time performance in the domain of fast, context-free Wikification, by relying on an offline pre-processing stage on the corpus of Wikipedia. The methods introduced can be leveraged as stand-alone NED methodologies, propitious for applications on mobile devices, or in the context of vastly reducing the complexity of deep neural network approaches as a first context-free layer.
- Published
- 2022
28. Wikidata in Practice at the Koninklijke Bibliotheek, Wikidata Masterclass, 28-05-2021
- Author
-
Janssen, Olaf
- Subjects
KB collection highlights, Wikidata, Wikimedia Commons, Koninklijke Bibliotheek, KB, national library of the Netherlands, KB Topstukken, Wikification, Linked open data, LOD, Wikimedia, Wikipedia, Olaf Janssen
- Abstract
Presentation (in Dutch) on 'Wikidata in practice at the Koninklijke Bibliotheek' by Olaf Janssen during the masterclass 'Multilingual structured data: Wikidata', part of the Wikimedia training for Suriname and the Caribbean on 28 May 2021.
- Published
- 2021
- Full Text
- View/download PDF
29. Using coreference and surrounding contexts for entity linking.
- Author
-
Huynh, Huy M., Nguyen, Trong T., and Cao, Tru H.
- Abstract
Ambiguity in the meanings of words or phrases in a document is considered one of the primary barriers in natural language processing. In this work, we address the task of identifying mentions of entities and linking them to the correct referents described in a given knowledge base. To deal with it, we propose a supervised learning method for ranking candidate entities, in combination with a heuristic and coreference relations among mentions in a document. A further advantage of our method is its simplicity and effectiveness, using far fewer features than other systems. The results of the evaluation on the TAC-KBP 2012 datasets show that our combination is efficient and that the method's performance is comparable to the state of the art. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
30. Improving the accuracy and functionality of wikification by exploiting the multilinguality of Wikipedia
- Author
-
綱川 隆司
- Abstract
Research results report for a Grant-in-Aid for Scientific Research (Young Scientists (B)), FY2015-FY2018.
- Published
- 2019
31. A multilingual wikified data set of educational material
- Author
-
Hendrickx, Iris, Takoulidou, Eirini, Naskos, Thanasis, Kermanidis, Katia Lida, Sosoni, Vilelmini, De Vos, Hugo, Stasimioti, Maria, Van Zaanen, Menno, Georgakopoulou, Panayota, Egg, Markus, Kordoni, Valia, Popovic, Maja, and Van Den Bosch, Antal
- Abstract
We present a parallel wikified data set of parallel texts in eleven language pairs from the educational domain. English sentences are lined up to sentences in eleven other languages (BG, CS, DE, EL, HR, IT, NL, PL, PT, RU, ZH) where names and noun phrases (entities) are manually annotated and linked to their respective Wikipedia pages. For every linked entity in English, the corresponding term or phrase in the target language is also marked and linked to its Wikipedia page in that language. The annotation process was performed via crowdsourcing. In this paper we present the task, annotation process, the encountered difficulties with crowdsourcing for complex annotation, and the data set in more detail. We demonstrate the usage of the data set for Wikification evaluation. This data set is valuable as it constitutes a rich resource consisting of annotated data of English text linked to translations in eleven languages including several languages such as Bulgarian and Greek for which not many LT resources are available.
- Published
- 2019
32. A multilingual wikified data set of educational material
- Author
-
Hendrickx, I., Takoulidou, E., Naskos, T., Kermanidis, K. L., Sosoni, V., Vos, H., Stasimioti, M., Menno van Zaanen, Georgakopoulou, P., Egg, M., Kordoni, V., Popovic, M., Den Bosch, A., and Meertens Institute
- Subjects
MOOCs, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Crowdsourcing, Wikification
- Abstract
We present a parallel wikified data set of parallel texts in eleven language pairs from the educational domain. English sentences are lined up to sentences in eleven other languages (BG, CS, DE, EL, HR, IT, NL, PL, PT, RU, ZH) where names and noun phrases (entities) are manually annotated and linked to their respective Wikipedia pages. For every linked entity in English, the corresponding term or phrase in the target language is also marked and linked to its Wikipedia page in that language. The annotation process was performed via crowdsourcing. In this paper we present the task, annotation process, the encountered difficulties with crowdsourcing for complex annotation, and the data set in more detail. We demonstrate the usage of the data set for Wikification evaluation. This data set is valuable as it constitutes a rich resource consisting of annotated data of English text linked to translations in eleven languages including several languages such as Bulgarian and Greek for which not many LT resources are available.
- Published
- 2019
33. Visual analytics methods for the automatic content generation from streaming data
- Author
-
Giachelle, F.
- Subjects
Data provenance, Visual analytics, Human-in-the-loop, Wikification
- Published
- 2019
34. OTNEL: A Distributed Online Deep Learning Semantic Annotation Methodology
- Author
-
Michael Angelos Simos and Christos Makris
- Subjects
0209 industrial biotechnology, Computer science, Text annotation, Context (language use), 02 engineering and technology, Ontology (information science), computer.software_genre, lcsh:Technology, Management Information Systems, Entity linking, 020901 industrial engineering & automation, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, ontologies, Adaptation (computer science), Artificial neural network, lcsh:T, business.industry, named entity disambiguation, Deep learning, Wikification, Information retrieval applications, neural networks, Computer Science Applications, machine learning, word sense disambiguation, text annotation, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Natural language processing, Information Systems
- Abstract
Semantic representation of unstructured text is crucial in modern artificial intelligence and information retrieval applications. The semantic information extraction process from an unstructured text fragment to a corresponding representation from a concept ontology is known as named entity disambiguation. In this work, we introduce a distributed, supervised deep learning methodology employing a long short-term memory-based deep learning architecture model for entity linking with Wikipedia. In the context of a frequently changing online world, we introduce and study the domain of online training named entity disambiguation, featuring on-the-fly adaptation to underlying knowledge changes. Our novel methodology evaluates polysemous anchor mentions with sense compatibility based on thematic segmentation of the Wikipedia knowledge graph representation. We aim at both robust performance and high entity-linking accuracy results. The introduced modeling process efficiently addresses conceptualization, formalization, and computational challenges for the online training entity-linking task. The novel online training concept can be exploited for wider adoption, as it is considerably beneficial for targeted topic, online global context consensus for entity disambiguation.
- Published
- 2020
35. Text Semantic Annotation: A Distributed Methodology Based on Community Coherence
- Author
-
Georgios Pispirigos, Michael Angelos Simos, and Christos Makris
- Subjects
lcsh:T55.4-60.8, Computer science, Process (engineering), Text annotation, Context (language use), 02 engineering and technology, Ontology (information science), lcsh:QA75.5-76.95, Theoretical Computer Science, Clauset-Newman-Moore algorithm, 020204 information systems, community detection, 0202 electrical engineering, electronic engineering, information engineering, lcsh:Industrial engineering. Management engineering, ontologies, Polysemy, Numerical Analysis, Information retrieval, Wikification, Louvain algorithm, Computational Mathematics, word sense disambiguation, Computational Theory and Mathematics, text annotation, Bag-of-words model, 020201 artificial intelligence & image processing, lcsh:Electronic computers. Computer science, Coherence (linguistics), Natural language
- Abstract
Text annotation is the process of identifying the sense of a textual segment within a given context to a corresponding entity on a concept ontology. As the bag of words paradigm's limitations become increasingly discernible in modern applications, several information retrieval and artificial intelligence tasks are shifting to semantic representations for addressing the inherent natural language polysemy and homonymy challenges. With extensive application in a broad range of scientific fields, such as digital marketing, bioinformatics, chemical engineering, neuroscience, and social sciences, community detection has attracted great scientific interest. Focusing on linguistics, by aiming to identify groups of densely interconnected subgroups of semantic ontologies, community detection application has proven beneficial in terms of disambiguation improvement and ontology enhancement. In this paper we introduce a novel distributed supervised knowledge-based methodology employing community detection algorithms for text annotation with Wikipedia Entities, establishing the unprecedented concept of community Coherence as a metric for local contextual coherence compatibility. Our experimental evaluation revealed that deeper inference of relatedness and local entity community coherence in the Wikipedia graph bears substantial improvements overall via a focus on accuracy amelioration of less common annotations. The proposed methodology is propitious for wider adoption, attaining robust disambiguation performance.
- Published
- 2020
36. Big data and sentiment analysis to highlight decision behaviours: a case study for student population
- Author
-
Orlando Troisi, Gennaro Maione, Francesca Loia, and Mara Grimaldi
- Subjects
Higher education, Big data, Social search, Arts and Humanities (miscellaneous), 0502 economics and business, Choice of University, ComputingMilieux_COMPUTERSANDEDUCATION, Developmental and Educational Psychology, Student population, business.industry, 05 social sciences, Sentiment analysis, Choice of University, social search, big data analysis, sentiment analysis, wikification, 050301 education, General Social Sciences, big data analysis, sentiment analysis, social search, wikification, Data science, Human-Computer Interaction, 050211 marketing, business, Psychology, 0503 education
- Abstract
Starting from the assumption that the factors orienting University choice are heterogeneous and multidimensional, the study explores student’s motivations in higher education. To this aim, a big da...
- Published
- 2018
37. Concept-based and relation-based corpus navigation : applications of natural language processing in digital humanities
- Author
-
Ruiz Fabo, Pablo, Lattice - Langues, Textes, Traitements informatiques, Cognition - UMR 8094 (Lattice), Département Littératures et langage (LILA), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Université Sorbonne Paris Cité (USPC)-Université Sorbonne Nouvelle - Paris 3, Université Paris sciences et lettres, Thierry Poibeau, and STAR, ABES
- Subjects
Navigation en corpus, Natural language processing, Extraction de propositions, [INFO.INFO-TT] Computer Science [cs]/Document and Text Processing, Wikification, Entity Linking, Extraction de relations, [SHS.LANGUE] Humanities and Social Sciences/Linguistics, Traitement automatique des langues, Corpus visualization, Corpus navigation, [INFO.INFO-TT] Computer Science [cs]/Document and Text Processing, Liage d'entité, Proposition extraction, Relation extraction, [SHS.LANGUE] Humanities and Social Sciences/Linguistics, Evaluation, Humanités numériques, Visualisation de corpus, Digital humanities
- Abstract
Social sciences and Humanities research is often based on large textual corpora that it would be unfeasible to read in detail. Natural Language Processing (NLP) can identify important concepts and actors mentioned in a corpus, as well as the relations between them. Such information can provide an overview of the corpus useful for domain-experts, and help identify corpus areas relevant for a given research question. To automatically annotate corpora relevant for Digital Humanities (DH), the NLP technologies we applied are, first, Entity Linking, to identify corpus actors and concepts. Second, the relations between actors and concepts were determined based on an NLP pipeline which provides semantic role labeling and syntactic dependencies among other information. Part I outlines the state of the art, paying attention to how the technologies have been applied in DH. Generic NLP tools were used. As the efficacy of NLP methods depends on the corpus, some technological development was undertaken, described in Part II, in order to better adapt to the corpora in our case studies. Part II also shows an intrinsic evaluation of the technology developed, with satisfactory results. The technologies were applied to three very different corpora, as described in Part III. First, the manuscripts of Jeremy Bentham. This is an 18th-19th century corpus in political philosophy. Second, the PoliInformatics corpus, with heterogeneous materials about the American financial crisis of 2007-2008. Finally, the Earth Negotiations Bulletin (ENB), which covers international climate summits since 1995, where treaties like the Kyoto Protocol or the Paris Agreements get negotiated. For each corpus, navigation interfaces were developed. These user interfaces (UI) combine networks, full-text search and structured search based on NLP annotations. As an example, in the ENB corpus interface, which covers climate policy negotiations, searches can be performed based on relational information identified in the corpus: the negotiation actors having discussed a given issue using verbs indicating support or opposition can be searched, as well as all statements where a given actor has expressed support or opposition. Relation information is employed, beyond simple co-occurrence between corpus terms. The UIs were evaluated qualitatively with domain-experts, to assess their potential usefulness for research in the experts' domains. First, we paid attention to whether the corpus representations we created correspond to experts' knowledge of the corpus, as an indication of the sanity of the outputs we produced. Second, we tried to determine whether experts could gain new insight on the corpus by using the applications, e.g. if they found evidence unknown to them or new research ideas. Examples of insight gain were attested with the ENB interface; this constitutes a good validation of the work carried out in the thesis. Overall, the applications' strengths and weaknesses were pointed out, outlining possible improvements as future work.
- Published
- 2017
38. Concept-based and relation-based corpus navigation : applications of natural language processing in digital humanities
- Author
-
Pablo Ruiz Fabo, Lattice - Langues, Textes, Traitements informatiques, Cognition - UMR 8094 (Lattice), Département Littératures et langage - ENS Paris (LILA), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Université Sorbonne Paris Cité (USPC)-Université Sorbonne Nouvelle - Paris 3, Université Paris sciences et lettres, and Thierry Poibeau
- Subjects
Navigation en corpus, Natural language processing, Extraction de propositions, Wikification, Entity Linking, Extraction de relations, Traitement automatique des langues, Corpus visualization, Corpus navigation, [INFO.INFO-TT] Computer Science [cs]/Document and Text Processing, Liage d'entité, Proposition extraction, Relation extraction, [SHS.LANGUE] Humanities and Social Sciences/Linguistics, Evaluation, Humanités numériques, Visualisation de corpus, Digital humanities
- Abstract
Social sciences and Humanities research is often based on large textual corpora that it would be unfeasible to read in detail. Natural Language Processing (NLP) can identify important concepts and actors mentioned in a corpus, as well as the relations between them. Such information can provide an overview of the corpus useful for domain-experts, and help identify corpus areas relevant for a given research question. To automatically annotate corpora relevant for Digital Humanities (DH), the NLP technologies we applied are, first, Entity Linking, to identify corpus actors and concepts. Second, the relations between actors and concepts were determined based on an NLP pipeline which provides semantic role labeling and syntactic dependencies among other information. Part I outlines the state of the art, paying attention to how the technologies have been applied in DH. Generic NLP tools were used. As the efficacy of NLP methods depends on the corpus, some technological development was undertaken, described in Part II, in order to better adapt to the corpora in our case studies. Part II also shows an intrinsic evaluation of the technology developed, with satisfactory results. The technologies were applied to three very different corpora, as described in Part III. First, the manuscripts of Jeremy Bentham. This is an 18th-19th century corpus in political philosophy. Second, the PoliInformatics corpus, with heterogeneous materials about the American financial crisis of 2007-2008. Finally, the Earth Negotiations Bulletin (ENB), which covers international climate summits since 1995, where treaties like the Kyoto Protocol or the Paris Agreements get negotiated. For each corpus, navigation interfaces were developed. These user interfaces (UI) combine networks, full-text search and structured search based on NLP annotations. As an example, in the ENB corpus interface, which covers climate policy negotiations, searches can be performed based on relational information identified in the corpus: the negotiation actors having discussed a given issue using verbs indicating support or opposition can be searched, as well as all statements where a given actor has expressed support or opposition. Relation information is employed, beyond simple co-occurrence between corpus terms. The UIs were evaluated qualitatively with domain-experts, to assess their potential usefulness for research in the experts' domains. First, we paid attention to whether the corpus representations we created correspond to experts' knowledge of the corpus, as an indication of the sanity of the outputs we produced. Second, we tried to determine whether experts could gain new insight on the corpus by using the applications, e.g. if they found evidence unknown to them or new research ideas. Examples of insight gain were attested with the ENB interface; this constitutes a good validation of the work carried out in the thesis. Overall, the applications' strengths and weaknesses were pointed out, outlining possible improvements as future work.
- Published
- 2017
39. Text Semantic Annotation: A Distributed Methodology Based on Community Coherence.
- Author
-
Makris, Christos, Pispirigos, Georgios, and Simos, Michael Angelos
- Subjects
- *ONTOLOGIES (Information retrieval), *NATURAL language processing, *COMMUNITIES, *ANNOTATIONS, *FOCUS (Linguistics), *ARTIFICIAL intelligence, *INFORMATION retrieval, *SCIENTIFIC community
- Abstract
Text annotation is the process of mapping the sense of a textual segment, within a given context, to a corresponding entity in a concept ontology. As the limitations of the bag-of-words paradigm become increasingly discernible in modern applications, several information retrieval and artificial intelligence tasks are shifting to semantic representations to address the inherent polysemy and homonymy of natural language. Community detection, which aims to identify densely interconnected subgroups within a network, has attracted great scientific interest, with applications in a broad range of fields such as digital marketing, bioinformatics, chemical engineering, neuroscience, and the social sciences. In linguistics, applying community detection to semantic ontologies has proven beneficial for disambiguation and ontology enhancement. In this paper, we introduce a novel distributed, supervised, knowledge-based methodology that employs community detection algorithms for text annotation with Wikipedia entities, introducing the concept of community coherence as a metric of local contextual coherence. Our experimental evaluation revealed that deeper inference of relatedness and of local entity community coherence in the Wikipedia graph yields substantial overall improvements, chiefly by improving the accuracy of less common annotations. The proposed methodology attains robust disambiguation performance and is well suited for wider adoption. [ABSTRACT FROM AUTHOR]
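The abstract above combines community detection over the Wikipedia entity graph with a community-coherence score for candidate annotations. The snippet below is a minimal, hedged illustration of that general idea, not the authors' implementation: it builds a toy relatedness graph over candidate entities with networkx, detects communities by greedy modularity, and scores each candidate by the share of context entities that fall in its community.

```python
# Hedged sketch of community-based coherence scoring (illustrative, not the paper's system).
# Assumes networkx; the toy relatedness edges stand in for Wikipedia-graph relatedness.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy graph: nodes are candidate Wikipedia entities, edges link related entities.
G = nx.Graph()
G.add_edges_from([
    ("Apple_Inc.", "IPhone"), ("Apple_Inc.", "Steve_Jobs"), ("IPhone", "Steve_Jobs"),
    ("Apple_(fruit)", "Orchard"), ("Apple_(fruit)", "Fruit"), ("Orchard", "Fruit"),
])

communities = list(greedy_modularity_communities(G))

def community_of(node):
    """Return the detected community containing the given node."""
    return next((c for c in communities if node in c), set())

def community_coherence(candidate, context_entities):
    """Fraction of context entities that fall in the candidate's community."""
    comm = community_of(candidate)
    return sum(e in comm for e in context_entities) / max(len(context_entities), 1)

context = ["IPhone", "Steve_Jobs"]
for cand in ["Apple_Inc.", "Apple_(fruit)"]:
    print(cand, community_coherence(cand, context))
# Expected: Apple_Inc. scores 1.0, Apple_(fruit) scores 0.0 for this context.
```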
- Published
- 2020
- Full Text
- View/download PDF
40. Wikification of Learning Objects Using Metadata as an Alternative Context for Disambiguation
- Author
-
Reyna Melara Abarca, Claudia Perez-Martinez, Alexander Gelbukh, Gabriel López Morteo, Magally Martinez Reyes, and Moisés Pérez López
- Subjects
wikification, Word sense disambiguation, Computación, natural language processing, learning objects
- Abstract
"We present a methodology to wikify learning objects. Our proposal is focused on two processes: word sense disambiguation and relevant phrase selection. The disambiguation process involves the use of the learning object’s metadata as either additional or alternative context. This increases the probability of success when a learning object has a low quality context. The selection of relevant phrases is perf ormed by identifying the highest values of semantic relat edness between the main subject of a learning object and t he phrases. This criterion is useful for achieving the didactic objectives of the learning object."
- Published
- 2014
41. Entity linking for biomedical literature
- Author
-
Boliang Zhang, Heng Ji, Deborah L. McGuinness, Jin Guang Zheng, Daniel P. Howsmon, Juergen Hahn, and James A. Hendler
- Subjects
Computer science, Inference, Health Informatics, 02 engineering and technology, Scientific literature, text mining, entity linking, computer.software_genre, Semantics, Ranking (information retrieval), Domain (software engineering), 03 medical and health sciences, Entity linking, semantic web, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Data Mining, biomedical literature, Semantic Web, 030304 developmental biology, Natural Language Processing, 0303 health sciences, Computational model, Information retrieval, business.industry, Health Policy, Biological Ontologies, Data science, Computer Science Applications, Knowledge base, wikification, Benchmark (computing), Artificial intelligence, business, computer, Natural language processing, signal transduction, Medical Informatics, Research Article, biological ontologies
- Abstract
Background: The Entity Linking (EL) task links entity mentions in an unstructured document to entities in a knowledge base. Although this problem is well studied for news and social media, it has not received much attention in the life science domain. One outcome of tackling the EL problem in the life sciences is to enable scientists to build computational models of biological processes more efficiently. However, simply applying a news-trained entity linker produces inadequate results. Methods: Since existing supervised approaches require a large amount of manually labeled training data, which is currently unavailable for the life science domain, we propose a novel unsupervised collective inference approach to link entities from unstructured full texts of biomedical literature to 300 ontologies. The approach leverages the rich semantic information and structures in ontologies for similarity computation and entity ranking. Results: Without using any manual annotation, our approach significantly outperforms a state-of-the-art supervised EL method (9% absolute gain in linking accuracy). Furthermore, the state-of-the-art supervised EL method requires 15,000 manually annotated entity mentions for training. These promising results establish a benchmark for the EL task in the life science domain. We also provide an in-depth analysis and discussion of both challenges and opportunities in automatic knowledge enrichment for scientific literature. Conclusions: In this paper, we propose a novel unsupervised collective inference approach to address the EL problem in a new domain. We show that our unsupervised approach is able to outperform a current state-of-the-art supervised approach trained with a large amount of manually labeled data. Life science is an underrepresented domain for applying EL techniques. By providing a small benchmark data set and identifying opportunities, we hope to stimulate discussions across natural language processing and bioinformatics and to motivate others to develop techniques for this largely untapped domain.
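The abstract above describes unsupervised entity ranking that leverages semantic information in ontologies for similarity computation. As a hedged, generic illustration (not the authors' system), the sketch below ranks candidate ontology entries for a mention by TF-IDF cosine similarity between the mention's sentence context and each candidate's definition text; the candidate names and definitions are invented toy data.

```python
# Hedged sketch: rank candidate ontology entries for a mention by context-definition similarity.
# Toy candidate definitions; a real system would pull these from biomedical ontologies.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

candidates = {
    "signal transduction (cell signaling)": "The cellular process in which a signal is conveyed "
                                            "to trigger a change in the activity of the cell.",
    "Signal Transduction (ontology heading)": "Intracellular communication via signaling cascades.",
    "Traffic signal": "A signaling device positioned at road intersections.",
}

def rank_candidates(mention_context, candidates):
    """Return (candidate, score) pairs sorted by cosine similarity to the context."""
    names = list(candidates)
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform([mention_context] + [candidates[n] for n in names])
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    return sorted(zip(names, scores), key=lambda x: x[1], reverse=True)

context = "TGF-beta signaling cascades regulate the activity of the cell during development."
for name, score in rank_candidates(context, candidates):
    print(f"{score:.3f}  {name}")
```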
- Published
- 2015