9 results on '"Paweł Kędzia"'
Search Results
2. Word Sense Disambiguation Based on Large Scale Polish CLARIN Heterogeneous Lexical Resources
- Author
- Paweł Kędzia, Maciej Piasecki, and Marlena Orlińska
- Subjects
word sense disambiguation ,WSD ,page rank ,plWordNet ,graphs ,lexical resources ,Computational linguistics. Natural language processing ,P98-98.5 ,Semantics ,P325-325.5 ,Lexicography ,P327-327.5
- Abstract
Lexical resources can be applied in many different Natural Language Engineering tasks, but the most fundamental task is the recognition of word senses used in text contexts. The problem is difficult and not yet fully solved, and different lexical resources provide varied support for it. Polish CLARIN lexical semantic resources are based on plWordNet, a very large wordnet for Polish, as a central structure that serves as a basis for linking together several resources of different types. In this paper, several Word Sense Disambiguation (henceforth WSD) methods developed for Polish that utilise plWordNet are discussed. Textual sense descriptions in a traditional lexicon can be compared with text contexts using Lesk’s algorithm in order to find the best matching senses. In the case of a wordnet, lexico-semantic relations provide the main description of word senses. Thus, first, we adapted and applied to Polish a WSD method based on Page Rank. In this method, text words are mapped onto their senses in the plWordNet graph and the Page Rank algorithm is run to find the senses with the highest scores. The method yields results that are lower than, but comparable to, those reported for English. The error analysis showed that the main problems are fine-grained sense distinctions in plWordNet and the limited number of connections between words of different parts of speech. In the second approach, plWordNet expanded with a mapping onto SUMO ontology concepts was used. Two scenarios for WSD were investigated: two-step disambiguation and disambiguation based on combined networks of plWordNet and SUMO. In the former scenario, words are first assigned SUMO concepts and then plWordNet senses are disambiguated. In the latter, plWordNet and SUMO are combined into one large network, which is then used for the disambiguation of senses. The additional knowledge sources used in WSD improved the performance. The obtained results and potential further lines of development are discussed.
- Published
- 2015
3. Automatic Prompt System in the Process of Mapping plWordNet on Princeton WordNet
- Author
- Paweł Kędzia, Maciej Piasecki, Ewa Rudnicka, and Konrad Przybycień
- Subjects
semi-automatic prompt system ,mapping procedure ,plWordnet ,WordNet ,Polish ,interlingual relations ,Computational linguistics. Natural language processing ,P98-98.5 ,Semantics ,P325-325.5 ,Lexicography ,P327-327.5
- Abstract
The paper offers a critical evaluation of the power and usefulness of an automatic prompt system based on the extended Relaxation Labelling algorithm in the process of (manual) mapping of plWordNet onto Princeton WordNet. To this end, the results of manual mapping, that is, inter-lingual relations between plWN and PWN synsets, are juxtaposed with the automatic prompts that were generated for the source language synsets to be mapped. We check the number and type of inter-lingual relations introduced on the basis of automatic prompts and the distance of the respective prompt synsets from the actual target language synsets.
- Published
- 2015
4. Word Sense Disambiguation Based on Large Scale Polish CLARIN Heterogeneous Lexical Resources
- Author
- Marlena Orlińska, Maciej Piasecki, and Paweł Kędzia
- Subjects
Linguistics and Language ,Computer Networks and Communications ,Computer science ,WordNet ,Scale (descriptive set theory) ,Ontology (information science) ,computer.software_genre ,Lexicon ,lcsh:P325-325.5 ,WSD ,page rank ,Structure (mathematical logic) ,graphs ,business.industry ,Communication ,lexical resources ,lcsh:P98-98.5 ,Part of speech ,plWordNet ,lcsh:Lexicography ,word sense disambiguation ,SUMO ,Artificial intelligence ,lcsh:Computational linguistics. Natural language processing ,business ,computer ,lcsh:P327-327.5 ,Natural language processing ,Natural language ,Word (computer architecture) ,lcsh:Semantics
- Abstract
Lexical resources can be applied in many different Natural Language Engineering tasks, but the most fundamental task is the recognition of word senses used in text contexts. The problem is difficult and not yet fully solved, and different lexical resources provide varied support for it. Polish CLARIN lexical semantic resources are based on plWordNet, a very large wordnet for Polish, as a central structure that serves as a basis for linking together several resources of different types. In this paper, several Word Sense Disambiguation (henceforth WSD) methods developed for Polish that utilise plWordNet are discussed. Textual sense descriptions in a traditional lexicon can be compared with text contexts using Lesk’s algorithm in order to find the best matching senses. In the case of a wordnet, lexico-semantic relations provide the main description of word senses. Thus, first, we adapted and applied to Polish a WSD method based on Page Rank. In this method, text words are mapped onto their senses in the plWordNet graph and the Page Rank algorithm is run to find the senses with the highest scores. The method yields results that are lower than, but comparable to, those reported for English. The error analysis showed that the main problems are fine-grained sense distinctions in plWordNet and the limited number of connections between words of different parts of speech. In the second approach, plWordNet expanded with a mapping onto SUMO ontology concepts was used. Two scenarios for WSD were investigated: two-step disambiguation and disambiguation based on combined networks of plWordNet and SUMO. In the former scenario, words are first assigned SUMO concepts and then plWordNet senses are disambiguated. In the latter, plWordNet and SUMO are combined into one large network, which is then used for the disambiguation of senses. The additional knowledge sources used in WSD improved the performance. The obtained results and potential further lines of development are discussed.
- Published
- 2015
5. Automatic Prompt System in the Process of Mapping plWordNet on Princeton WordNet
- Author
- Ewa Rudnicka, Konrad Przybycień, Maciej Piasecki, and Paweł Kędzia
- Subjects
Linguistics and Language ,Computer Networks and Communications ,Computer science ,WordNet ,Relaxation labelling ,computer.software_genre ,lcsh:P325-325.5 ,plWordnet ,mapping procedure ,Polish ,semi-automatic prompt system ,Information retrieval ,business.industry ,Communication ,Process (computing) ,lcsh:P98-98.5 ,lcsh:Lexicography ,Artificial intelligence ,interlingual relations ,lcsh:Computational linguistics. Natural language processing ,business ,computer ,lcsh:P327-327.5 ,Natural language processing ,lcsh:Semantics
- Abstract
The paper offers a critical evaluation of the power and usefulness of an automatic prompt system based on the extended Relaxation Labelling algorithm in the process of (manual) mapping of plWordNet onto Princeton WordNet. To this end, the results of manual mapping, that is, inter-lingual relations between plWN and PWN synsets, are juxtaposed with the automatic prompts that were generated for the source language synsets to be mapped. We check the number and type of inter-lingual relations introduced on the basis of automatic prompts and the distance of the respective prompt synsets from the actual target language synsets.
- Published
- 2015
6. Graph-Based Approach to Recognizing CST Relations in Polish Texts
- Author
- Maciej Piasecki, Paweł Kędzia, and Arkadiusz Janz
- Subjects
Computer science ,business.industry ,Graph based ,02 engineering and technology ,Graph similarity ,computer.software_genre ,Graph ,SemEval ,Logistic model tree ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Natural language processing
- Abstract
This paper presents a supervised approach to the recognition of Cross-document Structure Theory (CST) relations in Polish texts. In the proposed approach, a graph-based representation is constructed for sentences. Graphs are built on the basis of lexicalised syntactic-semantic relations extracted from text. Similarity between sentences is calculated from the graphs, and the similarity values are input to classifiers trained by Logistic Model Tree. Several different graph configurations, as well as graph similarity methods, were analysed for this task. The approach was evaluated on a large open corpus annotated manually with 17 types of selected CST relations. The configuration of experiments was similar to those known from SemEval and we obtained very promising results.
- Published
- 2017
7. Fextor: A Feature Extraction Framework for Natural Language Processing: A Case Study in Word Sense Disambiguation, Relation Recognition and Anaphora Resolution
- Author
- Radosław Ramocki, Adam Radziszewski, Michał Marcińczuk, Bartosz Broda, Adam Wardyński, and Paweł Kędzia
- Subjects
Text corpus ,Shallow parsing ,Relation (database) ,Computer science ,business.industry ,Anaphora (linguistics) ,media_common.quotation_subject ,Feature extraction ,Resolution (logic) ,computer.software_genre ,Reading (process) ,Redundancy (engineering) ,Artificial intelligence ,business ,computer ,Natural language processing ,media_common
- Abstract
Feature extraction from text corpora is an important step in Natural Language Processing (NLP), especially for Machine Learning (ML) techniques. Various NLP tasks have many common steps, e.g. the low-level act of reading a corpus and obtaining text windows from it. Some high-level processing steps might also be shared, e.g. testing for morpho-syntactic constraints between words. An integrated feature extraction framework removes wasteful redundancy and helps in rapid prototyping.
- Published
- 2013
8. Finding the Optimal Number of Clusters for Word Sense Disambiguation
- Author
- Bartosz Broda and Paweł Kędzia
- Subjects
Ambiguity resolution ,Word-sense disambiguation ,business.industry ,Computer science ,media_common.quotation_subject ,Speech recognition ,Ambiguity ,computer.software_genre ,Manual labour ,Artificial intelligence ,business ,Cluster analysis ,computer ,Natural language processing ,media_common
- Abstract
Ambiguity is an inherent problem for many tasks in Natural Language Processing. Unsupervised and semi-supervised approaches to ambiguity resolution are appealing as they lower the cost of manual labour. Typically, those methods struggle to estimate the number of senses without supervision. This paper presents research on using stopping functions applied to clustering algorithms to estimate the number of senses. The experiments were performed for Polish and English. We found that estimation based on PK2 stopping functions is encouraging, but only when using coarse-grained distinctions between senses.
- Published
- 2011
9. Distributionally Extended Network-based Word Sense Disambiguation in Semantic Clustering of Polish Texts
- Author
- Jan Kocoń, Agnieszka Indyka-Piasecka, Maciej Piasecki, and Paweł Kędzia
- Subjects
Information retrieval ,Word-sense disambiguation ,text classification ,Computer science ,business.industry ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,WordNet ,Document clustering ,computer.software_genre ,Graph ,SemEval ,plWordNet ,Semantic similarity ,Semantic clustering ,Artificial intelligence ,business ,Word Sense Disambiguation ,computer ,wordnet ,Natural language processing
- Abstract
In the paper we present an extended version of a graph-based unsupervised Word Sense Disambiguation algorithm. The algorithm is based on a spreading activation scheme applied to graphs built dynamically from the text words and a large wordnet. The algorithm, originally proposed for English and Princeton WordNet, was adapted to Polish and plWordNet. An extension based on knowledge acquired from a corpus-derived Measure of Semantic Relatedness is proposed. The extended algorithm was evaluated against a manually disambiguated corpus. We observed an improvement for disambiguation performed on shorter text contexts. In addition, applying the algorithm improved results in a document clustering task.