189 results on '"textrank"'
Search Results
52. Uyghur–Kazakh–Kirghiz Text Keyword Extraction Based on Morpheme Segmentation
- Author
-
Sardar Parhat, Mutallip Sattar, Askar Hamdulla, and Abdurahman Kadir
- Subjects
Uyghur–Kazakh–Kirghiz, keyword extraction, morpheme segmentation, stem extraction, stem vector, TextRank, Information technology, T58.5-58.64
- Abstract
In this study, based on a morpheme segmentation framework, we researched a text keyword extraction method for Uyghur, Kazakh and Kirghiz languages, which have similar grammatical and lexical structures. In these languages, affixes and a stem are joined together to form a word. A stem is a word particle with a notional meaning, while the affixes perform grammatical functions. Because of these derivative properties, the vocabularies used for these languages are huge. Therefore, pre-processing is a necessary step in NLP tasks for Uyghur, Kazakh and Kirghiz. Morpheme segmentation enabled us to remove the suffixes as the auxiliary unit while retaining the meaningful stem and it reduced the dimension of the feature space present in the keyword extraction task for Uyghur, Kazakh and Kirghiz texts. We transformed the morpheme segmentation task into the problem of labeling the morpheme sequences, and we used the Bi-LSTM network to bidirectionally obtain the position feature information of character sequences. We applied CRF to effectively learn the information of the preceding and following label sequences to build a highly accurate Bi-LSTM_CRF morpheme segmentation model, and we prepared morpheme-based experimental text sets by using this model. Subsequently, we used the stem vectors’ similarity to modify the TextRank algorithm, subsequent to the training of the stem embedding vector using the Doc2vec algorithm, and then we performed a text keyword extraction experiment. In this experiment, the highest F1 scores of 43.8%, 44% and 43.9% were obtained for three datasets. The experimental results show that the morpheme-based approach provides much better results than the word-based approach, which shows the stem vector similarity weighting is an efficient method for the text keyword extraction task, thus proving the efficiency of morpheme sequence for morphologically derivative languages.
- Published
- 2023
- Full Text
- View/download PDF
53. Improving TextRank Algorithm for Automatic Keyword Extraction with Tolerance Rough Set.
- Author
-
Qiu, Dong and Zheng, Qin
- Subjects
ROUGH sets, ALGORITHMS, FUZZY sets, FUZZY graphs
- Abstract
Aiming at the shortcomings of the TextRank method (TM) which only considers the co-occurrence between words and the incipient word importance when extracting keywords, this paper proposes a tolerance rough set (TRS)-based unsupervised keyword extraction method. Generally, how to score the words in a document has a significant influence on the word graph modeling. In this paper, we improve TM in two aspects with TRS theory that is used to mine vocabulary, semantics, grammar and other information in the corpus. First, the degree of words belonging to each document is calculated to form a fuzzy membership matrix, which helps to characterize the incipient word importance. Second, the fuzzy membership of words to each word tolerance class is calculated to form a semantic correlation matrix, which contributes to optimize the transition probability of all graph edges. We apply the proposed methods to the clustering tasks of two datasets, outperforming the strong baselines. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
54. Peringkas Otomatis Teks Berbahasa Arab Menggunakan Algoritma TextRank [Automatic Summarizer for Arabic Text Using the TextRank Algorithm]
- Author
-
Muhammad Fikri Hidayattullah and Ardhiyan Azizi
- Subjects
automatic summarizer, arabic, textrank, artificial intelligence, machine learning, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Increasingly, the amount of data in the form of text documents scattered on the internet is getting bigger. It took a very long time to get the information from each of these documents. For this reason, several researchers developed the Automatic Text Summarizer to summarize text automatically, so that the time needed to get important information from the entire document can be faster. Research that focuses on automatic summarization of Arabic texts is very rare. In fact, there are more than 300 million Arabic speakers in the world and Arabic is the official language at the United Nations. Therefore, this study develops a model that can perform text summarization automatically using the TextRank algorithm. The test results using Q&A Evaluation show very good results with details of the suitability of the summary results with the original text by 90%, the suitability of the summary results with Arabic grammar is 91.43%, the suitability of the summary results is 90%, the ease of understanding the summary results is 90%, and the useful aspects of the model developed were 91.43%.
- Published
- 2021
- Full Text
- View/download PDF
55. A Template Approach for Summarizing Restaurant Reviews
- Author
-
Yenliang Chen, Chialing Chang, and Jeryeu Gan
- Subjects
Restaurant reviews, sentiment analysis, summarization, template, TextRank, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
In the era of rapid development of social networks, user reviews of restaurant review websites have grown rapidly. In order to allow users to quickly grasp the key points of review information on review sites, this paper provides an abstractive multi-text summary method that can automatically generate template-based review summaries based on predefined topics and sentiments. In particular, for each predefined topic and each type of sentiment (positive or negative), this study uses the TextRank algorithm to find the most representative sentences to form a summary. This method allows users to quickly grasp the positive and negative opinions of each important aspect of the restaurant. The previous research on generating abstracts from reviews either did not generate abstracts based on topics, or they were based on topics generated by random models. However, the latter method cannot guarantee that the topics generated by the random model are really the topics that the user needs. For a restaurant review, some topics are indispensable. In order to ensure that abstracts can be generated for these essential topics, our method predefines the topics that must be generated, and then generates abstracts for these topics. In the evaluation, this study compared the template method with the Refresh and Gensim systems based on criteria such as informativeness, clarity, usefulness and likes. The results show that the method proposed in this paper is superior to the other two summary methods.
- Published
- 2021
- Full Text
- View/download PDF
56. Chinese News Keyword Extraction Algorithm Based on TextRank and Topic Model
- Author
-
Xiong, Ao, Guo, Qing, Akan, Ozgur, Editorial Board Member, Bellavista, Paolo, Editorial Board Member, Cao, Jiannong, Editorial Board Member, Coulson, Geoffrey, Editorial Board Member, Dressler, Falko, Editorial Board Member, Ferrari, Domenico, Editorial Board Member, Gerla, Mario, Editorial Board Member, Kobayashi, Hisashi, Editorial Board Member, Palazzo, Sergio, Editorial Board Member, Sahni, Sartaj, Editorial Board Member, Shen, Xuemin (Sherman), Editorial Board Member, Stan, Mircea, Editorial Board Member, Xiaohua, Jia, Editorial Board Member, Zomaya, Albert Y., Editorial Board Member, Han, Shuai, editor, Ye, Liang, editor, and Meng, Weixiao, editor
- Published
- 2019
- Full Text
- View/download PDF
57. Case Facts Analysis Method Based on Deep Learning
- Author
-
Xu, Zihuan, He, Tieke, Lian, Hao, Wan, Jiabing, Wang, Hui, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Ni, Weiwei, editor, Wang, Xin, editor, Song, Wei, editor, and Li, Yukun, editor
- Published
- 2019
- Full Text
- View/download PDF
58. Unsupervised Automatic Keyphrases Extraction Algorithms : Experimentations on Paintings
- Author
-
Gagliardi, Isabella, Artese, Maria Teresa, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Debruyne, Christophe, editor, Panetto, Hervé, editor, Guédria, Wided, editor, Bollen, Peter, editor, Ciuciu, Ioana, editor, and Meersman, Robert, editor
- Published
- 2019
- Full Text
- View/download PDF
59. Automatic back transliteration of Romanized Bengali (Banglish) to Bengali
- Author
-
Shibli, G. M. Shahariar, Shawon, Md. Tanvir Rouf, Nibir, Anik Hassan, Miandad, Md. Zabed, and Mandal, Nibir Chandra
- Published
- 2023
- Full Text
- View/download PDF
60. A Comprehensive Analysis of Indian Legal Documents Summarization Techniques
- Author
-
Sharma, Saloni, Srivastava, Surabhi, Verma, Pradeepika, Verma, Anshul, and Chaurasia, Sachchida Nand
- Published
- 2023
- Full Text
- View/download PDF
61. Automatically Generating Release Notes with Content Classification Models.
- Author
-
Nath, Sristy Sumana and Roy, Banani
- Subjects
LATENT semantic analysis, SOFTWARE maintenance, MACHINE learning, CLASSIFICATION
- Abstract
Release notes are admitted as an essential technical document in software maintenance. They summarize the main changes, e.g. bug fixes and new features, that have happened in the software since the previous release. Manually producing release notes is a time-consuming and challenging task. For that reason, sometimes developers neglect to write release notes. For example, we collect data from GitHub with over 1900 releases, and among them, 37% of the release notes are empty. To mitigate this problem, we propose an automatic release notes generation approach by applying the text summarization techniques, i.e. TextRank. To improve the keyword extraction method of traditional TextRank, we integrate the GloVe word embedding technique with TextRank. After generating release notes automatically, we apply machine learning algorithms to classify the release note contents (or sentences). We classify the contents into six categories, e.g. bug fixes and performance improvements, to represent the release notes better for users. We use the evaluation metric, e.g. ROUGE, to evaluate the automatically generated release notes. We also compare the performance of our technique with two popular extractive algorithms, e.g. Luhn's and latent semantic analysis (LSA). Our evaluation results show that the improved TextRank method outperforms the two algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
62. Automatic Keywords Extraction Based on Co-Occurrence and Semantic Relationships Between Words
- Author
-
Xiangke Mao, Shaobin Huang, Rongsheng Li, and Linshan Shen
- Subjects
Automatic keywords extraction, graph model, semantic similarity, TextRank, word co-occurrence, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Automatic keywords extraction is a method that extracts words or phrases from a document which can express the main idea of the document. In this paper, we propose an unsupervised keywords extraction framework for individual documents, which improves the keywords extraction from two aspects. In the step of candidate keywords selection, we use the methods of removing the stopwords, regular matching, and length filtering to reduce the number of candidate keywords, but improve the quality. In the step of scoring words, we use word co-occurrence, semantic relationships (WordNet, Word Embedding, Normalized Google Distance), and three ways to combine word co-occurrence and semantic relationships to measure the weight of edges in the graph model. In experiments, we use Precision, Recall, and F1-measure values as evaluation criteria to compare all keywords extraction methods we proposed with other strong baseline methods in two datasets. According to the results of experiments, methods under our proposed framework achieve good results. We verify that the methods of using both word co-occurrence and semantic relationships have a better effect on keywords extraction than using co-occurrence or semantic relationships only. At the same time, we also find that for the keywords extraction of individual documents, the method of using co-occurrence between words has a better effect than semantic relationships.
- Published
- 2020
- Full Text
- View/download PDF
63. An Empirical Study of TextRank for Keyword Extraction
- Author
-
Mingxi Zhang, Xuemin Li, Shuibo Yue, and Liuqian Yang
- Subjects
Keyword extraction, Porter Stemmer, TextRank, PageRank, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
As a typical keyword extraction technology, TextRank has been used in a wide variety of commercial applications, including text classification, information retrieval and clustering. In these applications, the parameters of TextRank, including the co-occurrence window size, iteration number and decay factor, are set roughly, which might affect the effectiveness of returned results. In this work, we conduct an empirical study on TextRank, towards finding optimal parameter settings for keyword extraction. The experiments are done in Hulth2003 and Krapivin2009 datasets, which are two real datasets. We first remove the stop word by an open published English stop word list XPO6. And then, we extract the word stems by Porter Stemmer. Porter Stemmer is a tool which can find the stems of words with multiple variants, discard redundant information, strengthen the filtering effect, and extract the effective features of the text fully. We carry out extensive experiments to evaluate the effects of the parameters to keywords extraction, and evaluate the effectiveness of corresponding results by Precision, Recall and Accuracy. Experimental results show that TextRank shows the best performance when setting co-occurrence window size w = 3, iteration number t = 20, decay factor c = 0.9 and rank k = 10 respectively, and the results are independent of the text length.
- Published
- 2020
- Full Text
- View/download PDF
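The parameter settings named in the abstract of result 63 (co-occurrence window size w, iteration number t, decay factor c, rank k) can be made concrete with a small sketch. This is an illustration of generic unweighted TextRank, not the authors' implementation: the function name `textrank_keywords`, its parameter names, and the mapping of the "decay factor" to the PageRank damping term are assumptions, and stop-word removal and stemming are assumed to have been applied to the tokens beforehand.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=3, iterations=20, damping=0.9, top_k=10):
    """Rank words with an unweighted TextRank over a co-occurrence graph.

    Hypothetical helper for illustration; defaults mirror the values
    reported in the abstract (w = 3, t = 20, c = 0.9, k = 10).
    """
    # Link two distinct words when they fall inside the same span of
    # `window` consecutive tokens (undirected edge).
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Fixed number of synchronous PageRank-style updates.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word in neighbors:
            incoming = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            updated[word] = (1 - damping) + damping * incoming
        scores = updated

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]
```

Calling `textrank_keywords(tokens)` on a preprocessed token list returns up to ten candidate keywords; changing `window`, `iterations`, or `damping` reproduces the kind of parameter sweep the abstract describes.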
64. Building Domain Keywords Using Cognitive Based Sentences Framework
- Author
-
Xu, Zheng, Liu, Weidong, Zhu, Yiwei, Zhang, Shunxiang, Yen, Neil Y., editor, and Hung, Jason C, editor
- Published
- 2018
- Full Text
- View/download PDF
65. A Modification to Graph Based Approach for Extraction Based Automatic Text Summarization
- Author
-
Sehgal, Sunchit, Kumar, Badal, Maheshwar, Rampal, Lakshay, Chaliya, Ankit, Kacprzyk, Janusz, Series editor, Pal, Nikhil R., Advisory editor, Bello Perez, Rafael, Advisory editor, Corchado, Emilio S., Advisory editor, Hagras, Hani, Advisory editor, Kóczy, László T., Advisory editor, Kreinovich, Vladik, Advisory editor, Lin, Chin-Teng, Advisory editor, Lu, Jie, Advisory editor, Melin, Patricia, Advisory editor, Nedjah, Nadia, Advisory editor, Nguyen, Ngoc Thanh, Advisory editor, Wang, Jun, Advisory editor, Saeed, Khalid, editor, Chaki, Nabendu, editor, Pati, Bibudhendu, editor, Bakshi, Sambit, editor, and Mohapatra, Durga Prasad, editor
- Published
- 2018
- Full Text
- View/download PDF
66. Improvement of TextRank Based on Co-occurrence Word Pairs and Context Information
- Author
-
Wang, Yang, Yin, Hua, He, Minwei, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Pandu Rangan, C., Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, and Qiu, Meikang, editor
- Published
- 2018
- Full Text
- View/download PDF
67. Extracting 5W from Baidu Hot News Search Words for Societal Risk Events Analysis
- Author
-
Xu, Nuo, Tang, Xijin, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Meng, Xiaofeng, editor, Li, Ruixuan, editor, Wang, Kanliang, editor, Niu, Baoning, editor, Wang, Xin, editor, and Zhao, Gansen, editor
- Published
- 2018
- Full Text
- View/download PDF
68. Most Important First – Keyphrase Scoring for Improved Ranking in Settings With Limited Keyphrases
- Author
-
Witt, Nils, Milz, Tobias, Seifert, Christin, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Soldatova, Larisa, editor, Vanschoren, Joaquin, editor, Papadopoulos, George, editor, and Ceci, Michelangelo, editor
- Published
- 2018
- Full Text
- View/download PDF
69. A PageRank-Based Method to Extract Fuzzy Expressions as Features in Supervised Classification Problems
- Author
-
Carmona, Pablo, Castro, Juan Luis, Lozano, Jesús, Suárez, José Ignacio, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Herrera, Francisco, editor, Damas, Sergio, editor, Montes, Rosana, editor, Alonso, Sergio, editor, Cordón, Óscar, editor, González, Antonio, editor, and Troncoso, Alicia, editor
- Published
- 2018
- Full Text
- View/download PDF
70. Multimode Summarized Text to Speech Conversion Application
- Author
-
Sehgal, Archit and Khanna, Gitika
- Published
- 2019
71. GKEEP: An Enhanced Graph‐Based Keyword Extractor With Error‐Feedback Propagation for Geoscience Reports
- Author
-
Qinjun Qiu, Zhong Xie, Hong Xie, and Bin Wang
- Subjects
backpropagation, error feedback, geoscience reports, keyword extraction, TextRank, Word2Vec, Astronomy, QB1-991, Geology, QE1-996.5
- Abstract
As the amount of published geoscience literature grows, reading and summarizing texts of large collections has become a challenging task. Publication keywords can be considered basic components of knowledge structure representations and have been used to reveal knowledge concerning research domains. In contrast to data used in other research domains, the works on textual geoscience data that entail keyword extraction are limited. In this paper, we propose an unsupervised algorithm, the graph‐based keyword extractor with error‐feedback propagation (GKEEP), that enhances graph‐based keyword extraction approaches by using an error‐feedback mechanism similar to the concept of backpropagation. The proposed approach comprises the following steps. A preprocessed document is used as the input of the proposed model and is represented as a weighted undirected graph, where the vertices represent words and the edges represent the cooccurrence relationship between the words constrained by a window size. Subsequently, its nodes are ranked by their importance scores calculated by a graph‐based ranking algorithm. Consequently, all the words have their own scores, and they are used to compute the scores of keyword candidates. Subsequently, the Word2Vec method is applied to recalculate the scores of keyword candidates and rank the keyword candidates to select the final keyword. It also utilizes error feedback to boost the rankings of the most salient terms that would otherwise be deemed less important. With empirical experiments on two real data sets (including our newly built data set), the proposed GKEEP model outperforms state‐of‐the‐art unsupervised models and the existing graph‐based ranking models. The proposed method can effectively reflect intrinsic keyword semantics and interrelationships.
- Published
- 2021
- Full Text
- View/download PDF
72. Extracting Keywords from Texts based on Word Frequency and Association Features.
- Author
-
Xu, Zhenzhen and Zhang, Junsheng
- Subjects
WORD frequency, INFORMATION networks, INFORMATION overload, INFORMATION retrieval, INFORMATION technology
- Abstract
With the development of information technology such as mobile Internet and social media applications, network information is growing rapidly and leads to the problem of information overload. Keywords help to filter and find interesting information for users from massive text. Automatic extraction of keywords from text as tags of text help to improve recommendation and keyword-based information retrieval. This paper proposes a novel keyword extraction approach from text that combines features such as word frequency and association. Experiment results show that the precision rate, recall rate and F-measure are all better than those of TextRank and TF-IDF. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
73. GKEEP: An Enhanced Graph‐Based Keyword Extractor With Error‐Feedback Propagation for Geoscience Reports.
- Author
-
Qiu, Qinjun, Xie, Zhong, Xie, Hong, and Wang, Bin
- Subjects
*GEOLOGY, *EARTH sciences, *SEMANTIC computing, *KNOWLEDGE representation (Information theory), *UNDIRECTED graphs
- Abstract
As the amount of published geoscience literature grows, reading and summarizing texts of large collections has become a challenging task. Publication keywords can be considered basic components of knowledge structure representations and have been used to reveal knowledge concerning research domains. In contrast to data used in other research domains, the works on textual geoscience data that entail keyword extraction are limited. In this paper, we propose an unsupervised algorithm, the graph‐based keyword extractor with error‐feedback propagation (GKEEP), that enhances graph‐based keyword extraction approaches by using an error‐feedback mechanism similar to the concept of backpropagation. The proposed approach comprises the following steps. A preprocessed document is used as the input of the proposed model and is represented as a weighted undirected graph, where the vertices represent words and the edges represent the cooccurrence relationship between the words constrained by a window size. Subsequently, its nodes are ranked by their importance scores calculated by a graph‐based ranking algorithm. Consequently, all the words have their own scores, and they are used to compute the scores of keyword candidates. Subsequently, the Word2Vec method is applied to recalculate the scores of keyword candidates and rank the keyword candidates to select the final keyword. It also utilizes error feedback to boost the rankings of the most salient terms that would otherwise be deemed less important. With empirical experiments on two real data sets (including our newly built data set), the proposed GKEEP model outperforms state‐of‐the‐art unsupervised models and the existing graph‐based ranking models. The proposed method can effectively reflect intrinsic keyword semantics and interrelationships. Plain Language Summary: The common or frequently used terms receive higher scores in traditional graph‐based extraction because more edges are connected to them. This paper proposes a graph‐based KE algorithm called KE using error‐feedback propagation, which utilizes the semantics of word embedding to assist in extracting keywords from geoscience reports. We hope that our approach will serve as an alternative method that deserves further study. Key Points: Word embedding is incorporated to capture the dependency structure as well as the data distribution, and it computes semantic relations to solve the content sparsity problem. Error feedback is utilized to boost the most salient terms that graph‐based approaches deem less important. A set of experiments to verify the effectiveness of the proposed method on two available manually constructed data sets. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
74. Optimized Focused Web Crawler with Natural Language Processing Based Relevance Measure in Bioinformatics Web Sources
- Author
-
Mani Sekhar S. R., Siddesh G. M., Manvi Sunilkumar S., and Srinivasa K. G.
- Subjects
focused crawler, data extraction, natural language processing, topical crawler, textrank, distributed crawler, master-slave architecture, bioinformatics, Cybernetics, Q300-390
- Abstract
In the fast growing of digital technologies, crawlers and search engines face unpredictable challenges. Focused web-crawlers are essential for mining the boundless data available on the internet. Web-Crawlers face indeterminate latency problem due to differences in their response time. The proposed work attempts to optimize the designing and implementation of Focused Web-Crawlers using Master-Slave architecture for Bioinformatics web sources. Focused Crawlers ideally should crawl only relevant pages, but the relevance of the page can only be estimated after crawling the genomics pages. A solution for predicting the page relevance, which is based on Natural Language Processing, is proposed in the paper. The frequency of the keywords on the top ranked sentences of the page determines the relevance of the pages within genomics sources. The proposed solution uses a TextRank algorithm to rank the sentences, as well as ensuring the correct classification of Bioinformatics web page. Finally, the model is validated by being compared with a breadth first search web-crawler. The comparison shows significant reduction in run time for the same harvest rate.
- Published
- 2019
- Full Text
- View/download PDF
75. Naive Bayesian Automatic Classification of Railway Service Complaint Text Based on Eigenvalue Extraction
- Author
-
Lifeng Li and Wenxing Li
- Subjects
automatic classification, eigenvalue, naive Bayes, railway complaint text, TextRank, TF-IDF, Engineering (General). Civil engineering (General), TA1-2040
- Abstract
Railways have developed rapidly in China for several decades. The hardware of railways has already reached the world's leading level, but the level of service of these railways still has room for improvement. The railway management department receives a large number of passenger complaints every year and records them in text, which needs to be classified and analyzed. The text of railway complaints includes characteristics spanning wide business coverage, various events, serious colloquialisms, interference and useless information. When using the direct classification via traditional text categorization, the classification accuracy is low. The key to the automatic classification of such text lies in an eigenvalue extraction. The more accurate the eigenvalue extraction, the higher the accuracy of text classification. In this paper, the TF-IDF algorithm, TextRank algorithm and Word2vec algorithm are selected to extract text eigenvalues, and a railway complaint text classification method is constructed with a naive Bayesian classifier. The three types of eigenvalue extraction algorithms are compared. The TF-IDF algorithm, based on eigenvalue extraction, achieves the highest automatic text classification accuracy.
- Published
- 2019
- Full Text
- View/download PDF
76. A Graph-Based Ranking Model for Automatic Keyphrases Extraction from Arabic Documents
- Author
-
El BazzI, Mohamed Salim, Mammass, Driss, Zaki, Taher, Ennaji, Abdelatif, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, and Perner, Petra, editor
- Published
- 2017
- Full Text
- View/download PDF
77. 融合主题特征的文本自动摘要方法研究 [Research on an Automatic Text Summarization Method Incorporating Topic Features].
- Author
-
罗 芳, 汪竞航, 何道森, and 蒲秋梅
- Subjects
*THESIS statements (Rhetoric), *TEXT mining, *PROBABILITY theory, *MATRICES (Mathematics), *TEXT messages
- Abstract
Aiming at the traditional graph models for text summarization only focus on statistical features or shallow semantic features, and lack mining and utilization of deep topic semantic features, this paper proposed MDSR (multi-dimension summarization rank), an automatic text summarization method that combined topic feature. Specifically, this method adopted the LDA model to mine the semantic information of text topics and measured the impact of topic feature on a sentence by defining the importance of the topic. And it improved the construction mode of the probability transition matrix of graph model nodes by combining the topic feature with statistic features and inter-sentence similarity. Finally, it extracted and measured summarization according to the weight of sentence nodes. The results show that the ROUGE value evaluates by MDSR reaches the best when the weight ratio of topic feature, statistic feature and inter-sentence similarity is 3:4:3. The ROUGE-1, ROUGE-2, ROUGE-SU4 are 53.35%, 35.18% and 33.86%, which perform better than other comparisons. It shows that the text summarization method combining topic feature can effectively improve the accuracy of the summarization extraction. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
78. FuzzyFeatureRank. Bringing order into fuzzy classifiers through fuzzy expressions.
- Author
-
Carmona, Pablo and Castro, Juan Luis
- Subjects
*FEATURE selection, *WEIGHTED graphs, *DIRECTED graphs, *FUZZY sets, *FEATURE extraction
- Abstract
This work presents FuzzyFeatureRank, a new feature reduction method inspired on PageRank to reduce the dimensionality of the feature space in supervised classification problems. More precisely, as it relies on a weighted directed graph, it is ultimately inspired on TextRank, a PageRank based method that adds weights to the edges to express the strength of the connections between nodes. The method is based on dividing each original feature used to describe the data into a set of fuzzy predicates and then ranking all of them by their ability to differentiate among classes in the light of the training set. In order to do that, both the information gained by each predicate and their redundancy with other already selected predicates are taken into account. The fuzzy predicates with the best scores can then be used as a reduced input to construct fuzzy classifiers that consider only the preselected predicates to build the antecedents of the fuzzy rules. The novelty of the proposal relies on being an approach halfway between feature selection and feature extraction approaches, being able to improve the discrimination ability of the original features but preserving the interpretability of the new features in the sense that they are fuzzy expressions. The experimental results support the suitability of the proposal. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
79. From web to SMS: A text summarization of Wikipedia pages with character limitation
- Author
-
J.L.E.K Fendji and B.A.H. Aminatou
- Subjects
character-limitation summarization ,sms ,lsa ,textrank ,rouge ,tacos ,wikipedia ,Technology - Abstract
Wikipedia is one of the main sources of information on the Web. But the access to this content may be difficult especially when using a basic telephone without browsing capability and only a GSM network. The only means of text-based communication remains through SMS. Due to the limitation of the number of characters, a Wikipedia page cannot always be sent through SMS. This work raises the issue of text summarization with character limitation. To solve this issue, two extractive approaches have been combined: LSA and TextRank algorithms. Generated summaries have been evaluated using ROUGE metrics. Since ROUGE metrics do not consider character limitation, a new threshold named Threshold of Acceptability for Character-Oriented Summaries (TACOS) has been proposed to appreciate ROUGE metrics. The evaluation showed the relevance of the approach for pages of at most 2000 characters. The system has been tested using the SMS simulator of RapidSMS without a GSM gateway to simulate the deployment in a real environment. To the best of our knowledge, this is the first work tackling text summarization issue with character limitation.
- Published
- 2020
- Full Text
- View/download PDF
80. From web to SMS: A text summarization of Wikipedia pages with character limitation.
- Author
-
Fendji, J. L. E. K. and Aminatou, B. A. H.
- Subjects
TEXT messages ,WORLD Wide Web ,TELEPHONES ,COMMUNICATION ,INFORMATION & communication technologies - Abstract
Wikipedia is one of the main sources of information on the Web. But the access to this content may be difficult especially when using a basic telephone without browsing capability and only a GSM network. The only means of text-based communication remains through SMS. Due to the limitation of the number of characters, a Wikipedia page cannot always be sent through SMS. This work raises the issue of text summarization with character limitation. To solve this issue, two extractive approaches have been combined: LSA and TextRank algorithms. Generated summaries have been evaluated using ROUGE metrics. Since ROUGE metrics do not consider character limitation, a new threshold named Threshold of Acceptability for Character-Oriented Summaries (TACOS) has been proposed to appreciate ROUGE metrics. The evaluation showed the relevance of the approach for pages of at most 2000 characters. The system has been tested using the SMS simulator of RapidSMS without a GSM gateway to simulate the deployment in a real environment. To the best of our knowledge, this is the first work tackling text summarization issue with character limitation. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
81. Two Improved Topic Word Detection Algorithms.
- Author
-
Yu, Zehao
- Subjects
INFORMATION filtering ,ALGORITHMS ,MACHINE learning ,VOCABULARY - Abstract
Topic word extraction is the task of identifying single or multi-word expressions that represent the main topics of a document. This paper proposes two improved algorithms for extracting and discovering topic words: the Rapid Topic word Detection (RTD) algorithm and the CategoryTextRank (CTextRank) algorithm, which can effectively obtain information by extracting and filtering the topic words in a text. The algorithms overcome the shortcomings of traditional topic word discovery algorithms, which require deep linguistic knowledge or domain- or language-specific annotated corpora. Both algorithms can process short and long texts. Their biggest advantage is that they are unsupervised machine learning algorithms: they need no training and process text directly to obtain topic words. Accuracy, recall and F-measure are greatly improved with the two algorithms, and the results compare favorably with previously published results on the Inspec and SemEval datasets. The first algorithm, Rapid Topic word Detection, improves the metrics compared to PositionRank and TextRank; the second algorithm, CategoryTextRank, improves the metrics compared to TextRank, SingleRank and TF-IDF. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
82. Implementation of Automatic Text Summarization with TextRank Method in the Development of Al-Qur'an Vocabulary Encyclopedia.
- Author
-
Fakhrezi, Muhamad Fahmi, Bijaksana, Moch. Arif, and Huda, Arief Fatchul
- Subjects
VOCABULARY ,SEMANTICS - Abstract
Studying the Qur'an by understanding its vocabulary, and thereby its meaning, is not easy. A Qur'anic vocabulary encyclopedia that focuses on explaining the meaning of its words is therefore needed. The encyclopedia was developed using automatic text summarization with the TextRank method, because a single query returns many word meanings that must be summarized. The method starts by selecting documents relevant to the query, then summarizes the selected documents using TextRank to obtain a summary based on all the word meanings, and finally tests the system's summaries by comparing them with target summaries constructed manually by humans. Applying the TextRank method to automatic text summarization yields an average F-score of 0.6173. The summaries produced by TextRank contain no duplicates and, for some queries, are almost the same as the summaries created manually by humans. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
83. Cleveree: an artificially intelligent web service for Jacob voice chatbot.
- Author
-
Octavany and Arya Wicaksana
- Subjects
- *
WEB services , *CHATBOTS , *TECHNOLOGY Acceptance Model , *ARTIFICIAL intelligence , *BUILDING operation management - Abstract
Jacob is a voice chatbot that uses Wit.ai to get the context of a question and give an answer based on that context. However, Jacob has no variation in its answers and cannot recognize the context well if Wit.ai has not learned it previously. Thus, this paper proposes two artificial intelligence (AI) features built as a web service: paraphrasing of answers using the Stacked Residual LSTM model, and question summarization using cosine similarity with pre-trained Word2Vec and the TextRank algorithm. These two features are novel designs tailored to Jacob; the AI module is called Cleveree. Cleveree is evaluated using the technology acceptance model (TAM) method and interviews with Jacob admins. The results show that 79.17% of respondents strongly agree that both features are useful and 72.57% of respondents strongly agree that both features are easy to use. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
84. Chinese automatic text summarization based on weighted TextRank.
- Author
-
黄波 and 刘传才
- Subjects
- *
LEXICON , *CORPORA , *KEYWORDS , *RESEMBLANCE (Philosophy) , *VOCABULARY , *AUTOMOBILE defects - Abstract
Existing Chinese automatic text summarization methods rely mainly on the text's own information; their shortcoming is that they cannot make full use of the semantic relations between words. This paper therefore proposes an improved Chinese text summarization method that integrates information from external corpora into the TextRank algorithm in the form of word vectors. Combining TextRank with word2vec, it maps each word in a sentence into a high-dimensional lexical space to form a sentence vector. The method considers the similarity between sentences, the coverage of keywords and the similarity between each sentence and the title when calculating the influence weights among sentences, and selects the top-ranked sentences as the summary of the text. Experimental results show that the method achieves good results on the paper's data set and is more effective than the original method at automatically extracting Chinese summaries. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
85. Automatic document summarization based on statistical information
- Author
-
A. Mussina, S. Aubakirov, D. Ahmed-Zaki, and P. Trigo
- Subjects
summarization ,automatic extraction ,key-words ,n-gram ,textrank ,Mechanical engineering and machinery ,TJ1-1570 ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
Efficiently processing the large amount of data we encounter every day is a pressing problem. The object of study of this paper is automatic summarization algorithms. The main goal is to implement and compare different summarization techniques on corpora of news articles parsed from the web. The paper describes three summarization techniques based on the TextRank algorithm: General TextRank, BM25, and LongestCommonSubstring. The corpora are in Russian and Kazakh. The results of the summarization processes and their comparison are provided. Although the algorithms used are well known, they are evaluated on the defined corpora differently from the usual practice in summary evaluation: the proposed evaluation method uses a special dictionary of key-words extracted on the topic of the corpora. As the title implies, the article applies statistical information; the semantic and syntactic features of the text are not examined.
- Published
- 2017
86. Towards Evaluating the Impact of Anaphora Resolution on Text Summarisation from a Human Perspective
- Author
-
Bayomi, Mostafa, Levacher, Killian, Ghorab, M. Rami, Lavin, Peter, O’Connor, Alexander, Lawless, Séamus, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Métais, Elisabeth, editor, Meziane, Farid, editor, Saraee, Mohamad, editor, Sugumaran, Vijayan, editor, and Vadera, Sunil, editor
- Published
- 2016
- Full Text
- View/download PDF
87. Automated system for cataloguing and classifying electronic publications in a library
- Author
-
Ульяницька, Ксенія Олександрівна
- Subjects
cataloguing ,electronic publications ,classification ,TextRank ,025 .3/.4 [004] ,automated system ,naive Bayes classifier - Abstract
Structure and scope of the work. The explanatory note of the diploma project consists of 5 chapters and contains 16 figures, 12 tables, 1 appendix, 4 drawings and references to 15 literature sources. The aim of the diploma project is to create an automated cataloguing and classification system that will allow librarians to effectively manage the library's electronic publications, provide convenient access to these resources, and reduce the time and effort required to manage them. The general provisions chapter describes the library's business processes and activities, identifies the use cases and the corresponding functional requirements, and identifies the advantages and disadvantages of existing analogues of the system and their differences from the system under development. The chapter on information support is devoted to the description of input and output data of the system, and the design of the database. The chapter on mathematical support substantiates the choice of methods for solving the problem and describes the relevant algorithms. The software and hardware chapter describes the development tools for the automated system and the project architecture. The technological chapter identifies and describes the tests of the automated system and presents their results. The developed automated system for cataloguing and classifying electronic publications can be used as a prototype for integration into existing library management systems.
- Published
- 2023
88. Extractive Article Summarization Using Integrated TextRank and BM25+ Algorithm
- Author
-
Vaibhav Gulati, Deepika Kumar, Daniela Elena Popescu, and Jude D. Hemanth
- Subjects
Computer Networks and Communications ,Hardware and Architecture ,Control and Systems Engineering ,article ,text ,summarization ,extractive ,graph-based ,BM25 ,ROUGE ,TextRank ,Signal Processing ,Electrical and Electronic Engineering - Abstract
The quantity of textual data on the internet is growing exponentially, and it is a very tough task to obtain important and relevant information from it. An efficient and effective method is required that provides a concise summary of an article. This can be achieved by automatic text summarization. In this research, the authors suggest an efficient approach to text summarization in which an extractive summary is generated from an article. The methodology integrates the normalized similarity matrices of both BM25+ and the conventional TextRank algorithm, which led to improved results. A graph is generated by taking the sentences in the article as nodes and the similarity score between two sentences as the edge weight. The highest-ranked nodes are selected, and the summary is extracted. The proposed methodology was evaluated empirically and compared with baseline methods, viz. the conventional TextRank algorithm, term frequency–inverse document frequency (TF–IDF) cosine, longest common subsequence (LCS), and BM25+, taking precision, recall, and F1 score as evaluation criteria. ROUGE-1, ROUGE-2, and ROUGE-L scores were calculated for all the methods. The outcomes demonstrate that the proposed method can efficiently summarize any article irrespective of the category it belongs to.
- Published
- 2023
- Full Text
- View/download PDF
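The abstract of record 88 describes a pipeline: two sentence-similarity matrices are normalized and integrated, the sentences become graph nodes with similarity scores as edge weights, and the highest-ranked nodes form the summary. A minimal sketch of that pipeline follows. It assumes the two matrices are already computed, that "integrating" means element-wise addition after scaling each matrix by its largest off-diagonal value, and that ranking uses the standard damped PageRank iteration; the helper names are hypothetical and none of these details are confirmed by the abstract.

```python
def normalize(matrix):
    """Scale a similarity matrix by its largest off-diagonal entry; zero the diagonal."""
    n = len(matrix)
    peak = max(
        (matrix[i][j] for i in range(n) for j in range(n) if i != j),
        default=0.0,
    )
    if peak <= 0:
        return [[0.0] * n for _ in range(n)]
    return [[0.0 if i == j else matrix[i][j] / peak for j in range(n)] for i in range(n)]


def textrank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over an edge-weight matrix."""
    n = len(weights)
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, sim_a, sim_b, k=2):
    """Rank sentences on the sum of two normalized matrices; return the top k in document order."""
    a, b = normalize(sim_a), normalize(sim_b)
    n = len(sentences)
    combined = [[a[i][j] + b[i][j] for j in range(n)] for i in range(n)]
    scores = textrank(combined)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Toy input: sim_a stands in for a BM25+-style matrix, sim_b for a cosine-style one.
sim_a = [[0, 3, 1], [3, 0, 2], [1, 2, 0]]
sim_b = [[0, 0.9, 0.1], [0.9, 0, 0.5], [0.1, 0.5, 0]]
summary = summarize(["s0", "s1", "s2"], sim_a, sim_b, k=2)
```

Scaling each matrix before adding keeps either similarity measure from dominating purely because of its numeric range, which is presumably why the abstract stresses normalization.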
89. A Framework for Generating Extractive Summary from Multiple Malayalam Documents
- Author
-
K. Manju, S. David Peter, and Sumam Mary Idicula
- Subjects
Malayalam language ,extractive multidocument summarization ,NLP ,sentence encoding ,TextRank ,maximum marginal relevance ,Information technology ,T58.5-58.64 - Abstract
Automatic extractive text summarization retrieves a subset of data that represents most notable sentences in the entire document. In the era of digital explosion, which is mostly unstructured textual data, there is a demand for users to understand the huge amount of text in a short time; this demands the need for an automatic text summarizer. From summaries, the users get the idea of the entire content of the document and can decide whether to read the entire document or not. This work mainly focuses on generating a summary from multiple news documents. In this case, the summary helps to reduce the redundant news from the different newspapers. A multi-document summary is more challenging than a single-document summary since it has to solve the problem of overlapping information among sentences from different documents. Extractive text summarization yields the sensitive part of the document by neglecting the irrelevant and redundant sentences. In this paper, we propose a framework for extracting a summary from multiple documents in the Malayalam Language. Also, since the multi-document summarization data set is sparse, methods based on deep learning are difficult to apply. The proposed work discusses the performance of existing standard algorithms in multi-document summarization of the Malayalam Language. We propose a sentence extraction algorithm that selects the top ranked sentences with maximum diversity. The system is found to perform well in terms of precision, recall, and F-measure on multiple input documents.
- Published
- 2021
- Full Text
- View/download PDF
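Record 89 lists maximum marginal relevance (MMR) among its subjects and describes selecting top-ranked sentences with maximum diversity. The sketch below shows the generic greedy MMR procedure, not the paper's implementation: the function name `mmr_select`, the trade-off parameter `lam`, and the toy scores are assumptions for illustration.

```python
def mmr_select(relevance, similarity, k, lam=0.7):
    """Greedy maximum marginal relevance selection.

    relevance[i]: ranking score of sentence i (for example, a TextRank score).
    similarity[i][j]: similarity between sentences i and j.
    lam: trade-off between relevance (lam) and diversity (1 - lam).
    Returns the indices of the selected sentences in selection order.
    """
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:
        def marginal(i):
            # Penalize a candidate by its closest match among sentences already chosen.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=marginal)
        selected.append(best)
        candidates.remove(best)
    return selected


relevance = [0.9, 0.85, 0.5]
similarity = [[1.0, 0.95, 0.1], [0.95, 1.0, 0.2], [0.1, 0.2, 1.0]]
picked = mmr_select(relevance, similarity, k=2, lam=0.5)  # → [0, 2]
```

In the toy data, sentences 0 and 1 are near-duplicates, so with `lam=0.5` the second pick is the less relevant but more novel sentence 2. This is the behavior needed when the same news appears in several source documents.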
90. Improved Automatic Keyword Extraction Based on TextRank Using Domain Knowledge
- Author
-
Li, Guangyi, Wang, Houfeng, Junqueira Barbosa, Simone Diniz, Series editor, Chen, Phoebe, Series editor, Cuzzocrea, Alfredo, Series editor, Du, Xiaoyong, Series editor, Filipe, Joaquim, Series editor, Kara, Orhun, Series editor, Kotenko, Igor, Series editor, Sivalingam, Krishna M., Series editor, Ślęzak, Dominik, Series editor, Washio, Takashi, Series editor, Yang, Xiaokang, Series editor, Zong, Chengqing, editor, Nie, Jian-Yun, editor, Zhao, Dongyan, editor, and Feng, Yansong, editor
- Published
- 2014
- Full Text
- View/download PDF
91. A Hybrid Method of Sentiment Key Sentence Identification Using Lexical Semantics and Syntactic Dependencies
- Author
-
Feng, Chong, Liao, Chun, Liu, Zhirun, Huang, Heyan, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Han, Weihong, editor, Huang, Zi, editor, Hu, Changjun, editor, Zhang, Hongli, editor, and Guo, Li, editor
- Published
- 2014
- Full Text
- View/download PDF
92. Key word extraction for short text via word2vec, doc2vec, and textrank.
- Author
-
Jun LI, Guimin HUANG, Chunli FAN, Zhenglin SUN, and Hongtao ZHU
- Subjects
- *
KEYWORDS , *SOCIAL values , *SOCIAL media - Abstract
The rapid development of social media encourages people to share their opinions and feelings on the Internet. Every day, a large number of short text comments are generated through Twitter, microblogging, WeChat, etc., and there is high commercial and social value in extracting useful information from these short texts. At present, most studies have focused on extracting text key words. For example, the LDA topic model has good performance with long texts, but it loses effectiveness with short texts because of the noise and sparsity problems. In this paper, we attempt to use Word2Vec and Doc2Vec to improve short-text key word extraction. We first added the method of the collaborative training of word vectors and paragraph vectors and then used the TextRank model's clustering nodes. We adjusted the weights of the key words that were generated by computing the jump probability between nodes and then obtained the node-weighted score, and eventually sorted the generated key words. The experimental results show that the improved method has good performance on the datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
93. Keyword extraction method based on word vectors and TextRank.
- Author
-
周锦章 and 崔晓晖
- Subjects
- *
TRANSFER matrix , *KEYWORDS , *SEMANTICS , *ALGORITHMS , *PROBABILITY theory , *CORPORA - Abstract
This paper studies the influence of lexical semantic differences on the TextRank algorithm and presents a keyword extraction method based on word vectors and TextRank. First, it uses FastText to learn word vectors from the document corpus. Then, based on the idea of implicit topic distribution, it uses differences in lexical semantics to build a probability transition matrix for TextRank. Finally, it iteratively computes the lexical graph model and extracts keywords. Experimental results show that the extraction performance of this method is significantly better than that of the traditional method. The results also show that word vectors can improve the performance of the TextRank algorithm simply and effectively. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
94. An optimized automatic summarization algorithm based on TextRank.
- Author
-
李娜娜, 刘培玉, 刘文锋, and 刘伟童
- Subjects
- *
REDUNDANCY in engineering , *CRIMINAL sentencing , *PARAGRAPHS , *RESEMBLANCE (Philosophy) , *HAND , *MULTICASTING (Computer networks) - Abstract
When summarizing Chinese texts, the traditional TextRank algorithm considers only the similarity between nodes and neglects other important information in the text. Building on existing research, this paper applies the TextRank algorithm to single Chinese documents. In addition to the similarity between sentences, it combines TextRank with the overall structural information of the text and the contextual information of sentences, such as the physical position of a sentence or paragraph in the document, feature sentences, core sentences and other factors that may increase a sentence's weight, and uses all of these to generate the candidate summary sentence group. It then removes highly similar sentences from the candidate group through redundancy processing. Finally, experiments show that the algorithm improves the accuracy of the generated summaries, demonstrating its effectiveness. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
95. A Novel Framework for Automatic Chinese Question Generation Based on Multi-Feature Neural Network Model.
- Author
-
Hai-Tao Zheng, Jinxin Han, Jinyuan Chen, and Sangaiah, Arun Kumar
- Abstract
Automatic question generation from a text or paragraph is a highly challenging task that attracts broad attention in natural language processing. Because of verbose texts and fragile ranking methods, the quality of the top generated questions is poor. In this paper, we present a novel framework, Automatic Chinese Question Generation (ACQG), to generate questions from a text or paragraph. In ACQG, we use an adapted TextRank to extract key sentences and a template-based method to construct questions from the key sentences. A multi-feature neural network model is then built for ranking to obtain the top questions. The automatic evaluation result reveals that the proposed framework outperforms the state-of-the-art systems in terms of perplexity. In human evaluation, questions generated by ACQG receive a higher score. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
96. Knowledge aggregation of the WeChat Official Accounts Platform based on tag clustering
- Author
-
Cheng, Zixuan, Zhang, Xiangxian, Lu, Heng, and Guo, Shunli
- Published
- 2021
- Full Text
- View/download PDF
97. Unsupervised Extraction of Keywords from News Archives
- Author
-
Palomino, Marco A., Wuytack, Tom, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, and Vetulani, Zygmunt, editor
- Published
- 2011
- Full Text
- View/download PDF
98. Graphs, Computation, and Language
- Author
-
Dmitry Ustalov
- Subjects
PageRank ,Semantic Networks ,Classification Evaluation ,Co-occurrence Graphs ,ComputingMilieux_COMPUTERSANDEDUCATION ,Centrality ,Knowledge Graphs ,Evaluation ,reCAPTCHA ,Clustering Evaluation ,Ontology ,Spectral Clustering ,TransE ,Bradley-Terry ,Resource Description Framework ,Graph Convolutional Network ,OntoLearn ,Dawid-Skene ,Crowdsourcing ,Graph Attention Network ,Games with a Purpose ,Wisdom of the Crowds ,clustering ,node2vec ,Quality Control ,Statistical Testing ,Graph Clustering ,MaxMax ,Inter-Rater Agreement ,Chinese Whispers ,Language Graphs ,Sparse Matrix ,Representation Learning ,Microtasks ,Markov Clustering ,RDF ,Laplacian Eigenmaps ,Tutorial ,Translating Embeddings ,Word2Vec ,TextRank ,GraphSAGE ,Natural Language Processing ,Semantic Web ,Taxonomy ,Network Science ,lecture ,Louvain ,Quality Assessment ,Ranking Evaluation ,Poincare Embeddings ,language resources ,Hearst Patterns ,Graph Theory ,Linked Data ,Matrix Representations ,Graph Embeddings ,Watset ,DeepWalk ,Power Method ,Wikipedia - Abstract
Employing the properties of linguistic networks allows discovering structure and making predictions. This course seeks answers to three questions: (1) how to express the linguistic phenomena as graphs, (2) how to gain knowledge based on them, and (3) how to assess the quality of this knowledge. We will start with traditional graph-based Natural Language Processing (NLP) methods like TextRank and Markov Clustering and finish with such contemporary Machine Learning techniques as DeepWalk and Graph Convolutional Networks. As the growing interest in NLP methods urges their meaningful evaluation, we pay special attention to quality assessment and human judgements. The course has five lectures on Language Graphs, Graph Clustering, Graph Embeddings, Knowledge Graphs, and Evaluation. They elaborately go through the essential algorithms step-by-step, discuss case studies, and suggest insightful references and datasets. The target audience is undergraduate and graduate students, data analysts, and interdisciplinary researchers (but it is not limited to them). The course was held in person in August 2022 at the 33rd European Summer School in Logic, Language and Information (ESSLLI 2022) in Galway, Ireland: https://2022.esslli.eu/courses-workshops-accepted/week-1-and-2-schedule.html
In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). pp. 4171\u20134186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/N19-1423", "Dijkstra, E.W.: A note on two problems in connexion with graphs. Numerische Mathematik. 1, 269\u2013271 (1959). https://doi.org/10.1007/BF01386390", "Dong, X., Gabrilovich, E., Heitz, G., Horn, W., Lao, N., Murphy, K., Strohmann, T., Sun, S., Zhang, W.: Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 601\u2013610. Association for Computing Machinery, New York, NY, USA (2014). https://doi.org/10.1145/2623330.2623623", "van Dongen, S.: Graph Clustering by Flow Simulation, (2000)", "Dorogovtsev, S.N., Mendes, J.F.F.: Language as an evolving word web. Proceedings of the Royal Society of London B: Biological Sciences. 268, 2603\u20132606 (2001). https://doi.org/10.1098/rspb.2001.1824", "Dorogovtsev, S.N., Mendes, J.F.F.: The Nature of Complex Networks. Oxford University Press, Oxford, UK (2022)", "Dorow, B., Widdows, D.: Discovering Corpus-Specific Word Senses. In: Proceedings of the Tenth Conference on European Chapter of the Association for Computational Linguistics - Volume 2. pp. 79\u201382. Association for Computational Linguistics, Budapest, Hungary (2003). https://doi.org/10.3115/1067737.1067753", "Dror, R., Baumer, G., Shlomov, S., Reichart, R.: The Hitchhiker's Guide to Testing Statistical Significance in Natural Language Processing. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1383\u20131392. Association for Computational Linguistics, Melbourne, VIC, Australia (2018). 
https://doi.org/10.18653/v1/P18-1128", "Estell\u00e9s-Arolas, E., Gonz\u00e1lez-Ladr\u00f3n-de-Guevara, F.: Towards an integrated crowdsourcing definition. Journal of Information Science. 38, 189\u2013200 (2012). https://doi.org/10.1177/0165551512437638", "Faralli, S., Panchenko, A., Biemann, C., Ponzetto, S.P.: Linked Disambiguated Distributional Semantic Networks. In: The Semantic Web \u2013 ISWC 2016, 15th International Semantic Web Conference, Kobe, Japan, October 17\u201321, 2016, Proceedings, Part II. pp. 56\u201364. Springer International Publishing, Cham, Switzerland (2016). https://doi.org/10.1007/978-3-319-46547-0_7", "Faruqui, M., Dodge, J., Jauhar, S.K., Dyer, C., Hovy, E., Smith, N.A.: Retrofitting Word Vectors to Semantic Lexicons. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pp. 1606\u20131615. Association for Computational Linguistics, Denver, CO, USA (2015). https://doi.org/10.3115/v1/N15-1184", "Fellbaum, C.: WordNet: An Electronic Database. MIT Press, Massachusetts, MA, USA (1998). https://doi.org/10.7551/mitpress/7287.001.0001", "Fey, M., Lenssen, J.E.: Fast Graph Representation Learning with PyTorch Geometric. In: ICLR 2019 Workshop on Representation Learning on Graphs and Manifolds, New Orleans, LA, USA (2019)", "Fillmore, C.J.: Frame Semantics. In: Linguistics in the Morning Calm. pp. 111\u2013137. Hanshin Publishing Co., Seoul, South Korea (1982)", "Florescu, C., Caragea, C.: PositionRank: An Unsupervised Approach to Keyphrase Extraction from Scholarly Documents. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1105\u20131115. Association for Computational Linguistics, Vancouver, BC, Canada (2017). https://doi.org/10.18653/v1/P17-1102", "Fortunato, S.: Community detection in graphs. Physics Reports. 486, 75\u2013174 (2010). 
https://doi.org/10.1016/j.physrep.2009.11.002", "Fowlkes, E.B., Mallows, C.L.: A Method for Comparing Two Hierarchical Clusterings. Journal of the American Statistical Association. 78, 553\u2013569 (1983). https://doi.org/10.1080/01621459.1983.10478008", "Freeman, L.C.: A Set of Measures of Centrality Based on Betweenness. Sociometry. 40, 35\u201341 (1977). https://doi.org/10.2307/3033543", "Frey, B.J., Dueck, D.: Clustering by Passing Messages Between Data Points. Science. 315, 972\u2013976 (2007). https://doi.org/10.1126/science.1136800", "Fu, R., Guo, J., Qin, B., Che, W., Wang, H., Liu, T.: Learning Semantic Hierarchies via Word Embeddings. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics Volume 1: Long Papers. pp. 1199\u20131209. Association for Computational Linguistics, Baltimore, MD, USA (2014). https://doi.org/10.3115/v1/P14-1113", "Gallardo, P.F.: Google's secret and Linear Algebra. EMS Newsletter. 63, 10\u201315 (2007)", "Goldhahn, D., Eckart, T., Quasthoff, U.: Building Large Monolingual Dictionaries at the Leipzig Corpora Collection: From 100 to 200 Languages. In: Proceedings of the Eight International Conference on Language Resources and Evaluation. pp. 759\u2013765. European Language Resources Association (ELRA), Istanbul, Turkey (2012)", "Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: GraphX: Graph Processing in a Distributed Dataflow Framework. In: 11th USENIX Symposium on Operating Systems Design and Implementation. pp. 599\u2013613. USENIX Association, Broomfield, CO, USA (2014)", "Good, B.H., de Montjoye, Y.-A., Clauset, A.: Performance of modularity maximization in practical contexts. Physical Review E. 81, 046106 (2010). https://doi.org/10.1103/PhysRevE.81.046106", "Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge, MA, USA (2016)", "Gorodkin, J.: Comparing two K-category assignments by a K-category correlation coefficient. 
Computational Biology and Chemistry. 28, 367\u2013374 (2004). https://doi.org/10.1016/j.compbiolchem.2004.09.006", "Goyal, P., Ferrara, E.: Graph embedding techniques, applications, and performance: A survey. Knowledge-Based Systems. 151, 78\u201394 (2018). https://doi.org/10.1016/j.knosys.2018.03.022", "Grover, A., Leskovec, J.: node2vec: Scalable Feature Learning for Networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 855\u2013864. ACM, San Francisco, CA, USA (2016). https://doi.org/10.1145/2939672.2939754", "G\u00f6sgens, M., Tikhonov, A., Prokhorenkova, L.: Systematic Analysis of Cluster Similarity Indices: How to Validate Validation Measures. In: Proceedings of the 38th International Conference on Machine Learning. pp. 3799\u20133808. PMLR, Online (2021)", "G\u00f6sgens, M., Zhiyanov, A., Tikhonov, A., Prokhorenkova, L.: Good Classification Measures and How to Find Them. In: Advances in Neural Information Processing Systems 34. pp. 17136\u201317147. Curran Associates, Inc., Online (2021)", "Hagberg, A.A., Schult, D.A., Swart, P.J.: Exploring Network Structure, Dynamics, and Function using NetworkX. In: Proceedings of the 7th Python in Science Conference. pp. 11\u201315, Pasadena, CA, USA (2008)", "Hamilton, W.L., Ying, R., Leskovec, J.: Inductive Representation Learning on Large Graphs. In: Advances in Neural Information Processing Systems 30. pp. 1024\u20131034. Curran Associates, Inc., Vancouver, BC, Canada (2017)", "Han, X., Cao, S., Lv, X., Lin, Y., Liu, Z., Sun, M., Li, J.: OpenKE: An Open Toolkit for Knowledge Embedding. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. pp. 139\u2013144. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/D18-2024", "Hansen, P.C.: The truncatedSVD as a method for regularization. BIT Numerical Mathematics. 27, 534\u2013553 (1987). 
https://doi.org/10.1007/BF01937276", "Hartigan, J.A., Wong, M.A.: Algorithm AS 136: A K-Means Clustering Algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics). 28, 100\u2013108 (1979). https://doi.org/10.2307/2346830", "Hearst, M.A.: Automatic Acquisition of Hyponyms from Large Text Corpora. In: Proceedings of the 14th Conference on Computational Linguistics - Volume 2. pp. 539\u2013545. Association for Computational Linguistics, Nantes, France (1992). https://doi.org/10.3115/992133.992154", "Heo, Y.-J., Kim, E.-S., Choi, W.S., Zhang, B.-T.: Hypergraph Transformer: Weakly-Supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 373\u2013390. Association for Computational Linguistics, Dublin, Ireland (2022). https://doi.org/10.18653/v1/2022.acl-long.29", "Hitzler, P.: A Review of the Semantic Web Field. Communications of the ACM. 64, 76\u201383 (2021). https://doi.org/10.1145/3397512", "Hochreiter, S., Schmidhuber, J.: Long Short-Term Memory. Neural Computation. 9, 1735\u20131780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735", "Hogan, A., Blomqvist, E., Cochez, M., D'amato, C., Melo, G.D., Gutierrez, C., Kirrane, S., Gayo, J.E.L., Navigli, R., Neumaier, S., Ngomo, A.-C.N., Polleres, A., Rashid, S.M., Rula, A., Schmelzeisen, L., Sequeda, J., Staab, S., Zimmermann, A.: Knowledge Graphs. ACM Computing Surveys. 54, 1\u201337 (2021). https://doi.org/10.1145/3447772", "Hope, D., Keller, B.: MaxMax: A Graph-Based Soft Clustering Algorithm Applied to Word Sense Induction. In: Computational Linguistics and Intelligent Text Processing, 14th International Conference, CICLing 2013, Samos, Greece, March 24-30, 2013, Proceedings, Part I. pp. 368\u2013381. Springer Berlin Heidelberg, Berlin; Heidelberg, Germany (2013). https://doi.org/10.1007/978-3
- Published
- 2022
- Full Text
- View/download PDF
99. Cultural Heritage Storytelling, Engagement and Management in the Era of Big Data and the Semantic Web.
- Author
-
Dimoulas, Charalampos
- Subjects
Film, TV & radio ,3D content ,3D modeling ,3D reconstruction ,BERT ,Évora ,Greek NLP ,Greek literature ,IEEE 830 standard ,Instagram ,Katharevousa ,POCITYF ,TextRank ,Transformers ,Twitter ,UNESCO ,audience engagement ,audiovisual heritage ,authoring tools ,big data ,biocultural heritage ,content crowdsourcing ,cultural heritage ,data center ,data-driven storytelling ,deep neural networks ,digital marketing ,digital narrative ,digital storytelling ,distant supervision ,eco-friendly ,energy transition ,environmental communication ,event detection ,evolution analytics ,green culture ,green hosting ,green websites ,heritage communication ,heritage management ,intangible heritage ,interactive documentary ,journalism ,literary fiction ,marine heritage ,marine protected areas of outstanding universal value ,media users' engagement ,metadata extraction ,multimedia tools ,news ,relation extraction ,requirements engineering ,semantic analysis ,semantic audio ,semantic indexing ,smart cities ,social media ,software sustainability ,soundscapes ,spectral clustering ,static analysis ,sustainability ,text classification
- Abstract
Summary: This Special Issue was launched to shed further light on important cultural heritage (CH) areas, inviting researchers to submit original and featured multidisciplinary research on heritage crowdsourcing, documentation, management, authoring, storytelling, and dissemination. Audience engagement is considered very important at both ends of the CH production-consumption chain (i.e., the push and pull ends). At the same time, sustainability factors are placed at the center of the envisioned analysis. A total of eleven (11) contributions were published in this Special Issue, covering various aspects of contemporary heritage strategies in today's ubiquitous society. The published papers relate to, but are not limited to, the following multidisciplinary topics: digital storytelling for cultural heritage; audience engagement in cultural heritage; sustainability impact indicators of cultural heritage; cultural heritage digitization, organization, and management; collaborative cultural heritage archiving, dissemination, and management; cultural heritage communication and education for sustainable development; semantic services of cultural heritage; big data of cultural heritage; smart systems for historical cities (smart cities); smart systems for cultural heritage sustainability.
100. A Single-Text Keyword Extraction Algorithm Based on TextRank (一种基于 TextRank 的单文本关键字提取算法).
- Author
-
柳林青, 余瀚, 费宁, and 陈春玲
- Abstract
As a classical algorithm for keyword extraction and automatic summarization, TextRank treats a text as a set of terms and seeks latent semantic relationships between them by iteratively computing term weights in a node graph. Building on TextRank's node graph model, the authors combine the node graph with a Markov state transition model, weight the edges between nodes with conditional probabilities, and propose a new node graph model with a corresponding algorithm, TextRank_Revised (TR-R). Experiments on labeled and unlabeled samples show that, without increasing time complexity, the new algorithm produces a keyword ranking for a single text that is closer to manual annotation than the ranking produced by the original algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
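Illustrative note on record 100: the abstract describes weighting the edges of the TextRank node graph with conditional probabilities and treating the graph as a Markov state transition model, but it does not give the exact formulation. The Python sketch below is therefore only one plausible reading, not the authors' TR-R algorithm. The function name `conditional_textrank`, the sliding-window co-occurrence counts, the estimate P(b | a) = cooc(a, b) / Σ cooc(a, ·), and the damping factor of 0.85 are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def conditional_textrank(tokens, window=2, damping=0.85, iterations=50):
    """Rank terms with a TextRank-style walk over conditional-probability edges.

    Assumption (not taken from the paper): the edge a -> b is weighted by
    P(b | a) = cooc(a, b) / sum_c cooc(a, c), estimated from co-occurrence
    counts inside a sliding window over the token sequence.
    """
    # Count symmetric co-occurrences of distinct terms within the window.
    cooc = defaultdict(Counter)
    for i, a in enumerate(tokens):
        for b in tokens[i + 1 : i + 1 + window]:
            if a != b:
                cooc[a][b] += 1
                cooc[b][a] += 1

    vocab = sorted(set(tokens))
    # Row-normalise the counts into Markov transition probabilities P(b | a).
    trans = {
        a: {b: n / sum(cooc[a].values()) for b, n in cooc[a].items()}
        for a in vocab
        if cooc[a]
    }

    # Power iteration of the damped random walk (the PageRank update).
    # Terms without any neighbour simply keep the teleport share.
    score = {w: 1.0 / len(vocab) for w in vocab}
    for _ in range(iterations):
        new = {w: (1.0 - damping) / len(vocab) for w in vocab}
        for a, row in trans.items():
            for b, p in row.items():
                new[b] += damping * score[a] * p
        score = new

    # Highest-scoring terms first.
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)
```

A hypothetical call would be `conditional_textrank("graph ranking graph keyword graph model keyword extraction".split())`, which returns `(term, score)` pairs sorted by score. Row-normalising the counts makes each node's outgoing weights sum to one, which is what lets the graph be read as a Markov chain; a real system would add stop-word filtering and part-of-speech selection before building the graph.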