Descriptor: "textrank" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"textrank"' showing total 189 results

51. Chinese News Keyword Extraction Algorithm Based on TextRank and Word-Sentence Collaboration

Author: Guo, Qing, Xiong, Ao, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Liu, Qi, editor, Mısır, Mustafa, editor, Wang, Xin, editor, and Liu, Weiping, editor
Published: 2020
Full Text: View/download PDF

52. Uyghur–Kazakh–Kirghiz Text Keyword Extraction Based on Morpheme Segmentation

Author: Sardar Parhat, Mutallip Sattar, Askar Hamdulla, and Abdurahman Kadir
Subjects: Uyghur–Kazakh–Kirghiz, keyword extraction, morpheme segmentation, stem extraction, stem vector, TextRank, Information technology, T58.5-58.64
Abstract: In this study, based on a morpheme segmentation framework, we researched a text keyword extraction method for Uyghur, Kazakh and Kirghiz languages, which have similar grammatical and lexical structures. In these languages, affixes and a stem are joined together to form a word. A stem is a word particle with a notional meaning, while the affixes perform grammatical functions. Because of these derivative properties, the vocabularies used for these languages are huge. Therefore, pre-processing is a necessary step in NLP tasks for Uyghur, Kazakh and Kirghiz. Morpheme segmentation enabled us to remove the suffixes as the auxiliary unit while retaining the meaningful stem and it reduced the dimension of the feature space present in the keyword extraction task for Uyghur, Kazakh and Kirghiz texts. We transformed the morpheme segmentation task into the problem of labeling the morpheme sequences, and we used the Bi-LSTM network to bidirectionally obtain the position feature information of character sequences. We applied CRF to effectively learn the information of the preceding and following label sequences to build a highly accurate Bi-LSTM_CRF morpheme segmentation model, and we prepared morpheme-based experimental text sets by using this model. Subsequently, we used the stem vectors’ similarity to modify the TextRank algorithm, subsequent to the training of the stem embedding vector using the Doc2vec algorithm, and then we performed a text keyword extraction experiment. In this experiment, the highest F1 scores of 43.8%, 44% and 43.9% were obtained for three datasets. The experimental results show that the morpheme-based approach provides much better results than the word-based approach, which shows the stem vector similarity weighting is an efficient method for the text keyword extraction task, thus proving the efficiency of morpheme sequence for morphologically derivative languages.
Published: 2023
Full Text: View/download PDF

53. Improving TextRank Algorithm for Automatic Keyword Extraction with Tolerance Rough Set.

Author: Qiu, Dong and Zheng, Qin
Subjects: ROUGH sets, ALGORITHMS, FUZZY sets, FUZZY graphs
Abstract: Aiming at the shortcomings of the TextRank method (TM) which only considers the co-occurrence between words and the incipient word importance when extracting keywords, this paper proposes a tolerance rough set (TRS)-based unsupervised keyword extraction method. Generally, how to score the words in a document has a significant influence on the word graph modeling. In this paper, we improve TM in two aspects with TRS theory that is used to mine vocabulary, semantics, grammar and other information in the corpus. First, the degree of words belonging to each document is calculated to form a fuzzy membership matrix, which helps to characterize the incipient word importance. Second, the fuzzy membership of words to each word tolerance class is calculated to form a semantic correlation matrix, which contributes to optimize the transition probability of all graph edges. We apply the proposed methods to the clustering tasks of two datasets, outperforming the strong baselines. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

54. Automatic Summarizer for Arabic-Language Texts Using the TextRank Algorithm (Peringkas Otomatis Teks Berbahasa Arab Menggunakan Algoritma TextRank)

Author: Muhammad Fikri Hidayattullah and Ardhiyan Azizi
Subjects: automatic summarizer, arabic, textrank, artificial intelligence, machine learning, Electronic computers. Computer science, QA75.5-76.95
Abstract: The amount of data in the form of text documents scattered across the internet keeps growing, and extracting the information from each of these documents takes a very long time. For this reason, several researchers have developed automatic text summarizers, so that the important information in a document can be obtained faster. Research that focuses on automatic summarization of Arabic texts is very rare, even though there are more than 300 million Arabic speakers in the world and Arabic is an official language of the United Nations. Therefore, this study develops a model that performs text summarization automatically using the TextRank algorithm. Test results using Q&A Evaluation are very good: the suitability of the summary with the original text is 90%, the suitability of the summary with Arabic grammar is 91.43%, the suitability of the summary results is 90%, the ease of understanding the summary is 90%, and the usefulness of the developed model is 91.43%.
Published: 2021
Full Text: View/download PDF
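
Note on record 54: the abstract does not describe its implementation, so the following is only a minimal sketch of TextRank-style sentence ranking. It uses the word-overlap similarity from the original TextRank formulation (shared words divided by the sum of the log sentence lengths). The function names, the regular-expression tokenizer, the damping value of 0.85 and the iteration count are all assumptions made for illustration; real Arabic text would need a proper tokenizer and stemmer.

```python
import math
import re


def sentence_similarity(a, b):
    """Shared words divided by the sum of the log sentence lengths."""
    denominator = math.log(len(a)) + math.log(len(b)) if a and b else 0.0
    if denominator == 0.0:  # empty or one-word sentences: log(1) + log(1) = 0
        return 0.0
    return len(set(a) & set(b)) / denominator


def summarize(text, n_sentences=2, damping=0.85, iterations=30):
    """Return the top-ranked sentences in their original order."""
    # Naive sentence split on ., !, ? and the Arabic question mark (assumption).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?؟])\s+", text) if s.strip()]
    words = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(sentences)

    # Weighted, undirected sentence graph stored as a dense matrix.
    weights = [
        [sentence_similarity(words[i], words[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]

    # Weighted PageRank update, repeated a fixed number of times.
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]

    top = sorted(range(n), key=lambda i: -scores[i])[:n_sentences]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(text, n_sentences=3)` returns up to three sentences. Sentences that share no words with any other sentence keep only the base score `1 - damping` and are ranked last.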

55. A Template Approach for Summarizing Restaurant Reviews

Author: Yenliang Chen, Chialing Chang, and Jeryeu Gan
Subjects: Restaurant reviews, sentiment analysis, summarization, template, TextRank, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: In the era of rapid development of social networks, user reviews of restaurant review websites have grown rapidly. In order to allow users to quickly grasp the key points of review information on review sites, this paper provides an abstractive multi-text summary method that can automatically generate template-based review summaries based on predefined topics and sentiments. In particular, for each predefined topic and each type of sentiment (positive or negative), this study uses the TextRank algorithm to find the most representative sentences to form a summary. This method allows users to quickly grasp the positive and negative opinions of each important aspect of the restaurant. The previous research on generating abstracts from reviews either did not generate abstracts based on topics, or they were based on topics generated by random models. However, the latter method cannot guarantee that the topics generated by the random model are really the topics that the user needs. For a restaurant review, some topics are indispensable. In order to ensure that abstracts can be generated for these essential topics, our method predefines the topics that must be generated, and then generates abstracts for these topics. In the evaluation, this study compared the template method with the Refresh and Gensim systems based on criteria such as informativeness, clarity, usefulness and likes. The results show that the method proposed in this paper is superior to the other two summary methods.
Published: 2021
Full Text: View/download PDF

56. Chinese News Keyword Extraction Algorithm Based on TextRank and Topic Model

Author: Xiong, Ao, Guo, Qing, Akan, Ozgur, Editorial Board Member, Bellavista, Paolo, Editorial Board Member, Cao, Jiannong, Editorial Board Member, Coulson, Geoffrey, Editorial Board Member, Dressler, Falko, Editorial Board Member, Ferrari, Domenico, Editorial Board Member, Gerla, Mario, Editorial Board Member, Kobayashi, Hisashi, Editorial Board Member, Palazzo, Sergio, Editorial Board Member, Sahni, Sartaj, Editorial Board Member, Shen, Xuemin (Sherman), Editorial Board Member, Stan, Mircea, Editorial Board Member, Xiaohua, Jia, Editorial Board Member, Zomaya, Albert Y., Editorial Board Member, Han, Shuai, editor, Ye, Liang, editor, and Meng, Weixiao, editor
Published: 2019
Full Text: View/download PDF

57. Case Facts Analysis Method Based on Deep Learning

Author: Xu, Zihuan, He, Tieke, Lian, Hao, Wan, Jiabing, Wang, Hui, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Ni, Weiwei, editor, Wang, Xin, editor, Song, Wei, editor, and Li, Yukun, editor
Published: 2019
Full Text: View/download PDF

58. Unsupervised Automatic Keyphrases Extraction Algorithms : Experimentations on Paintings

Author: Gagliardi, Isabella, Artese, Maria Teresa, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Debruyne, Christophe, editor, Panetto, Hervé, editor, Guédria, Wided, editor, Bollen, Peter, editor, Ciuciu, Ioana, editor, and Meersman, Robert, editor
Published: 2019
Full Text: View/download PDF

59. Automatic back transliteration of Romanized Bengali (Banglish) to Bengali

Author: Shibli, G. M. Shahariar, Shawon, Md. Tanvir Rouf, Nibir, Anik Hassan, Miandad, Md. Zabed, and Mandal, Nibir Chandra
Published: 2023
Full Text: View/download PDF

60. A Comprehensive Analysis of Indian Legal Documents Summarization Techniques

Author: Sharma, Saloni, Srivastava, Surabhi, Verma, Pradeepika, Verma, Anshul, and Chaurasia, Sachchida Nand
Published: 2023
Full Text: View/download PDF

61. Automatically Generating Release Notes with Content Classification Models.

Author: Nath, Sristy Sumana and Roy, Banani
Subjects: LATENT semantic analysis, SOFTWARE maintenance, MACHINE learning, CLASSIFICATION
Abstract: Release notes are admitted as an essential technical document in software maintenance. They summarize the main changes, e.g. bug fixes and new features, that have happened in the software since the previous release. Manually producing release notes is a time-consuming and challenging task. For that reason, sometimes developers neglect to write release notes. For example, we collect data from GitHub with over 1900 releases, and among them, 37% of the release notes are empty. To mitigate this problem, we propose an automatic release notes generation approach by applying the text summarization techniques, i.e. TextRank. To improve the keyword extraction method of traditional TextRank, we integrate the GloVe word embedding technique with TextRank. After generating release notes automatically, we apply machine learning algorithms to classify the release note contents (or sentences). We classify the contents into six categories, e.g. bug fixes and performance improvements, to represent the release notes better for users. We use the evaluation metric, e.g. ROUGE, to evaluate the automatically generated release notes. We also compare the performance of our technique with two popular extractive algorithms, e.g. Luhn's and latent semantic analysis (LSA). Our evaluation results show that the improved TextRank method outperforms the two algorithms. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

62. Automatic Keywords Extraction Based on Co-Occurrence and Semantic Relationships Between Words

Author: Xiangke Mao, Shaobin Huang, Rongsheng Li, and Linshan Shen
Subjects: Automatic keywords extraction, graph model, semantic similarity, TextRank, word co-occurrence, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Automatic keywords extraction is a method that extracts words or phrases from a document which can express the main idea of the document. In this paper, we propose an unsupervised keywords extraction framework for individual documents, which improves the keywords extraction from two aspects. In the step of candidate keywords selection, we use the methods of removing the stopwords, regular matching, and length filtering to reduce the number of candidate keywords, but improve the quality. In the step of scoring words, we use word co-occurrence, semantic relationships (WordNet, Word Embedding, Normalized Google Distance), and three ways to combine word co-occurrence and semantic relationships to measure the weight of edges in the graph model. In experiments, we use Precision, Recall, and F1-measure values as evaluation criteria to compare all keywords extraction methods we proposed with other strong baseline methods in two datasets. According to the results of experiments, methods under our proposed framework achieve good results. We verify that the methods of using both word co-occurrence and semantic relationships have a better effect on keywords extraction than using co-occurrence or semantic relationships only. At the same time, we also find that for the keywords extraction of individual documents, the method of using co-occurrence between words has a better effect than semantic relationships.
Published: 2020
Full Text: View/download PDF
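
Note on record 62: the abstract names Precision, Recall and F1-measure as its evaluation criteria. A minimal sketch of how these are commonly computed for one document's extracted keywords against a gold keyword list is shown here; the function name and the exact-match comparison are assumptions, and the paper may match keywords differently (for example after stemming).

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F1 between two keyword collections."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    # F1 is the harmonic mean of precision and recall; define it as 0 with no hits.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1
```

For four extracted keywords of which two appear in a three-item gold list, precision is 2/4, recall is 2/3 and F1 is 4/7.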

63. An Empirical Study of TextRank for Keyword Extraction

Author: Mingxi Zhang, Xuemin Li, Shuibo Yue, and Liuqian Yang
Subjects: Keyword extraction, Porter Stemmer, TextRank, PageRank, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: As a typical keyword extraction technology, TextRank has been used in a wide variety of commercial applications, including text classification, information retrieval and clustering. In these applications, the parameters of TextRank, including the co-occurrence window size, iteration number and decay factor, are set roughly, which might affect the effectiveness of returned results. In this work, we conduct an empirical study on TextRank, towards finding optimal parameter settings for keyword extraction. The experiments are done in Hulth2003 and Krapivin2009 datasets, which are two real datasets. We first remove the stop word by an open published English stop word list XPO6. And then, we extract the word stems by Porter Stemmer. Porter Stemmer is a tool which can find the stems of words with multiple variants, discard redundant information, strengthen the filtering effect, and extract the effective features of the text fully. We carry out extensive experiments to evaluate the effects of the parameters to keywords extraction, and evaluate the effectiveness of corresponding results by Precision, Recall and Accuracy. Experimental results show that TextRank shows the best performance when setting co-occurrence window size w = 3, iteration number t = 20, decay factor c = 0.9 and rank k = 10 respectively, and the results are independent of the text length.
Published: 2020
Full Text: View/download PDF
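
Note on record 63: the abstract reports a co-occurrence window of w = 3, t = 20 iterations, decay factor c = 0.9 and rank k = 10 as the best setting. The following is a minimal sketch of keyword TextRank using those values as defaults. It is not the authors' code: it assumes the input is already tokenized, stop-word filtered and stemmed, treats the decay factor as the PageRank damping factor, and interprets a window of w as a span of w consecutive tokens. The function and parameter names are hypothetical.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=3, iterations=20, damping=0.9, top_k=10):
    """Rank tokens on an unweighted, undirected co-occurrence graph."""
    # Link every pair of distinct tokens that fall inside the same window.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # PageRank-style update repeated a fixed number of times.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]
```

A fixed iteration count is used here because the abstract treats it as a tunable parameter; many implementations instead stop when the scores change by less than a small threshold.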

64. Building Domain Keywords Using Cognitive Based Sentences Framework

Author: Xu, Zheng, Liu, Weidong, Zhu, Yiwei, Zhang, Shunxiang, Yen, Neil Y., editor, and Hung, Jason C, editor
Published: 2018
Full Text: View/download PDF

65. A Modification to Graph Based Approach for Extraction Based Automatic Text Summarization

Author: Sehgal, Sunchit, Kumar, Badal, Maheshwar, Rampal, Lakshay, Chaliya, Ankit, Kacprzyk, Janusz, Series editor, Pal, Nikhil R., Advisory editor, Bello Perez, Rafael, Advisory editor, Corchado, Emilio S., Advisory editor, Hagras, Hani, Advisory editor, Kóczy, László T., Advisory editor, Kreinovich, Vladik, Advisory editor, Lin, Chin-Teng, Advisory editor, Lu, Jie, Advisory editor, Melin, Patricia, Advisory editor, Nedjah, Nadia, Advisory editor, Nguyen, Ngoc Thanh, Advisory editor, Wang, Jun, Advisory editor, Saeed, Khalid, editor, Chaki, Nabendu, editor, Pati, Bibudhendu, editor, Bakshi, Sambit, editor, and Mohapatra, Durga Prasad, editor
Published: 2018
Full Text: View/download PDF

66. Improvement of TextRank Based on Co-occurrence Word Pairs and Context Information

Author: Wang, Yang, Yin, Hua, He, Minwei, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Pandu Rangan, C., Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, and Qiu, Meikang, editor
Published: 2018
Full Text: View/download PDF

67. Extracting 5W from Baidu Hot News Search Words for Societal Risk Events Analysis

Author: Xu, Nuo, Tang, Xijin, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Meng, Xiaofeng, editor, Li, Ruixuan, editor, Wang, Kanliang, editor, Niu, Baoning, editor, Wang, Xin, editor, and Zhao, Gansen, editor
Published: 2018
Full Text: View/download PDF

68. Most Important First – Keyphrase Scoring for Improved Ranking in Settings With Limited Keyphrases

Author: Witt, Nils, Milz, Tobias, Seifert, Christin, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Soldatova, Larisa, editor, Vanschoren, Joaquin, editor, Papadopoulos, George, editor, and Ceci, Michelangelo, editor
Published: 2018
Full Text: View/download PDF

69. A PageRank-Based Method to Extract Fuzzy Expressions as Features in Supervised Classification Problems

Author: Carmona, Pablo, Castro, Juan Luis, Lozano, Jesús, Suárez, José Ignacio, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Herrera, Francisco, editor, Damas, Sergio, editor, Montes, Rosana, editor, Alonso, Sergio, editor, Cordón, Óscar, editor, González, Antonio, editor, and Troncoso, Alicia, editor
Published: 2018
Full Text: View/download PDF

70. Multimode Summarized Text to Speech Conversion Application

Author: Sehgal, Archit and Khanna, Gitika
Published: 2019

71. GKEEP: An Enhanced Graph‐Based Keyword Extractor With Error‐Feedback Propagation for Geoscience Reports

Author: Qinjun Qiu, Zhong Xie, Hong Xie, and Bin Wang
Subjects: backpropagation, error feedback, geoscience reports, keyword extraction, TextRank, Word2Vec, Astronomy, QB1-991, Geology, QE1-996.5
Abstract: As the amount of published geoscience literature grows, reading and summarizing texts of large collections has become a challenging task. Publication keywords can be considered basic components of knowledge structure representations and have been used to reveal knowledge concerning research domains. In contrast to data used in other research domains, the works on textual geoscience data that entail keyword extraction are limited. In this paper, we propose an unsupervised algorithm, the graph‐based keyword extractor with error‐feedback propagation (GKEEP), that enhances graph‐based keyword extraction approaches by using an error‐feedback mechanism similar to the concept of backpropagation. The proposed approach comprises the following steps. A preprocessed document is used as the input of the proposed model and is represented as a weighted undirected graph, where the vertices represent words and the edges represent the cooccurrence relationship between the words constrained by a window size. Subsequently, its nodes are ranked by their importance scores calculated by a graph‐based ranking algorithm. Consequently, all the words have their own scores, and they are used to compute the scores of keyword candidates. Subsequently, the Word2Vec method is applied to recalculate the scores of keyword candidates and rank the keyword candidates to select the final keyword. It also utilizes error feedback to boost the rankings of the most salient terms that would otherwise be deemed less important. With empirical experiments on two real data sets (including our newly built data set), the proposed GKEEP model outperforms state‐of‐the‐art unsupervised models and the existing graph‐based ranking models. The proposed method can effectively reflect intrinsic keyword semantics and interrelationships.
Published: 2021
Full Text: View/download PDF

72. Extracting Keywords from Texts based on Word Frequency and Association Features.

Author: Xu, Zhenzhen and Zhang, Junsheng
Subjects: WORD frequency, INFORMATION networks, INFORMATION overload, INFORMATION retrieval, INFORMATION technology
Abstract: With the development of information technology such as mobile Internet and social media applications, network information is growing rapidly and leads to the problem of information overload. Keywords help to filter and find interesting information for users from massive text. Automatic extraction of keywords from text as tags of text help to improve recommendation and keyword-based information retrieval. This paper proposes a novel keyword extraction approach from text that combines features such as word frequency and association. Experiment results show that the precision rate, recall rate and F-measure are all better than those of TextRank and TF-IDF. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

73. GKEEP: An Enhanced Graph‐Based Keyword Extractor With Error‐Feedback Propagation for Geoscience Reports.

Author: Qiu, Qinjun, Xie, Zhong, Xie, Hong, and Wang, Bin
Subjects: *GEOLOGY, *EARTH sciences, *SEMANTIC computing, *KNOWLEDGE representation (Information theory), *UNDIRECTED graphs
Abstract: As the amount of published geoscience literature grows, reading and summarizing texts of large collections has become a challenging task. Publication keywords can be considered basic components of knowledge structure representations and have been used to reveal knowledge concerning research domains. In contrast to data used in other research domains, the works on textual geoscience data that entail keyword extraction are limited. In this paper, we propose an unsupervised algorithm, the graph‐based keyword extractor with error‐feedback propagation (GKEEP), that enhances graph‐based keyword extraction approaches by using an error‐feedback mechanism similar to the concept of backpropagation. The proposed approach comprises the following steps. A preprocessed document is used as the input of the proposed model and is represented as a weighted undirected graph, where the vertices represent words and the edges represent the cooccurrence relationship between the words constrained by a window size. Subsequently, its nodes are ranked by their importance scores calculated by a graph‐based ranking algorithm. Consequently, all the words have their own scores, and they are used to compute the scores of keyword candidates. Subsequently, the Word2Vec method is applied to recalculate the scores of keyword candidates and rank the keyword candidates to select the final keyword. It also utilizes error feedback to boost the rankings of the most salient terms that would otherwise be deemed less important. With empirical experiments on two real data sets (including our newly built data set), the proposed GKEEP model outperforms state‐of‐the‐art unsupervised models and the existing graph‐based ranking models. The proposed method can effectively reflect intrinsic keyword semantics and interrelationships. Plain Language Summary: The common or frequently used terms receive higher scores in traditional graph‐based extraction owing to there are more edges connected to them. 
This paper proposes a graph‐based KE algorithm called KE using error‐feedback propagation, which utilizes the semantics of word embedding to assist in extracting keywords from geoscience reports. We hope that our approach will serve as an alternative method that deserves further study. Key Points: Word embedding is incorporated to capture the dependency structure as well as the data distribution, and it computes semantic relations to solve the content sparsity problem. Error feedback is utilized to boost the most salient terms that graph‐based approaches deem less important. A set of experiments verifies the effectiveness of the proposed method on two available manually constructed data sets. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

74. Optimized Focused Web Crawler with Natural Language Processing Based Relevance Measure in Bioinformatics Web Sources

Author: Mani Sekhar S. R., Siddesh G. M., Manvi Sunilkumar S., and Srinivasa K. G.
Subjects: focused crawler, data extraction, natural language processing, topical crawler, textrank, distributed crawler, master-slave architecture, bioinformatics, Cybernetics, Q300-390
Abstract: With the rapid growth of digital technologies, crawlers and search engines face unpredictable challenges. Focused web crawlers are essential for mining the boundless data available on the internet. Web crawlers face an indeterminate latency problem due to differences in their response time. The proposed work attempts to optimize the design and implementation of focused web crawlers using a master-slave architecture for bioinformatics web sources. Focused crawlers ideally should crawl only relevant pages, but the relevance of a page can only be estimated after crawling the genomics pages. A solution for predicting page relevance, based on natural language processing, is proposed in the paper. The frequency of the keywords in the top-ranked sentences of the page determines the relevance of the pages within genomics sources. The proposed solution uses the TextRank algorithm to rank the sentences and to ensure the correct classification of bioinformatics web pages. Finally, the model is validated by comparison with a breadth-first search web crawler. The comparison shows a significant reduction in run time for the same harvest rate.
Published: 2019
Full Text: View/download PDF

75. Naive Bayesian Automatic Classification of Railway Service Complaint Text Based on Eigenvalue Extraction

Author: Lifeng Li and Wenxing Li
Subjects: automatic classification, eigenvalue, naive Bayes, railway complaint text, TextRank, TF-IDF, Engineering (General). Civil engineering (General), TA1-2040
Abstract: Railways have developed rapidly in China for several decades. The hardware of railways has already reached the world's leading level, but the level of service of these railways still has room for improvement. The railway management department receives a large number of passenger complaints every year and records them in text, which needs to be classified and analyzed. The text of railway complaints includes characteristics spanning wide business coverage, various events, serious colloquialisms, interference and useless information. When using the direct classification via traditional text categorization, the classification accuracy is low. The key to the automatic classification of such text lies in an eigenvalue extraction. The more accurate the eigenvalue extraction, the higher the accuracy of text classification. In this paper, the TF-IDF algorithm, TextRank algorithm and Word2vec algorithm are selected to extract text eigenvalues, and a railway complaint text classification method is constructed with a naive Bayesian classifier. The three types of eigenvalue extraction algorithms are compared. The TF-IDF algorithm, based on eigenvalue extraction, achieves the highest automatic text classification accuracy.
Published: 2019
Full Text: View/download PDF

76. A Graph-Based Ranking Model for Automatic Keyphrases Extraction from Arabic Documents

Author: El BazzI, Mohamed Salim, Mammass, Driss, Zaki, Taher, Ennaji, Abdelatif, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, and Perner, Petra, editor
Published: 2017
Full Text: View/download PDF

77. Research on an Automatic Text Summarization Method Integrating Topic Features (融合主题特征的文本自动摘要方法研究).

Author: 罗　芳, 汪竞航, 何道森, and 蒲秋梅
Subjects: *THESIS statements (Rhetoric), *TEXT mining, *PROBABILITY theory, *MATRICES (Mathematics), *TEXT messages
Abstract: Traditional graph models for text summarization focus only on statistical features or shallow semantic features and make little use of deep topic semantic features. To address this, the paper proposes MDSR (multi-dimension summarization rank), an automatic text summarization method that incorporates topic features. Specifically, the method adopts the LDA model to mine the semantic information of text topics and measures the impact of the topic feature on a sentence by defining the importance of the topic. It improves the construction of the probability transition matrix of the graph model nodes by combining the topic feature with statistical features and inter-sentence similarity. Finally, it extracts and evaluates the summary according to the weights of the sentence nodes. The results show that the ROUGE values of MDSR are best when the weight ratio of topic feature, statistical feature and inter-sentence similarity is 3:4:3. ROUGE-1, ROUGE-2 and ROUGE-SU4 are 53.35%, 35.18% and 33.86%, respectively, which is better than the compared methods. This shows that a text summarization method combining topic features can effectively improve the accuracy of summary extraction. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF
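
Note on record 77: the abstract says the transition matrix combines topic, statistical and inter-sentence similarity features in a 3:4:3 ratio, but it does not say how each component matrix is built. The sketch below therefore only illustrates the blending step under stated assumptions: each input is assumed to be an already row-normalized (row-stochastic) matrix over the same sentences, the blend is a plain weighted sum, and the ranking is a standard damped power iteration. All names and the damping value are hypothetical and are not taken from the paper.

```python
def combine_transitions(topic, statistic, similarity, ratio=(3, 4, 3)):
    """Blend three row-stochastic matrices with weights proportional to ratio."""
    total = sum(ratio)
    a, b, c = (r / total for r in ratio)
    return [
        [a * t + b * s + c * m for t, s, m in zip(t_row, s_row, m_row)]
        for t_row, s_row, m_row in zip(topic, statistic, similarity)
    ]


def rank_nodes(transition, damping=0.85, iterations=50):
    """Damped power iteration; transition[j][i] is the probability of moving j -> i."""
    n = len(transition)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] * transition[j][i] for j in range(n))
            for i in range(n)
        ]
    return scores
```

Because the weights sum to 1, a weighted sum of row-stochastic matrices is itself row-stochastic, so the blended matrix can be passed to the ranking step without renormalizing.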

78. FuzzyFeatureRank. Bringing order into fuzzy classifiers through fuzzy expressions.

Author: Carmona, Pablo and Castro, Juan Luis
Subjects: *FEATURE selection, *WEIGHTED graphs, *DIRECTED graphs, *FUZZY sets, *FEATURE extraction
Abstract: This work presents FuzzyFeatureRank, a new feature reduction method inspired on PageRank to reduce the dimensionality of the feature space in supervised classification problems. More precisely, as it relies on a weighted directed graph, it is ultimately inspired on TextRank, a PageRank based method that adds weights to the edges to express the strength of the connections between nodes. The method is based on dividing each original feature used to describe the data into a set of fuzzy predicates and then ranking all of them by their ability to differentiate among classes in the light of the training set. In order to do that, both the information gained by each predicate and their redundancy with other already selected predicates are taken into account. The fuzzy predicates with the best scores can then be used as a reduced input to construct fuzzy classifiers that consider only the preselected predicates to build the antecedents of the fuzzy rules. The novelty of the proposal relies on being an approach halfway between feature selection and feature extraction approaches, being able to improve the discrimination ability of the original features but preserving the interpretability of the new features in the sense that they are fuzzy expressions. The experimental results support the suitability of the proposal. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

79. From web to SMS: A text summarization of Wikipedia pages with character limitation

Author: J.L.E.K Fendji and B.A.H. Aminatou
Subjects: character-limitation summarization, sms, lsa, textrank, rouge, tacos, wikipedia, Technology
Abstract: Wikipedia is one of the main sources of information on the Web. But the access to this content may be difficult especially when using a basic telephone without browsing capability and only a GSM network. The only means of text-based communication remains through SMS. Due to the limitation of the number of characters, a Wikipedia page cannot always be sent through SMS. This work raises the issue of text summarization with character limitation. To solve this issue, two extractive approaches have been combined: LSA and TextRank algorithms. Generated summaries have been evaluated using ROUGE metrics. Since ROUGE metrics do not consider character limitation, a new threshold named Threshold of Acceptability for Character-Oriented Summaries (TACOS) has been proposed to appreciate ROUGE metrics. The evaluation showed the relevance of the approach for pages of at most 2000 characters. The system has been tested using the SMS simulator of RapidSMS without a GSM gateway to simulate the deployment in a real environment. To the best of our knowledge, this is the first work tackling text summarization issue with character limitation.
Published: 2020
Full Text: View/download PDF

80. From web to SMS: A text summarization of Wikipedia pages with character limitation.

Author: Fendji, J. L. E. K. and Aminatou, B. A. H.
Subjects: TEXT messages, WORLD Wide Web, TELEPHONES, COMMUNICATION, INFORMATION & communication technologies
Abstract: Wikipedia is one of the main sources of information on the Web. But the access to this content may be difficult especially when using a basic telephone without browsing capability and only a GSM network. The only means of text-based communication remains through SMS. Due to the limitation of the number of characters, a Wikipedia page cannot always be sent through SMS. This work raises the issue of text summarization with character limitation. To solve this issue, two extractive approaches have been combined: LSA and TextRank algorithms. Generated summaries have been evaluated using ROUGE metrics. Since ROUGE metrics do not consider character limitation, a new threshold named Threshold of Acceptability for Character-Oriented Summaries (TACOS) has been proposed to appreciate ROUGE metrics. The evaluation showed the relevance of the approach for pages of at most 2000 characters. The system has been tested using the SMS simulator of RapidSMS without a GSM gateway to simulate the deployment in a real environment. To the best of our knowledge, this is the first work tackling text summarization issue with character limitation. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF
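
Note: the two records above concern summaries that must fit a character limit. One simple way such a limit could be enforced on already ranked sentences is sketched below. It is an illustration only, not the authors' LSA and TextRank combination. The function name `fit_to_budget`, the greedy strategy, and the default budget of 160 characters (a single GSM 7-bit SMS) are assumptions.

```python
def fit_to_budget(sentences, scores, max_chars=160):
    """Greedily keep the best-scored sentences that fit in max_chars.

    Sentences are joined with single spaces, and the result preserves
    the original document order.
    """
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        # One separating space is needed for every sentence after the first.
        cost = len(sentences[i]) + (1 if chosen else 0)
        if used + cost <= max_chars:
            chosen.append(i)
            used += cost
    return " ".join(sentences[i] for i in sorted(chosen))
```

A sentence that is skipped because it is too long does not stop the loop, so a shorter, lower-ranked sentence can still fill the remaining space.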

81. Two Improved Topic Word Detection Algorithms.

Author: Yu, Zehao
Subjects: INFORMATION filtering, ALGORITHMS, MACHINE learning, VOCABULARY
Abstract: Topic word extraction is the task of identifying single or multi-word expressions that represent the main topics of a document. In this paper, two improved algorithms for extracting and discovering topic words are proposed: the Rapid Topic word Detection (RTD) algorithm and the CategoryTextRank (CTextRank) algorithm, which can effectively obtain information by extracting and filtering the topic words in the text. The algorithms overcome the shortcomings of traditional topic word discovery algorithms, which require deep linguistic knowledge or domain- or language-specific annotated corpora. The two proposed algorithms can process both short and long text. Their biggest advantage is that they are unsupervised machine learning algorithms: they need no training and process text directly to obtain topic words. The accuracy rate, recall rate and F-measure are greatly improved when using the two algorithms, and the results compare favorably with previously published results on the Inspec and SemEval datasets. The first algorithm, Rapid Topic word Detection, improves the metrics compared to PositionRank and TextRank; the second algorithm, CategoryTextRank, improves the metrics compared to TextRank, SingleRank and TF-IDF. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

82. Implementation of Automatic Text Summarization with TextRank Method in the Development of Al-Qur'an Vocabulary Encyclopedia.

Author: Fakhrezi, Muhamad Fahmi, Bijaksana, Moch. Arif, and Huda, Arief Fatchul
Subjects: VOCABULARY, SEMANTICS
Abstract: Studying the Qur'an by understanding its vocabulary, and thereby its meaning, is not easy. A Qur'anic vocabulary encyclopedia that focuses on explaining the meaning of the words is therefore needed. The encyclopedia was developed using automatic text summarization with the TextRank method, because a single query returns many word meanings that must be summarized. The method starts by selecting documents that are relevant to the query, then summarizes the selected documents using TextRank to obtain a summary based on all the word meanings, and finally tests the system's summaries by comparing them with target summaries constructed manually by humans. The application of the TextRank method for automatic text summarization achieves an average F-score of 0.6173. The summaries produced with the TextRank method contain no duplicates and, for some queries, are almost the same as the summaries created manually by humans. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

83. Cleveree: an artificially intelligent web service for Jacob voice chatbot.

Author: Octavany and Arya Wicaksana
Subjects: *WEB services, *CHATBOTS, *TECHNOLOGY Acceptance Model, *ARTIFICIAL intelligence, *BUILDING operation management
Abstract: Jacob is a voice chatbot that uses Wit.ai to get the context of a question and give an answer based on that context. However, Jacob has no variation in its answers and cannot recognize the context well if it has not previously been learned by Wit.ai. This paper therefore proposes two artificial intelligence (AI) features built as a web service: paraphrasing of answers using the Stacked Residual LSTM model, and question summarization using cosine similarity with pre-trained Word2Vec and the TextRank algorithm. These two features are novel designs tailored to Jacob; the AI module is called Cleveree. Cleveree is evaluated using the technology acceptance model (TAM) method and interviews with Jacob admins. The results show that 79.17% of respondents strongly agree that both features are useful and 72.57% of respondents strongly agree that both features are easy to use. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

84. Chinese Automatic Text Summarization Based on Weighted TextRank.

Author: 黄波 and 刘传才
Subjects: *LEXICON, *CORPORA, *KEYWORDS, *RESEMBLANCE (Philosophy), *VOCABULARY, *AUTOMOBILE defects
Abstract: Existing automatic text summarization methods for Chinese mainly use the text's own information, and their defect is that they cannot make full use of the semantic relations between words. Therefore, this paper proposes an improved Chinese text summarization method. The method integrates information from external corpora into the TextRank algorithm in the form of word vectors. Combining TextRank with word2vec, it maps each word in a sentence to a high-dimensional lexicon to form a sentence vector. The method considers the similarity between sentences, the coverage of keywords and the similarity between each sentence and the title to calculate the influence weights among sentences, and chooses the top-ranked sentences as the summary of the text. The experimental results show that this method achieves good results on the paper's data set and is more effective than the original method at extracting Chinese summaries automatically. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

85. Automatic document summarization based on statistical information

Author: A. Mussina, S. Aubakirov, D. Ahmed-Zaki, and P. Trigo
Subjects: summarization, automatic extraction, key-words, n-gram, textrank, Mechanical engineering and machinery, TJ1-1570, Electronic computers. Computer science, QA75.5-76.95
Abstract: A pressing problem today is how to efficiently process the large amount of data that we encounter every day. The object of study of this paper is automatic summarization algorithms. The main goal is to implement and compare different summarization techniques on corpora of news articles parsed from the web. This research work describes three summarization techniques based on the TextRank algorithm: General TextRank, BM25, and LongestCommonSubstring. The corpora used are in the Russian and Kazakh languages. The results of the summarization processes and their comparison are provided. The algorithms used are well known, but the way they are evaluated on the defined corpora differs from the methods usually used in summary evaluation. The proposed method of summary evaluation uses a special dictionary of key-words extracted on the topic of the corpora. As the title implies, the article describes the application of statistical information. The semantic and syntactic features of the text are not examined.
Published: 2017

86. Towards Evaluating the Impact of Anaphora Resolution on Text Summarisation from a Human Perspective

Author: Bayomi, Mostafa, Levacher, Killian, Ghorab, M. Rami, Lavin, Peter, O’Connor, Alexander, Lawless, Séamus, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Métais, Elisabeth, editor, Meziane, Farid, editor, Saraee, Mohamad, editor, Sugumaran, Vijayan, editor, and Vadera, Sunil, editor
Published: 2016
Full Text: View/download PDF

87. Automated system for cataloguing and classification of electronic publications in a library

Author: Ульяницька, Ксенія Олександрівна
Subjects: cataloguing, electronic publications, classification, TextRank, 025 .3/.4 [004], automated system, naive Bayes classifier
Abstract: Structure and scope of the work. The explanatory report of the diploma project consists of 5 chapters and contains 16 figures, 12 tables, 1 appendix, 4 drawings and references to 15 sources. The aim of the diploma project is to create an automated cataloguing and classification system that will allow librarians to effectively manage the library's electronic publications, provide convenient access to these resources, and reduce the time and effort required to manage them. The general provisions chapter describes the library's business processes and activities, identifies the use cases and the corresponding functional requirements, and identifies the advantages and disadvantages of existing analogues of the system and their differences from the system under development. The chapter on information support is devoted to the description of the input and output data of the system and the design of the database. The chapter on mathematical support substantiates the choice of methods for solving the problem and describes the relevant algorithms. The software and hardware chapter describes the development tools for the automated system and the project architecture. The technological chapter identifies and describes the tests of the automated system and presents their results. The developed automated system for cataloguing and classifying electronic publications can be used as a prototype for integration into existing library management systems.
Published: 2023

88. Extractive Article Summarization Using Integrated TextRank and BM25+ Algorithm

Author: Vaibhav Gulati, Deepika Kumar, Daniela Elena Popescu, and Jude D. Hemanth
Subjects: Computer Networks and Communications, Hardware and Architecture, Control and Systems Engineering, article, text, summarization, extractive, graph-based, BM25, ROUGE, TextRank, Signal Processing, Electrical and Electronic Engineering
Abstract: The quantity of textual data on the internet is growing exponentially, and it is a very tough task to obtain important and relevant information from it. An efficient and effective method is required that provides a concise summary of an article. This can be achieved through automatic text summarization. In this research, the authors suggest an efficient approach for text summarization in which an extractive summary is generated from an article. The methodology was modified by integrating the normalized similarity matrices of both BM25+ and the conventional TextRank algorithm, which improved the results. A graph is generated by taking the sentences in the article as nodes and the similarity score between two sentences as the edge weight. The maximum-rank nodes are selected, and the summary is extracted. The proposed methodology was evaluated empirically and compared with baseline methods, viz. the conventional TextRank algorithm, term frequency–inverse document frequency (TF–IDF) cosine, longest common consequence (LCS), and BM25+, taking precision, recall, and F1 score as evaluation criteria. ROUGE-1, ROUGE-2, and ROUGE-L scores were calculated for all the methods. The outcomes demonstrate that the proposed method can efficiently summarize any article irrespective of the category it belongs to.
Published: 2023
Full Text: View/download PDF
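
Note: the abstract above mentions a sentence graph whose edge weights are similarity scores, and the integration of two normalized similarity matrices. The sketch below illustrates one way this could look. It uses the word-overlap similarity from the original TextRank formulation and leaves the second matrix (for example, pairwise BM25+ scores) to the caller. Scaling each matrix by its maximum and then averaging is an assumption, since the abstract does not specify the normalization, and all function names are hypothetical.

```python
import math


def overlap_similarity(a, b):
    """TextRank-style overlap between two tokenized sentences."""
    denom = math.log(len(a)) + math.log(len(b)) if a and b else 0.0
    if denom <= 0:
        # Covers empty sentences and two one-word sentences (log 1 = 0).
        return 0.0
    return len(set(a) & set(b)) / denom


def similarity_matrix(sentences):
    """Edge weights of the sentence graph; the diagonal is left at zero."""
    tokens = [s.lower().split() for s in sentences]
    n = len(tokens)
    return [
        [0.0 if i == j else overlap_similarity(tokens[i], tokens[j]) for j in range(n)]
        for i in range(n)
    ]


def max_normalize(matrix):
    """Scale all entries so the largest one becomes 1."""
    peak = max((v for row in matrix for v in row), default=0.0)
    return [[v / peak if peak > 0 else 0.0 for v in row] for row in matrix]


def combine(first, second):
    """Average two similarity matrices after max-normalizing each."""
    a, b = max_normalize(first), max_normalize(second)
    return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
```

The combined matrix would then serve as the weighted adjacency matrix for a PageRank-style ranking, with the highest-ranked sentences forming the summary.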

89. A Framework for Generating Extractive Summary from Multiple Malayalam Documents

Author: K. Manju, S. David Peter, and Sumam Mary Idicula
Subjects: Malayalam language, extractive multidocument summarization, NLP, sentence encoding, TextRank, maximum marginal relevance, Information technology, T58.5-58.64
Abstract: Automatic extractive text summarization retrieves a subset of data that represents most notable sentences in the entire document. In the era of digital explosion, which is mostly unstructured textual data, there is a demand for users to understand the huge amount of text in a short time; this demands the need for an automatic text summarizer. From summaries, the users get the idea of the entire content of the document and can decide whether to read the entire document or not. This work mainly focuses on generating a summary from multiple news documents. In this case, the summary helps to reduce the redundant news from the different newspapers. A multi-document summary is more challenging than a single-document summary since it has to solve the problem of overlapping information among sentences from different documents. Extractive text summarization yields the sensitive part of the document by neglecting the irrelevant and redundant sentences. In this paper, we propose a framework for extracting a summary from multiple documents in the Malayalam Language. Also, since the multi-document summarization data set is sparse, methods based on deep learning are difficult to apply. The proposed work discusses the performance of existing standard algorithms in multi-document summarization of the Malayalam Language. We propose a sentence extraction algorithm that selects the top ranked sentences with maximum diversity. The system is found to perform well in terms of precision, recall, and F-measure on multiple input documents.
Published: 2021
Full Text: View/download PDF
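
Note: the record above lists maximum marginal relevance (MMR) among its subjects and describes selecting top-ranked sentences with maximum diversity. A minimal sketch of generic MMR selection follows. It is not the authors' exact algorithm, and the function name, the `trade_off` parameter and its default of 0.7 are assumptions for illustration.

```python
def mmr_select(relevance, similarity, k, trade_off=0.7):
    """Pick k sentence indices by maximal marginal relevance.

    relevance: per-sentence ranking scores (for example, TextRank scores)
    similarity: n x n sentence-similarity matrix with values in [0, 1]
    trade_off: 1.0 means pure relevance, 0.0 means pure diversity
    """
    selected = []
    candidates = set(range(len(relevance)))
    while candidates and len(selected) < k:
        def marginal(i):
            # Redundancy is the closest match among sentences already chosen.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return trade_off * relevance[i] - (1 - trade_off) * redundancy

        # Sorting makes ties resolve to the lowest index.
        best = max(sorted(candidates), key=marginal)
        selected.append(best)
        candidates.remove(best)
    return selected
```

In a multi-document setting, where different newspapers repeat the same news, the redundancy term is what keeps near-identical sentences from different sources out of the same summary.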

90. Improved Automatic Keyword Extraction Based on TextRank Using Domain Knowledge

Author: Li, Guangyi, Wang, Houfeng, Junqueira Barbosa, Simone Diniz, Series editor, Chen, Phoebe, Series editor, Cuzzocrea, Alfredo, Series editor, Du, Xiaoyong, Series editor, Filipe, Joaquim, Series editor, Kara, Orhun, Series editor, Kotenko, Igor, Series editor, Sivalingam, Krishna M., Series editor, Ślęzak, Dominik, Series editor, Washio, Takashi, Series editor, Yang, Xiaokang, Series editor, Zong, Chengqing, editor, Nie, Jian-Yun, editor, Zhao, Dongyan, editor, and Feng, Yansong, editor
Published: 2014
Full Text: View/download PDF

91. A Hybrid Method of Sentiment Key Sentence Identification Using Lexical Semantics and Syntactic Dependencies

Author: Feng, Chong, Liao, Chun, Liu, Zhirun, Huang, Heyan, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Han, Weihong, editor, Huang, Zi, editor, Hu, Changjun, editor, Zhang, Hongli, editor, and Guo, Li, editor
Published: 2014
Full Text: View/download PDF

92. Key word extraction for short text via word2vec, doc2vec, and textrank.

Author: Jun LI, Guimin HUANG, Chunli FAN, Zhenglin SUN, and Hongtao ZHU
Subjects: *KEYWORDS, *SOCIAL values, *SOCIAL media
Abstract: The rapid development of social media encourages people to share their opinions and feelings on the Internet. Every day, a large number of short text comments are generated through Twitter, microblogging, WeChat, etc., and there is high commercial and social value in extracting useful information from these short texts. At present, most studies have focused on extracting text key words. For example, the LDA topic model has good performance with long texts, but it loses effectiveness with short texts because of the noise and sparsity problems. In this paper, we attempt to use Word2Vec and Doc2Vec to improve short-text key word extraction. We first added the method of the collaborative training of word vectors and paragraph vectors and then used the TextRank model's clustering nodes. We adjusted the weights of the key words that were generated by computing the jump probability between nodes and then obtained the node-weighted score, and eventually sorted the generated key words. The experimental results show that the improved method has good performance on the datasets. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

93. Keyword Extraction Method Based on Word Vectors and TextRank.

Author: 周锦章 and 崔晓晖
Subjects: *TRANSFER matrix, *KEYWORDS, *SEMANTICS, *ALGORITHMS, *PROBABILITY theory, *CORPORA
Abstract: This paper studied the influence of lexical semantic differences on the TextRank algorithm and presented a keyword extraction method based on word vectors and TextRank. Firstly, it used FastText to represent words as vectors learned from the document corpus. Then, based on the idea of implicit subject distribution, it used the differences in lexical semantics to build a probability transfer matrix for TextRank. Finally, it iteratively calculated the lexical graph model and extracted keywords. Experimental results show that the extraction performance of this method is significantly improved compared with the traditional method. In addition, the results show that the use of word vectors can improve the performance of the TextRank algorithm simply and effectively. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

94. Automatic Summarization Optimization Algorithm Based on TextRank.

Author: 李娜娜, 刘培玉, 刘文锋, and 刘伟童
Subjects: *REDUNDANCY in engineering, *CRIMINAL sentencing, *PARAGRAPHS, *RESEMBLANCE (Philosophy), *HAND, *MULTICASTING (Computer networks)
Abstract: When summarizing Chinese texts, the traditional TextRank algorithm considers only the similarity between nodes and neglects other important information in the text. Building on existing research and targeting single Chinese documents, this paper uses the TextRank algorithm in two ways. On the one hand, it considers the similarities between sentences; on the other hand, it combines TextRank with the overall structural information of the text and the contextual information of sentences, such as the physical position of a sentence or paragraph in the document, feature sentences, core sentences and other factors that might increase the weight of a sentence, all of which are used to generate the candidate sentence group for the summary. Then, highly similar sentences are removed from the candidate sentence group by redundancy processing. Finally, experimental verification shows that the algorithm can improve the accuracy of the generated summary, indicating its effectiveness. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

95. A Novel Framework for Automatic Chinese Question Generation Based on Multi-Feature Neural Network Model.

Author: Hai-Tao Zheng, Jinxin Han, Jinyuan Chen, and Sangaiah, Arun Kumar
Abstract: Automatic question generation from text or paragraphs is a highly challenging task that attracts broad attention in natural language processing. Because of verbose texts and fragile ranking methods, the quality of the top generated questions is poor. In this paper, we present a novel framework, Automatic Chinese Question Generation (ACQG), to generate questions from text or paragraphs. In ACQG, we use an adopted TextRank to extract key sentences and a template-based method to construct questions from key sentences. Then a multi-feature neural network model is built for ranking to obtain the top questions. The automatic evaluation results reveal that the proposed framework outperforms the state-of-the-art systems in terms of perplexity. In human evaluation, questions generated by ACQG receive a higher score. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

96. Knowledge aggregation of the WeChat Official Accounts Platform based on tag clustering

Author: Cheng, Zixuan, Zhang, Xiangxian, Lu, Heng, and Guo, Shunli
Published: 2021
Full Text: View/download PDF

97. Unsupervised Extraction of Keywords from News Archives

Author: Palomino, Marco A., Wuytack, Tom, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, and Vetulani, Zygmunt, editor
Published: 2011
Full Text: View/download PDF

98. Graphs, Computation, and Language

Author: Dmitry Ustalov
Subjects: PageRank, Semantic Networks, Classification Evaluation, Co-occurrence Graphs, ComputingMilieux_COMPUTERSANDEDUCATION, Centrality, Knowledge Graphs, Evaluation, reCAPTCHA, Clustering Evaluation, Ontology, Spectral Clustering, TransE, Bradley-Terry, Resource Description Framework, Graph Convolutional Network, OntoLearn, Dawid-Skene, Crowdsourcing, Graph Attention Network, Games with a Purpose, Wisdom of the Crowds, clustering, node2vec, Quality Control, Statistical Testing, Graph Clustering, MaxMax, Inter-Rater Agreement, Chinese Whispers, Language Graphs, Sparse Matrix, Representation Learning, Microtasks, Markov Clustering, RDF, Laplacian Eigenmaps, Tutorial, Translating Embeddings, Word2Vec, TextRank, GraphSAGE, Natural Language Processing, Semantic Web, Taxonomy, Network Science, lecture, Louvain, Quality Assessment, Ranking Evaluation, Poincare Embeddings, language resources, Hearst Patterns, Graph Theory, Linked Data, Matrix Representations, Graph Embeddings, Watset, DeepWalk, Power Method, Wikipedia
Abstract: Employing the properties of linguistic networks allows discovering structure and making predictions. This course seeks answers to three questions: (1) how to express the linguistic phenomena as graphs, (2) how to gain knowledge based on them, and (3) how to assess the quality of this knowledge. We will start with traditional graph-based Natural Language Processing (NLP) methods like TextRank and Markov Clustering and finish with such contemporary Machine Learning techniques as DeepWalk and Graph Convolutional Networks. As the growing interest in NLP methods urges their meaningful evaluation, we pay special attention to quality assessment and human judgements. The course has five lectures on Language Graphs, Graph Clustering, Graph Embeddings, Knowledge Graphs, and Evaluation. They elaborately go through the essential algorithms step-by-step, discuss case studies, and suggest insightful references and datasets. The target audience is undergraduate and graduate students, data analysts, and interdisciplinary researchers (but it is not limited to them). The course was held in person in August 2022 at the 33rd European Summer School in Logic, Language and Information (ESSLLI 2022) in Galway, Ireland: https://2022.esslli.eu/courses-workshops-accepted/week-1-and-2-schedule.html., {"references": ["Agirre, E., L\u00f3pez de Lacalle, O., Soroa, A.: Random Walks for Knowledge-Based Word Sense Disambiguation. Computational Linguistics. 40, 57\u201384 (2014). https://doi.org/10.1162/COLI_a_00164", "von Ahn, L., Dabbish, L.: Labeling Images with a Computer Game. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. pp. 319\u2013326. ACM, Vienna, Austria (2004). https://doi.org/10.1145/985692.985733", "Ali, M., Berrendorf, M., Hoyt, C.T., Vermue, L., Sharifzadeh, S., Tresp, V., Lehmann, J.: PyKEEN 1.0: A Python Library for Training and Evaluating Knowledge Graph Embeddings. Journal of Machine Learning Research. 
22, 1\u20136 (2021)", "Alonso, O., Rose, D.E., Stewart, B.: Crowdsourcing for Relevance Evaluation. SIGIR Forum. 42, 9\u201315 (2008). https://doi.org/10.1145/1480506.1480508", "Ardila, R., Branson, M., Davis, K., Henretty, M., Kohler, M., Meyer, J., Morais, R., Saunders, L., Tyers, F.M., Weber, G.: Common Voice: A Massively-Multilingual Speech Corpus. In: Proceedings of The 12th Language Resources and Evaluation Conference. pp. 4218\u20134222. European Language Resources Association (ELRA), Marseille, France (2020)", "Artstein, R., Poesio, M.: Inter-Coder Agreement for Computational Linguistics. Computational Linguistics. 34, 555\u2013596 (2008). https://doi.org/10.1162/coli.07-034-R2", "Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: A Nucleus for a Web of Open Data. In: The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11\u201315, 2007. Proceedings. pp. 722\u2013735. Springer Berlin Heidelberg, Berlin; Heidelberg, Germany (2007). https://doi.org/10.1007/978-3-540-76298-0_52", "Azadani, M.N., Ghadiri, N., Davoodijam, E.: Graph-based biomedical text summarization: An itemset mining and sentence clustering approach. Journal of Biomedical Informatics. 84, 42\u201358 (2018). https://doi.org/10.1016/j.jbi.2018.06.005", "Baker, C.F., Fillmore, C.J., Lowe, J.B.: The Berkeley FrameNet Project. In: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 1. pp. 86\u201390. Association for Computational Linguistics, Montr\u00e9al, QC, Canada (1998). https://doi.org/10.3115/980845.980860", "Barab\u00e1si, A.-L., Albert, R.: Emergence of Scaling in Random Networks. Science. 286, 509\u2013512 (1999). 
Published: 2022
Full Text: View/download PDF

99. Cultural Heritage Storytelling, Engagement and Management in the Era of Big Data and the Semantic Web.

Author: Dimoulas, Charalampos
Subjects: Film, TV & radio, 3D content, 3D modeling, 3D reconstruction, BERT, Évora, Greek NLP, Greek literature, IEEE 830 standard, Instagram, Katharevousa, POCITYF, TextRank, Transformers, Twitter, UNESCO, audience engagement, audiovisual heritage, authoring tools, big data, biocultural heritage, content crowdsourcing, cultural heritage, data center, data-driven storytelling, deep neural networks, digital marketing, digital narrative, digital storytelling, distant supervision, eco-friendly, energy transition, environmental communication, event detection, evolution analytics, green culture, green hosting, green websites, heritage communication, heritage management, intangible heritage, interactive documentary, journalism, literary fiction, marine heritage, marine protected areas of outstanding universal value, media users' engagement, metadata extraction, multimedia tools, news, relation extraction, requirements engineering, semantic analysis, semantic audio, semantic indexing, smart cities, social media, software sustainability, soundscapes, spectral clustering, static analysis, sustainability, text classification
Abstract: Summary: The current Special Issue was launched to shed further light on important cultural heritage (CH) areas, inviting researchers to submit original multidisciplinary research related to heritage crowdsourcing, documentation, management, authoring, storytelling, and dissemination. Audience engagement is considered very important at both sides of the CH production-consumption chain (i.e., the push and pull ends). At the same time, sustainability factors are placed at the center of the envisioned analysis. A total of eleven (11) contributions were published in this Special Issue, illuminating various aspects of contemporary heritage strategies in today's ubiquitous society. The published papers are related but not limited to the following multidisciplinary topics: digital storytelling for cultural heritage; audience engagement in cultural heritage; sustainability impact indicators of cultural heritage; cultural heritage digitization, organization, and management; collaborative cultural heritage archiving, dissemination, and management; cultural heritage communication and education for sustainable development; semantic services of cultural heritage; big data of cultural heritage; smart systems for historical cities and smart cities; smart systems for cultural heritage sustainability.

100. A Single-Text Keyword Extraction Algorithm Based on TextRank.

Author: 柳林青, 余瀚, 费宁, and 陈春玲
Abstract: As a classical algorithm for keyword extraction and automatic summarization, TextRank treats a text as a set of terms and seeks latent semantic relationships between them by iteratively computing the weights of the terms in a node graph. Building on the TextRank node graph model, the authors combine the node graph with a Markov state-transition model, weight the edges between nodes by conditional probability, and propose a new node graph model with a corresponding algorithm, TextRank_Revised (TR-R). Verification on labeled and unlabeled samples shows that, without increasing time complexity, the new algorithm produces a keyword ranking for a single text that is closer to the manual ranking than that of the original algorithm. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF
