11 results for "TextRank"
Search Results
2. Biomedical Text Summarization: A Graph-Based Ranking Approach
- Author
-
Gupta, Supriya, Sharaff, Aakanksha, Nagwani, Naresh Kumar, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Iyer, Brijesh, editor, Ghosh, Debashis, editor, and Balas, Valentina Emilia, editor
- Published
- 2022
- Full Text
- View/download PDF
3. Hybrid Recommendation System for Scientific Literature
- Author
-
Amara, Indraneel, Sai Pranav, K, Mamatha, H. R., Xhafa, Fatos, Series Editor, Hemanth, Jude, editor, Bestak, Robert, editor, and Chen, Joy Iong-Zong, editor
- Published
- 2021
- Full Text
- View/download PDF
4. A Framework for Generating Extractive Summary from Multiple Malayalam Documents
- Author
-
K. Manju, S. David Peter, and Sumam Mary Idicula
- Subjects
Malayalam language ,extractive multi-document summarization ,NLP ,sentence encoding ,TextRank ,maximum marginal relevance ,Information technology ,T58.5-58.64 - Abstract
Automatic extractive text summarization retrieves a subset of data that represents the most notable sentences in the entire document. In the era of digital explosion, in which most data is unstructured text, users need to understand huge amounts of text in a short time, which calls for an automatic text summarizer. From summaries, users get an idea of the entire content of a document and can decide whether to read it in full. This work focuses on generating a summary from multiple news documents; in this case, the summary helps to reduce redundant news across different newspapers. A multi-document summary is more challenging than a single-document summary, since it has to solve the problem of overlapping information among sentences from different documents. Extractive text summarization yields the salient parts of the document while discarding irrelevant and redundant sentences. In this paper, we propose a framework for extracting a summary from multiple documents in the Malayalam language. Since multi-document summarization data sets are sparse, methods based on deep learning are difficult to apply; the work therefore discusses the performance of existing standard algorithms for multi-document summarization of Malayalam. We propose a sentence extraction algorithm that selects the top-ranked sentences with maximum diversity. The system is found to perform well in terms of precision, recall, and F-measure on multiple input documents. (A sketch of this rank-then-diversify selection follows this record.)
- Published
- 2021
- Full Text
- View/download PDF
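As flagged in the abstract above, here is a minimal sketch of the rank-then-diversify selection the paper describes: TextRank scores sentences via PageRank over a sentence-similarity graph, and an MMR-style step then picks top-ranked sentences while penalizing redundancy. This is an illustration under assumed inputs (plain TF-IDF sentence encoding), not the authors' Malayalam pipeline; `summarize`, `k`, and `lam` are hypothetical names and parameters.

```python
import numpy as np
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def summarize(sentences, k=5, lam=0.7):
    # Encode sentences (the paper explores richer encodings; TF-IDF keeps
    # the sketch self-contained) and build a similarity matrix.
    tfidf = TfidfVectorizer().fit_transform(sentences)
    sim = cosine_similarity(tfidf)
    np.fill_diagonal(sim, 0.0)

    # TextRank: PageRank over the sentence-similarity graph, scaled to [0, 1].
    pr = nx.pagerank(nx.from_numpy_array(sim))
    top = max(pr.values())
    relevance = {i: s / top for i, s in pr.items()}

    # MMR-style selection: trade off relevance against similarity to the
    # sentences already chosen, so the summary stays diverse.
    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in sorted(selected)]
```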
5. Extracción de palabras claves de reviews de portales e-commerce (Keyword extraction from e-commerce portal reviews)
- Author
-
Meza Tudela, Alejandro Felipe, Escuela Técnica Superior de Ingeniería Industrial, Informática y de Telecomunicación, Industria, Informatika eta Telekomunikazio Ingeniaritzako Goi Mailako Eskola Teknikoa, and Armendáriz Íñigo, José Enrique
- Subjects
LDA ,Transformers ,TF-IDF ,Reviews ,Deep learning ,E-commerce ,TextRank ,NLP ,BERT - Abstract
In the current era, the information age, product reviews carry great weight in users' purchase decisions. This undergraduate thesis therefore focuses on automating the extraction of keywords from reviews on e-commerce portals. To this end, Amazon was chosen as the source of user reviews. The automation is achieved by applying different methods from one of the most popular fields of artificial intelligence: NLP (Natural Language Processing). (Degree in Computer Engineering, Universidad Pública de Navarra.) A sketch of one such method follows this record.
- Published
- 2022
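A minimal sketch of one of the techniques the thesis compares (TF-IDF keyword extraction over a review corpus), as referenced in the abstract above. The sample reviews and the top-3 cutoff are made up for illustration; this is not the project's actual code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = [  # hypothetical Amazon-style reviews
    "Battery life is great but the screen scratches easily",
    "Great screen, terrible battery, would not buy again",
]

# TF-IDF over unigrams and bigrams; high-weight terms act as keywords.
vec = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
X = vec.fit_transform(reviews)
terms = vec.get_feature_names_out()

for row, review in zip(X.toarray(), reviews):
    top = sorted(zip(terms, row), key=lambda t: -t[1])[:3]
    print(review, "->", [term for term, score in top if score > 0])
```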
6. Recommender system using NLP techniques
- Author
-
Rubio Pacho, Enric, Universitat Politècnica de Catalunya. Departament de Ciències de la Computació, Sànchez-Marrè, Miquel, and Felip i Díaz, Bernat
- Subjects
recommender system ,MultipartiteRank ,keyphrase extraction ,word embedding ,NLP ,sistema de recomanacions ,Informàtica::Sistemes d'informació::Emmagatzematge i recuperació de la informació [Àrees temàtiques de la UPC] ,Recommender systems (Information filtering) ,extraccio de paraules clau ,REACH ,TopicRank ,TextRank ,PositionRank ,SingleRank ,Sistemes recomanadors (Filtratge d'informació) - Abstract
MY REACH is a startup founded in 2018 that provides a centralized storage service for users' files and data. REACH's main goal is to drastically reduce the time users spend searching through their data. What currently sets REACH apart from the other platforms on the market is that users' files and data (currently web pages, documents, and categories) are stored in a graph system. This structure gives users a more understandable context for their data than the traditional folder structure, and lets them create relationships between elements, making their data a more meaningful and comprehensible whole.
The recommender system is one of the most important elements of this platform: when a user has a workspace with a lot of information, a good recommender system allows them to connect their graph in a simple way by creating relationships with other data and files. In this way, the user can make more efficient use of search and the other advantages the platform offers. The goal of this project is to improve the recommender system currently implemented in the platform, as the current implementation recommends links to files and data that are not similar. The proposal put forward in this project is to improve the extraction of keywords/keyphrases from a user's file or data item and, in addition, to improve the recommended links among data; implementing the automatic creation of relations is proposed as future work. After a review of the state of the art for this type of application and of the techniques that could be used to carry out the work, the procedure proposed in this project is based on two microservices: the first extracts keyphrases from a user's file or data item using natural language processing techniques, and the second recommends relationships, also using natural language processing. The results of applying these techniques were good. The supervisor, the director, and the REACH team evaluated the system; even so, the project needs time for users to test it and give their feedback. (A sketch of the extract-then-link idea follows this record.)
- Published
- 2021
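A minimal sketch of the two-microservice idea described in the abstract above: extract keyphrases per item, then recommend relations between items whose keyphrase sets overlap. TF-IDF top terms stand in for the graph-based extractors the thesis evaluates (TextRank, TopicRank, MultipartiteRank, etc.); all names and the overlap threshold are hypothetical, not REACH's implementation.

```python
from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer

def keyphrases(doc, vec, terms, n=5):
    # Top-n TF-IDF terms of one document, as a stand-in keyphrase set.
    row = vec.transform([doc]).toarray()[0]
    return {terms[i] for i in row.argsort()[::-1][:n] if row[i] > 0}

def recommend_links(docs, min_overlap=2):
    vec = TfidfVectorizer(stop_words="english").fit(docs)
    terms = vec.get_feature_names_out()
    keys = [keyphrases(d, vec, terms) for d in docs]
    # Recommend a relation when two items share enough keyphrases.
    return [(i, j, keys[i] & keys[j])
            for i, j in combinations(range(len(docs)), 2)
            if len(keys[i] & keys[j]) >= min_overlap]
```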
7. Klustring och Summering av Chatt-Dialoger (Clustering and Summarization of Chat Dialogues)
- Author
-
Hidén, Oskar and Björelind, David
- Subjects
Extractive summarization ,Word Mover's Distance (WMD) ,NLP ,DBSCAN ,Clustering ,Datorteknik ,Machine Learning ,FastText ,LSA ,S-BERT ,TextRank ,Computer Engineering ,K-means ,Text Representations ,TFIDF ,HDBSCAN - Abstract
The Customer Success department at Visma handles about 200,000 customer chats each year; the chat dialogues are stored and contain both questions and answers. To get an idea of what customers ask about, the Customer Success department has had to read a random sample of the chat dialogues manually. This thesis develops and investigates an analysis tool for the chat data based on clustering and summarization, aiming to decrease the time spent on the analysis and increase its quality. Models for clustering (K-means, DBSCAN and HDBSCAN) and extractive summarization (K-means, LSA and TextRank) are compared. Each algorithm is combined with three different text representations (TFIDF, S-BERT and FastText) to create models for evaluation. These models are evaluated against a test set created for the purpose of this thesis. Silhouette Index and Adjusted Rand Index are used to evaluate the clustering models; the ROUGE measure together with a qualitative evaluation is used to evaluate the extractive summarization models. In addition, the best clustering model is further evaluated to understand how different data sizes impact performance. TFIDF unigrams together with HDBSCAN or K-means obtained the best results for clustering, whereas FastText together with TextRank obtained the best results for extractive summarization. This thesis applies known models to a textual domain of customer chat dialogues, something that, to our knowledge, has not previously been done in the literature. (A sketch of the cluster-then-summarize pipeline follows this record.)
- Published
- 2021
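A minimal sketch of the pipeline described in the abstract above (cluster the chats, then summarize each cluster extractively). TF-IDF with K-means and a TextRank-style ranking stand in for the representations and models the thesis compares; the data and parameters are hypothetical, not the thesis code.

```python
import numpy as np
import networkx as nx
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cluster_and_summarize(chats, n_clusters=3, per_cluster=2):
    X = TfidfVectorizer().fit_transform(chats)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    summaries = {}
    for c in set(labels):
        idx = [i for i, l in enumerate(labels) if l == c]
        sim = cosine_similarity(X[idx])
        np.fill_diagonal(sim, 0.0)
        # TextRank within the cluster: rank chats by PageRank centrality
        # and keep the most central ones as the cluster's summary.
        scores = nx.pagerank(nx.from_numpy_array(sim))
        top = sorted(scores, key=scores.get, reverse=True)[:per_cluster]
        summaries[c] = [chats[idx[i]] for i in top]
    return summaries
```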
8. Extractive Multi-document Summarization of News Articles
- Author
-
Grant, Harald
- Subjects
multi-document ,Transformer ,transfer learning ,NLP ,textrank ,attention ,Language Technology (Computational Linguistics) ,text-to-text generation ,neural embeddings ,information extraction ,extractive summarization ,ROUGE ,fine-tuning ,Språkteknologi (språkvetenskaplig databehandling) ,BERT - Abstract
Publicly available data grows exponentially through web services and technological advancements. Multi-document summarization (MDS) can be used to comprehend large data streams, and this research investigates that area. Multiple systems for extractive multi-document summarization are implemented using modern techniques, in the form of the pre-trained BERT language model for word embeddings and sentence classification, combined with well-proven techniques in the form of the TextRank ranking algorithm, the Waterfall architecture, and anti-redundancy filtering. The systems are evaluated on the DUC-2002, 2006 and 2007 datasets using the ROUGE metric. The results show that the BM25 sentence representation implemented in the TextRank model, using the Waterfall architecture and an anti-redundancy technique, outperforms the other implementations, providing results competitive with other state-of-the-art systems. A cohesive model is derived from the leading system and tried in a user study using a real-world application: a real-time news detection application with users from the news domain. The study shows a clear preference for cohesive summaries in the case of extractive multi-document summarization, with the cohesive summary preferred in the majority of cases. (A sketch of BM25-weighted TextRank follows this record.)
- Published
- 2019
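A minimal sketch of the leading configuration's core idea, as referenced above: use BM25 as the (asymmetric) sentence-to-sentence similarity in the TextRank graph and rank with PageRank. This uses a simplified BM25 (k1 = 1.5, b = 0.75) over whitespace tokens and omits the Waterfall architecture and anti-redundancy filtering; it is an illustration, not the thesis code.

```python
import math
import numpy as np
import networkx as nx

def bm25_matrix(sentences, k1=1.5, b=0.75):
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = {}
    for d in docs:
        for t in set(d):
            df[t] = df.get(t, 0) + 1
    idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}
    sim = np.zeros((n, n))
    for i, d in enumerate(docs):          # sentence i plays the "document" role
        tf = {t: d.count(t) for t in set(d)}
        for j, q in enumerate(docs):      # sentence j plays the "query" role
            if i == j:
                continue
            sim[i, j] = sum(
                idf[t] * tf.get(t, 0) * (k1 + 1)
                / (tf.get(t, 0) + k1 * (1 - b + b * len(d) / avgdl))
                for t in set(q))
    return sim

def rank_sentences(sentences, k=3):
    # BM25 is asymmetric, so keep the graph directed.
    g = nx.from_numpy_array(bm25_matrix(sentences), create_using=nx.DiGraph)
    scores = nx.pagerank(g)
    return [sentences[i] for i in sorted(scores, key=scores.get, reverse=True)[:k]]
```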
9. Big Data Text Summarization - Hurricane Harvey
- Author
-
Geissinger, Jack, Long, Theo, Jung, James, Parent, Jordan, and Rizzo, Robert
- Subjects
abstractive summarization ,text summarization ,deep learning ,topic summarization ,neural networks ,NLP ,computational linguistics ,big data text summarization ,pointer-generator network ,big data ,template filling ,multi-document summarization ,Hurricane Harvey ,hurricanes ,TextRank ,information extraction ,natural language processing ,extractive summarization ,event summarization - Abstract
Natural language processing (NLP) has advanced in recent years. Accordingly, we present progressively more complex generated text summaries on the topic of Hurricane Harvey. We first utilized TextRank, an unsupervised extractive summarization algorithm. TextRank is computationally expensive, and the sentences it generates aren't always directly related or essential to the topic at hand. When evaluating TextRank, we found that a single interjected sentence ruined the flow of the summary. We also found that the ROUGE evaluation for our TextRank summary was quite low compared to a gold standard that was prepared for us, although the TextRank summary scored well on ROUGE compared to the Wikipedia article lead for Hurricane Harvey. To improve upon TextRank, we utilized template summarization with named entities. Template summarization takes less time to run than TextRank but is supervised by the author of the template and script, who chooses valuable named entities; it is thus highly dependent on human intervention to produce reasonable, readable summaries that aren't error-prone. As expected, the template summary evaluated well against both the gold standard and the Wikipedia article lead, mainly because we could include the named entities we thought were pertinent to the summary. Beyond extractive methods like TextRank and template summarization, we pursued abstractive summarization using pointer-generator networks, and multi-document summarization with pointer-generator networks and maximal marginal relevance. The benefit of abstractive summarization is that it is more in line with how humans summarize documents. Pointer-generator networks, however, require GPUs to run properly and a large amount of training data; luckily, we were able to use a pre-trained network to generate summaries. The pointer-generator network is the centerpiece of our abstractive methods and allowed us to create these summaries in the first place. NLP is at an inflection point due to deep learning, and our summaries generated with a state-of-the-art pointer-generator neural network are filled with details about Hurricane Harvey, including the damage incurred, the average amount of rainfall, and the locations it affected the most; the summary is also free of grammatical errors. We also use a novel Python library, written by Logan Lebanoff at the University of Central Florida, for multi-document summarization using deep learning, to summarize our Hurricane Harvey dataset of 500 articles and the Wikipedia article for Hurricane Harvey. The summary of the Wikipedia article is our final summary and has the highest ROUGE scores that we could attain. (A sketch of the TextRank summarizer script follows this record.) NSF: IIS-1619028
- BDTS_Hurricane_Harvey_final_report.docx: Editable version of the final report
- BDTS_Hurricane_Harvey_final_report.pdf: PDF version of the final report
- BDTS_Hurricane_Harvey_presentation.pptx: Editable version of the presentation slides
- BDTS_Hurricane_Harvey_presentation.pdf: PDF version of the presentation slides
Source files in zip:
- freq_words.py - Finds the most frequent words in a JSON file that contains a sentences field. Requires a file to be passed through the -f option.
- pos_tagging.py - Performs basic part-of-speech tagging on a JSON file that contains a sentences field. Requires a file to be passed through the -f option.
- textrank_summarizer.py - Performs TextRank summarization with a JSON file that contains a sentences field. Requires a file to be passed through the -f option.
- template_summarizer.py - Performs template summarization with a JSON file that contains a sentences field. Requires a file to be passed through the -f option.
- wikipedia_content.py - Extracts content from a Wikipedia page given a topic and formats the information for the pointer-generator network using the "make_datafiles.py" script. Requires a topic to be given in the -t option and an output directory for "make_datafiles.py" to read from with the -o option.
- make_datafiles.py - Called by "wikipedia_content.py" to convert story files to .bin files.
- jusText.py - Used to clean up the large dataset.
- requirements.txt - Used with Anaconda for installing all of the dependencies.
- small_dataset.json - Properly formatted JSON file for use with other files.
- Published
- 2018
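The scripts themselves are not reproduced in this record; below is a minimal sketch of what `textrank_summarizer.py` plausibly looks like, assuming (per the file listing) a JSON file with a top-level `sentences` list passed via the `-f` option. This is a guess at the shape under those assumptions, not the authors' code.

```python
import argparse
import json
import numpy as np
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def textrank(sentences, k=5):
    # Similarity graph over TF-IDF sentence vectors, ranked by PageRank.
    sim = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
    np.fill_diagonal(sim, 0.0)
    scores = nx.pagerank(nx.from_numpy_array(sim))
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return [sentences[i] for i in top]

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-f", "--file", required=True,
                        help="JSON file with a 'sentences' field")
    args = parser.parse_args()
    with open(args.file) as fh:
        data = json.load(fh)
    print("\n".join(textrank(data["sentences"])))
```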
10. Enhancing Readability of Privacy Policies Through Ontologies
- Author
-
Audich, Dhiren, Nonnecke, Blair, and Dara, Rozita
- Subjects
HCI ,data privacy ,feature extraction ,TF-IDF ,RAKE ,Human Computer Interaction ,artificial intelligence ,privacy ,AlchemyAPI ,NLP ,taxonomy ,keyword extraction ,comprehensive taxonomy ,online privacy ,ontology ,TextRank ,privacy policy ,natural language processing ,law - Abstract
Privacy policies operate as memorandums of understanding (MOUs) between the users and providers of online services. Research suggests that users are deterred from reading policies because of their length, difficult language, and insufficient information, and that users are more likely to read short excerpts if those excerpts immediately address their concerns. As a first step in helping users find pertinent information in privacy policies, this thesis presents the development of a domain ontology using natural language processing (NLP) algorithms as a way to reduce costs and speed up development. By using the ontology to locate key parts of privacy policies, average reading times were substantially reduced, from 8-12 minutes to 45 seconds. In the process of extracting keywords from the privacy policy corpus, a supervised NLP algorithm performed marginally better (7%) but showed greater promise with larger training sets. Additionally, trained non-domain experts achieved a combined F1-score of 71% when compared to a domain expert, and did so while extracting keywords from fewer policies. (A sketch of ontology-driven excerpt lookup follows this record.)
- Published
- 2018
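A minimal sketch of the end goal described in the abstract above: use ontology terms to surface the pertinent excerpts of a policy. The term set and the hit-count scoring here are hypothetical stand-ins for the thesis's actual ontology and matching logic.

```python
# Hypothetical ontology terms; the thesis derives these with NLP algorithms.
ONTOLOGY_TERMS = {"third party", "cookies", "data retention", "opt out"}

def pertinent_excerpts(policy_paragraphs, top_n=3):
    # Score each paragraph by how many ontology-term occurrences it contains,
    # then return the highest-scoring paragraphs as the "key parts" to read.
    def hits(paragraph):
        text = paragraph.lower()
        return sum(text.count(term) for term in ONTOLOGY_TERMS)
    ranked = sorted(policy_paragraphs, key=hits, reverse=True)
    return [p for p in ranked[:top_n] if hits(p) > 0]
```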
11. A Framework for Generating Extractive Summary from Multiple Malayalam Documents.
- Author
-
Manju, K., David Peter, S., and Idicula, Sumam Mary
- Subjects
SENTENCING guidelines (Criminal procedure) ,PERFORMANCE standards ,JOB performance ,DEEP learning ,PROBLEM solving ,ELECTRONIC newspapers - Abstract
Automatic extractive text summarization retrieves a subset of data that represents the most notable sentences in the entire document. In the era of digital explosion, in which most data is unstructured text, users need to understand huge amounts of text in a short time, which calls for an automatic text summarizer. From summaries, users get an idea of the entire content of a document and can decide whether to read it in full. This work focuses on generating a summary from multiple news documents; in this case, the summary helps to reduce redundant news across different newspapers. A multi-document summary is more challenging than a single-document summary, since it has to solve the problem of overlapping information among sentences from different documents. Extractive text summarization yields the salient parts of the document while discarding irrelevant and redundant sentences. In this paper, we propose a framework for extracting a summary from multiple documents in the Malayalam language. Since multi-document summarization data sets are sparse, methods based on deep learning are difficult to apply; the work therefore discusses the performance of existing standard algorithms for multi-document summarization of Malayalam. We propose a sentence extraction algorithm that selects the top-ranked sentences with maximum diversity. The system is found to perform well in terms of precision, recall, and F-measure on multiple input documents.
- Published
- 2021
- Full Text
- View/download PDF