21 results on '"TEXT summarization"'
Search Results
2. Pre-training Classification and Clustering Models for Vietnamese Automatic Text Summarization
- Author
-
Nguyen, Ti-Hon, Do, Thanh-Nghi, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Sharma, Harish, editor, Shrivastava, Vivek, editor, Bharti, Kusum Kumari, editor, and Wang, Lipo, editor
- Published
- 2023
- Full Text
- View/download PDF
3. Extractive Text Summarization on Large-Scale Dataset Using K-Means Clustering and Word Embedding
- Author
-
Nguyen, Ti-Hon, Do, Thanh-Nghi, Xhafa, Fatos, Series Editor, Smys, S., editor, Lafata, Pavel, editor, Palanisamy, Ram, editor, and Kamel, Khaled A., editor
- Published
- 2023
- Full Text
- View/download PDF
4. Extractive text summarization using clustering-based topic modeling.
- Author
-
Belwal, Ramesh Chandra, Rai, Sawan, and Gupta, Atul
- Subjects
-
*TEXT summarization, *DOCUMENT clustering, *DISTRIBUTION (Probability theory)
- Abstract
Text summarization is the process of converting the input document into a short form, provided that it preserves the overall meaning associated with it. Primarily, text summarization is achieved in two ways, i.e., abstractive and extractive. Extractive summarizers select a few best sentences out of the input document, while abstractive methods may modify the sentence structure or introduce new sentences. The proposed approach is an extractive text summarization technique, where we have expanded topic modeling specifically to be applied to multiple lower-level specialized entities (i.e., groups) embedded in a single document. Our goal is to overcome the lack of coherence issues found in the summarization techniques. Topic modeling was initially proposed to model text data at the multi-document and word levels without considering sentence modeling. Subsequently, it has been applied at the sentence level and used for the document summarization; however, certain limitations were associated. Topic modeling does not perform as expected when applied to a single document at the sentence level. To address this shortcoming, we have proposed a summarization approach that is incorporated at the individual document and clusters level (instead of the sentence level). We aim to choose the best statement from each group (containing sentences of the same kind) found in the given text. We have tried to select the perfect topic by evaluating the probability distribution of the words and respective topics' at the cluster level. The method is evaluated on two standard datasets and shows significant performance gains over existing text summarization techniques. Compared to other text summarization techniques, the Rouge parameters for automatic evaluation show a considerable improvement in F-measure, precision, and recall of the generated summary. 
Furthermore, a manual evaluation has demonstrated that the proposed approach outperforms the current state-of-the-art text summarization approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
5. Extractive Text Summarization on Large-scale Dataset Using K-Means Clustering
- Author
-
Nguyen, Ti-Hon, Do, Thanh-Nghi, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Fujita, Hamido, editor, Fournier-Viger, Philippe, editor, Ali, Moonis, editor, and Wang, Yinglin, editor
- Published
- 2022
- Full Text
- View/download PDF
6. Frequent item-set mining and clustering based ranked biomedical text summarization.
- Author
-
Gupta, Supriya, Sharaff, Aakanksha, and Nagwani, Naresh Kumar
- Subjects
-
*TEXT summarization, *SCIENTIFIC literature, *LITERARY form, *VALUES (Ethics)
- Abstract
The difficulty of deriving value out of vast available scientific literature in a condensed form led us to look for a proficient theme based summarization solution which can preserve precise biomedical content. The study targets to analyze impact of combining semantic biomedical concepts extraction, frequent item-set mining and clustering techniques over information retention, objective functions and ROUGE values for the obtained final summary. The suggested frequent item-set mining and clustering (FRI-CL) graph-based framework uses UMLS Metathesaurus and BERT-based semantic embeddings to identify domain-relevant concepts. The scrutinized concepts are mined according to their relationship with neighbors and frequency via an amended FP-Growth model. The framework utilizes S-DPMM clustering, which is a probabilistic mixture model and aids in the identification and clubbing of complex relevant patterns to increase coverage of important sub-themes. The sentences with the frequent concepts are scored via PageRank to form an efficient and compelling summary. The research experiments on the 100 sample biomedical documents taken from PubMed archives are evaluated via calculation of ROUGE scores, coverage, readability, non-redundancy, memory utilization and information retention from the summary output. The results with the FRI-CL summarization system showcased 10% ROUGE performance improvement and are at par with the other baseline methods. On an average 30–40% improvement in memory utilization is observed with up to 50% information retention when experiments are performed using S-DPMM clustering. The research indicates that the fusion of semantic mapping, clustering, along with frequent-item set mining of biomedical concepts enhance the overall co-related information covering all sub-themes. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
7. An Approach for Journal Summarization Using Clustering Based Micro-Summary Generation
- Author
-
Mojeed, Hammed A., Sanoh, Ummu, Salihu, Shakirat A., Balogun, Abdullateef O., Bajeh, Amos O., Akintola, Abimbola G., Mabayoje, Modinat A., Usman-Hamzah, Fatimah E., Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Silhavy, Radek, editor, Silhavy, Petr, editor, and Prokopova, Zdenka, editor
- Published
- 2020
- Full Text
- View/download PDF
8. An Approach for Video Summarization Using Graph-Based Clustering Algorithm
- Author
-
Yasmin, Ghazaala, Chaterjee, Aditya, Das, Asit Kumar, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Das, Asit Kumar, editor, Nayak, Janmenjoy, editor, Naik, Bighnaraj, editor, Pati, Soumen Kumar, editor, and Pelusi, Danilo, editor
- Published
- 2020
- Full Text
- View/download PDF
9. Scientific document summarization in multi-objective clustering framework.
- Author
-
Mishra, Santosh Kumar, Saini, Naveen, Saha, Sriparna, and Bhattacharyya, Pushpak
- Subjects
DIFFERENTIAL evolution, SCIENTIFIC method, PROBLEM solving
- Abstract
The exponential growth in the number of scientific articles has made it difficult for the researchers to keep themselves updated with the new developments. Scientific document summarization solves this problem by providing a summary of essential contributions. In this paper, we have presented a novel method of scientific document summarization using a multi-objective differential evolution technique. Here, firstly distinct important sentences are extracted by using citation contextualization. These sentences are further clustered using the concept of multi-objective clustering. Two objective functions, PBM index, and XB index, measuring the compactness and separation of sentence clusters, are simultaneously optimized utilizing the search capability of multi-objective differential evolution. We have conducted our experiments on CL-SciSumm 2016, CL-SciSumm 2017, CL-SciSumm 2018, and CL-SciSumm 2019 datasets. Obtained results of CL-SciSumm 2016 and CL-SciSumm 2017 are compared with the state-of-the-art methods. Evaluation results demonstrate that our method outperforms others in terms of ROUGE-2, ROUGE-3, and ROUGE-SU4 scores. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
10. A New Technique for Extrinsic Text Summarization
- Author
-
Kindo, Nishita, Bhuyan, Gananatha, Padhy, Ronali, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Peng, Sheng-Lung, editor, Dey, Nilanjan, editor, and Bundele, Mahesh, editor
- Published
- 2019
- Full Text
- View/download PDF
11. A Variable Dimension Optimization Approach for Text Summarization
- Author
-
Verma, Pradeepika, Om, Hari, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Yadav, Neha, editor, Yadav, Anupam, editor, Bansal, Jagdish Chand, editor, Deep, Kusum, editor, and Kim, Joong Hoon, editor
- Published
- 2019
- Full Text
- View/download PDF
12. MAPREDUCE FRAMEWORK BASED BIG DATA SUMMARIZATION USING HIDDEN MARKOV MODEL AND DBSCAN.
- Author
-
Belerao, Krushnadeo and Chaudhari, S. B.
- Subjects
BIG data, HIDDEN Markov models, DATA analysis, COMPUTER algorithms, MACHINE learning
- Abstract
With the advent of the internet there is vast increase in the storage of information. Almost all the information exists in digital form which reduces lots of the paper work and increases ease of storage. Searching relevant information in collection of documents is a tedious task. The solution comes in picture for this problem is automatic text summarization. In this paper, abstract summary generation of multiple documents for big data is proposed which will consider user input as topic. The proposed technique is designed using DBSCAN algorithm which works with Map Reduce framework for clustering and Hidden Markov Model for summarization. The summarization process is performed in three main stages and provides a modular implementation of multiple documents summarization. The proposed method follows preprocessing step in which documents are scanned with similarity and various machine learning technique are applied. The result of applying clustering enhances the summarizer system to collect exact words rather than copying redundant words. Topic based abstract summarization from big data is challenging task particularly when there are multiple documents with same or different content. Hadoop with its programming techniques can provide better ways of generating summary and it also enhances the complexity of summarization process using distributed computing. [ABSTRACT FROM AUTHOR]
- Published
- 2018
13. Empirical Analysis of Single and Multi Document Summarization using Clustering Algorithms
- Author
-
M. S. Bewoor and S. H. Patil
- Subjects
Text mining, text summarization, clustering, Engineering (General). Civil engineering (General), TA1-2040, Technology (General), T1-995, Information technology, T58.5-58.64
- Abstract
The availability of various digital sources has created a demand for text mining mechanisms. Effective summary generation mechanisms are needed in order to utilize relevant information from often overwhelming digital data sources. In this view, this paper conducts a survey of various single as well as multi-document text summarization techniques. It also provides analysis of treating a query sentence as a common one, segmented from documents for text summarization. Experimental results show the degree of effectiveness in text summarization over different clustering algorithms.
- Published
- 2018
14. Hybrid Approach To Abstractive Summarization.
- Author
-
Sahoo, Deepak, Bhoi, Ashutosh, and Balabantaray, Rakesh Chandra
- Subjects
INFORMATION retrieval, ABSTRACTING, MARKOV processes, CLUSTER analysis (Statistics), DATA fusion (Statistics)
- Abstract
Text summarization is an application of information retrieval where short and non-redundant version of comparatively large text is presented to the end user. In this paper a hybrid approach is presented to generate abstract summary in which sentences are clustered using sentence level relationships among sentences in association with Markov clustering principle. Then sentence ranking is done in each cluster and if possible the top weighted sentence of each cluster is fused using some linguistic rules with its best-fit sentence(if found) within that cluster to generate a new sentence. Then top ranked sentences from each cluster are compressed using classification technique to generate the abstract summary. The proposed system is evaluated with DUC 2002 data set and the result is performing better than other existing systems. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
15. Empirical Analysis of Single and Multi Document Summarization using Clustering Algorithms.
- Author
-
Bewoor, Mrunal S. and Patil, Suhas H.
- Subjects
TEXT mining, INFORMATION retrieval, DOCUMENT clustering, ALGORITHMS, ELECTRONIC information resources
- Abstract
The availability of various digital sources has created a demand for text mining mechanisms. Effective summary generation mechanisms are needed in order to utilize relevant information from often overwhelming digital data sources. In this view, this paper conducts a survey of various single as well as multi-document text summarization techniques. It also provides analysis of treating a query sentence as a common one, segmented from documents for text summarization. Experimental results show the degree of effectiveness in text summarization over different clustering algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
16. Opinion mining with reviews summarization based on clustering
- Author
-
Marzijarani, Shabnam Bagheri and Sajedi, Hedieh
- Published
- 2020
- Full Text
- View/download PDF
17. Automatic Text Summarization Using Latent Drichlet Allocation (LDA) for Document Clustering.
- Author
-
Hidayat, Erwin Yudi, Firdausillah, Fahri, Hastuti, Khafiizh, Dewi, Ika Novita, and Azhari
- Subjects
DOCUMENT clustering, FEATURE extraction, COMPUTER algorithms, K-means clustering, ELECTRONIC data processing
- Abstract
In this paper, we present Latent Drichlet Allocation in automatic text summarization to improve accuracy in document clustering. The experiments involving 398 data set from public blog article obtained by using python scrapy crawler and scraper. Several steps of clustering in this research are preprocessing, automatic document compression using feature method, automatic document compression using LDA, word weighting and clustering algorithm. The results show that automatic document summarization with LDA reaches 72% in LDA 40%, compared to traditional k-means method which only reaches 66%. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
18. A topic modeled unsupervised approach to single document extractive text summarization.
- Author
-
Srivastava, Ridam, Singh, Prabhav, Rana, K.P.S., and Kumar, Vineet
- Subjects
-
*DEEP learning, *NATURAL language processing, *SENTIMENT analysis, *TEXT summarization
- Abstract
Automatic Text Summarization (ATS) is an essential field in natural language processing that attempts to condense large text documents so that users can assimilate information quickly. It finds uses in medical document summarization, review generation, and opinion mining. This work investigated an unsupervised extractive summarization approach that combined clustering with topic modeling to reduce topic bias. Latent Dirichlet Allocation was used for topic modeling, while K-Medoids clustering was employed for summary generation. The approach was evaluated on three datasets—Wikihow, CNN/DailyMail, and the DUC2002 Corpus. The Recall Oriented-Understudy for Gisting Evaluation (ROUGE) metrics were used for comparative analysis against recently reported techniques, specifically ROUGE-1 (R-1), ROUGE-2 (R-2), and ROUGE-L (R-L). The suggested framework offered scores of 34.80%, 9.13%, and 32.30% on the Wikihow Dataset, 43.90%, 19.01%, and 41.50% on the CNN/DailyMail Dataset, and 49.35%, 31.53%, and 41.72% on the DUC2002 Corpus (R-1, R-2, R-L respectively). These reported metrics are found to be superior when compared to similar recent works. Further, execution time of the proposed method was also recorded and compared with counterparts, which established its superior speed. Based on these promising outcomes, it was concluded that an unsupervised extractive summarization approach with greater subtopic focus significantly improves over generic topic modeling semantic and deep learning approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
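Note on record 18: the abstract reports ROUGE-1, ROUGE-2 and ROUGE-L scores. For readers unfamiliar with the metric, the following is a minimal sketch of ROUGE-N as clipped n-gram overlap between a candidate summary and one reference. It is an illustration only, not the evaluation code used by the authors; the function names (ngrams, rouge_n) and the whitespace tokenization are assumptions made for this example, and published scores normally come from a standard ROUGE toolkit with its own preprocessing.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, F1) of n-gram overlap.

    Assumption: lowercased whitespace tokenization, single reference.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection clips each n-gram count to the smaller of the two.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f = rouge_n("the cat sat on the mat", "the cat lay on the mat", n=1)
    print(f"ROUGE-1 precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In the usage example, five of the six candidate unigrams also occur in the reference, so precision and recall are both 5/6.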
19. Extractive Text Summarization of Greek News Articles Based on Sentence-Clusters
- Author
-
Kantzola, Evangelia
- Subjects
word embeddings, sentence embeddings, machine learning, text summarization, nlp, Språkteknologi (språkvetenskaplig databehandling), clustering, Language Technology (Computational Linguistics)
- Abstract
This thesis introduces an extractive summarization system for Greek news articles based on sentence clustering. The main purpose of the paper is to evaluate the impact of three different types of text representation, Word2Vec embeddings, TF-IDF and LASER embeddings, on the summarization task. By taking these techniques into account, we build three different versions of the initial summarizer. Moreover, we create a new corpus of gold standard summaries to evaluate them against the system summaries. The new collection of reference summaries is merged with a part of the MultiLing Pilot 2011 in order to constitute our main dataset. We perform both automatic and human evaluation. Our automatic ROUGE results suggest that System A which employs Average Word2Vec vectors to create sentence embeddings, outperforms the other two systems by yielding higher ROUGE-L F-scores. Contrary to our initial hypotheses, System C using LASER embeddings fails to surpass even the Word2Vec embeddings method, showing sometimes a weak sentence representation. With regard to the scores obtained by the manual evaluation task, we observe that System A using Average Word2Vec vectors and System C with LASER embeddings tend to produce more coherent and adequate summaries than System B employing TF-IDF. Furthermore, the majority of system summaries are rated very high with respect to non-redundancy. Overall, System A utilizing Average Word2Vec embeddings performs quite successfully according to both evaluations.
- Published
- 2020
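Note on record 19: the abstract describes building sentence embeddings by averaging Word2Vec vectors and summarizing through sentence clusters. A minimal sketch of that general idea follows. It is not the thesis implementation: the helper names (sentence_vector, centroid, pick_representatives) are hypothetical, the cluster labels are assumed to come from some clustering step not shown here, and choosing the sentence nearest each cluster centroid is one common selection rule assumed for illustration.

```python
import math


def sentence_vector(tokens, word_vectors, dim):
    """Average the vectors of known words; zero vector if none are known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def pick_representatives(vectors, labels):
    """For each cluster label, return the index of the sentence whose
    vector is closest (Euclidean distance) to the cluster centroid.
    Indices are returned in document order."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    chosen = []
    for label in sorted(clusters):
        members = clusters[label]
        center = centroid([vectors[i] for i in members])
        best = min(members, key=lambda i: math.dist(vectors[i], center))
        chosen.append(best)
    return sorted(chosen)


if __name__ == "__main__":
    # Toy 2-dimensional "word vectors"; real Word2Vec vectors are much larger.
    toy_vectors = {"news": [1.0, 0.0], "greek": [0.0, 1.0]}
    print(sentence_vector(["greek", "news", "unknown"], toy_vectors, dim=2))
    sentence_vecs = [[0, 0], [1, 0], [2, 0], [10, 10], [10, 12], [10, 11]]
    print(pick_representatives(sentence_vecs, [0, 0, 0, 1, 1, 1]))
```

Returning the chosen indices in document order keeps the extracted sentences in their original sequence, which tends to help readability of the resulting summary.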
20. Text Analytics and Machine Learning (TML) CS5604 Fall 2019
- Author
-
Mansur, Rifat Sabbir, Mandke, Prathamesh, Gong, Jiaying, Bharadwaj, Sandhya M., Juvekar, Adheesh Sunil, and Chougule, Sharvari
- Subjects
recommender system, electronic thesis and dissertation, named-entity recognition, sentiment analysis, tobacco documents, text summarization, search optimization, clustering
- Abstract
In order to use the burgeoning amount of data for knowledge discovery, it is becoming increasingly important to build efficient and intelligent information retrieval systems. The challenge in informational retrieval lies not only in fetching the documents relevant to a query but also in ranking them in the order of relevance. The large size of the corpora as well as the variety in the content and the format of information pose additional challenges in the retrieval process. This calls for the use of text analytics and machine learning techniques to analyze and extract insights from the data to build an efficient retrieval system that enhances the overall user experience. With this background, the goal of the Text Analytics and Machine Learning team is to suitably augment the document indexing and demonstrate a qualitative improvement in the document retrieval. Further, we also plan to make use of document browsing and viewing logs to provide meaningful recommendations to the user. The goal of the class is to build an end-to-end information retrieval system for two document corpora, viz., Electronic Theses & Dissertations (ETDs) and Tobacco Settlement Records (TSRs). The ETDs are a collection of over 33,000 thesis and dissertation documents in VTechWorks at Virginia Tech. The challenge in building a retrieval system around this corpus lies in the distinct nature of ETDs as opposed to other well studied document formats such as conference/journal publications and web-pages. The TSR corpus consists of over 14M records covering formats ranging from letters and memos to image based advertisements. We seek to understand the nature of both these corpora as well as the information need patterns of the users in order to augment the index based search with domain specific information using machine learning based methods.
Extending prior experiments, we investigate reasons for the unbalanced nature of the clusters from the previous iterations of the K-Means algorithm on the tobacco data. In addition, we explore and present preliminary results of running Agglomerative Clustering on a small subset of the tobacco data. We also explored different pre-trained models of detecting sentiments. We identified a package, empath, that shows better results in identifying emotions in the tobacco deposition documents. Besides, we implemented text summarization based on both Latent Semantic Analysis and the Luhn Algorithm on the tobacco (article) data (38,038 documents). We also implemented text summarization on a sample ETD chapter dataset.
IMLS LG-37-19-0078-19
Dr. David M. Townsend
# TMLreport.pdf = This is the report of our overall motivation, design, procedures, evaluations, etc. The report was created using Latex via Overleaf.
# TMLreportOverleaf.zip = This is the source files from the Overleaf of our report. This can be used to make future changes to our report.
# TMLpresentationPDF.pdf = This is our final presentation slides in PDF format. It contains our overall exploration and findings.
# TMLpresentationSlides.pptx = This is our editable PowerPoint files to our presentation slides.
# TMLcodeClustering.zip = This contains all the source code for clustering.
# TMLcodeTextSummarization = This contains all the source code for text summarization
# TMLcodeNER = This contains all the source code and sample output for named-entity recognition (NER)
# TMLcodeSentimentAnalysis = This contains all the source code for sentiment analysis
- Published
- 2019
21. Deep contextualized embeddings for quantifying the informative content in biomedical text summarization.
- Author
-
Moradi, Milad, Dorffner, Georg, and Samwald, Matthias
- Subjects
-
*EMBEDDINGS (Mathematics), *NATURAL language processing, *HYBRID systems, *DEEP learning
- Abstract
• A deep bidirectional language model is used to capture the context of sentences.
• The shared content between sentences is quantified using contextualized embeddings.
• A hierarchical clustering algorithm is utilized to identify the most relevant sentences.
• The summarizer improves the performance of biomedical text summarization.
• Contextualized embeddings can effectively capture the context in biomedical summarization.
Capturing the context of text is a challenging task in biomedical text summarization. The objective of this research is to show how contextualized embeddings produced by a deep bidirectional language model can be utilized to quantify the informative content of sentences in biomedical text summarization. We propose a novel summarization method that utilizes contextualized embeddings generated by the Bidirectional Encoder Representations from Transformers (BERT) model, a deep learning model that recently demonstrated state-of-the-art results in several natural language processing tasks. We combine different versions of BERT with a clustering method to identify the most relevant and informative sentences of input documents. Using the ROUGE toolkit, we evaluate the summarizer against several methods previously described in literature. The summarizer obtains state-of-the-art results and significantly improves the performance of biomedical text summarization in comparison to a set of domain-specific and domain-independent methods. The largest language model not specifically pretrained on biomedical text outperformed other models. However, among language models of the same size, the one further pretrained on biomedical text obtained best results. We demonstrate that a hybrid system combining a deep bidirectional language model and a clustering method yields state-of-the-art results without requiring labor-intensive creation of annotated features or knowledge bases or computationally demanding domain-specific pretraining.
This study provides a starting point towards investigating deep contextualized language models for biomedical text summarization. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library