Descriptor: "TEXT summarization" / Search Limiters: Academic (Peer-Reviewed) Journals / Topic: text mining - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"TEXT summarization"' showing total 33 results

1. A New Algorithm for Arabic Document Clustering Utilizing Maximal Wordsets.

Author: Salman, Khitam A. and Khafaji, Hussein K.
Subjects: DOCUMENT clustering, NATURAL language processing, DATA structures, TEXT summarization, ALGORITHMS, TEXT mining
Abstract: Arabic document clustering (ADC) is a critical task in Arabic Natural Language Processing (ANLP), with applications in text mining, information retrieval, Arabic search engines, sentiment analysis, topic modeling, document summarization, and user review analysis. Despite the critical need for ADC, the available ADC algorithms have achieved limited success based on the evaluation metrics used for clustering. This paper proposes a novel method for clustering Arabic documents. The method leverages Maximal Frequent Wordsets (MFWs). The MFWs are extracted using the FPMax algorithm, a data mining technique adept at identifying significant recurring word patterns within the documents. These MFWs serve as features for a new clustering approach that groups documents based on content similarity. Each MFW serves as a data structure housing features, their respective strengths in clustering, and the corresponding documents, simplifying the clustering process to a mere measurement of similarity. The proposed approach offers various clustering results for varying numbers of clusters in one training session. The effectiveness of the proposed method is assessed using two well-known benchmark datasets (CNN and OSAC), achieving accuracy of 80% and 81% respectively. This approach offers a promising contribution to the field of ANLP. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
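
The idea of a maximal frequent wordset in the abstract above can be illustrated with a small sketch. This is not the FPMax algorithm the authors use; it is a brute-force enumeration, suitable only for tiny toy vocabularies, that returns the same kind of output: word sets that occur in at least `min_support` documents and are not contained in any larger frequent set. The function name, the whitespace tokenization, and the `min_support` parameter are assumptions made for illustration.

```python
from itertools import combinations


def maximal_frequent_wordsets(documents, min_support):
    """Brute-force maximal frequent wordsets (illustrative only, NOT FPMax).

    A wordset is frequent if every word in it appears together in at least
    `min_support` documents; it is maximal if no frequent proper superset exists.
    """
    # Assumption: simple lowercase whitespace tokenization.
    doc_sets = [set(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set().union(*doc_sets)) if doc_sets else []

    frequent = []
    for size in range(1, len(vocabulary) + 1):
        found_any = False
        for combo in combinations(vocabulary, size):
            candidate = frozenset(combo)
            support = sum(1 for words in doc_sets if candidate <= words)
            if support >= min_support:
                frequent.append(candidate)
                found_any = True
        # If no set of this size is frequent, no larger set can be either.
        if not found_any:
            break

    # Keep only the sets that have no frequent proper superset.
    return [s for s in frequent if not any(s < other for other in frequent)]
```

Each returned wordset could then act as a feature shared by the documents that contain it, which is the role the abstract assigns to MFWs; how the paper scores or groups documents from there is not specified in the abstract and is not sketched here.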

2. Bidirectional recommendation in HR analytics through text summarization.

Author: Arandi, Channabasamma and Yeresime, Suresh
Subjects: TEXT summarization, TEXT mining, JOB resumes, ELECTRONIC information resource searching, JOB qualifications, SUPPLY & demand
Abstract: For over a decade, online job portals have been providing their services to both job seekers and employers in search of hiring opportunities. Because of the high demand for recruitment, conventional hiring methods are insufficient for finding a suitable candidate to fill a position. Validating resumes online is challenging due to the potential for manual errors, making the process inherently risky. The bidirectional method uses named entity recognition (NER) to extract the required resumes for recruiters. Cosine similarity shows the match percentage of resumes against the job requirements and vice versa. To tackle the issue of unregistered words, a solution called decoder attention with pointer network (DA-PN) is introduced. This method incorporates a coverage mechanism to prevent word repetition in the generated text summary. The DA-PN+Cover method with a mixed learning objective (MLO) (DA-PN+Cover+MLO) is used to keep errors from accumulating in the generated text summary. The performance of the proposed method is estimated using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) indicator and attains an average of 27.47, which is comparatively higher than existing methods. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
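
The cosine-similarity match percentage mentioned in the abstract above can be sketched as follows. The abstract does not say how the texts are vectorized, so this example assumes plain term-count vectors over whitespace tokens; the function names `cosine_similarity` and `match_percentage` are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between term-count vectors of two texts."""
    # Assumption: lowercase whitespace tokens and raw counts (no TF-IDF, no NER).
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text matches nothing
    return dot / (norm_a * norm_b)


def match_percentage(resume: str, job_description: str) -> float:
    """Express the similarity as a percentage, as a job portal might display it."""
    return 100.0 * cosine_similarity(resume, job_description)
```

Because cosine similarity is symmetric, the same score serves both directions of the recommendation: ranking resumes for a job and ranking jobs for a resume.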

3. From Text to Action: NLP Techniques for Washing Machine Manual Processing.

Author: Biju, Vinai George, Babu, Bibin, Asghar, Ali, Prathap, Boppuru Rudra, and Reddy, Vandana
Subjects: WASHING machines, TEXT summarization, LANGUAGE models, QUESTION answering systems, NATURAL language processing, TEXT mining
Abstract: This scientific research study focuses on the advancements in Natural Language Processing (NLP) driven by large-scale parallel corpora and presents a comprehensive methodology for creating a parallel, multilingual corpus using NLP techniques and semantic technologies, with a particular focus on washing machine manuals. The study highlights the significant progress made in NLP through the utilization of large-scale parallel corpora and advanced NLP techniques. The successful creation of a parallel, multilingual corpus for washing machine manuals, coupled with the integration of semantic technologies and ontology modeling, demonstrates the broad applicability and potential of NLP in diverse domains. The research covers various aspects, including text extraction, segmentation, and the development of specialized pipelines for question-answering, translation, and text summarization tailored for washing machine manuals. Translation experiments using fine-tuned models demonstrated the feasibility of providing washing machine manuals in local languages, expanding accessibility and understanding for users worldwide. Additionally, the study explored text summarization using a powerful transformer-based model, which exhibited remarkable proficiency in generating concise and coherent summaries from complex input texts. The implementation of a question-answering pipeline showcased the effectiveness of various language models in handling question-answering tasks with high accuracy and effectiveness. Additionally, the article discusses the processes of data collection, information preparation, ontology creation, alignment strategies, and text analytics. Furthermore, the study addresses the challenges and potential future developments in this field, offering insights into the promising applications of NLP in the context of washing machine manuals. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

4. Hybrid Technique of Topic Modelling and Text Summarization: A Case Study on Predicting Trends in Green Computing.

Author: Pandey, Mansi, Sharma, Chetan, Sharma, Shamneesh, and Aggarwal, Trapty
Subjects: TEXT summarization, AUTOMATIC summarization, NATURAL language processing, TEXT mining, LATENT semantic analysis
Abstract: Text mining techniques are used for trend prediction using keyword analysis; however, these processes produce both relevant and irrelevant keywords. Based on the keywords, various clusters are formed, resulting in various topics. Due to the presence of irrelevant keywords, wrong topics may be formed. To overcome this problem, this research contributes an algorithm for topic prediction using a novel technique in which text summarization is incorporated into topic modeling algorithms. This research focuses on implementing text summarization to generate summaries of publications by diverse researchers, using the Gensim library with an extractive text summarization approach, and then applying text mining to the summaries to predict trends in various fields. The current approach was compared with existing techniques based on the parameters used in automatic and semi-automatic text mining techniques. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

5. LemmaQuest Lemmatizer: A Morphological Analyzer Handling Nominalization.

Author: Gupta, Rupam and Jivani, Anjali G.
Subjects: *TEXT summarization, *TENSE (Grammar), *ENCYCLOPEDIAS & dictionaries, *TEXT mining
Abstract: The discussion in this paper is related to extracting a single lemma from different morphological variants related to a particular dictionary root word. The existing popular online lemmatizers like the Stanford LemmaProcessor, spaCy Lemmatizer, LemmaGen, MorphAdorner, etc. generate the correct lemmas for all singular-plural nouns and all verbal words existing in different tenses, but none of these lemmatizers is able to derive the correct lemma for derived words, especially nominalized derived words. The proposed lemmatizer, 'LemmaQuest', is designed and implemented to overcome these limitations. LemmaQuest first creates distinct groups for all allied morphed words like singular-plural nouns, verbs in all tenses, and nominalized words. These groups are created based on a combination of different statistical distance measures considering all possible pairs of input words. After that, lemmas are generated for each group. The main objective of this proposed model is to extract the correct lemma for a large set of input words in an optimized time, which leads to a vast improvement in text simplification, keyword extraction, text summarization and other text mining applications. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

6. Combination of Graph-based Approach and Sequential Pattern Mining for Extractive Text Summarization with Indonesian Language.

Author: Maylawati, Dian Sa'adillah, Kumar, Yogan Jaya, and Kasmin, Fauziah Binti
Subjects: SEQUENTIAL pattern mining, TEXT summarization, TEXT mining, INDONESIAN language
Abstract: The great challenge in Indonesian automatic text summarization research is producing readable summaries. The quality of text summary can be reached if the meaning of the text can be maintained properly. As a result, the purpose of this study is to improve the quality of extractive Indonesian automatic text summarization by considering the quality of structured text representation. This study employs Sequential Pattern Mining (SPM) to generate a sequence of words as a structured representation of text and a graph-based approach to generate automatic text summarization. The SPM algorithm used is PrefixSpan, and the graph-based approach uses the Bellman-Ford algorithm. The results of an experiment using the IndoSum dataset show that combining SPM and Bellman-Ford can improve the precision, recall, and f-measure of ROUGE-1, ROUGE-2, and ROUGE-L. When Bellman-Ford is combined with SPM, the F-measure of ROUGE-1 increases from 0.2299 to 0.3342. The ROUGE-2 f-measure increases from 0.1342 to 0.2191, and the ROUGE-L f-measure increases from 0.1904 to 0.2878. This result demonstrates that SPM can improve the performance of the Bellman-Ford algorithm in producing Indonesian text summaries. [ABSTRACT FROM AUTHOR]
Published: 2023
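
The ROUGE-1 and ROUGE-2 precision, recall, and F-measure figures reported in the abstract above are based on n-gram overlap between a generated summary and a reference summary. A minimal sketch of that computation follows, assuming lowercase whitespace tokenization and a single reference; the official ROUGE toolkit offers further options (such as stemming and stop-word removal) that are not reproduced here, and ROUGE-L, which uses the longest common subsequence, is not covered. The function names are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1):
    """Return (precision, recall, f_measure) for ROUGE-N with one reference."""
    cand = ngram_counts(candidate.lower().split(), n)
    ref = ngram_counts(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

For example, `rouge_n("the cat sat", "the cat sat on the mat", n=1)` gives a precision of 1.0 and a recall of 0.5, since all three candidate words appear in the six-word reference.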

7. A Comparative Study and Analysis of Text Summarization Methods

Author: Akinul Islam Jony, Anika Tahsin Rithin, and Siam Ibne Edrish
Subjects: NLP, Text mining, Text Summarization, Extractive, Abstractive, Technology
Abstract: Various text summarization methods, such as extractive, abstractive, and human abstraction concepts, have been compared in terms of performance, each with its specialties and limitations. This research analyses comparisons among the methods and some of their techniques used in text summarization. Our initial contribution is to suggest a thorough overview of the methods. The research methodology aims to compare text summarization methods through a systematic literature review to understand the topic and select appropriate methods. The search method involves keyword-based and citation-based techniques using academic search engines. The comparison of methods will consider various evaluation criteria such as document structure, content importance, quantitative approach, qualitative approach, dependency on machine learning, sentence generation, central concept identification, human involvement, representation in mathematics, and historical approaches. The methods would be evaluated based on these criteria to provide an objective and comprehensive comparison. No method consistently produces accurate text summaries. The best course of action will depend on the particulars and constraints of the current work because each method has both positive and negative aspects. The two primary methods for text summarization were found to be extractive and abstractive. This comparison study analysed various text summarization methods and revealed each method's positive attributes and drawbacks. By giving a comprehensive overview of the two main methods, this comparative analysis advances the subject of text summarization.
Published: 2024
Full Text: View/download PDF

8. Graph-Based Extractive Text Summarization Sentence Scoring Scheme for Big Data Applications.

Author: Verma, Jai Prakash, Bhargav, Shir, Bhavsar, Madhuri, Bhattacharya, Pronaya, Bostani, Ali, Chowdhury, Subrata, Webber, Julian, and Mehbodniya, Abolfazl
Subjects: *TEXT summarization, *COMPUTATIONAL linguistics, *BIG data, *NATURAL language processing, *TEXT mining, *SENTIMENT analysis
Abstract: The recent advancements in big data and natural language processing (NLP) have necessitated proficient text mining (TM) schemes that can interpret and analyze voluminous textual data. Text summarization (TS) acts as an essential pillar within recommendation engines. Despite the prevalent use of abstractive techniques in TS, an anticipated shift towards a graph-based extractive TS (ETS) scheme is becoming apparent. The models, although simpler and less resource-intensive, are key in assessing reviews and feedback on products or services. Nonetheless, current methodologies have not fully resolved concerns surrounding complexity, adaptability, and computational demands. Thus, we propose our scheme, GETS, utilizing a graph-based model to forge connections among words and sentences through statistical procedures. The structure encompasses a post-processing stage that includes graph-based sentence clustering. Employing the Apache Spark framework, the scheme is designed for parallel execution, making it adaptable to real-world applications. For evaluation, we selected 500 documents from the WikiHow and Opinosis datasets, categorized them into five classes, and applied the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) parameters for comparison, using the measures ROUGE-1, ROUGE-2, and ROUGE-L. The results include recall scores of 0.3942, 0.0952, and 0.3436 for ROUGE-1, ROUGE-2, and ROUGE-L, respectively (when using the clustered approach). Through a juxtaposition with existing models such as BERTEXT (with 3-gram, 4-gram) and MATCHSUM, our scheme has demonstrated notable improvements, substantiating its applicability and effectiveness in real-world scenarios. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

9. A Taxonomy of Text Mining

Author: Huma Gul, Sadaqat Jan, Ibrar Ali Shah, and Shams Ur Rehman
Subjects: text mining, opinion mining, sentiment analysis, topic modeling, text summarization, Information technology, T58.5-58.64, Electronic computers. Computer science, QA75.5-76.95
Abstract: With a rapid increase in the volume of textual data on the Internet, extracting useful information through innovative text mining techniques has become crucial. In this context, terminology jargon in the literature related to text-mining creates ambiguity and has made it very difficult for researchers to focus in a specific direction and bring innovation. For example, review mining and opinion mining may have different applications, however, from a technical perspective, they are very similar. In this paper, we propose a classification of the text mining terminologies from the perspectives of technical and text-mining processes. The classification is based on a comprehensive literature survey and analysis. This research study presents a clear classification of text mining terminologies based on technical and text mining processes to resolve the issue of terminology jargon. By utilizing the proposed classification, researchers will be able to easily choose a specific direction instead of diverging amongst similar research problems, thereby, driving innovation. Further, the proposed classification will help advance and improve the overall research progress in all text-mining related fields.
Published: 2022

10. Shielding against online harm: A survey on text analysis to prevent cyberbullying.

Author: Mishra, Akanksha, Sinha, Sharad, and George, Clint Pazhayidam
Subjects: *CYBERBULLYING, *SOCIAL media, *TEXT summarization, *INSTANT messaging software, *EVIDENCE gaps, *SPATIAL behavior
Abstract: Cyberbullying poses a digital threat to society. In this survey, we explain what cyberbullying is and its various forms. We focus on social media platforms and instant messaging apps that are susceptible to cyberbullying, discussing how we can identify such behavior in these spaces. Moving on, we conduct a systematic review of publicly available datasets in different languages, exploring techniques for data preprocessing, feature representation, and methodologies used in textual analysis for cyberbullying detection. We specifically look at natural language-based and platform-specific preprocessing methods. We also cover popular feature representation techniques like sentiment analysis, user information, text summarization, symbols, images, and word embedding for detecting cyberbullying. Next, we categorize existing techniques, including machine learning and neural networks, highlighting research gaps. Additionally, we discuss the challenges associated with current datasets and methods. This survey aims to provide early researchers with insights into cyberbullying literature and guide them in exploring potential research directions. • Analysis of prior literature for cyberbullying detection. • Proposes possible future research directions based on identified research challenges. • Identifies the need for high-quality datasets with contextual information. • Develop mitigation systems relevant in real-time. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

11. An Improved Method for Extractive Based Opinion Summarization Using Opinion Mining.

Author: Bhatia, Surbhi and AlOjail, Mohammed
Subjects: SENTIMENT analysis, DATA extraction, TEXT mining, NATURAL language processing, DEEP learning, AUTOMATIC summarization
Abstract: Opinion summarization automatically recapitulates the opinions about a common topic. The primary motive of summarization is to preserve the properties of the text while shortening it with no loss in its semantics. The need for efficient automatic summarization has resulted in increased interest among the Natural Language Processing and Text Mining communities. This paper emphasizes building an extractive summarization system that combines principal component analysis for dimensionality reduction with a bidirectional Recurrent Neural Network and Long Short-Term Memory (RNN-LSTM) deep learning model to produce a short and exact synopsis using a seq2seq model. It presents a paradigm shift with regard to the way extractive summaries are generated. Novel algorithms for word extraction using assertions are proposed. The semantic framework is well-grounded in this research, facilitating correct decision making after reviewing a huge amount of online reviews and taking all their important features into account. The advantages of the proposed solution include greater computational efficiency, better inferences from social media, data understanding, robustness, and handling of sparse data. Experiments on different datasets outperform previous research, and the accuracy is claimed to exceed the baselines, showing the efficiency and novelty of the research. The comparisons are done by calculating accuracy against different baselines using the ROUGE tool. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

12. An Efficient Machine Learning-based Text Summarization in the Malayalam Language.

Author: Haroon, Rosna P., M., Abdul Gafur, and U, Barakkath Nisha
Subjects: TEXT summarization, MALAYALAM language, SUPPORT vector machines, CLASSIFICATION algorithms, LINGUISTIC complexity
Abstract: Automatic text summarization is a procedure that condenses a large body of content into a shorter text that retains the significant information. Malayalam is one of the toughest languages, used in certain areas of India, most commonly in Kerala and Lakshadweep. Natural language processing work in the Malayalam language is relatively limited due to the complexity of the language as well as the scarcity of available resources. In this paper, a way is proposed to deal with the text summarization process in Malayalam documents by training a model based on the Support Vector Machine classification algorithm. Different features of the text are taken into account for training the machine so that the system can output the most important data from the input text. The classifier can classify the most important, important, average, and least significant sentences into separate classes, and based on this, the machine will be able to create a summary of the input document. The user can select a compression ratio so that the system outputs a summary of that fraction of the input. The model performance is measured by using different genres of Malayalam documents as well as documents from the same domain. The model is evaluated by considering the content evaluation measures precision, recall, F score, and relative utility. The obtained precision and recall values show that the model is trustworthy and more relevant compared to the other summarizers. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

13. A Novel Approach for Semantic Extractive Text Summarization.

Author: Waseemullah, Fatima, Zainab, Zardari, Shehnila, Fahim, Muhammad, Andleeb Siddiqui, Maria, Ibrahim, Ag. Asri Ag., Nisar, Kashif, and Naz, Laviza Falak
Subjects: RECOMMENDER systems, INFORMATION filtering, TEXT mining
Abstract: Text summarization is a technique for shortening down or exacting a long text or document. It becomes critical when someone needs a quick and accurate summary of very long content. Manual text summarization can be expensive and time-consuming. While summarizing, some important content, such as information, concepts, and features of the document, can be lost; therefore, the retention ratio, which contains informative sentences, is lost, and if more information is added, then lengthy texts can be produced, increasing the compression ratio. Therefore, there is a tradeoff between two ratios (compression and retention). The model preserves or collects all the informative sentences by taking only the long sentences and removing the short sentences with less of a compression ratio. It tries to balance the retention ratio by avoiding text redundancies and also filters irrelevant information from the text by removing outliers. It generates sentences in chronological order as the sentences are mentioned in the original document. It also uses a heuristic approach for selecting the best cluster or group, which contains more meaningful sentences that are present in the topmost sentences of the summary. Our proposed model extractive summarizer overcomes these deficiencies and tries to balance between compression and retention ratios. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

14. A comprehensive survey for automatic text summarization: Techniques, approaches and perspectives.

Author: Luo, Mengqi, Xue, Bowen, and Niu, Ben
Subjects: *TEXT summarization, *LANGUAGE models, *MACHINE learning, *AUTOMATIC summarization, *TEXT mining
Abstract: The enormous quantity of text makes it challenging for users to obtain the key information and knowledge. Automatic text summarization can alleviate this problem by providing reliable summaries for massive text documents. During the last decade, significant achievements have been made in text summarization. We conduct this survey to explore what the research community is focused on, the application scenarios of summarization, and the state-of-the-art techniques and methods, and to analyze the challenges and future directions. We find that, in combination with natural language processing, previous text summarization research applied knowledge-based methods, graph-based methods, statistical learning methods, and deep learning methods. Applying large language models to text summarization is still in its early stages. By analyzing current research progress, we conclude that understanding semantic information and specific domain knowledge is required for text summarization, and that the conciseness and readability of the summary should be ensured. The future research opportunity is automatic knowledge summarization, and more research effort is urgently needed to explore it. • Significant achievements of automatic text summarization have been made. • We aim to explore the research community of automatic text summarization. • In combination with NLP, text summarization applied knowledge-based, graph-based and machine learning methods. • The combination of extractive and abstractive strategies is required, which can ensure the quality of the summary. • The future research opportunity is automatic knowledge summarization. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

15. POCASUM: policy categorizer and summarizer based on text mining and machine learning.

Author: Deotale, Rushikesh, Rawat, Shreyash, Vijayarajan, V., and Surya Prasath, V. B.
Subjects: *MACHINE learning, *ARTIFICIAL neural networks, *PROBLEM solving
Abstract: Having control over your data is a right and a duty that every citizen has in our digital society. Users often skip entire policies of applications or websites to save time and energy without realizing the potential sticking points in these policies. Due to obscure language and verbose explanations, the majority of hypermedia users do not bother to read them. Further, digital media companies sometimes do not spend enough effort on stating their policies clearly, and the policies can often be incomplete. A summarized version of these privacy policies, categorized into useful information, can help users. To solve this problem, in this work we propose to use machine learning-based models for a policy categorizer that classifies the policy paragraphs under proposed attributes such as security, contact, etc. By benchmarking different machine learning-based classifier models, we show that an artificial neural network model performs with higher accuracy on a challenging dataset of textual privacy policies. We thus show that machine learning can help summarize the relevant paragraphs under the various attributes so that the user can get the gist of each topic within a few lines. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

16. Extractive Text Summarization Using Recent Approaches: A Survey.

Author: Yadav, Avaneesh Kumar, Maurya, Ashish Kumar, Ranvijay, and Yadav, Rama Shankar
Subjects: DIGITAL media, SUPERVISED learning, TEXT mining
Abstract: In this era of growing digital media, the volume of text data increases day by day from various sources and may contain entire documents, books, articles, etc. This amount of text is a source of information that may be insignificant, redundant, and sometimes may not carry any meaningful representation. Therefore, we require techniques and tools that can automatically summarize the enormous amounts of text data and help us to decide whether they are useful or not. Text summarization is a process that generates a brief version of the document in the form of a meaningful summary. It can be classified into abstractive text summarization and extractive text summarization. Abstractive text summarization generates an abstract type of summary from the given document. In extractive text summarization, a summary is created from the given document that contains crucial sentences of the document. Many authors have proposed various techniques for both types of text summarization. This paper presents a survey of extractive text summarization on graphical-based techniques. Specifically, it focuses on unsupervised and supervised techniques. This paper shows the recent works and advances on them and presents the strengths and weaknesses of previous surveys in tabular form. Finally, it concentrates on the evaluation measures for summaries. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

17. Research on an Automatic Text Summarization Method Integrating Topic Features.

Author: 罗　芳, 汪竞航, 何道森, and 蒲秋梅
Subjects: *THESIS statements (Rhetoric), *TEXT mining, *PROBABILITY theory, *MATRICES (Mathematics), *TEXT messages
Abstract: Traditional graph models for text summarization focus only on statistical features or shallow semantic features and lack mining and utilization of deep topic semantic features. To address this, the paper proposes MDSR (multi-dimension summarization rank), an automatic text summarization method that incorporates topic features. Specifically, the method adopts the LDA model to mine the semantic information of text topics and measures the impact of the topic feature on a sentence by defining the importance of the topic. It improves the construction of the probability transition matrix of the graph model nodes by combining the topic feature with statistical features and inter-sentence similarity. Finally, it extracts and evaluates the summary according to the weights of the sentence nodes. The results show that the ROUGE value achieved by MDSR is best when the weight ratio of topic feature, statistical feature, and inter-sentence similarity is 3:4:3. The ROUGE-1, ROUGE-2, and ROUGE-SU4 scores are 53.35%, 35.18%, and 33.86%, respectively, which are better than those of the compared methods. This shows that a text summarization method combining topic features can effectively improve the accuracy of summary extraction. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF
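
The abstract above describes blending three signals (topic feature, statistical feature, and inter-sentence similarity) into the transition matrix of a sentence graph, with a 3:4:3 weight ratio performing best. The abstract does not give the exact formula, so the following is only a sketch under stated assumptions: the transition weight from sentence i to sentence j is taken to be a weighted sum of j's topic score, j's statistical score, and the similarity between i and j; rows are normalized to sum to 1; and sentences are ranked with a PageRank-style power iteration. The damping factor, the function names, and the input scores are all hypothetical.

```python
def blended_transition_matrix(topic, stat, sim, weights=(0.3, 0.4, 0.3)):
    """Row-stochastic matrix mixing topic, statistical, and similarity scores.

    Assumption: weight(i -> j) = w_t * topic[j] + w_s * stat[j] + w_m * sim[i][j],
    with no self-transitions. This formula is illustrative, not the paper's.
    """
    n = len(topic)
    w_t, w_s, w_m = weights
    matrix = []
    for i in range(n):
        row = [
            0.0 if i == j else w_t * topic[j] + w_s * stat[j] + w_m * sim[i][j]
            for j in range(n)
        ]
        total = sum(row)
        # Fall back to a uniform row if a sentence has no outgoing weight.
        matrix.append([v / total for v in row] if total > 0 else [1.0 / n] * n)
    return matrix


def rank_sentences(matrix, damping=0.85, iterations=100):
    """PageRank-style power iteration; damping=0.85 is an assumed default."""
    n = len(matrix)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[i] * matrix[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores
```

The highest-scoring sentences would then be extracted as the summary; changing `weights` makes it possible to compare ratios other than 3:4:3 on the same inputs.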

18. A Light-Weight Text Summarization System for Fast Access to Medical Evidence

Author: Abeed Sarker, Yuan-Chi Yang, Mohammed Ali Al-Garadi, and Aamir Abbas
Subjects: medical text processing, text summarization, text mining, natural language processing, health informatics, extractive summarization, Medicine, Public aspects of medicine, RA1-1270, Electronic computers. Computer science, QA75.5-76.95
Abstract: As the volume of published medical research continues to grow rapidly, staying up-to-date with the best-available research evidence regarding specific topics is becoming an increasingly challenging problem for medical experts and researchers. The current COVID19 pandemic is a good example of a topic on which research evidence is rapidly evolving. Automatic query-focused text summarization approaches may help researchers to swiftly review research evidence by presenting salient and query-relevant information from newly-published articles in a condensed manner. Typical medical text summarization approaches require domain knowledge, and the performances of such systems rely on resource-heavy medical domain-specific knowledge sources and pre-processing methods (e.g., text classification) for deriving semantic information. Consequently, these systems are often difficult to speedily customize, extend, or deploy in low-resource settings, and they are often operationally slow. In this paper, we propose a fast and simple extractive summarization approach that can be easily deployed and run, and may thus aid medical experts and researchers obtain fast access to the latest research evidence. At runtime, our system utilizes similarity measurements derived from pre-trained medical domain-specific word embeddings in addition to simple features, rather than computationally-expensive pre-processing and resource-heavy knowledge bases. Automatic evaluation using ROUGE—a summary evaluation tool—on a public dataset for evidence-based medicine shows that our system's performance, despite the simple implementation, is statistically comparable with the state-of-the-art. Extrinsic manual evaluation based on recently-released COVID19 articles demonstrates that the summarizer performance is close to human agreement, which is generally low, for extractive summarization.
Published: 2020
Full Text: View/download PDF

19. Extractive summarization of clinical trial descriptions.

Author: Gulden, Christian, Kirchner, Melanie, Schüttler, Christina, Hinderer, Marc, Kampf, Marvin, Prokosch, Hans-Ulrich, and Toddenroth, Dennis
Abstract: Purpose: Text summarization of clinical trial descriptions has the potential to reduce the time required to familiarize oneself with the subject of studies by condensing long-form detailed descriptions to concise, meaning-preserving synopses. This work describes the process and quality of automatically generated summaries of clinical trial descriptions using extractive text summarization methods. Methods: We generated a novel dataset from the detailed descriptions and brief summaries of trials registered on clinicaltrials.gov. We executed several text summarization algorithms on the detailed descriptions in this corpus and calculated the standard ROUGE metrics using the brief summaries included in the record as a reference. To investigate the correlation of these metrics with human sentiments, four reviewers assessed the content-completeness of the generated summaries and the helpfulness of both the generated and reference summaries via a Likert scale questionnaire. Results: The filtering stages of the dataset generation process reduce the 277,228 trials registered on clinicaltrials.gov to 101,016 records usable for the summarization task. On average, the summaries in this corpus are 25% the length of the detailed descriptions. Of the evaluated text summarization methods, the TextRank algorithm exhibits the overall best performance with a ROUGE-1 F1 score of 0.3531, ROUGE-2 F1 score of 0.1723, and ROUGE-L F1 score of 0.3003. These scores correlate with the assessment of the helpfulness and content similarity by the human reviewers. Inter-rater agreement for the helpfulness and content similarity was slight and fair respectively (Fleiss' kappa of 0.12 and 0.22). Conclusions: Extractive summarization is a viable tool for generating meaning-preserving synopses of detailed clinical trial descriptions. Further, the human evaluation has shown that the ROUGE-L F1 score is useful for rating the general quality of generated summaries of clinical trial descriptions in an automated way. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF
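
Note: The ROUGE-1 F1 score reported in the abstract of record 19 can be illustrated with a minimal sketch. It assumes lowercasing and whitespace tokenization, with no stemming or stopword removal, so its values will generally differ from those of the official ROUGE toolkit. The function name `rouge1_f1` is hypothetical and is not taken from the paper.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate and a reference summary.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each unigram counts at most as often as it
    # appears in both texts (multiset intersection).
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

In this sketch a candidate that repeats the reference word for word scores 1.0, and a candidate sharing no words with the reference scores 0.0.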

20. A survey of multiple types of text summarization with their satellite contents based on swarm intelligence optimization algorithms.

Author: Mosa, Mohamed Atef, Anwar, Arshad Syed, and Hamouda, Alaa
Subjects: *SWARM intelligence, *CELLULAR automata, *REAL-time control, *SOCIAL network theory, *MACHINE learning
Abstract: Due to the tremendous increment of data on the web, extracting the most important data as a conceptual brief would be valuable for certain users. Therefore, there is a massive enthusiasm concerning the generation of automatic text summary frameworks to constitute abstracts automatically from the text, web, and social network messages associated with their satellite content. This survey highlights, for the first time, how the swarm intelligence (SI) optimization techniques are performed to solve the text summarization task efficiently. Additionally, a convincing justification of why SI, especially Ant Colony Optimization (ACO), has been presented. Unfortunately, three types of text summarization tasks using SI indicate bit utilizing in the literature when contrasted with the other summarization techniques as machine learning and genetic algorithms, in spite of the fact that there are seriously promising outcomes of the SI methods. On the other hand, it has been noticed that the summarization task with multiple types has not been formalized as a multi-objective optimization (MOO) task before, despite that there are many objectives which can be considered. Moreover, the SI was not employed before to support the real-time summary approaches. Thus, a new model has been proposed to be adequate for achieving many objectives and to satisfy the real-time needs. Eventually, this study will enthuse researchers to further consider the various types of SI when solving the summarization tasks, particularly, in the short text summarization (STS) field. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

21. Open information extraction as an intermediate semantic structure for Persian text summarization.

Author: Rahat, Mahmoud and Talebpour, Alireza
Subjects: *SEMANTICS, *COMPUTATIONAL linguistics, *ARTIFICIAL intelligence, *TEXT mining, *DIGITAL libraries
Abstract: Semantic applications typically exploit structures such as dependency parse trees, phrase-chunking, semantic role labeling or open information extraction. In this paper, we introduce a novel application of Open IE as an intermediate layer for text summarization. Text summarization is an important method for providing relevant information in large digital libraries. Open IE is referred to the process of extracting machine-understandable structural propositions from text. We use these propositions as a building block to shorten the sentence and generate a summary of the text. The proposed system offers a new form of summarization that is able to break the structure of the sentence and extract the most significant sub-sentential elements. Other advantages include the ability to identify and eliminate less important sections of the sentence (such as adverbs, adjectives, appositions or dependent clauses), or duplicate pieces of sentences which in turn opens up the space for entering more sentences in the summary to enhance the coverage and coherency of it. The proposed system is localized for Persian language; however, it can be adopted to other languages. Experiments performed on a standard data set “Pasokh” with a standard comparison tool showed promising results for the proposed approach. We used summaries produced by the system in a real-world application in the virtual library of Shahid Beheshti University and received good feedbacks from users. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

22. Empirical Analysis of Single and Multi Document Summarization using Clustering Algorithms

Author: M. S. Bewoor and S. H. Patil
Subjects: Text mining, text summarization, clustering, Engineering (General). Civil engineering (General), TA1-2040, Technology (General), T1-995, Information technology, T58.5-58.64
Abstract: The availability of various digital sources has created a demand for text mining mechanisms. Effective summary generation mechanisms are needed in order to utilize relevant information from often overwhelming digital data sources. In this view, this paper conducts a survey of various single as well as multi-document text summarization techniques. It also provides analysis of treating a query sentence as a common one, segmented from documents for text summarization. Experimental results show the degree of effectiveness in text summarization over different clustering algorithms.
Published: 2018

23. ASHUR: EVALUATION OF THE RELATION SUMMARY-CONTENT WITHOUT HUMAN REFERENCE USING ROUGE.

Author: RAMÍREZ-NORIEGA, Alan, JUÁREZ-RAMÍREZ, Reyes, JIMÉNEZ, Samantha, INZUNZA, Sergio, and MARTÍNEZ-RAMÍREZ, Yobani
Subjects: TEXT mining, SENTENCES (Grammar), INTELLIGENT tutoring systems, LATENT semantic analysis, REGRESSION analysis
Abstract: In written documents, the summary is a brief description of important aspects of a text. The degree of similarity between the summary and the content of a document provides reliability about the summary. Some efforts have been done in order to automate the evaluation of a summary. ROUGE metrics can automatically evaluate a summary, but it needs a model summary built by humans. The goal of this study is to find a quantitative relation between an article content and its summary using ROUGE tests without a model summary built by humans. This work proposes a method for automatic text summarization to evaluate a summary (ASHuR) based on extraction of sentences. ASHuR extracts the best sentences of an article based on the frequency of concepts, cue-words, title words, and sentence length. Extracted sentences constitute the essence of the article; these sentences construct the model summary. We performed two experiments to assess the reliability of ASHuR. The first experiment compared ASHuR against similar approaches based on sentences extraction; the experiment placed ASHuR in the first place in each applied test. The second experiment compared ASHuR against human-made summaries, which yielded a Pearson correlation value of 0.86. Assessments made to ASHuR show reliability to evaluate summaries written by users in collaborative sites (e.g. Wikipedia) or to review texts generated by students in online learning systems (e.g. Moodle). [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

24. Empirical Analysis of Single and Multi Document Summarization using Clustering Algorithms.

Author: Bewoor, Mrunal S. and Patil, Suhas H.
Subjects: TEXT mining, INFORMATION retrieval, DOCUMENT clustering, ALGORITHMS, ELECTRONIC information resources
Abstract: The availability of various digital sources has created a demand for text mining mechanisms. Effective summary generation mechanisms are needed in order to utilize relevant information from often overwhelming digital data sources. In this view, this paper conducts a survey of various single as well as multi-document text summarization techniques. It also provides analysis of treating a query sentence as a common one, segmented from documents for text summarization. Experimental results show the degree of effectiveness in text summarization over different clustering algorithms. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

25. Feature network-driven quadrant mapping for summarizing customer reviews.

Author: Cho, Su and Kim, Seoung
Abstract: With the rapid growth of e-commerce, customers increasingly write online reviews of the product they purchase. These customer reviews are one of the most valuable sources of information affecting selection of products or services. Summarizing these customer reviews is becoming an interesting area of research, inspiring researchers to develop a more condensed, concise summarization for users. However, most of the current efforts at summarization are based on general product features without feature's relationship. As a result, these summaries either ignore feedback from customers or do a poor job of reflecting the opinions expressed in customer reviews. To remedy this summarization shortcoming, we propose a feature network-driven quadrant mapping that captures and incorporates opinions from customer reviews. Our focus is on construction of a feature network, which is based on co-occurrence and semantic similarities, and a quadrant display showing the opinions polarity of feature groups. Moreover, the proposed approach involves clustering similar product features, and thus, it is different from standard text summarization based on abstraction and extraction. The summarized results can help customers better understand the overall opinions about a product. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

26. Building a text retrieval system for the Sanskrit language: Exploring indexing, stemming, and searching issues.

Author: Sahu, Siba Sankar and Pal, Sukomal
Subjects: *SANSKRIT language, *STEMMING (Linguistics), *TEXT mining, *INFORMATION retrieval, *TEXT summarization
Abstract: Stemming is an important pre-processing step in the text analysis domains such as text mining, text summarization and information retrieval (IR). In this study, we build a Sanskrit text collection and explore different indexing, stemming and searching strategies in Sanskrit. We also propose two stemmers: a 'light' and an 'aggressive' and evaluate their effectiveness in the text analysis task. The performance of the stemmers is evaluated in two ways: a direct and an indirect IR-based evaluation. In direct evaluation, we found that the stemmers are effective. In indirect evaluation, we apply different retrieval models such as BM25, TF–IDF, Divergence from Randomness (DFR) based and language models. The proposed stemmers are compared with GRAS stemmer, language-independent indexing (trunc-n) and no stemming approach. Among different stemming methods, aggressive stemmers provide the best performance. Hiemstra language model outperforms other retrieval models we experimented with. In statistical analysis, we found that the proposed stemming approaches produce significantly better results than the no-stemming approach. • We build a Sanskrit text collection for the text analysis domain. • We proposed an inflectional and derivational stemmer in Sanskrit. • The performance of different stemmer is evaluated in the text analysis domain. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

27. Recent automatic text summarization techniques: a survey.

Author: Gambhir, Mahak and Gupta, Vishal
Subjects: INTERNET, ABSTRACTING, NATURAL language processing, MATHEMATICAL optimization, TEXT mining
Abstract: As information is available in abundance for every topic on internet, condensing the important information in the form of summary would benefit a number of users. Hence, there is growing interest among the research community for developing new approaches to automatically summarize the text. Automatic text summarization system generates a summary, i.e. short length text that includes all the important information of the document. Since the advent of text summarization in 1950s, researchers have been trying to improve techniques for generating summaries so that machine generated summary matches with the human made summary. Summary can be generated through extractive as well as abstractive methods. Abstractive methods are highly complex as they need extensive natural language processing. Therefore, research community is focusing more on extractive summaries, trying to achieve more coherent and meaningful summaries. During a decade, several extractive approaches have been developed for automatic summary generation that implements a number of machine learning and optimization techniques. This paper presents a comprehensive survey of recent text summarization extractive approaches developed in the last decade. Their needs are identified and their advantages and disadvantages are listed in a comparative manner. A few abstractive and multilingual text summarization approaches are also covered. Summary evaluation is another challenging issue in this research field. Therefore, intrinsic as well as extrinsic both the methods of summary evaluation are described in detail along with text summarization evaluation conferences and workshops. Furthermore, evaluation results of extractive summarization approaches are presented on some shared DUC datasets. Finally this paper concludes with the discussion of useful future directions that can help researchers to identify areas where further research is needed. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

28. Conducting sparse feature selection on arbitrarily long phrases in text corpora with a focus on interpretability.

Author: Miratrix, Luke and Ackerman, Robin
Subjects: *DIRICHLET problem, *TEXT mining, *CARBON monoxide, *REGRESSION analysis
Abstract: We propose a general framework for topic-specific summarization of large text corpora, and illustrate how it can be used for analysis in two quite different contexts: an Occupational Safety and Health Administration (OSHA) database of fatality and catastrophe reports (to facilitate surveillance for patterns in circumstances leading to injury or death), and legal decisions on workers' compensation claims (to explore relevant case law). Our summarization framework, built on sparse classification methods, is a compromise between simple word frequency-based methods currently in wide use, and more heavyweight, model-intensive methods such as latent Dirichlet allocation (LDA). For a particular topic of interest (e.g., mental health disability, or carbon monoxide exposure), we regress a labeling of documents onto the high-dimensional counts of all the other words and phrases in the documents. The resulting small set of phrases found as predictive are then harvested as the summary. Using a branch-and-bound approach, this method can incorporate phrases of arbitrary length, which allows for potentially rich summarization. We discuss how focus on the purpose of the summaries can inform choices of tuning parameters and model constraints. We evaluate this tool by comparing the computational time and summary statistics of the resulting word lists to three other methods in the literature. We also present a new R package, textreg. Overall, we argue that sparse methods have much to offer in text analysis and is a branch of research that should be considered further in this context. © 2016 Wiley Periodicals, Inc. Statistical Analysis and Data Mining: The ASA Data Science Journal, 2016 [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF

29. How to Improve Text Summarization and Classification by Mutual Cooperation on an Integrated Framework.

Author: Jeong, Hyoungil, Ko, Youngjoong, and Seo, Jungyun
Subjects: *SMARTPHONES, *BIG data, *TEXT mining, *TABLET computers, *TEXT files
Abstract: Text summarization and classification are core techniques to analyze a huge amount of text data in the big data environment. Moreover, as the need to read texts on smart phones, tablets and television as well as personal computers continues to grow, text summarization and classification techniques become more important and both of them do essential processes for text analysis in many applications. Traditional text summarization and classification techniques have individually been considered as different research fields in this literature. However, we find out that they can help each other as text summarization makes use of category information from text classification and text classification does summary information from text summarization. Therefore, we propose an effective integrated learning framework using both of summary and category information in this paper. In this framework, the feature-weighting method for text summarization utilizes a language model to combine feature distributions in each category and text, and one for text classification does the sentence importance scores estimated from the text summarization. In the experiments, the performances of the integrated framework are better than ones of individual text summarization and classification. In addition, the framework has some advantages of easy implementation and language independence because it is based on only simple statistical approaches and POS tagger. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF

30. Improving Unstructured Text Summarization Using An Ensemble Approach.

Author: Elfayoumy, Sherif and Thoppil, Jenny
Subjects: *TEXT mining, *DATA analytics, *DATA mining, *BIG data, *ELECTRONIC data processing, *ONLINE information services
Abstract: Due to the explosive amounts of text data being created and organizations increased desire to leverage their data corpora, especially with the availability of Big Data platforms, there is not usually enough time to read and understand each document and make decisions based on document contents. Hence, there is a great demand for summarizing text documents to provide a precise substitute for the original documents. In this article we present an ensemble approach that combines several of the well-researched text summarization techniques to produce better document summaries than individual techniques. An experiment that uses the ensemble approach was designed and results were evaluated. For the purpose of the experiment the ensemble combined the cosine similarity, enhanced latent semantic analysis using SVD, and maximal marginal relevance measure algorithms. The ensemble was applied on two datasets and the results were found to be promising when compared to the manual summaries developed by human evaluators. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF
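
Note: Of the techniques named in the abstract of record 30, maximal marginal relevance (MMR) with cosine similarity can be sketched in a few lines. This is an illustrative sketch only, not the authors' ensemble: it assumes bag-of-words vectors over whitespace tokens, uses the whole document as the relevance query, and the names `cosine`, `mmr_select`, `k`, and `lam` are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(sentences, k=2, lam=0.7):
    """Greedily pick k sentences, trading relevance against redundancy.

    Assumption: relevance is similarity to the whole document;
    redundancy is the highest similarity to an already chosen sentence.
    """
    vecs = [Counter(s.lower().split()) for s in sentences]
    doc = sum(vecs, Counter())  # document vector: sum of sentence vectors
    selected = []
    candidates = list(range(len(sentences)))

    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vecs[i], doc)
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(selected)]
```

The weight `lam` sets the trade-off: values near 1 favor relevance to the document, while lower values penalize sentences that repeat what has already been selected.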

31. MCMR: Maximum coverage and minimum redundant text summarization model

Author: Alguliev, Rasim M., Aliguliyev, Ramiz M., Hajirahimova, Makrufa S., and Mehdiyev, Chingiz A.
Subjects: *SUPERVISED learning, *TEXT mining, *LINEAR programming, *INTEGER programming, *EXPERT systems, *INFORMATION technology, *PARTICLE swarm optimization
Abstract: In this paper, we propose an unsupervised text summarization model which generates a summary by extracting salient sentences in given document(s). In particular, we model text summarization as an integer linear programming problem. One of the advantages of this model is that it can directly discover key sentences in the given document(s) and cover the main content of the original document(s). This model also guarantees that the summary cannot contain multiple sentences that convey the same information. The proposed model is quite general and can also be used for single- and multi-document summarization. We implemented our model on multi-document summarization task. Experimental results on DUC2005 and DUC2007 datasets showed that our proposed approach outperforms the baseline systems. [Copyright Elsevier]
Published: 2011
Full Text: View/download PDF

32. Frontiers of biomedical text mining: current progress.

Author: Zweigenbaum, Pierre, Demner-Fushman, Dina, Hong Yu, and Cohen, Kevin B.
Subjects: *DATA mining, *TEXT mining, *INFORMATION retrieval, *INFORMATION resources, *MEDICAL sciences, *MOLECULAR genetics, *SEARCH engines, *INFORMATION science
Abstract: It is now almost 15 years since the publication of the first paper on text mining in the genomics domain, and decades since the first paper on text mining in the medical domain. Enormous progress has been made in the areas of information retrieval, evaluation methodologies and resource construction. Some problems, such as abbreviation-handling, can essentially be considered solved problems, and others, such as identification of gene mentions in text, seem likely to be solved soon. However, a number of problems at the frontiers of biomedical text mining continue to present interesting challenges and opportunities for great improvements and interesting research. In this article we review the current state of the art in biomedical text mining or ‘BioNLP’ in general, focusing primarily on papers published within the past year. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF

33. Using a Text Mining Tool to Support Text Summarization.

Author: Reategui, Eliseo, Klemann, Miriam, and Finco, Mateus David
Abstract: This paper presents a mining tool that is able to extract graphs from texts, and proposes their use in helping students to write summaries. The text summarization method is based on the use of the graphs as graphic organizers, leading students to further reflect about the main ideas of the text before getting to the actual task of writing. An experiment carried out demonstrated that the tool helped students reflect about the main ideas of the text and supported the writing of the summaries. [ABSTRACT FROM PUBLISHER]
Published: 2012
Full Text: View/download PDF
