65 results for "TEXT summarization"
Search Results
2. Improving the Usability of Library Websites...
- Author
-
اعظم نجفقلی نژاد
- Subjects
PROTOCOL analysis (Cognition) ,TEXT summarization ,LIBRARY administration ,ARTIFICIAL intelligence ,KEYWORD searching - Abstract
The purpose of this research is to examine artificial intelligence solutions for improving the usability of library websites for visually impaired users. This qualitative study was conducted using interviews and the think-aloud protocol. Visually impaired users were interviewed about the best and worst features of library websites, then observed while using a library website and performing everyday tasks, verbally expressing their thoughts, feelings, and opinions about the interaction experience. Interviews were conducted individually, either online or at a specific location. Four large library websites of the country (the National Library and Archives of Iran; the Organization of Libraries, Museums and Documents Center of Astan Quds Razavi; the Library, Museum and Document Center of the Islamic Consultative Assembly; and the Central Library and Document Center of Tehran University) were selected for performing the tasks. The research population consisted of 33 visually impaired users selected through purposive sampling. The users' conversations were analyzed using qualitative content analysis with MaxQDA software, yielding 90 final codes, 3 general categories, and 8 subcategories. These codes were continuously expanded and revised while reviewing the transcripts. A second researcher participated in the content analysis, reviewed the transcripts and extracted categories, and the data were evaluated several times. The best features of the websites from the users' point of view were: standards compliance and accessibility of page elements, valuable content and honesty in introducing and presenting it, logical segmentation, headings, and proper organization of page elements, Alt tags for graphics, well-organized search results, etc. 
The worst features of the websites were: image-based and inaccessible security codes (CAPTCHAs), lack of automatic correction of keyword spelling mistakes, problems with online chat and unannounced sending and receiving of messages, dynamic content, inability to navigate page elements with the keyboard, missing labels for graphics and user input elements, poorly structured page design, etc. Automatic spelling correction, intelligent voice assistants, result clustering, intelligent filtering, intelligent question answering, text and image processing with image description, text summarization, semantic and natural language search, and user interface personalization are some of the ways to improve the usability of library websites. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. Deep Learning based Named Entity Recognition for the Bodo Language.
- Author
-
Narzary, Sanjib, Brahma, Anjali, Nandi, Sukumar, and Som, Bidisha
- Subjects
DEEP learning ,NATURAL language processing ,TEXT summarization ,MACHINE translating ,DATA mining ,DATA augmentation - Abstract
One important application of natural language processing (NLP) is Named Entity Recognition (NER), which automatically recognises and categorises named entities in a document. Named entities can be the name of an individual, group, place, etc. NER is crucial to the success of applications including text summarization, machine translation, and information extraction and retrieval, and it is one of the most useful tools across a variety of topics and languages. Despite its widespread use and effectiveness in English, the field is still under investigation for other Indian languages, such as Bodo. Due to the lack of resources and of a high-quality dataset, NER in Bodo is a difficult task. In this research, a deep learning-based NER tagger is investigated for the Bodo language: an NER-tagged dataset is generated for Bodo using Doccano and enlarged by employing a data augmentation technique. As there is no Bodo NER baseline model to compare with, we employed several deep learning techniques for the Bodo NER system and compared their results. We achieved an accuracy of 99.62%, precision of 99.75%, recall of 98.74%, and F-score of 99.35% with an LSTM using character-based features. This study also reports the performance of GRU- and CNN-based models on the Bodo NER task. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. An Efficient Summarisation and Search Tool for Research Articles.
- Author
-
Garg, Shruti, Anand, Pushkar, Chanda, Parnab Kumar, and Payyavula, Srinivasa Rao
- Subjects
AUTOMATIC summarization ,TEXT summarization ,WEB development ,SEARCH engines ,USER experience ,NATURAL language processing ,EXPERTISE ,DATABASE management - Abstract
Building an efficient summarization and search tool for research articles is a complex task that involves interdisciplinary expertise in NLP, database management, web development, and user experience design. With the rapid growth of scientific content, manually reading and selecting the important parts of research articles has become challenging. There is thus a need for a summarization tool that helps scholars read content quickly, along with a search tool that finds important content and keeps it organized. This work proposes a summarization tool that generates extractive and abstractive summaries of articles. A search engine is also proposed that saves search results in comma-separated value (CSV) format, including the search queries and article metadata such as keywords, title, author names, URL, year of publication, abstracts, and summaries. These CSVs give users an idea of article contents offline, without reading or searching the whole text. The efficiency of the summarizer is evaluated in terms of precision (pr), recall (re), and F-measure (F-m) of the Rouge-1 (R1), Rouge-2 (R2), Rouge_sum (R_sum), and Bertscore (BS) measures for ten research articles. The average pr, re, and F-m obtained from BS are 42%, 42%, and 42% for extractive summarization and 41%, 41%, and 41% for abstractive summarization. This tool will help research scholars collect literature and prepare related work for their research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
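The ROUGE-1 precision, recall, and F-measure reported in entries like the one above reduce to clipped unigram-overlap counting. A minimal sketch of that computation (not the authors' tool; the example strings are invented):

```python
from collections import Counter

def rouge1(candidate, reference):
    """ROUGE-1: precision, recall, and F-measure from clipped
    unigram overlap between a candidate and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # per-word counts clipped to the smaller side
    pr = overlap / max(sum(cand.values()), 1)
    re = overlap / max(sum(ref.values()), 1)
    fm = 2 * pr * re / (pr + re) if pr + re else 0.0
    return pr, re, fm

pr, re, fm = rouge1("the cat sat on the mat", "the cat lay on the mat")
```

Real evaluations typically use a packaged implementation (e.g. the rouge-score library), which also handles stemming and ROUGE-2/L variants.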
5. From Text to Action: NLP Techniques for Washing Machine Manual Processing.
- Author
-
Biju, Vinai George, Babu, Bibin, Asghar, Ali, Prathap, Boppuru Rudra, and Reddy, Vandana
- Subjects
WASHING machines ,TEXT summarization ,LANGUAGE models ,QUESTION answering systems ,NATURAL language processing ,TEXT mining - Abstract
This research study focuses on the advancements in Natural Language Processing (NLP) driven by large-scale parallel corpora and presents a comprehensive methodology for creating a parallel, multilingual corpus using NLP techniques and semantic technologies, with a particular focus on washing machine manuals. The successful creation of such a corpus, coupled with the integration of semantic technologies and ontology modeling, demonstrates the broad applicability and potential of NLP in diverse domains. The research covers various aspects, including text extraction, segmentation, and the development of specialized pipelines for question answering, translation, and text summarization tailored to washing machine manuals. Translation experiments using fine-tuned models demonstrated the feasibility of providing washing machine manuals in local languages, expanding accessibility and understanding for users worldwide. The study also explored text summarization using a powerful transformer-based model, which exhibited remarkable proficiency in generating concise and coherent summaries from complex input texts, and implemented a question-answering pipeline that showcased the effectiveness of various language models in handling question-answering tasks with high accuracy. Additionally, the article discusses the processes of data collection, information preparation, ontology creation, alignment strategies, and text analytics. Furthermore, the study addresses the challenges and potential future developments in this field, offering insights into the promising applications of NLP in the context of washing machine manuals. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. Sentence Selection for Extractive Text Summarization using TOPSIS Approach.
- Author
-
Pati, Siba Prasad and Rautray, Rasmita
- Subjects
TEXT summarization ,AUTOMATIC summarization ,PARTICLE swarm optimization ,ANT algorithms ,TOPSIS method - Abstract
In the present era, it is challenging for people to extract crucial information due to the ongoing growth of textual data on the internet and in online resources. Text Summarization (TS) is the procedure used to acquire the required information. The most common and practical approach is extractive text summarization, which selects the most pertinent sentences from the original text material and can gather the most valuable information in a relatively short period of time. To extract the sentences that serve as a document's summary, this study applies scoring-based optimisation models employing Ant Colony Optimisation (ACO), Butterfly Optimisation (BO), and Particle Swarm Optimisation (PSO). The models are simulated on the DUC 2006 dataset, and the TOPSIS approach is used to validate the results for various DUC documents. The PSO-based summarizer, however, performs noticeably better than the other two summarizers based on rank. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
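The TOPSIS validation step mentioned above ranks alternatives by their closeness to an ideal solution. A minimal pure-Python sketch, assuming all criteria are benefit-type and using an invented score matrix (rows are alternatives, columns are criteria):

```python
import math

def topsis(matrix, weights):
    """Rank alternatives (rows) on benefit criteria (columns) by relative
    closeness to the ideal solution; higher score = better alternative."""
    m, n = len(matrix), len(matrix[0])
    # Vector-normalise each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n)]
    v = [[weights[j] * matrix[i][j] / norms[j] for j in range(n)] for i in range(m)]
    ideal = [max(v[i][j] for i in range(m)) for j in range(n)]
    anti = [min(v[i][j] for i in range(m)) for j in range(n)]
    scores = []
    for i in range(m):
        d_pos = math.sqrt(sum((v[i][j] - ideal[j]) ** 2 for j in range(n)))
        d_neg = math.sqrt(sum((v[i][j] - anti[j]) ** 2 for j in range(n)))
        scores.append(d_neg / (d_pos + d_neg) if d_pos + d_neg else 0.0)
    return scores

scores = topsis([[0.7, 0.6], [0.4, 0.9], [0.2, 0.3]], [0.5, 0.5])
```

In a summarization setting, each row would hold a summarizer's metric values (e.g. ROUGE variants), and the closeness scores give the final ranking.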
7. Attention-based Transformer for Assamese Abstractive Text Summarization.
- Author
-
Goutom, Pritom Jyoti, Baruah, Nomi, and Sonowal, Paramananda
- Subjects
TEXT summarization ,NATURAL language processing ,AUTOMATIC summarization ,TRANSFORMER models - Abstract
Accurately summarising Assamese text is a significant challenge in natural language processing (NLP). Manually summarising lengthy Assamese texts is time-consuming and labor-intensive; as a result, automatic text summarization has developed into a critical NLP research topic. In this study, we integrate the Transformer and self-attention approaches to develop an abstractive text summarization model. This Transformer-based technique uses self-attention to successfully manage co-reference issues in Assamese text, enhancing overall system understanding. The proposed approach greatly improves the efficiency of text summarization. We exhaustively evaluated the model on the Assamese dataset (AD-50), which contains human-produced summaries, to assess its performance, and it outperformed current state-of-the-art baseline models. On the AD-50 dataset, for example, our model obtained a low training loss of 0.0022 over 20 training epochs and a model accuracy of 47.15 percent. This research marks a substantial advancement in the field of Assamese abstractive text summarization, with promising implications for practical NLP applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
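The self-attention mechanism behind Transformer summarizers like the one above can be shown in a few lines. This sketch uses identity query/key/value projections for brevity (a real model learns separate projection matrices) and invented input vectors:

```python
import math

def self_attention(X):
    """Scaled dot-product self-attention with identity projections:
    each row of X attends to every row, weighted by softmax(X X^T / sqrt(d))."""
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        m = max(scores)                      # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]      # softmax over all positions
        # Output row = attention-weighted mixture of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out

ctx = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Because each output row is a convex combination of the inputs, every token's representation blends context from the whole sequence, which is what helps with co-reference.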
8. DOMINER: Domain Feature Mining from Unstructured Data for Effective Text Summarization.
- Author
-
Thakkar, Hiren Kumar, Singh, Priyanka, and Kumar, Yogesh
- Subjects
TEXT summarization ,AUTOMATIC summarization ,FEATURE extraction ,SENTIMENT analysis - Abstract
In the current internet era, a huge volume of opinionated sentiment reviews is generated daily. Such reviews are enriched with deep sentiment and express opinions on various features of the product or service under consideration. Identifying a set of important domain features opens an opportunity to build automatic product and service summarization systems. A common practice is to treat frequently appearing words as domain features; however, not all frequently appearing words are domain-specific features. In this paper, a novel Domain Feature Miner (DOMINER) approach is proposed for robust extractive summarization. The domain feature set mining problem is modelled as a clustering problem: first, a bond-energy-based clustering technique clusters the domain features by their frequency and co-appearance counts; then, relevant clusters are extracted to retrieve the final domain feature set. The proposed DOMINER scheme is extensively evaluated on quantitative performance metrics such as precision, recall, and F-score for six diverse domains: Cellphone, Camera, Laptop, Tablet, Television, and Hotel. Experimental results on benchmark datasets reveal that DOMINER mines high-quality domain features from unstructured reviews, with precision, recall, and F-score values of 78.10%, 64.21%, and 70.48%, respectively, against state-of-the-art schemes. The high-quality domain features extracted by DOMINER help generate robust extractive summaries for products and services. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Evaluating the Impact of Text Data Augmentation on Text Classification Tasks using DistilBERT.
- Author
-
Nair, Aarathi Rajagopalan, Singh, Rimjhim Padam, Gupta, Deepa, and Kumar, Priyanka
- Subjects
DATA augmentation ,LANGUAGE models ,TEXT summarization ,TRANSLATING & interpreting ,SENTIMENT analysis ,NATURAL language processing ,MACHINE translating - Abstract
Data augmentation entails artificially expanding a dataset's size by applying various transformations to the existing raw data. Enhancing the quality and quantity of datasets of varying sizes through varied data augmentation techniques is of immense importance in the field of Natural Language Processing. Several notable applications, for instance text classification, sentiment analysis, and text summarization, have proven to benefit immensely from text augmentation techniques. Hence, this paper focuses on efficient text classification using datasets of different sizes: small (500 instances), medium (5,564 instances), and large (43,934 instances). The work considers the standard DistilBERT model, a popular transformer-based language model, and presents the impact of different text augmentation techniques on its performance. The study focuses on three augmentation methods: (a) synonym augmentation, which replaces words with their synonyms to enhance vocabulary diversity and generalization; (b) contextual word embeddings, which enrich semantic understanding by leveraging pre-trained language models; and (c) back translation, which translates the text into another language and then back again, introducing variations in the data and capturing different linguistic patterns. Additionally, the work discusses the combined effect of employing all three augmentation techniques simultaneously, and compares the relation between dataset size and the performance of the augmentation techniques. The study considers three standard datasets and presents a comprehensive analysis using accuracy and F1 score as evaluation metrics. The results highlight the efficacy of each technique across small, medium, and large datasets, enabling a nuanced understanding of their benefits in different data scenarios. 
The findings indicate varying degrees of improvement from each augmentation technique: the enhancement achieved by applying text augmentation ranged from around 2% on large datasets to 20% on smaller datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
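The synonym-augmentation method described above can be sketched with a toy, hand-written synonym table. The `SYNONYMS` dictionary here is invented for illustration; real pipelines draw replacements from WordNet or embedding neighbours:

```python
import random

# Toy synonym table (invented); a real system would use WordNet or embeddings.
SYNONYMS = {"quick": ["fast", "rapid"], "film": ["movie"], "good": ["great", "fine"]}

def synonym_augment(sentence, p=0.5, seed=0):
    """Produce a new training variant by randomly swapping words
    that have known synonyms, each with probability p."""
    rng = random.Random(seed)  # seeded for reproducible augmentation
    out = []
    for word in sentence.split():
        options = SYNONYMS.get(word.lower())
        out.append(rng.choice(options) if options and rng.random() < p else word)
    return " ".join(out)

variant = synonym_augment("a quick good film")
```

Generating several variants per sentence (different seeds) is how a 500-instance dataset can be expanded before fine-tuning a classifier.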
10. A Brief Survey on Safety of Large Language Models.
- Author
-
Zhengjie Gao, Xuanzi Liu, Yuanshuai Lan, and Zheng Yang
- Subjects
LANGUAGE models ,TEXT summarization ,NATURAL language processing ,CHATBOTS ,INDUSTRIAL safety - Abstract
Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP) and have been widely adopted in various applications such as machine translation, chatbots, text summarization, and so on. However, the use of LLMs has raised concerns about their potential safety and security risks. In this survey, we explore the safety implications of LLMs, including ethical considerations, hallucination, and prompt injection. We also discuss current research efforts to mitigate these risks and identify areas for future research. Our survey provides a comprehensive overview of the safety concerns related to LLMs, which can help researchers and practitioners in the NLP community develop safer and more ethical applications of LLMs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Hybrid Technique of Topic Modelling and Text Summarization: A Case Study on Predicting Trends in Green Computing.
- Author
-
Pandey, Mansi, Sharma, Chetan, Sharma, Shamneesh, and Aggarwal, Trapty
- Subjects
TEXT summarization ,AUTOMATIC summarization ,NATURAL language processing ,TEXT mining ,LATENT semantic analysis - Abstract
Text mining techniques are used for trend prediction through keyword analysis; however, these processes produce both relevant and irrelevant keywords. Various clusters are formed from the keywords, resulting in various topics, and the presence of irrelevant keywords risks the formation of wrong topics. To overcome this problem, this research contributes an algorithm for topic prediction using a novel technique in which text summarization is incorporated into topic modeling algorithms. The research implements text summarization with the Gensim library, using an extractive approach to generate summaries of publications by diverse researchers, and then applies text mining to the summaries to predict trends in various fields. The current approach is compared with existing techniques based on the parameters used in automatic and semi-automatic text mining techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Analysis of Abstractive and Extractive Summarization Methods.
- Author
-
Kirmani, Mahira, Kaur, Gagandeep, and Mohd, Mudasir
- Subjects
AUTOMATIC summarization ,TEXT summarization ,NATURAL language processing - Abstract
This paper explains the existing approaches employed for (automatic) text summarization. The summarizing method is part of the natural language processing (NLP) field and is applied to the source document to produce a compact version that preserves its aggregate meaning and key concepts. On a broader scale, approaches for text-based summarization are categorized into two groups: abstractive and extractive. In abstractive summarization, the main contents of the input text are paraphrased, possibly using vocabulary that is not present in the source document, while in extractive summarization, the output summary is a subset of the input text and is generated by using the sentence ranking technique. In this paper, the main ideas behind the existing methods used for abstractive and extractive summarization are discussed broadly. A comparative study of these methods is also highlighted. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
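The sentence-ranking idea behind extractive summarization, as surveyed above, can be sketched with a simple word-frequency scorer. This is one of many possible ranking functions, not a method from the paper, and the example text is invented:

```python
from collections import Counter

def extractive_summary(text, k=1):
    """Score each sentence by the summed corpus frequency of its words
    and return the top-k sentences in their original order."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    freq = Counter(w.lower() for s in sentences for w in s.split())
    ranked = sorted(range(len(sentences)),
                    key=lambda i: sum(freq[w.lower()] for w in sentences[i].split()),
                    reverse=True)
    keep = sorted(ranked[:k])  # restore document order for readability
    return ". ".join(sentences[i] for i in keep) + "."

summary = extractive_summary(
    "The model compresses text. The model ranks each sentence. Apples are red.", k=1)
```

Abstractive systems, by contrast, would generate new wording rather than select sentences, which is why they need sequence-to-sequence models instead of a scoring function.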
13. Revolutionizing Text Summarization: A Breakthrough in Content Compression.
- Author
-
Mishra, Nidhi, Khan, Farhan, and Mishra, Amit
- Subjects
TEXT summarization ,NATURAL language processing ,MACHINE learning - Abstract
In the current digital epoch, the vast expanse of information has revolutionized the accessibility of knowledge and perspectives. Nevertheless, this information abundance has introduced challenges in navigating and comprehending the deluge of textual data. The surge in online news articles, research papers, reports, and diverse document genres has accentuated the necessity for proficient document summarization techniques. Traditional manual methods of summarization are time-intensive and influenced by subjective biases. In contrast, the synergy between Natural Language Processing (NLP) and machine learning has unlocked the potential for automated document summarization, promising efficient information consumption and informed decision-making. This research paper delves into the convergence of these factors. It is driven by the Longformer model's distinctive capability to manage extensive texts while retaining contextual coherence--a potential solution to the hurdle of large document summarization. By capitalizing on the Longformer's architecture, this study endeavors to exploit its prowess in generating cohesive summaries from lengthy source documents, thereby amplifying the accessibility of intricate information. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. A parallel optimization and transfer learning approach for summarization in electrical power systems.
- Author
-
Priya, V., Praveena, V., and Sujithra, L. R.
- Subjects
ELECTRIC power ,LANGUAGE models ,TEXT summarization ,PARALLEL processing ,ELECTRONIC data processing ,NATURAL language processing - Abstract
Transfer learning approaches in natural language processing have been explored and have recently evolved into a potential solution for many problems. Current research on aspect-based summarization shows unsatisfactory accuracy and low-quality generated summaries. Additionally, the potential advantages of combining language models with parallel processing have not been explored in the existing literature. This paper addresses aspect-based extractive text summarization using a transfer learning approach and an optimization method based on MapReduce. The proposed approach utilizes transfer learning with language models to extract significant aspects from the text. Subsequently, an optimization process using MapReduce is employed; this framework includes an in-node mapper and reducer algorithm to generate summaries for the important aspects identified by the language model. This enhances the quality of the summary, leading to improved accuracy, particularly when applied to electrical power system documents. By leveraging the strengths of natural language models and parallel data processing techniques, this model presents an opportunity to achieve better text summary generation. The performance metric used is accuracy, measured with the ROUGE tool, incorporating precision, recall, and F-measure. The proposed model demonstrates a 6% improvement in scores compared to state-of-the-art techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
15. YouTube Video Summarizer using NLP: A Review.
- Author
-
Singh, Yogendra, Kumar, Rishu, Kabdal, Soumya, and Upadhyay, Prashant
- Subjects
TEXT summarization ,VIDEO summarization ,NATURAL language processing ,DIGITAL technology ,SENTIMENT analysis ,STREAMING video & television - Abstract
This review paper delves into the emerging realm of YouTube video summarization utilizing Natural Language Processing (NLP) techniques, a critical area of research with increasing prominence in our multimedia-rich digital age. The paper commences with a broad overview of the field, elaborating on the need for automated video summarization tools to navigate and condense the massive, ever-growing sea of YouTube content. Further, we systematically scrutinize the role and implementation of NLP methods in extracting meaningful textual data from videos, focusing on video transcripts, closed captions, user comments, and associated metadata. Subsequent sections dissect seminal and recent works, studying various NLP techniques such as text summarization, sentiment analysis, topic modeling, and deep learning architectures employed in this context. The paper also covers the various metrics used for evaluation and presents the datasets generally used to assess the performance of these summarization systems. Finally, we identify current challenges and potential future directions for research in the area, acknowledging the evolving landscape of online video platforms and AI technologies. This review aims to provide researchers and practitioners with an encompassing perspective on the pivotal role of NLP in enabling more efficient, accurate, and intuitive navigation of YouTube content, ultimately shaping our digital consumption experiences. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
16. Web Scraping using Natural Language Processing: Exploiting Unstructured Text for Data Extraction and Analysis.
- Author
-
Pichiyan, Vijayaragavan, Muthulingam, S, G, Sathar, Nalajala, Sunanda, Ch, Akhil, and Das, Manmath Nath
- Subjects
NATURAL language processing ,DATA extraction ,DATA analysis ,TEXT summarization ,INTERNET content - Abstract
In recent years, combining web scraping techniques with Natural Language Processing (NLP) has emerged as a powerful approach to unlock deeper insights from unstructured textual data. This research study presents a detailed exploration of web scraping using NLP techniques, demonstrating how these methodologies can be synergistically integrated to extract and analyze unstructured text from diverse web sources. The study analyzes the challenges posed by unstructured data on the web and how NLP can play a pivotal role in converting this text into structured and actionable information. The first part of the paper covers an overview of web scraping methods, including rule-based parsing, XPath queries, and the use of web scraping libraries such as BeautifulSoup and Scrapy. The second part focuses on applying NLP techniques to process and analyze the extracted textual data. Further, preprocessing steps such as tokenization, stemming, and stop word removal are analyzed, followed by more advanced techniques like Named Entity Recognition. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
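The extraction of visible text from scraped HTML, as discussed above, can be sketched with Python's standard-library parser rather than BeautifulSoup or Scrapy. A minimal sketch; the sample markup is invented:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> blocks."""
    def __init__(self):
        super().__init__()
        self.skip = 0       # nesting depth inside script/style
        self.chunks = []    # visible text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip and data.strip():
            self.chunks.append(data.strip())

parser = TextExtractor()
parser.feed("<html><body><h1>News</h1><script>var x=1;</script>"
            "<p>Hello world.</p></body></html>")
text = " ".join(parser.chunks)
```

The extracted text can then be fed into the tokenization, stemming, and stop-word removal steps the paper describes.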
17. Graph Ranked Clustering Based Biomedical Text Summarization Using Top k Similarity.
- Author
-
Gupta, Supriya, Sharaff, Aakanksha, and Nagwani, Naresh Kumar
- Subjects
ELECTRONIC health records ,TEXT summarization ,INFORMATION retrieval ,AUTOMATION ,DATA visualization ,MACHINE learning - Abstract
Text summarization models help biomedical clinicians and researchers acquire informative data from an enormous domain-specific literature with less time and effort. Evaluating and selecting the most informative sentences from biomedical articles is always challenging. This study aims to develop a dual-mode biomedical text summarization model that achieves enhanced coverage and information, and also checks the fitness of appropriate graph ranking techniques for improved performance of the summarization model. The input biomedical text is mapped to a graph in which meaningful sentences are modelled as central nodes together with the critical associations between them. The proposed framework utilizes a top-k similarity technique in combination with UMLS and a sampled probability-based clustering method, which aids in unearthing relevant meanings of the biomedical domain-specific word vectors and finding the best possible associations between crucial sentences. The quality of the framework is assessed via parameters such as information retention, coverage, readability, cohesion, and ROUGE scores in clustering and non-clustering modes. The significant benefits of the suggested technique are the capture of crucial biomedical information with increased coverage and reasonable memory consumption. Configurable combinations of parameters reduce execution time, enhance memory utilization, and extract relevant information, outperforming other biomedical baseline models. An improvement of 17% is achieved when the proposed model is checked against similar biomedical text summarizers. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
18. Improving Coverage and Novelty of Abstractive Text Summarization Using Transfer Learning and Divide and Conquer Approaches.
- Author
-
Alomari, Ayham, Idris, Norisma, Md Sabri, Aznul Qalid, and Alsmadi, Izzat
- Subjects
TEXT summarization - Abstract
Automatic Text Summarization (ATS) models yield outcomes with insufficient coverage of crucial details and poor degrees of novelty. The first issue resulted from the lengthy input, while the second problem resulted from the characteristics of the training dataset itself. This research employs the divide-and-conquer approach to address the first issue by breaking the lengthy input into smaller pieces to be summarized, followed by the conquest of the results in order to cover more significant details. For the second challenge, these chunks are summarized by models trained on datasets with higher novelty levels in order to produce more human-like and concise summaries with more novel words that do not appear in the input article. The results demonstrate an improvement in both coverage and novelty levels. Moreover, we defined a new metric to measure the novelty of the summary. Finally, the findings led us to conclude that the novelty levels are more significantly influenced by the training dataset itself, as in CNN/DM, than by other factors like the training model or its training objective, as in Pegasus. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
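The divide step of the divide-and-conquer approach above amounts to packing sentences into length-bounded chunks that can be summarized independently and then merged. A minimal greedy sketch; the word budget and sample input are assumptions:

```python
def chunk_sentences(sentences, max_words=100):
    """Greedily pack sentences into chunks of at most max_words words,
    so each chunk fits a summarizer's input limit and can be processed alone."""
    chunks, current, count = [], [], 0
    for s in sentences:
        n = len(s.split())
        if current and count + n > max_words:
            chunks.append(current)          # close the full chunk
            current, count = [], 0
        current.append(s)
        count += n
    if current:
        chunks.append(current)              # flush the final partial chunk
    return chunks

chunks = chunk_sentences(["one two three"] * 5, max_words=6)
```

The conquer step would summarize each chunk with the trained model and concatenate (or re-summarize) the partial summaries, which is how the paper gains coverage over long inputs.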
19. Philippine Court Case Summarizer using Latent Semantic Analysis.
- Author
-
Sagum, Ria Ambrocio, Clacio, Patrick Anndwin C., Cayetano, Rey Edison R., and Lobrio, Airis Dale F.
- Subjects
TEXT summarization ,LEGAL documents ,LATENT semantic analysis ,NATURAL language processing ,PYTHON programming language ,ARTIFICIAL intelligence ,COURTS - Abstract
Artificial intelligence in the field of law has introduced many innovations to aid legal experts, one of which is summarizing systems. Text summarization is the process of condensing text into a compact summary highlighting the important parts of the original text. Over the years, numerous text summarization systems have been developed and steadily improved; these advancements have helped ease the labor of summarizing lengthy documents. The researchers therefore propose a tool that summarizes legal documents, specifically cases obtained from the Philippine Supreme Court website. This includes extracting word meanings, analyzing words, and measuring sentence similarity in the original legal document obtained from that website. The tool takes the content of the original document, analyzes the words, structure, and sentences according to context, and finally produces the summary. The proposed tool is called SUMMIT, from the words "Summarize It". The method used to summarize the documents is Latent Semantic Analysis (LSA), and the researchers use Python as the programming language for developing the summarizer tool. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
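The LSA method named above scores sentences by their alignment with the dominant latent topic of a term-sentence matrix. A dependency-free sketch that substitutes power iteration for a full SVD (the sample sentences are invented, and real LSA would keep several singular vectors, not just the first):

```python
def lsa_top_sentence(sentences, iters=50):
    """Pick the sentence most aligned with the dominant latent topic:
    build a term-sentence count matrix A, then power-iterate A^T A
    to approximate the leading right singular vector."""
    vocab = sorted({w.lower() for s in sentences for w in s.split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = [[0.0] * len(sentences) for _ in vocab]
    for j, s in enumerate(sentences):
        for w in s.split():
            A[index[w.lower()]][j] += 1.0
    n = len(sentences)
    # M = A^T A is the sentence-similarity (Gram) matrix.
    M = [[sum(A[t][i] * A[t][j] for t in range(len(vocab)))
          for j in range(n)] for i in range(n)]
    v = [1.0] * n
    for _ in range(iters):                  # power iteration
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5 or 1.0
        v = [x / norm for x in w]
    return sentences[max(range(n), key=lambda i: abs(v[i]))]

best = lsa_top_sentence(["the court ruled on the case",
                         "the court issued a ruling",
                         "lunch was served at noon"])
```

The two court-related sentences share vocabulary and so define the dominant topic; the off-topic sentence contributes little to the leading singular vector.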
20. High Impact of Rough Set and KMeans Clustering Methods in Extractive Summarization of Journal Articles.
- Author
-
K., SHEENA KURIAN and MATHEW, SHEENA
- Subjects
ROUGH sets ,PERIODICAL articles ,TEXT summarization ,K-means clustering ,SULFATE pulping process - Abstract
Text summarization is the technique of shortening a long text while retaining all the relevant and significant topics conveyed in it. As the number of journal articles published every year grows steadily, the relevance of research on journal article summarization also increases. In this work, six extractive summarization methods are implemented and compared with the results of four standard methods applied to a dataset of journal articles. Precision, recall, and F-measure of the Rouge-1, Rouge-2, Rouge-L, and Rouge-Lsum measures are analyzed. Eight features are used in the implementation of the sum-of-features method and the BernoulliRBM method. The experiments show that the Rough set method and the K-Means clustering and summarization method achieve high Rouge scores in 10 out of the 12 measures analyzed here. The recall of the summary generated by the Rough set method is further improved when the first part of the article is used as a heuristic yardstick in calculating the similarity score with selected sentences. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
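The K-Means selection strategy reported above can be sketched in pure Python. This is a toy bag-of-words version with deterministic initialisation; the paper's actual eight features and distance measure are not specified here, so squared Euclidean distance over word counts stands in:

```python
import re
from collections import Counter

def bow(sent, vocab):
    c = Counter(re.findall(r'[a-z]+', sent.lower()))
    return [c[w] for w in vocab]

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_summarize(text, k=2, iters=20):
    """Cluster sentence vectors with k-means, then emit the sentence
    closest to each centroid (one representative per cluster)."""
    sents = re.split(r'(?<=[.!?])\s+', text.strip())
    vocab = sorted(set(re.findall(r'[a-z]+', text.lower())))
    X = [bow(s, vocab) for s in sents]
    centroids = [list(X[i]) for i in range(k)]   # deterministic init
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in X:
            groups[min(range(k), key=lambda c: dist(x, centroids[c]))].append(x)
        for c in range(k):
            if groups[c]:
                centroids[c] = [sum(col) / len(groups[c])
                                for col in zip(*groups[c])]
    picks = sorted({min(range(len(X)), key=lambda i: dist(X[i], centroids[c]))
                    for c in range(k)})
    return ' '.join(sents[i] for i in picks)
```

Picking one sentence per cluster keeps the summary diverse: each cluster approximates a topic, and its medoid sentence represents it.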
21. Turkish abstractive text document summarization using text to text transfer transformer.
- Author
-
Ay, Betul, Ertam, Fatih, Fidan, Guven, and Aydin, Galip
- Subjects
TEXT summarization ,ATTRIBUTION of news - Abstract
Text summarization is the process of reducing text size while preserving its key points. This reduces reading time and helps readers reach the desired information quickly. In addition, summarization can be used to extract outstanding information from a text. In this study, we focus on abstractive summarization, which can draw more human-like conclusions from the text. A summarization study was carried out on a dataset collected from online Turkish news sources. Rouge and Bert-score performance metrics were used to evaluate the performance of this study using the text-to-text transfer transformer (T5) method. The precision values of the Rouge-1, Rouge-2, Rouge-L, and Bert-score performance metrics obtained in this study were 0.6913, 0.6623, 0.7528, and 0.8718, respectively. Recall values were 0.9210, 0.8917, 0.9183, and 0.9138, respectively. F-measure values were 0.7649, 0.7338, 0.8084, and 0.8913, respectively. Given the success of these results, the study presents a method that obtains strong results for Turkish text summarization, and the original dataset is made available to other researchers. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
22. Learning bilingual word embedding for automatic text summarization in low resource language.
- Author
-
Wijayanti, Rini, Khodra, Masayu Leylia, Surendro, Kridanto, and Widyantoro, Dwi H.
- Subjects
TEXT summarization ,LANGUAGE research ,GENERATIVE adversarial networks ,COMPUTATIONAL linguistics ,DIGITAL technology ,SPACE ,WORD problems (Mathematics) - Abstract
Studies in low-resource languages have become more challenging with the increasing volume of texts in today's digital era. The lack of labeled data and text processing libraries further widens the research gap between high- and low-resource languages, such as English and Indonesian. This has led to the use of transfer learning, which applies pre-trained models to solve similar problems, even across languages, by using bilingual or cross-lingual word embeddings. Therefore, this study investigates two bilingual word embedding methods, namely VecMap and BiVec, for the Indonesian–English language pair and evaluates them on bilingual lexicon induction and text summarization tasks. The generated bilingual embeddings were compared with MUSE (Multilingual Unsupervised and Supervised Embeddings), an existing multilingual word embedding created with a generative adversarial network. Furthermore, VecMap was improved by creating shared vocabulary spaces and mapping the unshared ones between languages. The results showed that the embedding produced by the joint method, BiVec, performed better in intrinsic evaluation, especially with CSLS (Cross-Domain Similarity Local Scaling) retrieval. Meanwhile, the improved VecMap outperformed the regular type by 16.6% without surpassing the BiVec evaluation score. These methods enabled model transfer between languages when applied to cross-lingual text summarization. Moreover, the ROUGE score outperformed classical text summarization by adding only 10% of the training dataset of the target language. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
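The CSLS retrieval criterion used in the intrinsic evaluation has a standard closed form, CSLS(x, y) = 2·cos(x, y) − r_T(x) − r_S(y), where r_T(x) is the mean cosine similarity of x to its k nearest neighbours in the target space (and r_S(y) symmetrically in the source space). A stdlib sketch with toy 2-d vectors standing in for real embeddings:

```python
def cos(a, b):
    """Cosine similarity between two dense vectors."""
    num = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return num / (na * nb) if na and nb else 0.0

def mean_knn_sim(x, space, k):
    """Mean cosine similarity of x to its k nearest neighbours in space."""
    sims = sorted((cos(x, y) for y in space), reverse=True)[:k]
    return sum(sims) / len(sims)

def csls(x, y, src_space, tgt_space, k=2):
    """CSLS(x, y) = 2*cos(x, y) - r_T(x) - r_S(y): penalises hub vectors
    that are near everything in the other space."""
    return (2 * cos(x, y)
            - mean_knn_sim(x, tgt_space, k)
            - mean_knn_sim(y, src_space, k))
```

Ranking translation candidates by CSLS instead of plain cosine reduces the hubness problem in nearest-neighbour bilingual lexicon induction.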
23. TG-SMR: A Text Summarization Algorithm Based on Topic and Graph Models.
- Author
-
Rakrouki, Mohamed Ali, Alharbe, Nawaf, Khayyat, Mashael, and Aljohani, Abeer
- Subjects
AUTOMATION ,GRAPH theory ,NATURAL language processing ,COMPUTER algorithms ,METHODOLOGY - Abstract
Recently, automation has become vital in most fields, since computing methods play a significant role in facilitating work such as automatic text summarization. However, most of the computing methods used in real systems are based on graph models, which are characterized by their simplicity and stability. Thus, this paper proposes an improved extractive text summarization algorithm based on both topic and graph models. The methodology consists of two stages. First, the well-known TextRank algorithm is analyzed and its shortcomings are investigated. Then, an improved method is proposed with a new computational model of sentence weights. Experiments were carried out on the standard DUC2004 and DUC2006 datasets, comparing the proposed improved graph model algorithm TG-SMR (Topic Graph-Summarizer) to four other text summarization systems. The experimental results show that the proposed TG-SMR algorithm achieves higher ROUGE scores. It is foreseen that the TG-SMR algorithm will open new directions for improving performance on ROUGE evaluation indicators. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
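Since the paper's first stage analyzes TextRank, a minimal stdlib version of that baseline may help fix ideas. Sentence similarity here is the log-normalised word overlap from the original TextRank paper, and the damping factor 0.85 is the conventional PageRank choice; TG-SMR's improved sentence weights are not reproduced here:

```python
import math
import re

def overlap_sim(a, b):
    """Log-normalised word overlap from the original TextRank paper."""
    wa, wb = set(a), set(b)
    if len(wa) < 2 or len(wb) < 2 or not (wa & wb):
        return 0.0
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))

def textrank(text, k=1, d=0.85, iters=50):
    """Rank sentences by PageRank over a similarity graph, keep the top k."""
    sents = re.split(r'(?<=[.!?])\s+', text.strip())
    toks = [re.findall(r'[a-z]+', s.lower()) for s in sents]
    n = len(sents)
    W = [[0.0 if i == j else overlap_sim(toks[i], toks[j])
          for j in range(n)] for i in range(n)]
    out = [sum(row) for row in W]        # total outgoing edge weight
    score = [1.0] * n
    for _ in range(iters):               # weighted PageRank iteration
        score = [(1 - d) + d * sum(W[j][i] / out[j] * score[j]
                                   for j in range(n) if out[j])
                 for i in range(n)]
    top = sorted(range(n), key=lambda i: -score[i])[:k]
    return ' '.join(sents[i] for i in sorted(top))
```

A sentence that overlaps with many other sentences accumulates score from all of them, which is exactly the "centrality" intuition the topic model in TG-SMR refines.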
24. A Natural Language Processing System using CWS Pipeline for Extraction of Linguistic Features.
- Author
-
Kumar, Sandeep and Solanki, Arun
- Subjects
NATURAL language processing ,FEATURE extraction ,NATURAL languages ,GRAMMAR ,READING comprehension ,ENGLISH language - Abstract
Understanding the rules of grammar and linguistic features is essential to understanding the context of a language. Similarly, for natural language processing, linguistic features enable understanding of the language. This paper describes how Coreference, Word-sense, and Semantic knowledge (CWS) linguistic features work, and how they can improve the Natural Language Understanding (NLU) and Natural Language Processing (NLP) tasks of any NLP model or application, whether existing or new. The paper proposes a CWS pipeline method to enhance the efficiency and performance of NLP applications such as text summarization, information retrieval, question answering, and machine reading comprehension. The proposed CWS pipeline model uses the CoNLL-2012 coreference dataset, extracted from the well-known OntoNotes 5.0 dataset, for the English language. The model is implemented in Python. Performance evaluation is done using the standard CoNLL-2012 coreference dataset for English, with the coreference-marked output evaluated against the manually tagged gold standard dataset. The proposed CWS pipeline model achieves an average F1 score of 78.98% on the MUC metric, 1.78% higher than the top result of previous models, and thus performs better than existing models. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
25. An Experimental Investigation on Unsupervised Text Summarization for Customer Reviews.
- Author
-
K, Manojkumar V, Mathi, Senthilkumar, and Gao, Xiao-Zhi
- Subjects
CONSUMERS' reviews ,WEBSITES ,INFORMATION processing - Abstract
People generally turn to websites such as Yelp to find reviews of a food or restaurant before trying it first-hand. However, some reviews are so long and ambiguous that users are left unsure whether the review praises or disparages the food. Here, websites might want to employ summarization techniques so that the crux of the review is conveyed in just a sentence. Text summarization is the process of compressing information while preserving its core idea. The main objective of the present work is to empirically investigate a suitable algorithm for summarizing reviews so that users do not need to spend time reading through the entire review. The performance of the conducted investigations is analyzed with metrics such as cosine similarity, the bilingual evaluation understudy (BLEU) score for precision, and recall-oriented understudy for gisting evaluation (ROUGE) analysis. The experiments show that the LexRank summarization technique outperforms other methods, with precision and recall values of 0.586 and 0.346, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
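The precision and recall figures reported for LexRank come from unigram-overlap metrics. ROUGE-1 precision, recall, and F1 can be computed in a few stdlib lines (tokenisation is simplified here to lowercase alphabetic words):

```python
import re
from collections import Counter

def rouge1(candidate, reference):
    """ROUGE-1 precision, recall and F1 from clipped unigram matches."""
    c = Counter(re.findall(r'[a-z]+', candidate.lower()))
    r = Counter(re.findall(r'[a-z]+', reference.lower()))
    match = sum((c & r).values())          # per-word counts clipped by min
    p = match / max(sum(c.values()), 1)
    rec = match / max(sum(r.values()), 1)
    f = 2 * p * rec / (p + rec) if p + rec else 0.0
    return p, rec, f
```

Precision rewards summaries that waste no words; recall rewards summaries that cover the reference; F1 balances the two.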
26. Personalized Multi-document Text Summarization using Deep Learning Techniques.
- Author
-
Veningston, K, Venkateswara Rao, P V, and Ronalda, M
- Subjects
TEXT recognition ,NATURAL language processing ,DEEP learning ,BACHELOR of science degree ,INFORMATION processing - Abstract
The summary of a large piece of text or a document is a concise description of the same content: it must retain the most important points from the document and remove any verbosity. The text summarization task is an information processing scheme that, given a document or a set of documents, mines the most essential content from the source, taking into account the user or task at hand, and presents the summary as a well-formed, brief text. Summarization can be done not just on one document but also on a set of documents, which is called multi-document summarization. The proposed approach incorporates user preferences into the summarization task. Depending on the current user or task, the summary may differ substantially. For instance, a summary on 'photosynthesis' would be very different for a child in 4th grade versus a student pursuing a bachelor's degree in science. Therefore, while writing a summary, the reading history of the user is employed to personalize the text summarization task. The proposed method is experimentally evaluated in the domain of news articles and obtains better summaries, capable of extracting important concepts based on user preferences while considering the relevant domain terms in the process of multi-document text summarization. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
27. T5LSTM-RNN based Text Summarization Model for Behavioral Biology Literature.
- Author
-
Chaurasia, Shivangi, Dasgupta, Debalay, and Regunathan, Rajeshkhannan
- Subjects
BIOLOGY ,LITERATURE - Abstract
Behavioral biology is a crucial and trending topic that needs proper attention from scholars for rapid development in the field. The purpose of this paper is to ease the process of collecting all kinds of relevant and vital data available on the internet regarding this topic, in different forms of media such as news articles, research papers, and YouTube lecture videos, into a single document in properly summarized form. For proper training of the LSTM model, the lengthy video and journal datasets are pre-processed using the T5 transformer model to generate a uniform training dataset. In this work, a comprehensive approach is proposed based on abstractive text summarization using a seq2seq encoder-decoder model combined with a stacked LSTM layer with an attention mechanism and a T5 transformer pre-processor. The resulting hybrid model, T5LSTM-RNN, is implemented to generate the summarized data. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
28. FactGen: Faithful Text Generation by Factuality-aware Pre-training and Contrastive Ranking Fine-tuning.
- Author
-
Zhibin Lan, Wei Li, Jinsong Su, Xinyan Xiao, Jiachen Liu, Wenhao Wu, and Yajuan Lyu
- Subjects
DEEP learning ,TEXT summarization ,INFORMATION retrieval ,PROBABILITY theory ,DATA analysis - Abstract
Conditional text generation is supposed to generate a fluent and coherent target text that is faithful to the source text. Although pre-trained models have achieved promising results, they still suffer from the crucial factuality problem. To deal with this issue, we propose a factuality-aware pretraining-finetuning framework named FactGen, which fully considers factuality during two training stages. Specifically, at the pre-training stage, we utilize a natural language inference model to construct target texts that are entailed by the source texts, resulting in a more factually consistent pre-training objective. Then, during the fine-tuning stage, we further introduce a contrastive ranking loss to encourage the model to generate factually consistent text with higher probability. Extensive experiments on three conditional text generation tasks demonstrate the effectiveness and generality of our training framework. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
29. Document Clustering Using Graph Based Fuzzy Association Rule Generation.
- Author
-
Perumal, P.
- Subjects
DOCUMENT clustering ,FUZZY logic ,FEATURE extraction ,ASSOCIATION rule mining ,REDUNDANCY in engineering - Abstract
With the growth of web-based documents, the need for automatic document clustering and text summarization has increased. Document summarization, that is, extracting the essential content, removing unnecessary data, and presenting the information in a cohesive and coherent manner, is a most challenging task. In this research, a novel intelligent model for document clustering is designed using a graph model and fuzzy-based association rule generation (gFAR). Initially, the graph model is used to map the relationships among the (multi-source) data, followed by document clustering with association rules generated using fuzzy concepts. This method eliminates redundancy by mapping related documents with the graph model, and reduces time consumption and improves accuracy through fuzzy association rule generation. The framework provides document clustering in an interpretable way. It iteratively reduces the error rate during relationship mapping among the data clusters with the assistance of weighted document content. The model also represents the significance of data features with class discrimination, which helps measure the significance of features during the clustering process. The simulation is done in the MATLAB 2016b environment and evaluated with empirical standards such as Relative Risk Patterns (RRP), ROUGE score, and Discrimination Information Measure (DMI). The DailyMail and DUC 2004 datasets are used to obtain the empirical results. The proposed gFAR model gives a better trade-off compared with various prevailing approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
30. Supervised weight learning-based PSO framework for single document extractive summarization.
- Author
-
Singh, Sangita, Singh, Jyoti Prakash, and Deepak, Akshay
- Subjects
AUTOMATIC summarization ,TEXT summarization ,PARTICLE swarm optimization ,SUPERVISED learning ,GOAL programming ,TEXT recognition ,MATHEMATICAL optimization - Abstract
The need for automatic text summarization is natural: there is a huge volume of information available online, which prompts widespread interest in extracting relevant information in a concise and understandable manner. In the proposed system, automated text summarization is treated as an extractive single-document summarization problem. To solve this problem, a particle swarm optimization (PSO) based approach is suggested, with the goal of producing good summaries in terms of content coverage, informativeness, and readability. This paper introduces XSumm-PSO, a new supervised approach based on the PSO optimization technique for extractive summarization. The paper also contributes a new feature, "incorrect word", that captures misspelled words in candidate sentences. This feature is combined with nine existing features used by the proposed model to generate error-free summaries. As a result, the proposed XSumm-PSO framework produces superior performance, achieving improvements of +2.7%, +0.8%, and +0.8% in ROUGE-1, ROUGE-2, and ROUGE-L scores, respectively, on the DUC 2002 dataset over state-of-the-art techniques. The corresponding improvements on the CNN/DailyMail dataset are +0.97%, +0.25%, and +0.49%. We also performed a t-test, showing the proposed approach is statistically consistent across various runs. • A PSO-based technique optimized in a supervised manner using ROUGE-1 is proposed. • The suggested model solves a single-document extractive text summarization task. • A new feature, "incorrect word", is also introduced in this work. • We evaluate our proposed model on the DUC-2002 and CNN/DailyMail benchmark datasets. • The suggested model generalizes better and produces better accuracy than SOTA. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
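A minimal PSO loop of the kind XSumm-PSO builds on can be sketched in the stdlib. This is a generic sketch, not the paper's system: there, the objective would be a supervised ROUGE-1 fit over the ten sentence features; here a toy quadratic stands in, and the inertia/acceleration constants are conventional defaults, not the paper's settings:

```python
import random

def pso(objective, dim, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimiser (minimisation): velocities are
    pulled toward each particle's personal best and the swarm's global best."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:          # update personal best
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:         # and global best
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

In a summarizer, the particle position would be the vector of feature weights, and the objective would score the resulting extractive summary against reference summaries.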
31. FuzzyTP-BERT: Enhancing extractive text summarization with fuzzy topic modeling and transformer networks.
- Author
-
Onan, Aytuğ and Alhumyani, Hesham A.
- Subjects
LANGUAGE models ,AUTOMATIC summarization ,TEXT summarization ,NATURAL language processing ,INFORMATION overload ,DEEP learning - Abstract
In the rapidly evolving field of natural language processing, the demand for efficient automated text summarization systems that not only distill extensive documents but also capture their nuanced thematic elements has never been greater. This paper introduces the FuzzyTP-BERT framework, a novel approach in extractive text summarization that synergistically combines Fuzzy Topic Modeling (FuzzyTM) with the advanced capabilities of Bidirectional Encoder Representations from Transformers (BERT). Unlike traditional extractive methods, FuzzyTP-BERT integrates fuzzy logic to refine topic modeling, enhancing the semantic sensitivity of summaries by allowing a more nuanced representation of word-topic relationships. This integration results in summaries that are not only coherent but also thematically rich, addressing a significant gap in current summarization technology. Extensive evaluations on benchmark datasets demonstrate that FuzzyTP-BERT significantly outperforms existing models in terms of ROUGE scores, effectively balancing topical relevance with semantic coherence. Our findings suggest that incorporating fuzzy logic into deep learning frameworks can markedly improve the quality of automated text summaries, potentially benefiting a wide range of applications in the information overload age. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. An Improved Method for Extractive Based Opinion Summarization Using Opinion Mining.
- Author
-
Bhatia, Surbhi and AlOjail, Mohammed
- Subjects
SENTIMENT analysis ,DATA extraction ,TEXT mining ,NATURAL language processing ,DEEP learning ,AUTOMATIC summarization - Abstract
Opinion summarization recapitulates the opinions about a common topic automatically. The primary motive of summarization is to preserve the properties of the text while shortening it with no loss in semantics. The need for efficient automatic summarization has resulted in increased interest among the Natural Language Processing and Text Mining communities. This paper emphasizes building an extractive summarization system combining principal component analysis for dimensionality reduction with a bidirectional Recurrent Neural Network and Long Short-Term Memory (RNN-LSTM) deep learning model for short and exact synopses using a seq2seq model. It presents a paradigm shift in the way extractive summaries are generated. Novel algorithms for word extraction using assertions are proposed. The semantic framework is well-grounded in this research, facilitating a correct decision-making process after reviewing a huge number of online reviews, taking all important features into account. The advantages of the proposed solution include greater computational efficiency, better inferences from social media, data understanding, robustness, and handling of sparse data. Experiments on different datasets outperform previous research, and the accuracy is shown to exceed the baselines, demonstrating the efficiency and novelty of the work. The comparisons are done by calculating accuracy against different baselines using the Rouge tool. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
33. An Efficient Machine Learning-based Text Summarization in the Malayalam Language.
- Author
-
Haroon, Rosna P., M., Abdul Gafur, and U, Barakkath Nisha
- Subjects
TEXT summarization ,MALAYALAM language ,SUPPORT vector machines ,CLASSIFICATION algorithms ,LINGUISTIC complexity - Abstract
Automatic text summarization is a procedure that compresses a large text into a shorter text that incorporates its significant information. Malayalam is one of the more complex languages used in India, spoken most commonly in Kerala and in Lakshadweep. Natural language processing work in Malayalam is relatively limited due to the complexity of the language as well as the scarcity of available resources. In this paper, an approach to text summarization of Malayalam documents is proposed by training a model based on the Support Vector Machine classification algorithm. Different features of the text are taken into account in training so that the system can output the most important data from the input text. The classifier separates the most important, important, average, and least significant sentences into distinct classes, and based on this, the machine creates a summary of the input document. The user can select a compression ratio so that the system outputs that fraction of the text as the summary. Model performance is measured using Malayalam documents from different genres as well as documents from the same domain. The model is evaluated using the content evaluation measures precision, recall, F-score, and relative utility. The obtained precision and recall values show that the model is trustworthy and more relevant compared to other summarizers. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
34. Review of automatic text summarization techniques & methods.
- Author
-
Widyassari, Adhika Pramita, Rustad, Supriadi, Shidik, Guruh Fajar, Noersasongko, Edi, Syukur, Abdul, Affandy, Affandy, and Setiadi, De Rosal Ignatius Moses
- Abstract
Text summarization automatically produces a summary containing important sentences and all relevant important information from the original document. Viewed from the summary results, the two main approaches are extractive and abstractive. Extractive summarization is heading towards maturity, and research has now shifted towards abstractive summarization and real-time summarization. Although many datasets, methods, and techniques have been published, few papers provide a broad picture of the current state of research in this field. This paper provides a broad and systematic review of research in the field of text summarization published from 2008 to 2019. There are 85 journal and conference publications, the results of the extraction of selected studies, identified and analyzed to describe research topics and trends, datasets, preprocessing, features, techniques, methods, evaluations, and problems in this field of research. The results of the analysis provide an in-depth explanation of the topics and trends that are the focus of research in text summarization; provide references to public datasets, preprocessing steps, and features that have been used; and describe the techniques and methods often used by researchers as a comparison and means for developing methods. At the end of this paper, several recommendations regarding opportunities and challenges in text summarization research are given. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
35. Automatic summarization of scientific articles: A survey.
- Author
-
Ibrahim Altmami, Nouf and El Bachir Menai, Mohamed
- Subjects
NATURAL language processing ,MACHINE learning ,AD hoc computer networks - Abstract
The scientific research process generally starts with the examination of the state of the art, which may involve a vast number of publications. Automatically summarizing scientific articles would help researchers in their investigation by speeding up the research process. The automatic summarization of scientific articles differs from the summarization of generic texts due to their specific structure and inclusion of citation sentences. Most of the valuable information in scientific articles is presented in tables, figures, and algorithm pseudocode. These elements, however, do not usually appear in a generic text. Therefore, several approaches that consider the particularity of a scientific article structure were proposed to enhance the quality of the generated summary, resulting in ad hoc automatic summarizers. This paper provides a comprehensive study of the state of the art in this field and discusses some future research directions. It particularly presents a review of approaches developed during the last decade, the corpora used, and their evaluation methods. It also discusses their limitations and points out some open problems. The conclusions of this study highlight the prevalence of extractive techniques for the automatic summarization of single monolingual articles using a combination of statistical, natural language processing, and machine learning techniques. The absence of benchmark corpora and gold standard summaries for scientific articles remains the main issue for this task. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
36. Parameter-efficient fine-tuning large language model approach for hospital discharge paper summarization.
- Author
-
Goswami, Joyeeta, Prajapati, Kaushal Kumar, Saha, Ashim, and Saha, Apu Kumar
- Subjects
LANGUAGE models ,TEXT summarization ,HOSPITAL admission & discharge ,MEDICAL terminology ,MNEMONICS ,TEXT messages - Abstract
Text summarization in the medical domain is one of the most crucial tasks, as it deals with critical human information. Consequently, proper summarization and key point extraction from medical documents using pre-trained language models is now a key focus for researchers. However, due to the considerable amount of real-world data and the enormous memory required to train Large Language Models (LLMs), research on these models is challenging. To overcome these challenges, multiple prompting and tuning techniques are used. In this paper, the effectiveness of prompt engineering and parameter-efficient fine-tuning is studied for summarizing Hospital Discharge Summary (HDS) papers effectively, so that these models can accurately interpret medical terminologies and contexts, generate brief but compact summaries, and draw out concentrated themes, opening new approaches for the application of LLMs in healthcare and making HDS more patient-friendly. In this research, LLaMA 2 (Large Language Model Meta AI) is the base model. The model has been fine-tuned using QLoRA (Quantized Low-Rank Adapters), which can bring down the memory usage of LLMs without compromising data quality. This study explores how to use LLMs on HDS datasets without excessive memory usage, using QLoRA, within electronic health record systems to further streamline the handling and retrieval of healthcare information. • Presents a method to summarize HDS by parameter-efficient fine-tuning of an LLM. • It utilizes QLoRA fine-tuning on LLaMA 2 to minimize the memory requirements. • A comparative analysis of prompt engineering and parameter-efficient fine-tuning is performed. • The effectiveness of the approach is evaluated using BLEU and ROUGE-L scores. • Medical professionals have conducted a quality assessment of the summarization. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
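The low-rank adapter idea behind QLoRA can be shown with toy matrices. This is the generic LoRA update, W' = W + (α/r)·B·A with only A and B trained, not the paper's code; QLoRA's 4-bit quantization of the frozen W is omitted here:

```python
def matmul(A, B):
    """Naive dense matrix product for small illustrative matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_update(W, A, B, alpha):
    """Effective weight W' = W + (alpha / r) * B @ A, where r is the
    adapter rank (number of rows of A). W stays frozen; A, B are trained."""
    r = len(A)
    BA = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(wr, dr)]
            for wr, dr in zip(W, BA)]
```

The memory saving is in the parameter count: a d×d layer has d² weights, while a rank-r adapter trains only 2·d·r; for example, r = 8 on a 4096×4096 projection trains 65,536 values instead of roughly 16.8 million.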
37. Text Summarization Technique for Punjabi Language Using Neural Networks.
- Author
-
Jain, Arti, Arora, Anuja, Yadav, Divakar, Morato, Jorge, and Kaur, Amanpreet
- Published
- 2021
- Full Text
- View/download PDF
38. Headnote Prediction Using Machine Learning.
- Author
-
Mahar, Sarmad, Zafar, Sahar, and Nishat, Kamran
- Published
- 2021
- Full Text
- View/download PDF
39. Attention based Abstractive Summarization of Malayalam Document.
- Author
-
Nambiar, Sindhya K, Peter S, David, and Idicula, Sumam Mary
- Subjects
TEXT messages ,ATTENTION - Abstract
There are different text summarization processes available in Natural Language Processing. Among them, abstractive text summarization is one of the challenging problems, with very little research done in regional languages. Unlike other summarization techniques, which reuse words and phrases from the source text, abstractive text summarization builds a short and concise summary of a large text document from the underlying message of the text, not necessarily using the same words and phrases as the source. The objective of the proposed work is to create a brief and understandable abstractive summary of a Malayalam document. Malayalam is one of the 22 scheduled languages of India, spoken by over 34 million people, and is designated as a Classical Language in India. As a classical language, Malayalam has unique syntactic and semantic rules, which makes this work all the more important. The proposed work creates an attention mechanism to generate the summary of the source document. The goal was to compare the efficiency of the attention model with a sequence-to-sequence baseline model on Malayalam text, thereby implementing a better abstractive text summarizer for Malayalam documents. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
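The attention mechanism compared against the seq2seq baseline computes, at each decoding step, a softmax-weighted mix of the encoder states. The paper's exact scoring function is not specified above, so this stdlib sketch uses the common scaled dot-product form:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention: score each encoder state (key) against
    the decoder query, softmax the scores, return the weighted context."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return context, weights
```

The decoder then conditions its next word prediction on this context vector, letting it "look back" at the most relevant source positions instead of compressing everything into one fixed state.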
40. Persian Text Summarization Using ... Encoding
- Author
-
رامين فاتركه and سيده ممتاز
- Subjects
SOCIAL media ,NATURAL language processing ,INTERNET - Abstract
Copyright of Iranian Journal of Information Processing & Management is the property of Iranian Information & Documentation Center (IRANDOC) and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2021
41. Decomposition-based multi-objective differential evolution for extractive multi-document automatic text summarization.
- Author
-
Wahab, Muhammad Hafizul Hazmi, Abdul Hamid, Nor Asilah Wati, Subramaniam, Shamala, Latip, Rohaya, and Othman, Mohamed
- Subjects
TEXT summarization ,DIFFERENTIAL evolution ,OPTIMIZATION algorithms ,SWARM intelligence ,MATHEMATICAL optimization ,INFORMATION processing - Abstract
The central challenge in Automatic Text Summarization (ATS) involves efficiently generating machine-generated text summaries through optimization algorithms. An ATS is a critical component for systems dealing with textual information processing. However, the current approach faces a significant hurdle due to the computational intensity of the process, particularly when employing complex optimization techniques like swarm intelligence optimization alongside a costly ATS repair operator. While the current approach yields impressive Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics for the generated summary, it struggles with inefficiencies, mainly attributed to the substantial optimization time consumed by the ATS repair operator scheme. In order to address this, a novel solution called Decomposition-based Multi-Objective Differential Evolution (MODE/D) is proposed. It is built upon the foundation of Differential Evolution for Multi-Objective Optimization (DEMO) and the weighted sum method (WS), coupled with an innovative ATS repair operator scheme. Through experimentation on Document Understanding Conferences (DUC) datasets, the novel MODE/D-WS approach is validated by evaluating the results using ROUGE metrics. The outcomes are twofold: a remarkable reduction in serial execution time and a noteworthy enhancement over existing techniques in the scholarly domain, as evidenced by improved ROUGE-1, ROUGE-2, and ROUGE-L scores. [Display omitted]
• The adoption of MODE/D and the WS method to effectively tackle the optimization problem of extractive multi-document ATS.
• Employment of ROUGE metrics to gauge the quality of the proposed approach's summaries relative to alternative methodologies.
• The introduction of an enhanced ATS repair operator scheme to reduce the computational demands of the repair operator.
• Establish performance benchmarks by comparing MODE/D's execution times and word generation rates with other approaches in serial execution.
[ABSTRACT FROM AUTHOR]
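Two building blocks named in this abstract, the weighted sum (WS) scalarization and the differential-evolution mutation step, can be sketched in a few lines. The objective names and candidate scores below are hypothetical, not the paper's.

```python
def weighted_sum(objectives, weights):
    """Collapse a multi-objective score vector into one scalar
    (the weighted-sum decomposition used to rank candidates)."""
    return sum(w * f for w, f in zip(weights, objectives))

def de_mutant(a, b, c, F=0.5):
    """Classic DE/rand/1 mutation: a + F * (b - c), element-wise."""
    return [ai + F * (bi - ci) for ai, bi, ci in zip(a, b, c)]

# Hypothetical summary candidates scored on (coverage, 1 - redundancy),
# both to be maximized; equal weights pick the better trade-off.
candidates = {
    "A": (0.80, 0.40),
    "B": (0.60, 0.90),
}
weights = (0.5, 0.5)
best = max(candidates, key=lambda k: weighted_sum(candidates[k], weights))
```

In a decomposition-based scheme like MODE/D, different weight vectors would each steer one subproblem toward a different region of the Pareto front.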
- Published
- 2024
- Full Text
- View/download PDF
42. A Bilingual Extractive Text Summarization Model Using Textual Pattern Constraint Features.
- Author
-
Alias, Suraya, Sainin, Mohd Shamrie, and Mohammad, Siti Khaotijah
- Subjects
MALAY language ,RANDOM sets ,LINGUISTS ,CORPORA ,VOCABULARY - Abstract
In the era of digital information, an auto-generated summary can help readers easily find important and relevant information. Most studies and benchmark data sets in the field of text summarization are in English; hence, there is a need to study the potential of the Malay language in this field. This study also highlights the problems in identifying and generating important information in extractive summaries, because existing text representation models such as BOW suffer from inaccurate semantic representation, while the N-gram model produces very high word-vector dimensions. In this study, a bilingual text summarization model named MYTextSumBASIC was developed to generate an extractive summary automatically in Malay and English. The MYTextSumBASIC summarizer applies a text representation model known as FASP using three textual pattern constraints, namely word item constraints, adjacent word constraints, and sequence size constraints. There are three main phases in the framework of the MYTextSumBASIC model: the development of the Malay-language corpus, the development of the MYTextSumBASIC model using FASP, and the summary evaluation phase. In the summary evaluation phase, using a Malay-language data set of 100 news articles, the summaries produced by MYTextSumBASIC outperformed those generated by the Baseline (Lead) and OTS summarizers, with the highest averages for recall (R) at 0.5849, precision (P) at 0.5736, and F-score (Fm) at 0.5772. In manual evaluation by linguists, the MYTextSumBASIC method yielded a readability score of 4.1 and a summary-content score of 3.87 on a random data set. Further experiments using the 2002 DUC English benchmark data set of 102 news articles also showed that the MYTextSumBASIC model outperformed the best and lowest systems in the comparison, with mean recall values of ROUGE-1 (0.43896) and ROUGE-2 (0.19918).
These findings conclude that the FASP text representation feature, along with the textual pattern constraints used by our model, can be applied to bilingual text with competitive performance compared to other text summarization models. [ABSTRACT FROM AUTHOR]
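The precision, recall, and F-score figures reported above are standard word-overlap measures. A minimal sketch of how such scores can be computed (simple set overlap, not necessarily the paper's exact protocol):

```python
def prf(candidate, reference):
    """Word-overlap precision, recall and F-score between a candidate
    summary and a reference summary (unique words only, for brevity)."""
    cand = set(candidate.lower().split())
    ref = set(reference.lower().split())
    overlap = len(cand & ref)
    if not overlap:
        return 0.0, 0.0, 0.0
    p = overlap / len(cand)          # fraction of candidate words that hit
    r = overlap / len(ref)           # fraction of reference words recovered
    f = 2 * p * r / (p + r)          # harmonic mean
    return p, r, f

p, r, f = prf("the cat sat", "the cat sat on the mat")
```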
- Published
- 2020
- Full Text
- View/download PDF
43. Deep Text Summarization using Generative Adversarial Networks in Indian Languages.
- Author
-
Bhargava, Rupal, Sharma, Gargi, and Sharma, Yashvardhan
- Subjects
SPEECH perception ,DEEP learning ,NATURAL language processing ,PARTS of speech - Abstract
Abstractive Text Summarization (ATS) is the task of capturing information from different sources and condensing it such that the content is represented well and there is no loss of information. It has been an active area of research for quite some time now. ATS is closer to human-generated summaries and has the capability of representing and combining multiple pieces of information. With the advent of deep learning architectures, many tasks relating to natural language processing have achieved persistent and comparably high performance. Deep learning has proven advantageous and shown promising results in machine translation, speech recognition, image captioning, and many other tasks using sequence-to-sequence models. Language tools such as Part-of-Speech taggers and Named Entity Recognizers for Indian languages are not very competitive; hence, language-specific techniques do not perform very well for Indian languages. Deep learning techniques are language agnostic and can therefore overcome these shortcomings. In this paper, Generative Adversarial Networks (GANs) are employed to create a gist for longer pieces of text in conjunction with paraphrase detection. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
44. Leveraging GPT-4 for food effect summarization to enhance product-specific guidance development via iterative prompting.
- Author
-
Shi, Yiwen, Ren, Ping, Wang, Jing, Han, Biao, ValizadehAslani, Taha, Agbavor, Felix, Zhang, Yi, Hu, Meng, Zhao, Liang, and Liang, Hualou
- Abstract
[Display omitted]
• Iterative prompting is a simple yet effective strategy that enables us to interact with ChatGPT or GPT-4 more effectively and efficiently through multi-turn interaction, which iteratively improves the quality of the text summary.
• Extensive evaluations of our approach using 100 NDA review documents for food effect summarization show the versatility of employing multi-turn interaction to refine the quality of the generated summaries.
• The superior performance of GPT-4 over ChatGPT is indicative of the potential of GPT-4 in automating evaluation, while acknowledging certain limitations.
Food effect summarization from a New Drug Application (NDA) is an essential component of product-specific guidance (PSG) development and assessment, which provides the basis of recommendations for fasting and fed bioequivalence studies to guide the pharmaceutical industry in developing generic drug products. However, manual summarization of food effect from extensive drug application review documents is time-consuming. Therefore, there is a need to develop automated methods to generate food effect summaries. Recent advances in natural language processing (NLP), particularly large language models (LLMs) such as ChatGPT and GPT-4, have demonstrated great potential in improving the effectiveness of automated text summarization, but their accuracy in summarizing food effect for PSG assessment remains unclear. In this study, we introduce a simple yet effective approach, iterative prompting, which allows one to interact with ChatGPT or GPT-4 more effectively and efficiently through multi-turn interaction. Specifically, we propose a three-turn iterative prompting approach to food effect summarization in which keyword-focused and length-controlled prompts are respectively provided in consecutive turns to refine the quality of the generated summary.
We conduct a series of extensive evaluations, ranging from automated metrics to FDA professionals and even evaluation by GPT-4, on 100 NDA review documents selected over the past five years. We observe that the summary quality is progressively improved throughout the iterative prompting process. Moreover, we find that GPT-4 performs better than ChatGPT, as evaluated by FDA professionals (43% vs. 12%) and GPT-4 (64% vs. 35%). Importantly, all the FDA professionals unanimously rated that 85% of the summaries generated by GPT-4 are factually consistent with the golden reference summary, a finding further supported by GPT-4 rating of 72% consistency. Taken together, these results strongly suggest a great potential for GPT-4 to draft food effect summaries that could be reviewed by FDA professionals, thereby improving the efficiency of the PSG assessment cycle and promoting generic drug product development. [ABSTRACT FROM AUTHOR]
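The three-turn iterative prompting idea can be sketched as a simple loop. Here `fake_model` is a stand-in stub (the actual work calls ChatGPT/GPT-4), and the prompt wording is illustrative, not the authors' prompts.

```python
def fake_model(prompt, history):
    """Stand-in for a chat LLM call; a real system would call an API here."""
    return f"summary after {len(history) + 1} turn(s)"

def iterative_summarize(document, model=fake_model):
    """Three-turn iterative prompting: an initial draft, then a
    keyword-focused refinement, then a length-controlled refinement.
    The history of (prompt, reply) pairs carries context across turns."""
    prompts = [
        f"Summarize the food effect section:\n{document}",
        "Revise the summary to foreground the key pharmacokinetic terms.",
        "Shorten the revised summary to at most 100 words.",
    ]
    history, summary = [], ""
    for p in prompts:
        summary = model(p, history)
        history.append((p, summary))
    return summary, history

summary, history = iterative_summarize(
    "AUC and Cmax were unchanged under fed conditions.")
```

The point of the pattern is that each turn's prompt constrains one aspect of the output (keywords, then length) while earlier turns remain in context.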
- Published
- 2023
- Full Text
- View/download PDF
45. Summarization of scholarly articles using BERT and BiGRU: Deep learning-based extractive approach.
- Author
-
Bano, Sheher, Khalid, Shah, Tairan, Nasser Mansoor, Shah, Habib, and Khattak, Hasan Ali
- Subjects
LANGUAGE models ,RECURRENT neural networks ,DEEP learning ,TEXT summarization - Abstract
Extractive text summarization involves selecting and combining key sentences directly from the original text, rather than generating new content. While various methods, both statistical and graph-based, have been explored for this purpose, accurately capturing the intended meaning remains a challenge. To address this, researchers are investigating innovative techniques that harness deep learning models like BERT (Bidirectional Encoder Representations from Transformers). However, BERT has limitations in summarizing lengthy documents due to input length constraints. To find a more effective solution, we propose a novel approach. This approach combines the power of BERT, a transformer network pre-trained on extensive self-supervised datasets, with BiGRU (Bidirectional Gated Recurrent Units), a recurrent neural network that captures sequential dependencies within the text for extracting salient information. Our method involves using BERT to generate sentence-level embeddings, which are then fed into the BiGRU network. This allows us to achieve a comprehensive understanding of the complete document's context. In experimental analysis conducted on arXiv and PubMed datasets, the proposed approach outperformed several state-of-the-art models. It achieved remarkable ROUGE-F scores of (46.7, 19.4, 35.4) and (47.0, 21.3, 39.7) on these datasets respectively. The proposed fusion of BERT and BiGRU significantly enhances extractive text summarization. It shows promising potential for summarizing lengthy documents and benefiting various domains that require concise and informative summaries. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
46. Comparison of Graph-based and Term Weighting Method for Automatic Summarization of Online News.
- Author
-
Rumagit, Reinert Yosua, Setiyawati, Nina, and Bangkalang, Dwi Hosanna
- Subjects
TERMS & phrases ,PARAGRAPHS ,CRIMINAL sentencing - Abstract
Text summarization is one of the quickest ways to get the gist of a paragraph or story. In text summarization, two approaches can be used: the extractive approach and the abstractive approach. In this research, the summarization was conducted using the extractive approach, in which a few sentences are taken from a document and combined into a short summary. The most common method used in text summarization is the graph-based method. The authors proposed another method for summarization, namely the term weighting method. The purpose of this study is to compare the results of the graph-based method and the term weighting method in order to determine the better method for text summarization. The text pre-processing phase involves omitting stopwords and affixes. The researchers used precision, recall, and F-score as the evaluation measures. Based on the experiment, the average precision and F-score of the proposed term weighting method are 0.296 and 0.280 respectively, which are better than the values of the graph-based method. In the end, the result shows that the proposed term weighting method produced a better summary compared to the graph-based method. [ABSTRACT FROM AUTHOR]
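A minimal sketch of the term weighting idea: score each sentence by the summed corpus frequency of its terms and keep the top-k in document order. The stopword list is a placeholder, not the paper's.

```python
from collections import Counter

def term_weight_summary(sentences, k=1,
                        stopwords=frozenset({"the", "a", "is"})):
    """Score each sentence by the summed document frequency of its
    non-stopword terms; return the top-k sentences in original order."""
    freqs = Counter(
        w for s in sentences for w in s.lower().split() if w not in stopwords)
    scored = sorted(
        range(len(sentences)),
        key=lambda i: sum(freqs[w] for w in sentences[i].lower().split()
                          if w not in stopwords),
        reverse=True)
    keep = sorted(scored[:k])            # restore document order
    return [sentences[i] for i in keep]

top = term_weight_summary(
    ["dogs bark loudly", "cats purr", "dogs and cats play"])
```

A real system would add the stemming/affix removal and stopword list described in the pre-processing phase.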
- Published
- 2019
- Full Text
- View/download PDF
47. An optimized hybrid deep learning model based on word embeddings and statistical features for extractive summarization.
- Author
-
Wazery, Yaser M., Saleh, Marwa E., and Ali, Abdelmgeid A.
- Subjects
DEEP learning ,CONVOLUTIONAL neural networks ,TEXT summarization ,STATISTICS ,MACHINE learning ,MATHEMATICAL optimization - Abstract
Extractive summarization has recently gained significant attention as a classification problem at the sentence level. Most current summarization methods rely on only one way of representing sentences in a document (i.e., extracted features, word embeddings, or BERT embeddings). However, classification performance and summary generation quality improve if two ways of representing sentences are combined. This paper presents a novel extractive text summarization method based on word embeddings and statistical features of a single document. Each sentence is encoded using a Convolutional Neural Network (CNN) and a Feed-Forward Neural Network (FFNN) based on word embeddings and statistical features. The CNN and FFNN outputs are concatenated to classify the sentence using a Multilayer Perceptron (MLP). In addition, the hybrid model's parameters are optimized by the KerasTuner optimization technique to determine the most efficient hybrid model. The proposed method was evaluated on the standard Newsroom dataset. Experiments show that the proposed method effectively captures the document's semantic and statistical information and outperforms deep learning, machine learning, and state-of-the-art approaches with scores of 78.64, 74.05, and 72.08 for ROUGE-1, ROUGE-2, and ROUGE-L, respectively. [ABSTRACT FROM AUTHOR]
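The statistical side of such a hybrid can be illustrated with a few hand-crafted sentence features. These particular features (position, relative length, numeric cue) are common choices in the extractive-summarization literature, not necessarily the paper's exact set.

```python
def sentence_features(sentences):
    """Toy statistical features per sentence: normalized position,
    relative length, and a cue for numeric content (illustrative only)."""
    n = len(sentences)
    max_len = max(len(s.split()) for s in sentences)
    feats = []
    for i, s in enumerate(sentences):
        words = s.split()
        feats.append({
            "position": 1 - i / n,               # earlier sentences score higher
            "rel_length": len(words) / max_len,  # longer sentences score higher
            "has_number": any(w.strip(".,%").isdigit() for w in words),
        })
    return feats

feats = sentence_features(
    ["Sales rose 12 percent.", "Everyone was pleased."])
```

In the paper's architecture, a vector like this would feed the FFNN branch while word embeddings feed the CNN branch, and the two outputs are concatenated for the MLP classifier.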
- Published
- 2023
- Full Text
- View/download PDF
48. POS Tagging for Arabic Text Using Bee Colony Algorithm.
- Author
-
Alhasan, Ahmad and Al-Taani, Ahmad T.
- Subjects
NATURAL language processing ,DATA mining ,INFORMATION retrieval ,BEES algorithm ,ARABIC language - Abstract
Part-of-Speech (POS) tagging is the process of automatically determining the proper grammatical tag or syntactic category of a word depending on its context. POS tagging is an essential step in most Natural Language Processing (NLP) applications such as text summarization, question answering, information extraction, and information retrieval. In this study, we propose an efficient tagging approach for the Arabic language using the Bee Colony Optimization algorithm. The problem is represented as a graph, and a novel technique is proposed to assign scores to the possible tags of a sentence; the bees then find the best solution path. The proposed approach is evaluated using the KALIMAT corpus, which consists of 18M words. Experimental results showed that the proposed approach achieved 98.2% accuracy compared to 98%, 97.4%, and 94.6% for the Hybrid, Hidden Markov Model, and Rule-Based methods respectively. Furthermore, the proposed approach determined all the tags present in the corpus, while the mentioned approaches can identify only three tags. [ABSTRACT FROM AUTHOR]
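The graph formulation can be pictured as a tag lattice whose best path the bees search for. The brute-force sketch below scores paths with a toy bigram table; it illustrates the search space only, not the paper's bee-colony heuristic or scoring technique.

```python
from itertools import product

def best_tag_path(lattice, edge_score):
    """Brute-force the highest-scoring tag sequence over a lattice of
    per-word candidate tags (a stand-in for the bees' path search)."""
    best, best_score = None, float("-inf")
    for path in product(*lattice):
        score = sum(edge_score(a, b) for a, b in zip(path, path[1:]))
        if score > best_score:
            best, best_score = path, score
    return best

# Hypothetical two-word lattice with a toy bigram scorer.
bigram = {("DET", "NOUN"): 2.0, ("DET", "VERB"): 0.5,
          ("NOUN", "VERB"): 1.0}
path = best_tag_path([["DET"], ["NOUN", "VERB"]],
                     lambda a, b: bigram.get((a, b), 0.0))
```

Exhaustive enumeration is exponential in sentence length, which is exactly why a metaheuristic such as bee colony optimization is attractive for realistic sentences.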
- Published
- 2018
- Full Text
- View/download PDF
49. Feature Determination for Headline Extraction from Malay-Language Newspaper News.
- Author
-
Shahrul Azman Mohd Noah, Nazlena Mohamad Ali, and Mohd Sabri Hasan
- Abstract
Headline summarization is one of the automated text summarization techniques that can reduce the problem of information overload in retrieval systems and reduce the user's cognitive burden while searching and selecting relevant documents in large quantities. This study discusses the process of determining the features of a Malay-language headline summarization system for news-genre documents. The methodology starts with an analysis of a corpus of Malay news documents. The corpus contains 140 core news items selected from two mainstream news databases in Malaysia, Berita Harian and Utusan Malaysia. The news selection criteria were: core news categories, a size of 50 to 250 words, publication years from 2007 to 2012, and the economic, crime, education, and sports news genres. Three linguistic experts in Malay manually produced a headline summary for each news document. The experts had to comply with three conditions: summary extraction, the select-word-in-order word selection technique, and word morphological changes. The experimental results show that three characteristics were identified: first, the first two sentences are the most important sentences; second, the sentence that contains potential acronym definitions is chosen as the most important sentence; and third, the ideal headline summary size is six words. The consideration of these features allows a headline summary to be generated automatically, just like the process done by a human. [ABSTRACT FROM AUTHOR]
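The extracted characteristics (first-sentence preference, six-word headline size) suggest a trivial baseline heuristic, sketched here as an illustration rather than the study's actual system:

```python
def headline(sentences, size=6):
    """Heuristic suggested by the study's findings: favor the first
    sentence and select up to `size` words in their original order."""
    words = sentences[0].split()
    return " ".join(words[:size])

title = headline(
    ["The government announced a new education budget today in parliament.",
     "Details follow."])
```

A fuller implementation would also boost sentences containing acronym definitions, per the second characteristic reported above.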
- Published
- 2018
- Full Text
- View/download PDF
50. Hybrid Approach To Abstractive Summarization.
- Author
-
Sahoo, Deepak, Bhoi, Ashutosh, and Balabantaray, Rakesh Chandra
- Subjects
INFORMATION retrieval ,ABSTRACTING ,MARKOV processes ,CLUSTER analysis (Statistics) ,DATA fusion (Statistics) - Abstract
Text summarization is an application of information retrieval in which a short and non-redundant version of a comparatively large text is presented to the end user. In this paper, a hybrid approach is presented to generate an abstractive summary in which sentences are clustered using sentence-level relationships among sentences in association with the Markov clustering principle. Sentence ranking is then done in each cluster and, if possible, the top-weighted sentence of each cluster is fused, using some linguistic rules, with its best-fit sentence (if found) within that cluster to generate a new sentence. The top-ranked sentences from each cluster are then compressed using a classification technique to generate the abstract summary. The proposed system was evaluated with the DUC 2002 data set and performs better than other existing systems. [ABSTRACT FROM AUTHOR]
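The clustering step can be approximated with a greedy word-overlap grouping. This Jaccard-threshold sketch is a simplification: the paper uses Markov clustering over sentence-level relationships, not this heuristic.

```python
def greedy_clusters(sentences, threshold=0.3):
    """Greedily group sentences by Jaccard word-overlap with each
    cluster's first (representative) sentence; a rough stand-in for
    the paper's Markov clustering step."""
    clusters = []
    for s in sentences:
        words = set(s.lower().split())
        for c in clusters:
            rep = set(c[0].lower().split())
            if len(words & rep) / len(words | rep) >= threshold:
                c.append(s)
                break
        else:                       # no cluster was similar enough
            clusters.append([s])
    return clusters

clusters = greedy_clusters(
    ["the cat sat", "the cat slept", "stocks fell today"])
```

After grouping, the hybrid pipeline would rank sentences within each cluster and fuse the top sentence with its best-fit neighbor.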
- Published
- 2018
- Full Text
- View/download PDF