1,669 results for "text summarization"
Search Results
252. A Survey on Recent Text Summarization Techniques
- Author
-
Senthil Kumar, G., Chakkaravarthy, Midhun, Morusupalli, Raghava, editor, Dandibhotla, Teja Santosh, editor, Atluri, Vani Vathsala, editor, Windridge, David, editor, Lingras, Pawan, editor, and Komati, Venkateswara Rao, editor
- Published
- 2023
- Full Text
- View/download PDF
253. Morphosyntactic Evaluation for Text Summarization in Morphologically Rich Languages: A Case Study for Turkish
- Author
-
Baykara, Batuhan, Güngör, Tunga, Métais, Elisabeth, editor, Meziane, Farid, editor, Sugumaran, Vijayan, editor, Manning, Warren, editor, and Reiff-Marganiec, Stephan, editor
- Published
- 2023
- Full Text
- View/download PDF
254. Extractive and Abstractive Text Summarization Model Fine-Tuned Based on BERTSUM and Bio-BERT on COVID-19 Open Research Articles
- Author
-
Nunna, Jhansi Lakshmi Durga, Hanuman Turaga, V. K., Chebrolu, Srilatha, Misra, Rajiv, editor, Omer, Rana, editor, Rajarajan, Muttukrishnan, editor, Veeravalli, Bharadwaj, editor, Kesswani, Nishtha, editor, and Mishra, Priyanka, editor
- Published
- 2023
- Full Text
- View/download PDF
255. Implementation of Legal Documents Text Summarization and Classification by Applying Neural Network Techniques
- Author
-
Rusiya, Siddhartha, Jamatia, Anupam, Sisodia, Dilip Singh, editor, Garg, Lalit, editor, Pachori, Ram Bilas, editor, and Tanveer, M., editor
- Published
- 2023
- Full Text
- View/download PDF
256. Topic-Selective Graph Network for Topic-Focused Summarization
- Author
-
Shi, Zesheng, Zhou, Yucheng, Kashima, Hisashi, editor, Ide, Tsuyoshi, editor, and Peng, Wen-Chih, editor
- Published
- 2023
- Full Text
- View/download PDF
257. Pre-training Meets Clustering: A Hybrid Extractive Multi-document Summarization Model
- Author
-
Karotia, Akanksha, Susan, Seba, Abraham, Ajith, editor, Hong, Tzung-Pei, editor, Kotecha, Ketan, editor, Ma, Kun, editor, Manghirmalani Mishra, Pooja, editor, and Gandhi, Niketa, editor
- Published
- 2023
- Full Text
- View/download PDF
258. Foundation Models for Text Generation
- Author
-
Paaß, Gerhard, and Giesselbach, Sven
- Published
- 2023
- Full Text
- View/download PDF
259. Abstractive Text Summarization of Biomedical Documents
- Author
-
Mital, Tanya, Selvam, Sheba, Tanisha, V., Chauhan, Rajdeep, Goplani, Dewang, Kumar, Sandeep, editor, Sharma, Harish, editor, Balachandran, K., editor, Kim, Joong Hoon, editor, and Bansal, Jagdish Chand, editor
- Published
- 2023
- Full Text
- View/download PDF
260. Survey of Text Summarization Stratification
- Author
-
Jamwal, Arvind, Singh, Pardeep, Kumari, Namrata, Singh, Yashwant, editor, Verma, Chaman, editor, Zoltán, Illés, editor, Chhabra, Jitender Kumar, editor, and Singh, Pradeep Kumar, editor
- Published
- 2023
- Full Text
- View/download PDF
261. Extractive Text Summarization Using Statistical Approach
- Author
-
Tewari, Kartikey, Yadav, Arun Kumar, Kumar, Mohit, Yadav, Divakar, Tistarelli, Massimo, editor, Dubey, Shiv Ram, editor, Singh, Satish Kumar, editor, and Jiang, Xiaoyi, editor
- Published
- 2023
- Full Text
- View/download PDF
262. A Review on BERT and Its Implementation in Various NLP Tasks
- Author
-
Chakkarwar, Vrishali, Tamane, Sharvari, Thombre, Ankita, Tamane, Sharvari, editor, Ghosh, Suddhasheel, editor, and Deshmukh, Sonal, editor
- Published
- 2023
- Full Text
- View/download PDF
263. Automated Text Summarization Using Transformers
- Author
-
Kumar, Yogesh, Jangir, Ashish, Meena, Bhavya, Tripathi, Isha Pathak, Kumar, Rajesh, editor, Verma, Ajit Kumar, editor, Sharma, Tarun K., editor, Verma, Om Prakash, editor, and Sharma, Sanjay, editor
- Published
- 2023
- Full Text
- View/download PDF
264. Evaluation of Extractive and Abstract Methods in Text Summarization
- Author
-
Lenka, Ranjita Kumari Biswal, Coombs, Thomas, Assi, Sulaf, Jayabalan, Manoj, Mustafina, Jamila, Liatsis, Panagiotis, Al-Hamid, Abdullah, Al-Sudani, Sahar, Ismail, Noor Lees, Al-Jumeily OBE, Dhiya, Wah, Yap Bee, editor, Berry, Michael W., editor, Mohamed, Azlinah, editor, and Al-Jumeily, Dhiya, editor
- Published
- 2023
- Full Text
- View/download PDF
265. A Method for Workflow Segmentation and Action Prediction from Video Data - AR Content
- Author
-
Kumar, Abhishek, Agnihotram, Gopichand, Kumar, Surbhit, Sudidhala, Raja Sekhar Reddy, Naik, Pandurang, Patel, Kanubhai K., editor, Santosh, K. C., editor, and Patel, Atul, editor
- Published
- 2023
- Full Text
- View/download PDF
266. Understanding Images of Tourist Destinations Based on the Extract of Central Sentences from Reviews Using BERT and LexRank
- Author
-
Kim, Da Hee, Lee, Kang Woo, Lim, Ji Won, Kim, Myeong Seon, Hong, Soon-Goo, Shakya, Subarna, editor, Du, Ke-Lin, editor, and Ntalianis, Klimis, editor
- Published
- 2023
- Full Text
- View/download PDF
267. Design and Implementation of Automatic Rumor Detection System Based on Opposite Meaning Searching
- Author
-
Lu, Haori, Wang, Jingrong, Song, Jiazhen, Li, Yutong, Nie, Peng, Zhan, Zehui, editor, Zhou, Ding, editor, and Wu, Honglin, editor
- Published
- 2023
- Full Text
- View/download PDF
268. Hybridization of Fuzzy Theory and Nature-Inspired Optimization for Medical Report Summarization
- Author
-
Mallick, Chirantana, Das, Asit Kumar, Nayak, Janmenjoy, editor, Das, Asit Kumar, editor, Naik, Bighnaraj, editor, Meher, Saroj K., editor, and Brahnam, Sheryl, editor
- Published
- 2023
- Full Text
- View/download PDF
269. A Hybrid Model of Latent Semantic Analysis with Graph-Based Text Summarization on Telugu Text
- Author
-
Lakshmi, Aluri, Latha, D., Bhateja, Vikrant, editor, Sunitha, K. V. N., editor, Chen, Yen-Wei, editor, and Zhang, Yu-Dong, editor
- Published
- 2023
- Full Text
- View/download PDF
270. Creating a Brief Review of Judicial Practice Using Clustering Methods
- Author
-
Taran, Maria O., Revunkov, Georgiy I., Gapanyuk, Yuriy E., Kryzhanovsky, Boris, editor, Dunin-Barkowski, Witali, editor, Redko, Vladimir, editor, and Tiumentsev, Yury, editor
- Published
- 2023
- Full Text
- View/download PDF
271. Extractive Text Summarization on Large-Scale Dataset Using K-Means Clustering and Word Embedding
- Author
-
Nguyen, Ti-Hon, Do, Thanh-Nghi, Smys, S., editor, Lafata, Pavel, editor, Palanisamy, Ram, editor, and Kamel, Khaled A., editor
- Published
- 2023
- Full Text
- View/download PDF
272. Text Summarization Using Combination of Sequence-To-Sequence Model with Attention Approach
- Author
-
Bhandarkar, Prasad, Thomas, K. T., Smys, S., editor, Lafata, Pavel, editor, Palanisamy, Ram, editor, and Kamel, Khaled A., editor
- Published
- 2023
- Full Text
- View/download PDF
273. Supervised Automatic Text Summarization of Konkani Texts Using Linear Regression-Based Feature Weighing and Language-Independent Features
- Author
-
D’Silva, Jovi, Sharma, Uzzal, Gupta, Deepak, editor, Khanna, Ashish, editor, Bhattacharyya, Siddhartha, editor, Hassanien, Aboul Ella, editor, Anand, Sameer, editor, and Jaiswal, Ajay, editor
- Published
- 2023
- Full Text
- View/download PDF
274. Text Summarization Approaches Under Transfer Learning and Domain Adaptation Settings—A Survey
- Author
-
Tank, Meenaxi, Thakkar, Priyank, Buyya, Rajkumar, editor, Hernandez, Susanna Munoz, editor, Kovvur, Ram Mohan Rao, editor, and Sarma, T. Hitendra, editor
- Published
- 2023
- Full Text
- View/download PDF
275. Extractive Text Summarization for Turkish: Implementation of TF-IDF and PageRank Algorithms
- Author
-
Akülker, Emre, Turhan, Çiğdem, and Arai, Kohei, editor
- Published
- 2023
- Full Text
- View/download PDF
276. Extraction and Summarization of Disease Details Using Text Summarization Techniques
- Author
-
Balipa, Mamatha, Yashvanth, S., Prakash, Sharan, Rajakumar, G., editor, Du, Ke-Lin, editor, Vuppalapati, Chandrasekar, editor, and Beligiannis, Grigorios N., editor
- Published
- 2023
- Full Text
- View/download PDF
277. Contextual Hypergraph Networks for Enhanced Extractive Summarization: Introducing Multi-Element Contextual Hypergraph Extractive Summarizer (MCHES)
- Author
-
Aytuğ Onan and Hesham Alhumyani
- Subjects
extractive summarization, hypergraph networks, text summarization, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999 - Abstract
Extractive summarization, a pivotal task in natural language processing, aims to distill essential content from lengthy documents efficiently. Traditional methods often struggle with capturing the nuanced interdependencies between different document elements, which is crucial to producing coherent and contextually rich summaries. This paper introduces Multi-Element Contextual Hypergraph Extractive Summarizer (MCHES), a novel framework designed to address these challenges through an advanced hypergraph-based approach. MCHES constructs a contextual hypergraph where sentences form nodes interconnected by multiple types of hyperedges, including semantic, narrative, and discourse hyperedges. This structure captures complex relationships and maintains narrative flow, enhancing semantic coherence across the summary. The framework incorporates a Contextual Homogenization Module (CHM), which harmonizes features from diverse hyperedges, and a Hypergraph Contextual Attention Module (HCA), which employs a dual-level attention mechanism to focus on the most salient information. The innovative Extractive Read-out Strategy selects the optimal set of sentences to compose the final summary, ensuring that the latter reflects the core themes and logical structure of the original text. Our extensive evaluations demonstrate significant improvements over existing methods. Specifically, MCHES achieves an average ROUGE-1 score of 44.756, a ROUGE-2 score of 24.963, and a ROUGE-L score of 42.477 on the CNN/DailyMail dataset, surpassing the best-performing baseline by 3.662%, 3.395%, and 2.166% respectively. Furthermore, MCHES achieves BERTScore values of 59.995 on CNN/DailyMail, 88.424 on XSum, and 89.285 on PubMed, indicating superior semantic alignment with human-generated summaries. Additionally, MCHES achieves MoverScore values of 87.432 on CNN/DailyMail, 60.549 on XSum, and 59.739 on PubMed, highlighting its effectiveness in maintaining content movement and ordering. 
These results confirm that the MCHES framework sets a new standard for extractive summarization by leveraging contextual hypergraphs for better narrative and thematic fidelity.
- Published
- 2024
- Full Text
- View/download PDF
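The hypergraph idea in the MCHES abstract above can be illustrated with a stdlib-only toy: sentences become nodes, and each shared content word induces a hyperedge connecting every sentence containing it — a crude stand-in for the paper's semantic hyperedges. The function names, stopword list, and degree-based scoring here are illustrative assumptions, not the paper's actual modules:

```python
import re
from collections import defaultdict

def build_semantic_hypergraph(sentences, min_size=2):
    """Group sentence indices into hyperedges keyed by shared content words."""
    stop = {"the", "a", "an", "of", "to", "and", "in", "is", "are"}
    edges = defaultdict(set)
    for i, sent in enumerate(sentences):
        for tok in re.findall(r"[a-z]+", sent.lower()):
            if tok not in stop:
                edges[tok].add(i)
    # keep only hyperedges that actually connect min_size or more sentences
    return {w: nodes for w, nodes in edges.items() if len(nodes) >= min_size}

def degree_scores(hypergraph, n_sentences):
    """Score each sentence by how many hyperedges it belongs to."""
    scores = [0] * n_sentences
    for nodes in hypergraph.values():
        for i in nodes:
            scores[i] += 1
    return scores

sents = [
    "Hypergraphs model relations among many sentences at once.",
    "Each hyperedge links sentences that share a concept.",
    "Sentences with high hyperedge degree are summary candidates.",
]
hg = build_semantic_hypergraph(sents)
scores = degree_scores(hg, len(sents))
```

A real extractive read-out would then pick the top-scoring sentences in document order; MCHES instead applies its attention modules over this structure.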
278. Automated Generation of Clinical Reports Using Sensing Technologies with Deep Learning Techniques
- Author
-
Celia Cabello-Collado, Javier Rodriguez-Juan, David Ortiz-Perez, Jose Garcia-Rodriguez, David Tomás, and Maria Flores Vizcaya-Moreno
- Subjects
text summarization, healthcare, multimodal data, audio sensors, transformers, Chemical technology, TP1-1185 - Abstract
This study presents a pioneering approach that leverages advanced sensing technologies and data processing techniques to enhance the process of clinical documentation generation during medical consultations. By employing sophisticated sensors to capture and interpret various cues such as speech patterns, intonations, or pauses, the system aims to accurately perceive and understand patient–doctor interactions in real time. This sensing capability allows for the automation of transcription and summarization tasks, facilitating the creation of concise and informative clinical documents. Through the integration of automatic speech recognition sensors, spoken dialogue is seamlessly converted into text, enabling efficient data capture. Additionally, deep models such as Transformer models are utilized to extract and analyze crucial information from the dialogue, ensuring that the generated summaries encapsulate the essence of the consultations accurately. Despite encountering challenges during development, experimentation with these sensing technologies has yielded promising results. The system achieved a maximum ROUGE-1 metric score of 0.57, demonstrating its effectiveness in summarizing complex medical discussions. This sensor-based approach aims to alleviate the administrative burden on healthcare professionals by automating documentation tasks and safeguarding important patient information. Ultimately, by enhancing the efficiency and reliability of clinical documentation, this innovative method contributes to improving overall healthcare outcomes.
- Published
- 2024
- Full Text
- View/download PDF
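The ROUGE-1 score of 0.57 reported in this abstract is just unigram overlap between a candidate summary and a reference. A minimal self-contained version (not the authors' evaluation code; the example strings are invented) looks like:

```python
from collections import Counter

def rouge_1(candidate, reference):
    """Unigram-overlap ROUGE-1 precision, recall, and F1 (case-insensitive)."""
    c = Counter(candidate.lower().split())
    r = Counter(reference.lower().split())
    overlap = sum((c & r).values())  # clipped unigram matches
    p = overlap / max(sum(c.values()), 1)
    rec = overlap / max(sum(r.values()), 1)
    f1 = 2 * p * rec / (p + rec) if overlap else 0.0
    return p, rec, f1

p, rec, f1 = rouge_1("patient reports mild chest pain",
                     "the patient reports chest pain")
```

Production evaluations usually also stem tokens and report ROUGE-2 and ROUGE-L, which this sketch omits.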
279. occams: A Text Summarization Package
- Author
-
Clinton T. White, Neil P. Molino, Julia S. Yang, and John M. Conroy
- Subjects
text summarization, extractive, multilingual, budgeted maximum coverage, Electronic computers. Computer science, QA75.5-76.95, Probabilities. Mathematical statistics, QA273-280 - Abstract
Extractive text summarization selects a small subset of sentences from a document, which gives good “coverage” of its content. When given a set of term weights indicating the importance of the terms, the concept of coverage may be formalized into a combinatorial optimization problem known as the budgeted maximum coverage problem. Extractive methods in this class are known to be among the best of classic extractive summarization systems. This paper gives a synopsis of the software package occams, a multilingual extractive single- and multi-document summarization package based on an algorithm giving an optimal approximation to the budgeted maximum coverage problem. The occams package is written in Python and provides an easy-to-use modular interface, allowing it to work in conjunction with popular Python NLP packages, such as nltk, stanza or spacy.
- Published
- 2023
- Full Text
- View/download PDF
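The budgeted maximum coverage formulation this abstract mentions can be sketched with a simple greedy heuristic: repeatedly take the sentence with the best newly-covered term weight per word, within a total word budget. occams itself uses a stronger approximation algorithm; the toy sentences and weights below are illustrative:

```python
def greedy_budgeted_coverage(sents, weights, budget):
    """Greedy heuristic for budgeted maximum coverage over token lists.

    sents: list of token lists; weights: term -> importance; budget: max words.
    """
    chosen, covered, used = [], set(), 0
    candidates = set(range(len(sents)))

    def gain(i):
        # weight of terms sentence i would newly cover
        return sum(weights.get(t, 0.0) for t in set(sents[i]) - covered)

    while candidates:
        i = max(candidates, key=lambda k: gain(k) / len(sents[k]))
        candidates.discard(i)
        if gain(i) <= 0 or used + len(sents[i]) > budget:
            continue  # nothing new to cover, or sentence breaks the budget
        chosen.append(i)
        covered |= set(sents[i])
        used += len(sents[i])
    return sorted(chosen)

sents = [["budget", "coverage", "problem"],
         ["coverage", "problem"],
         ["summarization", "selects", "sentences"]]
weights = {"budget": 2.0, "coverage": 1.5, "problem": 1.0,
           "summarization": 2.0, "selects": 0.5, "sentences": 0.5}
```

With a six-word budget the greedy pass picks the two non-redundant sentences and skips the one whose terms are already covered.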
280. Automatic text summarization based on extractive-abstractive method
- Author
-
Md. Ahsan Habib, Romana Rahman Ema, Tajul Islam, Md. Yasir Arafat, and Mahedi Hasan
- Subjects
text summarization, extractive summarization, abstractive summarization, sentence ranking algorithm, text generation, noun pronoun conversion, Computer engineering. Computer hardware, TK7885-7895, Electronic computers. Computer science, QA75.5-76.95 - Abstract
Text summarization has a significant impact on daily life: in fields such as journalism, academia, and business, large amounts of text need to be processed quickly and efficiently. Text summarization is a technique for generating a precise, shortened summary of lengthy texts; the generated summary sustains the overall meaning without losing information and focuses on the parts that contain useful information. The goal is to develop a model that converts lengthy articles into concise versions, and the task to be solved is to select an effective procedure for developing that model. Although present text summarization models give good results on many recognized datasets, such as CNN/DailyMail and Newsroom, these models cannot resolve all problems. In this paper, a new text summarization method is proposed that combines extractive and abstractive text summarization techniques. In the extractive stage, the model generates a summary using a sentence ranking algorithm and passes the generated summary to an abstractive stage. Because rearranging sentences during sentence ranking destroys the relationship between one sentence and the next, pronoun-to-noun conversion is proposed as part of the new system. The proposed abstractive model consists of three pre-trained models, google/pegasus-xsum, facebook/bart-large-cnn, and Yale-LILY/brio-cnndm-uncased, and generates a final summary depending on the maximum final score. Experimental results on the CNN/DailyMail dataset show that the proposed model obtains ROUGE-1, ROUGE-2, and ROUGE-L scores of 42.67 %, 19.35 %, and 39.57 %, respectively. The results have been compared with three state-of-the-art methods: JEANS, DEATS and PGAN-ATSMT.
In terms of ROUGE score, the model outperforms these state-of-the-art models on ROUGE-1 and ROUGE-L, though it does not achieve a good result on ROUGE-2; experimental results also show that the generated abstractive summaries are qualitatively readable.
- Published
- 2023
- Full Text
- View/download PDF
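The sentence-ranking step in this entry's extractive stage can be sketched as a word-frequency scorer: each sentence is scored by the average document frequency of its words and the top sentences are kept in original order. This is a generic stand-in; the paper's exact ranking formula and its pronoun-to-noun conversion are not reproduced here:

```python
import re
from collections import Counter

def rank_sentences(text, top_k=2):
    """Score sentences by mean word frequency; return top_k in document order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(s):
        toks = re.findall(r"[a-z]+", s.lower())
        return sum(freq[t] for t in toks) / max(len(toks), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)[:top_k]
    return [sentences[i] for i in sorted(ranked)]

top = rank_sentences(
    "Summarization shortens text. Summarization keeps key text. Cats sleep.",
    top_k=2)
```

The extractive output would then be fed to the abstractive models for rewriting.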
281. Tweet congestion locations identification using natural language processing.
- Author
-
Subarkah, Aan, Kusnanto, Geri, Permai, Syarifah Diana, Ohyver, Margaretha, and Arifin, Samsul
- Subjects
-
NATURAL language processing, TEXT summarization, MACHINE translating, COMPUTATIONAL linguistics, EXPERT systems, ARTIFICIAL intelligence - Abstract
Natural Language Processing (NLP) is a field of computer science and linguistics that explores how computers interact with human (natural) language. NLP is frequently seen as a subfield of artificial intelligence, and its research area overlaps with computational linguistics. Machine translation, natural language text processing and summarization, user interfaces, speech recognition, and expert systems are all examples of NLP applications. NLP tasks fall into two types: low-level tasks and higher-level tasks. Tokenization, part-of-speech assignment to individual words (POS tagging), and shallow parsing (chunking) are examples of low-level NLP tasks. Meanwhile, higher-level NLP tasks, such as spelling/grammatical mistake detection and recovery, named entity recognition (NER), and information extraction (IE), are built on top of low-level NLP tasks and applied according to the problems found. This study uses natural language processing to extract congestion locations from tweets. We obtain the location name and traffic conditions from a tweet by implementing low-level NLP tasks in the form of tokenization, POS tagging, and chunking, followed by a higher-level NLP task in the form of named entity recognition. The types of words, such as nouns, verbs, adjectives, and descriptive words, are learned at the introductory stage. Following POS tagging, word grouping (chunking) is carried out; if numerous consecutive words are nouns, they are classified as location names, whereas adjectives are grouped as traffic conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
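The low-level-to-high-level pipeline this abstract describes (tokenize → POS-tag → chunk → label entities) can be mocked up with a toy lexicon. The Indonesian tokens, the tiny tag dictionary, and the noun-run/adjective-run rule below are illustrative assumptions standing in for the paper's trained taggers:

```python
import re

# Toy POS lexicon; a real system would use a trained tagger.
POS = {"jalan": "NOUN", "sudirman": "NOUN", "macet": "ADJ", "total": "ADJ"}

def tag(tweet):
    """Tokenize and POS-tag with the toy lexicon; unknown tokens get 'X'."""
    return [(t, POS.get(t, "X")) for t in re.findall(r"\w+", tweet.lower())]

def chunk(tagged):
    """Group consecutive same-POS tokens: noun runs become a LOCATION,
    adjective runs become a traffic CONDITION."""
    label = {"NOUN": "LOCATION", "ADJ": "CONDITION"}
    chunks, cur, cur_pos = [], [], None
    for tok, pos in tagged + [("", None)]:  # sentinel flushes the last run
        if pos == cur_pos:
            cur.append(tok)
            continue
        if cur and cur_pos in label:
            chunks.append((label[cur_pos], " ".join(cur)))
        cur, cur_pos = [tok], pos
    return chunks

entities = chunk(tag("Jalan Sudirman macet total"))
```

Here "Jalan Sudirman" (a street) chunks to a LOCATION and "macet total" (totally jammed) to a CONDITION.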
282. Biomedical semantic text summarizer
- Author
-
Kirmani, Mahira, Kour, Gagandeep, Mohd, Mudasir, Sheikh, Nasrullah, Khan, Dawood Ashraf, Maqbool, Zahid, Wani, Mohsin Altaf, and Wani, Abid Hussain
- Published
- 2024
- Full Text
- View/download PDF
283. Constraint-Based Adversarial Networks for Unsupervised Abstract Text Summarization.
- Author
-
Jing, Liwei, Yang, Lina, Yuan, Yujian, Meng, Zuqiang, Tan, Yifeng, Wang, Patrick Shen-Pei, and Li, Xichun
- Subjects
-
TEXT summarization, GENERATIVE adversarial networks - Abstract
Abstractive text summarization is a classic sequence-to-sequence natural language generation task. To improve the quality of abstractive text summarization in unsupervised mode, we propose two constraints for training the summarization model: an embedding space constraint and an information ratio constraint. We construct a generative adversarial network with two discriminators based on these two constraints (TC-SUM-GAN). We train the model with both unsupervised and supervised methods in the experiments. Experimental results show that the ROUGE-1 value of the unsupervised TC-SUM-GAN increases by 12.57 points compared with the basic model and by at least 1.96 points compared with other comparative models. The ROUGE scores of the supervised TC-SUM-GAN are also improved. TC-SUM-GAN achieves very competitive results on the ROUGE-1 and ROUGE-2 metrics. In addition, the abstracts generated by our model are closer to those generated manually. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
284. CovSumm: an unsupervised transformer-cum-graph-based hybrid document summarization model for CORD-19.
- Author
-
Karotia, Akanksha and Susan, Seba
- Subjects
-
TEXT summarization, SCIENTIFIC literature, INFORMATION overload, SCIENTIFIC method, SCIENCE databases - Abstract
The number of research articles published on COVID-19 has dramatically increased since the outbreak of the pandemic in November 2019. This absurd rate of productivity in research articles leads to information overload, and it has increasingly become urgent for researchers and medical associations to stay up to date on the latest COVID-19 studies. To address information overload in COVID-19 scientific literature, the study presents a novel hybrid model named CovSumm, an unsupervised graph-based hybrid approach for single-document summarization, evaluated on the CORD-19 dataset. We tested the proposed methodology on the scientific papers in the database dated from January 1, 2021 to December 31, 2021, consisting of 840 documents in total. The proposed text summarization model is a hybrid of two distinctive extractive approaches: (1) GenCompareSum (transformer-based) and (2) TextRank (graph-based). The sum of the scores generated by both methods is used to rank the sentences for generating the summary. On CORD-19, the recall-oriented understudy for gisting evaluation (ROUGE) metric is used to compare the performance of the CovSumm model with various state-of-the-art techniques. The proposed method achieved the highest scores of ROUGE-1: 40.14%, ROUGE-2: 13.25%, and ROUGE-L: 36.32%. The proposed hybrid approach shows improved performance on the CORD-19 dataset when compared to existing unsupervised text summarization methods. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
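The TextRank half of the CovSumm hybrid scores sentences by damped power iteration over a sentence-similarity graph. A dependency-free sketch, using simple word-overlap similarity (the real system combines this with GenCompareSum's transformer scores, omitted here):

```python
import re

def textrank(sentences, d=0.85, iters=50):
    """Minimal TextRank: overlap-similarity edges, damped power iteration."""
    toks = [set(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    n = len(sentences)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and toks[i] and toks[j]:
                sim[i][j] = len(toks[i] & toks[j]) / (len(toks[i]) + len(toks[j]))
    scores = [1.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            s = 0.0
            for j in range(n):
                out = sum(sim[j])  # total outgoing weight of node j
                if sim[j][i] and out:
                    s += sim[j][i] * scores[j] / out
            new.append((1 - d) + d * s)
        scores = new
    return scores

scores = textrank(["covid papers grow fast",
                   "covid papers need summaries",
                   "dogs bark loudly"])
```

The isolated third sentence receives only the damping floor of 0.15, while the two overlapping sentences reinforce each other.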
285. Supplementing elearning systems with adaptive content generation elements.
- Author
-
Parahonco, Alexandr and Petic, Mircea
- Subjects
-
AUTOMATIC summarization, TEXT summarization, ROMANIAN language, RUSSIAN language, ENGLISH language - Abstract
The paper describes automatic summarization as one of the topics that help an elearning system be more adaptive in content generation. The article treats automatic summarization with approaches that make it possible to summarize texts in different languages; in this case, English, Romanian, and Russian. The paper contains both a description of the problem and the different approaches already used by other researchers. Next, the data with which the automatic summarization experiments were carried out are described, and the metrics with which the quality of the summarization results can be evaluated are presented. Finally, some thoughts are formulated regarding the results obtained in the experiment. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
286. A Text Summarization Model Guided by Key Information (基于关键信息指导的文本摘要模型).
- Author
-
林 舟 and 周绮凤
- Subjects
-
TEXT summarization, INFORMATION processing - Abstract
Existing abstractive text summarization models lack attention to keyword information, which leads to the loss of key information from the input text. A keyword semantic information enhancement pointer-generator network, named KSIE-PGN, is proposed. Firstly, a keyword selection model, KSBERT, is built to extract keywords. Secondly, a keyword-masked coverage mechanism based on the keyword information is proposed: when the coverage mechanism is used, continuous attention to keywords is retained during decoding. Then, the KSIE-PGN model integrates keyword information into the decoding process, including the keyword semantic vector and the keyword context vector, so the decoder can avoid losing the key information in the input text. Experimental results on the CNN/Daily Mail dataset show that the model captures the key information in the input text well. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
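The keyword-masked coverage idea can be sketched in a few lines: a standard coverage penalty discourages the decoder from re-attending positions it has already covered, while keyword positions are exempted so attention can keep returning to them. The function and example vectors below are a schematic guess at the mechanism, not the paper's implementation:

```python
def masked_coverage_step(coverage, attention, keyword_mask):
    """One decoding step of keyword-masked coverage.

    coverage, attention: per-source-token floats; keyword_mask: True at
    keyword positions, which are excluded from the coverage penalty.
    """
    loss = sum(min(a, c)
               for a, c, kw in zip(attention, coverage, keyword_mask)
               if not kw)  # keywords incur no re-attention penalty
    new_cov = [c + a for c, a in zip(coverage, attention)]
    return new_cov, loss

cov, loss = masked_coverage_step(
    coverage=[0.5, 0.2, 0.0],
    attention=[0.4, 0.3, 0.3],
    keyword_mask=[False, True, False])
```

Only the non-keyword position that was re-attended (0.4 attention against 0.5 prior coverage) contributes to the penalty.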
287. An abstractive text summarization technique using transformer model with self-attention mechanism.
- Author
-
Kumar, Sandeep and Solanki, Arun
- Subjects
-
TEXT summarization, NATURAL language processing - Abstract
Creating a summarized version of a text document that still conveys precise meaning is an incredibly complex endeavor in natural language processing (NLP). Abstractive text summarization (ATS) is the process of using facts from source sentences and merging them into concise representations while maintaining the content and intent of the text. Manually summarizing large amounts of text is challenging and time-consuming for humans, so text summarization has become an exciting research focus in NLP. This paper proposes an ATS model using a Transformer Technique with Self-Attention Mechanism (T2SAM). The self-attention mechanism is added to the transformer to resolve coreference in text, which helps the system understand the text better. The proposed T2SAM model improves the performance of text summarization. It is trained on the Inshorts News dataset combined with the DUC-2004 shared-task dataset. The performance of the proposed model has been evaluated using ROUGE metrics, and it outperforms existing state-of-the-art baseline models: training loss falls from 10.3058 at the starting point to a minimum of 1.8220 over 30 epochs, and the model achieves a 48.50% F1-score on both the Inshorts and DUC-2004 news datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
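The scaled dot-product self-attention at the core of the transformer this entry builds on computes softmax(QKᵀ/√d)V. With identity projections (Q = K = V = X, a simplification of the learned projection matrices a real transformer uses), a pure-Python version is:

```python
import math

def self_attention(X):
    """Scaled dot-product self-attention over row vectors, Q = K = V = X."""
    d = len(X[0])
    out = []
    for q in X:
        # attention logits: q . k / sqrt(d) for every key row k
        logits = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        m = max(logits)  # shift for numerical stability
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        weights = [e / z for e in exps]
        # output: attention-weighted mix of the value rows
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out

attn = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

Each output row is a convex combination of the input rows, so with one-hot inputs its components are the attention weights themselves and sum to 1; each token attends most strongly to itself.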
288. Graph-Based Extractive Text Summarization Sentence Scoring Scheme for Big Data Applications.
- Author
-
Verma, Jai Prakash, Bhargav, Shir, Bhavsar, Madhuri, Bhattacharya, Pronaya, Bostani, Ali, Chowdhury, Subrata, Webber, Julian, and Mehbodniya, Abolfazl
- Subjects
- *
TEXT summarization , *COMPUTATIONAL linguistics , *BIG data , *NATURAL language processing , *TEXT mining , *SENTIMENT analysis - Abstract
The recent advancements in big data and natural language processing (NLP) have necessitated proficient text mining (TM) schemes that can interpret and analyze voluminous textual data. Text summarization (TS) acts as an essential pillar within recommendation engines. Despite the prevalent use of abstractive techniques in TS, an anticipated shift towards a graph-based extractive TS (ETS) scheme is becoming apparent. These models, although simpler and less resource-intensive, are key in assessing reviews and feedback on products or services. Nonetheless, current methodologies have not fully resolved concerns surrounding complexity, adaptability, and computational demands. Thus, we propose our scheme, GETS, utilizing a graph-based model to forge connections among words and sentences through statistical procedures. The structure encompasses a post-processing stage that includes graph-based sentence clustering. Employing the Apache Spark framework, the scheme is designed for parallel execution, making it adaptable to real-world applications. For evaluation, we selected 500 documents from the WikiHow and Opinosis datasets, categorized them into five classes, and applied the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) measures ROUGE-1, ROUGE-2, and ROUGE-L for comparison. The results include recall scores of 0.3942, 0.0952, and 0.3436 for ROUGE-1, 2, and L, respectively (when using the clustered approach). Through a juxtaposition with existing models such as BERTEXT (with 3-gram, 4-gram) and MATCHSUM, our scheme has demonstrated notable improvements, substantiating its applicability and effectiveness in real-world scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
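The GETS abstract describes forging statistical connections among sentences in a graph and scoring them. The generic idea behind such graph-based sentence scoring can be sketched as a TextRank-style power iteration over a word-overlap similarity graph (a minimal single-machine sketch; GETS itself runs on Apache Spark with its own similarity and clustering stages):

```python
import math, re

def overlap(s1, s2):
    """Word-overlap similarity, length-normalised (TextRank-style)."""
    w1 = set(re.findall(r"\w+", s1.lower()))
    w2 = set(re.findall(r"\w+", s2.lower()))
    if len(w1) < 2 or len(w2) < 2:
        return 0.0
    return len(w1 & w2) / (math.log(len(w1)) + math.log(len(w2)))

def textrank(sentences, d=0.85, iters=50):
    """Score sentences by power iteration on the similarity graph.
    Disconnected sentences keep the baseline score (1 - d)."""
    n = len(sentences)
    sim = [[0.0 if i == j else overlap(sentences[i], sentences[j])
            for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in sim]           # total outgoing edge weight
    scores = [1.0] * n
    for _ in range(iters):
        scores = [(1 - d) + d * sum(sim[j][i] / out_sum[j] * scores[j]
                                    for j in range(n) if out_sum[j])
                  for i in range(n)]
    return scores

sents = ["the cat sat on the mat",
         "the dog sat on the mat",
         "the cat chased the dog",
         "quantum tunnelling is unrelated"]
scores = textrank(sents)
ranked = sorted(range(len(sents)), key=scores.__getitem__, reverse=True)
```

The top-ranked sentences then form the extractive summary; the off-topic sentence ends up last because it shares no edges with the rest.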
289. Automatic Text Summarization of Konkani Folk Tales Using Supervised Machine Learning Algorithms and Language Independent Features.
- Author
-
D'Silva, Jovi and Sharma, Uzzal
- Subjects
- *
TEXT summarization , *MACHINE learning - Abstract
Automatic text summarization is an emerging field of research in Natural Language Processing. This work is a novel attempt to include a low-resource language in the domain of Automatic Text Summarization. We use supervised machine learning algorithms to perform single document extractive automatic text summarization on documents in a low-resource language, Konkani. In particular, we propose using language independent features to train supervised machine learning algorithms using a Konkani dataset, specifically devised for the experimentation using books on Konkani folktale literature. We approach the automatic text summarization task as a binary classification problem, and the algorithms, once trained, classify the sentences based on their relevance to generate a summary. Thereafter, the performance of popular linear and non-linear supervised machine learning algorithms is evaluated using K-fold cross-validation. The summary generated by the systems is compared with human-generated summaries to verify its effectiveness. The results show that the linear models exhibit better performance in comparison with the non-linear models; however, all the models could beat the baselines. The output produced by the proposed methodology generates promising summaries without the need for any language-specific domain knowledge. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
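The Konkani entry frames extractive summarization as binary classification over language-independent features. A sketch of what such features can look like (position, relative length, term frequency; this feature set is illustrative, not the authors' exact one):

```python
import re
from collections import Counter

def sentence_features(sentences):
    """Language-independent features per sentence: position, relative length,
    and average document-level term frequency (no stemmers or stopword lists,
    so the same code works for Konkani, Turkish, or English)."""
    tokenised = [re.findall(r"\w+", s.lower()) for s in sentences]
    tf = Counter(w for words in tokenised for w in words)
    longest = max((len(w) for w in tokenised), default=1) or 1
    feats = []
    for i, words in enumerate(tokenised):
        feats.append({
            "position": 1 - i / max(len(sentences) - 1, 1),   # earlier = higher
            "rel_length": len(words) / longest,
            "avg_tf": sum(tf[w] for w in words) / max(len(words), 1),
        })
    return feats

feats = sentence_features([
    "Konkani folk tales pass from generation to generation.",
    "Each tale carries a moral.",
    "The moral closes the tale.",
])
```

These vectors would then feed any binary classifier (logistic regression, SVM, a decision tree) labelled summary/non-summary sentence, as the paper describes.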
290. Turkish abstractive text summarization using pretrained sequence-to-sequence models.
- Author
-
Baykara, Batuhan and Güngör, Tunga
- Subjects
TEXT summarization ,LANGUAGE models ,TURKISH language ,ENGLISH language - Abstract
The tremendous amount of increase in the number of documents available on the Web has turned finding the relevant piece of information into a challenging, tedious, and time-consuming activity. Accordingly, automatic text summarization has become an important field of study, gaining significant attention from researchers. Lately, with the advances in deep learning, neural abstractive text summarization with sequence-to-sequence (Seq2Seq) models has gained popularity. There have been many improvements in these models such as the use of pretrained language models (e.g., GPT, BERT, and XLM) and pretrained Seq2Seq models (e.g., BART and T5). These improvements have addressed certain shortcomings in neural summarization and have improved upon challenges such as saliency, fluency, and semantics which enable generating higher quality summaries. Unfortunately, these research attempts were mostly limited to the English language. Monolingual BERT models and multilingual pretrained Seq2Seq models have been released recently providing the opportunity to utilize such state-of-the-art models in low-resource languages such as Turkish. In this study, we make use of pretrained Seq2Seq models and obtain state-of-the-art results on the two large-scale Turkish datasets, TR-News and MLSum, for the text summarization task. Then, we utilize the title information in the datasets and establish hard baselines for the title generation task on both datasets. We show that the input to the models has a substantial amount of importance for the success of such tasks. Additionally, we provide extensive analysis of the models including cross-dataset evaluations, various text generation options, and the effect of preprocessing in ROUGE evaluations for Turkish. It is shown that the monolingual BERT models outperform the multilingual BERT models on all tasks across all the datasets. Lastly, qualitative evaluations of the generated summaries and titles of the models are provided. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
291. Automatic Short Text Summarization Techniques in Social Media Platforms.
- Author
-
Ghanem, Fahd A., Padma, M. C., and Alkhatib, Ramez
- Subjects
TEXT summarization ,SOCIAL media ,USER-generated content - Abstract
The rapid expansion of social media platforms has resulted in an unprecedented surge of short text content being generated on a daily basis. Extracting valuable insights and patterns from this vast volume of textual data necessitates specialized techniques that can effectively condense information while preserving its core essence. In response to this challenge, automatic short text summarization (ASTS) techniques have emerged as a compelling solution, gaining significant importance in their development. This paper delves into the domain of summarizing short text on social media, exploring various types of short text and the associated challenges they present. It also investigates the approaches employed to generate concise and meaningful summaries. By providing a survey of the latest methods and potential avenues for future research, this paper contributes to the advancement of ASTS in the ever-evolving landscape of social media communication. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
292. Improving Abstractive Dialogue Summarization Using Keyword Extraction.
- Author
-
Yoo, Chongjae and Lee, Hwanhee
- Subjects
TEXT summarization - Abstract
Abstractive dialogue summarization aims to generate a short passage that contains important content for a particular dialogue spoken by multiple speakers. In abstractive dialogue summarization systems, capturing the subject in the dialogue is challenging owing to the properties of colloquial texts. Moreover, the system often generates uninformative summaries. In this paper, we propose a novel keyword-aware dialogue summarization system (KADS) that easily captures the subject in the dialogue to alleviate the problem mentioned above through the efficient usage of keywords. Specifically, we first extract the keywords from the input dialogue using a pre-trained keyword extractor. Subsequently, KADS efficiently incorporates the keyword information of the dialogue into the transformer-based summarization system. Extensive experiments performed on three benchmark datasets show that the proposed method outperforms the baseline system. Additionally, we demonstrate that the proposed keyword-aware dialogue summarization system exhibits a high-performance gain in low-resource conditions where the number of training examples is highly limited. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
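KADS relies on a pre-trained keyword extractor to surface the subject of a dialogue. As a deliberately simple stand-in for that component, keyword extraction can be sketched as frequency counting with a stopword filter (purely illustrative; the paper uses a learned extractor, and the stopword list and dialogue below are made up):

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "are", "i", "you", "to", "and", "of",
             "it", "that", "in", "we", "me", "my", "your", "do", "on", "for"}

def extract_keywords(dialogue, k=3):
    """Return the k most frequent non-stopword tokens across all turns."""
    words = re.findall(r"[a-z']+", dialogue.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [w for w, _ in counts.most_common(k)]

turns = ("Amy: Did you book the meeting room for Friday?\n"
         "Bob: The meeting room is booked, projector included.\n"
         "Amy: Great, send the meeting invite then.")
keywords = extract_keywords(turns)
```

The extracted keywords would then be fed to the summarizer alongside the dialogue, steering generation toward the subject (here, the meeting).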
293. Polarity classification on twitter data for classifying sarcasm using clause pattern for sentiment analysis.
- Author
-
Prasanna, M. S. M., Shaila, S. G., and Vadivel, A.
- Subjects
SENTIMENT analysis ,ONLINE social networks ,TEXT summarization ,SARCASM ,DECISION trees - Abstract
Nowadays, an enormous amount of data is available on the WWW and is growing exponentially. A lot of users use social networking websites such as Twitter, Facebook, Instagram, and Google+ as common platforms for sharing and exchanging views and opinions on any topics/events. The researchers have considered the reviews and views of the users on these platforms for sentiment analysis, opinion mining, question answering, text summarization, etc. The paper proposes a novel approach for identifying reviews or opinion of users having sarcasm in the text patterns at the clause level. The sentences are classified into four categories such as Simple Sentences, Compound Sentences, Complex Sentences, and Compound-Complex Sentences based on the rules derived from a decision tree. The Simple Sentences and Complex Sentences alone are considered for analysing the sentence patterns where a positive sentiment contrasts with a negative polarity and vice-versa. The decision tree and neuro-fuzzy rules are used on sentence structures to classify the sentences into sarcastic and non-sarcastic sentence patterns. Membership functions are used to map the fuzzy rules and linguistic grading is used for grading the sarcastic patterns. The proposed approach is evaluated on the Twitter Dataset and the results are found to be promising compared with recent and relevant work. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
294. Capturing product/service improvement ideas from social media based on lead user theory.
- Author
-
Yin, Chang, Jiang, Cuiqing, Jain, Hemant K., Liu, Yao, and Chen, Bo
- Subjects
TEXT summarization ,SOCIAL media ,USER-generated content ,INFORMATION overload ,AUTOMOBILE industry ,MACHINE learning - Abstract
Capturing valuable product/service improvement ideas is helpful for the development of new features. However, the existing methods for capturing such improvement ideas have the disadvantages of high cost, long time lag, information overload, and difficulty in getting a response. We propose an innovative framework based on lead user theory for capturing product/service improvement ideas from user‐generated content on social media (henceforth called "chatter"). To identify the chatter containing improvement ideas, we design a machine‐learning‐based imbalanced classification model. Additionally, we use text summarization technology to get a rough sense of improvement ideas from the selected chatter. We validate the proposed framework by a case study in the automotive industry. The results demonstrate that the ideas extracted by our framework are breakthrough innovative, useful, feasible, and adoptable. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
295. occams: A Text Summarization Package.
- Author
-
White, Clinton T., Molino, Neil P., Yang, Julia S., and Conroy, John M.
- Subjects
MULTILINGUALISM ,PYTHON programming language ,HOUSEHOLD budgets ,INTERFACES (Physical sciences) ,CONJUNCTIONS (Grammar) - Abstract
Extractive text summarization selects a small subset of sentences from a document, which gives good "coverage" of the document. When given a set of term weights indicating the importance of the terms, the concept of coverage may be formalized into a combinatorial optimization problem known as the budgeted maximum coverage problem. Extractive methods in this class are known to be among the best of classic extractive summarization systems. This paper gives a synopsis of the software package occams, which is a multilingual extractive single- and multi-document summarization package based on an algorithm giving an optimal approximation to the budgeted maximum coverage problem. The occams package is written in Python and provides an easy-to-use modular interface, allowing it to work in conjunction with popular Python NLP packages, such as nltk, stanza or spacy. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
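The occams abstract formalizes coverage as the budgeted maximum coverage problem. To convey the idea, the classical greedy approximation for that problem can be sketched as follows (a generic textbook heuristic, not occams's own algorithm or API; the weights and sentences are invented):

```python
def greedy_budgeted_coverage(sentences, weights, budget):
    """Greedily pick sentences maximising newly covered term weight per unit
    cost, subject to a total word budget (classical greedy approximation)."""
    chosen, covered, spent = [], set(), 0
    remaining = list(range(len(sentences)))
    while True:
        best, best_ratio = None, 0.0
        for i in remaining:
            terms = set(sentences[i].lower().split())
            gain = sum(weights.get(t, 0) for t in terms - covered)
            cost = len(sentences[i].split())
            if cost and spent + cost <= budget and gain / cost > best_ratio:
                best, best_ratio = i, gain / cost
        if best is None:                      # nothing affordable adds weight
            return [sentences[i] for i in chosen]
        chosen.append(best)
        covered |= set(sentences[best].lower().split())
        spent += len(sentences[best].split())
        remaining.remove(best)

sents = ["text summarization needs coverage",
         "cats sleep",
         "budget matters in summarization"]
weights = {"summarization": 3.0, "coverage": 2.0, "budget": 2.0, "matters": 1.0}
summary = greedy_budgeted_coverage(sents, weights, budget=8)
```

Already-covered terms contribute no gain, so the greedy loop naturally avoids redundant sentences; occams replaces this heuristic with an optimal approximation algorithm.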
296. Evaluation of text summarization techniques in healthcare domain: Pharmaceutical drug feedback.
- Author
-
Arora, Monika, Mudgil, Pooja, Sharma, Utkarsh, Chopra, Chaitanya, and Singh, Ngangbam Herojit
- Subjects
TEXT summarization ,DRUGS ,MEDICAL terminology ,RESEARCH personnel ,MEDICAL care - Abstract
Text summarization techniques offer a way to address the significant challenges faced by clinicians and researchers due to the exponential growth of information in healthcare on the internet. By condensing lengthy text into concise summaries, these techniques facilitate faster, easier, and more convenient access to relevant information. This is particularly beneficial in use cases such as online user feedback and reviews about drugs, where valuable insights can be obtained that extend beyond clinical trials and observational studies. This paper comprehensively evaluates six widely used text summarization techniques (LSA, Luhn's method, TextRank, the T5 Transformer, Kullback-Leibler, and BERT) in extracting key insights, themes, and patterns about drugs from online drug reviews. The evaluation considers both quantitative and qualitative aspects, focusing on their applicability to challenging medical terminology, which is known for its inherent intricacies and complexities. The study measures performance with F1 score, recall, and precision, focusing on the unigram, bigram, and trigram overlap between the generated summaries and the reference summaries via the ROUGE-1, ROUGE-2, and ROUGE-L evaluation methods. The results show TextRank to be the most effective text summarization method, followed by BERT, when working with medical terminology in Healthcare & Biomedical Informatics, given its complex hierarchy and extensive vocabulary of medical terms. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
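The drug-review evaluation above rests on ROUGE-n overlap between generated and reference summaries. The core ROUGE-1 computation is just clipped unigram overlap and can be written directly (a simplified sketch without stemming or the official toolkit's options):

```python
from collections import Counter

def rouge_1(candidate, reference):
    """ROUGE-1 precision, recall, and F1 from clipped unigram counts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(n, ref[w]) for w, n in cand.items())  # clipped counts
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1

p, r, f = rouge_1("the drug relieved pain quickly",
                  "patients said the drug relieved pain")
```

Here four of the five candidate unigrams appear in the reference, so precision is 0.8 and recall is 4/6; ROUGE-2 and ROUGE-L follow the same pattern with bigrams and longest common subsequences.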
297. Hierarchical Clause Annotation: Building a Clause-Level Corpus for Semantic Parsing with Complex Sentences.
- Author
-
Fan, Yunlong, Li, Bin, Sataer, Yikemaiti, Gao, Miao, Shi, Chuanqi, Cao, Siyi, and Gao, Zhiqiang
- Subjects
PARSING (Computer grammar) ,TEXT summarization ,SEMANTICS ,MACHINE translating ,NATURAL language processing ,CORPORA ,DATA mining ,ANNOTATIONS - Abstract
Featured Application: Hierarchical clause annotation could be applied in many downstream tasks of natural language processing, including abstract meaning representation parsing, semantic dependency parsing, text summarization, argument mining, information extraction, question answering, machine translation, etc. Most natural-language-processing (NLP) tasks suffer performance degradation when encountering long complex sentences, such as semantic parsing, syntactic parsing, machine translation, and text summarization. Previous works addressed the issue with the intuition of decomposing complex sentences and linking simple ones, such as rhetorical-structure-theory (RST)-style discourse parsing, split-and-rephrase (SPRP), text simplification (TS), simple sentence decomposition (SSD), etc. However, these works are not applicable for semantic parsing such as abstract meaning representation (AMR) parsing and semantic dependency parsing due to misalignments with semantic relations and unavailabilities to preserve the original semantics. Following the same intuition and avoiding the deficiencies of previous works, we propose a novel framework, hierarchical clause annotation (HCA), for capturing clausal structures of complex sentences, based on the linguistic research of clause hierarchy. With the HCA framework, we annotated a large HCA corpus to explore the potentialities of integrating HCA structural features into semantic parsing with complex sentences. Moreover, we decomposed HCA into two subtasks, i.e., clause segmentation and clause parsing, and provide neural baseline models for more silver annotations. In evaluating the proposed models on our manually annotated HCA dataset, clause segmentation and parsing achieved 91.3% F1-score and 88.5% Parseval score, respectively. Due to the same model architectures employed, the performance differences of the clause/discourse segmentation and parsing subtasks were reflected in our HCA corpus and compared discourse corpora, where our sentences contained more segment units and fewer interrelations than those in the compared corpora. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
298. A Comparison of Summarization Methods for Duplicate Software Bug Reports.
- Author
-
Mukhtar, Samal, Primadani, Claudia Cahya, Lee, Seonah, and Jung, Pilsu
- Subjects
TEXT summarization ,COMPUTER software ,SOFTWARE maintenance - Abstract
Bug reports vary in length: while some bug reports are lengthy, others are too brief to describe bugs in detail. In such a case, duplicate bug reports can serve as valuable resources for enriching bug descriptions. However, existing bug summarization methods mainly focused on summarizing a single bug report. In this paper, we focus on summarizing duplicate bug reports. By doing so, we aim to obtain an informative summary of bug reports while reducing redundant sentences in the summary. We apply several text summarization methods to duplicate bug reports. We then compare summarization results generated by different summarization methods and identify the most effective method for summarizing duplicate bug reports. Our comparative experiment reveals that the extractive multi-document method based on TF-IDF is the most effective in the summarization. This method successfully captures the relevant information from duplicate bug reports, resulting in comprehensive summaries. These results contribute to the advancement of bug summarization techniques, especially in summarizing duplicate bug reports. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
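The winning method in the comparison above is extractive multi-document summarization based on TF-IDF. The general scheme can be sketched as scoring each sentence by the mean TF-IDF of its terms, treating each duplicate report as one document (a generic sketch, not the paper's exact configuration; the sample reports are invented):

```python
import math, re
from collections import Counter

def score_sentences(reports):
    """Rank sentences across duplicate bug reports by the mean TF-IDF of
    their terms, where each report counts as one document."""
    docs = [re.findall(r"\w+", r.lower()) for r in reports]
    df = Counter(w for d in docs for w in set(d))        # document frequency
    scored = []
    for report, words in zip(reports, docs):
        tf = Counter(words)
        for sent in re.split(r"(?<=[.!?])\s+", report):
            terms = re.findall(r"\w+", sent.lower())
            if not terms:
                continue
            s = sum(tf[t] / len(words) * math.log(len(docs) / df[t])
                    for t in terms) / len(terms)         # mean TF-IDF
            scored.append((s, sent))
    scored.sort(key=lambda p: p[0], reverse=True)
    return scored

reports = [
    "App crashes on save. Stack trace points to the file writer.",
    "Crash when saving a file. The file writer throws an exception.",
    "Saving fails. Editor becomes unresponsive afterwards.",
]
ranked = score_sentences(reports)
```

Terms appearing in every duplicate get zero IDF, so boilerplate shared by all reports is down-weighted; taking the top sentences (optionally skipping near-duplicates of already-chosen ones) yields the multi-document summary.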
299. A Comprehensive Study of ChatGPT: Advancements, Limitations, and Ethical Considerations in Natural Language Processing and Cybersecurity.
- Author
-
Alawida, Moatsum, Mejri, Sami, Mehmood, Abid, Chikhaoui, Belkacem, and Isaac Abiodun, Oludare
- Subjects
- *
CHATGPT , *NATURAL language processing , *LANGUAGE models , *TEXT summarization , *INTERNET security , *HAZARD mitigation - Abstract
This paper presents an in-depth study of ChatGPT, a state-of-the-art language model that is revolutionizing generative text. We provide a comprehensive analysis of its architecture, training data, and evaluation metrics and explore its advancements and enhancements over time. Additionally, we examine the capabilities and limitations of ChatGPT in natural language processing (NLP) tasks, including language translation, text summarization, and dialogue generation. Furthermore, we compare ChatGPT to other language generation models and discuss its applicability in various tasks. Our study also addresses the ethical and privacy considerations associated with ChatGPT and provides insights into mitigation strategies. Moreover, we investigate the role of ChatGPT in cyberattacks, highlighting potential security risks. Lastly, we showcase the diverse applications of ChatGPT in different industries and evaluate its performance across languages and domains. This paper offers a comprehensive exploration of ChatGPT's impact on the NLP field. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
300. Web Scraping using Natural Language Processing: Exploiting Unstructured Text for Data Extraction and Analysis.
- Author
-
Pichiyan, Vijayaragavan, Muthulingam, S, G, Sathar, Nalajala, Sunanda, Ch, Akhil, and Das, Manmath Nath
- Subjects
NATURAL language processing ,DATA extraction ,DATA analysis ,TEXT summarization ,INTERNET content - Abstract
In recent years, combining web scraping techniques with Natural Language Processing (NLP) has emerged as a powerful approach to unlock deeper insights from unstructured textual data. This research study presents a detailed exploration of web scraping using NLP techniques, demonstrating how these methodologies can be synergistically integrated to extract and analyze unstructured text from diverse web sources. It analyzes the challenges posed by unstructured data on the web and how NLP can play a pivotal role in converting this text into structured and actionable information. The first part of the paper covers an overview of web scraping methods, including rule-based parsing, XPath queries, and the use of web scraping libraries such as BeautifulSoup and Scrapy. The second part of this research work focuses on applying NLP techniques to process and analyze the extracted textual data. Further, preprocessing steps such as tokenization, stemming, and stop-word removal are analyzed, followed by more advanced techniques like Named Entity Recognition. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
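The scraping-plus-preprocessing pipeline described in the entry above can be sketched end to end with only the standard library: strip markup with `html.parser`, then tokenize and remove stop words (the paper uses BeautifulSoup/Scrapy for the scraping stage; the stopword list here is a tiny illustrative subset):

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text nodes, skipping <script> and <style> content."""
    def __init__(self):
        super().__init__()
        self.parts, self._skip = [], False
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True
    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False
    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "for"}

def scrape_tokens(html):
    parser = TextExtractor()
    parser.feed(html)
    words = re.findall(r"[a-z]+", " ".join(parser.parts).lower())
    return [w for w in words if w not in STOPWORDS]

html = ("<html><body><h1>Data Extraction</h1>"
        "<script>var x = 1;</script>"
        "<p>Text for the analysis.</p></body></html>")
tokens = scrape_tokens(html)
```

From here, stemming and named entity recognition would be layered on top of the token stream, as the paper's second part describes.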