Descriptor: "AUTOMATIC summarization" / Publication Type: Dissertations - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"AUTOMATIC summarization"' showing total 7 results


1. Low- and high-resource opinion summarization

Author: Bražinskas, Arthur, Titov, Ivan, and Lapata, Maria
Subjects: Customer reviews, Automatic summarization, Automatically produced summaries, e-commerce platforms, in-domain specifics, training signal, small annotated datasets, query-based summarizer
Abstract: Customer reviews play a vital role in the online purchasing decisions we make. The reviews express user opinions that are useful for setting realistic expectations and uncovering important details about products. However, some products receive hundreds or even thousands of reviews, making them time-consuming to read. Moreover, many reviews contain uninformative content, such as irrelevant personal experiences. Automatic summarization offers an alternative - short text summaries capturing the essential information expressed in reviews. Automatically produced summaries can reflect overall or particular opinions and be tailored to user preferences. Besides being presented on major e-commerce platforms, home assistants can also vocalize them. This approach can improve user satisfaction by assisting in making faster and better decisions. Modern summarization approaches are based on neural networks, often requiring thousands of annotated samples for training. However, human-written summaries for products are expensive to produce because annotators need to read many reviews. This has led to annotated data scarcity where only a few datasets are available. Data scarcity is the central theme of our work, and we propose a number of approaches to alleviate the problem. The thesis consists of two parts where we discuss low- and high-resource data settings. In the first part, we propose self-supervised learning methods applied to customer reviews and few-shot methods for learning from small annotated datasets. Customer reviews without summaries are available in large quantities, contain a breadth of in-domain specifics, and provide a powerful training signal. We show that reviews can be used for learning summarizers via a self-supervised objective. Further, we address two main challenges associated with learning from small annotated datasets. First, large models rapidly overfit on small datasets leading to poor generalization. Second, it is not possible to learn a wide range of in-domain specifics (e.g., product aspects and usage) from a handful of gold samples. This leads to subtle semantic mistakes in generated summaries, such as 'great dead on arrival battery.' We address the first challenge by explicitly modeling summary properties (e.g., content coverage and sentiment alignment). Furthermore, we leverage small modules - adapters - that are more robust to overfitting. As we show, despite their size, these modules can be used to store in-domain knowledge to reduce semantic mistakes. Lastly, we propose a simple method for learning personalized summarizers based on aspects, such as 'price,' 'battery life,' and 'resolution.' This task is harder to learn, and we present a few-shot method for training a query-based summarizer on small annotated datasets. In the second part, we focus on the high-resource setting and present a large dataset with summaries collected from various online resources. The dataset has more than 33,000 human-written summaries, where each is linked to up to thousands of reviews. This, however, makes it challenging to apply an 'expensive' deep encoder due to memory and computational costs. To address this problem, we propose selecting small subsets of informative reviews. Only these subsets are encoded by the deep encoder and subsequently summarized. We show that the selector and summarizer can be trained end-to-end via amortized inference and policy gradient methods.
Published: 2023

2. Automation of summarization evaluation methods and their application to the summarization process

Author: Nahnsen, Thade, Grover, Claire, and Lapata, Mirella
Subjects: 621.382, natural language processing, NLP, automatic summarization, summarization evaluation, sentence ordering
Abstract: Summarization is the process of creating a more compact textual representation of a document or a collection of documents. In view of the vast increase in electronically available information sources in the last decade, filters such as automatically generated summaries are becoming ever more important to facilitate the efficient acquisition and use of required information. Different methods using natural language processing (NLP) techniques are being used to this end. One of the shallowest approaches is the clustering of available documents and the representation of the resulting clusters by one of the documents; an example of this approach is the Google News website. It is also possible to augment the clustering of documents with a summarization process, which would result in a more balanced representation of the information in the cluster, NewsBlaster being an example. However, while some systems are already available on the web, summarization is still considered a difficult problem in the NLP community. One of the major problems hampering the development of proficient summarization systems is the evaluation of the (true) quality of system-generated summaries. This is exemplified by the fact that the current state-of-the-art evaluation method to assess the information content of summaries, the Pyramid evaluation scheme, is a manual procedure. In this light, this thesis has three main objectives. 1. The development of a fully automated evaluation method. The proposed scheme is rooted in the ideas underlying the Pyramid evaluation scheme and makes use of deep syntactic information and lexical semantics. Its performance improves notably on previous automated evaluation methods. 2. The development of an automatic summarization system which draws on the conceptual idea of the Pyramid evaluation scheme and the techniques developed for the proposed evaluation system. The approach features the algorithm for determining the pyramid and bases importance on the number of occurrences of the variable-sized contributors of the pyramid as opposed to word-based methods exploited elsewhere. 3. The development of a text coherence component that can be used for obtaining the best ordering of the sentences in a summary.
Published: 2011

3. Identifying and Resolving Entities in Text

Author: Durrett, Gregory Christopher
Subjects: Computer science, automatic summarization, coreference resolution, entity linking, natural language processing, structured machine learning
Abstract: When automated systems attempt to deal with unstructured text, a key subproblem is identifying the relevant actors in that text: answering the "who" of the narrative being presented. This thesis is concerned with developing tools to solve this NLP subproblem, which we call entity analysis. We focus on two tasks in particular: first, coreference resolution, which consists of within-document identification of entities, and second, entity linking, which involves identifying each of those entities with an entry in a knowledge base like Wikipedia. One of the challenges of coreference is that it requires dealing with many different linguistic phenomena: constraints in reference resolution arise from syntax, semantics, discourse, and pragmatics. This diversity of effects to handle makes it difficult to build effective learning-based coreference resolution systems rather than relying on handcrafted features. We show that a set of simple features inspecting surface lexical properties of a document is sufficient to capture a range of these effects, and that these can power an efficient, high-performing coreference system. Our analysis of our base coreference system shows that some examples can only be resolved successfully by exploiting world knowledge or deeper knowledge of semantics. Therefore, we turn to the task of entity linking and tackle it not in isolation, but instead jointly with coreference. By doing so, our coreference module can draw upon knowledge from a resource like Wikipedia, and our entity linking module can draw on information from multiple mentions of the entity we are attempting to resolve. Our joint model of these tasks, which additionally models semantic types of entities, gives strong performance across the board and shows that effectively exploiting these interactions is a natural way to build better NLP systems. Having developed these tools, we show that they can be useful for a downstream NLP task, namely automatic summarization. We develop an extractive and compressive automatic summarization system, and argue that one deficiency it has is its inability to use pronouns coherently in generated summaries, as we may have deleted content that contained a pronoun's antecedent. Our entity analysis machinery allows us to place constraints on summarization that guarantee pronoun interpretability: each pronoun must have a valid antecedent included in the summary or it must be expanded into a reference that makes sense in isolation. We see improvements in our system's ability to produce summaries with coherent pronouns, which suggests that deeper integration of various parts of the NLP stack promises to yield better systems for text understanding.
Published: 2016

4. NLP Driven Models for Automatically Generating Survey Articles for Scientific Topics.

Author: Jha, Rahul Kumar
Subjects: natural language processing, automatic summarization, scholarly data
Abstract: This thesis presents new methods that use natural language processing (NLP) driven models for summarizing research in scientific fields. Given a topic query in the form of a text string, we present methods for finding research articles relevant to the topic as well as summarization algorithms that use lexical and discourse information present in the text of these articles to generate coherent and readable extractive summaries of past research on the topic. In addition to summarizing prior research, good survey articles should also forecast future trends. With this motivation, we present work on forecasting future impact of scientific publications using NLP driven features.
Published: 2015

5. Knowledge Acquisition with Multiple Summarization Techniques for Legal Text

Author: Galgani, Filippo
Subjects: Citation Analysis, Natural Language Processing, Automatic Summarization
Abstract: Today's world is all about information. The number of documents on the Internet, already massive, is growing at an exponential pace. Support from automatic tools is needed to organize, filter and reduce information. Automatic summarization is one of the natural language processing applications which can assist in accessing such vast amounts of information. However, despite over 50 years of research, successful summarization systems are limited to a few examples. To deal with the large variety of applications, domains and user needs, methods are needed to quickly create special-purpose summarizer systems. The legal domain is one where the information overload is particularly pronounced. Common law systems rely on the concept of precedent: on how the courts have interpreted the law in individual cases. Information search thus becomes an onerous task for judges and lawyers, who have to select relevant cases in large collections of judgements. This thesis presents a framework, based on incremental knowledge acquisition, for automatic summarization of legal case reports. We explore several techniques for legal text summarization and citation analysis, with a particular focus on citation-based summarization and catchphrase extraction. The system is based on an efficient knowledge acquisition framework that integrates several base techniques into a single approach. To recognize relevant text fragments, rules are created to combine frequency, statistical, citation and linguistic information in a context-dependent way. The knowledge acquisition framework strongly supports the creation of powerful knowledge bases for summarization, using a training corpus to guide rule acquisition. Using this framework, we created three knowledge bases for citation classification, case categorization, and catchphrase extraction in legal texts. The knowledge bases were automatically evaluated on a large corpus of cases, showing good performance. Despite containing only a preliminary and limited number of rules, the system already outperforms existing state-of-the-art general-purpose summarizers and baseline machine learning approaches. The performance of the system also suggests that with additional knowledge acquisition sessions the results could be further improved. A small number of summaries was evaluated manually by legal experts, and the extracts were rated similarly to the original catchphrases given by the court. Our investigation of knowledge acquisition methods for summarization therefore demonstrates that it is possible to quickly create effective special-purpose summarizers that combine multiple information sources, or base techniques, into a single context-aware approach.
Published: 2013

6. Enhanced web-based summary generation for search.

Author: Wenerstrom, Brent, 1980-
Subjects: Automatic summarization, Search engines, Graph theory, Social media
Abstract: After a user types in a search query on a major search engine, they are presented with a number of search results. Each search result is made up of a title, brief text summary and a URL. It is then the user's job to select documents for further review. Our research aims to improve the accuracy of users selecting relevant documents by improving the way these web pages are summarized. Improvements in accuracy will lead to time improvements and user experience improvements. We propose ReClose, a system for generating web document summaries. ReClose generates summary content through combining summarization techniques from query-biased and query-independent summary generation. Query-biased summaries generally provide query terms in context. Query-independent summaries focus on summarizing documents as a whole. Combining these summary techniques led to a 10% improvement in user decision making over Google-generated summaries. Color-coded ReClose summaries provide keyword usage depth at a glance and also alert users to topic departures. Color-coding further enhanced ReClose results and led to a 20% improvement in user decision making over Google-generated summaries. Many online documents include structure and multimedia of various forms such as tables, lists, forms and images. We propose to include this structure in web page summaries. We found that the expert user was insignificantly slowed in decision making while the majority of average users made decisions more quickly using summaries including structure without any decrease in decision accuracy. We additionally extended ReClose for use in summarizing large numbers of tweets in tracking flu outbreaks in social media. The resulting summaries have variable length and are effective at summarizing flu-related trends. Users of the system obtained an accuracy of 0.86 labeling multi-tweet summaries. This showed that the basis of ReClose is effective outside of web documents and that variable-length summaries can be more effective than fixed-length ones. Overall the ReClose system provides unique summaries that contain more informative content than current search engines produce, highlight the results in a more meaningful way, and add structure when meaningful. The applications of ReClose extend far beyond search and have been demonstrated in summarizing pools of tweets.
Published: 2012

7. Extracting Opinions from Blog Comments: Analysis, Design and Applications

Author: Raghavan, Preethi
Subjects: Computer Science, opinion mining, cluster analysis, cognitive engineering, blog comments, automatic summarization, voice of the customer
Abstract: Online interactive media such as blogs and discussion forums present many challenges as organizations and individuals attempt to analyze and comprehend the collective opinion of others using information technology. We address the need for an open source sense-respond cyberinfrastructure framework that an organization could apply to extract and analyze the ‘voice of the customer’ from blog comments, helping decision makers develop strategic, tactical and operational planning initiatives. As a step in this direction, we design an opinion mining tool that integrates information extraction and document search and analysis techniques to provide a concise representation of comments to the user. The tool interface is designed by adopting a cognitive engineering methodology in order to effectively abstract out the complexity of the underlying analysis techniques, allow easy navigation and present information to the user in a coherent manner.
Published: 2009
