Big Data Text Summarization - Attack Westminster

Authors :
Gallagher, Colm
Dyer, Jamie
Liebold, Jeanine
Becker, Aaron
Yang, Limin
Publication Year :
2018

Abstract

Automatic text summarization is the process of using software to distill the most important information from a text document into an abridged summary. In this task, summarization can be regarded as a function that takes one or more documents as input and produces a summary as output. There are two ways to create a summary: extractive and abstractive. Extractive summarization selects the most relevant sentences from the input and concatenates them to form a summary. Graph-based algorithms such as TextRank, feature-based models such as TextTeaser, topic-based models such as Latent Semantic Analysis (LSA), and grammar-based models can all be viewed as approaches to extractive summarization. Abstractive summarization aims to create a summary similar to one a human would write. It keeps the original intent but uses new phrases and words not found in the original text. One of the most commonly used models is the encoder-decoder model, a neural network model mainly used in machine translation tasks. More recently, hybrid approaches have combined extractive and abstractive summarization, such as the Pointer-Generator Network and the Extract then Abstract model. In this course, we were given a small dataset (about 500 documents) and a big dataset (about 11,300 documents), both consisting mainly of web archives about a specific event. Our group focused on reports about a terrorist event, Attack Westminster, which occurred outside the Palace of Westminster in London on March 22, 2017. The attacker, 52-year-old Briton Khalid Masood, drove a car into pedestrians on the pavement, injuring more than 50 people, 5 of them fatally. The attack was treated as "Islamist-related terrorism". We first created a Solr index for both the small dataset and the big dataset, which helped us run various queries to learn more about the data. Additionally, the index helped another team create a gold standard summary of
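
The extractive idea described in the abstract (score sentences, keep the most relevant ones, and concatenate them) can be illustrated with a minimal Python sketch. This is not the authors' method and not TextRank, TextTeaser, or LSA; it swaps in a simple word-frequency score as a stand-in for "relevance". The function names `split_sentences` and `summarize`, the parameter `k`, and the length-based stop-word filter are all assumptions made for illustration.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, k=2):
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)

    # Document-wide word counts; words of 3 letters or fewer are skipped
    # as a crude stand-in for a real stop-word list.
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if len(w) > 3)

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        if not tokens:
            return 0.0
        # Average frequency per token, so long sentences are not favored.
        return sum(freq[t] for t in tokens if len(t) > 3) / len(tokens)

    # Pick the indices of the top-k sentences, then restore document order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(ranked))
```

Calling `summarize(document_text, k=3)` returns three sentences copied verbatim from the input, which is the defining property of an extractive summary; an abstractive system would instead generate new wording.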

Details

Database :
OAIster
Notes :
en_US
Publication Type :
Electronic Resource
Accession number :
edsoai.on1198383600
Document Type :
Electronic Resource