
Summarizing Online Movie Reviews: A Machine Learning Approach to Big Data Analytics

Authors :
Mazen Zaindin
Muhammad Adnan Gul
Shafiq Ahmad
Syed Atif Ali Shah
Muhammad Firdausi
Atif Khan
M. Irfan Uddin
Source :
Scientific Programming, Vol 2020 (2020)
Publication Year :
2020
Publisher :
Hindawi Limited, 2020.

Abstract

The volume of information on the web is growing at an exponential pace, and online movie reviews are a substantial source of information for users. However, users write millions of movie reviews on a regular basis, and it is not feasible for readers to condense them manually. Classification and summarization of reviews are difficult tasks in computational linguistics. Hence, an automatic method is needed to summarize the vast number of movie reviews so that users can quickly distinguish between the positive and negative features of a movie. This work proposes a classification and summarization method for movie reviews. For movie review classification, a bag-of-words feature extraction technique is used to extract unigrams, bigrams, and trigrams as a feature set from the given review documents and to represent each review document as a vector. Next, the Naïve Bayes algorithm is employed to categorize the movie reviews (represented as feature vectors) into negative and positive reviews. For movie review summarization, a word2vec model is used to extract features from the classified movie review sentences, and a semantic clustering technique is then used to group semantically related review sentences. Different text features are employed to compute the salience score of each review sentence in the clusters. Finally, the top-ranked review sentences are selected by salience score to form a summary of the movie reviews. Empirical results indicate that the proposed machine learning approach performed better than benchmark summarization approaches.
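The classification stage described in the abstract (bag-of-words n-gram features fed to a Naïve Bayes classifier) can be illustrated with a minimal sketch. This record does not include the authors' implementation, so everything below is an assumption made for illustration: the names `ngrams` and `NaiveBayes`, whitespace tokenization, add-one (Laplace) smoothing, and the four toy training reviews are all hypothetical. The summarization stage (word2vec, clustering, salience scoring) is not covered here.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, max_n=3):
    """Return unigrams, bigrams, and trigrams of a lowercased text.

    Tokenization by whitespace is an assumption of this sketch.
    """
    tokens = text.lower().split()
    feats = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes over n-gram counts (hypothetical helper)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            feats = ngrams(doc)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total = sum(self.feature_counts[label].values())
            for feat in ngrams(doc):
                if feat in self.vocab:  # ignore n-grams unseen in training
                    count = self.feature_counts[label][feat]
                    score += math.log((count + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this sketch.
train = [
    ("a great and moving film", "positive"),
    ("great acting and a moving story", "positive"),
    ("a dull and boring film", "negative"),
    ("boring plot and dull acting", "negative"),
]
model = NaiveBayes().fit([text for text, _ in train],
                         [label for _, label in train])

print(model.predict("a great story"))
print(model.predict("dull and boring"))
```

In practice a library vectorizer and classifier would likely replace these hand-written pieces; the sketch only shows how n-gram counts and class priors combine into a prediction.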

Details

ISSN :
1875-919X and 1058-9244
Volume :
2020
Database :
OpenAIRE
Journal :
Scientific Programming
Accession number :
edsair.doi.dedup.....12f6d48120049d9e96d71132861c0800
Full Text :
https://doi.org/10.1155/2020/5812715