Summarizing Online Movie Reviews: A Machine Learning Approach to Big Data Analytics
- Source :
- Scientific Programming, Vol 2020 (2020)
- Publication Year :
- 2020
- Publisher :
- Hindawi Limited, 2020.
Abstract
- Information on the web is growing at an exponential pace, and online movie reviews are a substantial source of information for users. However, users write millions of movie reviews on a regular basis, and it is not feasible for readers to condense them manually. Classification and summarization of reviews is a difficult task in computational linguistics. Hence, an automatic method is needed to summarize the vast number of movie reviews, so that users can quickly distinguish between the positive and negative features of a movie. This work proposes a classification and summarization method for movie reviews. For movie review classification, a bag-of-words feature extraction technique is used to extract unigrams, bigrams, and trigrams as a feature set from the given review documents and to represent each review document as a vector. Next, the Naïve Bayes algorithm is employed to categorize the movie reviews (each represented as a feature vector) into negative and positive reviews. For movie review summarization, a word2vec model is used to extract features from the classified movie review sentences, and a semantic clustering technique is then used to cluster semantically related review sentences. Different text features are employed to compute the salience score of every review sentence in the clusters. Finally, the review sentences with the highest salience scores are selected to form a summary of the movie reviews. Empirical results indicate that the proposed machine learning approach performed better than benchmark summarization approaches.
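The classification stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the add-one (Laplace) smoothing, the four toy training reviews, and the names `ngrams` and `NaiveBayes` are all assumptions made for illustration. The sketch extracts unigrams, bigrams, and trigrams as bag-of-words features and scores each class with a multinomial Naïve Bayes model.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=3):
    """Return all 1- to n_max-grams of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            feats = ngrams(doc)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)    # log prior
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            for feat in ngrams(doc):
                if feat in self.vocab:               # skip unseen n-grams
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this sketch (not from the paper).
train_docs = [
    "a great movie with brilliant acting",
    "brilliant plot and great cast",
    "a boring movie with terrible acting",
    "terrible plot and boring cast",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("great acting"))
print(model.predict("boring plot"))
```

On this toy data the two calls classify "great acting" as positive and "boring plot" as negative. A real system would need proper tokenization, a far larger labeled corpus, and held-out evaluation.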
- Subjects :
- Article Subject
Computer science
Bigram
Feature vector
Feature extraction
Big data
02 engineering and technology
Machine learning
Automatic summarization
Computer Science Applications
QA76.75-76.765
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Trigram
Word2vec
Computer software
Artificial intelligence
Computational linguistics
Software
Details
- ISSN :
- 1875-919X and 1058-9244
- Volume :
- 2020
- Database :
- OpenAIRE
- Journal :
- Scientific Programming
- Accession number :
- edsair.doi.dedup.....12f6d48120049d9e96d71132861c0800
- Full Text :
- https://doi.org/10.1155/2020/5812715