Lexicon‐pointed hybrid N‐gram Features Extraction Model (LeNFEM) for sentence level sentiment analysis
- Source :
- Engineering Reports, Vol 3, Iss 8, Pp n/a-n/a (2021)
- Publication Year :
- 2021
- Publisher :
- Wiley, 2021.
Abstract
- Open access article. Sentiment analysis of social media textual posts can provide information and knowledge that is applicable in social settings, business intelligence, evaluation of citizens' opinions in governance, and mood‐triggered devices in the Internet of Things. Feature extraction and selection is a key determinant of the accuracy and computational cost of machine learning models for such analysis. Most feature extraction and selection techniques utilize bag of words, N‐grams, and frequency‐based algorithms, especially Term Frequency‐Inverse Document Frequency. However, these approaches do not consider relationships between words, ignore words' characteristics, and suffer from high feature dimensionality. In this paper we propose and evaluate a feature extraction and selection approach that utilizes a fixed hybrid N‐gram window for feature extraction and the minimum redundancy maximum relevance (MRMR) feature selection algorithm for sentence level sentiment analysis. The approach improves on existing feature extraction techniques, specifically the N‐gram, by generating a hybrid vector from words, Part of Speech (POS) tags, and word semantic orientation. The vector is extracted using a static trigram window identified by a lexicon where a sentiment word appears in a sentence. A blend of the words, POS tags, and sentiment orientations of the static trigram is used to build the feature vector. The optimal features from the vector are then selected using the MRMR algorithm. Experiments were carried out using the public Yelp dataset to compare the performance of the proposed model against existing feature extraction models (BOW, normal N‐grams, and lexicon‐based bag of words semantic orientations). Using supervised machine learning classifiers, the experimental results showed that the proposed model had the highest F‐measure (88.64%), compared to the highest (83.55%) from baseline approaches. A Wilcoxon test confirmed that the proposed approach performed significantly better than the baseline approaches. Comparative performance analysis with other datasets further affirmed that the proposed approach is generalizable.
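The lexicon‐pointed trigram extraction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy lexicon, the function name `hybrid_trigram_features`, the feature key names, the padding token, and the choice to center the window on the sentiment word are all assumptions made for illustration; the abstract does not specify them. POS tags are supplied by hand here rather than produced by a tagger, and the MRMR selection step is not shown.

```python
from typing import Dict, List, Tuple

PAD = "<PAD>"  # assumed padding token for windows at sentence boundaries

# Hypothetical toy lexicon: word -> semantic orientation (+1 positive, -1 negative).
TOY_LEXICON: Dict[str, int] = {"great": 1, "terrible": -1, "slow": -1}


def hybrid_trigram_features(
    tagged: List[Tuple[str, str]],
    lexicon: Dict[str, int],
) -> List[Dict[str, object]]:
    """Build one hybrid feature dict per sentiment word found in the sentence.

    Each dict blends the words, POS tags, and semantic orientations of a
    fixed three-token window (assumed here to be centered on the lexicon hit).
    """
    # Lowercase the words and pad both ends so every hit has two neighbours.
    padded = [(PAD, PAD)] + [(w.lower(), t) for w, t in tagged] + [(PAD, PAD)]
    features: List[Dict[str, object]] = []
    for i in range(1, len(padded) - 1):
        word, _ = padded[i]
        if word not in lexicon:
            continue  # the lexicon "points" only at sentiment words
        window = padded[i - 1 : i + 2]
        feat: Dict[str, object] = {}
        for offset, (w, t) in zip((-1, 0, 1), window):
            feat[f"w[{offset}]"] = w                    # word
            feat[f"pos[{offset}]"] = t                  # POS tag
            feat[f"so[{offset}]"] = lexicon.get(w, 0)   # orientation, 0 if neutral
        features.append(feat)
    return features


# Hand-tagged example sentence (Penn Treebank style tags).
sentence = [("The", "DT"), ("food", "NN"), ("was", "VBD"), ("great", "JJ")]
features = hybrid_trigram_features(sentence, TOY_LEXICON)
```

In this sketch a sentence with no lexicon hit yields an empty list, and a sentence with several sentiment words yields one window per hit. How such cases are handled in the published model is not stated in the abstract.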
- Subjects :
- General Computer Science
Computer science
Feature vector
Feature extraction
Feature selection
N‐gram2vec model
Naive Bayes classifier
sentiment classification
Feature (machine learning)
minimum redundancy maximum relevance
sentence level SA
Sentiment analysis
General Engineering
TF-IDF
QA75.5-76.95
Engineering (General). Civil engineering (General)
Bag-of-words model
Electronic computers. Computer science
lexicon
Artificial intelligence
TA1-2040
Natural language processing
Details
- Language :
- English
- ISSN :
- 2577-8196
- Volume :
- 3
- Issue :
- 8
- Database :
- OpenAIRE
- Journal :
- Engineering Reports
- Accession number :
- edsair.doi.dedup.....fe2e86bb2b404d4dd14bdef2fb7ad253