151. Predicting Stock Prices using Ensemble Learning and Sentiment Analysis
- Author
Biju R. Mohan, Subham Anmol, Ujjwal Pasupulety, and Aiman Abdullah Anees
- Subjects
0209 industrial biotechnology, Stock market prediction, Ensemble forecasting, Computer science, Sentiment analysis, Feature selection, Regression analysis, 02 engineering and technology, Ensemble learning, Random forest, Support vector machine, 020901 industrial engineering & automation, Stock exchange, Technical indicator, 0202 electrical engineering, electronic engineering, information engineering, Market price, Econometrics, 020201 artificial intelligence & image processing, Financial sector
- Abstract
The recent success of Artificial Intelligence in the financial sector has led more firms to rely on stochastic models for predicting market behaviour. Every day, quantitative analysts strive to attain better accuracy from their machine learning models for forecasting stock returns. Support Vector Machine (SVM) and Random Forest-based regression models are known for their effectiveness in accurately predicting closing prices. In this work, we propose a technique for analyzing and predicting companies' stock prices using these two algorithms as an ensemble. Datasets from India's National Stock Exchange (NSE) containing basic market price information are preprocessed to include well-known leading technical indicators as features. Feature selection, which ranks features by their degree of influence on the final closing price, is incorporated to reduce the size of the training dataset. Additionally, we evaluate the effect of incorporating public opinion about a company by employing sentiment analysis. Using a trained Word2Vec model, company-specific hashtagged posts from Twitter are classified as positive or negative. Our proposed ensemble model is then trained on a new dataset that combines the technical indicator data with the aggregated numbers of positive and negative tweets about a company over time. Our experiments indicate that in some scenarios the ensemble model performs better than its constituent models, and that its performance is highly dependent on the nature and size of the training data. However, combining technical indicator data with aggregated positive/negative tweet counts has a negligible effect on the performance of the ensemble model.
- Published
- 2019