Predicting Monthly Pageview of Wikipedia Pages by Neighbor Pages

Authors :
Xiaoqian Ju
Yujia Yang
Shi Lu
Huan Zhao
Source :
Proceedings of the 2020 3rd International Conference on Big Data Technologies.
Publication Year :
2020
Publisher :
ACM, 2020.

Abstract

Predicting traffic is important for websites' daily services. Efficient models of Wikipedia page traffic would deepen our knowledge of how people behave on Wikipedia and, potentially, on other crowdsourced pages. This project tests whether incorporating time-series data from a linked page improves the accuracy of predictions of a page's future traffic. The study compares three time-series models. The baseline model uses a page's monthly traffic for 2019 to predict its monthly traffic for January 2020. The random neighbor model randomly selects a page that has a hyperlink to the focal page and uses the 2019 data of both the focal page and the neighboring page to predict the focal page's monthly traffic for January 2020. The similar neighbor model also uses data from the focal page and one neighboring page, but the neighbor is selected by its content similarity to the focal page. The results show that, on popular pages, the similar neighbor model predicts better than the random neighbor model. The baseline model performs best, with the smallest MSE (mean squared error), MAE (mean absolute error), and MAPE (mean absolute percentage error), while the random neighbor and similar neighbor models have a much larger MSE than the baseline model.
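
For readers unfamiliar with the three error metrics named in the abstract, the following is a minimal sketch of their standard definitions. It is not the authors' code; the function names and the sample pageview numbers are hypothetical, and MAPE is assumed to be reported as a percentage over pages with nonzero actual traffic.

```python
def mse(actual, predicted):
    """Mean squared error: average of squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes every actual value is nonzero; a page with zero
    pageviews would need to be filtered out beforehand.
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(ratios)


# Hypothetical January pageviews for three pages and a model's predictions.
actual = [1000, 200, 50]
predicted = [900, 220, 40]

print(mse(actual, predicted))
print(mae(actual, predicted))
print(mape(actual, predicted))
```

Because MSE squares each error, pages with large traffic volumes contribute far more to it than to MAPE, which scales each error by the page's own traffic. This is a general property of the metrics, not a finding of the paper.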

Details

Database :
OpenAIRE
Journal :
Proceedings of the 2020 3rd International Conference on Big Data Technologies
Accession number :
edsair.doi...........9c1701d24d3de61cda5ebf57bd3ef552