1. Web scraping techniques for price statistics – the Romanian experience.
- Author
- Oancea, Bogdan and Necula, Marian
- Subjects
STATISTICS, CONSUMER price indexes, ELECTRONIC commerce, TRANSACTION systems (Computer systems), FEATURE selection
- Abstract
The Internet has been widely recognized as a new data source that can be used either to compile new statistics or to enhance traditional ones in several fields of official statistics. Because online commerce accounts for a rapidly growing share of household consumption expenditure when broken down by distribution/transaction channel, price statistics is one of the research areas in official statistics that benefits greatly from this new data source. This paper describes the experience of the Romanian National Institute of Statistics in using the Internet as a data source, together with an exercise in compiling an experimental consumer price index (CPI) based on Internet data. The aim of the pilot project was to investigate whether alternative data collection methods for price statistics can be introduced to enhance the statistical production system in the near future and, most importantly, it was a valuable firsthand opportunity to identify the methodological challenges inherent to Big Data sources from the official statistics point of view. The tool chain is built on top of the traditional methodology used for the CPI, enhanced by new features such as a simple clustering technique that treats the high volatility present in the collected data, using a distance-based method to classify similar products. [ABSTRACT FROM AUTHOR]
- Published
- 2019