
Model degradation in web derived text-based models

Authors :
Daas, Piet
Jansen, Jelmer
Publication Year :
2020

Abstract

Getting an overview of the innovative companies in a country is a challenging task. Traditionally, this is done by sending a questionnaire to a sample of large companies. As an alternative, an approach has been developed that determines whether a company is innovative by studying the text on the main page of its website. The resulting text-based model reproduces the results of the survey and is also able to detect small innovative companies, such as startups. However, model stability was found to be a serious problem: the model suffered from degradation, which gradually reduced its detection of innovative companies. Its accuracy dropped from 93% to 63% over a period of one year. This paper describes the phenomenon and studies the underlying data in detail. The effect was found to be caused by the combination of inactivity in a subset of websites and changes over time in the words used on company websites. A solution for dealing with this phenomenon is presented, and directions for future research are discussed.
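The abstract does not specify the model or the evaluation procedure, so the following is only a minimal sketch, under stated assumptions, of how degradation of this kind can be observed: a fixed classifier is scored on successive snapshots of the same websites while the reference labels stay unchanged. The names `keyword_model`, `degradation_report` and `max_drop`, the keyword vocabulary, and the four toy pages are all hypothetical; `keyword_model` is a crude stand-in for a trained text model, not the authors' method. An inactive website is assumed to yield empty text.

```python
from typing import Callable, List, Sequence, Tuple

# A classifier maps the text of a website's main page to True ("innovative")
# or False. Wrapping a trained text model this way is an assumption.
Classifier = Callable[[str], bool]


def keyword_model(vocabulary: set) -> Classifier:
    """Hypothetical stand-in for a trained text model: flags a page as
    innovative if it contains any word from a fixed vocabulary."""
    def predict(text: str) -> bool:
        return any(word in vocabulary for word in text.lower().split())
    return predict


def accuracy(model: Classifier, texts: Sequence[str], labels: Sequence[bool]) -> float:
    """Fraction of pages whose predicted label equals the reference label."""
    if not texts or len(texts) != len(labels):
        raise ValueError("texts and labels must be non-empty and equally long")
    correct = sum(model(t) == y for t, y in zip(texts, labels))
    return correct / len(texts)


def degradation_report(
    model: Classifier,
    snapshots: List[Tuple[str, Sequence[str], Sequence[bool]]],
    max_drop: float = 0.05,
) -> List[Tuple[str, float, bool]]:
    """Score a fixed model on successive website snapshots.

    The first snapshot is the baseline; a snapshot is flagged as degraded when
    its accuracy falls more than `max_drop` below the baseline (the 0.05
    default is an arbitrary, hypothetical choice).
    """
    baseline = accuracy(model, snapshots[0][1], snapshots[0][2])
    report = []
    for period, texts, labels in snapshots:
        acc = accuracy(model, texts, labels)
        report.append((period, acc, baseline - acc > max_drop))
    return report


if __name__ == "__main__":
    model = keyword_model({"innovative", "startup", "research"})
    labels = [True, True, False, False]  # reference labels stay fixed over time
    snapshots = [
        ("t0", ["innovative startup", "research lab", "bakery", "plumber"], labels),
        # Later snapshot (toy data): one site is inactive (empty page) and one
        # was reworded without any of the words the model relies on.
        ("t1", ["", "we build new things", "bakery", "plumber"], labels),
    ]
    for period, acc, degraded in degradation_report(model, snapshots):
        print(period, f"{acc:.2f}", "degraded" if degraded else "ok")
```

On this toy data the unchanged model scores 1.00 on the first snapshot and 0.50 on the second, because both the empty page and the reworded page are missed. This mirrors, in miniature, the two causes named in the abstract; it says nothing about the size of the effect in the real data.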

Details

Database :
OAIster
Notes :
TEXT, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1228704299
Document Type :
Electronic Resource