Modelling the characteristics of Web page outlinks

Authors :
Wolfram, Dietmar
Ajiferuke, Isola
Source :
Scientometrics. 59:43-62
Publication Year :
2004
Publisher :
Springer Science and Business Media LLC, 2004.

Abstract

Using data sampled from top-level Web pages across five high-level domains and from sample pages within individual websites, the authors investigate the frequency distribution of outlinks in Web pages. The observed distributions were fitted to different theoretical distributions to determine the best-fitting model for representing outlink frequency across Web pages. Theoretical models tested include the modified power law (MPL), Mandelbrot (MDB), generalized Waring (GW), generalized inverse Gaussian-Poisson (GIGP), and generalized negative binomial (GNB) distributions. The GIGP and GNB provided good fits for data sets for top-level pages across the high-level domains tested, with the GIGP performing slightly better. The lumpiness and bimodal nature of two of the observed outlink distributions from Web pages within a given website resulted in poor fits of the theoretical models. The GIGP was able to provide better fits to these data sets after the top components were truncated. The ability to effectively model Web page attributes, such as the distribution of the number of outlinks per page, paves the way for simulation models of Web page structural content and makes it possible to estimate the number of outlinks that may be encountered within Web pages of a specific domain or within individual websites.

Details

ISSN :
0138-9130
Volume :
59
Database :
OpenAIRE
Journal :
Scientometrics
Accession number :
edsair.doi...........6b78d89865f01c5634959acc076621bd