Data sets describing the circle of life in Ruby hosting, 2003-2016.

Authors :
Squire, Megan
Source :
Empirical Software Engineering; Apr 2018, Vol. 23 Issue 2, p1123-1152, 30p
Publication Year :
2018

Abstract

Studying software repositories and hosting services can provide valuable insights into the behaviors of large groups of software developers and their projects. Traditionally, most analysis of metadata collected from software project hosting services has been conducted within a short window of time, typically just a few years. To date, few, if any, studies have been built from data comprising the entirety of a hosting facility’s lifespan: from its birth to its death, and rebirth in another form. Thus, the first contribution of this paper is to present two data sets that support the historical analysis of over ten years of collected metadata from the now-defunct RubyForge project hosting site, as well as from its successor, the RubyGems package (“gem”) hosting facility. The data sets and sample usages demonstrated in this paper include: analyses of overall forge growth over time; presentation of data and analyses of project-level characteristics on both forges and their changes over time (for example in licenses, languages, and so on); and demonstration of how to use developer-level metadata (for example counts of new developers and calculation of developer-project density) to assess changes in person-level activity on both sites over time. Finally, because RubyForge was phased out and the gem-hosting portion of it was replaced by RubyGems, all the gems within RubyForge projects were transferred by project owners and by the site owners themselves into the RubyGems hosting facility. Thus, the data sets in this paper represent a unique opportunity to study projects as they moved from one ecosystem to another, and as such we show several methods for locating related projects between the two forges, and for building a cross-forge, longitudinal project history using information from both forges. The data sets and sample analyses in this paper will be relevant to researchers studying long-term software evolution, and distributed, hosted, or collaborative software development environments. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
1382-3256
Volume :
23
Issue :
2
Database :
Complementary Index
Journal :
Empirical Software Engineering
Publication Type :
Academic Journal
Accession number :
128969124
Full Text :
https://doi.org/10.1007/s10664-017-9581-6