Measurement and analysis of operating system fault tolerance

Authors :
Lee, I
Tang, D
Iyer, R. K
Publication Year :
1992
Publisher :
United States: NASA Center for Aerospace Information (CASI), 1992.

Abstract

This paper demonstrates a methodology to model and evaluate the fault tolerance characteristics of operational software. The methodology is illustrated through case studies on three different operating systems: the Tandem GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Measurements are made on these systems for substantial periods to collect software error and recovery data. In addition to investigating basic dependability characteristics such as major software problems and error distributions, we develop two levels of models to describe error and recovery processes inside an operating system and on multiple instances of an operating system running in a distributed environment. Based on the models, reward analysis is conducted to evaluate the loss of service due to software errors and the effect of the fault-tolerance techniques implemented in the systems. Software error correlation in multicomputer systems is also investigated.

Subjects

Subjects :
Computer Operations And Hardware

Details

Language :
English
Database :
NASA Technical Reports
Notes :
NAG1-613, N00014-91-J-1116
Publication Type :
Report
Accession number :
edsnas.19930003352
Document Type :
Report