Measurement and analysis of operating system fault tolerance
- Publication Year :
- 1992
- Publisher :
- United States: NASA Center for Aerospace Information (CASI), 1992.
Abstract
- This paper demonstrates a methodology to model and evaluate the fault tolerance characteristics of operational software. The methodology is illustrated through case studies on three different operating systems: the Tandem GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Measurements are made on these systems for substantial periods to collect software error and recovery data. In addition to investigating basic dependability characteristics such as major software problems and error distributions, we develop two levels of models to describe error and recovery processes inside an operating system and on multiple instances of an operating system running in a distributed environment. Based on the models, reward analysis is conducted to evaluate the loss of service due to software errors and the effect of the fault-tolerance techniques implemented in the systems. Software error correlation in multicomputer systems is also investigated.
- Subjects :
- Computer Operations And Hardware
Details
- Language :
- English
- Database :
- NASA Technical Reports
- Notes :
- NAG1-613, N00014-91-J-1116
- Publication Type :
- Report
- Accession number :
- edsnas.19930003352
- Document Type :
- Report