43 results on '"ROLLBACK recovery (Computer science)"'
Search Results
2. SnapFiner: A Page-Aware Snapshot System for Virtual Machines.
- Author
- Cui, Lei, Hao, Zhiyu, Li, Lun, and Yun, Xiaochun
- Subjects
- *VIRTUAL machine systems, *PHOTOGRAPHS, *CLOUD computing, *ROLLBACK recovery (Computer science), *DATA recovery
- Abstract
Virtual machine (VM) snapshot, enabling a VM to be resumed from a previously recorded state, is an essential part of cloud infrastructures. Unfortunately, the snapshot data are likely to be lost due to the high rate of disk failures, so that the associated VM fails to recover properly. To enhance data availability without compromising application performance upon rollback recovery, it is desired to place multiple replicas of snapshot across disperse disks. However, due to the large size of replica, it induces non-trivial storage cost when managing massive snapshots in clouds. In this paper, we investigate this problem and find out that the semantic gap existed between snapshot creation and snapshot storing is one key factor inducing high storage cost. To this end, we propose SnapFiner, a page-aware snapshot system for creating and storing massive snapshot files efficiently. First, SnapFiner acquires a fine-grained page categorization with an in-depth page exploration from three orthogonal views, thereby discovering more pages that can be excluded from the snapshot. Second, SnapFiner varies the number of replicas for different page categories based on a page-aware replication policy, achieving low storage cost without compromising availability and performance. Third, SnapFiner handles the loss of pages either intentionally dropped upon snapshot creation or unexpectedly damaged due to disk failures, enabling proper system execution after rollback recovery. We have implemented SnapFiner on QEMU/KVM to justify its practicality for Linux guests. The experimental results demonstrate that SnapFiner reduces the storage cost by 33 and 69.5 percent respectively compared to our previous work PARS and the naive approach on QEMU/KVM and HDFS. [ABSTRACT FROM AUTHOR]
- Published
- 2018
3. RollSec: Automatically Secure Software States Against General Rollback.
- Author
- Dai, Weiqi, Du, Yukun, Jin, Hai, Qiang, Weizhong, Zou, Deqing, Xu, Shouhuai, and Liu, Zhongze
- Subjects
- *ROLLBACK recovery (Computer science), *DEBUGGING, *VIRTUAL machine systems, *AUTOMATION, *DATA security
- Abstract
The rollback mechanism is critical in crash recovery and debugging, but its security problems have not been adequately addressed. This is justified by the fact that existing solutions always require modifications on target software or only work for specific scenarios. As a consequence, rollback is either neglected or restricted or prohibited in existing systems. In this paper, we systematically characterize security threats of rollback as abnormal states of non-deterministic variables and resumed program points caused by rollback. Based on this, we propose RollSec (for Rollback Security), which provides general measurements including state extracting, recording, and compensating, to maintain correctness of these abnormal states for eliminating rollback threats. RollSec can automatically extract these states based on language-independent information of software as protection targets, which will be monitored during run-time, and compensated to correct states on each rollback without requiring extra modifications or supports of specific architectures. At last, we implement a prototype of RollSec to verify its effectiveness, and conduct performance evaluations which demonstrate that only acceptable overhead is introduced. [ABSTRACT FROM AUTHOR]
- Published
- 2018
4. Reheating the Cold War: US, Russia, and the Revival of Rollback.
- Author
- Sussman, Gerald
- Subjects
- *COLD War, 1945-1991, *ROLLBACK recovery (Computer science), *INTERNATIONAL relations, *DEMOCRACY -- Economic aspects, *POLITICAL doctrines, *ECONOMICS
- Abstract
A neoconservative coalition of oppositional forces, comprised of the Clinton wing of the Democratic Party and their allies in the Republican Party, the liberal mainstream media, and the deep state have promoted a new Cold War against Russia. This is intended as a mobilizing strategy to overturn the Trump presidency, weaken the Russian state, and reconstruct state legitimacy following years of decline in the quality of life and democracy in America. The coalition reconstructed the Cold War as an ideological tool in the interest of continuing to pursue domestic and global neoliberal policies and dealing with a fractious public disenchanted with government, its elected officials, the mainstream media, and a failing democracy. [ABSTRACT FROM AUTHOR]
- Published
- 2017
5. SBML Protocol for Conquering Simultaneous Failures with Group Dissemination Functionality.
- Author
- AHN, JINHO
- Subjects
- DATA logging, ROLLBACK recovery (Computer science), SOCIAL networks, CLOUD computing, DRONE aircraft, DISASTER resilience
- Abstract
This paper presents a new sender based message logging (SBML) protocol to tolerate simultaneous failures by using the beneficial features of FIFO group communication links effectively. The protocol can lift the inherent weakness of the original SBML by replicating the log information of each message sent to a process group into the volatile storages of its members. Therefore, even if only one process in a group survives at a time, our protocol can progress the execution of the entire system without stopping and restarting it. Also, it needs no extra control message by piggybacking the additional information on the control message for logging every previous protocol essentially requires. The experimental results show our protocol can be a low cost solution for addressing the important drawback of the original SBML based on group communication without RSN replication functionality. [ABSTRACT FROM AUTHOR]
- Published
- 2017
6. Piccolo: A Fast and Efficient Rollback System for Virtual Machine Clusters.
- Author
- Cui, Lei, Hao, Zhiyu, Peng, Yaqiong, and Yun, Xiaochun
- Subjects
- *ROLLBACK recovery (Computer science), *VIRTUAL machine systems, *APPLICATION software, *COMBINATORIAL optimization, *COMPUTER algorithms
- Abstract
Rollback is an effective technique to resume the system execution from a recorded intermediate state upon failures, without having to restart the entire system. However, in virtualized environments, rollback of a virtual machine cluster (VMC) produces high network traffic and long service disruption, particularly for a large cluster used for scientific computing, thereby imposing significant overhead both on network and applications. This paper proposes Piccolo, a fast and efficient rollback system, to restore a VMC from snapshot files over data center network. First, we exploit the similarity among VMC snapshots and leverage multicast to deliver the identical pages across VMs placed on disperse hosts, thereby bypassing unnecessary transmission of a large number of pages. Second, we analyze the impact on network traffic of varying VM placements in data center network, formulate the traffic aware placement as an optimization problem, and design a two-tier approximation algorithm that efficiently solves the problem. In addition to presenting Piccolo, we detail its implementation, and evaluate it by a set of experiments. The results show that Piccolo could achieve a significant reduction in terms of total sent data, network traffic and rollback latency compared to the existing generic techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2017
7. Deadlock detector and solver (DDS).
- Author
- Aldakheel, Eman
- Subjects
- JAVA programming language, COMPUTER software execution, FAULT-tolerant computing, ROLLBACK recovery (Computer science), SOFTWARE reliability
- Abstract
Deadlock is among the most complex problems affecting the reliability of programs containing multiple, asynchronous threads. When undetected, deadlocks can lead to permanent thread blockage. Current detection methods are typically based on timeout and rollback of computations, resulting in significant delays. This paper presents Deadlock Detector and Solver (DDS), which can quickly detect and resolve circular deadlocks in Java programs. DDS uses a supervisory controller, which monitors program execution and automatically detects deadlocks resulting from hold-and-wait cycles on monitor locks. When a deadlock is detected, DDS uses a preemptive strategy to break the deadlock. Based on our experiments, DDS can in fact resolve deadlocks without significant run-time overhead. [ABSTRACT FROM AUTHOR]
- Published
- 2018
8. Multithreaded Stochastic PDES for Reactions and Diffusions in Neurons.
- Author
- Lin, Zhongwei, Tropper, Carl, McDougal, Robert A., Ishlam Patoary, Mohammand Nazrul, Lytton, William W., Yao, Yiping, and Hines, Michael L.
- Subjects
- STOCHASTIC partial differential equations, REACTION-diffusion equations, MOLECULAR dynamics, MULTICORE processors, ROLLBACK recovery (Computer science), COMPUTER simulation
- Abstract
Cells exhibit stochastic behavior when the number of molecules is small. Hence a stochastic reaction-diffusion simulator capable of working at scale can provide a more accurate view of molecular dynamics within the cell. This article describes a parallel discrete event simulator, Neuron Time Warp-Multi Thread (NTW-MT), developed for the simulation of reaction diffusion models of neurons. To the best of our knowledge, this is the first parallel discrete event simulator oriented toward stochastic simulation of chemical reactions in a neuron. The simulator was developed as part of the NEURON project. NTW-MT is optimistic and thread based, which attempts to capitalize on multicore architectures used in high performance machines. It makes use of a multilevel queue for the pending event set and a single rollback message in place of individual antimessages to disperse contention and decrease the overhead of processing rollbacks. Global Virtual Time is computed asynchronously both within and among processes to get rid of the overhead for synchronizing threads. Memory usage is managed in order to avoid locking and unlocking when allocating and deallocating memory and to maximize cache locality. We verified our simulator on a calcium buffer model. We examined its performance on a calcium wave model, comparing it to the performance of a process based optimistic simulator and a threaded simulator which uses a single priority queue for each thread. Our multithreaded simulator is shown to achieve superior performance to these simulators. Finally, we demonstrated the scalability of our simulator on a larger Calcium-Induced Calcium Release (CICR) model and a more detailed CICR model. [ABSTRACT FROM AUTHOR]
- Published
- 2016
9. Analytic Model for Optimal Checkpoints in Mobile Real-time Systems.
- Author
- Sung-Hwa Lim, Byoung-Hoon Lee, and Jai-Hoon Kim
- Subjects
- MOBILE apps, FAULT-tolerant computing, COMPUTER input-output equipment, MOBILE communication systems, ROLLBACK recovery (Computer science)
- Abstract
It is not practically feasible to apply hardware-based fault-tolerant schemes, such as hardware replication, in mobile devices. Therefore, software-based fault-tolerance techniques, such as checkpoint and rollback schemes, are required. In checkpoint and rollback schemes, the optimal checkpoint interval should be applied to obtain the best performance. Most previous studies focused on minimizing the expected execution time or response time for completing a given task. Currently, most mobile applications run in real-time environments. Therefore, it is extremely essential for mobile devices to employ optimal checkpoint intervals as determined by the real-time constraints of tasks. In this study, we tackle the problem of determining the optimal inter-checkpoint interval of checkpoint and rollback schemes to maximize the deadline meet ratio in real-time systems and to build a probabilistic cost model. From this cost model, we can numerically find the optimal checkpoint interval using mathematical tools. The performance of the proposed solution is evaluated using analytical estimates. [ABSTRACT FROM AUTHOR]
- Published
- 2016
10. A GATEWAY-CENTERED WORKFLOW ROLLBACK DECISION MODEL TOWARD AUTONOMOUS WORKFLOW PROCESS RECOVERY.
- Author
- Hyun Ahn and Kwanghoon Pio Kim
- Subjects
- *WORKFLOW management systems, *ROLLBACK recovery (Computer science), *ERROR detection (Information theory), *INFORMATION resources management, *WORKFLOW software
- Abstract
In enacting a workflow process model, it is very important to control and trace each instance's execution as well as to keep it recoverable. Especially, the recoverability issue implies that the underlying workflow management system is able to not only provide the automatic error-detection functionality on its running exceptions but also to equip various autonomous recovery mechanisms to deal with the detected exceptional and risky situations. As a theoretical approach to resolve the autonomous workflow recovery issue, this paper tries to formalize a rollback-point decision tree structure based upon gateway-activities of a corresponding workflow process model, which is named as a gateway-centered workflow rollback decision model. We strongly believe that the proposed model ought to be one of those impeccable trials and pioneering contributions to improve and advance the capability of recovery in enacting workflow process models. [ABSTRACT FROM AUTHOR]
- Published
- 2016
11. The forearc extension in the Central Kuril Islands and the trench rollback.
- Author
- Baranov, B., Lobkovsky, L., and Dozorova, K.
- Subjects
- *ROLLBACK recovery (Computer science), *TRENCHES, *GEOLOGIC faults, *SUBDUCTION, *BATHYMETRY
- Abstract
On the basis of bathymetric and seismic data, obtained during cruises 37 (2005) and 41 (2006) of R/V Akademik M.A. Lavrentiev, a new structural scheme of transverse faults in the forearc of the Central Kuril Islands was compiled, the fault kinematics was studied, and a model of the extension zone in the structural pattern of the study area was proposed. According to this model, the trench rollback and development of back-arc basins resulted from the continuous supply of material into the upper mantle convection cell owing to subduction and an increase in the dynamic pressure that pushes the subducting plate, causing it to migrate toward the ocean. [ABSTRACT FROM AUTHOR]
- Published
- 2016
12. Compounding errors: why heightened regulation and taxation are bad antidotes for recessions and income inequality.
- Author
- Epstein, Richard A
- Subjects
- INCOME inequality, RECESSIONS, GOVERNMENT regulation, TAXATION, ROLLBACK recovery (Computer science), SOCIAL planning
- Abstract
The current concerns with laggard growth and income inequality have led to a widespread set of demands for more regulation and higher taxation to reverse the trend. These two approaches move matters exactly in the wrong direction. The correct response is to find ways to reduce tax burdens and barriers to entry, and to reduce the political uncertainty associated with new government measures. It may well be too late, worldwide, for a substantial rollback in the welfare state. But the current proposals will only prolong the dismal results on both fronts in any arena in which they are tried. [ABSTRACT FROM AUTHOR]
- Published
- 2016
13. Rollback recovery with low overhead for fault tolerance in mobile ad hoc networks.
- Author
- Jaggi, Parmeet Kaur and Singh, Awadhesh Kumar
- Subjects
- ROLLBACK recovery (Computer science), FAULT-tolerant computing, AD hoc computer networks, INFRASTRUCTURE (Economics), NETWORK routing protocols, MOBILE communication systems
- Abstract
Mobile ad hoc networks (MANETs) have significantly enhanced the wireless networks by eliminating the need for any fixed infrastructure. Hence, these are increasingly being used for expanding the computing capacity of existing networks or for implementation of autonomous mobile computing Grids. However, the fragile nature of MANETs makes the constituent nodes susceptible to failures and the computing potential of these networks can be utilized only if they are fault tolerant. The technique of checkpointing based rollback recovery has been used effectively for fault tolerance in static and cellular mobile systems; yet, the implementation of existing protocols for MANETs is not straightforward. The paper presents a novel rollback recovery protocol for handling the failures of mobile nodes in a MANET using checkpointing and sender based message logging. The proposed protocol utilizes the routing protocol existing in the network for implementing a low overhead recovery mechanism. The presented recovery procedure at a node is completely domino-free and asynchronous. The protocol is resilient to the dynamic characteristics of the MANET; allowing a distributed application to be executed independently without access to any wired Grid or cellular network access points. We also present an algorithm to record a consistent global snapshot of the MANET. [ABSTRACT FROM AUTHOR]
- Published
- 2015
14. Movement-Based Checkpointing and Message Logging for Recovery in MANETs.
- Author
- Jaggi, Parmeet and Singh, Awadhesh
- Subjects
- AD hoc computer networks, MOBILE communication systems, NETWORK failures (Telecommunication), FAULT-tolerant computing, ROLLBACK recovery (Computer science), COMPUTER algorithms, GRAPH theory
- Abstract
Mobile ad hoc networks (MANETs) are increasingly being employed for expanding the computing capabilities of existing cellular mobile systems and in the implementation of mobile computing grids. However, MANETs are susceptible to various transient as well as permanent failures and a fault tolerance technique is crucial in order to effectively utilize the constituent nodes as viable compute resources. Checkpointing and message logging based rollback recovery is a well established approach to provide fault tolerance in static and cellular mobile distributed systems; yet its use for achieving fault tolerance in MANETs is comparatively less explored. The existing recovery algorithms cannot be applied directly to MANETs due to their insufficiency in handling challenges like absence of static infrastructure, frequent node movement, constrained wireless bandwidth and limited stable storage. In this paper, we propose a checkpointing based rollback recovery protocol for clustered MANETs that determines the checkpointing frequency of a mobile node based on its mobility; thereby avoiding unnecessary checkpoints. The protocol uses a popular graph theoretic construct called connected dominating set to lower the communication overhead due to the recovery procedure. The findings of our scheme have been substantiated by the complexity analysis and simulation under varying network conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2015
15. Checkpointing and Roll back Recovery Protocols in Wireless Ad hoc Networks: A Review.
- Author
- THAKUR, JAWAHAR, KALIA, ARVIND, and AWASTHI, LALIT
- Subjects
- MOBILE communication systems, ROLLBACK recovery (Computer science), AD hoc computer networks, DISTRIBUTED computing, COMPUTER programming
- Abstract
The main focus in a single process checkpointing protocol is on finding optimal checkpoint interval to minimize the loss due to any fault, but in a distributed environment the main focus is on finding out and saving a global consistent state of the system. The challenge in finding a global consistent state is that interprocess communication creates dependencies that must be factored, otherwise the global checkpoint becomes useless. Mobile ad hoc networks throw up a plethora of challenges in tracking interprocess dependences including how to reliably save checkpoints in face of transience and node failures, where to save the checkpoints and how to reconstruct the stable global state from the nodes which are available after the fault. To add high availability and reliability to mobile networks, checkpoint based rollback recovery techniques are widely applicable. Checkpointing methods for traditional distributed systems cannot be applied directly to the mobile networks. This paper provides an overview of the available checkpointing strategies for mobile networks, comparing them on the various parameters. We conclude that no single strategy is optimal in all fault scenarios and that the perfect strategy may still be in the works. [ABSTRACT FROM AUTHOR]
- Published
- 2015
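As a rough illustration of the consistency requirement described in the abstract of entry 15, the Python sketch below checks whether a set of per-process checkpoints contains an orphan message, that is, a message recorded as received by one checkpoint but not yet sent according to the sender's checkpoint. The `Message` record, the integer event indices, and the `is_consistent` function are hypothetical names chosen for this example; they are assumptions and do not come from the reviewed paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    send_event: int   # index of the send event in the sender's local history
    receiver: str
    recv_event: int   # index of the receive event in the receiver's local history


def is_consistent(cut: dict[str, int], messages: list[Message]) -> bool:
    """Return True if the cut has no orphan messages.

    `cut` maps each process name to the index of the last local event
    covered by that process's checkpoint (an assumed representation).
    """
    for m in messages:
        received_in_cut = m.recv_event <= cut[m.receiver]
        sent_in_cut = m.send_event <= cut[m.sender]
        if received_in_cut and not sent_in_cut:
            # The receive is recorded but the send is not: an orphan message.
            return False
    return True


# One message from p (sent at event 3) to q (received at event 2).
history = [Message("p", 3, "q", 2)]
print(is_consistent({"p": 2, "q": 2}, history))
print(is_consistent({"p": 3, "q": 2}, history))
```

The first cut records the receive at q without the matching send at p, so it cannot be used for recovery; the second cut includes both events.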
16. A Practical Approach of Storage Strategy for Grid Computing Environment.
- Author
- Qureshi, Kalim
- Subjects
- COMPUTER network protocols, HIGH performance computing, ELECTRONIC data processing, ROLLBACK recovery (Computer science), FAULT-tolerant computing
- Abstract
An efficient and reliable fault tolerance protocol plays an important role in making the system more stable. The most common technique used in High Performance Computing is rollback recovery, which relies on the availability of checkpoints and stability of storage media. Unstable media can result in failure of the nodes of the grid. Furthermore dedicating powerful resources solely as checkpoint storage results in loss of computation power of these resources which may become bottlenecks when the load on the network is high. A new protocol based on replication is proposed in this paper. To ensure the availability of valid checkpoints even in the case of checkpoint server or a whole cluster failure, the checkpoints are replicated on all checkpoint servers in the same cluster as well as on other clusters. To minimize the wastage of computational power of the most stable nodes in the cluster, our protocol utilizes the CPU cycles of dedicated servers in the case of high loads on the network. [ABSTRACT FROM AUTHOR]
- Published
- 2012
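The replication idea in the abstract of entry 16 (copying each checkpoint to checkpoint servers in the same cluster and in other clusters) can be sketched in Python as follows. This is a minimal in-memory illustration under assumed names: `CheckpointServer`, `replicate`, and `recover` are hypothetical and are not taken from the paper, and the sketch ignores the load-aware CPU scheduling the abstract also mentions.

```python
class CheckpointServer:
    """Toy stand-in for a checkpoint server (hypothetical, in-memory only)."""

    def __init__(self, name: str, cluster: str):
        self.name = name
        self.cluster = cluster
        self.alive = True
        self.store: dict[str, bytes] = {}   # job id -> checkpoint data


def replicate(servers: list[CheckpointServer], job_id: str, data: bytes) -> int:
    """Copy the checkpoint to every live server; return the number of copies."""
    copies = 0
    for server in servers:
        if server.alive:
            server.store[job_id] = data
            copies += 1
    return copies


def recover(servers: list[CheckpointServer], job_id: str, local_cluster: str):
    """Return a surviving replica, preferring servers in the local cluster."""
    # sorted() is stable, so servers in local_cluster (key False) come first.
    ordered = sorted(servers, key=lambda s: s.cluster != local_cluster)
    for server in ordered:
        if server.alive and job_id in server.store:
            return server.store[job_id]
    return None


servers = [
    CheckpointServer("a1", "A"),
    CheckpointServer("a2", "A"),
    CheckpointServer("b1", "B"),
]
replicate(servers, "job-1", b"state")
for s in servers:
    if s.cluster == "A":
        s.alive = False   # simulate failure of the whole cluster A
print(recover(servers, "job-1", local_cluster="A"))
```

With every server in cluster A marked as failed, recovery falls back to the replica held in cluster B.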
17. Specification and Synthesis of Hardware Checkpointing and Rollback Mechanisms.
- Author
- Carven Chan, Schwartz-Narbonne, Daniel, Sethi, Divjyot, and Malik, Sharad
- Subjects
- ROLLBACK recovery (Computer science), RTL (Computer program language), COMPUTER hardware description languages, NETWORK routers, INTEGRATED circuit interconnections, NETWORK performance, SYSTEMS design
- Abstract
The increasing pressure to make hardware resilient to runtime failures has prompted development of design techniques for specific classes of systems, e.g. processors and routers. However, these techniques come at increased design and verification costs, thus limiting their broader application. In this work we describe a methodology for general RTL designs based on the widely usable checkpointing and rollback resiliency mechanism. We take a modeling and language approach that provides an appropriate set of abstractions for the resiliency logic. This cleanly separates the main design behavior from the resiliency behavior, leading to ease of design. Further, as the language abstractions can be automatically synthesized into resiliency logic, our methodology can merge with existing design flows. The concerns of verifying this additional resiliency logic can be addressed by synthesizing behavioral assertions capturing correct behavior. We demonstrate the use of this methodology on four examples, with synthesis for performance and area to estimate the overhead of the additional synthesis logic. [ABSTRACT FROM AUTHOR]
- Published
- 2012
18. A Dynamic Checkpointing and Rollback Recovery Solution Based on Task Switching.
- Author
- Changheng Shao, Fengjing Shao, Xiaoning Song, and Rencheng Sun
- Subjects
- ROLLBACK recovery (Computer science), SWITCHING theory, FAULT-tolerant computing, COMPUTER operating systems, COMPUTER storage devices, MATHEMATICAL proofs, COMPUTER systems
- Abstract
Fault tolerance is an important issue in operating system. Checkpointing and Rollback Recovery (CRR) is a key technique to fault tolerance. Its simplicity and effectiveness make it widely applied to fault maintenance of operating system. CRR can be divided into checkpoint storage and restoration. And checkpoint storage is key factor to real-time of checkpoint recovery. Current checkpoint storage is driven by clock and lack of real-time and flexibility. A dynamic CRR solution is proposed in this paper. In the solution, checkpoint storage occurs at the time of task switching rather than clock interrupt. Through applying it to SANC, the mechanism is proved to achieve high real-time of rollback recovery. [ABSTRACT FROM AUTHOR]
- Published
- 2009
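The abstract of entry 18 describes saving checkpoints at task-switch time instead of on a clock interrupt. A minimal Python sketch of that trigger, assuming each task's state is a plain dictionary, might look like the following. The `TaskSwitchCheckpointer` class and its methods are hypothetical names introduced for illustration; they do not reflect the SANC implementation mentioned in the abstract.

```python
import copy


class TaskSwitchCheckpointer:
    """Toy scheduler hook that checkpoints the outgoing task on every switch."""

    def __init__(self, tasks: dict[str, dict]):
        self.tasks = tasks                       # task name -> mutable state
        self.checkpoints: dict[str, dict] = {}   # task name -> saved state
        self.current = None

    def switch_to(self, name: str) -> None:
        # Save the outgoing task's state at the moment of the switch,
        # rather than waiting for a periodic timer.
        if self.current is not None:
            self.checkpoints[self.current] = copy.deepcopy(self.tasks[self.current])
        self.current = name

    def rollback(self, name: str) -> None:
        # Restore the task to the state saved at its last switch-out.
        self.tasks[name] = copy.deepcopy(self.checkpoints[name])


tasks = {"a": {"x": 0}, "b": {"y": 0}}
sched = TaskSwitchCheckpointer(tasks)
sched.switch_to("a")
tasks["a"]["x"] = 1          # task a does some work
sched.switch_to("b")         # a is checkpointed here with x == 1
tasks["a"]["x"] = 99         # simulate a fault corrupting a's state
sched.rollback("a")
print(tasks["a"]["x"])
```

Because the checkpoint is taken exactly when the task leaves the CPU, the saved state matches the last point at which the task actually ran.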
19. Energy profile of rollback-recovery strategies in high performance computing.
- Author
- Meneses, Esteban, Sarood, Osman, and Kalé, Laxmikant V.
- Subjects
- *ROLLBACK recovery (Computer science), *HIGH performance computing, *PROBLEM solving, *ENERGY consumption of computers, *FAULT-tolerant computing
- Abstract
Extreme-scale computing is set to provide the infrastructure for the advances and breakthroughs that will solve some of the hardest problems in science and engineering. However, resilience and energy concerns loom as two of the major challenges for machines at that scale. The number of components that will be assembled in the supercomputers plays a fundamental role in these challenges. First, a large number of parts will substantially increase the failure rate of the system compared to the failure frequency of current machines. Second, those components have to fit within the power envelope of the installation and keep the energy consumption within operational margins. Extreme-scale machines will have to incorporate fault tolerance mechanisms and honor the energy and power restrictions. Therefore, it is essential to understand how fault tolerance and energy consumption interplay. This paper presents a comparative evaluation and analysis of energy consumption of three different rollback-recovery protocols: checkpoint/restart, message logging and parallel recovery. Our experimental evaluation shows parallel recovery has the minimum execution time and energy consumption. Additionally, we present an analytical model that projects parallel recovery can reduce energy consumption more than 37% compared to checkpoint/restart at extreme scale. [ABSTRACT FROM AUTHOR]
- Published
- 2014
20. OPTIMAL RANDOM BACKUP POLICIES FOR A DATABASE SYSTEM.
- Author
- MINGCHIH CHEN, MIN WANG, and CHUN-YUAN CHENG
- Subjects
- STATISTICAL sampling, COMPARATIVE studies, ROLLBACK recovery (Computer science), OVERTIME, BACK up systems
- Abstract
Some systems work for a job with random working and processing times, and are checked at periodic and random times. When a failure occurs, we execute the backup operation to the latest checking time. First, the expected costs with periodic and random backups are formulated, and comparisons between periodic and random policies are made. Next, we focus on the backup policies when the system is checked at the Nth interval of working times, two total expected costs are obtained and optimal numbers N* which minimize them are derived analytically. Finally, one modified backup model in which failures are detected only at checking times is proposed and two expected costs are obtained. [ABSTRACT FROM AUTHOR]
- Published
- 2014
21. Reverse computation for rollback-based fault tolerance in large parallel systems.
- Author
- Perumalla, Kalyan and Park, Alfred
- Subjects
- *REVERSIBLE computing, *ROLLBACK recovery (Computer science), *FAULT-tolerant computing, *PARALLEL computers, *SCALABILITY, *COMPUTER networks, *CACHE memory
- Abstract
Reverse computation is presented here as an important future direction in addressing the challenge of fault tolerant execution on very large cluster platforms for parallel computing. As the scale of parallel jobs increases, traditional checkpointing approaches suffer scalability problems ranging from computational slowdowns to high congestion at the persistent stores for checkpoints. Reverse computation can overcome such problems and is also better suited for parallel computing on newer architectures with smaller, cheaper or energy-efficient memories and file systems. Initial evidence for the feasibility of reverse computation in large systems is presented with detailed performance data from a particle (ideal gas) simulation scaling to 65,536 processor cores and 950 accelerators (GPUs). Reverse computation is observed to deliver very large gains relative to checkpointing schemes when nodes rely on their host processors/memory to tolerate faults at their accelerators. A comparison between reverse computation and checkpointing with measurements such as cache miss ratios, TLB misses and memory usage indicates that reverse computation is hard to ignore as a future alternative to be pursued in emerging architectures. [ABSTRACT FROM AUTHOR]
- Published
- 2014
22. Disnix: A toolset for distributed deployment.
- Author
- van der Burg, Sander and Dolstra, Eelco
- Subjects
- *SYSTEM administrators, *MACHINE learning, *COMPUTER architecture, *DEPLOYMENT (Military strategy), *CASE studies, *ROLLBACK recovery (Computer science)
- Abstract
Abstract: The process of deploying a distributed system in a network of machines is often very complex, laborious and time-consuming, while it is hard to guarantee that the system will work as expected and that specific non-functional deployment requirements from the domain are supported. In this paper we describe the Disnix toolset, which provides system administrators or developers with automatic deployment of a distributed system in a network of machines from declarative specifications and offers properties such as complete dependencies, atomic upgrades and rollbacks to make this process efficient and reliable. Disnix has an extensible architecture, allowing the integration of custom modules to make the deployment more convenient and suitable for the domain in which the system is to be used. Disnix has been under development for almost four years and has been applied to several types of distributed systems, including an industrial case study. [Copyright © Elsevier]
- Published
- 2014
23. Failure Avoidance in MPI Applications Using an Application-Level Approach.
- Author
- Cores, Iván, Rodríguez, Gabriel, González, Patricia, and Martín, María J.
- Subjects
- *FAILURE analysis, *MESSAGE passing (Computer science), *APPLICATION software, *MACHINE theory, *ROLLBACK recovery (Computer science), *COMPILERS (Computer programs)
- Abstract
Execution times of large-scale computational science and engineering parallel applications are usually longer than the mean-time-between-failures. For this reason, hardware failures must be tolerated by the applications to ensure that not all computation done is lost on machine failures. Checkpointing and rollback recovery is one of the most popular techniques to provide fault tolerance support to parallel applications. However, when a failure occurs, most checkpointing mechanisms require a complete restart of the parallel application from the last checkpoint. New advances in the prediction of hardware failures have led to the development of proactive process migration approaches, where tasks are migrated in a preventive way when node failures are anticipated, avoiding the restart of the whole application. The work presented in this paper extends an application-level checkpointing framework to proactively migrate message passing interface (MPI) processes when impending failures are notified, without having to restart the entire application. The main features of the proposed solution are: low overhead in failure-free executions, avoiding the checkpoint dumping associated to rolling back strategies; low overhead at migration time, by means of the design of a light and asynchronous protocol to achieve a consistent global state; transparency for the user, thanks to the use of a compiler tool and a runtime library and portability, as it is not locked into a particular architecture, operating system or MPI implementation. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
24. A fully informed model-based checkpointing protocol for preventing useless checkpoints.
- Author
-
Wu, Jiang and Manivannan, D.
- Subjects
- *
ROLLBACK recovery (Computer science) , *DATA modeling , *COMPUTER network protocols , *INFORMATION processing , *FAULT-tolerant computing , *COMPUTER algorithms - Abstract
Checkpointing and rollback recovery are widely used techniques for handling failures in distributed systems. When processes involved in a distributed computation are allowed to take checkpoints independently without any coordination with each other, some or all of the checkpoints taken may not be part of any consistent global checkpoint, and hence, are useless for recovery. Communication-induced checkpointing algorithms allow processes to take checkpoints independently and also ensure that each checkpoint taken is part of a consistent global checkpoint by forcing processes to take some additional checkpoints. It is well known that it is impossible to design an optimal communication-induced checkpointing algorithm (i.e. a checkpointing algorithm that takes the minimum number of forced checkpoints). So, researchers have designed communication-induced checkpointing algorithms that reduce forced checkpoints using different heuristics. In this paper, we present a communication-induced checkpointing algorithm which takes fewer forced checkpoints when compared to some of the existing checkpointing algorithms in its class. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
25. A survey of fault tolerance mechanisms and checkpoint/restart implementations for high performance computing systems.
- Author
-
Egwutuoha, Ifeanyi, Levy, David, Selic, Bran, and Chen, Shiping
- Subjects
- *
FAULT-tolerant computing , *HIGH performance computing , *PARALLEL processing , *COMPUTER workstation clusters , *COST effectiveness , *ROLLBACK recovery (Computer science) , *SURVEYS - Abstract
In recent years, High Performance Computing (HPC) systems have been shifting from expensive massively parallel architectures to clusters of commodity PCs to take advantage of cost and performance benefits. Fault tolerance in such systems is a growing concern for long-running applications. In this paper, we briefly review the failure rates of HPC systems and also survey the fault tolerance approaches for HPC systems and issues with these approaches. Rollback-recovery techniques are discussed because they are the approach most often used for long-running applications on HPC clusters. Specifically, the feature requirements of rollback-recovery are discussed and a taxonomy is developed for over twenty popular checkpoint/restart solutions. The intent of this paper is to aid researchers in the domain as well as to facilitate development of new checkpointing solutions. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
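The checkpoint/restart idea surveyed in entry 25 can be illustrated with a minimal, application-level sketch. It is not taken from the paper or from any of the tools it classifies; the function names, the state layout, and the `fail_at` parameter used to simulate a crash are all hypothetical.

```python
import os
import pickle
import tempfile

def save_checkpoint(path, state):
    """Write the state to a temporary file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    # The rename is atomic on POSIX systems, so a crash during the write
    # leaves the previous checkpoint intact.
    os.replace(tmp, path)

def load_checkpoint(path):
    """Return the last saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"next_step": 0, "total": 0}
    with open(path, "rb") as f:
        return pickle.load(f)

def run(path, n_steps, every=10, fail_at=None):
    """Sum the squares of 0..n_steps-1, checkpointing after every `every` steps.

    `fail_at` is a hypothetical hook that raises at a given step to simulate a failure.
    """
    state = load_checkpoint(path)  # rollback: resume from the last checkpoint
    for step in range(state["next_step"], n_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += step * step
        state["next_step"] = step + 1
        if state["next_step"] % every == 0:
            save_checkpoint(path, state)
    return state["total"]
```

Calling `run` again with the same path after a failure redoes only the steps performed since the last checkpoint, which is the trade-off between checkpoint frequency and lost work that rollback-recovery schemes tune.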
26. On system rollback and totalized fields: An algebraic approach to system change
- Author
-
Burgess, Mark and Couch, Alva
- Subjects
- *
ROLLBACK recovery (Computer science) , *ALGEBRA , *HYPOTHESIS , *ERROR-correcting codes , *STOCHASTIC convergence , *CHANGE management - Abstract
Abstract: In system operations the term rollback is often used to imply that arbitrary changes can be reversed i.e. ‘rolled back’ from an erroneous state to a previously known acceptable state. We show that this assumption is flawed and discuss error-correction schemes based on absolute rather than relative change. Insight may be gained by relating change management to the theory of computation. To this end, we reformulate previously-defined ‘convergent change operators’ of Burgess into the language of groups and rings. We show that, in this form, the problem of rollback from a convergent operation becomes equivalent to that of ‘division by zero’ in computation. Hence, we discuss how recent work by Bergstra and Tucker on zero-totalized fields helps to clear up long-standing confusion about the options for ‘rollback’ in change management. [Copyright Elsevier]
- Published
- 2011
- Full Text
- View/download PDF
27. PREVENTIVE MIGRATION VS. PREVENTIVE CHECKPOINTING FOR EXTREME SCALE SUPERCOMPUTERS.
- Author
-
Cappello, Franck, Casanova, Henri, and Robert, Yves
- Subjects
- *
SUPERCOMPUTERS , *FAULT-tolerant computing , *STOCHASTIC analysis , *HIGH performance computing , *ROLLBACK recovery (Computer science) , *PARALLEL processing - Abstract
An alternative to classical fault-tolerant approaches for large-scale clusters is failure avoidance, by which the occurrence of a fault is predicted and a preventive measure is taken. We develop analytical performance models for two types of preventive measures: preventive checkpointing and preventive migration. We instantiate these models for platform scenarios representative of current and future technology trends. We find that preventive migration is the better approach in the short term by orders of magnitude. However, in the longer term, both approaches have comparable merit with a marginal advantage for preventive checkpointing. We also develop an analytical model of the performance for fault tolerance based on periodic checkpointing and compare this approach to both failure avoidance techniques. We find that this comparison is sensitive to the nature of the stochastic distribution of the time between failures, and that failure avoidance is likely inferior to fault tolerance in the long term. Regardless, our results show that each approach is likely to achieve poor utilization for large-scale platforms (e.g., 2^20 nodes) unless the mean time between failures is large. We show how bounding parallel job size improves utilization, but conclude that achieving good utilization in future large-scale platforms will require a combination of techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
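The abstract of entry 27 refers to an analytical model of periodic checkpointing but does not reproduce it. As a stand-in, the sketch below uses the classical first-order approximation usually attributed to Young, in which the checkpoint interval that minimizes wasted time is roughly sqrt(2 × checkpoint cost × MTBF). This is not the paper's model. It assumes the checkpoint cost is much smaller than the MTBF and that node failures are independent, so that the platform MTBF is the node MTBF divided by the node count. The function names are hypothetical.

```python
import math

def platform_mtbf(node_mtbf_s, n_nodes):
    """Platform MTBF in seconds, assuming independent node failures."""
    return node_mtbf_s / n_nodes

def young_interval(checkpoint_cost_s, mtbf_s):
    """First-order optimal time between checkpoints: sqrt(2 * C * M)."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

def waste_fraction(interval_s, checkpoint_cost_s, mtbf_s):
    """First-order fraction of time lost: checkpoint overhead plus expected rework.

    C / T is the share of time spent writing checkpoints; T / (2 * M) is the
    expected share of work redone after failures. Only meaningful when both
    terms are small.
    """
    return checkpoint_cost_s / interval_s + interval_s / (2.0 * mtbf_s)
```

Under these assumptions the waste at the optimal interval is sqrt(2C/M), so dividing the MTBF by the node count makes the waste grow with the square root of the platform size, which is consistent with the abstract's remark that utilization suffers on very large platforms unless the MTBF is large.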
28. An 8–11 Gb/s Reference-Less Bang-Bang CDR Enabled by “Phase Reset”.
- Author
-
Shivnaraine, Ravi, Jalali, Mohammad Sadegh, Sheikholeslami, Ali, Kibune, Masaya, and Tamura, Hirotaka
- Subjects
- *
DATA recovery , *COMPUTER programming , *ROLLBACK recovery (Computer science) , *COMPUTER disaster recovery services , *DATA removal (Computer science) - Abstract
This paper embeds a “phase-reset” scheme into a bang-bang clock and data recovery (CDR) to periodically realign the clock phase to the data rising edge using a gated-VCO. This reduces both the CDR lock time and bit errors during pull-in, while increasing the CDR capture range. The CDR is fabricated in 65-nm CMOS, operates at 8–11 Gb/s, and demonstrates a 9× increase in capture range. The CDR consumes 84 mW during lock, and 48 mW in steady state. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
29. Solve Blocking Problems with RCSI.
- Author
-
Survance, Kurt
- Subjects
DATABASES ,COMPUTER input-output equipment ,LOOP tiling (Computer science) ,ROLLBACK recovery (Computer science) ,INFORMATION storage & retrieval systems ,MAINTENANCE ,COST - Abstract
The article discusses how to solve blocking problems in a database by using Read Committed Snapshot Isolation (RCSI). It explains that RCSI is enabled by running an ALTER DATABASE statement and that, on a busy database, the statement can be combined with one of the rollback options. It notes that the ROLLBACK IMMEDIATE option immediately starts rolling back open transactions. It also advises considering the potential cost of the blocking problems before implementing RCSI.
- Published
- 2012
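For entry 29, the statement the abstract alludes to is SQL Server's `ALTER DATABASE ... SET READ_COMMITTED_SNAPSHOT ON`, optionally followed by a termination clause such as `WITH ROLLBACK IMMEDIATE`. The sketch below only builds that statement as a string; the helper name `rcsi_statement` and its parameters are hypothetical, and running the statement against a server (not shown) requires a database connection and suitable permissions.

```python
def quote_identifier(name):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"

def rcsi_statement(database, rollback_after_s=None):
    """Build the T-SQL statement that turns on Read Committed Snapshot Isolation.

    With rollback_after_s=None, open transactions are rolled back at once
    (ROLLBACK IMMEDIATE); otherwise they get the given number of seconds
    to finish (ROLLBACK AFTER n SECONDS).
    """
    if rollback_after_s is None:
        termination = "ROLLBACK IMMEDIATE"
    else:
        termination = f"ROLLBACK AFTER {int(rollback_after_s)} SECONDS"
    return (
        f"ALTER DATABASE {quote_identifier(database)} "
        f"SET READ_COMMITTED_SNAPSHOT ON WITH {termination};"
    )
```

Because the termination clause rolls back other sessions' open transactions, it is usually reserved for a maintenance window or a database where losing in-flight work is acceptable.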
30. SharePoint 2010 Disaster Recovery.
- Author
-
Klindt, Todd O.
- Subjects
DATA recovery ,ROLLBACK recovery (Computer science) ,CLIENT/SERVER computing ,DISTRIBUTED computing ,MICROSOFT software - Abstract
The article presents the first part of a series about disasters that may befall Microsoft SharePoint servers. It focuses on disasters related to the deletion of content and ways to prepare for such an event and respond to it. It stresses the importance of determining which disasters to plan for and of explaining options and costs to customers. Tools for recovering deleted content include the Central Administration unattached content database recovery option.
- Published
- 2011
31. STMs in practice: Partial rollback vs pure abort mechanisms.
- Author
-
Anand, Anshu S., Shyamasundar, R. K., and Peri, Sathya
- Subjects
ROLLBACK recovery (Computer science) ,COMPUTER storage capacity ,COMPUTER software reusability ,COMPUTER software usability ,SOFTWARE architecture - Abstract
Summary: In this paper, we propose an enhanced Automatic Checkpointing and Partial Rollback (CaPR++) algorithm to realize Software Transactional Memory (STM), that employs a partial rollback mechanism for conflict resolution. We have comparatively evaluated the "Abort" and "Partial Rollback" mechanisms for STMs. For purposes of comparison, we have used the state‐of‐the‐art RSTM system, and for the "Partial Rollback" we have used our earlier CaPR+ algorithm that has been enhanced for our requirements. Note that we have enriched the STAMP benchmarks with varied delayed transaction times. The results obtained demonstrate the effectiveness of the Partial Rollback mechanism over pure abort mechanisms for applications with large transaction delays, with up to 1.6x performance gain. Our study makes the case for a hybrid system of pure aborts and partial rollbacks, which can extract the benefits of both mechanisms. Keeping in line with our study, we have proposed a hybrid implementation where some of the transactions of an application subscribe to abort mechanisms and the rest to partial rollback. Our initial implementation demonstrates various scenarios where the hybrid approach outperforms the pure abort and partial rollback approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
32. States Challenge Trump Administration's Rollback of Power Plant Emission Rules.
- Subjects
CLEAN Air Act (U.S.) ,COAL-fired power plants ,ROLLBACK recovery (Computer science) - Published
- 2019
33. HANDS-ON WITH DATA RESCUE DATA RECOVERY TOOLS.
- Author
-
CARUANA, ANTHONY
- Subjects
DATA recovery ,COMPUTER programming ,ROLLBACK recovery (Computer science) ,COMPUTER disaster recovery services ,DATA removal (Computer science) ,COMPUTERS ,COMPUTER software - Abstract
The article discusses data recovery tools that are needed in order to prevent data loss. It states that data recovery may be time-consuming and expensive, and that experts can disassemble the drive and rebuild it. It mentions types of data loss such as hardware failure and software failure. It adds that the first step in the process was carrying out a block-level clone of the hard drive.
- Published
- 2016
34. CHARM: A CHECKPOINT-BASED ROLLBACK RECOVERY AND PROCESS MIGRATION SYSTEM FOR CLUSTER OF WORKSTATIONS.
- Author
-
DONGSHENG WANG, XIAOTIE DENG, and WEIMIN ZHENG
- Subjects
ROLLBACK recovery (Computer science) ,PROCESS migration (Electronic data processing) ,COMPUTER workstation clusters ,SOFTWARE maintenance ,COMPUTATIONAL complexity - Published
- 2000
35. Hiding rollback latency in log-based eager hardware transactional memory.
- Author
-
Sungjae Lee and Inhwan Lee
- Subjects
- *
ROLLBACK recovery (Computer science) , *HARDWARE , *COMPUTER input-output equipment , *ARRAY processors , *ELECTRONIC industries - Abstract
The use of a rollback buffer (RB) for hiding the rollback latency in log-based eager hardware transactional memory is proposed. The RB allows a transaction to abort without performing rollback, but still makes the transaction's old values immediately available. In effect, the rollback latency almost disappears. When running the Stanford transactional applications for multi-processing benchmark on a 16-core processor that implements the LogTM-SE, the speedup (decrease in execution time) achieved with a 2 KB RB is 15.8% on average. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
36. The future of the Volcker Rule: a poll by IFLR and Morrison & Foerster.
- Author
-
Labbé, Amélie
- Subjects
VOLCKER Rule (U.S.) ,FINANCIAL services industry ,DODD-Frank Wall Street Reform & Consumer Protection Act ,ROLLBACK recovery (Computer science) - Abstract
Click http://bit.ly/VolckerRulesurvey to access the poll. Responses are anonymous and confidential. [ABSTRACT FROM AUTHOR]
- Published
- 2018
37. In the news this week.
- Author
-
Crabb, John, Lai, Karry, and Jackson, Olly
- Subjects
AUDITING ,BONDS (Finance) ,INTERNATIONAL relations ,FIDUCIARY accounting ,ROLLBACK recovery (Computer science) - Abstract
This week features updates on locked box mechanisms, rising tensions between the US and China, and the latest on the fiduciary rule rollback [ABSTRACT FROM AUTHOR]
- Published
- 2018
38. Designing Over-the-Air, Secure Devices.
- Author
-
Howard, Andrew
- Subjects
INTERNET of things ,WIRELESS communications security ,SOFTWARE upgrades ,PRODUCT recall ,ROLLBACK recovery (Computer science) ,COMPUTER programming - Abstract
The article focuses on concerns related to producing Internet of Things (IoT) devices while managing their security. Topics discussed include implementing updates through over-the-air (OTA) services, how the ability to upgrade devices can help manufacturers avoid product recalls, and the use of rollback when an update fails.
- Published
- 2017
39. Restore Deleted Files with Previous Versions.
- Author
-
WILSON, MARK
- Subjects
COMPUTER files ,ROLLBACK recovery (Computer science) - Abstract
The article offers step-by-step instructions for restoring deleted files with the Previous Versions feature of the Windows 7 operating system.
- Published
- 2014
40. On Necessary and Sufficient Conditions for Deadlock-Free Routing in Wormhole Networks.
- Author
-
Verbeek, Freek and Schmaltz, Julien
- Subjects
- *
WORMHOLE routing , *NETWORK routing protocols , *ROLLBACK recovery (Computer science) , *NETWORK analysis (Communication) , *MULTIPROCESSORS , *INTEGRATED circuit interconnections - Abstract
Wormhole switching is a popular switching technique in interconnection networks. This technique is also prone to deadlocks. Adaptive routing algorithms provide alternative paths that can be used to escape congested areas and prevent some deadlocks from occurring. If not designed carefully, these new paths may as well introduce deadlocks. A successful solution to deadlock prevention is to constrain the routing function such that it does not introduce any deadlock. Many necessary and sufficient conditions for deadlock-free routing have been proposed. The definition and the proof of these conditions are complex and error-prone. These conditions are often counterintuitive and difficult to understand. Moreover, they are not static, as they all require the analysis of configurations, i.e., the network state. The contribution of this paper is twofold. First, we present the first static necessary and sufficient condition for deadlock-free routing in wormhole networks. Our condition is much simpler and requires fewer assumptions than all previous ones. It is formally proven correct using an automated proof assistant. In particular, our condition applies to incoherent routing functions, which was considered an open problem. Second, we prove the deadlock decision problem co-NP-complete for wormhole networks. [ABSTRACT FROM PUBLISHER]
- Published
- 2011
- Full Text
- View/download PDF
41. FCC to Outline Plan to Roll Back Net-Neutrality Rules.
- Author
-
McKinnon, John D.
- Subjects
- *
NETWORK neutrality , *INTERNET service providers , *ROLLBACK recovery (Computer science) , *TRADE associations - Published
- 2017
42. CFPB’s Mulvaney plots HMDA rollback, but it may not matter.
- Author
-
Berry, Kate
- Subjects
MONEYLENDERS ,ROLLBACK recovery (Computer science) - Abstract
Lenders would have a lighter data-reporting burden, but they may end up deciding to collect the data anyway. [ABSTRACT FROM AUTHOR]
- Published
- 2018
43. Kronologi teams with ST Electronics for Hong Kong expansion.
- Subjects
ROLLBACK recovery (Computer science) ,DATA protection - Published
- 2017
Catalog
Discovery Service for Jio Institute Digital Library