Optimistically synchronized parallel discrete-event simulation is based on the use of communicating sequential processes. Optimistic synchronization means that the processes proceed without explicit coordination, on the assumption that their execution schedule will happen to be correctly synchronized. Periodic checkpointing of the state of a process allows the process to roll back to an earlier state when synchronization errors are detected. This paper examines the effects of varying the checkpoint interval on the execution time and memory space needed to perform a parallel simulation.
The empirical results presented in this paper were obtained from the simulation of closed stochastic queueing networks with several different topologies. Various intra-processor process scheduling algorithms and both lazy and aggressive cancellation strategies are considered. The empirical results are compared with analytical formulae predicting time-optimal checkpoint intervals. Two modes of operation, throttling and thrashing, have been noted and their effects examined. As the checkpoint interval is increased from one, there is a throttling effect among processes on the same processor which improves performance. When the checkpoint interval is made too large, there is a thrashing effect caused by interaction between processes on different processors. It is shown that the time-optimal and space-optimal checkpoint intervals are not the same. Furthermore, a checkpoint interval that is too small adversely affects space more than time, whereas a checkpoint interval that is too large adversely affects time more than space.
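The basic space/time trade-off behind the checkpoint interval can be illustrated with a minimal sketch. This is not the simulator used in the paper: the class `CheckpointedProcess`, the function `run`, the toy state (a running sum of event values), and the event workload are all hypothetical, chosen only to show how a single process saves state every `interval` events, rolls back when a straggler arrives, and re-executes ("coasts forward" through) the events between the restored checkpoint and the straggler. It does not model the throttling or thrashing interactions between processes.

```python
class CheckpointedProcess:
    """Toy logical process (hypothetical): the state is a running sum of event values."""

    def __init__(self, interval):
        self.interval = interval        # save state after every `interval` events
        self.state = 0
        self.processed = []             # (timestamp, value), in timestamp order
        self.checkpoints = [(0, 0)]     # (number of events processed, saved state)
        self.coast_forward_events = 0   # events re-executed because of rollbacks

    def process(self, timestamp, value):
        """Execute one event and checkpoint if the interval has elapsed."""
        self.state += value
        self.processed.append((timestamp, value))
        if len(self.processed) % self.interval == 0:
            self.checkpoints.append((len(self.processed), self.state))

    def rollback(self, straggler_time):
        """Restore the state that held before any event at or after straggler_time."""
        # Events with earlier timestamps remain valid; the rest are undone.
        keep = sum(1 for t, _ in self.processed if t < straggler_time)
        undone = self.processed[keep:]
        # Discard checkpoints taken after the rollback point.
        while self.checkpoints[-1][0] > keep:
            self.checkpoints.pop()
        count, saved_state = self.checkpoints[-1]
        replay = self.processed[count:keep]
        self.state = saved_state
        self.processed = self.processed[:count]
        # Coast forward: re-execute valid events that lie past the checkpoint.
        for t, v in replay:
            self.state += v
            self.processed.append((t, v))
            self.coast_forward_events += 1
        return undone                   # the caller would reschedule these


def run(interval, n_events=20, straggler_time=14):
    """Process n_events events, then roll back once; return the observed costs."""
    lp = CheckpointedProcess(interval)
    for ts in range(1, n_events + 1):
        lp.process(ts, ts)
    peak_checkpoints = len(lp.checkpoints)   # proxy for memory use
    lp.rollback(straggler_time)
    return peak_checkpoints, lp.coast_forward_events, lp.state


if __name__ == "__main__":
    for interval in (1, 4, 16):
        print(interval, run(interval))
```

In this toy setting the restored state is the same for every interval, but an interval of 1 holds 21 saved states and re-executes nothing, while an interval of 16 holds only 2 saved states and re-executes 13 events after the rollback. That is the local effect only; the throttling and thrashing behaviour described above arises from interactions among processes, which this sketch omits.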
Copyright 1993 by Association for Computing Machinery, Inc.