Funded by: (grant reference GR/M04617/01)
Dates: 1998-2001
Institutions: | University of Birmingham |
`Model checkers' are software tools which enable automatic checking of systems such as hardware circuits and communication protocols against specifications: when supplied with a description of a system and the intended specification they output `yes' if the specification is satisfied, and `no' otherwise, in which case a counter-example (a trace leading to error) is output as well.
This project is concerned with extending the model checking techniques to the case of randomized distributed algorithms. These are algorithms which are designed to run on hardware consisting of many interconnected processors (and often geographically `distributed', as in the case of a network of processors), and which use randomization, that is, make choices at random, using electronic coin flipping. It turns out that randomized algorithms are much faster than their deterministic counter- parts, but verifying their correctness against the specification is much more involved because of the need for sophisticated probabilistic analysis. Moreover, a model checker may be unable to output `yes' or `no', but instead only the probability of the specification being satisfied.
Distributed algorithms are designed to run on hardware consisting of many interconnected processors. A randomized algorithm contains an assignment to a variable of an outcome of tossing a fair coin or a random number generator. Since the seminal paper by Michael Rabin randomized algorithms have won universal approval, basically for two reasons: simplicity and speed. Randomization offers a powerful tool for symmetry breaking, which ensures the existence of solutions to certain network co-ordination problems, such as distributed consensus, even in the presence of failures. Randomized distributed algorithms include a number of theoretical algorithmic schemes, for example the dining philosophers protocol and the consensus of Aspnes & Herlihy, as well as the practical real-world protocols for Byzantine agreement due to Kursawe et al and root contention in the IEEE 1394 FireWire.
A necessary consequence of using randomization is the fact that the correctness statements must combine probabilistic analysis, typically based on some appropriate probability space on computation paths, with classical, assertion-based reasoning. Examples include: "the gas boiler may abort with probability at most 0.01", "termination occurs with probability 1", and "delivery of a video frame is guaranteed within time t with probability at least 0.98". The analysis of randomized distributed algorithms becomes very complex, sometimes leading to errors.
The aim of this project was to automate, all or in part, the verification process for randomized distributed algorithms, which could be achieved by extending the techniques of model checking to the domain of randomized algorithms. Model checking, and particularly symbolic model checking, has become an established industry standard for use in ensuring correctness of systems as well as exposing flaws in the existing designs, to mention e.g. tools SMV, SPIN, CWB and FDR, none of which can deal with probability. At the start of the project, several theoretical model checking algorithms could be found in the literature, but experimental work and tool implementation was lagging behind. In fact, there was only one tool, Probabilistic Verus, which was capable of handling probabilistic behaviour, albeit in a limited fashion, but which was never developed fully to be released for use.
The scientific challenge of the proposal was to firstly identify a suitable representation for randomized distributed algorithms, and then develop appropriate model checking algorithms and efficient data structures which are capable of dealing with the increase in complexity and size of models caused by the need to model distributed computation and include probabilistic behaviour.