Oxford logo

Project Report

Automatic Verification of Randomized Distributed Algorithms

We believe that this project fully satisfies the objectives as stated in the original proposal. Indeed, we have far exceeded the objectives we set for ourselves. Below we summarise the main achievements in relation to the objectives.

Semantic models and model checking algorithms:

We established that Markov decision processes (MDPs) are an appropriate semantic model to represent asynchronous randomized distributed algorithms, with discrete time Markov chains (DTMCs) contained within MDPs modelling the synchronous algorithms. The model checking of MDPs reduces to linear programming problems and linear equation systems for DTMCs. Though the MDP models can feature discrete time, the consideration of dense real-time in combination with probabilistic behaviour posed a significant challenge. We addressed this issue within this project by formulating a new model called probabilistic timed automata and associated logic PTCTL, and showed for the first time the decidability of PTCTL model checking for such automata based on the region graph construction and solving linear programming problems [KNSS99]. Jointly with a spin-off EPSRC project GR/N22960 (Verification of Quality of Service Properties in Timed Systems), we improved the efficiency of model checking for probabilistic timed automata by adapting the forwards [KNSS02] and backwards reachability analysis [KNS00,KNS01b] and deriving their continuous variant, for which we were able to solve an open problem by formulating an approximate method of model checking using a refinement of the region graph [KNSS00a]. Finally, we developed methods for handling open queues [EKN99], investigated approximating the probabilities by lower/upper bounds [BKN99] and formulated an algorithm for maximal probabilistic reachability for infinite-state probabilistic systems [KNS01b].

Symbolic probabilistic model checking:

We defined a variant of the language of Reactive Modules for use as a modelling language for PRISM and implemented model checking for the logic PCTL. Though not in the original plan, thanks to collaboration with Katoen, we were able to also extend our methods to handle continuous time Markov chains (CTMCs) and implement model checking for the logic CSL, improving efficiency of existing methods by O(N). Model checking involves a translation of a CTMC to a DTMC known as uniformisation, together with transient analysis [KKNP01].

We investigated in detail the performance of MTBDD data structures when representing and verifying probabilistic systems [HKN+03]. We discovered heuristics for boolean variables which help ensure the compactness of the MTBDD representation [dAKN+00] and established that the iterative methods for solving linear programming problems perform better on MTBDDs than e.g. Simplex [KNPS99]. Initially, in certain cases we observed inefficiency in numerical computation with MTBDDs which was several orders of magnitude slower compared to sparse matrices, though MTBDDs can represent larger systems; we addressed this inefficiency firstly through implementing a pre-computation algorithm which identifies all those states that satisfy the formula with probability 1 and 0 (BDD fixed point computation suffices in these cases) [dAKN+00,KNP00]. Secondly, we implemented a hybrid data structure, which retains an MTBDD-like representation of the matrix and works with a conventional vector, that, given further optimisations, can almost match the speed of sparse matrices while maintaining the advantage of lower memory requirement [KNP02b].

PRISM: Probabilistic Symbolic Model Checker:

Although we originally did not plan to implement a tool but instead aimed to collaborate with tool developers at CMU and Oxford, we were able to fully implement PRISM [KNP02a], internationally the first and currently still the only model checker for randomized distributed algorithms, and substantially advance the software technology that underpins probabilistic model checking. PRISM was released for use in September 2001 and is freely available for academic research purposes from its website, to which we refer the interested reader for more detail. PRISM can represent and usefully analyse very large non-trivial systems (to give an indication of the scale, quantitative analysis has been performed for systems with up to 1014 states). The main functionality of PRISM can be summarised as follows:

  • support for three types of models, discrete time Markov chains (DTMCs), Markov decision processes (MDPs) and continuous time Markov chains (CTMCs);
  • an easy to use system description language based on Reactive Modules;
  • implementation of model checking for the logic PCTL (for DTMCs and MDPs) [dAKN+00], both qualitative and quantitative probabilistic properties, with and without fairness assumptions, and logic CSL (for CTMCs) [KKNP01] which allows the specification of transient and steady-state properties;
  • three engines for numerical computation (symbolic, based on MTBDDs; explicit, based on sparse matrices; and a hybrid of the two) [KNP02b].

Verification methodology for randomized distributed algorithms:

We have successfully combined the Cadence SMV approach with PRISM modelling and analysis, and derived a methodology that can be applied to largely automate the verification process for randomized distributed algorithms whose probabilistic component is confined to a well-defined part of the protocol execution. In particular, we performed the first such automated verification for the benchmark randomized consensus algorithm of Aspnes & Herlihy [KNS01a] identified in the proposal, and were able to verify certain requirements for arbitrarily many processes, something that we did not anticipate we would be able to achieve at the time of writing. The methodology relies on verifying non-probabilistic requirements using Cadence SMV, which can be done for all N, where N is the number of processes, with appropriate abstractions, and the probabilistic requirements with PRISM (for finite configurations). The recently proposed non-trivial Byzantine agreement of Kursawe et al called ABBA was also verified using this methodology [KN02]. ABBA forms the core of a family of secure and efficient broadcast algorithms, and is therefore of practical relevance.

Case studies:

We made excellent progress with the case studies performed with PRISM: approximately 18 as opposed to 2 or 3 as we expected. This is largely a result of collaborations, both external (with Erlangen and Twente) and internal (with the EPSRC project GR/N22960). The case studies included the proposed benchmark algorithm of Aspnes & Herlihy and many more, ranging from the simple randomized dining philosophers, through typical performance modelling examples such as the Kanban, to IEEE 1394 FireWire root contention [KNS03b] and Kursawe's Byzantine agreement [KN02] protocols. The results of model checking and statistics for the three engines are included in the PRISM website to enable detailed comparison with similar tools.