Abstract.
We study the complexity of central controller synthesis problems for finite-state Markov decision processes, where
the objective is to optimize both the expected mean-payoff
performance of the system and its stability. We argue that the
basic theoretical notion of expressing stability in terms of the
variance of the mean-payoff (called global variance in our paper)
is not always sufficient, since it ignores possible instabilities on
individual runs. For this reason, we propose alternative definitions
of stability, which we call local and hybrid variance, and which
express how rewards on each run deviate from the run’s own
mean-payoff and from the expected mean-payoff, respectively.
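As an illustrative sketch (the notation $r_i$ for the $i$-th reward of a run $\omega$ and the symbols $\mathrm{mp}$, $\mathrm{lv}$, $\mathrm{hv}$ are introduced here for exposition), the two new notions can be formalized along the following lines:
\[
\mathrm{lv}(\omega) \;=\; \limsup_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} \bigl(r_i - \mathrm{mp}(\omega)\bigr)^{2},
\qquad
\mathrm{hv}(\omega) \;=\; \limsup_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} \bigl(r_i - \mathbb{E}[\mathrm{mp}]\bigr)^{2},
\]
where $\mathrm{mp}(\omega) = \limsup_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} r_i$ is the mean-payoff of the run, while the global variance is the ordinary variance of $\mathrm{mp}$ viewed as a random variable, i.e., $\mathbb{E}[\mathrm{mp}^{2}] - \mathbb{E}[\mathrm{mp}]^{2}$.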
We show that a strategy ensuring given bounds on both the expected
mean-payoff and the variance requires randomization and memory, under all the above semantics of variance. We
then look at the problem of determining whether there is such
a strategy. For the global variance, we show that the problem
is in PSPACE, and that the answer can be approximated in
pseudo-polynomial time. For the hybrid variance, the analogous
decision problem is in NP, and a polynomial-time approximation
algorithm also exists. For the local variance, we show that the
decision problem is in NP. Since the overall performance can be
traded for stability (and vice versa), we also present algorithms
for approximating the associated Pareto curve in all three
cases.
Finally, we study a special case of the decision problems,
where we require a given expected mean-payoff together with
zero variance. Here we show that the problems can all be solved
in polynomial time.