Notes:
The original publication is available at link.springer.com.
Abstract.
We study the problem of achieving a given value in Markov
decision processes (MDPs) with several independent discounted reward
objectives. We consider a generalised version of discounted reward objectives, in which the amount of discounting depends on the states visited
and on the objective. This definition extends the usual definition of discounted reward and allows us to capture systems in which the values of
different commodities diminish at different and variable rates.
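To fix notation (this is one plausible formalisation consistent with the description above, not necessarily the paper's exact definition): given reward functions $r_i$ and discount functions $\lambda_i : S \to (0,1)$, the value of the $i$-th objective along a run $s_0 s_1 s_2 \cdots$ is
\[
  \mathrm{disc}_i(s_0 s_1 s_2 \cdots) \;=\; \sum_{t=0}^{\infty} \Bigl(\prod_{j=0}^{t-1} \lambda_i(s_j)\Bigr)\, r_i(s_t).
\]
The state-discount models discussed below then correspond to $\lambda_i(s) = \lambda(s)$ for every objective $i$, and the reward-discount models to constant functions $\lambda_i(s) = \lambda_i$, which recover the classical payoff $\sum_{t \ge 0} \lambda_i^t\, r_i(s_t)$.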
We establish results for two prominent subclasses of the problem: state-discount models, where the discount factors depend only on the state of the MDP (and are independent of the objective), and reward-discount models, where they depend only on the objective (and not on the state of the MDP). For state-discount models we use a straightforward reduction to expected total reward and show that the problem of whether a given value is achievable can be solved in polynomial time; the sketch below illustrates this view.
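As an illustration, the following sketch decides achievability in a state-discount model by checking feasibility of the standard linear program over discounted occupation measures, which is precisely the expected-total-reward view of the problem. It is a minimal sketch under the formalisation given earlier, not the paper's algorithm; all names (P, lam, rewards, init, target) are hypothetical, every state is assumed to have the same action set, and all discount factors are assumed to be strictly below 1.

import itertools
import numpy as np
from scipy.optimize import linprog

def achievable(P, lam, rewards, init, target):
    """P[s][a][t]: probability of moving from s to t under action a;
    lam[s]: discount factor of state s (assumed < 1);
    rewards[i][s]: reward of state s for objective i;
    init[s]: initial distribution; target[i]: value to achieve.
    Returns True iff some randomised memoryless strategy achieves
    all targets simultaneously (under the formalisation above)."""
    S, A, K = len(P), len(P[0]), len(rewards)
    nvar = S * A                       # y[s, a]: discounted occupation measure
    idx = lambda s, a: s * A + a

    # Flow constraints:
    # sum_a y[s,a] - sum_{s',a'} lam[s'] * P[s'][a'][s] * y[s',a'] = init[s]
    A_eq = np.zeros((S, nvar))
    b_eq = np.array(init, dtype=float)
    for s in range(S):
        for a in range(A):
            A_eq[s, idx(s, a)] += 1.0
        for sp, ap in itertools.product(range(S), range(A)):
            A_eq[s, idx(sp, ap)] -= lam[sp] * P[sp][ap][s]

    # Objective constraints, as upper bounds for linprog:
    # -sum_{s,a} rewards[i][s] * y[s,a] <= -target[i]
    A_ub = np.zeros((K, nvar))
    b_ub = -np.array(target, dtype=float)
    for i in range(K):
        for s, a in itertools.product(range(S), range(A)):
            A_ub[i, idx(s, a)] = -rewards[i][s]

    # Pure feasibility check: the objective vector is zero.
    res = linprog(np.zeros(nvar), A_ub=A_ub, b_ub=b_ub,
                  A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.status == 0             # status 0: a feasible point was found

If the program is feasible, a witnessing randomised memoryless strategy can be read off by playing action a in state s with probability proportional to y[s,a] (in states with positive measure), which is the usual way occupation-measure solutions are turned into strategies.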
For reward-discount models we show that strategies require both memory and randomisation, but that the problem is nevertheless decidable, and that it suffices to consider strategies which behave in a memoryless way after a certain number of steps.
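Formally (again our reading of the claim above, hedged accordingly): it suffices to consider strategies $\sigma$ for which there is a bound $N$ such that $\sigma(h)$ depends only on the last state of the history $h$ whenever $h$ has length at least $N$; before step $N$ the strategy may use both memory and randomisation.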
For the general case, we show that when we restrict to graphs (i.e. MDPs
with no probabilistic transitions), pure strategies, and discount factors of the form
1/n for an integer n, the problem is in PSPACE and finite memory
suffices for achieving a given value. We also show that when the discount
factors are not of the form 1/n, the memory required by a strategy can
be infinite.