REAL

A complexity analysis of Policy Iteration through combinatorial matrices arising from Unique Sink Orientations

Gerencsér, Balázs and Hollanders, R. and Delvenne, J-C. and Jungers, Raphaël M. (2017) A complexity analysis of Policy Iteration through combinatorial matrices arising from Unique Sink Orientations. JOURNAL OF DISCRETE ALGORITHMS, 44. pp. 21-38. ISSN 1570-8667

[img]
Preview
Text
1407.4293v2.pdf

Download (807kB) | Preview

Abstract

Unique Sink Orientations (USOs) are an appealing abstraction of several major optimization problems of applied mathematics such as Linear Programming (LP), Markov Decision Processes (MDPs) or 2-player Turn Based Stochastic Games (2TBSGs). A polynomial time algorithm to find the sink of a USO would translate into a strongly polynomial time algorithm to solve the aforementioned problems—a major quest for all three cases. In the case of an acyclic USO of a cube, a situation that captures both MDPs and 2TBSGs, one can apply the well-known Policy Iteration (PI) algorithm. The study of its complexity is the object of this work. Despite its exponential worst case complexity, the principle of PI is a powerful source of inspiration for other methods. In 2012, Hansen and Zwick introduced a new combinatorial relaxation of the complexity problem for PI resulting in what we call Order-Regular (OR) matrices. They conjectured that the maximum number of rows of such matrices—an upper bound on the number of steps of PI—should follow the Fibonacci sequence. As our first contribution, we disprove the lower bound part of Hansen and Zwick's conjecture. Then, for our second contribution, we (exponentially) improve the Ω(1.4142n) lower bound on the number of steps of PI from Schurr and Szabó in the case of OR matrices and obtain an Ω(1.4269n) bound. © 2017 Elsevier B.V.

Item Type: Article
Additional Information: N1 Funding details: ARC 13/18-054, Fédération Wallonie-Bruxelles N1 Funding details: ARC 14/19-060, Fédération Wallonie-Bruxelles N1 Funding text: The authors wish to thank Thomas Dueholm Hansen for the fruitful discussions on the topic of Order-Regular matrices, Acyclic Unique Sink Orientations and Stochastic Games. Moreover, this work was supported by ARC grants ARC 13/18-054 and ARC 14/19-060 from the French Community of Belgium and by grant no. SPF-PS-PAI-P7/19 for the IAP network ‘DYSCO’ funded by the Belgian Federal State. The scientific responsibility rests with the authors.
Uncontrolled Keywords: Matrix algebra; Strongly polynomial time algorithm; Sink orientations; Polynomial-time algorithms; Markov decision processes; Combinatorial relaxations; stochastic systems; Problem solving; Polynomial approximation; Optimization; Markov processes; Linear programming; iterative methods; Unique Sink Orientations; Policy iteration; Complexity bounds; Combinatorial matrices
Subjects: Q Science / természettudomány > QA Mathematics / matematika
SWORD Depositor: MTMT SWORD
Depositing User: MTMT SWORD
Date Deposited: 02 Feb 2018 09:49
Last Modified: 02 Feb 2018 09:49
URI: http://real.mtak.hu/id/eprint/73744

Actions (login required)

Edit Item Edit Item