Action selection for MDPs: anytime AO* vs. UCT


To access the full-text documents, please follow this link: http://hdl.handle.net/10230/35973

Title:	Action selection for MDPs: anytime AO* vs. UCT
Author:	Bonet, Blai; Geffner, Héctor
Abstract:	Paper presented at the 26th AAAI Conference on Artificial Intelligence, held in Toronto, Canada, July 22-26, 2012.
Abstract:	In the presence of non-admissible heuristics, A* and other best-first algorithms can be converted into anytime optimal algorithms over OR graphs by simply continuing the search after the first solution is found. The same trick, however, does not work for best-first algorithms over AND/OR graphs, which must be able to expand leaf nodes of the explicit graph that are not necessarily part of the best partial solution. Anytime optimal variants of AO* must thus address an exploration-exploitation tradeoff: they cannot just "exploit", they must keep exploring as well. In this work, we develop one such variant of AO* and apply it to finite-horizon MDPs. This Anytime AO* algorithm eventually delivers an optimal policy while using non-admissible random heuristics that can be sampled, as when the heuristic is the cost of a base policy that can be sampled with rollouts. We then test Anytime AO* for action selection over large infinite-horizon MDPs that cannot be solved with existing off-line heuristic search and dynamic programming algorithms, and compare it with UCT.
Abstract:	H. Geffner is partially supported by grants TIN2009-10232, MICINN, Spain, and EC-7PM SpaceBook.
Rights:	© 2012, Association for the Advancement of Artificial Intelligence (www.aaai.org)
Document type:	Conference object; Article - Accepted version
Publisher:	Association for the Advancement of Artificial Intelligence (AAAI)
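
The abstract mentions heuristics that are obtained by sampling the cost of a base policy with rollouts. The following is a minimal Python sketch of that idea only, not the authors' implementation: the toy chain MDP (`toy_step`), the base policy (`always_advance`), and every function name and parameter here are hypothetical and chosen purely for illustration. A single sampled rollout cost can overestimate the optimal expected cost, which is why such a heuristic is not admissible.

```python
import random
from typing import Callable, Tuple

State = int
Action = int


def rollout_cost(
    state: State,
    horizon: int,
    base_policy: Callable[[State], Action],
    step: Callable[[State, Action, random.Random], Tuple[State, float]],
    rng: random.Random,
) -> float:
    """Return one sampled cost of following base_policy for `horizon` steps."""
    total = 0.0
    for _ in range(horizon):
        action = base_policy(state)
        state, cost = step(state, action, rng)
        total += cost
    return total


def toy_step(state: State, action: Action, rng: random.Random) -> Tuple[State, float]:
    """Hypothetical chain MDP: state 0 is an absorbing, cost-free goal.

    Action 1 moves one state closer to the goal with probability 0.8;
    otherwise the state is unchanged. Every step outside the goal costs 1.
    """
    if state == 0:
        return 0, 0.0
    if action == 1 and rng.random() < 0.8:
        return state - 1, 1.0
    return state, 1.0


def always_advance(state: State) -> Action:
    """Hypothetical base policy: always try to move toward the goal."""
    return 1


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [rollout_cost(3, 10, always_advance, toy_step, rng) for _ in range(1000)]
estimate = sum(samples) / len(samples)  # Monte Carlo estimate of the base-policy cost
```

Each call to `rollout_cost` yields one random sample; averaging several samples, as in `estimate`, reduces the variance of the heuristic value at the price of more simulation time.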

Related documents

Other documents by the same author

Planning as heuristic search

Bonet, Blai; Geffner, Héctor

mGPT: a probabilistic planner based on heuristic search

Bonet, Blai; Geffner, Héctor

Planning under partial observability by classical replanning: theory and experiments

Bonet, Blai; Geffner, Héctor

Heuristics for planning with penalties and rewards formulated in logic and computed through circuits

Bonet, Blai; Geffner, Héctor

Width and complexity of belief tracking in non-deterministic conformant and contingent planning

Bonet, Blai; Geffner, Héctor
