Whizbang Labs

Publication List Details

Period

2001

Number

2

Co-Authors

Experiments with Infinite-Horizon, Policy-Gradient Estimation (2001)

Jonathan Baxter (Whizbang Labs), Peter L. Bartlett (Biowulf Technologies), Lex Weaver

In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm...

Infinite-Horizon Policy-Gradient Estimation (2001)

Jonathan Baxter (Whizbang Labs), Peter L. Bartlett (Biowulf Technologies)

Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems...