Douglas Aberdeen

Publication List Details

Period

2001 - 2007

Number

20

Concurrent Probabilistic Temporal Planning with Policy-Gradients (2007)

Aberdeen, Douglas, Buffet, Olivier

We present an any-time concurrent probabilistic temporal planner that includes continuous and discrete uncertainties and metric functions. Our approach is a direct policy search that attempts to...

FF+FPG: Guiding a Policy-Gradient Planner (2007)

Aberdeen, Douglas, Buffet, Olivier

The Factored Policy-Gradient planner (FPG) was a successful competitor in the probabilistic track of the 2006 International Planning Competition (IPC). FPG is innovative because it scales to large...

Policy-Gradients for PSRs and POMDPs (2007)

Aberdeen, Douglas, Buffet, Olivier, Thomas, Owen

In uncertain and partially observable environments control policies must be a function of the complete history of actions and observations. Rather than present an ever growing history to a learner, we...

Natural Actor-Critic for Road Traffic Optimisation (2007)

Richter, Silvia, Aberdeen, Douglas, Yu, Jin

Current road-traffic optimisation practice around the world is a combination of hand-tuned policies with a small degree of automatic adaption. Even state-of-the-art research controllers need good...

Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation (2005)

Schraudolph, Nicol N., Yu, Jin, Aberdeen, Douglas

Reinforcement learning by direct policy gradient estimation is attractive in theory but in practice leads to notoriously ill-behaved optimization problems. We improve its robustness and speed of...

A Two-Teams Approach for Robust Probabilistic Temporal Planning (2005)

Buffet, Olivier, Aberdeen, Douglas

Large real-world Probabilistic Temporal Planning (PTP) is a very challenging research field. A common approach is to model such problems as Markov Decision Problems (MDP) and use dynamic programming...

Simulation Methods for Uncertain Decision-Theoretic Planning (2005)

Aberdeen, Douglas, Buffet, Olivier

Experience based reinforcement learning (RL) systems are known to be useful for dealing with domains that are a priori unknown. We believe that experience based methods may also be useful when...

Robust Planning with (L)RTDP (2005)

Buffet, Olivier, Aberdeen, Douglas

Stochastic Shortest Path problems (SSPs), a subclass of Markov Decision Problems (MDPs), can be efficiently dealt with using Real-Time Dynamic Programming (RTDP). Yet, MDP models are often uncertain...

Prottle: A Probabilistic Temporal Planner (2005)

Little, Iain, Aberdeen, Douglas, Thiebaux, Sylvie

Planning with concurrent durative actions and probabilistic effects, or probabilistic temporal planning, is a relatively new area of research. The challenge is to replicate the success of modern...

Filtered Reinforcement Learning (2004)

Aberdeen, Douglas

Reinforcement learning (RL) algorithms attempt to assign the credit for rewards to the actions that contributed to the reward. Thus far, credit assignment has been done in one of two ways: uniformly,...

Decision-Theoretic Military Operations Planning (2004)

Douglas Aberdeen, Sylvie Thiebaux, Lin Zhang

Military operations planning involves concurrent actions, resource assignment, and conflicting costs. Individual tasks sometimes fail with a known probability, promoting a decision-theoretic...

A (Revised) Survey of Approximate Methods for (2003)

Douglas Aberdeen

Partially observable Markov decision processes (POMDPs) are interesting because they provide a general framework for learning in the presence of multiple forms of uncertainty. We survey methods for...

Scaling Internal-State Policy-Gradient Methods for POMDPs (2002)

Douglas Aberdeen, Jonathan Baxter

Policy-gradient methods have received increased attention recently as a mechanism for learning to act in partially observable environments.

Unknown (2002)

Douglas Aberdeen, Jonathan Baxter

Policy-gradient algorithms are attractive as a scalable approach to learning approximate policies for controlling partially observable Markov decision processes (POMDPs). POMDPs can be used to model...

Learning POMDP Policies with Internal State using Gradient Ascent (2001)

Douglas Aberdeen, Jonathan Baxter

In [8, 9] we introduced GPOMDP, an algorithm for estimating the gradient of the average reward for arbitrary Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized...

Internal State GPOMDP with Trace Filtering (2001)

Douglas Aberdeen, Jonathan Baxter, Peter L. Bartlett

GPOMDP is an algorithm for estimating the gradient of the average reward for arbitrary Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies. It...

Emmerald: A Fast Matrix-Matrix Multiply Using Intel's SSE Instructions (2001)

Douglas Aberdeen, Jonathan Baxter

Generalised matrix-matrix multiplication forms the kernel of many mathematical algorithms, hence a faster matrix-matrix multiply immediately benefits these algorithms. In this paper we implement...

General Matrix-Matrix Multiplication Using SIMD Features of the PIII (2001)

Douglas Aberdeen, Jonathan Baxter

Generalised matrix-matrix multiplication forms the kernel of many mathematical algorithms. A faster matrix-matrix multiply immediately benefits these algorithms. In this paper we implement efficient...