Experiments with Infinite-Horizon, Policy-Gradient Estimation (2001)
Jonathan Baxter (Whizbang Labs), Peter L. Bartlett (Biowulf Technologies), Lex Weaver
In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm...
The Optimal Reward Baseline for Gradient-Based Reinforcement Learning (2001)
There exist a number of reinforcement learning algorithms which learn by climbing the gradient of expected reward. Their long-run convergence has been proved, even in partially observable...
STD(lambda): learning state differences with TD(lambda) (2001)
TD(lambda) with function approximation has proved empirically successful for some complex reinforcement learning problems. For linear approximation, TD(lambda) has been shown to minimise the squared error...
A Multi-Agent, Policy-Gradient approach to Network Routing (2001)
Nigel Tao, Jonathan Baxter, Lex Weaver
Network routing is a distributed decision problem which naturally admits numerical performance measures, such as the average time for a packet to travel from source to destination. OLPOMDP, a...
Reinforcement Learning From State and Temporal Differences (2000)
TD(lambda) with function approximation has proved empirically successful for some complex reinforcement learning problems. For linear approximation, TD(lambda) has been shown to minimise the squared error...
Sorting Integers on the AP1000 (2000)
Sorting is one of the classic problems of computer science. Whilst well understood on sequential machines, the diversity of architectures amongst parallel systems means that algorithms do not perform...
Design and Evaluation of Mechanisms for a Multicomputer Object Store (2000)
Multicomputers have traditionally been viewed as powerful compute engines. It is from this perspective that they have been applied to various problems in order to achieve significant performance...
Direct Gradient-Based Reinforcement Learning: II. Gradient Ascent Algorithms and Experiments (2000)
Jonathan Baxter, Lex Weaver, Peter Bartlett
In [2] we introduced GPOMDP, an algorithm for computing arbitrarily accurate approximations to the performance gradient of parameterized partially observable Markov decision processes (POMDPs). The...
KnightCap: A chess program that learns by combining (2000)
Jonathan Baxter, Andrew Tridgell, Lex Weaver
In this paper we present TDLeaf(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program...
Direct Gradient-Based Reinforcement Learning: (1999)
Jonathan Baxter, Lex Weaver, Peter Bartlett
In [2] we introduced GPOMDP, an algorithm for computing arbitrarily accurate approximations to the performance gradient of parameterized partially observable Markov decision processes (POMDPs).
Direct Gradient-Based Reinforcement Learning: II. Gradient Ascent Algorithms and Experiments (1999)
Jonathan Baxter, Lex Weaver, Peter Bartlett
In [2] we introduced GPOMDP, an algorithm for computing arbitrarily accurate approximations to the performance gradient of parameterized partially observable Markov decision processes (POMDPs). The...
Learning From State Differences: (1999)
TD(lambda) with function approximation has proved empirically successful for some complex reinforcement learning problems. For linear approximation, TD(lambda) has been shown to minimise the squared error...
KnightCap: A chess program that learns by combining TD(lambda) with game-tree search (1999)
Jonathan Baxter, Andrew Tridgell, Lex Weaver
In this paper we present TDLeaf(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess...
TDLeaf(lambda): Combining Temporal Difference Learning with Game-Tree Search (1999)
Jonathan Baxter, Andrew Tridgell, Lex Weaver
In this paper we present TDLeaf(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with minimax search. We present some experiments in both chess and...
Pre-fetching tree-structured data in distributed memory (1998)
A distributed heap storage manager has been implemented on the Fujitsu AP1000 multicomputer. The performance of various pre-fetching strategies is experimentally compared. Subjective programming...
Evolution of Neural Networks to Play the Game of Dots-and-Boxes (1998)
Lex Weaver, Terry Bossomaier
Dots-and-Boxes is a child's game which remains analytically unsolved. We implement and evolve artificial neural networks to play this game, evaluating them against simple heuristic players. Our...
Experiments in Parameter Learning Using Temporal Differences (1998)
Jonathan Baxter, Andrew Tridgell, Lex Weaver
In this paper we discuss the problem of automatically learning evaluation function parameters in a chess program. In particular, we describe some experiments in which our chess program KnightCap...
KnightCap: A chess program that learns by combining TD(lambda) with game-tree search (1998)
Jonathan Baxter, Andrew Tridgell, Lex Weaver
In this paper we present TDLeaf(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program...
KnightCap: A chess program that learns by combining (1998)
Jonathan Baxter, Andrew Tridgell, Lex Weaver
In this paper we present TDLeaf(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with minimax search. We present some experiments in which our chess program,...
TDLeaf(lambda): Combining Temporal Difference learning with game-tree search. (1998)
Jonathan Baxter, Andrew Tridgell, Lex Weaver
In this paper we present TDLeaf(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with minimax search. We present some experiments in both chess and backgammon which...
Sorting Integers on the AP1000 (1998)
Sorting is one of the classic problems of computer science. Whilst well understood on sequential machines, the diversity of architectures amongst parallel systems means that algorithms do not perform...
TDLeaf(lambda): Combining Temporal Difference Learning with Game-Tree Search. (1998)
Jonathan Baxter, Andrew Tridgell, Lex Weaver
In this paper we present TDLeaf(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with minimax search. We present some experiments in both chess and backgammon which...
KnightCap: A chess program that learns by combining TD(lambda) with minimax search (1998)
Jonathan Baxter, Andrew Tridgell, Lex Weaver
In this paper we present TDLeaf(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with minimax search. We present some experiments in which our chess program,...
Evolution of Neural Networks to Play the Game of Dots-and-Boxes (1996)
Dots-and-Boxes is a child's game which remains analytically unsolved. We implement and evolve artificial neural networks to play this game, evaluating them against simple heuristic players. Our...
Pre-Fetching Tree-Structured Data in Distributed Memory (1996)
A distributed heap storage manager has been implemented on the Fujitsu AP1000 multicomputer. The performance of various pre-fetching strategies is experimentally compared. Subjective programming...