Jonathan Baxter

Boosting Algorithms as Gradient Descent (2002)

Llew Mason, Jonathan Baxter, Peter Bartlett, Marcus Frean

Much recent attention, both experimental and theoretical, has been focussed on classification algorithms which produce voted combinations of classifiers. Recent theoretical work has shown that the...

Improved Generalization through Explicit Optimization of Margins (2002)

Peter L. Bartlett, Jonathan Baxter

Recent theoretical results have shown that the generalization performance of thresholded convex combinations of base classifiers is greatly improved if the underlying convex combination has large...

Direct Optimization of Margins Improves Generalization in Combined Classifiers (2002)

Llew Mason, Peter Bartlett, Jonathan Baxter

Sonar: Cumulative training margin distributions for AdaBoost versus our "Direct Optimization Of Margins" (DOOM) algorithm.

Scaling Internal-State Policy-Gradient Methods for POMDPs (2002)

Douglas Aberdeen, Jonathan Baxter

Policy-gradient methods have received increased attention recently as a mechanism for learning to act in partially observable environments.

Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning (2002)

Evan Greensmith, Peter L. Bartlett, Jonathan Baxter

We consider the use of two additive control variate methods to reduce the variance of performance gradient estimates in reinforcement learning problems. The first approach we consider is the baseline...

Experiments with Infinite-Horizon, Policy-Gradient Estimation (2001)

Jonathan Baxter (Whizbang Labs), Peter L. Bartlett (Biowulf Technologies), Lex Weaver

In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm...

Infinite-Horizon Policy-Gradient Estimation (2001)

Jonathan Baxter (Whizbang Labs), Peter L. Bartlett (Biowulf Technologies)

Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems...

Learning POMDP Policies with Internal State using Gradient Ascent (2001)

Douglas Aberdeen, Jonathan Baxter

In [8, 9] we introduced GPOMDP, an algorithm for estimating the gradient of the average reward for arbitrary Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized...

Internal State GPOMDP with Trace Filtering (2001)

Douglas Aberdeen, Jonathan Baxter, Peter L. Bartlett

GPOMDP is an algorithm for estimating the gradient of the average reward for arbitrary Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies. It...

STD(lambda): learning state differences with TD(lambda) (2001)

Lex Weaver, Jonathan Baxter

TD(lambda) with function approximation has proved empirically successful for some complex reinforcement learning problems. For linear approximation, TD(lambda) has been shown to minimise the squared error...

A Multi-Agent, Policy-Gradient approach to Network Routing (2001)

Nigel Tao, Jonathan Baxter, Lex Weaver

Network routing is a distributed decision problem which naturally admits numerical performance measures, such as the average time for a packet to travel from source to destination. OLPOMDP, a...

Direct Optimization of Margins Improves Generalization in Combined Classifiers (2001)

Llew Mason, Peter Bartlett, Jonathan Baxter

Cumulative training margin distributions for AdaBoost versus our "Direct Optimization Of Margins" (DOOM) algorithm. The dark curve is AdaBoost, the light curve is DOOM. DOOM sacrifices...

Emmerald: A Fast Matrix-Matrix Multiply Using Intel's SSE Instructions (2001)

Douglas Aberdeen, Jonathan Baxter

Generalised matrix-matrix multiplication forms the kernel of many mathematical algorithms, hence a faster matrix-matrix multiply immediately benefits these algorithms. In this paper we implement...

General Matrix-Matrix Multiplication Using SIMD Features of the PIII (2001)

Douglas Aberdeen, Jonathan Baxter

Generalised matrix-matrix multiplication forms the kernel of many mathematical algorithms. A faster matrix-matrix multiply immediately benefits these algorithms. In this paper we implement efficient...

Learning To Play Chess Using Temporal-Differences (2001)

Jonathan Baxter, Andrew Tridgell

In this paper we present TDLEAF(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program...

Stochastic Optimization of Controlled Partially Observable Markov Decision Processes (2000)

Peter L. Bartlett, Jonathan Baxter (Whizbang Labs East)

We introduce an on-line algorithm for finding local maxima of the average reward in a Partially Observable Markov Decision Process (POMDP) controlled by a parameterized policy. Optimization is over...

Reinforcement Learning From State and Temporal Differences (2000)

Lex Weaver, Jonathan Baxter

TD(lambda) with function approximation has proved empirically successful for some complex reinforcement learning problems. For linear approximation, TD(lambda) has been shown to minimise the squared error...

92c/MFlops/s, Ultra-Large-Scale Neural-Network Training on a PIII Cluster (2000)

Jonathan Baxter, Robert Edwards

Artificial neural networks with millions of adjustable parameters and a similar number of training examples are a potential solution for difficult, large-scale pattern recognition problems in areas...

98c/MFlop, Ultra-Large-Scale Neural-Network Training on a PIII Cluster (2000)

Jonathan Baxter, Robert Edwards

Artificial neural networks with millions of adjustable parameters and a similar number of training examples are a potential solution for difficult, large-scale pattern recognition problems in areas...

Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning (2000)

Peter L. Bartlett, Jonathan Baxter

We model reinforcement learning as the problem of learning to control a Partially Observable Markov Decision Process (POMDP), and focus on gradient ascent approaches to this problem. In [3] we...

Reinforcement Learning in POMDP's via Direct Gradient Ascent (2000)

Jonathan Baxter, Peter L. Bartlett

This paper discusses theoretical and experimental aspects of gradient-based approaches to the direct optimization of policy performance in controlled POMDPs. We introduce GPOMDP, a REINFORCE-like...

Direct Gradient-Based Reinforcement Learning: II. Gradient Ascent Algorithms and Experiments (2000)

Jonathan Baxter, Lex Weaver, Peter Bartlett

In [2] we introduced GPOMDP, an algorithm for computing arbitrarily accurate approximations to the performance gradient of parameterized partially observable Markov decision processes (POMDPs). The...

The Canonical Metric for Vector Quantization (2000)

Jonathan Baxter

To measure the quality of a set of vector quantization points a means of measuring the distance between two points is required. Common metrics such as the Hamming and Euclidean metrics, while...

A Model of Inductive Bias Learning (2000)

Jonathan Baxter

A major problem in machine learning is that of inductive bias: how to choose a learner's hypothesis space so that it is large enough to contain a solution to the problem being learnt, yet small...

Boosting Algorithms as Gradient Descent (2000)

Llew Mason, Jonathan Baxter, Peter Bartlett, Marcus Frean

We provide an abstract characterization of boosting algorithms as gradient descent on cost-functionals in an inner-product function space. We prove convergence of these functional-gradient-descent...

KnightCap: A chess program that learns by combining TD(lambda) with game-tree search (2000)

Jonathan Baxter, Andrew Tridgell, Lex Weaver

In this paper we present TDLeaf(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program...

Direct Gradient-Based Reinforcement Learning: I. Gradient Estimation Algorithms (1999)

Jonathan Baxter, Peter L. Bartlett

Despite their many empirical successes, approximate value-function based approaches to reinforcement learning suffer from a paucity of theoretical guarantees on the performance of the policy...

Direct Gradient-Based Reinforcement Learning: II. Gradient Ascent Algorithms and Experiments (1999)

Jonathan Baxter, Lex Weaver, Peter Bartlett

In [2] we introduced GPOMDP, an algorithm for computing arbitrarily accurate approximations to the performance gradient of parameterized partially observable Markov decision processes (POMDPs). The...

Learning From State Differences: (1999)

Lex Weaver, Jonathan Baxter

TD(lambda) with function approximation has proved empirically successful for some complex reinforcement learning problems. For linear approximation, TD(lambda) has been shown to minimise the squared error...

Boosting Algorithms as Gradient Descent (1999)

Llew Mason, Jonathan Baxter, Peter Bartlett, Marcus Frean

Much recent attention, both experimental and theoretical, has been focussed on classification algorithms which produce voted combinations of classifiers. Recent theoretical work has shown that the...

Boosting Algorithms as Gradient Descent in Function Space (1999)

Llew Mason, Jonathan Baxter, Peter Bartlett, Marcus Frean

Much recent attention, both experimental and theoretical, has been focussed on classification algorithms which produce voted combinations of classifiers. Recent theoretical work has shown that the...

Improved Generalization through Explicit Optimization of Margins (1999)

Peter Bartlett, Jonathan Baxter

Recent theoretical results have shown that the generalization performance of thresholded convex combinations of base classifiers is greatly improved if the underlying convex combination has large...

Direct Optimization of Margins Improves Generalization in Combined Classifiers (1999)

Llew Mason, Peter Bartlett, Jonathan Baxter

Cumulative training margin distributions for AdaBoost versus our "Direct Optimization Of Margins" (DOOM) algorithm. The dark curve is AdaBoost, the light curve is DOOM. DOOM sacrifices...

KnightCap: A chess program that learns by combining TD(lambda) with game-tree search (1999)

Jonathan Baxter, Andrew Tridgell, Lex Weaver

In this paper we present TDLeaf(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess...

TDLeaf(lambda): Combining Temporal Difference Learning with Game-Tree Search (1999)

Jonathan Baxter, Andrew Tridgell, Lex Weaver

In this paper we present TDLeaf(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with minimax search. We present some experiments in both chess and...

Experiments in Parameter Learning Using Temporal Differences (1998)

Jonathan Baxter, Andrew Tridgell, Lex Weaver

In this paper we discuss the problem of automatically learning evaluation function parameters in a chess program. In particular, we describe some experiments in which our chess program KnightCap...

KnightCap: A chess program that learns by combining TD(lambda) with game-tree search (1998)

Jonathan Baxter, Andrew Tridgell, Lex Weaver

In this paper we present TDLeaf(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program...

Direct Optimization of Margins Improves Generalization in Combined Classifiers (1998)

Llew Mason, Peter Bartlett, Jonathan Baxter

Cumulative training margin distributions for AdaBoost versus our "Direct Optimization Of Margins" (DOOM) algorithm. The dark curve is AdaBoost, the light curve is DOOM. DOOM sacrifices...

TDLeaf(lambda): Combining Temporal Difference learning with game-tree search. (1998)

Jonathan Baxter, Andrew Tridgell, Lex Weaver

In this paper we present TDLeaf(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with minimax search. We present some experiments in both chess and backgammon which...

The Evolution of Learning Algorithms for Artificial Neural Networks (1998)

Jonathan Baxter

In this paper we investigate a neural network model in which weights between computational nodes are modified according to a local learning rule. To determine whether local learning rules are...

Theoretical Models of Learning to Learn (1998)

Jonathan Baxter

A machine can only learn if it is biased in some way. Typically the bias is supplied by hand, for example through the choice of an appropriate set of features. However, if the learning machine is...

The Canonical Distortion Measure for Vector Quantization and Function Approximation (1998)

Jonathan Baxter

To measure the quality of a set of vector quantization points a means of measuring the distance between a random point and its quantization is required. Common metrics such as the Hamming and...

Learning Internal Representations (1998)

Jonathan Baxter

Probably the most important problem in machine learning is the preliminary biasing of a learner's hypothesis space so that it is small enough to ensure good generalisation from reasonable training...

The Canonical Distortion Measure in Feature Space and 1-NN Classification (1998)

Jonathan Baxter

We prove that the Canonical Distortion Measure (CDM) [2, 3] is the optimal distance measure to use for 1 nearest-neighbour (1-NN) classification, and show that it reduces to squared Euclidean...

TDLeaf(lambda): Combining Temporal Difference Learning with Game-Tree Search. (1998)

Jonathan Baxter, Andrew Tridgell, Lex Weaver

In this paper we present TDLeaf(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with minimax search. We present some experiments in both chess and backgammon which...

KnightCap: A chess program that learns by combining TD(lambda) with minimax search (1998)

Jonathan Baxter, Andrew Tridgell, Lex Weaver

In this paper we present TDLeaf(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with minimax search. We present some experiments in which our chess program,...

A Result Relating Convex N-Widths to Covering Numbers With Some Applications to Neural Networks (1997)

Jonathan Baxter, Peter Bartlett

In general, approximating classes of functions defined over high-dimensional input spaces by linear combinations of a fixed set of basis functions or "features" is known to be hard. Typically, the...

Learning Model Bias (1996)

Jonathan Baxter

In this paper the problem of learning appropriate domain-specific bias is addressed. It is shown that this can be achieved by learning many related tasks from the same domain, and a theorem is given...

The Canonical Metric For Vector Quantization (1995)

Jonathan Baxter

To measure the quality of a set of vector quantization points a means of measuring the distance between two points is required. Common metrics such as the Hamming and Euclidean metrics, while...

Histone hypomethylation is an indicator of epigenetic plasticity in quiescent lymphocytes

Jonathan Baxter, Stephan Sauer, Antoine Peters, Rosalind John, Ruth Williams, Marie-Laure Caparros, ...

Post-translational modifications of histone amino termini are thought to convey epigenetic information that extends the coding potential of DNA. In particular, histone lysine methylation has been...
