Peter L. Bartlett

Margin adaptive model selection in statistical learning (2008)

Arlot, Sylvain, Bartlett, Peter L.

A classical condition for fast learning rates is the margin condition, first introduced by Mammen and Tsybakov. We tackle in this paper the problem of adaptivity to this condition in the context of...

Discussion of ``2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization'' by V. Koltchinskii (2007)

Bartlett, Peter L., Mendelson, Shahar

Discussion of ``2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization'' by V. Koltchinskii [arXiv:0708.0083]

Comment on "Support Vector Machines with Applications" (2006)

Bartlett, Peter L., Jordan, Michael I., McAuliffe, Jon D.

Comment on "Support Vector Machines with Applications" [math.ST/0612817]

Local Rademacher complexities (2005)

Bartlett, Peter L., Bousquet, Olivier, Mendelson, Shahar

We propose new bounds on the error of learning algorithms in terms of a data-dependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical...

Local Rademacher complexities (2005)

Bartlett, Peter L., Bousquet, Olivier, Mendelson, Shahar

We propose new bounds on the error of learning algorithms in terms of a data-dependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical...

Large Margin Methods for Structured Classification: Exponentiated gradient algorithms and PAC-Bayesian generalization bounds (2004)

Peter L. Bartlett, Michael Collins, David Mcallester, Ben Taskar

We consider the problem of structured classification, where the task is to predict a label y from an input x, and y has meaningful internal structure.

Discussions of boosting papers, and rejoinders (2004)

Bartlett, Peter L., Bickel, Peter J., Bühlmann, Peter, Freund, Yoav, Friedman, Jerome, Hastie, Trevor, ...

Discussions of: "Process consistency for AdaBoost" [Ann. Statist. 32 (2004), no. 1, 13-29] by W. Jiang; "On the Bayes-risk consistency of regularized boosting methods" [ibid., 30-55] by G. Lugosi and...

Local complexities for empirical risk minimization (2004)

Bartlett, Peter L, Mendelson, Shahar, Philips, Petra

We present sharp bounds on the risk of the empirical minimization algorithm under mild assumptions on the class. We introduce the notion of isomorphic coordinate projections and show that this leads...

Convexity, Classification, and Risk Bounds (2003)

Risk Bounds, Peter L. Bartlett, Michael I. Jordan, Jon D. Mcaulie

Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex...

LETTER Communicated by John Platt New Support Vector Algorithms (2003)

Bernhard Sch Olkopf, Alex J. Smola, Robert C. Williamson, Peter L. Bartlett

this article with the regression case. To explain this, we will introduce a suitable definition of a margin that is maximized in both cases

Convexity, Classification, and Risk Bounds (2003)

Risk Bounds, Peter L. Bartlett, Michael I. Jordan, Jon D. Mcaulie

Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex...

Local Rademacher Complexities (2002)

Peter L. Bartlett, Olivier Bousquet, Shahar Mendelson

We propose new bounds on the error of learning algorithms in terms of a data-dependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical...

Journal of Machine Learning Research 3 (2002) 463-482 Submitted 11/01; Published 11/02 Rademacher and Gaussian Complexities: (2002)

Peter L. Bartlett, Shahar Mendelson, M. Long

We investigate the use of certain data-dependent estimates of the complexity of a function class, called Rademacher and Gaussian complexities. In a decision theoretic setting, we prove general risk...

Rademacher and Gaussian Complexities: Risk Bounds and Structural Results (2002)

Peter L. Bartlett, Shahar Mendelson, M. Long

We investigate the use of certain data-dependent estimates of the complexity of a function class, called Rademacher and gaussian complexities. In a decision theoretic setting, we prove general risk...

Generalization Error of Combined Classifiers (2002)

Llew Mason, Peter L. Bartlett, Mostefa Golea

this paper we present an upper bound on the generalization error of any thresholded convex combination of functions which are themselves thresholded convex combinations of functions in terms of the...

Improved Generalization through Explicit Optimization of Margins (2002)

Peter L. Bartlett, Jonathan Baxter

Recent theoretical results have shown that the generalization performance of thresholded convex combinations of base classifiers is greatly improved if the underlying convex combination has large...

Error Bounds for Voting Classifiers Using Margin Cost Functions (2002)

Llew Mason, Peter L. Bartlett

Recent theoretical results haveshown that the accuracy of thresholded real-valued functions (suchasvoting classifiers) is greatly improved if the underlying function has large margins on the training...

Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning (2002)

Evan Greensmith, Peter L. Bartlett, Jonathan Baxter

We consider the use of two additive control variate methods to reduce the variance of performance gradient estimates in reinforcement learning problems. The first approach we consider is the baseline...

Experiments with Infinite-Horizon, Policy-Gradient Estimation (2001)

Jonathan Baxter, Whizbang Labs, Peter L. Bartlett, Biowulf Technologies, Lex Weaver

In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm...

Infinite-Horizon Policy-Gradient Estimation (2001)

Jonathan Baxter, Whizbang Labs, Peter L. Bartlett, Biowulf Technologies

Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems...

Internal State GPOMDP with Trace Filtering (2001)

Douglas Aberdeen, Jonathan Baxter, Peter L. Bartlett

GPOMDP is an algorithm for estimating the gradient of the average reward for arbitrary Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies. It...

Hardness Results for Neural Network Approximation Problems (2001)

Peter L. Bartlett

Introduction Previous negative results for learning two-layer neural network classifiers show that it is difficult to find a network that correctly classifies all examples in a training set. However,...

Rademacher and Gaussian Complexities: Risk Bounds and Structural Results (2001)

Peter L. Bartlett

We investigate the use of certain data-dependent estimates of the complexity of a function class, called Rademacher and gaussian complexities. In a decision theoretic setting, we prove general risk...

Stochastic Optimization of Controlled Partially Observable Markov Decision Processes (2000)

Peter L. Bartlett, Jonathan Baxter, Whizbang Labs East

We introduce an on-line algorithm for finding local maxima of the average reward in a Partially Observable Markov Decision Process (POMPD) controlled by a parameterized policy. Optimization is over...

Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning (2000)

Peter L. Bartlett, Jonathan Baxter

We model reinforcement learning as the problem of learning to control a Partially Observable Markov Decision Process (POMDP), and focus on gradient ascent approaches to this problem. In [3] we...

Model Selection and Error Estimation (2000)

Peter L. Bartlett, Stephane Boucheron, Gabor Lugosi

We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalization: any good error...

Reinforcement Learning in POMDP's via Direct Gradient Ascent (2000)

Jonathan Baxter, Peter L. Bartlett

This paper discusses theoretical and experimental aspects of gradient-based approaches to the direct optimization of policy performance in controlled POMDPs. We introduce GPOMDP, a REINFORCE-like...

Model Selection and Error Estimation (2000)

Peter L. Bartlett, St Ephane Boucheron, G Abor Lugosi

We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalization: any good error...

Function Learning from Interpolation (2000)

Peter L. Bartlett, Martin Anthony

In this paper, we study a statistical property of classes of real-valued functions that we call approximation from interpolated examples. We derive a characterisation of function classes that have...

!()+, -./01 23456 (2000)

John Shawe-taylor, Peter L. Bartlett, Robert C. Williamson, Martin Anthony

The paper introduces some generalizations of Vapnik's method of structural risk minimisation (SRM). As well as making explicit some of the details on SRM, it provides a result that allows one to...

Direct Gradient-Based Reinforcement Learning: I. Gradient Estimation Algorithms (1999)

Jonathan Baxter, Peter L. Bartlett

Despite their many empirical successes, approximate value-function based approaches to reinforcement learning suffer from a paucity of theoretical guarantees on the performance of the policy...

Direct Gradient-Based Reinforcement Learning: I. Gradient Estimation Algorithms (1999)

Jonathan Baxter, Peter L. Bartlett

Despite their many empirical successes, approximate value-function based approaches to reinforcement learning suffer from a paucity of theoretical guarantees on the performance of the policy...

Direct Gradient-Based Reinforcement Learning: I. Gradient Estimation Algorithms (1999)

Jonathan Baxter, Peter L. Bartlett

Despite their many empirical successes, approximate value-function based approaches to reinforcement learning suffer from a paucity of theoretical guarantees on the performance of the policy...

Generalization Error of Combined Classifiers (1999)

Llew Mason, Peter L. Bartlett, Mostefa Golea

this paper we present an upper bound on the generalization error of any thresholded convex combination of functions which are themselves thresholded convex combinations of functions in terms of the...

The VC-Dimension and Pseudodimension of Two-Layer Neural Networks with Discrete Inputs (1999)

Peter L. Bartlett, Robert C. Williamson

We give upper bounds on the Vapnik-Chervonenkis dimension and pseudodimension of two-layer neural networks that use the standard sigmoid function or radial basis function and have inputs from...

Structural Risk Minimization over Data-Dependent Hierarchies (1999)

John Shawe-taylor, Peter L. Bartlett, Robert C. Williamson, Martin Anthony

The paper introduces some generalizations of Vapnik's method of structural risk minimisation (SRM). As well as making explicit some of the details on SRM, it provides a result that allows one to...

The Importance of Convexity in Learning with Squared Loss (1999)

Peter L. Bartlett, Robert C. Williamson

We show that if the closure of a function class F under the metric induced by some probability distribution is not convex, then the sample complexity for agnostically learning F with squared loss...

Almost Linear VC Dimension Bounds for Piecewise Polynomial Networks (1999)

Peter L. Bartlett, Vitaly Maiorov, Ron Meir

We compute upper and lower bounds on the VC dimension of feedforward networks of units with piecewise polynomial activation functions. We show that if the number of layers is fixed, then the VC...

Learning with a Slowly Changing Distribution (1999)

Peter L. Bartlett

In this paper, we consider the problem of learning a subset of a domain from randomly chosen examples when the probability distribution of the examples changes slowly but continually throughout the...

Almost Linear VC Dimension Bounds for Piecewise Polynomial Networks (1999)

Peter L. Bartlett, Vitaly Maiorov, Ron Meir

We compute upper and lower bounds on the VC dimension of feedforward networks of units with piecewise polynomial activation functions. We show that if the number of layers is fixed, then the VC...

Generalization in decision trees and DNF: Does size matter? (1999)

Mostefa Golea, Peter L. Bartlett, Wee Sun Lee

Recent theoretical results for pattern classification with thresholded real-valued functions (such as support vector machines, sigmoid networks, and boosting) give bounds on misclassification...

Structural Risk Minimization over Data-Dependent Hierarchies (1999)

John Shawe-taylor, Peter L. Bartlett, Robert C. Williamson, Martin Anthony

The paper introduces some generalizations of Vapnik's method of structural risk minimisation (SRM). As well as making explicit some of the details on SRM, it provides a result that allows one to...

The Importance of Convexity in Learning with Squared Loss (1999)

Peter L. Bartlett, Robert C. Williamson

We show that if the closure of a function class F under the metric induced by some probability distribution is not convex, then the sample complexity for agnostically learning F with squared loss...

Correction to "Lower Bounds on the VC-Dimension of Smoothly Parametrized Function Classes" (1999)

Wee Sun Lee, Peter L. Bartlett, Robert C. Williamson

The paper [3] gives lower bounds on the VC-dimension of various smoothly parametrized function classes. The results were proven by showing a relationship between the uniqueness of decision boundaries...

Generalization in decision trees and DNF: Does size matter? (1999)

Mostefa Golea, Peter L. Bartlett, Wee Sun Lee

Recent theoretical results for pattern classification with thresholded real-valued functions (such as support vector machines, sigmoid networks, and boosting) give bounds on misclassification...

Lower Bounds on the VC-Dimension of Smoothly Parametrized Function Classes (1999)

Wee Sun Lee, Peter L. Bartlett, Robert C. Williamson

We examine the relationship between the VC-dimension and the number of parameters of a thresholded smoothly parametrized function class. We show that the VC-dimension of such a function class is at...

Efficient Agnostic Learning of Neural Networks with Bounded Fan-in (1999)

Wee Sun Lee, Peter L. Bartlett, Robert C. Williamson

We show that the class of two layer neural networks with bounded fan-in is efficiently learnable in a realistic extension to the Probably Approximately Correct (PAC) learning model. In this model, a...

Covering Numbers for Support Vector Machines (1999)

Guo, Ying, Bartlett, Peter L., Shawe-Taylor, John, Williamson, Robert C.

Support vector (SV) machines are linear classifiers that use the maximum margin hyperplane in a feature space defined by a kernel function. Until recently, the only bounds on the generalization...

Covering Numbers for Support Vector Machines (1999)

Guo, Ying, Bartlett, Peter L., Shawe-Taylor, John, Williamson, Robert C.

Support vector (SV) machines are linear classifiers that use the maximum margin hyperplane in a feature space defined by a kernel function. Until recently, the only bounds on the generalization...

Covering Numbers for Support Vector Machines (1999)

Guo, Ying, Bartlett, Peter L., Shawe-Taylor, John, Williamson, Robert C.

Support vector (SV) machines are linear classifiers that use the maximum margin hyperplane in a feature space defined by a kernel function. Until recently, the only bounds on the generalization...

Covering Numbers for Support Vector Machines (1999)

Guo, Ying, Bartlett, Peter L., Shawe-Taylor, John, Williamson, Robert C.

Support vector (SV) machines are linear classifiers that use the maximum margin hyperplane in a feature space defined by a kernel function. Until recently, the only bounds on the generalization...

Learning Changing Concepts by Exploiting the Structure of Change (1998)

Peter L. Bartlett, Shai Ben-david, Sanjeev R. Kulkarni

This paper examines learning problems in which the target function is allowed to change. The learner sees a sequence of random examples, labelled according to a sequence of functions, and must...

Learning Changing Concepts by Exploiting the Structure of Change (1998)

Peter L. Bartlett, Shai Ben-david, Sanjeev R. Kulkarni

This paper examines learning problems in which the target function is allowed to change. The learner sees a sequence of random examples, labelled according to a sequence of functions, and must...

Structural Risk Minimization over Data-Dependent Hierarchies (1998)

John Shawe-taylor, Peter L. Bartlett, Robert C. Williamson, Martin Anthony

The paper introduces some generalizations of Vapnik's method of structural risk minimisation (SRM). As well as making explicit some of the details on SRM, it provides a result that allows one to...

Structural Risk Minimization over Data-Dependent Hierarchies (1998)

John Shawe-taylor, Peter L. Bartlett, Robert C. Williamson, Martin Anthony

The paper introduces some generalizations of Vapnik's method of structural risk minimisation (SRM). As well as making explicit some of the details on SRM, it provides a result that allows one to...

Learning Changing Concepts by Exploiting the Structure of Change (1998)

Peter L. Bartlett, Shai Ben-david, Sanjeev R. Kulkarni

This paper examines learning problems in which the target function is allowed to change. The learner sees a sequence of random examples, labelled according to a sequence of functions, and must...

Almost Linear VC Dimension Bounds for Piecewise Polynomial Networks (1998)

Peter L. Bartlett, Vitaly Maiorov, Ron Meir

We compute upper and lower bounds on the VC dimension and pseudo dimension of feedforward neural networks composed of piecewise polynomial activation functions. We show that if the number of layers...

Almost Linear VC Dimension Bounds for Piecewise Polynomial Networks (1998)

Peter L. Bartlett, Vitaly Maiorov, Ron Meir

We compute upper and lower bounds on the VC dimension and pseudo dimension of feedforward neural networks composed of piecewise polynomial activation functions. We show that if the number of layers...

A Framework for Structural Risk Minimisation (1997)

John Shawe-taylor, Royal Holloway, Peter L. Bartlett, Robert C. Williamson, Martin Anthony

The paper introduces a framework for studying structural risk minimisation. The model views structural risk minimisation in a PAC context. It then considers the more general case when the hierarchy...

The Sample Complexity of Pattern Classification With Neural Networks: The Size of the Weights is More Important Than the Size of the Network (1997)

Peter L. Bartlett

Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number...

Member, IEEE (1997)

Peter L. Bartlett

Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number...

Member, IEEE (1997)

Peter L. Bartlett

Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number...

The Complexity of Model Classes, and Smoothing Noisy Data (1997)

Peter L. Bartlett, Sanjeev R. Kulkarni

We consider the problem of smoothing a sequence of noisy observations using a fixed class of models. Via a deterministic analysis, we obtain necessary and sufficient conditions on the noise sequence...

The Complexity of Model Classes, and Smoothing Noisy Data (1997)

Peter L. Bartlett, Sanjeev R. Kulkarni

We consider the problem of smoothing a sequence of noisy observations using a fixed class of models. Via a deterministic analysis, we obtain necessary and sufficient conditions on the noise sequence...

For Valid Generalization, the Size of the Weights is More Important Than the Size of the Network (1997)

Peter L. Bartlett

This paper shows that if a large neural network is used for a pattern classification problem, and the learning algorithm finds a network with small weights that has small squared error on the...

For Valid Generalization, the Size of the Weights is More Important Than the Size of the Network (1997)

Peter L. Bartlett

This paper shows that if a large neural network is used for a pattern classification problem, and the learning algorithm finds a network with small weights that has small squared error on the...

Error and Variance Bounds in Multilayer Neural Networks (1997)

David R. Lovell, Peter L. Bartlett

Recently, upper bounds have been derived on the expectation and variance of errors at the output of a multilayer feedforward neural network with input and weight errors. We investigate the validity...

Learning Changing Concepts by Exploiting the Structure of Change (1996)

Peter L. Bartlett, Shai Ben-david, Sanjeev R. Kulkarni

This paper examines learning problems in which the target function is allowed to change. The learner sees a sequence of random examples, labelled according to a sequence of functions, and must...

The Complexity of Model Classes and Smoothing Noisy Data (1996)

Peter L. Bartlett, Sanjeev R. Kulkarni

We consider the problem of smoothing a sequence of noisy observations using a fixed class of models. Via a deterministic analysis, we obtain necessary and sufficient conditions on the noise sequence...

A Framework for Structural Risk Minimisation (1996)

John Shawe-taylor, Royal Holloway, Peter L. Bartlett, Robert C. Williamson, Martin Anthony

The paper introduces a framework for studying structural risk minimisation. The model views structural risk minimisation in a PAC context. It then considers the more general case when the hierarchy...

The Complexity of Model Classes and Smoothing Noisy Data (1996)

Peter L. Bartlett, Sanjeev R. Kulkarni

We consider the problem of smoothing a sequence of noisy observations using a fixed class of models. Via a deterministic analysis, we obtain necessary and sufficient conditions on the noise sequence...

Learning Changing Concepts by Exploiting the Structure of Change (1996)

Peter L. Bartlett, Shai Ben-david, Sanjeev R. Kulkarni

This paper examines learning problems in which the target function is allowed to change. The learner sees a sequence of random examples, labelled according to a sequence of functions, and must...

A Framework for Structural Risk Minimisation (1996)

John Shawe-taylor, Royal Holloway, Peter L. Bartlett, Robert C. Williamson, Martin Anthony

The paper introduces a framework for studying structural risk minimisation. The model views structural risk minimisation in a PAC context. It then considers the more general case when the hierarchy...

Efficient Agnostic Learning of Neural Networks with Bounded Fan-in (1996)

Wee Sun Lee, Peter L. Bartlett, Robert C. Williamson

We show that the class of two layer neural networks with bounded fan-in is efficiently learnable in a realistic extension to the Probably Approximately Correct (PAC) learning model. In this model, a...

The VC-Dimension and Pseudodimension of Two-Layer Neural Networks with Discrete Inputs (1996)

Peter L. Bartlett, Robert C. Williamson

We give upper bounds on the Vapnik-Chervonenkis dimension and pseudodimension of two-layer neural networks that use the standard sigmoid function or radial basis function and have inputs from...

The VC-Dimension and Pseudodimension of Two-Layer Neural Networks with Discrete Inputs (1996)

Peter L. Bartlett, Robert C. Williamson

We give upper bounds on the Vapnik-Chervonenkis dimension and pseudodimension of two-layer neural networks that use the standard sigmoid function or radial basis function and have inputs from...

Correction to "Lower Bounds on the VC-Dimension of Smoothly Parametrized Function Classes" (1996)

Wee Sun Lee, Peter L. Bartlett, Robert C. Williamson

The paper [3] gives lower bounds on the VC-dimension of various smoothly parametrized function classes. The results were proven by showing a relationship between the uniqueness of decision boundaries...

The Importance of Convexity in Learning with Squared Loss (1996)

Peter L. Bartlett, Robert C. Williamson

We show that if the closure of a function class F under the metric induced by some probability distribution is not convex, then the sample complexity for agnostically learning F with squared loss...

The Importance of Convexity in Learning with Squared Loss (1996)

Wee Sun Lee, Peter L. Bartlett, Robert C. Williamson

We show that if the closure of a function class F under the metric induced by some probability distribution is not convex, then the sample complexity for agnostically learning F with squared loss...

Exploiting Random Walks for Learning (1995)

Peter L. Bartlett, Klaus-uwe Hoffgen, Paul Fischer, Lehrstuhl Informatik

In this paper we consider an approach to passive learning. In contrast to the classical PAC model we do not assume that the examples are independently drawn according to an underlying distribution,...

Fat-Shattering and the Learnability of Real-Valued Functions (1995)

Philip M. Long, Robert C. Williamson, Peter L. Bartlett

We consider the problem of learning real-valued functions from random examples when the function values are corrupted with noise. With mild conditions on independent o