Margin adaptive model selection in statistical learning (2008)
Arlot, Sylvain, Bartlett, Peter L.
A classical condition for fast learning rates is the margin condition, first introduced by Mammen and Tsybakov. We tackle in this paper the problem of adaptivity to this condition in the context of...
Bartlett, Peter L., Mendelson, Shahar
Discussion of ``2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization'' by V. Koltchinskii [arXiv:0708.0083]
Comment on "Support Vector Machines with Applications" (2006)
Bartlett, Peter L., Jordan, Michael I., McAuliffe, Jon D.
Comment on "Support Vector Machines with Applications" [math.ST/0612817]
Local Rademacher complexities (2005)
Bartlett, Peter L., Bousquet, Olivier, Mendelson, Shahar
We propose new bounds on the error of learning algorithms in terms of a data-dependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical...
Local Rademacher complexities (2005)
Bartlett, Peter L., Bousquet, Olivier, Mendelson, Shahar
We propose new bounds on the error of learning algorithms in terms of a data-dependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical...
Peter L. Bartlett, Michael Collins, David Mcallester, Ben Taskar
We consider the problem of structured classification, where the task is to predict a label y from an input x, and y has meaningful internal structure.
Discussions of boosting papers, and rejoinders (2004)
Bartlett, Peter L., Bickel, Peter J., Bühlmann, Peter, Freund, Yoav, Friedman, Jerome, Hastie, Trevor, ...
Discussions of: "Process consistency for AdaBoost" [Ann. Statist. 32 (2004), no. 1, 13-29] by W. Jiang; "On the Bayes-risk consistency of regularized boosting methods" [ibid., 30-55] by G. Lugosi and...
Local complexities for empirical risk minimization (2004)
Bartlett, Peter L, Mendelson, Shahar, Philips, Petra
We present sharp bounds on the risk of the empirical minimization algorithm under mild assumptions on the class. We introduce the notion of isomorphic coordinate projections and show that this leads...
Convexity, Classification, and Risk Bounds (2003)
Risk Bounds, Peter L. Bartlett, Michael I. Jordan, Jon D. Mcaulie
Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex...
LETTER Communicated by John Platt New Support Vector Algorithms (2003)
Bernhard Sch Olkopf, Alex J. Smola, Robert C. Williamson, Peter L. Bartlett
this article with the regression case. To explain this, we will introduce a suitable definition of a margin that is maximized in both cases
Convexity, Classification, and Risk Bounds (2003)
Risk Bounds, Peter L. Bartlett, Michael I. Jordan, Jon D. Mcaulie
Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex...
Local Rademacher Complexities (2002)
Peter L. Bartlett, Olivier Bousquet, Shahar Mendelson
We propose new bounds on the error of learning algorithms in terms of a data-dependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical...
Peter L. Bartlett, Shahar Mendelson, M. Long
We investigate the use of certain data-dependent estimates of the complexity of a function class, called Rademacher and Gaussian complexities. In a decision theoretic setting, we prove general risk...
Rademacher and Gaussian Complexities: Risk Bounds and Structural Results (2002)
Peter L. Bartlett, Shahar Mendelson, M. Long
We investigate the use of certain data-dependent estimates of the complexity of a function class, called Rademacher and gaussian complexities. In a decision theoretic setting, we prove general risk...
Generalization Error of Combined Classifiers (2002)
Llew Mason, Peter L. Bartlett, Mostefa Golea
this paper we present an upper bound on the generalization error of any thresholded convex combination of functions which are themselves thresholded convex combinations of functions in terms of the...
Improved Generalization through Explicit Optimization of Margins (2002)
Peter L. Bartlett, Jonathan Baxter
Recent theoretical results have shown that the generalization performance of thresholded convex combinations of base classifiers is greatly improved if the underlying convex combination has large...
Error Bounds for Voting Classifiers Using Margin Cost Functions (2002)
Recent theoretical results haveshown that the accuracy of thresholded real-valued functions (suchasvoting classifiers) is greatly improved if the underlying function has large margins on the training...
Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning (2002)
Evan Greensmith, Peter L. Bartlett, Jonathan Baxter
We consider the use of two additive control variate methods to reduce the variance of performance gradient estimates in reinforcement learning problems. The first approach we consider is the baseline...
Experiments with Infinite-Horizon, Policy-Gradient Estimation (2001)
Jonathan Baxter, Whizbang Labs, Peter L. Bartlett, Biowulf Technologies, Lex Weaver
In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm...
Infinite-Horizon Policy-Gradient Estimation (2001)
Jonathan Baxter, Whizbang Labs, Peter L. Bartlett, Biowulf Technologies
Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems...
Internal State GPOMDP with Trace Filtering (2001)
Douglas Aberdeen, Jonathan Baxter, Peter L. Bartlett
GPOMDP is an algorithm for estimating the gradient of the average reward for arbitrary Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies. It...
Hardness Results for Neural Network Approximation Problems (2001)
Introduction Previous negative results for learning two-layer neural network classifiers show that it is difficult to find a network that correctly classifies all examples in a training set. However,...
Rademacher and Gaussian Complexities: Risk Bounds and Structural Results (2001)
We investigate the use of certain data-dependent estimates of the complexity of a function class, called Rademacher and gaussian complexities. In a decision theoretic setting, we prove general risk...
Stochastic Optimization of Controlled Partially Observable Markov Decision Processes (2000)
Peter L. Bartlett, Jonathan Baxter, Whizbang Labs East
We introduce an on-line algorithm for finding local maxima of the average reward in a Partially Observable Markov Decision Process (POMPD) controlled by a parameterized policy. Optimization is over...
Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning (2000)
Peter L. Bartlett, Jonathan Baxter
We model reinforcement learning as the problem of learning to control a Partially Observable Markov Decision Process (POMDP), and focus on gradient ascent approaches to this problem. In [3] we...
Model Selection and Error Estimation (2000)
Peter L. Bartlett, Stephane Boucheron, Gabor Lugosi
We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalization: any good error...
Reinforcement Learning in POMDP's via Direct Gradient Ascent (2000)
Jonathan Baxter, Peter L. Bartlett
This paper discusses theoretical and experimental aspects of gradient-based approaches to the direct optimization of policy performance in controlled POMDPs. We introduce GPOMDP, a REINFORCE-like...
Model Selection and Error Estimation (2000)
Peter L. Bartlett, St Ephane Boucheron, G Abor Lugosi
We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalization: any good error...
Function Learning from Interpolation (2000)
Peter L. Bartlett, Martin Anthony
In this paper, we study a statistical property of classes of real-valued functions that we call approximation from interpolated examples. We derive a characterisation of function classes that have...
John Shawe-taylor, Peter L. Bartlett, Robert C. Williamson, Martin Anthony
The paper introduces some generalizations of Vapnik's method of structural risk minimisation (SRM). As well as making explicit some of the details on SRM, it provides a result that allows one to...
Direct Gradient-Based Reinforcement Learning: I. Gradient Estimation Algorithms (1999)
Jonathan Baxter, Peter L. Bartlett
Despite their many empirical successes, approximate value-function based approaches to reinforcement learning suffer from a paucity of theoretical guarantees on the performance of the policy...
Direct Gradient-Based Reinforcement Learning: I. Gradient Estimation Algorithms (1999)
Jonathan Baxter, Peter L. Bartlett
Despite their many empirical successes, approximate value-function based approaches to reinforcement learning suffer from a paucity of theoretical guarantees on the performance of the policy...
Direct Gradient-Based Reinforcement Learning: I. Gradient Estimation Algorithms (1999)
Jonathan Baxter, Peter L. Bartlett
Despite their many empirical successes, approximate value-function based approaches to reinforcement learning suffer from a paucity of theoretical guarantees on the performance of the policy...
Generalization Error of Combined Classifiers (1999)
Llew Mason, Peter L. Bartlett, Mostefa Golea
this paper we present an upper bound on the generalization error of any thresholded convex combination of functions which are themselves thresholded convex combinations of functions in terms of the...
The VC-Dimension and Pseudodimension of Two-Layer Neural Networks with Discrete Inputs (1999)
Peter L. Bartlett, Robert C. Williamson
We give upper bounds on the Vapnik-Chervonenkis dimension and pseudodimension of two-layer neural networks that use the standard sigmoid function or radial basis function and have inputs from...
Structural Risk Minimization over Data-Dependent Hierarchies (1999)
John Shawe-taylor, Peter L. Bartlett, Robert C. Williamson, Martin Anthony
The paper introduces some generalizations of Vapnik's method of structural risk minimisation (SRM). As well as making explicit some of the details on SRM, it provides a result that allows one to...
The Importance of Convexity in Learning with Squared Loss (1999)
Peter L. Bartlett, Robert C. Williamson
We show that if the closure of a function class F under the metric induced by some probability distribution is not convex, then the sample complexity for agnostically learning F with squared loss...
Almost Linear VC Dimension Bounds for Piecewise Polynomial Networks (1999)
Peter L. Bartlett, Vitaly Maiorov, Ron Meir
We compute upper and lower bounds on the VC dimension of feedforward networks of units with piecewise polynomial activation functions. We show that if the number of layers is fixed, then the VC...
Learning with a Slowly Changing Distribution (1999)
In this paper, we consider the problem of learning a subset of a domain from randomly chosen examples when the probability distribution of the examples changes slowly but continually throughout the...
Almost Linear VC Dimension Bounds for Piecewise Polynomial Networks (1999)
Peter L. Bartlett, Vitaly Maiorov, Ron Meir
We compute upper and lower bounds on the VC dimension of feedforward networks of units with piecewise polynomial activation functions. We show that if the number of layers is fixed, then the VC...
Generalization in decision trees and DNF: Does size matter? (1999)
Mostefa Golea, Peter L. Bartlett, Wee Sun Lee
Recent theoretical results for pattern classification with thresholded real-valued functions (such as support vector machines, sigmoid networks, and boosting) give bounds on misclassification...
Structural Risk Minimization over Data-Dependent Hierarchies (1999)
John Shawe-taylor, Peter L. Bartlett, Robert C. Williamson, Martin Anthony
The paper introduces some generalizations of Vapnik's method of structural risk minimisation (SRM). As well as making explicit some of the details on SRM, it provides a result that allows one to...
The Importance of Convexity in Learning with Squared Loss (1999)
Peter L. Bartlett, Robert C. Williamson
We show that if the closure of a function class F under the metric induced by some probability distribution is not convex, then the sample complexity for agnostically learning F with squared loss...
Correction to "Lower Bounds on the VC-Dimension of Smoothly Parametrized Function Classes" (1999)
Wee Sun Lee, Peter L. Bartlett, Robert C. Williamson
The paper [3] gives lower bounds on the VC-dimension of various smoothly parametrized function classes. The results were proven by showing a relationship between the uniqueness of decision boundaries...
Generalization in decision trees and DNF: Does size matter? (1999)
Mostefa Golea, Peter L. Bartlett, Wee Sun Lee
Recent theoretical results for pattern classification with thresholded real-valued functions (such as support vector machines, sigmoid networks, and boosting) give bounds on misclassification...
Lower Bounds on the VC-Dimension of Smoothly Parametrized Function Classes (1999)
Wee Sun Lee, Peter L. Bartlett, Robert C. Williamson
We examine the relationship between the VC-dimension and the number of parameters of a thresholded smoothly parametrized function class. We show that the VC-dimension of such a function class is at...
Efficient Agnostic Learning of Neural Networks with Bounded Fan-in (1999)
Wee Sun Lee, Peter L. Bartlett, Robert C. Williamson
We show that the class of two layer neural networks with bounded fan-in is efficiently learnable in a realistic extension to the Probably Approximately Correct (PAC) learning model. In this model, a...
Covering Numbers for Support Vector Machines (1999)
Guo, Ying, Bartlett, Peter L., Shawe-Taylor, John, Williamson, Robert C.
Support vector (SV) machines are linear classifiers that use the maximum margin hyperplane in a feature space defined by a kernel function. Until recently, the only bounds on the generalization...
Covering Numbers for Support Vector Machines (1999)
Guo, Ying, Bartlett, Peter L., Shawe-Taylor, John, Williamson, Robert C.
Support vector (SV) machines are linear classifiers that use the maximum margin hyperplane in a feature space defined by a kernel function. Until recently, the only bounds on the generalization...
Covering Numbers for Support Vector Machines (1999)
Guo, Ying, Bartlett, Peter L., Shawe-Taylor, John, Williamson, Robert C.
Support vector (SV) machines are linear classifiers that use the maximum margin hyperplane in a feature space defined by a kernel function. Until recently, the only bounds on the generalization...
Covering Numbers for Support Vector Machines (1999)
Guo, Ying, Bartlett, Peter L., Shawe-Taylor, John, Williamson, Robert C.
Support vector (SV) machines are linear classifiers that use the maximum margin hyperplane in a feature space defined by a kernel function. Until recently, the only bounds on the generalization...
Learning Changing Concepts by Exploiting the Structure of Change (1998)
Peter L. Bartlett, Shai Ben-david, Sanjeev R. Kulkarni
This paper examines learning problems in which the target function is allowed to change. The learner sees a sequence of random examples, labelled according to a sequence of functions, and must...
Learning Changing Concepts by Exploiting the Structure of Change (1998)
Peter L. Bartlett, Shai Ben-david, Sanjeev R. Kulkarni
This paper examines learning problems in which the target function is allowed to change. The learner sees a sequence of random examples, labelled according to a sequence of functions, and must...
Structural Risk Minimization over Data-Dependent Hierarchies (1998)
John Shawe-taylor, Peter L. Bartlett, Robert C. Williamson, Martin Anthony
The paper introduces some generalizations of Vapnik's method of structural risk minimisation (SRM). As well as making explicit some of the details on SRM, it provides a result that allows one to...
Structural Risk Minimization over Data-Dependent Hierarchies (1998)
John Shawe-taylor, Peter L. Bartlett, Robert C. Williamson, Martin Anthony
The paper introduces some generalizations of Vapnik's method of structural risk minimisation (SRM). As well as making explicit some of the details on SRM, it provides a result that allows one to...
Learning Changing Concepts by Exploiting the Structure of Change (1998)
Peter L. Bartlett, Shai Ben-david, Sanjeev R. Kulkarni
This paper examines learning problems in which the target function is allowed to change. The learner sees a sequence of random examples, labelled according to a sequence of functions, and must...
Almost Linear VC Dimension Bounds for Piecewise Polynomial Networks (1998)
Peter L. Bartlett, Vitaly Maiorov, Ron Meir
We compute upper and lower bounds on the VC dimension and pseudo dimension of feedforward neural networks composed of piecewise polynomial activation functions. We show that if the number of layers...
Almost Linear VC Dimension Bounds for Piecewise Polynomial Networks (1998)
Peter L. Bartlett, Vitaly Maiorov, Ron Meir
We compute upper and lower bounds on the VC dimension and pseudo dimension of feedforward neural networks composed of piecewise polynomial activation functions. We show that if the number of layers...
A Framework for Structural Risk Minimisation (1997)
John Shawe-taylor, Royal Holloway, Peter L. Bartlett, Robert C. Williamson, Martin Anthony
The paper introduces a framework for studying structural risk minimisation. The model views structural risk minimisation in a PAC context. It then considers the more general case when the hierarchy...
Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number...
Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number...
Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number...
The Complexity of Model Classes, and Smoothing Noisy Data (1997)
Peter L. Bartlett, Sanjeev R. Kulkarni
We consider the problem of smoothing a sequence of noisy observations using a fixed class of models. Via a deterministic analysis, we obtain necessary and sufficient conditions on the noise sequence...
The Complexity of Model Classes, and Smoothing Noisy Data (1997)
Peter L. Bartlett, Sanjeev R. Kulkarni
We consider the problem of smoothing a sequence of noisy observations using a fixed class of models. Via a deterministic analysis, we obtain necessary and sufficient conditions on the noise sequence...
This paper shows that if a large neural network is used for a pattern classification problem, and the learning algorithm finds a network with small weights that has small squared error on the...
This paper shows that if a large neural network is used for a pattern classification problem, and the learning algorithm finds a network with small weights that has small squared error on the...
Error and Variance Bounds in Multilayer Neural Networks (1997)
David R. Lovell, Peter L. Bartlett
Recently, upper bounds have been derived on the expectation and variance of errors at the output of a multilayer feedforward neural network with input and weight errors. We investigate the validity...
Learning Changing Concepts by Exploiting the Structure of Change (1996)
Peter L. Bartlett, Shai Ben-david, Sanjeev R. Kulkarni
This paper examines learning problems in which the target function is allowed to change. The learner sees a sequence of random examples, labelled according to a sequence of functions, and must...
The Complexity of Model Classes and Smoothing Noisy Data (1996)
Peter L. Bartlett, Sanjeev R. Kulkarni
We consider the problem of smoothing a sequence of noisy observations using a fixed class of models. Via a deterministic analysis, we obtain necessary and sufficient conditions on the noise sequence...
A Framework for Structural Risk Minimisation (1996)
John Shawe-taylor, Royal Holloway, Peter L. Bartlett, Robert C. Williamson, Martin Anthony
The paper introduces a framework for studying structural risk minimisation. The model views structural risk minimisation in a PAC context. It then considers the more general case when the hierarchy...
The Complexity of Model Classes and Smoothing Noisy Data (1996)
Peter L. Bartlett, Sanjeev R. Kulkarni
We consider the problem of smoothing a sequence of noisy observations using a fixed class of models. Via a deterministic analysis, we obtain necessary and sufficient conditions on the noise sequence...
Learning Changing Concepts by Exploiting the Structure of Change (1996)
Peter L. Bartlett, Shai Ben-david, Sanjeev R. Kulkarni
This paper examines learning problems in which the target function is allowed to change. The learner sees a sequence of random examples, labelled according to a sequence of functions, and must...
A Framework for Structural Risk Minimisation (1996)
John Shawe-taylor, Royal Holloway, Peter L. Bartlett, Robert C. Williamson, Martin Anthony
The paper introduces a framework for studying structural risk minimisation. The model views structural risk minimisation in a PAC context. It then considers the more general case when the hierarchy...
Efficient Agnostic Learning of Neural Networks with Bounded Fan-in (1996)
Wee Sun Lee, Peter L. Bartlett, Robert C. Williamson
We show that the class of two layer neural networks with bounded fan-in is efficiently learnable in a realistic extension to the Probably Approximately Correct (PAC) learning model. In this model, a...
The VC-Dimension and Pseudodimension of Two-Layer Neural Networks with Discrete Inputs (1996)
Peter L. Bartlett, Robert C. Williamson
We give upper bounds on the Vapnik-Chervonenkis dimension and pseudodimension of two-layer neural networks that use the standard sigmoid function or radial basis function and have inputs from...
The VC-Dimension and Pseudodimension of Two-Layer Neural Networks with Discrete Inputs (1996)
Peter L. Bartlett, Robert C. Williamson
We give upper bounds on the Vapnik-Chervonenkis dimension and pseudodimension of two-layer neural networks that use the standard sigmoid function or radial basis function and have inputs from...
Correction to "Lower Bounds on the VC-Dimension of Smoothly Parametrized Function Classes" (1996)
Wee Sun Lee, Peter L. Bartlett, Robert C. Williamson
The paper [3] gives lower bounds on the VC-dimension of various smoothly parametrized function classes. The results were proven by showing a relationship between the uniqueness of decision boundaries...
The Importance of Convexity in Learning with Squared Loss (1996)
Peter L. Bartlett, Robert C. Williamson
We show that if the closure of a function class F under the metric induced by some probability distribution is not convex, then the sample complexity for agnostically learning F with squared loss...
The Importance of Convexity in Learning with Squared Loss (1996)
Wee Sun Lee, Peter L. Bartlett, Robert C. Williamson
We show that if the closure of a function class F under the metric induced by some probability distribution is not convex, then the sample complexity for agnostically learning F with squared loss...
Exploiting Random Walks for Learning (1995)
Peter L. Bartlett, Klaus-uwe Hoffgen, Paul Fischer, Lehrstuhl Informatik
In this paper we consider an approach to passive learning. In contrast to the classical PAC model we do not assume that the examples are independently drawn according to an underlying distribution,...
Fat-Shattering and the Learnability of Real-Valued Functions (1995)
Philip M. Long, Robert C. Williamson, Peter L. Bartlett
We consider the problem of learning real-valued functions from random examples when the function values are corrupted with noise. With mild conditions on independent o