# New Generalization Bounds for Learning Kernels

```bibtex
@article{Cortes2009NewGB,
  title   = {New Generalization Bounds for Learning Kernels},
  author  = {Corinna Cortes and Mehryar Mohri and Afshin Rostamizadeh},
  journal = {ArXiv},
  year    = {2009},
  volume  = {abs/0912.3309}
}
```

This paper presents several novel generalization bounds for the problem of learning kernels based on the analysis of the Rademacher complexity of the corresponding hypothesis sets. Our bound for learning kernels with a convex combination of p base kernels has only a log(p) dependency on the number of kernels, p, which is considerably more favorable than the previous best bound given for the same problem. We also give a novel bound for learning with a linear combination of p base kernels with an…
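Schematically (notation and constants hedged; the precise statement is in the paper), the hypothesis set for a convex combination of p base kernels and the shape of the resulting Rademacher-complexity bound are:

```latex
% Hypothesis set induced by convex combinations of p base kernels
% (Lambda is a norm bound on the weight vector; Phi_K the feature map of K)
H_p = \Bigl\{\, x \mapsto \langle w, \Phi_K(x)\rangle \;:\;
      \|w\| \le \Lambda,\;
      K = \sum_{k=1}^{p} \mu_k K_k,\;
      \mu_k \ge 0,\; \sum_{k=1}^{p} \mu_k = 1 \,\Bigr\}

% Shape of the bound (constants and kernel-trace factors omitted):
\widehat{\mathfrak{R}}_m(H_p) \;=\; O\!\left(\sqrt{\frac{\log p}{m}}\right)
```

The key point reflected here is the log(p) (rather than polynomial-in-p) dependency on the number of base kernels highlighted in the abstract.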

#### 11 Citations

On the Statistical Efficiency of Optimal Kernel Sum Classifiers

- Computer Science, Mathematics
- ICML
- 2019

We propose a novel combination of optimization tools with learning theory bounds in order to analyze the sample complexity of optimal kernel sum classifiers. This contrasts the typical learning…

Optimality Implies Kernel Sum Classifiers are Statistically Efficient

- Computer Science, Mathematics
- ICML
- 2019

We propose a novel combination of optimization tools with learning theory bounds in order to analyze the sample complexity of optimal kernel sum classifiers. This contrasts the typical learning…
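The kernel-sum setting discussed in these citations can be illustrated with a toy sketch: a classifier trained on a fixed convex combination of p base kernels. This is not any cited paper's algorithm; the dataset, the RBF bandwidths, and the uniform weights are all illustrative choices.

```python
# Minimal sketch: SVM on a convex combination K = sum_k mu_k * K_k
# of p = 3 RBF base kernels with uniform weights (illustrative only).
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
Xtr, ytr, Xte, yte = X[:150], y[:150], X[150:], y[150:]

gammas = [0.1, 1.0, 10.0]                  # p = 3 base RBF kernels
mu = np.ones(len(gammas)) / len(gammas)    # uniform convex weights

def combined_kernel(A, B):
    # Convex combination of the base kernel matrices
    return sum(m * rbf_kernel(A, B, gamma=g) for m, g in zip(mu, gammas))

# Precomputed-kernel SVM: fit on K(train, train), score on K(test, train)
clf = SVC(kernel="precomputed").fit(combined_kernel(Xtr, Xtr), ytr)
acc = clf.score(combined_kernel(Xte, Xtr), yte)
```

A kernel-learning method would additionally optimize the weights `mu` over the simplex; the sketch fixes them to show only the structure of the hypothesis class.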

The Statistical Cost of Robust Kernel Hyperparameter Tuning

- Computer Science, Mathematics
- NeurIPS
- 2020

This paper provides finite-sample guarantees for the problem of finding the best interpolant from a class of kernels with unknown hyperparameters, assuming only that the noise is square-integrable, and shows that hyperparameter optimization increases sample complexity by just a logarithmic factor in comparison to the setting where optimal parameters are known in advance.

Multiple Kernel Learning from $U$-Statistics of Empirical Measures in the Feature Space

- Computer Science, Mathematics
- ArXiv
- 2019

It is shown that the empirical estimate of the U-statistic converges to its population value with respect to all admissible distributions as the number of training samples increases, and generalization bounds for the proposed kernel learning approach are established.

Sample Complexity of Learning Mahalanobis Distance Metrics

- Computer Science, Mathematics
- NIPS
- 2015

This work provides PAC-style sample complexity rates for supervised metric learning and reveals that augmenting the metric learning optimization criterion with a simple norm-based regularization can help adapt to a dataset's intrinsic complexity, yielding better generalization.

Federated Classification with Low Complexity Reproducing Kernel Hilbert Space Representations

- Computer Science
- ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020

This work proposes a method for federated learning in which each agent learns a low-complexity reproducing kernel Hilbert space representation, and shows that as the sample size grows, the solution obtained by the central unit converges to that obtained by an omniscient classifier with access to all samples from all agents.

Federated Classification using Parsimonious Functions in Reproducing Kernel Hilbert Spaces

- Computer Science, Engineering
- 2020

It is shown that the solution of the federated learner converges to that of the centralized learner asymptotically as the sample size increases, and that the accuracy of the method likewise converges to that of a centralized learner.

Kernel-Based Random Vector Functional-Link Network for Fast Learning of Spatiotemporal Dynamic Processes

- Computer Science
- IEEE Transactions on Systems, Man, and Cybernetics: Systems
- 2019

Simulations on two typical industrial thermal processes show that the proposed spatiotemporal model outperforms neural networks and least-squares support vector machines.

Combining heterogeneous data sources for neuroimaging based diagnosis: re-weighting and selecting what is important

- Computer Science, Medicine
- NeuroImage
- 2019

The proposed approach, called EasyMKLFS, outperforms baselines (e.g. SVM and SimpleMKL), state-of-the-art random forests (RF) and feature selection (FS) methods, and pursues an optimal and sparse solution to facilitate interpretability.

Combining heterogeneous data sources for neuroimaging based diagnosis: re-weighting and selecting what is important

- Biology, Computer Science
- 2018

A framework based on a recent multiple kernel learning algorithm called EasyMKL is proposed and the benefits of this approach for diagnosing two different mental health diseases are investigated, showing that the proposed approach outperforms baselines and state-of-the-art random forests and feature selection methods.

#### References

Showing 1–10 of 27 references.

Generalization Bounds for Learning the Kernel Problem

- Computer Science
- COLT
- 2009

In this paper we develop a novel probabilistic generalization bound for learning the kernel problem. First, we show that the generalization analysis of the regularized kernel learning system reduces…

Learning Convex Combinations of Continuously Parameterized Basic Kernels

- Mathematics, Computer Science
- COLT
- 2005

It is shown that there always exists an optimal kernel which is a convex combination of at most m + 1 basic kernels, where m is the sample size, and a necessary and sufficient condition for a kernel to be optimal is provided.

Learning the Kernel Function via Regularization

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2005

It is shown that, although K may be an uncountable set, the optimal kernel is always obtained as a convex combination of at most m+2 basic kernels, where m is the number of data examples.

Learning Non-Linear Combinations of Kernels

- Mathematics, Computer Science
- NIPS
- 2009

A projection-based gradient descent algorithm is given for solving the optimization problem of learning kernels based on a polynomial combination of base kernels and it is proved that the global solution of this problem always lies on the boundary.

On the Complexity of Learning the Kernel Matrix

- Computer Science, Mathematics
- NIPS
- 2002

A suitable way of constraining the class is proposed, an efficient algorithm is used to solve the resulting optimization problem, preliminary experimental results are presented, and the method is compared with an alignment-based approach.

Learning Bounds for Support Vector Machines with Learned Kernels

- Mathematics, Computer Science
- COLT
- 2006

This work bounds the estimation error of a large margin classifier when the kernel, relative to which this margin is defined, is chosen from a family of kernels based on the training sample, and gives simple bounds on the pseudodimension for families of Gaussian kernels.

L2 Regularization for Learning Kernels

- Computer Science, Mathematics
- UAI
- 2009

This paper presents a novel theoretical analysis of the problem of learning kernels with the same family of kernels but with an L2 regularization instead, and gives learning bounds for orthogonal kernels that contain only an additive term O(√p/m) when compared to the standard kernel ridge regression stability bound.
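The contrast between the two regularization regimes described in these abstracts can be summarized schematically (constants omitted; the L2 term is written exactly as the abstract states it, and its precise form should be checked against the paper):

```latex
% L1 / convex-combination regularization: multiplicative complexity term
O\!\left(\sqrt{\frac{\log p}{m}}\right)

% L2 regularization (orthogonal kernels): only an additive term
% relative to the standard kernel ridge regression stability bound
O\!\left(\frac{\sqrt{p}}{m}\right)
```

In both cases the dependency on the number of base kernels p is mild, which is the common thread of this line of work.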

Learning the Kernel with Hyperkernels

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2005

The equivalent representer theorem for the choice of kernels is stated and a semidefinite programming formulation of the resulting optimization problem is presented, which leads to a statistical estimation problem similar to the problem of minimizing a regularized risk functional.

A DC-programming algorithm for kernel selection

- Computer Science
- ICML
- 2006

This work builds upon a formulation involving a minimax optimization problem and a recently proposed greedy algorithm for learning the kernel to create a new algorithm which outperforms a previously proposed method.

Rademacher and Gaussian Complexities: Risk Bounds and Structural Results

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2001

This work investigates the use of certain data-dependent estimates of the complexity of a function class called Rademacher and Gaussian complexities and proves general risk bounds in terms of these complexities in a decision theoretic setting.