
F-SVM: Combination of Feature Transformation and SVM Learning via Convex Relaxation



Abstract:

The generalization error bound of the support vector machine (SVM) depends on the ratio of the radius and the margin. However, the conventional SVM only maximizes the margin and ignores the minimization of the radius, which restricts its performance when applied to the joint learning of feature transformation and the SVM classifier. Although several approaches have been proposed to integrate radius and margin information, most of them either require the transformation matrix to be diagonal or are nonconvex and computationally expensive. In this paper, we suggest a novel approximation for the radius of the minimum enclosing ball in feature space and then propose a convex radius-margin-based SVM model for the joint learning of feature transformation and the SVM classifier, i.e., F-SVM. A generalized block coordinate descent method is adopted to solve the F-SVM model, in which the feature transformation is updated via gradient descent and the classifier is updated with an existing SVM solver. By incorporating kernel principal component analysis, F-SVM is further extended to the joint learning of a nonlinear transformation and the classifier. F-SVM can also be combined with deep convolutional networks to improve image classification performance. Experiments on the UCI, LFW, MNIST, CIFAR-10, CIFAR-100, and Caltech101 data sets demonstrate the effectiveness of F-SVM.
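As a rough illustration of the alternating scheme described above, the following Python sketch (assuming NumPy and scikit-learn; the function names and the crude radius proxy are illustrative, not the authors' implementation) alternates a standard linear SVM solve with a gradient step on the transformation:

import numpy as np
from sklearn.svm import LinearSVC

def radius_sq(Z):
    # Crude stand-in for the squared enclosing-ball radius: the largest
    # squared distance to the centroid. The paper proposes its own convex
    # approximation; this proxy is for illustration only.
    c = Z.mean(axis=0)
    return np.max(((Z - c) ** 2).sum(axis=1))

def fsvm_sketch(X, y, n_iters=10, lr=1e-3, C=1.0):
    d = X.shape[1]
    L = np.eye(d)  # linear feature transformation z = L x
    for _ in range(n_iters):
        Z = X @ L.T
        # Classifier step: any off-the-shelf SVM solver.
        clf = LinearSVC(C=C).fit(Z, y)
        # Transformation step: numerical gradient of the radius proxy
        # (the paper uses an analytic gradient of its convex relaxation).
        eps, base = 1e-5, radius_sq(Z)
        g = np.zeros_like(L)
        for i in range(d):
            for j in range(d):
                Lp = L.copy()
                Lp[i, j] += eps
                g[i, j] = (radius_sq(X @ Lp.T) - base) / eps
        L -= lr * g
        # Crude rescaling to keep L from collapsing to zero; the actual
        # F-SVM objective balances the radius against the margin instead.
        L *= np.sqrt(d) / np.linalg.norm(L)
    return L, clf

Calling fsvm_sketch(X, y) on any labeled data set returns the learned transformation and the final classifier; only the alternating block coordinate descent structure, not the objective, mirrors the F-SVM algorithm.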
Published in: IEEE Transactions on Neural Networks and Learning Systems (Volume: 29, Issue: 11, November 2018)
Page(s): 5185-5199
Date of Publication: 05 February 2018

PubMed ID: 29994427


I. Introduction

The support vector machine (SVM) and its extensions form one of the most successful classes of machine learning methods [1] and have been widely adopted in various application fields [2], [3]. SVM seeks the optimal hyperplane under the maximum-margin principle, but the generalization error of SVM is actually a function of the ratio of the radius and the margin, i.e., the radius-margin error bound [4]. When the feature mapping is given, the radius is fixed and can be ignored, so SVM can safely minimize the generalization error by maximizing the margin alone. For the joint learning of feature transformation and the classifier, however, the radius information is valuable and cannot be ignored.
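For concreteness, the leave-one-out bound of Vapnik and Chapelle [4] for the hard-margin SVM takes, up to the exact constants given in [4], the form

\mathbb{E}[P_{\mathrm{err}}] \le \frac{1}{n}\,\mathbb{E}\!\left[\frac{R^2}{\gamma^2}\right] = \frac{1}{n}\,\mathbb{E}\!\left[R^2\,\lVert \mathbf{w}\rVert^2\right],

where n is the number of training samples, R is the radius of the minimum enclosing ball of the training data in feature space, and \gamma = 1/\lVert \mathbf{w}\rVert is the margin. A feature transformation changes both R and \gamma simultaneously, which is why the radius term cannot be dropped in joint learning.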

References

[1] V. N. Vapnik, Statistical Learning Theory. New York, NY, USA: Wiley, 1998.
[2] H. Do, A. Kalousis, and M. Hilario, "Feature weighting using margin and radius based error bound optimization in SVMs," in Proc. ECML PKDD, 2009, pp. 315-329.
[3] J. Wu and H. Yang, "Linear regression-based efficient SVM learning for large-scale classification," IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 10, pp. 2357-2369, Oct. 2015.
[4] V. Vapnik and O. Chapelle, "Bounds on error expectation for support vector machines," Neural Comput., vol. 12, no. 9, pp. 2013-2036, Sep. 2000.
[5] P. K. Shivaswamy and T. Jebara, "Maximum relative margin and data-dependent regularization," J. Mach. Learn. Res., vol. 11, pp. 747-788, Feb. 2010.
[6] X. Zhu, P. Gong, Z. Zhao, and C. Zhang, "Learning similarity metric with SVM," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jun. 2012, pp. 1-8.
[7] H. Do and A. Kalousis, "Convex formulations of radius-margin based support vector machines," in Proc. Int. Conf. Mach. Learn. (ICML), 2013, pp. 169-177.
[8] F. Wang, W. Zuo, L. Zhang, D. Meng, and D. Zhang, "A kernel classification framework for metric learning," IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 9, pp. 1950-1962, Sep. 2015.
[9] C. Shen, J. Kim, F. Liu, L. Wang, and A. van den Hengel, "Efficient dual approach to distance metric learning," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 2, pp. 394-406, Feb. 2014.
[10] Z. Xu, K. Q. Weinberger, and O. Chapelle, "Distance metric learning for kernel machines," 2012. [Online]. Available: https://arxiv.org/abs/1208.3422
[11] O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee, "Choosing multiple parameters for support vector machines," Mach. Learn., vol. 46, no. 1, pp. 131-159, 2002.
[12] H. Do, A. Kalousis, A. Woznica, and M. Hilario, "Margin and radius based multiple kernel learning," in Proc. ECML PKDD, vol. 5781, 2009, pp. 330-343.
[13] X. Liu, L. Wang, J. Yin, E. Zhu, and J. Zhang, "An efficient approach to integrating radius information into multiple kernel learning," IEEE Trans. Cybern., vol. 43, no. 2, pp. 557-569, Apr. 2013.
[14] K. Gai, G. Chen, and C.-S. Zhang, "Learning kernels with radiuses of minimum enclosing balls," in Proc. Adv. Neural Inf. Process. Syst. (NIPS), 2010, pp. 649-657.
[15] J. Nocedal and S. Wright, Numerical Optimization. New York, NY, USA: Springer, 2006.
[16] R. A. Horn and C. R. Johnson, Matrix Analysis. New York, NY, USA: Cambridge Univ. Press, 1988.
[17] J.-F. Cai, E. J. Candès, and Z. Shen, "A singular value thresholding algorithm for matrix completion," SIAM J. Optim., vol. 20, no. 4, pp. 1956-1982, 2010.
[18] A. Hyvärinen and E. Oja, "Independent component analysis: Algorithms and applications," Neural Netw., vol. 13, no. 4, pp. 411-430, 2000.
[19] X. Shi, Z. Guo, F. Nie, L. Yang, J. You, and D. Tao, "Two-dimensional whitening reconstruction for enhancing robustness of principal component analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 10, pp. 2130-2136, Oct. 2016.
[20] R. Girshick and J. Malik, "Training deformable part models with decorrelated features," in Proc. Int. Conf. Comput. Vis. (ICCV), Dec. 2013, pp. 3016-3023.
[21] S. Jayasumana, R. Hartley, M. Salzmann, H. Li, and M. Harandi, "Kernel methods on the Riemannian manifold of symmetric positive definite matrices," in Proc. Int. Conf. Comput. Vis. Pattern Recognit. (CVPR), 2013, pp. 73-80.
[22] Q. Wang, P. Li, L. Zhang, and W. Zuo, "Towards effective codebookless model for image classification," Pattern Recognit., vol. 59, pp. 63-71, Nov. 2016.
[23] Y. Xu and W. Yin, "A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion," SIAM J. Imag. Sci., vol. 6, no. 3, pp. 1758-1789, 2013.
[24] H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, "Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka–Łojasiewicz inequality," Math. Oper. Res., vol. 35, no. 2, pp. 438-457, 2010.
[25] A. Krizhevsky and G. Hinton, "Learning multiple layers of features from tiny images," 2009.
[26] L. Fei-Fei, R. Fergus, and P. Perona, "One-shot learning of object categories," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 4, pp. 594-611, Apr. 2006.
[27] J. Hu, J. Lu, and Y.-P. Tan, "Discriminative deep metric learning for face verification in the wild," in Proc. Int. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2014, pp. 1875-1882.
[28] X. Zhu, Z. Lei, J. Yan, D. Yi, and S. Z. Li, "High-fidelity pose and expression normalization for face recognition in the wild," in Proc. Int. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 787-796.
[29] C. Ding, J. Choi, D. Tao, and L. S. Davis, "Multi-directional multi-level dual-cross patterns for robust face recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 3, pp. 518-531, 2016.
[30] G. B. Huang, V. Jain, and E. Learned-Miller, "Unsupervised joint alignment of complex images," in Proc. Int. Conf. Comput. Vis. (ICCV), Oct. 2007, pp. 1-8.
