Abstract

Support Vector Machines (SVMs) are among the most popular and successful classification algorithms. Kernel SVMs often reach state-of-the-art accuracy, but suffer from the curse of kernelization: on noisy data the model grows linearly with the training set size. Linear SVMs can efficiently learn from truly large data, but their low representational power limits them to a narrow range of domains. To fill the representability and scalability gap between linear and nonlinear SVMs, in this paper we propose the Adaptive Multi-hyperplane Machine (AMM), an algorithm that trains and predicts quickly while remaining capable of solving nonlinear classification problems. The AMM model consists of a set of linear hyperplanes (weights), each assigned to one of the classes, and predicts the class associated with the weight that gives the largest prediction value. The number of weights is determined automatically by an iterative procedure based on stochastic gradient descent, which is guaranteed to converge to a local optimum. Since the generalization bound degrades as the number of weights grows, a weight pruning criterion is proposed and analyzed. Experiments on several large data sets show that AMM is nearly as fast during training and prediction as a state-of-the-art linear SVM solver, that it can be orders of magnitude faster than kernel SVM, and that its accuracy lies between that of linear and kernel SVMs. For example, on an OCR task with 8 million high-dimensional training examples, AMM converges in 300 seconds on a single-core processor and achieves a 0.54% error rate, comparable to the 0.43% error rate a kernel SVM achieved after 2 days of training on 512 processors, and significantly lower than the 2.03% error rate of a linear solver given comparable training time. The results indicate that AMM is an attractive option for solving large-scale classification problems.
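The prediction rule described above (each class owns several weight vectors; the predicted label is the class of the single weight with the largest inner product) can be sketched as follows. This is a minimal illustrative sketch, not the released MATLAB/C++ implementation; the function name `amm_predict`, the dictionary layout of the weights, and the toy values are all assumptions made for the example.

```python
import numpy as np

def amm_predict(weights_per_class, x):
    """AMM-style prediction sketch.

    weights_per_class: dict mapping class label -> (k, d) array,
    one row per hyperplane (weight) assigned to that class.
    Returns the label whose best-matching weight maximizes w.x.
    """
    # Score of a class is the maximum w.x over its assigned weights;
    # the prediction is the class with the largest such score.
    scores = {label: W.dot(x).max() for label, W in weights_per_class.items()}
    return max(scores, key=scores.get)

# Toy example: two classes, each holding two hyperplanes in 2-D.
weights = {
    0: np.array([[1.0, 0.0], [0.5, -1.0]]),
    1: np.array([[-1.0, 0.0], [0.0, 1.0]]),
}
print(amm_predict(weights, np.array([2.0, 0.5])))  # class 0 (best score 2.0)
```

Note that with a single weight per class this rule reduces to an ordinary multi-class linear SVM, which is why AMM interpolates between linear and nonlinear behavior as weights are added.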

The software used in the experiments is available for download. Both MATLAB (mex) and C++ versions are provided.

Paper citation

Please cite as:
Wang, Z., Djuric, N., Crammer, K., Vucetic, S., Trading Representability for Scalability: Adaptive Multi-hyperplane Machine for Nonlinear Classification,
17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), San Diego, CA, 2011.