## Frequently Asked Questions

A C++ Toolbox for Scalable SVM Approximations

- For AMM, the model consists of a number of linear hyperplanes for each class of a multi-class problem. Each hyperplane is a linear combination of the subset of training examples that were misclassified during online training. Because AMM is trained online, only the hyperplanes are stored, while the examples themselves are discarded.
- For LLSVM, the model stores a number of *landmark points* used to map the training and test examples to a new space where a linear SVM is solved. The landmark points can be chosen in several ways: a) we randomly choose a number of training examples as landmark points, or b) we run *k*-means/*k*-medoids on the training set and use the resulting points as landmark points. Note that in case b) the landmark points are not necessarily examples that can be found in the training set.
- For BSGD, the model consists of a number of support vectors. However, depending on the strategy used to keep the model within its limited budget during online training, the support vectors are not necessarily training examples. In the current implementation of BSGD, the budget is maintained either by *merging* two existing support vectors into one, which effectively reduces the model size, or by randomly removing one of the existing support vectors. Note that in the case of merging, the support vectors stored in the model are not necessarily examples that can be found in the training set.
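
To make the AMM case concrete, the following is a minimal sketch of how a model made of several hyperplanes per class could produce a prediction. It assumes the usual AMM decision rule (each class is scored by its best-responding hyperplane, and the highest-scoring class wins). The names `AmmModel`, `dot`, `classScore`, and `predict` are hypothetical and are not taken from the BudgetedSVM source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical container: hyperplanes[c] holds the weight vectors of class c.
// Only the weights are kept; the training examples are not stored.
struct AmmModel {
    std::vector<std::vector<Vec>> hyperplanes;
};

// Inner product of two equally sized vectors.
double dot(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
    return sum;
}

// Score of class c: the largest response among that class's hyperplanes.
// A class without hyperplanes scores negative infinity in this sketch.
double classScore(const AmmModel& model, std::size_t c, const Vec& x) {
    double best = -std::numeric_limits<double>::infinity();
    for (const Vec& w : model.hyperplanes[c]) best = std::max(best, dot(w, x));
    return best;
}

// Predicted label: the class with the highest score (ties go to the lower index).
std::size_t predict(const AmmModel& model, const Vec& x) {
    assert(!model.hyperplanes.empty());
    std::size_t bestClass = 0;
    double bestScore = classScore(model, 0, x);
    for (std::size_t c = 1; c < model.hyperplanes.size(); ++c) {
        const double score = classScore(model, c, x);
        if (score > bestScore) {
            bestScore = score;
            bestClass = c;
        }
    }
    return bestClass;
}
```

With two hyperplanes per class, for example `(1, 0)` and `(-1, 0)` for class 0 and `(0, 1)` and `(0, -1)` for class 1, this rule separates the plane along the lines where the two coordinates have equal magnitude, which no single linear hyperplane could do.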

- For AMM, the budget refers to the maximum number of linear hyperplanes for each class of a multi-class problem. The budget is maintained through deletion of hyperplanes during training (called pruning), as well as through an explicit limit on the number of hyperplanes per class. Once the maximum number of hyperplanes is reached, no more weights can be added to the model until some hyperplanes are pruned. Note, however, that the number of hyperplanes per class is very often smaller than the maximum allowed budget, because AMM *adapts* the number of hyperplanes to the difficulty of the classification problem at hand. In practice, this results in a smaller number of hyperplanes for less difficult classification tasks.
- For LLSVM, the budget refers to the total number of *landmark points* used by the Nyström method to approximate the kernel matrix associated with the training or testing data set. The landmark points can be chosen in several ways: a) we randomly choose a number of training examples as landmark points, or b) we run *k*-means/*k*-medoids on the training set and use the resulting points as landmark points.
- For BSGD, the budget refers to the maximum number of support vectors stored in the model. When the number of support vectors exceeds the maximum allowed number, the current implementation of BSGD maintains the budget either by *merging* two existing support vectors into one, which effectively reduces the model size, or by randomly removing one of the existing support vectors. Note that in the case of merging, the support vectors stored in the model are no longer necessarily examples that can be found in the training set.
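
As an illustration of the simpler of the two BSGD strategies, the sketch below keeps a set of support vectors within a budget by removing randomly chosen ones. It is a minimal example under stated assumptions, not the toolbox's implementation: the `SupportVector` record and the functions `enforceBudgetByRemoval` and `addSupportVector` are hypothetical names, and the merging strategy is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical support-vector record: a point plus its coefficient.
struct SupportVector {
    std::vector<double> x;
    double alpha;
};

// Removes uniformly chosen support vectors until at most `budget` remain.
void enforceBudgetByRemoval(std::vector<SupportVector>& svs,
                            std::size_t budget,
                            std::mt19937& rng) {
    while (svs.size() > budget) {
        std::uniform_int_distribution<std::size_t> pick(0, svs.size() - 1);
        const std::size_t victim = pick(rng);
        // Order is irrelevant here, so move the victim to the end and pop it.
        if (victim != svs.size() - 1) std::swap(svs[victim], svs.back());
        svs.pop_back();
    }
}

// Adds a support vector, then restores the budget if it was exceeded.
void addSupportVector(std::vector<SupportVector>& svs,
                      SupportVector sv,
                      std::size_t budget,
                      std::mt19937& rng) {
    svs.push_back(std::move(sv));
    enforceBudgetByRemoval(svs, budget, rng);
    assert(svs.size() <= budget);
}
```

Calling `addSupportVector` repeatedly with a budget of, say, 3 keeps the model at three support vectors at most, no matter how many updates arrive during online training.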

- The linear and RBF kernels are most often used with the BSGD and LLSVM methods. In principle, however, any kernel can be used with these methods, and we have included various kernels in the toolbox. AMM, on the other hand, uses only the linear kernel, while still being able to solve highly non-linear problems.
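
For reference, the two most common kernels can be written as short functions. This is a sketch using the standard textbook definitions; the function names and the `gamma` width parameter are illustrative assumptions and do not reflect the toolbox's own API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Linear kernel: k(x, y) = x . y
double linearKernel(const Vec& x, const Vec& y) {
    assert(x.size() == y.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) sum += x[i] * y[i];
    return sum;
}

// RBF (Gaussian) kernel: k(x, y) = exp(-gamma * ||x - y||^2), with gamma > 0.
double rbfKernel(const Vec& x, const Vec& y, double gamma) {
    assert(x.size() == y.size());
    double squaredDistance = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double diff = x[i] - y[i];
        squaredDistance += diff * diff;
    }
    return std::exp(-gamma * squaredDistance);
}
```

The RBF kernel equals 1 for identical points and decays toward 0 as the points move apart, with larger `gamma` values giving a faster decay.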

- Because AMM, BSGD, and LLSVM are all trained online, adding more cores does not significantly affect the speed of BudgetedSVM. However, we found that adding more RAM results in significant speed-ups in both the training and testing phases.