I. Introduction
Support vector machines (SVMs) [1] are a popular classification technique. The principle behind SVMs is that a set of support vectors is determined by solving a quadratic optimisation problem; these support vectors are then used to define the decision boundary of the classifier. To apply SVMs to datasets that are not linearly separable, the well-known kernel trick [2] may be used, mapping the data into a higher-dimensional space in which it becomes linearly separable. The basic quadratic optimisation model of an SVM is given by: \begin{equation*}\max f(\alpha)=\sum_{i=1}^{n}\alpha_{i}-\frac{1}{2}\sum_{i,j=1}^{n}\alpha_{i}\alpha_{j}y_{i}y_{j}K(\mathrm{x}_{i},\ \mathrm{x}_{j}), \tag{1}\end{equation*}
subject to the constraints
\begin{equation*}\sum_{i=1}^{n}\alpha_{i}y_{i} = 0 \tag{2}\end{equation*}
\begin{equation*}0\leq \alpha_{i} \leq C \quad \text{for all } i\in\{1,\ldots, n\}, \tag{3}\end{equation*}
where $K(\mathrm{x}_{i},\ \mathrm{x}_{j})$ denotes the kernel function, $\{(\mathrm{x}_{i}, y_{i})\}_{i=1}^{n}$ denotes a labelled dataset containing samples from two classes $y_{i}\in\{-1,+1\}$, where $\mathrm{x}_{i}\in\mathbb{R}^{d}$, and $n$ denotes the number of observations. The observations $\mathrm{x}_{i}$ with nonzero coefficients $\alpha_{i}$ are the support vectors, while $C$ is a constant parameter that quantifies the penalty applied within the objective function for misclassification. The optimisation problem in equations (1)-(3) is solved when the Karush-Kuhn-Tucker (KKT) conditions [3] are satisfied.
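For reference, the KKT conditions for this problem can be stated in the following standard form (a well-known result, not specific to the references above): writing $u_{i}=\sum_{j=1}^{n}\alpha_{j}y_{j}K(\mathrm{x}_{j},\ \mathrm{x}_{i})+b$, where $b$ is the bias term of the decision function,
\begin{equation*}\alpha_{i}=0 \Rightarrow y_{i}u_{i}\geq 1,\quad 0<\alpha_{i}<C \Rightarrow y_{i}u_{i}=1,\quad \alpha_{i}=C \Rightarrow y_{i}u_{i}\leq 1.\end{equation*}
As an illustration only, the dual in equations (1)-(3) can be solved with a general-purpose constrained optimiser. The minimal sketch below assumes an RBF kernel and SciPy's SLSQP method; the helper names (rbf_kernel, fit_dual_svm) and the parameter gamma are ours for illustration, and dedicated SVM solvers (e.g. SMO) are far more efficient in practice.

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X, gamma=1.0):
    """RBF kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))

def fit_dual_svm(X, y, C=1.0, gamma=1.0):
    """Solve the dual problem (1)-(3) for labels y in {-1, +1}."""
    n = len(y)
    K = rbf_kernel(X, gamma)
    Q = (y[:, None] * y[None, :]) * K  # Q[i, j] = y_i y_j K(x_i, x_j)

    # SciPy minimises, so negate the objective f(alpha) from equation (1).
    def neg_f(a):
        return 0.5 * a @ Q @ a - a.sum()

    def neg_f_grad(a):
        return Q @ a - np.ones(n)

    res = minimize(
        neg_f, np.zeros(n), jac=neg_f_grad, method="SLSQP",
        bounds=[(0.0, C)] * n,                               # equation (3)
        constraints={"type": "eq", "fun": lambda a: a @ y},  # equation (2)
    )
    alpha = res.x
    support = alpha > 1e-6  # samples with nonzero alpha_i: the support vectors
    return alpha, support

# Toy usage: two Gaussian blobs, one per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (20, 2)), rng.normal(2.0, 1.0, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])
alpha, support = fit_dual_svm(X, y, C=1.0)
print(f"{support.sum()} support vectors out of {len(y)} samples")
```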