The Unscrambler® X can use support vector machines (SVM) for both regression and classification modeling. SVM classification is based on statistical learning: it determines a function describing the hyperplane that separates the classes optimally. SVM is particularly useful for classification when linear functions cannot separate the classes completely.

In SVM, kernel functions map the data from the original space into a new feature space, which makes it possible to handle nonlinear classification cases. A kernel can be viewed as a mapping of nonlinear data to a higher-dimensional feature space that also provides a computational shortcut: linear algorithms can work in the higher-dimensional space without the mapped coordinates being computed explicitly.
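The computational shortcut described above can be checked numerically. The following is a minimal sketch, assuming a degree-2 polynomial kernel chosen purely for illustration (it is not necessarily one of the kernels offered by The Unscrambler® X); the function names and sample points are hypothetical. It shows that evaluating the kernel in the original 2D space gives the same number as an inner product in the 3D feature space.

```python
import math


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel k(x, z) = (x . z)^2, evaluated in the original 2D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def explicit_map(x):
    """Explicit mapping of a 2D point into the 3D feature space implied by poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    """Ordinary inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


# Two arbitrary sample points (hypothetical values).
x = (1.0, 2.0)
z = (3.0, 0.5)

via_kernel = poly2_kernel(x, z)                        # never leaves the 2D space
via_mapping = dot(explicit_map(x), explicit_map(z))    # works in the 3D feature space

# Both routes give the same inner product, up to floating-point rounding.
assert math.isclose(via_kernel, via_mapping)
```

For kernels with a very high-dimensional feature space, only the first route is practical, which is why a linear algorithm expressed in terms of inner products can be reused unchanged.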

In this new space, SVM searches for the samples that lie on the borderline between the classes, i.e. the samples best suited for separating the classes; these samples are called support vectors. Only the support vectors are used to generate the rule for classifying new samples.
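To illustrate how support vectors alone define the classification rule, here is a minimal sketch of the standard SVM decision function, the sign of the weighted sum of kernel values between the new sample and each support vector plus a bias. The two support vectors, their weights, and the bias are a hand-worked two-class toy example, not output from The Unscrambler® X, and all names are hypothetical.

```python
def linear_kernel(a, b):
    """Plain inner product; any other kernel function could be substituted."""
    return sum(ai * bi for ai, bi in zip(a, b))


# Hypothetical toy model: (support vector, class label, weight alpha).
# The values were worked out by hand so that each support vector lies
# exactly on the margin (decision value +1 or -1).
support_vectors = [
    ((1.0, 1.0), +1, 0.25),
    ((3.0, 3.0), -1, 0.25),
]
bias = 2.0


def decision_value(x, kernel=linear_kernel):
    """Weighted sum of kernel values between x and each support vector, plus the bias."""
    return sum(alpha * label * kernel(sv, x)
               for sv, label, alpha in support_vectors) + bias


def classify(x):
    """Assign class +1 or -1 according to the sign of the decision value."""
    return +1 if decision_value(x) >= 0 else -1


# New samples are classified using only the stored support vectors.
print(classify((0.0, 0.0)))
print(classify((4.0, 4.0)))
```

No other training samples appear in the rule: removing a sample that is not a support vector would leave the decision function unchanged.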

The SVM classification model results are summarized in a confusion matrix, which compares the predicted and actual classifications of the samples: each row shows the instances in a predicted class, and each column shows the instances in an actual class. In the confusion matrix below, all the “Setosa” samples are correctly assigned to the “Setosa” group. One actual “Virginica” sample is predicted as “Versicolor”. Likewise, two actual “Versicolor” samples are predicted as “Virginica”.
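The layout of such a matrix can be reproduced in a few lines. This is a minimal sketch using hypothetical label lists (five samples per class, chosen only so that the misclassifications match the pattern described above; the real data set sizes are not assumed). Rows are predicted classes and columns are actual classes, following the convention used here.

```python
classes = ["Setosa", "Versicolor", "Virginica"]

# Hypothetical labels: two actual Versicolor samples are predicted as Virginica,
# and one actual Virginica sample is predicted as Versicolor.
actual = ["Setosa"] * 5 + ["Versicolor"] * 5 + ["Virginica"] * 5
predicted = (["Setosa"] * 5
             + ["Versicolor"] * 3 + ["Virginica"] * 2
             + ["Virginica"] * 4 + ["Versicolor"] * 1)


def confusion_matrix(actual, predicted, classes):
    """Count samples with rows indexed by predicted class and columns by actual class."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[p]][index[a]] += 1
    return matrix


matrix = confusion_matrix(actual, predicted, classes)

# Print one row per predicted class; columns follow the order in `classes`.
for name, row in zip(classes, matrix):
    print(f"{name:<12}", row)
```

Correct classifications fall on the diagonal, so every off-diagonal count is a misclassification. Some other tools, scikit-learn for example, place actual classes in rows and predicted classes in columns, so it is worth checking the orientation before comparing matrices.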

SVM classification model

A plot of the model classification results is also displayed, showing the samples as they were classified in a 2D scatter plot of the original variables. Changing the axes to plot different variables (use the arrows or drop-down list in the toolbar) is useful for seeing which pairs of variables give a good separation between the classes being modeled.