We judge the value of a model by how well it predicts values for new data. In the model development stage, steps are taken to remove outlying samples and to identify the variables that give optimal prediction performance. One step in the analysis is validation, in which the predicted values from the model are compared with known values. In this step the root mean square error of prediction (RMSEP) can be calculated, which gives a measure of the error that can be expected for future samples. When predicting unknowns, there needs to be some means of judging how good a prediction is and whether the objects being predicted are similar to those on which the model was developed. In The Unscrambler® X we use deviation and outlier tools as measures of the similarity of predicted samples to calibration samples.
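As a small illustration of the validation step, the RMSEP can be computed from paired predicted and reference values. This is a minimal sketch using the standard root-mean-square definition; the function name `rmsep` and its arguments are hypothetical and are not part of The Unscrambler® X.

```python
import math


def rmsep(predicted, reference):
    """Root mean square error of prediction for paired values.

    `predicted` and `reference` are equal-length sequences of numbers:
    the model's predictions and the known (reference) values.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one sample is required")

    # Squared difference between each prediction and its known value.
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]

    # Mean of the squared errors, then the square root.
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

The result is in the same units as the response variable, which is what makes it usable as an expected error for future samples.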
The deviation indicates whether the predicted results are reliable, as it is a measure of each sample's residual and leverage relative to the calibration data.
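It is calculated by the following equation:

$$
\mathrm{Deviation} = \sqrt{\mathrm{ResYValVar}\left(\frac{\mathrm{ResXValSamp}_{pred}}{\mathrm{ResXValTot}} + H_i + \frac{1}{I_{Cal}}\right)\left(1 - \frac{a + 1}{I_{Cal}}\right)}
$$

where: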
- ResYValVar is the residual Y-variance obtained from validation
- ResXValSamp_pred is the residual X-validation variance that comes from the prediction of the new sample
- ResXValTot is the average residual X-validation variance in the model
- H_i is the leverage of the new (predicted) sample relative to the calibration set
- I_Cal is the number of calibration samples
- a is the number of factors/components in the calibration model
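The quantities listed above can be combined in a few lines of code. The sketch below assumes the deviation takes the form sqrt(ResYValVar × (ResXValSamp_pred / ResXValTot + H_i + 1 / I_Cal) × (1 − (a + 1) / I_Cal)), which is the form usually given for this measure; treat it as an illustration rather than the software's exact implementation. The function and argument names are hypothetical.

```python
import math


def prediction_deviation(res_y_val_var, res_x_val_samp_pred, res_x_val_tot,
                         leverage, n_cal, n_components):
    """Deviation of one predicted sample (assumed formula, see lead-in).

    res_y_val_var        -- ResYValVar, residual Y-variance from validation
    res_x_val_samp_pred  -- ResXValSamp_pred, residual X-variance of the new sample
    res_x_val_tot        -- ResXValTot, average residual X-validation variance
    leverage             -- H_i, leverage of the new sample
    n_cal                -- I_Cal, number of calibration samples
    n_components         -- a, number of factors/components in the model
    """
    if res_x_val_tot <= 0:
        raise ValueError("res_x_val_tot must be positive")
    if n_cal <= n_components + 1:
        raise ValueError("n_cal must exceed n_components + 1")

    # How far the new sample sits from the calibration data:
    # relative X-residual, plus leverage, plus a 1 / I_Cal term.
    sample_term = res_x_val_samp_pred / res_x_val_tot + leverage + 1.0 / n_cal

    # Correction for the degrees of freedom used by the model.
    dof_correction = 1.0 - (n_components + 1) / n_cal

    return math.sqrt(res_y_val_var * sample_term * dof_correction)
```

Under this assumed form, a sample with a higher leverage, or an X-residual larger than the calibration average, gets a larger deviation, which matches the reading of the deviation as a warning that the sample differs from the calibration set.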
The default prediction view in The Unscrambler® X shows the predicted values with their deviations: the predicted value is the red horizontal line (with a red vertical line for the RMSEP when the values are known), and the blue rectangle represents the deviation for each sample.