
Evaluating the accuracy using cross-validation
Cross-validation is an important concept in machine learning. In the previous recipe, we split the data into training and testing datasets. However, to make the evaluation more robust, we need to repeat this process with several different subsets. If we fine-tune the model for one particular subset, we may end up overfitting it. Overfitting refers to a situation where a model is tuned so closely to one dataset that it fails to perform well on unknown data. We want our machine learning model to perform well on data it has never seen.
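To make the idea of rotating subsets concrete, here is a minimal sketch (the data is just a placeholder) of how scikit-learn's KFold cycles the held-out portion across the dataset:

import numpy as np
from sklearn.model_selection import KFold

# Ten placeholder samples; each fold of a 5-fold split holds out
# two of them for testing while training on the remaining eight
X = np.arange(10).reshape(-1, 1)
for train_idx, test_idx in KFold(n_splits=5).split(X):
    print("train:", train_idx, "test:", test_idx)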
Getting ready…
Before we discuss how to perform cross-validation, let's talk about performance metrics. When we are dealing with machine learning models, we usually care about three metrics: precision, recall, and the F1 score. We can request the required performance metric using the scoring parameter. Precision refers to the number of items correctly identified as positive as a percentage of the total number of items identified as positive. Recall refers to the number of items correctly identified as positive as a percentage of the total number of items that actually are of interest.
Let's consider a test dataset containing 100 items, out of which 82 are of interest to us. Now, we want our classifier to identify these 82 items for us. Our classifier picks out 73 items as items of interest. Out of these 73 items, only 65 are actually items of interest; the remaining eight are misclassified. We can compute precision in the following way:
- The number of correct identifications = 65
- The total number of identifications = 73
- Precision = 65 / 73 = 89.04%
To compute recall, we use the following:
- The total number of interesting items in the dataset = 82
- The number of items retrieved correctly = 65
- Recall = 65 / 82 = 79.27%
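As a sanity check, we can verify these numbers with scikit-learn's metric functions. This is a minimal sketch; the y_true and y_pred arrays are synthetic, constructed only to reproduce the counts above:

import numpy as np
from sklearn.metrics import precision_score, recall_score

# 82 items of interest (label 1) out of 100; the classifier finds
# 65 of them and incorrectly flags 8 of the remaining 18 items
y_true = np.array([1] * 82 + [0] * 18)
y_pred = np.array([1] * 65 + [0] * 17 + [1] * 8 + [0] * 10)

print(precision_score(y_true, y_pred))   # 65 / 73 = 0.8904...
print(recall_score(y_true, y_pred))      # 65 / 82 = 0.7927...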
A good machine learning model needs to have good precision and good recall at the same time. It's easy to push one of them to 100%, but then the other metric suffers! We need to keep both metrics high simultaneously. To quantify this, we use the F1 score, which combines precision and recall; it is the harmonic mean of the two:
F1 score = 2 * precision * recall / (precision + recall)
In the preceding case, the F1 score will be as follows:
F1 score = 2 * 0.8904 * 0.7927 / (0.8904 + 0.7927) = 0.8387
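Continuing the sketch above (y_true and y_pred come from the previous snippet), sklearn.metrics.f1_score reproduces the hand calculation:

from sklearn.metrics import f1_score

# Harmonic mean computed by hand from the exact ratios
p, r = 65 / 73, 65 / 82
print(2 * p * r / (p + r))        # 0.8387...

# The same value straight from scikit-learn
print(f1_score(y_true, y_pred))   # 0.8387...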
How to do it…
- Let's see how to perform cross-validation and extract performance metrics. We will start with accuracy. The classifier_gaussiannb object and the X and y arrays are carried over from the previous recipe:
from sklearn.model_selection import cross_val_score

num_validations = 5

# Average the accuracy over five different train/test splits
accuracy = cross_val_score(classifier_gaussiannb, X, y,
        scoring='accuracy', cv=num_validations)
print("Accuracy: " + str(round(100 * accuracy.mean(), 2)) + "%")
- We will use the preceding function to compute precision, recall, and the F1 score as well:
# The weighted variants average per-class scores by class support
f1 = cross_val_score(classifier_gaussiannb, X, y,
        scoring='f1_weighted', cv=num_validations)
print("F1: " + str(round(100 * f1.mean(), 2)) + "%")

precision = cross_val_score(classifier_gaussiannb, X, y,
        scoring='precision_weighted', cv=num_validations)
print("Precision: " + str(round(100 * precision.mean(), 2)) + "%")

recall = cross_val_score(classifier_gaussiannb, X, y,
        scoring='recall_weighted', cv=num_validations)
print("Recall: " + str(round(100 * recall.mean(), 2)) + "%")