How do you decide on a decision threshold for a classifier? The higher the threshold, the higher the precision but the lower the recall, and vice versa. This is called the Precision/Recall Trade-off.
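To see the trade-off concretely, here is a minimal sketch with made-up decision scores and labels, computing precision and recall by hand (the numbers are illustrative only, not from the book's MNIST example):

```python
import numpy as np

# Toy true labels and decision scores (illustrative assumptions)
y_true = np.array([0, 0, 0, 1, 0, 1, 1, 1])
scores = np.array([-2.0, -1.0, 0.5, 0.8, 1.2, 1.5, 2.0, 3.0])

def precision_recall_at(threshold):
    # Predict positive whenever the score clears the threshold
    y_pred = scores >= threshold
    tp = np.sum(y_pred & (y_true == 1))
    precision = tp / np.sum(y_pred)
    recall = tp / np.sum(y_true == 1)
    return precision, recall

# Raising the threshold trades recall for precision
low = precision_recall_at(0.0)   # looser threshold: high recall, lower precision
high = precision_recall_at(1.4)  # stricter threshold: higher precision, lower recall
print(low, high)
```

With the loose threshold every positive is caught (recall 1.0) but some negatives slip in; with the strict one, precision rises while recall drops.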
One way is to plot precision and recall against all possible thresholds.
- Use the decision_function method to get a decision score for each instance:
y_scores = cross_val_predict(model, X_train, y_train, cv=3, method="decision_function")
- With the scores previously computed, compute precision and recall for all possible thresholds:
from sklearn.metrics import precision_recall_curve
precisions, recalls, thresholds = precision_recall_curve(y_train, y_scores)
- Plot precision and recall as functions of the threshold:
def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
    plt.plot(thresholds, precisions[:-1], "b--", label="Precision", linewidth=2)
    plt.plot(thresholds, recalls[:-1], "g-", label="Recall", linewidth=2)
    plt.legend(loc="center right", fontsize=16)
    plt.xlabel("Threshold", fontsize=16)
    plt.grid(True)

plt.figure(figsize=(8, 4))
plot_precision_recall_vs_threshold(precisions, recalls, thresholds)
save_fig("precision_recall_vs_threshold_plot")
plt.show()
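Note why the plotting code slices with [:-1]: precision_recall_curve returns one more precision/recall value than thresholds, because the final pair (precision 1, recall 0) has no threshold attached. A minimal sketch on toy labels and scores (made up for illustration):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Tiny illustrative example, not the book's MNIST data
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

precisions, recalls, thresholds = precision_recall_curve(y_true, y_scores)

# precisions and recalls each have len(thresholds) + 1 entries;
# the last pair is always (precision=1, recall=0)
print(len(precisions), len(recalls), len(thresholds))
```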
Another way is to plot precision directly against recall.
def plot_precision_vs_recall(precisions, recalls):
    plt.plot(recalls, precisions, "b-", linewidth=2)
    plt.xlabel("Recall", fontsize=16)
    plt.ylabel("Precision", fontsize=16)
    plt.axis([0, 1, 0, 1])
    plt.grid(True)

plt.figure(figsize=(8, 6))
plot_precision_vs_recall(precisions, recalls)
save_fig("precision_vs_recall_plot")
plt.show()
You can see that precision starts to fall sharply at around 80% recall. So, for this case, you will probably want to select a precision/recall trade-off just before that drop, for example at around 60% recall.
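Alternatively, if you have a target precision in mind, you can search the thresholds array directly with np.argmax, which returns the index of the first True. A sketch on synthetic data (the data and the 90% target are assumptions for illustration; with the book's arrays you would reuse the precisions, recalls, and thresholds computed above):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Synthetic labels and noisy scores correlated with the labels (assumption)
rng = np.random.RandomState(42)
y_true = rng.randint(0, 2, size=200)
y_scores = y_true * 2.0 + rng.normal(size=200)

precisions, recalls, thresholds = precision_recall_curve(y_true, y_scores)

# Lowest threshold that achieves at least 90% precision
idx = np.argmax(precisions >= 0.90)
threshold_90 = thresholds[idx]

# Predictions at that threshold
y_pred_90 = y_scores >= threshold_90
```

The corresponding recall is recalls[idx], which tells you what you give up to hit that precision.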
References:
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition