How do you decide on a decision threshold for a classifier? The higher the threshold, the higher the precision but the lower the recall, and vice versa. This is called the Precision/Recall Trade-off.
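To see the trade-off concretely, here is a minimal sketch with made-up decision scores and labels, computing precision and recall by hand (the numbers are illustrative only, not from the book's MNIST example):

```python
import numpy as np

# Toy true labels and decision scores (illustrative assumptions)
y_true = np.array([0, 0, 0, 1, 0, 1, 1, 1])
scores = np.array([-2.0, -1.0, 0.5, 0.8, 1.2, 1.5, 2.0, 3.0])

def precision_recall_at(threshold):
    # Predict positive whenever the score clears the threshold
    y_pred = scores >= threshold
    tp = np.sum(y_pred & (y_true == 1))
    precision = tp / np.sum(y_pred)
    recall = tp / np.sum(y_true == 1)
    return precision, recall

# Raising the threshold trades recall for precision
low = precision_recall_at(0.0)   # looser threshold: high recall, lower precision
high = precision_recall_at(1.4)  # stricter threshold: higher precision, lower recall
print(low, high)
```

With the loose threshold every positive is caught (recall 1.0) but some negatives slip in; with the strict one, precision rises while recall drops.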
One way is to plot precision and recall against all possible thresholds.
- Use the decision_function method to get a decision score for each instance:
y_scores = cross_val_predict(model, X_train, y_train, cv=3, method="decision_function")
- With the scores previously computed, compute precision and recall for all possible thresholds:
from sklearn.metrics import precision_recall_curve
precisions, recalls, thresholds = precision_recall_curve(y_train, y_scores)
- Plot precision and recall as functions of the threshold:
def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
    plt.plot(thresholds, precisions[:-1], "b--", label="Precision", linewidth=2)
    plt.plot(thresholds, recalls[:-1], "g-", label="Recall", linewidth=2)
    plt.legend(loc="center right", fontsize=16)
    plt.xlabel("Threshold", fontsize=16)
    plt.grid(True)

plt.figure(figsize=(8, 4))
plot_precision_recall_vs_threshold(precisions, recalls, thresholds)
save_fig("precision_recall_vs_threshold_plot")
plt.show()
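Note why the plotting code slices with [:-1]: precision_recall_curve returns one more precision/recall value than thresholds, because the final pair (precision 1, recall 0) has no threshold attached. A minimal sketch on toy labels and scores (made up for illustration):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Tiny illustrative example, not the book's MNIST data
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

precisions, recalls, thresholds = precision_recall_curve(y_true, y_scores)

# precisions and recalls each have len(thresholds) + 1 entries;
# the last pair is always (precision=1, recall=0)
print(len(precisions), len(recalls), len(thresholds))
```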
Another way is to plot precision directly against recall.
def plot_precision_vs_recall(precisions, recalls):
    plt.plot(recalls, precisions, "b-", linewidth=2)
    plt.xlabel("Recall", fontsize=16)
    plt.ylabel("Precision", fontsize=16)
    plt.axis([0, 1, 0, 1])
    plt.grid(True)

plt.figure(figsize=(8, 6))
plot_precision_vs_recall(precisions, recalls)
save_fig("precision_vs_recall_plot")
plt.show()
You can see that precision starts to fall sharply at around 80% recall. So, for this case, you will probably want to select a precision/recall trade-off just before that drop, for example at around 60% recall.
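Alternatively, if you have a target precision in mind, you can search the thresholds array directly with np.argmax, which returns the index of the first True. A sketch on synthetic data (the data and the 90% target are assumptions for illustration; with the book's arrays you would reuse the precisions, recalls, and thresholds computed above):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Synthetic labels and noisy scores correlated with the labels (assumption)
rng = np.random.RandomState(42)
y_true = rng.randint(0, 2, size=200)
y_scores = y_true * 2.0 + rng.normal(size=200)

precisions, recalls, thresholds = precision_recall_curve(y_true, y_scores)

# Lowest threshold that achieves at least 90% precision
idx = np.argmax(precisions >= 0.90)
threshold_90 = thresholds[idx]

# Predictions at that threshold
y_pred_90 = y_scores >= threshold_90
```

The corresponding recall is recalls[idx], which tells you what you give up to hit that precision.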
References:
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition