83. Precision/Recall Tradeoff

How do you decide on a decision threshold for a classifier? The higher the threshold, the higher the precision but the lower the recall, and vice versa. This is called the Precision/Recall Tradeoff.
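
For intuition, here is a minimal sketch of what the threshold does, assuming `model` is an already-trained Scikit-Learn classifier that exposes decision_function (the slice and the threshold values are purely illustrative):

# Predictions are simply "score > threshold"; raising the threshold
# flips borderline positives to negatives (higher precision, lower recall).
scores = model.decision_function(X_train[:3])   # one decision score per instance
print(scores > 0)       # SGDClassifier's default behaviour corresponds to threshold = 0
print(scores > 8000)    # a much stricter, illustrative threshold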

One way is to plot precision and recall for every possible threshold:

  1. Use cross_val_predict with method="decision_function" to get a decision score for every training instance
    from sklearn.model_selection import cross_val_predict

    # Return decision scores instead of class predictions
    y_scores = cross_val_predict(model, X_train, y_train, cv=3,
                                 method="decision_function")
    
  2. With the scores from step 1, compute precision and recall for all possible thresholds
    from sklearn.metrics import precision_recall_curve

    precisions, recalls, thresholds = precision_recall_curve(y_train, y_scores)
    
  3. Plot precision and recall as functions of the threshold
    import matplotlib.pyplot as plt

    def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
        # precisions and recalls have one extra element (for "no threshold"), so drop it
        plt.plot(thresholds, precisions[:-1], "b--", label="Precision", linewidth=2)
        plt.plot(thresholds, recalls[:-1], "g-", label="Recall", linewidth=2)
        plt.legend(loc="center right", fontsize=16)
        plt.xlabel("Threshold", fontsize=16)
        plt.grid(True)

    plt.figure(figsize=(8, 4))
    plot_precision_recall_vs_threshold(precisions, recalls, thresholds)
    save_fig("precision_recall_vs_threshold_plot")  # save_fig is the book notebook's figure-saving helper
    plt.show()
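
Once the curves are available, you can also pick a threshold numerically instead of reading it off the plot. A small sketch, assuming the arrays from step 2 (the 90% precision target is just an illustrative choice, not part of the original notes):

import numpy as np

# Lowest threshold that yields at least 90% precision
idx = np.argmax(precisions >= 0.90)           # index of the first True in the boolean mask
threshold_90_precision = thresholds[idx]
recall_at_90_precision = recalls[idx]         # the recall you give up for that precision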
    

Another way is to plot precision directly against recall.

def plot_precision_vs_recall(precisions, recalls):
    plt.plot(recalls, precisions, "b-", linewidth=2)
    plt.xlabel("Recall", fontsize=16)
    plt.ylabel("Precision", fontsize=16)
    plt.axis([0, 1, 0, 1])
    plt.grid(True)

plt.figure(figsize=(8, 6))
plot_precision_vs_recall(precisions, recalls)

save_fig("precision_vs_recall_plot")
plt.show()
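
If you want a single number summarizing the whole precision/recall curve, Scikit-Learn can approximate the area under it as the average precision (not covered in the notes above; shown here as a complement, reusing y_scores from step 1):

from sklearn.metrics import average_precision_score

# Area under the precision/recall curve, computed from the same decision scores
ap = average_precision_score(y_train, y_scores)
print("Average precision:", round(ap, 3))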


You can see that precision starts to fall sharply at around 80% recall. For this case, you will probably want to select a precision/recall trade-off just before that drop, for example at around 60% recall.
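
To act on the chosen trade-off, compare the scores to the selected threshold yourself rather than calling predict(). A sketch, reusing threshold_90_precision from the earlier snippet (an illustrative target, not from the original notes):

from sklearn.metrics import precision_score, recall_score

# Classify by thresholding the decision scores directly
y_train_pred_90 = (y_scores >= threshold_90_precision)

print(precision_score(y_train, y_train_pred_90))   # ~0.90 by construction
print(recall_score(y_train, y_train_pred_90))      # the recall obtained at that threshold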

References:
Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition (O'Reilly, 2019)