83. Precision/Recall Tradeoff

How do you choose a decision threshold for, say, a classifier? Raising the threshold lowers recall but generally raises precision, and lowering it does the opposite. This is called the Precision/Recall Tradeoff.
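To make the tradeoff concrete, here is a minimal sketch that computes precision and recall by hand at three thresholds. The labels, the scores, and the helper name precision_recall_at are all made up for illustration; they are not from any real model.

```python
# Hypothetical labels and classifier scores, made up for illustration.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.75, 0.9]


def precision_recall_at(threshold, y_true, y_scores):
    """Return (precision, recall) when scores >= threshold are predicted positive."""
    tp = sum(1 for y, s in zip(y_true, y_scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, y_scores) if s < threshold and y == 1)
    # By convention, precision is taken as 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn)
    return precision, recall


for threshold in (0.2, 0.5, 0.8):
    p, r = precision_recall_at(threshold, y_true, y_scores)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
# threshold=0.2  precision=0.57  recall=1.00
# threshold=0.5  precision=0.75  recall=0.75
# threshold=0.8  precision=1.00  recall=0.25
```

On this toy data precision climbs as the threshold goes up while recall drops. On real data recall can only fall, but precision may occasionally dip before rising again.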

One way is to plot precision and recall against every possible threshold.

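  1. Use the decision_function method to get scores instead of class predictions
    from sklearn.model_selection import cross_val_predict
    y_scores = cross_val_predict(model, X_train, y_train, cv=3, method="decision_function")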
  2. With the scores previously computed, compute precision and recall for all possible thresholds
    from sklearn.metrics import precision_recall_curve
    precisions, recalls, thresholds = precision_recall_curve(y_train, y_scores)
  3. Plot
    import matplotlib.pyplot as plt
    def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
        # precisions and recalls have one more entry than thresholds, hence [:-1]
        plt.plot(thresholds, precisions[:-1], "b--", label="Precision", linewidth=2)
        plt.plot(thresholds, recalls[:-1], "g-", label="Recall", linewidth=2)
        plt.legend(loc="center right", fontsize=16)
        plt.xlabel("Threshold", fontsize=16)
    plt.figure(figsize=(8, 4))
    plot_precision_recall_vs_threshold(precisions, recalls, thresholds)
    plt.show()
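
Steps 1 and 2 can be sketched end to end as follows. This is only an illustration under assumptions: the data comes from make_classification as a stand-in for X_train and y_train, and SGDClassifier is just one example of a model that exposes decision_function. It also shows why the plotting code slices with [:-1].

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import cross_val_predict

# Synthetic binary data standing in for X_train / y_train (an assumption).
X_train, y_train = make_classification(n_samples=500, n_features=10, random_state=42)

# SGDClassifier is only one example of a model exposing decision_function.
model = SGDClassifier(random_state=42)

# Step 1: out-of-fold decision scores instead of hard class predictions.
y_scores = cross_val_predict(model, X_train, y_train, cv=3,
                             method="decision_function")

# Step 2: precision and recall at every distinct score threshold.
precisions, recalls, thresholds = precision_recall_curve(y_train, y_scores)

# precisions and recalls carry one extra final entry (precision 1, recall 0)
# that has no matching threshold, hence the [:-1] slices when plotting.
print(len(precisions), len(recalls), len(thresholds))
print(precisions[-1], recalls[-1])
```

Using cross_val_predict here means every score comes from a model that did not see that sample during training, so the curve is not flattered by overfitting.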

Another way is to plot precision directly against recall.

def plot_precision_vs_recall(precisions, recalls):
    plt.plot(recalls, precisions, "b-", linewidth=2)
    plt.xlabel("Recall", fontsize=16)
    plt.ylabel("Precision", fontsize=16)
    plt.axis([0, 1, 0, 1])

plt.figure(figsize=(8, 6))
plot_precision_vs_recall(precisions, recalls)
plt.show()


You can see that precision starts to fall sharply at around 80% recall. So, for this case, you will probably want to select a precision/recall trade-off just before that drop, for example at around 60% recall.
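
Once a target is chosen, the matching threshold can be read off the arrays returned by precision_recall_curve. The sketch below is one possible way to do it, not the only one. It assumes synthetic labels and scores in place of y_train and y_scores, and target_recall, idx, and y_pred are illustrative names.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, precision_score, recall_score

# Synthetic labels and scores standing in for y_train / y_scores (an assumption).
rng = np.random.default_rng(0)
y_train = rng.integers(0, 2, size=1000)
y_scores = y_train + rng.normal(size=1000)

precisions, recalls, thresholds = precision_recall_curve(y_train, y_scores)

target_recall = 0.60  # hypothetical target, matching the ~60% figure above

# recalls never increases as the threshold grows, so the last index that still
# meets the target gives the highest usable threshold.
idx = np.flatnonzero(recalls[:-1] >= target_recall)[-1]
threshold = thresholds[idx]

# Apply the chosen threshold by hand instead of calling model.predict().
y_pred = (y_scores >= threshold).astype(int)
print(f"threshold={threshold:.3f}")
print(f"precision={precision_score(y_train, y_pred):.3f}")
print(f"recall={recall_score(y_train, y_pred):.3f}")
```

Picking the highest threshold that still meets the recall target keeps precision as high as the curve allows at that recall level.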

