I'm running some supervised experiments for a binary prediction problem. I'm using 10-fold cross-validation to evaluate performance in terms of mean average precision (the sum of the per-fold average precision scores divided by the number of folds, 10 in my case). I would like to plot a PR curve that reflects the mean average precision over these 10 folds, but I'm not sure of the best way to do this.

A previous question on the Cross Validated Stack Exchange site raised this same problem. A comment recommended working through this example from the Scikit-Learn site on plotting ROC curves across cross-validation folds, and tailoring it to average precision. Here is the relevant section of code I've modified to try this idea:

```python
from scipy import interp

# Other packages/functions are imported, but not crucial to the question
max_ent = LogisticRegression()

mean_precision = 0.0
mean_recall = np.linspace(0,1,100)
mean_average_precision = []

for i in set(folds):
    y_scores = max_ent.fit(X_train, y_train).decision_function(X_test)
    precision, recall, _ = precision_recall_curve(y_test, y_scores)
    average_precision = average_precision_score(y_test, y_scores)
    mean_average_precision.append(average_precision)
    mean_precision += interp(mean_recall, recall, precision)
    # After this line of code, inspecting the mean_precision array shows that
    # the majority of the elements equal 1. This is the part that is confusing me
    # and is contributing to the incorrect plot.

mean_precision /= len(set(folds))

# This is what the actual MAP score should be
mean_average_precision = sum(mean_average_precision) / len(mean_average_precision)

# Code for plotting the mean average precision curve across folds
plt.plot(mean_recall, mean_precision)
plt.title('Mean AP Over 10 folds (area=%0.2f)' % (mean_average_precision))
plt.show()
```

The code runs, but in my case the mean average precision curve is incorrect. For some reason, the array that stores the `mean_precision` scores (the `mean_tpr` variable in the ROC example) ends up with a first element near zero and all other elements equal to 1 after dividing by the number of folds. Below is a visualization of the `mean_precision` scores plotted against the `mean_recall` scores. As you can see, the plot jumps to 1, which is inaccurate.

So my hunch is that something is going awry in the update of `mean_precision` (`mean_precision += interp(mean_recall, recall, precision)`) in each fold of cross-validation, but it's unclear how to fix this. Any guidance or help would be appreciated.
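One thing that may be worth checking, offered as a sketch rather than a confirmed diagnosis: `numpy.interp` (which, as far as I know, `scipy.interp` simply aliased) expects its x-coordinates to be increasing, whereas `precision_recall_curve` returns `recall` in decreasing order, ending at recall 0 with precision 1. The ROC example does not hit this because `fpr` is already increasing. The arrays below are made-up values standing in for one fold's output, not results from my data; reversing both arrays before interpolating is the assumed fix being illustrated.

```python
import numpy as np

# Hypothetical PR points for a single fold, laid out the way
# precision_recall_curve returns them: recall decreasing from 1 to 0,
# with the final point at (recall=0, precision=1).
recall = np.array([1.0, 0.75, 0.5, 0.25, 0.0])
precision = np.array([0.5, 0.6, 0.75, 0.9, 1.0])

# Common recall grid shared by all folds (5 points here to keep it readable).
mean_recall = np.linspace(0, 1, 5)

# What the update line in the question effectively does: x-coordinates
# are decreasing, which np.interp does not support, so the result is
# not a valid interpolation of the curve.
naive = np.interp(mean_recall, recall, precision)

# Reversing both arrays makes recall increasing, so the interpolation
# follows the actual curve.
fixed = np.interp(mean_recall, recall[::-1], precision[::-1])

print("naive:", naive)
print("fixed:", fixed)
```

If this is indeed the cause, the update inside the loop would become `mean_precision += interp(mean_recall, recall[::-1], precision[::-1])`, with everything else left unchanged.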