How to Plot PR-Curve Over 10 folds of Cross Validation in Scikit-Learn

I'm running some supervised experiments for a binary prediction problem. I'm using 10-fold cross-validation to evaluate performance in terms of mean average precision (the sum of the per-fold average precision scores divided by the number of folds, which is 10 in my case). I would like to plot a precision-recall (PR) curve that corresponds to this mean average precision over the 10 folds, but I'm not sure of the best way to do this.
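For reference, the per-fold average precision and its mean, as defined above, can be computed with scikit-learn's cross_val_score. This is a minimal sketch, not the original setup: it assumes a current scikit-learn (where splitters live in sklearn.model_selection rather than the older sklearn.cross_validation), uses StratifiedKFold for the 10 folds, and substitutes a synthetic dataset from make_classification for the real data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced stand-in for the real data (assumption: any binary X, y works)
X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
# scoring="average_precision" applies average_precision_score to each held-out fold
fold_aps = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                           cv=cv, scoring="average_precision")

# Mean average precision: sum of per-fold AP scores / number of folds
mean_ap = fold_aps.sum() / len(fold_aps)
print(f"per-fold AP: {np.round(fold_aps, 3)}")
print(f"mean AP over {len(fold_aps)} folds: {mean_ap:.3f}")
```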

A previous question on the Cross Validated Stack Exchange site raised this same problem. A comment there recommended working through the Scikit-Learn example that plots ROC curves across cross-validation folds and adapting it to average precision. Here is the relevant section of code I've modified to try this idea:
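A minimal sketch of how the ROC-across-folds pattern might carry over to precision-recall, under assumptions not in the original: synthetic data from make_classification, StratifiedKFold splits from a current scikit-learn, and a headless matplotlib backend that writes to a hypothetical file name, mean_pr_curve.png. The key difference from the ROC version is that precision_recall_curve returns recall in decreasing order, while np.interp expects increasing x-coordinates, so both arrays are reversed before interpolating onto the shared recall grid.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the real data (assumption)
X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
clf = LogisticRegression(max_iter=1000)

mean_recall = np.linspace(0, 1, 100)  # common recall grid shared by all folds
interp_precisions = []
fold_aps = []

for train, test in cv.split(X, y):
    y_scores = clf.fit(X[train], y[train]).decision_function(X[test])
    precision, recall, _ = precision_recall_curve(y[test], y_scores)
    # recall is returned in decreasing order, but np.interp needs increasing xp,
    # so reverse both arrays before interpolating onto the grid
    interp_precisions.append(np.interp(mean_recall, recall[::-1], precision[::-1]))
    fold_aps.append(average_precision_score(y[test], y_scores))

mean_precision = np.mean(interp_precisions, axis=0)
mean_ap = np.mean(fold_aps)

fig, ax = plt.subplots()
ax.plot(mean_recall, mean_precision, label=f"mean PR (mean AP = {mean_ap:.2f})")
ax.set_xlabel("Recall")
ax.set_ylabel("Precision")
ax.set_title("Mean precision-recall curve over 10 folds")
ax.legend()
fig.savefig("mean_pr_curve.png")  # hypothetical output file name
```

Two caveats: the mean of the per-fold AP scores is not the same number as the area under the averaged curve, and linear interpolation between PR points can be optimistic (the scikit-learn documentation makes this point when contrasting average_precision_score with the trapezoidal area under the PR curve).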

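```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
# scipy.interp was only an alias of numpy.interp and has since been removed from
# SciPy, so np.interp is used below. Other packages/functions are imported, but
# not crucial to the question. X, y and folds (one fold id per sample) are
# assumed to be NumPy arrays defined earlier.

max_ent = LogisticRegression()
mean_precision = 0.0
mean_recall = np.linspace(0, 1, 100)
mean_average_precision = []

for i in set(folds):
    # Hold out fold i for testing and train on the rest (assumed split scheme)
    train, test = folds != i, folds == i
    X_train, X_test, y_train, y_test = X[train], X[test], y[train], y[test]
    y_scores = max_ent.fit(X_train, y_train).decision_function(X_test)
    precision, recall, _ = precision_recall_curve(y_test, y_scores)
    average_precision = average_precision_score(y_test, y_scores)
    mean_average_precision.append(average_precision)
    mean_precision += np.interp(mean_recall, recall, precision)

# After this line of code, inspecting the mean_precision array shows that
# the majority of the elements equal 1. This is the part that is confusing me
# and is contributing to the incorrect plot.
mean_precision /= len(set(folds))

# This is what the actual MAP score should be
mean_average_precision = sum(mean_average_precision) / len(mean_average_precision)

# Code for plotting the mean average precision curve across folds
plt.plot(mean_recall, mean_precision)
plt.title('Mean AP Over 10 folds (area=%0.2f)' % (mean_average_precision))
```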

The code runs, but the mean average precision curve is incorrect. For some reason, the array I use to accumulate the precision scores (mean_precision, which plays the role of the mean_tpr variable in the ROC example) ends up with its first element near zero and all other elements equal to 1 after dividing by the number of folds. Below is a plot of mean_precision against mean_recall. As you can see, the curve jumps to 1, which is inaccurate.

[Figure: mean_precision plotted against mean_recall over the 10 folds]
So my hunch is that something goes awry when mean_precision is updated in each fold of cross-validation (mean_precision += interp(mean_recall, recall, precision)), but it's unclear to me how to fix this. Any guidance or help would be appreciated.
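The symptom can be reproduced with three made-up PR points. np.interp does not validate its xp argument and assumes it is increasing. With the decreasing recall order that precision_recall_curve produces (ending at recall 0), current NumPy versions treat a grid point of 0 as lying below xp[0] and fill it with fp[0] (the precision at full recall, often close to the positive rate), while every grid point above 0 lies "above" xp[-1] and gets fp[-1], which precision_recall_curve always sets to 1. Reversing both arrays gives the intended interpolation. The values below are illustrative only.

```python
import numpy as np

# Toy PR points in the order precision_recall_curve returns them:
# recall decreases from 1.0 to 0.0, and the last precision is 1.0
recall = np.array([1.0, 0.5, 0.0])
precision = np.array([0.5, 0.8, 1.0])
grid = np.array([0.0, 0.25, 0.5, 0.75, 1.0])

# Wrong: xp is decreasing, so every point receives a fill value
wrong = np.interp(grid, recall, precision)
# wrong -> [0.5, 1.0, 1.0, 1.0, 1.0]  (first value near "zero", the rest 1)

# Right: reverse both arrays so recall increases
right = np.interp(grid, recall[::-1], precision[::-1])
# right -> [1.0, 0.9, 0.8, 0.65, 0.5]
```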

Category: python | Time: 2015-10-23
