
Why Roc_auc Produces Weird Results In Sklearn?

I have a binary classification problem where I use the following code to get my weighted average precision, weighted average recall, weighted average f-measure, and roc_auc. df = pd
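The code in the question is cut off after `df = pd`. As context for the answer below, here is a minimal sketch of how these metrics are typically computed with scikit-learn; the dataset, the RandomForestClassifier, and names like `clf`, `X_test`, and `y_test` are placeholders for illustration, not taken from the original post:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score
from sklearn.model_selection import train_test_split

# Placeholder data and model, standing in for the question's truncated setup
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

y_pred = clf.predict(X_test)               # hard 0/1 labels (implicit 0.5 threshold)
y_score = clf.predict_proba(X_test)[:, 1]  # probabilities for the positive class

# Weighted-average precision, recall, and f-measure use the hard labels
prec, rec, f1, _ = precision_recall_fscore_support(y_test, y_pred, average="weighted")

# AUC is computed from the probabilities/scores, not from the hard labels
auc = roc_auc_score(y_test, y_score)

print(prec, rec, f1, auc)
```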

Solution 1:

It is not weird, because comparing all these other metrics with AUC is like comparing apples to oranges.

Here is a high-level description of the whole process:

  • Probabilistic classifiers (like RF here) produce probability outputs p in [0, 1].
  • To get hard class predictions (0/1), we apply a threshold to these probabilities; if not set explicitly (like here), this threshold is implicitly taken to be 0.5, i.e. if p>0.5 then class=1, else class=0.
  • Metrics like accuracy, precision, recall, and f1-score are calculated over the hard class predictions 0/1, i.e. after the threshold has been applied.
  • In contrast, AUC measures the performance of a binary classifier averaged over the range of all possible thresholds, not at any particular threshold (see the sketch after this list).
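To make the contrast concrete, here is a small sketch, reusing the placeholder `clf`, `X_test`, and `y_test` from the sketch above, that computes the thresholded metrics at several cutoffs while computing AUC once from the probabilities:

```python
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

y_score = clf.predict_proba(X_test)[:, 1]      # the same probabilities for every threshold

# AUC is threshold-free: it is computed once, directly from the scores
print("AUC:", roc_auc_score(y_test, y_score))

# Precision, recall, and f1 depend on whichever cutoff turns scores into hard labels
for t in (0.3, 0.5, 0.7):
    y_pred_t = (y_score >= t).astype(int)      # hard labels at this particular cutoff
    print(t,
          precision_score(y_test, y_pred_t, average="weighted"),
          recall_score(y_test, y_pred_t, average="weighted"),
          f1_score(y_test, y_pred_t, average="weighted"))
```

Every pass of the loop reuses the same y_score; only the cutoff applied to it changes, which is exactly why the thresholded metrics move while AUC does not.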

So, a mismatch between AUC and the thresholded metrics can certainly happen, and it can indeed lead to confusion among new practitioners.

The second part of my answer in this similar question might be helpful for more details. Quoting:

According to my experience at least, most ML practitioners think that the AUC score measures something different from what it actually does: the common (and unfortunate) use is just like any other the-higher-the-better metric, like accuracy, which may naturally lead to puzzles like the one you describe here.
