Implementation of ROC AUC Score

Python implementation of ROC AUC Score

Feb 07, 2025

Introduction

This post is a continuation of the ROC and AUC Interpretation post. Please make sure you understand that post before reading this one.

In this post, we will implement a ROC AUC Score in Python with O(n log n) runtime complexity.

You can also access this post on my personal page maitbayev.github.io/posts/roc-auc-implementation/ with better code highlighting.

Explanation

Implementation

Let’s set up our environment:

import numpy as np

np.random.seed(0)
n = 100
target = np.random.randint(0, 2, n)
predicted = np.random.rand(n)

We randomly generated targets and predicted probability scores. Let’s check the result of sklearn.metrics.roc_auc_score:

import sklearn.metrics
sklearn.metrics.roc_auc_score(target, predicted)

np.float64(0.4277597402597403)

Our implementation should have the same score.

Trapezoid Area

First, let’s implement a helper function that finds the area of the trapezoid between the x-axis and the segment joining two points (x0, y0) and (x1, y1).

To compute it, we add the area of the rectangle of height y0 and the area of the right triangle sitting on top of it:

\( \begin{align} \text{Area}&=(x_1-x_0) \times y_0+\frac{1}{2}(x_1-x_0) \times (y_1-y_0)\\ &= \frac{1}{2}(x_1-x_0) \times (2y_0+y_1 - y_0)\\ &= \frac{1}{2}(x_1-x_0) \times (y_0 + y_1) \end{align} \)

We can express the formula with Python code:

def trapezoid_area(p0, p1):
    return (p1[0] - p0[0]) * (p0[1] + p1[1]) / 2.0
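
As a quick sanity check, here is a minimal sketch that evaluates the helper on three hand-picked segments. The points are illustrative and not taken from the post's data, and the function is repeated so the snippet runs on its own.

```python
def trapezoid_area(p0, p1):
    # Width along the x-axis times the average of the two heights.
    return (p1[0] - p0[0]) * (p0[1] + p1[1]) / 2.0


# Diagonal segment from (0, 0) to (1, 1): a right triangle with area 1/2.
diagonal = trapezoid_area((0.0, 0.0), (1.0, 1.0))

# Vertical step (same x on both ends): zero width, so no area is added.
vertical = trapezoid_area((0.25, 0.0), (0.25, 0.5))

# Horizontal step at height 0.5: a 0.5 x 0.5 rectangle.
horizontal = trapezoid_area((0.0, 0.5), (0.5, 0.5))

print(diagonal, vertical, horizontal)  # 0.5 0.0 0.25
```

Vertical steps on a ROC curve (only the true positive rate changes) contribute nothing, so all of the area comes from segments where the false positive rate increases.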

ROC AUC Score

Now our main implementation:

def roc_auc_score(target, predicted):
    n = target.shape[0]
    num_positive = np.sum(target == 1)
    num_negative = n - num_positive
    # Indices that sort the predictions from largest to smallest
    order = np.argsort(predicted)[::-1]
    last = [0, 0]
    num_true_positive = 0
    num_false_positive = 0
    score = 0
    for index in range(n):
        # Add a ROC point only when the score differs from the previous one,
        # so that items with tied scores share a single threshold
        if index == 0 or predicted[order[index]] != predicted[order[index - 1]]:
            # True positive rate
            tpr = num_true_positive / num_positive
            # False positive rate
            fpr = num_false_positive / num_negative
            # New point on the ROC curve
            cur = [fpr, tpr]
            
            score += trapezoid_area(last, cur)
            last = cur
        
        if target[order[index]] == 1:
            num_true_positive += 1
        else:
            num_false_positive += 1
    score += trapezoid_area(last, [1, 1])

    return score

Let’s verify the result:

roc_auc_score(target, predicted)

np.float64(0.4277597402597403)

Nice, we got exactly the same result as sklearn.
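
Agreement on one random sample does not exercise tied scores. As an independent cross-check, the sketch below assumes the usual pairwise-ranking reading of AUC: the fraction of (positive, negative) pairs in which the positive item gets the higher score, with a tied pair counted as one half. The name roc_auc_pairwise is hypothetical and not part of the post or of sklearn. It is quadratic in the worst case, so it is only meant for testing on small inputs.

```python
import numpy as np


def roc_auc_pairwise(target, predicted):
    """Brute-force cross-check: share of (positive, negative) pairs ranked correctly."""
    target = np.asarray(target)
    predicted = np.asarray(predicted)
    positive_scores = predicted[target == 1]
    negative_scores = predicted[target != 1]
    # diff[i, j] > 0 means positive i is scored above negative j.
    diff = positive_scores[:, None] - negative_scores[None, :]
    correct = np.sum(diff > 0)
    ties = np.sum(diff == 0)  # a tied pair counts as half
    return (correct + 0.5 * ties) / diff.size


# Same data as in the setup at the top of the post.
np.random.seed(0)
n = 100
target = np.random.randint(0, 2, n)
predicted = np.random.rand(n)
print(roc_auc_pairwise(target, predicted))

# A tiny hand-checkable case with one tie between a positive and a negative:
# 3 correctly ordered pairs + half of 1 tied pair, out of 4 pairs.
tied = roc_auc_pairwise([1, 0, 1, 0], [0.9, 0.8, 0.8, 0.1])
print(tied)  # 0.875
```

On the random data, the first printed value should agree with the score reported above up to floating-point rounding.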

The code is the most precise description, but roughly our algorithm is:

Sort items by their predicted scores, from largest to smallest
Process the sorted items one by one in a loop
1. If the current item is the first one, or its score differs from the previous item's score, form the current point on the ROC curve by the ratios: (num_false_positive / num_negative, num_true_positive / num_positive)
2. In that case, add the trapezoid area formed by the previous point and the current one
3. If the current item is positive, then increase num_true_positive by one
4. If the current item is negative, then increase num_false_positive by one
After the loop, add the trapezoid area formed by the last point and (1, 1)
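
The same steps can also be sketched without an explicit Python loop. This is an illustrative variant rather than the post's implementation: it replaces the running counters with cumulative sums and assumes both classes are present in target. The name roc_auc_vectorized is hypothetical.

```python
import numpy as np


def roc_auc_vectorized(target, predicted):
    target = np.asarray(target)
    predicted = np.asarray(predicted)

    # Sort from largest to smallest score; the sort dominates the runtime.
    order = np.argsort(predicted)[::-1]
    sorted_scores = predicted[order]
    is_positive = target[order] == 1

    # Running counts of true and false positives after each item.
    true_positives = np.cumsum(is_positive)
    false_positives = np.cumsum(~is_positive)

    # Keep one ROC point per distinct score: the last item of each run of ties.
    last_of_run = np.concatenate((sorted_scores[1:] != sorted_scores[:-1], [True]))

    # Prepend the origin; the final kept point is always (1, 1).
    tpr = np.concatenate(([0.0], true_positives[last_of_run] / true_positives[-1]))
    fpr = np.concatenate(([0.0], false_positives[last_of_run] / false_positives[-1]))

    # Sum of trapezoid areas between consecutive ROC points.
    return np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0)


# One tie between a positive and a negative item.
print(roc_auc_vectorized([1, 0, 1, 0], [0.9, 0.8, 0.8, 0.1]))  # 0.875
```

The ROC points are the same ones the loop visits. Only the bookkeeping differs, which makes this a convenient second implementation to test the loop version against.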

The End

I hope you enjoyed this post.

Madiyar's Page
