Everything You Need To Know About F1-score

Introduction

The $F_1$ score is the harmonic mean of precision and recall. It thus represents both precision and recall symmetrically in a single metric.

The highest possible value of the $F_1$ score is 1.0, indicating perfect precision and recall; the lowest possible value is 0, which occurs when either precision or recall is zero.

Definition

Harmonic Mean

In mathematics, the harmonic mean is one of several kinds of average. The harmonic mean $H$ of the positive real numbers $x_1, x_2, \ldots, x_n$ is defined to be

\[H(x_1, x_2, \ldots, x_n) = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n}\frac{1}{x_i}}\]
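
For example, the harmonic mean of 1, 4 and 4 is $\frac{3}{1/1 + 1/4 + 1/4} = 2$. As a quick check, Python's built-in statistics.harmonic_mean implements exactly this definition:

>>> from statistics import harmonic_mean
>>> harmonic_mean([1, 4, 4])
2.0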

Precision and Recall

Precision ($P$) is defined as the number of true positives ($TP$) divided by the number of true positives plus the number of false positives ($FP$):

\[P = \frac{TP}{TP+FP}\]

Recall ($R$) is defined as the number of true positives ($TP$) divided by the number of true positives plus the number of false negatives ($FN$):

\[R = \frac{TP}{TP+FN}\]
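
To make the definitions concrete, here is a minimal sketch; the confusion counts 6, 2 and 4 are hypothetical values chosen for illustration:

>>> TP, FP, FN = 6, 2, 4
>>> TP / (TP + FP)  # precision
0.75
>>> TP / (TP + FN)  # recall
0.6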

The $F_1$ score is defined as the harmonic mean of precision and recall:

\[F_1 = \frac{2}{\frac{1}{P} + \frac{1}{R}} = 2 \times \frac{P \times R}{P+R}\]
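
Continuing the sketch above ($P = 0.75$, $R = 0.6$), the closed-form expression agrees with statistics.harmonic_mean, as it must by definition:

>>> from statistics import harmonic_mean
>>> P, R = 0.75, 0.6
>>> 2 * P * R / (P + R)
0.66...
>>> harmonic_mean([P, R])
0.66...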

The relationship between precision and recall can be summarized by the Average Precision ($AP$):

\[AP=\sum_n(R_n-R_{n-1})P_n\]

where $P_n$ and $R_n$ are the precision and recall at the $n^{th}$ threshold. A pair $(R_k, P_k)$ is referred to as an operating point on the precision-recall curve.
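
As a short sketch of $AP$ in practice, scikit-learn computes this summation with metrics.average_precision_score; the decision scores below are hypothetical:

>>> from sklearn import metrics
>>> y_true = [0, 0, 1, 1]
>>> y_scores = [0.1, 0.4, 0.35, 0.8]
>>> metrics.average_precision_score(y_true, y_scores)
0.83...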

$F_1$ in binary classification tasks

>>> from sklearn import metrics
>>> y_pred = [0, 1, 0, 0]
>>> y_true = [0, 1, 0, 1]
>>> metrics.precision_score(y_true, y_pred)
1.0
>>> metrics.recall_score(y_true, y_pred)
0.5
>>> metrics.f1_score(y_true, y_pred)
0.66...
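
Here $TP = 1$, $FP = 0$ and $FN = 1$, so $P = \frac{1}{1} = 1.0$, $R = \frac{1}{2} = 0.5$ and $F_1 = 2 \times \frac{1 \times 0.5}{1 + 0.5} = \frac{2}{3} \approx 0.67$.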

$F_1$ in multiclass and multilabel classification tasks

Precision, recall, and $F_1$ can be applied to each label independently. The per-label results can then be combined in several ways, specified by the average argument in scikit-learn: macro takes an unweighted mean over labels, micro computes the metric globally from the total $TP$, $FP$ and $FN$ counts, and weighted averages the per-label scores by each label's support:

>>> from sklearn import metrics
>>> y_true = [0, 1, 2, 0, 1, 2]
>>> y_pred = [0, 2, 1, 0, 0, 1]
>>> metrics.precision_score(y_true, y_pred, average='macro')
0.22...
>>> metrics.recall_score(y_true, y_pred, average='micro')
0.33...
>>> metrics.f1_score(y_true, y_pred, average='weighted')
0.26...
>>> metrics.fbeta_score(y_true, y_pred, average='macro', beta=0.5)
0.23...
>>> metrics.precision_recall_fscore_support(y_true, y_pred, beta=0.5, average=None)
(array([0.66..., 0.        , 0.        ]), array([1., 0., 0.]), array([0.71..., 0.        , 0.        ]), array([2, 2, 2]...))

Conclusions

The $F_1$ score is often used in machine learning. However, $F_1$ does not take true negatives ($TN$) into account, so it is insensitive to how many negative samples a classifier labels correctly.
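
A quick sketch makes this visible: reusing the binary example above, appending extra correctly classified negatives increases $TN$ but leaves $F_1$ unchanged.

>>> from sklearn import metrics
>>> y_true = [0, 1, 0, 1]
>>> y_pred = [0, 1, 0, 0]
>>> metrics.f1_score(y_true, y_pred)
0.66...
>>> metrics.f1_score(y_true + [0, 0, 0], y_pred + [0, 0, 0])
0.66...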
