Continuing the trip into the land of accuracy measures, today we’ll visit the precision, recall and F-score trio.
In R, assuming that `y` is a logical vector with true classes and `p` is a logical vector with predictions, we have:

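```
# precision: share of positive predictions that are real positives
precision<-function(y,p) sum(y&p)/sum(p)
# recall: share of real positives that were predicted positive
recall<-function(y,p) sum(y&p)/sum(y)
```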
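A quick sanity check, as a sketch: on a made-up toy example, these one-liners should agree with the textbook confusion-matrix forms TP/(TP+FP) and TP/(TP+FN). The vectors and the `tp`, `fp`, `fn` counts below are illustrative only.

```r
# Definitions from above
precision <- function(y, p) sum(y & p) / sum(p)
recall    <- function(y, p) sum(y & p) / sum(y)

# Toy data (made up): 4 real positives, 3 positive predictions
y <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
p <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)

# Confusion-matrix counts
tp <- sum(y & p)    # true positives
fp <- sum(!y & p)   # false positives
fn <- sum(y & !p)   # false negatives

precision(y, p)  # 2/3
recall(y, p)     # 0.5
all.equal(precision(y, p), tp / (tp + fp))
all.equal(recall(y, p), tp / (tp + fn))
```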

F-score is defined as the harmonic mean of precision and recall, and it is usually calculated exactly that way; with a bit of algebra, however, we get $F:=\frac{2}{\frac{1}{\text{precision}}+\frac{1}{\text{recall}}}=\frac{2\sum y\wedge p}{\sum p+\sum y}.$ Nicer, but there is also $\sum y\vee p=\sum p+\sum y-\sum y\wedge p$, so that $F=\frac{2\sum y\wedge p}{\sum y\vee p+\sum y\wedge p}=\frac{2}{1+\frac{\sum y\vee p}{\sum y\wedge p}},$ which translates into this sweet one-liner:

```
Fscore<-function(y,p) 2/(1+sum(y|p)/sum(y&p))
```
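One way to convince oneself that the one-liner really is the harmonic mean is to compare both forms numerically. In this sketch `Fharmonic` is a hypothetical helper written only for the comparison, and the seed, sample size and 30% positive rate are arbitrary.

```r
precision <- function(y, p) sum(y & p) / sum(p)
recall    <- function(y, p) sum(y & p) / sum(y)
Fscore    <- function(y, p) 2 / (1 + sum(y | p) / sum(y & p))

# Direct harmonic mean of precision and recall, for comparison only
Fharmonic <- function(y, p) 2 / (1 / precision(y, p) + 1 / recall(y, p))

set.seed(1)
y <- runif(1000) < 0.3   # random "true" classes
p <- runif(1000) < 0.3   # random predictions, unrelated to y
all.equal(Fscore(y, p), Fharmonic(y, p))   # TRUE

# No true positives at all: sum(y|p)/0 is Inf in R, so the one-liner gives 0
Fscore(c(TRUE, FALSE), c(FALSE, TRUE))     # 0
```

If neither vector contains any positive, the one-liner evaluates 0/0 and returns `NaN`, which is arguably the honest answer there.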

Sometimes, especially in *sparse* cases (when there are few positive objects), we only have vectors of names, indices or IDs of the real positives and of the positive predictions; then the `%in%` operator becomes very handy:

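```
# Y: ids of real positives, P: ids of positive predictions
precisionS<-function(Y,P) mean(P%in%Y)
recallS<-function(Y,P) mean(Y%in%P)
# 2*|intersection| out of |Y|+|P| ids, i.e. the F-score (assuming unique ids)
FscoreS<-function(Y,P) mean(c(Y,P)%in%intersect(P,Y))
```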
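Assuming the ids are unique (duplicates would be counted more than once by `mean()`), the set-based variants should match the logical-vector ones. This sketch checks that on a tiny example with made-up character ids and on random data, converting logical vectors to positions with `which()`.

```r
precision <- function(y, p) sum(y & p) / sum(p)
recall    <- function(y, p) sum(y & p) / sum(y)
Fscore    <- function(y, p) 2 / (1 + sum(y | p) / sum(y & p))

precisionS <- function(Y, P) mean(P %in% Y)  # share of predicted ids that are real positives
recallS    <- function(Y, P) mean(Y %in% P)  # share of real positive ids that were predicted
FscoreS    <- function(Y, P) mean(c(Y, P) %in% intersect(P, Y))

# Tiny example with made-up ids
Yc <- c("a", "b", "c", "d")  # real positives
Pc <- c("a", "b", "e")       # positive predictions
precisionS(Yc, Pc)  # 2/3
recallS(Yc, Pc)     # 0.5
FscoreS(Yc, Pc)     # 4/7

# Random data: logical vectors vs. their positive positions
set.seed(2)
y <- runif(500) < 0.2
p <- runif(500) < 0.2
Y <- which(y)  # indices of real positives
P <- which(p)  # indices of positive predictions
all.equal(precisionS(Y, P), precision(y, p))
all.equal(recallS(Y, P), recall(y, p))
all.equal(FscoreS(Y, P), Fscore(y, p))
```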

The latter also gives a nice interpretation of F-score: it measures to what extent the set of positives and the set of positive predictions overlap, namely as twice the size of their intersection over the sum of their sizes.
Consequently, F is invariant to true negatives (i.e., correctly predicted negative objects); one may add as many of them as one wishes and F will stay the same, while measures like accuracy or AUROC will be pushed towards 1.
This may be either good or bad, depending on the use case.
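A small sketch of the invariance: pad both vectors with `n` correctly predicted negatives (`FALSE` in both `y` and `p`) and compare F with plain accuracy. The `accuracy` and `padTN` helpers and the toy data are illustrative assumptions, not part of the original code.

```r
Fscore   <- function(y, p) 2 / (1 + sum(y | p) / sum(y & p))
accuracy <- function(y, p) mean(y == p)          # fraction of correctly predicted objects
padTN    <- function(v, n) c(v, rep(FALSE, n))   # append n true negatives

# Toy data (made up)
y <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
p <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)

for (n in c(0L, 10L, 1000L)) {
  cat(sprintf("n = %4d  F = %.4f  accuracy = %.4f\n",
              n, Fscore(padTN(y, n), padTN(p, n)),
              accuracy(padTN(y, n), padTN(p, n))))
}
```

F stays at 4/7 for every `n`, while accuracy grows from 5/8 towards 1.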
Generally, F is good for …
All three measures critically rely on proper labelling of classes, both in `y` and `p`; a trashy F value often means this has been messed up.
In contrast to AUROC, there is no easy *flipping formula* for F: the value obtained after flipping the labels depends on the dataset size and the fraction of positive objects.
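To see that dependence, here is a sketch that takes the same made-up toy vectors, pads them with different numbers of true negatives and computes F after flipping. It assumes that "flipping" means swapping which class counts as positive, i.e. negating both `y` and `p`; `padTN` is a hypothetical helper.

```r
Fscore <- function(y, p) 2 / (1 + sum(y | p) / sum(y & p))
padTN  <- function(v, n) c(v, rep(FALSE, n))  # append n true negatives

y <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
p <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)

for (n in c(0L, 10L, 1000L)) {
  yn <- padTN(y, n)
  pn <- padTN(p, n)
  # Swap classes: former negatives become positives, in both truth and prediction
  cat(sprintf("n = %4d  F = %.4f  F flipped = %.4f\n",
              n, Fscore(yn, pn), Fscore(!yn, !pn)))
}
```

The original F stays at 4/7 for every `n`, while the flipped one moves from 2/3 towards 1, because the padded true negatives turn into true positives after the swap.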

You’ll find the code here.

