Continuing the trip into the land of accuracy measures, today we’ll visit the precision–recall–F-score trio.
In R, assuming that
y is a logical vector with true classes and
p is a logical vector with predictions, we have:
precision <- function(y, p) sum(y & p) / sum(p)
recall <- function(y, p) sum(y & p) / sum(y)
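For instance, on a tiny hypothetical example (the vectors below are made up for illustration):

```r
precision <- function(y, p) sum(y & p) / sum(p)
recall <- function(y, p) sum(y & p) / sum(y)

# Hypothetical toy data: 6 objects, logical class labels
y <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)  # true classes
p <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)  # predictions

precision(y, p)  # 2 TP out of 3 positive predictions -> 2/3
recall(y, p)     # 2 TP out of 3 actual positives -> 2/3
```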
While F-score is defined as the harmonic mean of precision and recall, it is usually calculated as such; fortunately, we can do some algebra, getting

F = 2·TP / (2·TP + FP + FN),

where TP, FP and FN are the counts of true positives, false positives and false negatives. Nicer, but there is also 2·TP + FP + FN = sum(y) + sum(p), so that

F = 2·sum(y & p) / (sum(y) + sum(p)),

which translates into this sweet one-liner:

Fscore <- function(y, p) 2 * sum(y & p) / (sum(y) + sum(p))
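As a quick sanity check on the algebra (using the same kind of hypothetical logical vectors), the harmonic-mean form and the one-liner agree:

```r
precision <- function(y, p) sum(y & p) / sum(p)
recall <- function(y, p) sum(y & p) / sum(y)
Fscore <- function(y, p) 2 * sum(y & p) / (sum(y) + sum(p))

# Hypothetical toy data
y <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
p <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)

pr <- precision(y, p)
re <- recall(y, p)
2 * pr * re / (pr + re)  # harmonic mean of precision and recall: 2/3
Fscore(y, p)             # same value via the one-liner: 2/3
```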
Sometimes, especially in sparse cases (when there are few positive objects), we only have vectors of names, indices or IDs of the actual positives and of the positive predictions; then the %in% operator becomes very handy:
precisionS <- function(Y, P) mean(P %in% Y)
recallS <- function(Y, P) mean(Y %in% P)
FscoreS <- function(Y, P) mean(c(Y, P) %in% intersect(P, Y))
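For instance, with hypothetical object IDs:

```r
precisionS <- function(Y, P) mean(P %in% Y)
recallS <- function(Y, P) mean(Y %in% P)
FscoreS <- function(Y, P) mean(c(Y, P) %in% intersect(P, Y))

Y <- c("a", "b", "c")       # hypothetical IDs of actual positives
P <- c("b", "c", "d", "e")  # hypothetical IDs of positive predictions

precisionS(Y, P)  # 2 hits among 4 predictions -> 0.5
recallS(Y, P)     # 2 of 3 positives found -> 2/3
FscoreS(Y, P)     # 2*TP/(|Y|+|P|) = 4/7
```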
The latter form also gives a nice interpretation of F-score: one can say it measures to what extent the set of actual positives and the set of positive predictions overlap. Consequently, it is obvious that F is invariant to true negatives (i.e., correctly predicted negative objects); one may add as many of them as one wishes and F will stay the same, while measures like accuracy or AUROC will start to converge to 1. This may be either good or bad, depending on the use case. Generally, F is good for detection problems, where there are trivial negatives and one looks for outliers or objects following a certain pattern.
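This invariance is easy to see empirically (again on hypothetical data): padding both vectors with correctly predicted negatives leaves F untouched while accuracy climbs towards 1:

```r
Fscore <- function(y, p) 2 * sum(y & p) / (sum(y) + sum(p))
accuracy <- function(y, p) mean(y == p)

# Hypothetical toy data
y <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
p <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)

# Pad with 100 true negatives (FALSE in both vectors)
y2 <- c(y, rep(FALSE, 100))
p2 <- c(p, rep(FALSE, 100))

Fscore(y, p)        # 2/3
Fscore(y2, p2)      # still 2/3
accuracy(y, p)      # 2/3
accuracy(y2, p2)    # ~0.98, inflated by the trivial negatives
```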
All three measures critically rely on proper labelling of classes, both in y and p; a trashy F value often means this is messed up.
In contrast to AUROC, there is no easy formula for the score after flipping classes; it depends on the dataset size and the fraction of positive objects.
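To see this concretely, here are two hypothetical problems with the same F whose class-flipped versions (both vectors negated) score differently, so the flipped F cannot be a function of F alone:

```r
Fscore <- function(y, p) 2 * sum(y & p) / (sum(y) + sum(p))

# Two hypothetical problems with the same F...
y1 <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
p1 <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
y2 <- c(TRUE, FALSE, FALSE, FALSE, FALSE)
p2 <- c(TRUE, TRUE, FALSE, FALSE, FALSE)

Fscore(y1, p1)    # 2/3
Fscore(y2, p2)    # 2/3

# ...but different F after flipping classes
Fscore(!y1, !p1)  # 2/3
Fscore(!y2, !p2)  # 6/7
```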
You’ll find the code here.