Continuing the trip into the land of accuracy measures, today we’ll visit the precision, recall and F-score trio.
In R, assuming that
y is a logical vector with true classes and
p is a logical vector with predictions, we have:
precision<-function(y,p) sum(y&p)/sum(p)
recall<-function(y,p) sum(y&p)/sum(y)
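A quick sanity check on some made-up toy vectors (mine, just for illustration):

y<-c(TRUE,TRUE,TRUE,FALSE,FALSE,FALSE)  # three real positives
p<-c(TRUE,TRUE,FALSE,TRUE,FALSE,FALSE)  # three positive predictions, two of them hits
precision(y,p)  # 2 hits / 3 predicted positives -> 0.667
recall(y,p)     # 2 hits / 3 real positives -> 0.667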
While F-score is defined as a harmonic mean of precision and recall, it is usually calculated as such:

$$F=\frac{2}{\frac{1}{\mathrm{precision}}+\frac{1}{\mathrm{recall}}};$$

fortunately we can do some algebra, getting

$$F=\frac{2\cdot\mathrm{precision}\cdot\mathrm{recall}}{\mathrm{precision}+\mathrm{recall}}.$$

Nicer, but there is also $\mathrm{precision}\cdot\mathrm{sum}(p)=\mathrm{recall}\cdot\mathrm{sum}(y)=\mathrm{sum}(y\,\&\,p)$, so that

$$F=\frac{2\cdot\mathrm{sum}(y\,\&\,p)}{\mathrm{sum}(y)+\mathrm{sum}(p)},$$

which translates into this sweet one-liner:

Fscore<-function(y,p) 2*sum(y&p)/(sum(y)+sum(p))
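On the toy vectors from above, the one-liner indeed agrees with the harmonic-mean definition (a sketch, assuming the functions defined earlier):

Fscore(y,p)                         # 0.667
2/(1/precision(y,p)+1/recall(y,p))  # 0.667 as well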
Sometimes, especially in sparse cases (when there are few positive objects), we only have vectors of names, indices or ids of the real positives and of the positive predictions; then the
%in% operator becomes very handy:
precisionS<-function(Y,P) mean(P%in%Y)
recallS<-function(Y,P) mean(Y%in%P)
FscoreS<-function(Y,P) mean(c(Y,P)%in%intersect(P,Y))

The latter is also a nice interpretation of F-score; one can say it measures to what extent the sets of positives and positive predictions overlap. In this light, it is obvious that F is invariant to true negatives (i.e., correctly predicted negative objects); one may add as many of them as one wishes, and F will stay the same, while measures like accuracy or AUROC will start to converge to 1. This may be either good or bad, depending on the use-case; generally F is good for detection problems, where negatives are trivial and abundant, and one looks for outliers or objects following a certain pattern.
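For instance, on hypothetical id vectors, and with a throwaway accuracy function (my addition, not part of the trio) to make the true-negative invariance concrete:

Y<-c("a","b","c"); P<-c("b","c","d")
FscoreS(Y,P)  # 0.667; four of the six ids lie in the overlap {b,c}

accuracy<-function(y,p) mean(y==p)
y2<-c(y,rep(FALSE,100)); p2<-c(p,rep(FALSE,100))  # pad with 100 trivial negatives
Fscore(y,p); Fscore(y2,p2)       # both 0.667 -- F ignores true negatives
accuracy(y,p); accuracy(y2,p2)   # 0.667 vs ~0.98 -- accuracy drifts towards 1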
All three measures critically rely on proper labelling of classes, both in
y and in p; a trashy F value often means this is messed up.
In contrast to AUROC, where flipping predictions simply yields one minus the original score, there is no easy flipping formula for F; the result depends on the dataset size and the fraction of positive objects.
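To see it on the toy data from above: the unpadded and padded sets have exactly the same F, yet flipping the predictions with ! lands in quite different places:

Fscore(y,!p)    # 0.333
Fscore(y2,!p2)  # ~0.019, although Fscore(y,p) equals Fscore(y2,p2)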
You’ll find the code here.