Let’s imagine a tribe of nomads, crossing a seemingly endless desert. They are far from their original settlement, heading towards a secret oasis, a route to which was only very recently rediscovered by their chief in an ancient, sacred scroll. The discovery of such an importance made the chief paranoid, so he burned the scroll, shared its contents with no-one and was compulsively changing directions of their march to conceal its true azimuth.
Suddenly, just a week before the end of the journey, the chief dies due to an unfortunate camel kick, leaving his tribe in a hopeless position: their water reserves are almost exhausted (so they can’t scan the entire week-of-journey circle around their current position) and their tradition forbids them to scatter around and let everyone try his own luck. Fortunately, a fraction of nomads, say , managed to look through the chief’s deception; some have observed the stars, some have managed to overhear him thinking aloud, some have been sensing the wind… Nevertheless, the nomad pride made those who don’t know the right direction claim that they are perfectly sure that some random direction is what they feel is right, while those how do know claim that their preferred direction is also a gut feeling rather than a proof that they were trying to outsmart their chief. Thus, they stay around their former chief corpse pointing around and arguing, while the precious time is running…
Is the tribe doomed then? Let’s consider some scenarios of how to proceed. Option one is to just throw a dagger and go wherever it will randomly point; option two is to select a random nomad and to follow his direction; option three is to vote by taking a vector average of all directions and go that way.
Let’s perform a simple simulation to see how they go, namely how far from the oasis are they after a week of journey in the agreed direction; because the world is not ideal, let’s also add a normal noise with to the predictions of nomads knowing the right direction. The code to reproduce is here, and here is the result (for sake of that plot, boxplot outliers are hidden): Naturally, the worse outcome is to go the very wrong way and end being twice as far as on the beginning; it is easy to show that baseline is on average weeks (with median ), not really good. Trusting random nomad is not much better up to , while voting immediately starts to pay off: with 500 nomads merely 5 per cent of them (25) knowing the right direction reduces the average distance to about 3 days, and for 30 per cent there is more than 3 in 4 chance that the final distance will be smaller than a single day. With a tribe of 5 000 nomads it’s even better – 5 per cent gives a single day median and 30 per cent practically guarantees that the oasis would be closer than that; consequently, it pays off to even trade lower for higher . It is also worth noticing that even for large an ensemble has substantial advantages; first of all it wipes the noise from the knowing nomads’ predictions, which disallows single nomad solution to converge to zero. Moreover, it stabilises the result; obviously even with a single nomad not knowing the right direction there is a change that he will be selected and show the opposite direction spoiling everything, ensemble approach basically guarantees to rule that out.
The mechanism of this is that random votes just have to cancel out because were independently, randomly generated in an uniform space giving no hook for bias; on the other hand the truth is only one and thus becomes amplified with each good vote, quickly overtaking the noise (simulation for and ):
This observation quickly suggests how to easily break this system; let’s imagine that we have one celebrity nomad who will influence other nomads to bias their random predictions towards his, and naturally has no idea where the oasis really is. Such correlation is indistinguishable from the impact of truth in this framework, and the final outcome becomes useless even when this bias is not substantial:
Probably the most known original analysis of such problem is the Condorcet’s theorem back from 1785, revolving from a popular question why does democracy work if people are mostly idiots. Unfortunately Condorcet could only prove that for uncorrelated idiots, which is painfully visible in the age of TV and filter bubbles 😉
In machine learning, the idea of voting form a base for a family of ensemble classifiers, including the very popular Random Forest. How so? Basically, it trivial to build a classifier that can perfectly predict virtually any given set, while it is fairly hard to confidently make it a robust representation of the truth rather than just an overfitted nonsense. Still, the nonsense part is not some kind of active insubordination from the model; it can only come from overgeneralising certain patterns, thus it is effectively guessing. Consequently, building a bunch of models and combining their predictions by voting should make the nonsense cancel itself out; the only catch is how to build independent models to avoid the celebrity effect. Turns out it is easy to achieve by spoiling the member models, either by randomly limiting the amount of training data or disturbing the optimisation process creating them, i.e., making it settle on somewhat suboptimal solutions. The most popular implementations of the first idea are bagging and random subpace; Random Forest uses a combination of both, and it turns out enough to transform a set of abysmal, always overfitting unpruned decision trees into something really useful.