Suppose you train a neural net on cat pictures to classify the breed of cat. We'd like the property that if we were to feed in a picture of a horse instead of a cat, we could somehow measure how good the network's parameters are for classifying this particular image. This is uncertainty estimation, and Yarin's blog post + thesis provide an elegant way to compute it, which we get nearly for free from the existing model.
Concretely, if you are trying to train a neural net to forecast stock prices or drive a car safely, you not only want predictions, you want some measure of how confident your model is in each prediction. This is eminently useful for models that lean towards the "black-box" end of the spectrum, such as deep neural nets.
I just started trying to learn machine learning from the ground up in my free time. I'm still trying to work out the Chernoff bound. So you see how much of a noob I am.
But does this basically mean that I can have a model trained only on cat pictures, and it can still tell me, with some measure of certainty, that a picture of a horse is not a cat, all without training the model to answer specifically "is this a cat?"
What's the verdict on this? Does Dropout do parameter uncertainty or risk estimation? Gal seems to be claiming the first, while the paper you linked claims the second.
This seems to be a novel application of dropout for uncertainty. The author's 2015 post linked by matheweis [0] gives an approachable walkthrough:
> I think that's why I was so surprised that dropout – a ubiquitous technique that's been in use in deep learning for several years now – can give us principled uncertainty estimates. Principled in the sense that the uncertainty estimates basically approximate those of our Gaussian process. Take your deep learning model in which you used dropout to avoid over-fitting – and you can extract model uncertainty without changing a single thing. Intuitively, you can think about your finite model as an approximation to a Gaussian process. When you optimise your objective, you minimise some "distance" (KL divergence to be more exact) between your model and the Gaussian process. I'll explain this in more detail below. But before this, let's recall what dropout is and introduce the Gaussian process quickly, and look at some examples of what this uncertainty obtained from dropout networks looks like.
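For anyone who wants to see what that looks like in practice, here is a rough sketch (my own illustration, not code from the post; the layer sizes and the `mc_dropout_predict` name are made up) of the Monte Carlo dropout recipe described above: keep dropout switched on at test time, push the same input through the network many times, and treat the spread of the outputs as the uncertainty estimate.

```python
import torch
import torch.nn as nn

# A toy classifier with dropout; layer sizes here are made up.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(64, 10),
)

def mc_dropout_predict(model, x, n_samples=50):
    """Run several stochastic forward passes with dropout left on,
    then summarise the mean and spread of the predictions."""
    model.train()  # train() keeps dropout active, even though we aren't training
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )
    return probs.mean(dim=0), probs.std(dim=0)

x = torch.randn(1, 128)           # stand-in for an input feature vector
mean, std = mc_dropout_predict(model, x)
print(mean, std)                  # std is the dropout-based uncertainty estimate
```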
Note that parameter uncertainty and risk estimation are quite different; the distinction is addressed in this preliminary work: http://bayesiandeeplearning.org/papers/BDL_4.pdf
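One rough way to see the difference in code (my own illustration in a regression setting, not taken from the linked paper; the net and the `decompose_uncertainty` name are invented): risk is something the network can be trained to predict directly as an output, while parameter uncertainty shows up as disagreement between stochastic dropout passes.

```python
import torch
import torch.nn as nn

# Hypothetical regression net whose two outputs are a predicted mean and a
# predicted log-variance (the "risk"/noise part). Layer sizes are made up.
net = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(32, 2)
)

def decompose_uncertainty(net, x, n_samples=100):
    net.train()  # keep dropout on so the passes disagree
    with torch.no_grad():
        outs = torch.stack([net(x) for _ in range(n_samples)])  # (S, B, 2)
    means, log_vars = outs[..., 0], outs[..., 1]
    epistemic = means.var(dim=0)            # disagreement across passes: parameter uncertainty
    aleatoric = log_vars.exp().mean(dim=0)  # average predicted noise: risk
    return epistemic, aleatoric

x = torch.randn(4, 16)
epistemic, aleatoric = decompose_uncertainty(net, x)
```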