Bayesian Neural Networks

Bayesian inference allows us to learn a probability distribution over possible neural networks. We can perform approximate inference with a simple modification to standard neural network tools. The resulting algorithm mitigates overfitting, enables learning from small datasets, and tells us how uncertain our predictions are.
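The article has not yet said which modification it has in mind; as one concrete possibility, the sketch below uses Monte Carlo dropout (Gal and Ghahramani, 2016), which approximates Bayesian inference simply by keeping dropout active at prediction time. The class and function names are illustrative, not taken from the article.

```python
import torch
import torch.nn as nn

class MCDropoutNet(nn.Module):
    """An ordinary dropout network; nothing Bayesian about its definition."""
    def __init__(self, in_dim=1, hidden=64, out_dim=1, p=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def predict_with_uncertainty(model, x, n_samples=100):
    """Run many stochastic forward passes with dropout left on; each pass
    samples a different thinned subnetwork, so the spread of the outputs
    serves as a rough uncertainty estimate."""
    model.train()  # keeps dropout active at prediction time
    preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)

# Usage: the std tells us how uncertain the prediction is at each input.
model = MCDropoutNet()
mean, std = predict_with_uncertainty(model, torch.randn(5, 1))
```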

You may have heard deep neural networks described as powerful function approximators. Their power is due to the extreme flexibility of having many model parameters (the weights and biases) whose values can be learned from data via gradient-based optimization. Because they are good at approximating functions (input-output relationships) when lots of data are available, neural networks are well-suited to artificial intelligence tasks like speech recognition and image classification.
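To make "learned from data via gradient-based optimization" concrete, here is a minimal sketch of the kind of standard (non-Bayesian) training loop the article is describing. The toy data and architecture are assumptions for illustration, not the article's.

```python
import torch
import torch.nn as nn

# A small network: its weights and biases are the model parameters.
model = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.linspace(-1, 1, 100).unsqueeze(1)       # toy inputs
y = torch.sin(3 * x) + 0.1 * torch.randn_like(x)  # noisy toy targets

for step in range(1000):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)  # objective measured on the training set
    loss.backward()              # gradients w.r.t. every weight and bias
    optimizer.step()             # nudge each parameter downhill
```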

But the extreme flexibility of neural networks has a downside: they are particularly vulnerable to overfitting. Overfitting happens when the learning algorithm does such a good job of tuning the model parameters for performance on the training set—by optimizing its objective function—that the performance on new examples suffers. Deep neural networks have a ton of parameters (typically millions in modern models), which essentially guarantees eventual overfitting because the learning algorithm can always do just a little bit better on the training set by tweaking some of the many knobs available to it. The flexibility of neural networks during training time actually makes them brittle at test time. This might sound surprising at first, but let's look at the training procedure both mathematically and graphically (for a toy problem) to build some intuition around why deep neural networks overfit.
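As a preview of that toy-problem intuition, the sketch below drives an overparameterized network to ever-lower training loss on a small noisy dataset while monitoring loss on held-out data. The dataset, network size, and training schedule are assumptions for illustration, not the article's actual experiment.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny noisy dataset: with far more parameters than data points,
# the network can keep improving on the training set by fitting noise.
x_train = torch.rand(20, 1) * 2 - 1
y_train = torch.sin(3 * x_train) + 0.3 * torch.randn_like(x_train)
x_test = torch.rand(200, 1) * 2 - 1
y_test = torch.sin(3 * x_test) + 0.3 * torch.randn_like(x_test)

model = nn.Sequential(nn.Linear(1, 256), nn.Tanh(), nn.Linear(256, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(5001):
    opt.zero_grad()
    train_loss = loss_fn(model(x_train), y_train)
    train_loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            test_loss = loss_fn(model(x_test), y_test)
        # Training loss keeps shrinking; test loss typically stalls,
        # then climbs as the network memorizes the noise.
        print(f"step {step:5d}  train {train_loss.item():.4f}  "
              f"test {test_loss.item():.4f}")
```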
