I am, once again, in a bit of a mood. And the only thing that will fix my mood is a good martini and a Laplace approximation. And I’m all out of martinis.

To be honest I started writing this post in February 2023, but then got distracted by visas and jobs and all that jazz. But I felt the desire to finish it, so here we are. I wonder how much I will want to re-write it.1

The post started as a pedagogical introduction to Laplace approximations (for reasons I don’t fully remember), but it rapidly went off the rails. So strap yourself in2 for a tour through the basics of sparse autodiff and through manipulating the jaxpr intermediate representation, in order to make one very simple logistic regression produce autodiff code that is almost as fast as a manually programmed gradient.

One of the simplest approximations to a distribution is the Laplace approximation. It can be defined as the Gaussian distribution that matches the location and the curvature at the mode of the target distribution. It lives its best life when the density is of the form \[ p(x) \propto \exp(-nf_n(x)), \] where \(f_n\) is a sequence of functions3. Let’s imagine that we want to approximate the normalized density \(p(x)\) near the mode \(x^*\). We can do this by taking the second-order Taylor expansion of \(f_n\) around \(x = x^*\), which is \[ f_n(x) = f_n(x^*) + \frac{1}{2}(x-x^*)^T H(x^*)(x-x^*) + \mathcal{O}(\|x-x^*\|^3), \] where the first-order term vanishes because the gradient of \(f_n\) is zero at the mode, and4 \[ [H(x^*)]_{ij} = \frac{\partial^2 f_n}{\partial x_i \partial x_j}(x^*) \] is the Hessian matrix.
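To make this concrete, here is a minimal sketch of the recipe in JAX (the library the rest of the post uses): find the mode of a negative log-density with a few Newton steps, then read the Gaussian approximation off the Hessian at the mode. The Gamma(3, 2) target and the Newton loop are my own illustrative choices, not anything from the post.

```python
import jax
import jax.numpy as jnp

# Illustrative 1-D target: an unnormalised Gamma(shape=3, rate=2) density.
# f is the negative log-density; the Laplace approximation is the Gaussian
# N(x_star, H(x_star)^{-1}), with x_star the mode and H the Hessian of f.
def f(x):
    shape, rate = 3.0, 2.0
    return -((shape - 1.0) * jnp.log(x) - rate * x)

grad_f = jax.grad(f)      # first derivative of f
hess_f = jax.hessian(f)   # second derivative of f

# A few Newton steps to find the mode (f is smooth and convex near it).
x = 1.5
for _ in range(20):
    x = x - grad_f(x) / hess_f(x)

mean = x               # mode of the target: (shape - 1) / rate = 1.0
var = 1.0 / hess_f(x)  # inverse Hessian at the mode: 0.5
```

For a Gamma(3, 2) the mode is exactly 1 and the curvature there is 2, so the approximation is N(1, 0.5); the true variance is 0.75, which gives a feel for how rough (or not) the approximation can be.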

Read more at dansblog.net.