The Hutchinson trick

Continuous normalizing flows (more in a blog post to come) heavily rely on the computation of the divergence of a network \(f: \bbR^d \to \bbR^d\), aka the trace of its Jacobian:

\[\operatorname{div} f(x) = \tr\left(\frac{\partial f}{\partial x}(x)\right) = \sum_{i=1}^d \frac{\partial f_i}{\partial x_i}(x).\]

Computing this quantity would require evaluating the full Jacobian, which can be done at the prohibitive cost of \(d\) backpropagations (more details below). It turns out that the trace/divergence can be approximated reasonably well with a single call to backpropagation.

For that, let’s forget about Jacobians for a second and take a generic matrix \(A \in \bbR^{d \times d}\). The Hutchinson trick states that for any random variable \(z \in \bbR^d\) such that \(\bbE[zz^\top] = \Id_d\),

\begin{equation}\label{eq:hutchinson}
\tr(A) = \bbE_z [z^\top A z].
\end{equation}

It is typically used with \(z\) having iid entries of mean zero and unit variance, classically standard Gaussian or Rademacher. The proof is very simple:

\[\bbE_z[z^\top A z] = \bbE_z[\tr(z^\top A z)] = \bbE_z[\tr(A z z^\top)] = \tr\left(A\, \bbE_z[z z^\top]\right) = \tr(A \Id_d) = \tr(A),\]

using that a scalar equals its own trace, and that the trace is linear and invariant under cyclic permutations.
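As a quick sanity check, here is a minimal numerical sketch of the estimator on a dense matrix (this assumes JAX; the matrix \(A\), the Rademacher probes and the number of samples are arbitrary illustrative choices, not taken from the post):

```python
import jax
import jax.numpy as jnp

d, n_samples = 50, 10_000
key_A, key_z = jax.random.split(jax.random.PRNGKey(0))

A = jax.random.normal(key_A, (d, d))                             # a generic dense matrix
z = jax.random.rademacher(key_z, (n_samples, d), dtype=A.dtype)  # entries +-1, E[z z^T] = Id_d

# Monte Carlo average of z^T A z over the probe vectors
estimates = jnp.einsum("ni,ij,nj->n", z, A, z)
print(jnp.trace(A), estimates.mean())                            # the two should be close
```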

The numerical benefit is not obvious at first: if one has access to \(A \in \bbR^{d \times d}\), computing its trace costs \(\mathcal{O}(d)\), while the above formula requires \(\mathcal{O}(d^2)\) operations to compute \(z^\top A z\), not to mention multiple Monte Carlo samples for the expectation. But there are cases in which one wants to compute the trace of a matrix without having explicit access to said matrix!
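The Jacobian is exactly such a case: with reverse-mode autodiff, \(z^\top \frac{\partial f}{\partial x}\) is a single vector-Jacobian product, i.e. one backpropagation, so the divergence can be estimated without ever materializing the \(d \times d\) Jacobian. Below is a sketch of that idea (again assuming JAX; the toy function `f` and the sample count are placeholders, not the network discussed in the post):

```python
import jax
import jax.numpy as jnp

def f(x):
    # toy stand-in for a network R^d -> R^d
    return jnp.tanh(x) * jnp.sin(x).sum()

def hutchinson_divergence(f, x, key, n_samples=16):
    """Estimate tr(df/dx) with one backpropagation per probe vector."""
    _, vjp_fn = jax.vjp(f, x)                        # prepares z |-> z^T (df/dx)
    z = jax.random.rademacher(key, (n_samples, x.shape[0]), dtype=x.dtype)
    vjps = jax.vmap(lambda zi: vjp_fn(zi)[0])(z)     # each row is z^T (df/dx)
    return jnp.einsum("ni,ni->n", vjps, z).mean()    # average of z^T (df/dx) z

x = jnp.ones(5)
print(hutchinson_divergence(f, x, jax.random.PRNGKey(1)))
print(jnp.trace(jax.jacobian(f)(x)))                 # exact divergence, needs the full Jacobian
```

The point of the sketch is that `vjp_fn` only ever applies \(z^\top \frac{\partial f}{\partial x}\): the Jacobian itself is never formed, and each probe costs one backward pass.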
