In this post I'm looking into a paper the authors of which promise to make diffusion models simple to understand and implement, called Iterative α-(d

Iterative α-(de)blending and Stochastic Interpolants

submited by

Style Pass

2024-11-06 10:00:06

In this post I'm looking into a paper the authors of which promise to make diffusion models simple to understand and implement, called Iterative α-(de)blending1, and find out that this promise is only partially fulfilled, at least personally. I reproduce the algorithm from the paper and apply it to the generation of MNIST digits, like I did in the previous series of posts, and find out that something is missing. As the title of the post reveals, we might find the missing ingredient in Stochastic interpolants.

The authors of the paper, like many others, found the topic of diffusion models difficult to enter. Usually, diffusion models involve many difficult concepts used from probability theory, stochastic differential equations, etc., and the fact that diffusion models can be approached in so many different ways doesn't help either. This was also my motivation for writing the blog series in diffusion models — although I'm not sure I was able to make the topic much more approachable. Thus, they set out to derive a simple model using only bachelor-level concepts. So let's have a closer look!

As you might remember from the posts on diffusion models, the whole goal we are trying to achieve, is finding a mapping between two probability distributions, so that we can e.g. sample from a simple one, like a Gaussian, and map it onto a more complex one, like the distribution of MNIST digits, or a certain dataset of images. The distributions are generally complicated and many-dimensional, which makes it hard to visualize. To facilitate the discussion, I will give examples of distributions on the 2D plane. Thus, for example, we might have distributions \(p_0, p_1\), corresponding to a uniformly sampled triangle and rhombus, respectively. In the image above, you can see a depiction of a possible mapping between these two distributions, as lines or paths connecting samples from \(p_0\) and \(p_1\). The paper spends some time introducing a way for calculating this mapping deterministically as a series of two basic operations they call \(\alpha\)-blending and \(\alpha\)-deblending. As the name suggests, these two operations are inspired by the blending of two images, e.g. alpha blending, where the blended image is a result of linearly interpolating between two images based on their alpha values.