Note: This is a continuation of the previous post: Thoughts on Riemannian metrics and its connection with diffusion/score matching [Part I], so if you haven’t read it yet, please consider reading it first, as I won’t re-introduce in depth the concepts (e.g., the two scores) that I already described there. This article became a bit long, so if you are already familiar with metric tensors and differential geometry, you can skip the first part.
I was planning to write a paper about this topic, but I don’t have much spare time, so I decided it would be much more fun and educative to write this article in the form of a tutorial. If you liked it, please consider citing it:
I’m writing this second part of the series because I couldn’t find any formalisation of the metric tensor that naturally arises from the Stein score (especially when used with learned models), let alone blog posts or articles about it. This is surprising given the deep connection between score-based generative models, diffusion models, and the geometry of the data manifold. I think there is an emerging field of “data geometry” that will be as impactful as information geometry, where the Stein score’s “counterpart”, the Fisher information, is used to construct the Fisher information metric: the metric tensor of the statistical manifold. (A fun fact: I used to live very close to where Fisher spent his childhood in north London.) It is very unfortunate, though, that the term “Geometric Deep Learning” has become synonymous with Deep Learning on graphs when there is so much more of it to be explored.
As you will see later, this metric tensor opens up new possibilities for defining data-driven geometries that adapt to the data manifold. One thing that became clear to me is that score-based generative models tell data how to move, and data tells score-based generative models how to curve. Given the connection between score-based models and diffusion, this is a very exciting area to explore, with many interesting connections and the possibility of using the entire framework of differential geometry, which gives a quite unique perspective on these models and what they are learning.
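To make the idea of a score-derived, data-driven geometry a bit more concrete before diving in, here is a minimal sketch. It assumes a distribution whose Stein score is known in closed form (a diagonal Gaussian) and uses a hypothetical rank-one construction of a metric tensor from the score; this is only an illustration of the general idea, not the specific construction developed later in the series.

```python
import numpy as np

def stein_score(x, mu, var):
    # Stein score of a diagonal Gaussian N(mu, diag(var)):
    # s(x) = grad_x log p(x) = -(x - mu) / var.
    return -(x - mu) / var

def score_metric(x, mu, var):
    # Hypothetical illustration: build a position-dependent metric
    # tensor from the score, G(x) = I + s(x) s(x)^T.
    # G is symmetric positive definite by construction, so it is a
    # valid Riemannian metric that "curves" where the score is large.
    s = stein_score(x, mu, var)
    return np.eye(x.shape[0]) + np.outer(s, s)

x = np.array([1.0, -2.0])
G = score_metric(x, mu=np.zeros(2), var=np.ones(2))
# At x, s(x) = [-1, 2], so G = [[2, -2], [-2, 5]].
```

The key point of the sketch is simply that the score field induces a different inner product at every point of the data space, which is what lets us talk about geodesics, curvature, and distances adapted to the data manifold.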