Chaitanya K. Joshi's excellent blog post on his PhD work inspired me to write down some thoughts on symmetries and invariances in neural network models—particularly in the context of molecular data. Discussions around removing architectural biases can lean a little absolutist ("Bitter Lesson, GPUs go brrr!"), whereas the reality is more nuanced. I am actually not so opposed to the idea of encoding physical priors, but I am interested in when such symmetries should be embedded directly into the model architecture, and when they might be better handled implicitly: through data augmentation, regularization or inference-time tricks.
This post represents some of my current thinking on the topic, through the lens of two modeling domains in which the approaches to encoding symmetries are very different: neural network potentials and diffusion models.
When modeling image data, there are many desirable invariances one can imagine (translation being the predominant one, baked into the most common image architecture, the CNN). Some of these invariances have questionable utility in practice, or do not match the statistics of a natural data distribution. Rotation of images fits this mold well: in theory, a classifier's ability to accurately predict an image attribute should not depend strongly on the image's orientation. But requiring this invariance property removes a lot of information about the spatial distribution of colours and objects in images (skies are typically blue, and typically at the top of an image). If a model learns this bias from the data, is it incorrect?
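To make the "implicit" route from above concrete, here is a minimal sketch (not from the original post) of how rotation invariance is typically encouraged via data augmentation rather than baked into the architecture, assuming a standard PyTorch/torchvision setup; the rotation range is an arbitrary illustrative choice:

```python
import torch
import torchvision.transforms as T
from PIL import Image

# Hypothetical augmentation pipeline: each training image is rotated by a random
# angle, so the classifier sees many orientations of the same content and is
# encouraged (but never hard-constrained) to become approximately invariant.
train_transform = T.Compose([
    T.RandomRotation(degrees=30),  # sample an angle in [-30, 30] degrees per image
    T.ToTensor(),                  # convert to a (C, H, W) float tensor in [0, 1]
])

# Dummy image to show the transform is runnable end to end.
img = Image.new("RGB", (64, 64))
x = train_transform(img)  # randomly rotated tensor of shape (3, 64, 64)
```

Note that the rotation range is a dial: a modest range nudges the model toward invariance while still letting it exploit statistics like "skies sit at the top of the image", whereas a full 360° range pushes it toward the stricter invariance that an equivariant architecture would enforce by construction.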