Transfer learning and pre-training schemas for both NLP and Computer Vision have gained a lot of attention in recent months. Research has shown that carefully designed unsupervised/self-supervised training can produce high-quality base models and embeddings that greatly decrease the amount of data needed to obtain good classification models downstream. This approach becomes more and more important as companies collect large amounts of data, of which only a fraction can be labelled by humans - either due to the high cost of the labelling process or due to time constraints.
Here I explore the SimCLR pre-training framework proposed by Google in this arXiv paper. I will explain SimCLR and its contrastive loss function step by step, starting with a naive implementation in PyTorch, followed by a faster, vectorized one. Then I will show how to use SimCLR’s pre-training routine to first build image embeddings using the EfficientNet network architecture, and finally I will show how to build a classifier on top of them.
In general, SimCLR is a simple framework for contrastive learning of visual representations. It is not a new deep learning framework, but rather a fixed set of steps one should follow in order to train image embeddings of good quality. I drew a schema which explains the flow and the whole representation learning process (click to zoom).
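To give a flavour of the contrastive objective before the step-by-step walkthrough, here is a minimal sketch of an NT-Xent-style loss in PyTorch. The function name `nt_xent_loss` and the default temperature are my own choices for illustration; the batching and vectorization details are covered in detail later.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z_i: torch.Tensor, z_j: torch.Tensor,
                 temperature: float = 0.5) -> torch.Tensor:
    """Sketch of the normalized temperature-scaled cross-entropy loss.

    z_i, z_j: [batch, dim] embeddings of two augmented views of the
    same batch of images (row k of z_i and row k of z_j form a
    positive pair; everything else in the batch is a negative).
    """
    batch_size = z_i.shape[0]
    # Stack both views into 2N rows and L2-normalize, so that the
    # dot product between rows equals cosine similarity.
    z = F.normalize(torch.cat([z_i, z_j], dim=0), dim=1)
    # Pairwise similarity matrix, scaled by the temperature.
    sim = z @ z.T / temperature
    # Mask out self-similarity on the diagonal so a sample is never
    # treated as its own positive.
    sim.fill_diagonal_(float("-inf"))
    # For row k, the positive example is the other view of the same
    # image: row k + N for the first half, row k - N for the second.
    targets = torch.cat([
        torch.arange(batch_size) + batch_size,
        torch.arange(batch_size),
    ])
    # Cross-entropy over the similarity rows pulls positives together
    # and pushes all other (negative) pairs apart.
    return F.cross_entropy(sim, targets)
```

The key design choice, explained further below, is that every other image in the batch serves as a negative example, so no explicit negative mining is needed.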