

This repository explores finetuning the DINOv2 (Oquab et al., 2024) encoder weights using Low-Rank Adaptation (LoRA) (Hu et al., 2021) and a simple 1x1 convolution decoder. LoRA makes it easier to finetune to new tasks without adjusting the original encoder weights: a small set of trainable low-rank weights is added to each encoder block. The DINOv2 encoder weights are learned by self-supervised learning and capture the natural image domain accurately. For example, by simply applying PCA to the encoder outputs we already get a coarse segmentation of the objects in an image, with semantically similar objects colored the same.
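The LoRA idea is small enough to sketch in a few lines. The snippet below is a minimal, self-contained illustration of the principle (a frozen linear layer plus a trainable low-rank update), not this repository's actual implementation; the rank and scaling values are illustrative.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Wraps a frozen linear layer and adds a trainable low-rank update B @ A.
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # original encoder weights stay fixed
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen projection plus the scaled low-rank correction.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

# Example: wrap the qkv projection of one attention block with the LoRA adapter.
qkv = nn.Linear(768, 3 * 768)
qkv_lora = LoRALinear(qkv, rank=4)

Only the lora_a and lora_b matrices (and the decoder) need to be trained and saved, which keeps the per-task weight files small.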

On Pascal VOC I achieve a validation mean IoU of approximately 85.2% using LoRA and a 1x1 convolution decoder. When applying ImageNet-C corruptions (Hendrycks & Dietterich, 2019) to test robustness on Pascal VOC, the validation mean IoU drops to 72.2% at corruption severity level 5 (the maximum). The qualitative performance of this network is illustrated in the figure below. Based on their qualitative and quantitative performance, these pre-trained weights handle image corruptions effectively.
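For reference, such corruptions can be generated with the imagecorruptions package, one common implementation of the ImageNet-C corruption set; the snippet below is only a sketch of corrupting an evaluation image at severity 5, not this repository's exact evaluation pipeline.

import numpy as np
from PIL import Image
from imagecorruptions import corrupt  # pip install imagecorruptions

# Load a validation image as an HxWx3 uint8 array.
image = np.array(Image.open("voc_example.jpg").convert("RGB"))

# Apply one corruption type at the maximum severity before running the model.
corrupted = corrupt(image, corruption_name="gaussian_noise", severity=5)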

You can load the pre-trained weights with the --lora_weights flag or via the load_parameters function call. Registers here mean that extra global context tokens are learned; see the second reference. All models are finetuned for 100 epochs.
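As a rough illustration of what loading looks like in PyTorch, the sketch below restores only the LoRA and decoder parameters on top of a frozen backbone; the checkpoint path, the model constructor, and the exact load_parameters signature in this repository may differ.

import torch

# Hypothetical model with a frozen DINOv2 backbone, LoRA adapters, and a 1x1 conv decoder.
model = build_segmentation_model()  # placeholder for the repository's model constructor

# The checkpoint stores only the LoRA and decoder weights, so the backbone keys
# are absent and strict=False is required when loading.
state = torch.load("lora_weights.pt", map_location="cpu")
model.load_state_dict(state, strict=False)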
