This repository contains lightweight training code for MusicGen, a state-of-the-art controllable text-to-music model. MusicGen is a single-stage auto-regressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz.
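For reference, generating audio with the base model looks roughly like the following sketch using the transformers API; the prompt, token count, and output path are only illustrative.

```python
from transformers import AutoProcessor, MusicgenForConditionalGeneration
from scipy.io import wavfile

# Load the small MusicGen checkpoint and its text/audio processor.
processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

# Tokenize a text prompt describing the music to generate.
inputs = processor(
    text=["lo-fi hip hop beat with mellow piano"],
    padding=True,
    return_tensors="pt",
)

# Generate roughly 5 seconds of audio: the EnCodec tokenizer runs at 50 Hz,
# so 256 new tokens correspond to about 5 seconds.
audio_values = model.generate(**inputs, max_new_tokens=256)

# Write the waveform to disk at the model's 32 kHz sampling rate.
sampling_rate = model.config.audio_encoder.sampling_rate
wavfile.write("musicgen_out.wav", rate=sampling_rate, data=audio_values[0, 0].numpy())
```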

The aim is to provide tools to easily fine-tune and "dreambooth" the MusicGen model suite on small consumer GPUs with little data, leveraging a series of optimizations and tricks to reduce resource consumption. For example, the model can be fine-tuned on a particular music genre or artist to produce a checkpoint that generates music in that style. The aim is also to make these trained checkpoints easy to share and build on, thanks to LoRA adapters.
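A minimal sketch of loading such a shared checkpoint, assuming the adapter was trained with PEFT and pushed to the Hub; the adapter repository name and prompt are placeholders.

```python
from peft import PeftModel
from transformers import AutoProcessor, MusicgenForConditionalGeneration

# Load the base MusicGen checkpoint, then attach a fine-tuned LoRA adapter.
# "your-username/musicgen-small-lofi-lora" is a placeholder for a trained adapter repo.
base_model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")
model = PeftModel.from_pretrained(base_model, "your-username/musicgen-small-lofi-lora")

# Generate in the fine-tuned style, exactly as with the base model.
processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
inputs = processor(text=["a beat in the fine-tuned style"], padding=True, return_tensors="pt")
audio_values = model.generate(**inputs, max_new_tokens=256)
```

Because only the small LoRA adapter weights differ from the base model, these checkpoints are cheap to distribute and can be stacked on top of any compatible MusicGen base checkpoint.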

For the time being, transformers needs to be installed from source; this won't be necessary once the next version is released.

Optionally, you can create a wandb account and log in to it by following this guide. wandb allows for better tracking of experiment metrics and losses.
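A minimal sketch of logging in and starting a tracked run from Python; the project name is a placeholder.

```python
import wandb

# Authenticate once (equivalently, run `wandb login` in the shell).
wandb.login()

# Start a run so training metrics and losses are logged to the dashboard.
run = wandb.init(project="musicgen-finetune")
```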
