Jukebox

submited by

Style Pass

2022-09-29 21:30:46

We’re introducing Jukebox, a neural net that generates music, including rudimentary singing, as raw audio in a variety of genres and artist styles. We’re releasing the model weights and code, along with a tool to explore the generated samples.

Automatic music generation dates back to more than half a century. A prominent approach is to generate music symbolically in the form of a piano roll, which specifies the timing, pitch, velocity, and instrument of each note to be played. This has led to impressive results like producing Bach chorals, polyphonic music with multiple instruments, as well as minute long musical pieces.

But symbolic generators have limitations—they cannot capture human voices or many of the more subtle timbres, dynamics, and expressivity that are essential to music. A different approach[1] is to model music directly as raw audio. Generating music at the audio level is challenging since the sequences are very long. A typical 4-minute song at CD quality (44 kHz, 16-bit) has over 10 million timesteps. For comparison, GPT-2 had 1,000 timesteps and OpenAI Five took tens of thousands of timesteps per game. Thus, to learn the high level semantics of music, a model would have to deal with extremely long-range dependencies.

One way of addressing the long input problem is to use an autoencoder that compresses raw audio to a lower-dimensional space by discarding some of the perceptually irrelevant bits of information. We can then train a model to generate audio in this compressed space, and upsample back to the raw audio space.

Jukebox

Leave a Comment

Related Posts

Recent Posts

Search code, repositories, users, issues, pull requests...

Qualcomm responds to benchmark cheating allegations — Snapdragon X Elite/Plus benchmarks claimed to be fraudulent (Updated)

Notes of an Aesthete

A Brief Note on the FTC Noncompete Rule - by Casey Muratori

Net neutrality is about to make a comeback

New research predicts peak groundwater extraction for key basins around the globe

AIL Framework 5.5 Released: New OCR Module for Images, Report Generator for Tracker Module, and Numerous Improvements.

Tokens for LLMs: Byte Pair Encoding in Go

Adding Bluetooth to Vintage Headphones – Part 2

Blip School - The Importance of STEM Education for Girls: Inspiring the Next Generation

50 Digital Wood Joints

TikTok reward-to-watch feature suspended after EU threats to block it

We may have spotted the first magnetar flare outside our galaxy

Computer Science > Computation and Language

The better way to measure cosmic time

Computer Science > Machine Learning

Defence mechanism - Wikipedia

Senators call on postal board to abandon DeJoy’s USPS reforms

Harvey Weinstein's 2020 rape conviction overturned by New York appeals court

Fintech CRED secures in-principle approval for payment aggregator license