SingSong: Generating musical accompaniments from singing

Chris Donahue*, Antoine Caillon*¹, Adam Roberts*, Ethan Manilow, Philippe Esling¹, Andrea Agostinelli, Mauro Verzetti, Ian Simon, Olivier Pietquin, Neil Zeghidour, Jesse Engel

We present SingSong, a system that generates instrumental music to accompany input vocals, potentially offering musicians and non-musicians alike an intuitive new way to create music featuring their own voice. To accomplish this, we build on recent developments in musical source separation and audio generation. Specifically, we apply a state-of-the-art source separation algorithm to a large corpus of music audio to produce aligned pairs of vocal and instrumental sources. We then adapt AudioLM, a state-of-the-art approach for unconditional audio generation, to conditional "audio-to-audio" generation tasks, and train it on the source-separated (vocal, instrumental) pairs. To improve our system's generalization from source-separated training data (where the vocals contain artifacts of the instrumental) to the isolated vocals we might expect from users, we explore a number of different featurizations of vocal inputs; the best of these improves quantitative performance on isolated vocals by 53% relative to the default AudioLM featurization. In a pairwise comparison with the same vocal inputs, listeners expressed a significant preference for instrumentals generated by SingSong over those from a strong retrieval baseline.
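The data-preparation step can be pictured concretely. As a rough sketch (the abstract does not name the separator SingSong uses, so the open-source Demucs model stands in here, and the file paths are placeholders), one might mine aligned (vocal, instrumental) pairs from mixed recordings like this:

```python
import torchaudio
from demucs.pretrained import get_model
from demucs.apply import apply_model

# Load a pretrained separator; "htdemucs" is a stand-in choice, not
# necessarily the separator used in the paper.
model = get_model("htdemucs")

# Load a stereo mixture and match the model's expected sample rate.
wav, sr = torchaudio.load("song.wav")  # placeholder path
wav = torchaudio.functional.resample(wav, sr, model.samplerate)

# apply_model returns (batch, sources, channels, time); model.sources
# names the stems, e.g. ["drums", "bass", "other", "vocals"].
stems = apply_model(model, wav[None])[0]
vocals = stems[model.sources.index("vocals")]
instrumental = stems.sum(dim=0) - vocals  # everything except the vocals

# The aligned (vocal, instrumental) pair becomes one training example.
torchaudio.save("vocals.wav", vocals, model.samplerate)
torchaudio.save("instrumental.wav", instrumental, model.samplerate)
```

The conditional adaptation of AudioLM can likewise be sketched. SingSong's framing is to generate instrumental audio given vocal audio; a simplified, hypothetical stand-in for the paper's actual architecture and tokenizers is a decoder-only Transformer trained on the conditioning tokens prefixed to the target tokens, with the loss restricted to the target positions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 1025  # 1024 hypothetical codec token ids + 1 separator id
SEP = 1024

class TinyDecoder(nn.Module):
    """Toy causal Transformer; a stand-in, not the AudioLM architecture."""
    def __init__(self, vocab=VOCAB, d=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.body = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(d, vocab)

    def forward(self, x):
        causal = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        return self.out(self.body(self.emb(x), mask=causal))

# Random ids stand in for tokenized source-separated audio.
vocal = torch.randint(0, 1024, (1, 250))  # conditioning prefix
instr = torch.randint(0, 1024, (1, 250))  # generation target
seq = torch.cat([vocal, torch.full((1, 1), SEP), instr], dim=1)

# Next-token prediction, with the loss masked out on the vocal prefix
# so only the instrumental continuation is modeled.
inputs, targets = seq[:, :-1], seq[:, 1:].clone()
targets[:, : vocal.size(1)] = -100

logits = TinyDecoder()(inputs)
loss = F.cross_entropy(logits.reshape(-1, VOCAB), targets.reshape(-1),
                       ignore_index=-100)
```

At inference time, the same model would be fed the vocal tokens plus the separator and asked to sample the instrumental continuation.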

For our study, listeners are presented with a pair of 10-second vocal-instrumental mixtures, where the vocals are identical between the two mixtures and come from MUSDB18-test, while the instrumentals come from different sources (the ground truth, our models, or baselines). Listeners are asked to indicate in which of the two mixtures the instrumental accompaniment seems more musically compatible with the vocals.
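The excerpt does not state how significance was assessed; one simple way to test such pairwise preference data is an exact binomial (sign) test on the vote counts, sketched below with invented numbers:

```python
from scipy.stats import binomtest

# Hypothetical tallies: these counts are invented for illustration only.
prefer_singsong, prefer_baseline = 132, 68

result = binomtest(prefer_singsong,
                   n=prefer_singsong + prefer_baseline,
                   p=0.5, alternative="greater")
print(f"p-value = {result.pvalue:.2e}")  # small value -> significant preference
```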
