1University of Toronto    2Temerty Centre for Artificial Intelligence Research and Education in Medicine    3Sunnybrook Research Institute

This work presents SEE-2-SOUND, a method for generating spatial audio from images, animated images, and videos to accompany the visual content. Check out our website for example results of this work.

You can also skip this section and run everything in a Docker container; instructions are in Run in Docker.

Evaluating the code (though not running inference) requires the Visual Acoustic Matching (AViTAR) codebase. However, due to the many changes required to run AViTAR, you should install it from a fork we host. Install this by running:

SEE-2-SOUND consists of three main components: source estimation, audio generation, and surround sound spatial audio generation.
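Before detailing each stage, here is a minimal sketch of how the three components might compose into a single pipeline. This is purely illustrative: every name below (`estimate_sources`, `generate_audio`, `spatialize`) is a hypothetical placeholder, not the actual SEE-2-SOUND API.

```python
# Hypothetical composition of the three SEE-2-SOUND stages (illustration only).
from typing import Any, List, Tuple

Position = Tuple[float, float, float]  # 3D source position on the viewing sphere


def estimate_sources(frame: Any) -> List[Tuple[Any, Position]]:
    """Stage 1: regions of interest plus their 3D positions (stub)."""
    raise NotImplementedError


def generate_audio(region: Any) -> Any:
    """Stage 2: a mono audio clip conditioned on one region (stub)."""
    raise NotImplementedError


def spatialize(clips: List[Any], positions: List[Position]) -> Any:
    """Stage 3: mix per-source clips into a surround sound track (stub)."""
    raise NotImplementedError


def see_2_sound(frame: Any) -> Any:
    sources = estimate_sources(frame)  # [(region, position), ...]
    clips = [generate_audio(region) for region, _ in sources]
    positions = [position for _, position in sources]
    return spatialize(clips, positions)
```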

In the source estimation phase, the model identifies regions of interest in the input media and estimates their 3D positions on a viewing sphere. It also estimates a monocular depth map of the input image.
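To make the geometry concrete, the sketch below shows one way a region's 2D centroid and its predicted depth could be placed on a viewing sphere around the camera. This is a hedged illustration under a pinhole-camera assumption, not the paper's exact parameterization; the field of view and all names here are assumptions.

```python
import numpy as np


def pixel_to_viewing_sphere(u, v, depth, width, height, fov_deg=90.0):
    """Back-project pixel (u, v) onto the unit viewing sphere.

    Assumes a pinhole camera with the principal point at the image
    center and a horizontal field of view of `fov_deg` degrees; the
    estimated monocular depth is kept as a separate radial distance.
    """
    # Focal length in pixels, derived from the horizontal field of view.
    f = (width / 2.0) / np.tan(np.radians(fov_deg) / 2.0)
    # Ray through the pixel in camera coordinates (z points forward).
    ray = np.array([(u - width / 2.0) / f, (v - height / 2.0) / f, 1.0])
    direction = ray / np.linalg.norm(ray)  # unit vector on the viewing sphere
    return direction, depth


# Example: a region centroid at pixel (800, 300) in a 1280x720 frame
# with a (made-up) predicted depth of 3.2 meters.
direction, distance = pixel_to_viewing_sphere(800, 300, 3.2, 1280, 720)
```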
