Soundtrack for your favorite book in seconds

submitted by
Style Pass
2024-10-13 21:30:03

The most important part of soundtrack generation is a sufficiently large database of music tracks described in natural language. Currently, the database contains 15103 pieces, mostly classical music, by 436 different composers. Expanding the database will be the main focus of further development of this project.

The track descriptions are encoded as vector embeddings. When generating a "soundtrack", an LLM is asked to suggest what kind of music would suit the book in question, and this suggestion is converted into an embedding as well, so that it can be compared against the track embeddings. Since the LLM is given only the title and author of the literary work, this process is expected to work well only for books famous enough to be sufficiently represented in the LLM's training corpus. For lesser-known or recent books, the LLM will be prone to hallucinations. In the future I plan to include the whole book content (e.g. Gemini would be more than capable of handling this), as in the illustrations part of this project. This would overcome the under-representation issue.
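The matching step could look roughly like the sketch below. It is only an illustration under stated assumptions, not the project's actual code: `embed` is a toy bag-of-words stand-in for a real embedding model, cosine similarity is assumed as the comparison measure, and the track names and descriptions are made-up placeholders.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): word-count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_tracks(suggestion: str, tracks: dict[str, str], top_k: int = 3):
    """Return the top_k (title, score) pairs most similar to the suggestion."""
    query = embed(suggestion)
    scored = [(title, cosine(query, embed(desc))) for title, desc in tracks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical database entries: title -> natural-language description.
TRACKS = {
    "Track A": "slow melancholic strings in a minor key",
    "Track B": "bright festive brass fanfare",
    "Track C": "quiet solo piano nocturne",
}

# In the real pipeline this text would come from the LLM; here it is hard-coded.
SUGGESTION = "melancholic slow strings for a tragic story"

if __name__ == "__main__":
    for title, score in rank_tracks(SUGGESTION, TRACKS):
        print(f"{title}: {score:.2f}")
```

With a real embedding model, only `embed` and the vector type would change; the ranking logic stays the same. For a database of this size, a brute-force scan over all tracks is likely fast enough, although that is a guess rather than a measured result.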
