

Submitted by Style Pass, 2024-04-29 18:00:03

A multi-modal starter kit that can have AI narrate a video or scene of your choice. It includes examples of video processing, frame extraction, and sending frames to AI models efficiently. Costs $0 to run.
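One way to send frames to a vision model efficiently is to cap how many frames you extract, spacing them evenly across the video. This is a hypothetical helper (not the repo's actual code) sketching that idea; the resulting timestamps could then be fed to ffmpeg's `-ss` flag one at a time:

```typescript
// Hypothetical helper: pick N evenly spaced timestamps from a video
// of a given duration, so long videos don't flood the vision model
// with frames. Sampling at each segment's midpoint avoids grabbing
// the (often black) very first and very last frames.
function sampleTimestamps(durationSec: number, maxFrames: number): number[] {
  if (durationSec <= 0 || maxFrames <= 0) return [];
  const step = durationSec / maxFrames;
  return Array.from({ length: maxFrames }, (_, i) => +(step * (i + 0.5)).toFixed(2));
}
```

For a 10-second clip capped at 4 frames, this yields timestamps at 1.25s, 3.75s, 6.25s, and 8.75s.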

We have a sample video in the assets directory that you can use to test the app. You can run the following command to test the app with this video.

By default, the app uses Ollama with llava for vision. If you want to use OpenAI's GPT-4V instead, you can set INFERENCE_PLATFROM="OpenAI" and fill in OPENAI_API_KEY in .env.
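The platform switch described above might look like the sketch below. The env var name INFERENCE_PLATFROM is taken verbatim from this README; the model names and the fallback-to-Ollama behavior are assumptions, not the repo's actual code:

```typescript
// Sketch: choose the vision backend from environment variables.
// Defaults to local Ollama/llava; opts into OpenAI only when the
// README's INFERENCE_PLATFROM variable is set to "OpenAI".
type Platform = { name: "openai" | "ollama"; model: string };

function pickPlatform(env: Record<string, string | undefined>): Platform {
  if (env.INFERENCE_PLATFROM === "OpenAI") {
    // Fail fast if the key is missing rather than erroring mid-request.
    if (!env.OPENAI_API_KEY) {
      throw new Error("OPENAI_API_KEY is required when INFERENCE_PLATFROM=OpenAI");
    }
    return { name: "openai", model: "gpt-4-vision-preview" }; // model name is an assumption
  }
  return { name: "ollama", model: "llava" };
}
```

In the app you would call this once at startup with `process.env` and pass the result to whichever client constructor you use.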

There are two ways to get Ollama up and running: on a Fly GPU, which provides very fast inference, or locally on your laptop.

When narrating a very long video, Upstash Redis is used for pub/sub to notify the client as new snippets of the reply come back. Upstash also handles the critical task of caching videos/images so that subsequent requests are fast.
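The pub/sub-plus-cache pattern can be illustrated with an in-memory stand-in for Upstash Redis. The channel and key names here are illustrative, not the repo's actual ones; in the real app, `publish` and `subscribe` would map to the Upstash Redis client's pub/sub calls:

```typescript
// In-memory mock of the Upstash flow: the narrator publishes reply
// snippets to a channel keyed by video id, subscribers stream them as
// they arrive, and snippets are cached so repeat requests for the same
// video are served instantly instead of re-running inference.
type Handler = (snippet: string) => void;

class SnippetBus {
  private subs = new Map<string, Handler[]>();
  private cache = new Map<string, string[]>();

  subscribe(videoId: string, fn: Handler): void {
    const channel = `narration:${videoId}`; // illustrative channel name
    this.subs.set(channel, [...(this.subs.get(channel) ?? []), fn]);
  }

  publish(videoId: string, snippet: string): void {
    const channel = `narration:${videoId}`;
    // Cache every snippet so a later request skips inference entirely.
    this.cache.set(videoId, [...(this.cache.get(videoId) ?? []), snippet]);
    for (const fn of this.subs.get(channel) ?? []) fn(snippet);
  }

  cached(videoId: string): string[] {
    return this.cache.get(videoId) ?? [];
  }
}
```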

There is an example in the repo that leverages Inngest for workflow orchestration -- Inngest is especially helpful when you have a long-running workflow, and it performs automatic retries. Example code is in src/inngest/functions.ts.
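What Inngest's automatic retries buy you can be sketched as a plain async helper. The real orchestration in src/inngest/functions.ts goes through Inngest's SDK; this standalone version just shows the retry-with-backoff idea for a single workflow step:

```typescript
// Sketch: run one workflow step, retrying with exponential backoff on
// failure, similar to what Inngest does automatically for each step.
async function runStep<T>(
  name: string,
  fn: () => Promise<T>,
  retries = 3,
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      // Back off between attempts (10ms base, purely for the sketch).
      await new Promise((r) => setTimeout(r, 10 * 2 ** attempt));
    }
  }
  throw new Error(`step "${name}" failed after ${retries + 1} attempts: ${lastErr}`);
}
```

An orchestration service also persists step results and resumes after process crashes, which a bare helper like this cannot do -- that durability is the main reason to reach for Inngest on long-running narration jobs.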
