It's been roughly six months since I started working on the first iteration of Mixreel. I've been regularly sharing progress on Bluesky, but this is the first time I've put together a comprehensive update on what I've been doing since I started. I also wanted to record my thoughts about what has worked well, what hasn't, and where I think my efforts should be focused.
Over the last 12 months, video generation models have gone from a minor curiosity to a potential must-have tool for visual creatives. Objects no longer morph into one another or fly off randomly into the sky. Image-to-video models create genuinely plausible extensions of static scenes. Text-to-video models adhere reasonably faithfully to their prompts.
However, when working with these models, you're still mostly constrained to a single text prompt. It's difficult, if not impossible, to describe in words exactly how the camera should move, how the scene is composed, how it's lit, or any of the other multitude of things that go into making a video.