OpenAI has never revealed exactly which data it used to train Sora, its video-generating AI. But from the looks of it, at least some of the data might’ve come from Twitch streams and walkthroughs of games.
Sora launched on Monday, and I’ve been playing around with it for a bit (to the extent the capacity issues will allow). From a text prompt or image, Sora can generate videos up to 20 seconds long in a range of aspect ratios and resolutions.
When OpenAI first revealed Sora in February, it alluded to having trained the model on Minecraft videos. So, I wondered, what other video game playthroughs might be lurking in the training set?
Sora also appears to have an understanding of what a Twitch stream should look like — implying that it’s seen a few. Check out the screenshot below, which gets the broad strokes right:
Another noteworthy thing about the screenshot: It features the likeness of popular Twitch streamer Raúl Álvarez Genes, who goes by the name Auronplay — down to the tattoo on Genes’ left forearm.