Simon Willison’s Weblog

submitted by
Style Pass
2024-05-15 19:30:09

We recognize that GPT-4o’s audio modalities present a variety of novel risks. Today we are publicly releasing text and image inputs and text outputs. Over the upcoming weeks and months, we’ll be working on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities.

This is catching out a lot of people. The ChatGPT iPhone app already has image output, and it already has a voice mode. These worked with the previous GPT-4 mode and they still work with the new GPT-4o mode... but they are not using the new model’s capabilities.

Lots of people are discovering the voice mode for the first time—it’s the headphone icon in the bottom right of the interface.

They try it and it’s impressive (it was impressive before) but it’s nothing like as good as the voice mode in Monday’s demos.

Honestly, it’s not at all surprising that people are confused. They’re seeing the “4o” option and, understandably, are assuming that this is the set of features that were announced earlier this week.
