trees are harlequins, words are harlequins — How should we assess the performance of generative...

submitted by
Style Pass
2022-06-23 16:00:10

How should we assess the performance of generative ML models like GPT-3 and DALL-E 2? What baseline should we compare them to?

Imagine taking someone from 1980, teleporting them into the present, and showing them Google Images.  They’d be amazed, wouldn’t they?

Until very recently, no one had that kind of instant access to a world’s worth of pictures.  And not only can it show you a picture of almost anything, but you can tell it what you want to see, and a picture of that thing will appear!

(At least sometimes.  Sometimes it doesn’t quite work, but even then, you can usually see what the machine was “thinking” when it chose the images it did.)

Indeed, most of the gap here was closed by Google Images, not by the generative models.  In 1980, you couldn’t access pictures of anything by typing descriptions.  If you wanted a picture of something, you had to go out into the physical world and hunt for it.

In 2015, you could type a description of almost any picture (that plausibly exists) into Google Images, and it would show you a picture like that.  From the vantage point of 1980, this is basically an image generator.
