I’ve been pretty quiet about ChatGPT and Bing for a number of reasons, the most pertinent of which is that I have so much more going on in my life right now.
But I think it’s time to jot down some notes on how I feel about Large Language Models (henceforth abbreviated to LLMs) and the current hype around them.
Plus the field is evolving so quickly that I’ve drafted this around four times, all the while progressively shrinking it down to a quick tour of what I think are the key things to ponder.
Yes, typical outputs are vastly better than a Markov chain’s, and there is a tendency to draw a rough parallel anyway, by picturing the LLM as merely running the probabilities for the next token.
As people like Tim Bray have pointed out, that seriously underestimates the complexity of what is represented in the model weights.
The reason the Markov analogy breaks down is that LLM output is not probabilistic in that simple, table-lookup sense: there is randomness involved in setting up inference, sure, and sequential correlation between output tokens, but the number of factors driving the output is dozens of orders of magnitude beyond what we were used to.
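To make the contrast concrete, here is what the "simple" side of the analogy actually is: a classic bigram Markov chain, which picks the next token by looking up the frequency of successors to a single preceding token. This is a minimal illustrative sketch (the function names and toy corpus are mine, not from any particular library), and everything it "knows" fits in a small count table, versus the billions of interacting weights in an LLM.

```python
import random
from collections import defaultdict

def build_bigram_model(tokens):
    """Count, for each token, how often each successor follows it."""
    model = defaultdict(lambda: defaultdict(int))
    for cur, nxt in zip(tokens, tokens[1:]):
        model[cur][nxt] += 1
    return model

def sample_next(model, token, rng):
    """Pick a successor with probability proportional to its count."""
    successors = model[token]
    choices = list(successors)
    weights = [successors[c] for c in choices]
    return rng.choices(choices, weights=weights, k=1)[0]

# Toy corpus: the entire "world model" is just these bigram counts.
tokens = "the cat sat on the mat and the cat slept".split()
model = build_bigram_model(tokens)
rng = random.Random(0)
print(sample_next(model, "the", rng))  # one of: "cat", "mat"
```

The whole state here is a lookup of one token back; an LLM conditions on thousands of tokens of context through its learned weights, which is exactly why the analogy flatters the Markov chain.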