Machine Learning as a field is very explicit about its research KPIs. Left: DeepMind's Chinchilla, Right: Meta's LLaMA
This text is my personal opinion, developed from publicly available sources such as research publications and rumors. I did not and do not work at any of the companies whose current or future products this text speculates about.
Intended audience: people with engineering experience or some basic ML knowledge who are interested in the language modeling techniques that may have been selected for implementation by the "GPT-4" authors at OpenAI. We need such speculation because the authors have elected to keep the technical details private, citing safety concerns and the competitive landscape. The bulk of this text was written in the first days of March, when the actual capabilities of GPT-4 remained an enigma.
TL;DR of hypotheses developed in this post (note that some references postdate the GPT-4 pretraining run; but they either have precedents in the literature, or are assumed to have been known privately in advance of publication, as was the case with the Chinchilla scaling law, for example):