In June, OpenAI teamed up with GitHub to launch Copilot, a service that provides suggestions for whole lines of code inside development environments like Microsoft Visual Studio. Powered by an AI model called Codex — which OpenAI later exposed through an API — Copilot can translate natural language into code across more than a dozen programming languages, interpreting commands in plain English and executing them.
Now, a community effort is underway to create an open source, freely available alternative to Copilot and OpenAI’s Codex model. Dubbed GPT Code Clippy, its contributors hope to create an AI pair programmer that allows researchers to study large AI models trained on code to better understand their abilities — and limitations.
Codex is trained on billions of lines of public code and works with a broad set of frameworks and languages, adapting to the edits developers make to match their coding styles. Similarly, GPT Code Clippy learned from hundreds of millions of examples of codebases to generate code much as a human programmer might.
The GPT Code Clippy project contributors used GPT-Neo as the base of their AI models. Developed by grassroots research collective EleutherAI, GPT-Neo is what’s known as a Transformer model. This means it weighs the influence of different parts of input data rather than treating all the input data the same. Transformers don’t need to process the beginning of a sentence before the end. Instead, they identify the context that confers meaning on a word in the sentence, enabling them to process input data in parallel.
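To make the weighting idea concrete, here is a minimal sketch of self-attention, the mechanism at the core of Transformer models, written in plain Python. It is an illustration under simplifying assumptions, not GPT-Neo's actual implementation: the token list and two-dimensional embeddings are made up for the example, and each vector serves as its own query, key, and value, whereas real models learn separate projections, use many attention heads, and (in GPT-style models) mask out future positions.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(vectors):
    """Simplified self-attention with no learned projections.

    Each input vector acts as its own query, key, and value.
    Returns the mixed output vectors and the attention weights.
    """
    dim = len(vectors[0])
    scale = math.sqrt(dim)
    outputs, all_weights = [], []
    # Each position's result depends only on the inputs, not on other
    # positions' results, so this loop could run in parallel.
    for query in vectors:
        weights = softmax([dot(query, key) / scale for key in vectors])
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors))
                 for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical tokens and embeddings, chosen only for illustration.
tokens = ["def", "add", "return"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

outputs, weights = self_attention(embeddings)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

Each printed row shows how much one token draws on every token in the sequence, itself included. Tokens with similar embeddings receive larger weights, which is the sense in which the model weighs some parts of the input more heavily than others.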