Is GitHub’s Copilot potentially infringing copyright?

submitted by
Style Pass
2021-06-30 18:00:09

The free and open source development world is abuzz with a new feature from GitHub called Copilot, a programming tool that has been trained using code from GitHub’s own corpus. For those unfamiliar, GitHub is the world’s largest open source software repository: it has 40 million users and hosts 190 million repositories, of which 28 million are public, according to Wikipedia.

On the face of it, Copilot looks impressive. It uses Codex, a machine learning system developed by OpenAI, which was trained on an undisclosed amount of the code hosted on GitHub. According to GitHub:

“GitHub Copilot is powered by OpenAI Codex, a new AI system created by OpenAI. It has been trained on a selection of English language and source code from publicly available sources, including code in public repositories on GitHub.”

So we don’t know what code they used, only that the training data combines English-language text with source code, and that it is a selection (this could be vital, more on that later). Copilot takes a code prompt and suggests the code that follows, almost like magic. What is happening, though, is most likely a statistical analysis of how likely certain code is to follow other code, much like your phone’s autocomplete or the text-based GPT-3. The program can make a good guess at what should come next based on the combined knowledge of a large corpus of code.
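
To make the autocomplete comparison concrete, here is a deliberately tiny sketch of the "what usually follows what" idea. It is not how Copilot or Codex works internally (those rely on a large neural network, not a lookup table), and the corpus and the function names `train_bigrams` and `suggest` are made up for illustration. It only counts which token most often follows another in a handful of lines of code.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real system is trained on vastly more code.
CORPUS = [
    "for i in range ( n ) :",
    "for item in items :",
    "for i in range ( 10 ) :",
    "if x in range ( 5 ) :",
]


def train_bigrams(lines):
    """Count how often each token is immediately followed by each other token."""
    follows = defaultdict(Counter)
    for line in lines:
        tokens = line.split()
        for current, nxt in zip(tokens, tokens[1:]):
            follows[current][nxt] += 1
    return follows


def suggest(follows, token):
    """Return the most frequent follower of `token`, or None if it was never seen."""
    if token not in follows:
        return None
    return follows[token].most_common(1)[0][0]


model = train_bigrams(CORPUS)
print(suggest(model, "for"))    # "i" follows "for" twice, "item" once
print(suggest(model, "in"))     # "range" follows "in" three times, "items" once
print(suggest(model, "while"))  # never seen in the corpus, so None
```

Even this crude counter can only ever suggest tokens that appeared in its corpus, which is why the question of what went into the training selection matters.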
