Is GitHub a derivative work of GPL'd software? July 4, 2021 on Drew DeVault's blog

submited by
Style Pass
2021-07-04 13:00:05

GitHub recently announced a tool called Copilot, a tool which uses machine learning to provide code suggestions, inciting no small degree of controversy. One particular facet of the ensuing discussion piques my curiosity: what happens if the model was trained using software licensed with the GNU General Public License?

The GPL is among a family of licenses considered “copyleft”, which are characterized by their “viral” nature. In particular, the trait common to copyleft works is the requirement that “derivative works” are required to publish their new work under the same terms as the original copyleft license. Some weak copyleft licenses, like the Mozilla Public License, only apply to any changes to specific files from the original code. Stronger licenses like the GPL family affect the broader work that any GPL’d code has been incorporated into.

A recent tweet by @mitsuhiko notes that Copilot can be caused to produce, verbatim, the famous fast inverse square root function from Quake III Arena: a codebase distributed under the GNU GPL 2.0 license. This raises an interesting legal question: is the work produced by a machine learning system, or even the machine learning system itself, a derivative work of the inputs to the model? Another tweet suggests that, if the answer is “no”, GitHub Copilot can be used as a means of washing the GPL off of code you want to use without obeying its license. But, what if the answer is “yes”?

Leave a Comment