With the onslaught of copyright infringement lawsuits being filed against OpenAI, Anthropic and other major generative AI providers for training models on proprietary data, which is often regurgitated verbatim to users, there has been a corresponding surge in commentary on how copyright law, particularly in the US and EU, treats AI. In this article, I hope to shed some much-needed light on the status of AI training under the copyright law of an often-forgotten part of the ‘rule-of-law world’: Australia. I write from the perspective of a data scientist who also holds an Australian law degree and who works in the legal tech industry.
There are two key issues that Australian copyright law raises for training AI. The first pertains to the procurement of copyrighted training data. The second concerns whether the very act of training AI on copyrighted data (as distinct from procuring data for that purpose) constitutes copyright infringement.
In brief, I find that Australian copyright law is not fit for purpose when it comes to both the procurement of copyrighted training data and the training of AI on copyrighted data. At least not if we want to see Australians training their own foundation models and Australian businesses disrupting Silicon Valley’s stronghold on AI innovation. While finding a solution to the problem is beyond the scope of this article, I urge policymakers to quickly adopt a new model that ensures creators are adequately compensated while also enabling, rather than stifling, innovation, lest our burgeoning AI industry fall irreparably behind its international counterparts.