The last few years have seen copyright holders initiate suits for infringement against the providers of LLMs and generative AI chatbots, or against those offering data sets (comprising third-party copyrighted data) free of charge.
One of the earliest such instances was Thomson Reuters v. Ross Intelligence[1], in which Thomson Reuters alleged that Ross Intelligence had violated the copyright in the content available on the plaintiff’s platform, Westlaw, by using it to train its research tool, which is powered by artificial intelligence. Specifically, the plaintiff claimed that the headnotes and the key numbers available on its platform constitute copyrightable material. While the matter is pending trial, the defendant’s primary defence remains fair use. In September 2023, the court came to a finding of actual copying by Ross Intelligence, but observed that the issue of whether such copying amounted to fair use must go to trial.
It is relevant to note that fair use under existing United States law is very different in scope and applicability from its Indian counterpart, fair dealing. Fair dealing has a far narrower scope of inquiry and applies only to very specific situations such as private or personal use, research, criticism, and the reporting of current events. If an Indian court were to come to a finding of actual copying, the defence of fair dealing may not come to the defendant’s rescue given its limited scope, although there may be other arguments to be made.
In contrast, a German court took a different view on a similar issue in Robert Kneschke v. LAION e.V.[2]. This suit was brought by a photographer against an association that made data sets containing over 6 billion image-text pairs accessible to the public free of charge. In creating this data set, the defendant scraped the web, copied images, and ran an automated check on whether each text description corresponded to its image. As part of this process, the defendant downloaded all the images and therefore copied them, despite the plaintiff’s terms of use restricting the downloading or scraping of its content. The court, however, dismissed the plaintiff’s action on the ground that the defendant was covered by the text and data mining exception, which is an exception to copyright infringement under German law.
Text and data mining (TDM) is the process of deriving information from machine-readable material. Effectively, when large amounts of data are analysed, the information extracted is used to identify patterns and to make predictions based on what is learned from those patterns. Since the purpose of TDM is to identify patterns, an argument that emerges is that the use of copyrighted data for TDM is a non-expressive use of the copyrighted work, i.e. while the work is being copied, the expressive aspect of the work is neither used nor communicated.
Several jurisdictions have adopted the TDM exception, with some modifications and qualifications. For example, the UK permits TDM for the purpose of computational analysis for non-commercial research purposes.[3] The TDM exception in Japan is not explicitly limited to research or non-commercial use; however, Japanese law has a proviso stating that the exception applies only if the action would not “unreasonably prejudice the interests of the copyright owner”. In the EU, the TDM exception was introduced by the Copyright Directive 2019/790/EU, which defines TDM as “any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations.” The applicability of the exception is subject to the qualifications that (i) the person mining the data had legitimate access to the content for the purpose of text and data mining; and (ii) the right holder or copyright holder has not reserved the exclusive right to mine the data, and has therefore not opted out of any mining of the copyrighted work for TDM. These qualifications do appear to balance the interests of the parties, by permitting TDM for LLMs while allowing right holders to retain exclusivity should they choose to do so. However, insofar as the TDM is carried out for scientific purposes by research organizations and cultural heritage protection institutions, there are no restrictions, and this may be done for profit as well.
India, which does not have a TDM exception to copyright, is witnessing its first suit for copyright infringement initiated against an AI chatbot.[4] The preliminary issues framed by the court include whether the defendant’s use of the plaintiff’s copyrighted data qualifies as “fair use” and whether the use of the plaintiff’s copyrighted content to train the AI model and to generate responses would amount to copyright infringement. The preliminary submissions by OpenAI suggest that it proposes adopting an opt-out model; however, the extent of its arguments remains to be seen.
Footnotes
[1] No. 1:20-cv-613-SB.
[2] Case No. 310 O 227/23.
[3] See, https://www.legislation.gov.uk/ukpga/1988/48/section/29A.
[4] ANI Media Pvt. Ltd. v. OpenAI Inc. & Anr., CS(COMM) 1028/2024.
This article was originally published in ET Legal World on 27 December 2024. Written by: Apoorva Murali, Partner.
Disclaimer
This is intended for general information purposes only. The views and opinions expressed in this article are those of the author/authors and do not necessarily reflect the views of the firm.