Do AI Programs Store or Memorize Personal Data? German Regulator Says "No"

There are several ongoing legal controversies over AI software models, such as chatbots, and whether the training and output of such models violate copyright and data privacy laws and endanger personal and social freedoms. We wrote recently about the pending case of Andersen v. Stability AI, Ltd. (N.D. Cal.), which concerns whether AI-generated images infringe copyrights. The federal judge in that case recently allowed it to proceed beyond the motion-to-dismiss stage because the plaintiffs alleged that the AI program stored or contained compressed copies of billions of copyrighted images that had been downloaded and used for training. This was an allegation in the Amended Complaint that the judge was required to accept “as true.” Because this fact was “taken as true,” the court allowed the case to go forward on claims of direct infringement and induced infringement.

Relevant to this issue is a recently released report by a German regulator concluding that AI models do NOT memorize or store personal data such as names and birth dates. The regulator in question is the Hamburg Commissioner for Data Protection and Freedom of Information. The report addresses personal data privacy rights, but it presents a factual finding that might be relevant to questions of copyright infringement.

The Hamburg Commissioner noted that generative AI programs contain several interacting components, one of which is generally called a Large Language Model (“LLM”). LLMs are used in text-generating AI programs, and similar components are used in image- and video-generating AI models. The Hamburg Commissioner’s ultimate finding was that LLMs do not store or memorize personal data. Rather, “LLMs store highly abstracted and aggregated data points from training data and their relationships to each other, without concrete characteristics or references that ‘relate’ to individuals.” (p. 6). Because of this, LLMs are not storing “personal data” as defined by EU data privacy jurisprudence, because what is stored “lacks the necessary direct, targeted association to individuals….” In overly simplified terms, the LLMs store data that is disaggregated, abstracted, and disconnected. As such, there is no “personal data.”

Now, it must be said that the Hamburg Commissioner’s report is focused on two very narrow questions: are LLMs engaged in the “processing” of personal data, and are they, themselves, subject to EU data privacy regulations? On that narrow set of questions, the Hamburg Commissioner suggests that the answer is “no.” However, the answer is, or will be, different when the whole AI program is considered, since the report acknowledges that the output of the AI program “… may contain information relating to natural persons, especially if the prompt specifically asks for it.” Again, in overly simplified terms, the disaggregated data is re-aggregated, and the resulting output can contain information that identifies natural persons. That is “personal data” subject to EU privacy regulations.

In any event, it will be interesting to see how information and data are stored in image-generating AI programs. The outcome of the various copyright cases may turn on the answer to that question.

Contact the AI, Internet Law, and Copyright Attorneys at Revision Legal

For more information, contact the experienced AI, Internet Law, and Copyright Lawyers at Revision Legal. You can contact us through the form on this page or call (855) 473-8474.