Hamburg DPA Launches GDPR Discussion Paper on Personal Data in LLMs

The Hamburg Commissioner for Data Protection and Freedom of Information (HmbBfDI) has published a discussion paper analyzing the applicability of the General Data Protection Regulation (GDPR) to Large Language Models (LLMs). The document aims to stimulate discussion and to help companies and public authorities navigate the intersection of data protection law and LLM technology.

The paper clarifies the distinction between LLMs as AI models and LLMs as components of AI systems, in line with the AI Act (effective from 2 August 2024). It concludes:

  1. LLMs and Data Processing:
    • The mere storage of an LLM does not constitute processing within the meaning of GDPR Article 4(2), because LLMs do not store personal data.
    • Personal data processed within an LLM-supported AI system must comply with GDPR requirements, particularly concerning output.
  2. Data Subject Rights:
    • As LLMs do not store personal data, GDPR data subject rights (e.g., access, erasure, rectification) do not apply to the model itself. These rights can, however, be exercised with respect to the input and output handled by the AI system’s provider.
  3. Training Compliance:
    • Training LLMs with personal data must comply with data protection regulations and uphold data subject rights. However, violations committed during the training phase do not, by themselves, render later use of the model within an AI system unlawful.

Practical Implications

The discussion paper outlines several practical implications:

  1. Unlawful Training:
    • If a third party unlawfully processes personal data during LLM training, this does not preclude the lawful deployment of the LLM by another entity.
  2. Data Subject Requests:
    • Individuals can request information, rectification, or erasure regarding the input and output of an LLM-based AI system, but not the LLM itself.
  3. Fine-Tuning LLMs:
    • Organizations should use minimal personal data for fine-tuning and prefer synthetic data where possible. If personal data is used, a legal basis must be established and data subject rights must be ensured.
  4. Local LLM Operation:
    • Storing an LLM locally does not in itself raise data protection issues, but the surrounding AI system must enable data subject rights and guard against privacy attacks.
  5. Third-Party LLMs:
    • When contracting third-party LLM providers, organizations must ensure GDPR compliance regarding input and output and clarify responsibilities (e.g., joint controllership).

You can find the paper here.

