Can the training data be linked to individuals?

Privacy & Data Protection Category
Design Phase, Input Phase, Model Phase, Output Phase, Monitor Phase
Can the training data be linked to individuals?
  • Do you need to use unique identifiers in your training or fine-tuning dataset? If personal data is not necessary for the model, you are unlikely to have a legal justification for using it.
  • Training datasets for LLMs may inadvertently include personal data, leading to potential privacy breaches. Even if direct identifiers are removed, indirect identifiers or quasi-identifiers (for example, a combination of postcode, birth date, and gender) can still enable re-identification. This poses risks under data protection regulations such as the GDPR, especially if the data subjects have not provided explicit consent for their data to be used in this manner.
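
One rough way to gauge the quasi-identifier risk described above is to count how many records share each combination of quasi-identifier values. The sketch below is illustrative only: the function name `smallest_group_size`, the column names, and the sample records are assumptions, not part of the card. A result of 1 means at least one record is unique on those attributes and may be re-identifiable.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same values for the given quasi-identifier columns.

    This is the 'k' of k-anonymity for the dataset; 0 means no records.
    """
    groups = Counter(
        tuple(record[column] for column in quasi_identifiers)
        for record in records
    )
    return min(groups.values()) if groups else 0


# Hypothetical sample data: no direct identifiers, only quasi-identifiers.
records = [
    {"zip": "1011", "birth_year": 1984, "gender": "F"},
    {"zip": "1011", "birth_year": 1984, "gender": "F"},
    {"zip": "1012", "birth_year": 1990, "gender": "M"},
]

k = smallest_group_size(records, ["zip", "birth_year", "gender"])
print(k)  # 1: the third record is unique on these three attributes
```

A low value does not prove that re-identification is possible, and a high value does not rule it out; it is only a first signal that the listed columns deserve closer review.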

If you answered Yes then you are at risk

If you are not sure, then you might be at risk too

Recommendations

  • Unique identifiers might be included in the training set when you want to be able to link the results to individuals. In that case, consider replacing them with pseudo-identifiers or applying other robust pseudonymization techniques to help protect personal data.
  • Document the measures you are taking to protect the data. Consider whether your measures are necessary and proportionate.
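
As an illustration of the pseudonymization recommendation above, the following minimal sketch replaces a unique identifier with a keyed hash (HMAC-SHA256) from the Python standard library. The function name `pseudonymize`, the `user_id` field, and the sample rows are hypothetical. It assumes the secret key is stored separately from the training data, since anyone holding the key can recompute the mapping; this is pseudonymization, not anonymization, so the output is still personal data.

```python
import hashlib
import hmac
import secrets


def pseudonymize(identifier: str, key: bytes) -> str:
    """Map an identifier to a stable pseudonym using HMAC-SHA256.

    The same identifier and key always give the same pseudonym, so
    records stay linkable to each other without exposing the raw value.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Assumption: in practice the key would come from a secrets manager and
# be kept apart from the dataset; here it is generated for the demo.
key = secrets.token_bytes(32)

# Hypothetical training rows containing a direct identifier.
rows = [
    {"user_id": "alice@example.com", "text": "first sample"},
    {"user_id": "bob@example.com", "text": "second sample"},
    {"user_id": "alice@example.com", "text": "third sample"},
]

# Replace the identifier in each row, leaving the other fields unchanged.
pseudonymized_rows = [
    {**row, "user_id": pseudonymize(row["user_id"], key)} for row in rows
]
```

A keyed hash is used here rather than a plain hash because unkeyed hashes of predictable values such as email addresses can be reversed by simply hashing candidate values. Free-text fields may still contain personal data and would need separate treatment.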