Is data minimisation possible?
Although it may appear to contradict the principle of data minimisation, using too little data can sometimes reduce the accuracy and performance of the model. An AI system with a low level of accuracy could lead to critical, adverse or damaging consequences. Can you still comply with the data minimisation principle?
If you answered No then you are at risk
If you are not sure, then you might be at risk too
Recommendations
- Data minimisation can sometimes be achieved by using fewer features and good-quality training data. However, it is not always possible to predict which data elements are relevant to the objective of the system.
- Consider starting to train the model with less data, observing the learning curve and adding more data only if necessary, so that you can justify why the additional data was needed.
- The use of a large amount of data could be compensated for by pseudonymisation, or by techniques such as perturbation, differential privacy in pre-processing, synthetic data and federated learning.
- Try to select the right number of features with the help of experts to avoid the curse of dimensionality (where errors increase as the number of features increases).
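The recommendation to start with less data and watch the learning curve could be sketched as follows. This is only an illustration under stated assumptions: the synthetic data, the toy threshold model, the candidate sizes and the `min_gain` cut-off are all hypothetical and not part of the card. In practice you would plug in your own model, validation metric and stopping rule.

```python
import random


def make_data(n, rng):
    """Hypothetical synthetic data: one feature, two classes centred at -2 and +2."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(2.0 if label else -2.0, 1.0)
        rows.append((x, label))
    return rows


def fit_threshold(train):
    """Toy model (assumption): decision threshold at the midpoint of the class means."""
    means = []
    for cls in (0, 1):
        values = [x for x, y in train if y == cls]
        means.append(sum(values) / len(values) if values else 0.0)
    return (means[0] + means[1]) / 2.0


def accuracy(threshold, rows):
    """Share of rows where 'x above threshold' matches the label."""
    correct = sum((x > threshold) == bool(y) for x, y in rows)
    return correct / len(rows)


def smallest_sufficient_size(train, valid, sizes, min_gain=0.01):
    """Walk up the learning curve and stop once extra data adds less than min_gain.

    Returns the last size that still produced a worthwhile gain, plus the
    recorded curve, which can serve as the justification for the amount used.
    """
    curve = []
    for size in sizes:
        score = accuracy(fit_threshold(train[:size]), valid)
        curve.append((size, score))
        if len(curve) > 1 and score - curve[-2][1] < min_gain:
            return curve[-2][0], curve
    return sizes[-1], curve


rng = random.Random(0)
train = make_data(2000, rng)
valid = make_data(500, rng)
sizes = [50, 100, 200, 400, 800, 1600]

chosen, curve = smallest_sufficient_size(train, valid, sizes)
for size, score in curve:
    print(f"{size:5d} training rows -> validation accuracy {score:.3f}")
print("Smallest sufficient training size:", chosen)
```

Stopping at the first flat step is a simplification, since a single small gain can be noise. A more careful version might average over several random subsets or require the plateau to persist across several sizes before concluding that more data is unnecessary.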
Interesting resources/references
- p. 13 of Artificial Intelligence and Data Protection: How the GDPR Regulates AI
- Data Minimization for GDPR Compliance in Machine Learning Models: Methods like the one proposed in this paper can inspire you to find a way to mitigate the accuracy risk. They show how to reduce the amount of personal data needed to perform predictions, by removing or generalizing some of the input features.
- The answer to this post also contains information about this problem in different models: Does Dimensionality curse effect some models more than others?
- Towards Breaking the Curse of Dimensionality for High-Dimensional Privacy