Can we minimize the amount of personal data used while preserving model performance?

Categories: Privacy & Data Protection; Data & Data Governance
Phases: Design, Input, Model, Monitor

The principle of data minimization, set out in Article 5(1)(c) of the General Data Protection Regulation (GDPR) and reflected in many other privacy standards, requires that personal data be adequate, relevant, and limited to what is necessary for the system's purpose. Reducing data too aggressively, however, can degrade the accuracy and performance of an AI model, and an unreliable model can itself have critical or damaging consequences. The aim is to find the smallest dataset and feature set that still delivers the performance the system needs, so that privacy compliance does not come at the cost of reliability.

If you answered No, then you are at risk.

If you are not sure, then you might be at risk too.

Recommendations

  • Start with a small dataset and add data iteratively, only when it produces a measurable performance improvement. Record those improvements so you can justify why the additional data is necessary.
  • Use high-quality data to reduce the need for large datasets while ensuring sufficient diversity and representativeness for your model.
  • Apply privacy-preserving techniques such as pseudonymization, perturbation, differential privacy, federated learning, or synthetic data generation to reduce privacy risk when larger datasets are genuinely needed.
  • Work with domain experts to select the minimum set of features relevant to the objective. Unnecessary features also invite the curse of dimensionality, which can degrade model performance.
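
The first recommendation (start small, add data only while it pays off) can be sketched as a simple stopping rule. This is a minimal illustration, not a prescribed method: the function name `smallest_sufficient_size`, the `evaluate` callback, and the `min_gain` threshold are all assumptions made for this example, and the scores in the usage part are made-up placeholders.

```python
from typing import Callable, List, Sequence, Tuple


def smallest_sufficient_size(
    evaluate: Callable[[int], float],
    candidate_sizes: Sequence[int],
    min_gain: float = 0.01,
) -> Tuple[int, List[Tuple[int, float]]]:
    """Pick the smallest dataset size beyond which more data stops paying off.

    `evaluate(n)` is a hypothetical callback: it is assumed to train on n
    records and return a validation score where higher is better.
    Returns the chosen size and the (size, score) history, which can serve
    as documentation of why each increase was or was not justified.
    """
    if not candidate_sizes:
        raise ValueError("candidate_sizes must not be empty")

    sizes = sorted(candidate_sizes)
    history = [(sizes[0], evaluate(sizes[0]))]
    chosen = sizes[0]

    for n in sizes[1:]:
        score = evaluate(n)
        gain = score - history[-1][1]
        history.append((n, score))
        if gain < min_gain:
            # The extra records did not buy a meaningful improvement,
            # so collecting them is not justified; keep the previous size.
            break
        chosen = n

    return chosen, history


def toy_evaluate(n: int) -> float:
    # Stand-in for "train on n records, score on a validation set".
    # These numbers are invented purely to exercise the stopping rule.
    scores = {1000: 0.70, 2000: 0.80, 4000: 0.85, 8000: 0.855, 16000: 0.856}
    return scores[n]


size, history = smallest_sufficient_size(
    toy_evaluate, [1000, 2000, 4000, 8000, 16000], min_gain=0.01
)
print(size)  # prints 4000
```

In practice `evaluate` would wrap a real training and validation run, and `min_gain` would be chosen together with the people who own the system's performance requirements. The returned history doubles as the written justification the recommendation asks for.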