Can we comply with the storage limitation principle?

Categories: Non-compliance, Technique & Processes
Phases: Design, Input, Model, Output
  • Do you know how long you need to keep the data (training data, output data, etc.)?
  • Do you need to comply with specific internal, local, national and/or international retention rules for the storage of data?

If you answered No, then you are at risk.

If you are not sure, then you might be at risk too.

Recommendations

  • Personal data must not be stored longer than necessary for the intended purpose (Art. 5(1)(e) GDPR). To comply with this principle, it is important to have a clear overview of the data flow during the life cycle of the model.
  • You might receive raw data that you need to transform. Check what you are doing with this data and with all the different types of input files you might be receiving or collecting.
  • Check if you need to store that data for quality and auditing purposes.
  • Check where you are going to store the data from the data preparation, the training and test sets, the outputs, the processed outputs (when they are merged or linked to other information), metrics, etc.
  • How long should all this data be stored? What type of deletion process can you put in place? And who will be responsible for the retention and deletion of this data? A retention-schedule sketch follows this list.
  • Implement the right retention schedules where applicable. If you still need a large part of the data to feed the model, consider anonymising it (see the anonymisation sketch after this list).
  • Deleting data from a trained model is challenging, short of retraining the model from scratch on a dataset with the deleted data removed, which is expensive and often infeasible. Note that input data are always encoded in some way in the model itself during training. That means the internal representation the model develops during learning (say, thresholds and weights) may end up being legally encumbered as well. Source: BerryvilleiML
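
Below is a minimal sketch, in Python, of what a retention schedule for the artifacts of a model's life cycle could look like. The artifact names, storage locations, owners, and retention periods are all hypothetical placeholders; the point is that every artifact in the data flow gets a documented owner, a retention period, and a deletion trigger.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Artifact:
    """One data artifact in the model life cycle."""
    name: str          # e.g. "raw input", "training set", "model outputs"
    stored_at: str     # storage location (bucket, database, path)
    owner: str         # who is responsible for retention and deletion
    created: datetime
    retention: timedelta

    def is_expired(self, now: datetime) -> bool:
        return now >= self.created + self.retention


# Hypothetical inventory covering the data flow of the model life cycle.
inventory = [
    Artifact("raw input files", "s3://ingest/raw", "data-engineering",
             datetime(2024, 1, 1, tzinfo=timezone.utc), timedelta(days=30)),
    Artifact("training set", "s3://ml/train", "ml-team",
             datetime(2024, 1, 1, tzinfo=timezone.utc), timedelta(days=365)),
    Artifact("model outputs", "s3://ml/outputs", "ml-team",
             datetime(2024, 1, 1, tzinfo=timezone.utc), timedelta(days=90)),
]

now = datetime.now(timezone.utc)
for artifact in inventory:
    if artifact.is_expired(now):
        # Hook up your actual deletion process here, and log it for auditing.
        print(f"DELETE {artifact.name} at {artifact.stored_at} "
              f"(owner: {artifact.owner})")
```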
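
The sketch below illustrates the anonymisation step in the same spirit: stripping direct identifiers from records before long-term storage. The field names are hypothetical. Keep in mind that removing direct identifiers alone amounts to pseudonymisation at best; genuine anonymisation under the GDPR also requires addressing indirect identifiers, for example through generalisation or k-anonymity.

```python
# Hypothetical set of direct identifier fields in our records.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "address"}


def strip_identifiers(record: dict) -> dict:
    """Return a copy of the record without direct identifier fields."""
    return {key: value for key, value in record.items()
            if key not in DIRECT_IDENTIFIERS}


record = {"name": "Jane Doe", "email": "jane@example.com",
          "age_band": "30-39", "purchase_total": 120.50}
print(strip_identifiers(record))  # {'age_band': '30-39', 'purchase_total': 120.5}
```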