Can we trace the provenance and lineage of the data used to train or fine-tune the AI model?

Data & Data Governance Category
Design Phase, Input Phase

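The data used to train or fine-tune an AI model must be traceable to its sources to support ethical use, reproducibility, and compliance. Without documented lineage (where the data came from and how it was transformed along the way), it is difficult to verify the credibility and accuracy of the training data.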

If you answered No, then you are at risk.

If you are not sure, then you might be at risk too.

Recommendations

  • Use data lineage tracking tools to monitor where data originates and how it is modified over time.
  • Implement metadata standards (e.g., Datasheets for Datasets) to ensure clear documentation of data sources.
  • Regularly audit data providers to verify their reliability and adherence to ethical guidelines.
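
As an illustration of the first two recommendations, the following is a minimal sketch of a lineage log, not a substitute for a dedicated lineage tool. It assumes each dataset version can be identified by a SHA-256 hash of its content and that each processing step records the hash of the version it was derived from. The names LineageRecord, record_step, and trace, the record fields, and the example source and license values are all hypothetical and chosen only for this example.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest identifying this exact dataset content."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class LineageRecord:
    # All field names are illustrative assumptions, not a published standard.
    dataset_name: str
    content_hash: str
    source: str                 # where the data was obtained
    license: str                # terms under which it may be used
    transformation: str         # what was done to produce this version
    parent_hash: Optional[str] = None  # hash of the version it was derived from
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def record_step(log: List[LineageRecord], name: str, data: bytes, source: str,
                license: str, transformation: str,
                parent: Optional[LineageRecord] = None) -> LineageRecord:
    """Append a lineage record for one dataset version and return it."""
    rec = LineageRecord(
        dataset_name=name,
        content_hash=fingerprint(data),
        source=source,
        license=license,
        transformation=transformation,
        parent_hash=parent.content_hash if parent else None,
    )
    log.append(rec)
    return rec


def trace(log: List[LineageRecord], content_hash: str) -> List[LineageRecord]:
    """Follow parent links from a dataset version back to its original source."""
    by_hash = {r.content_hash: r for r in log}
    chain, seen = [], set()
    current = by_hash.get(content_hash)
    while current is not None and current.content_hash not in seen:
        seen.add(current.content_hash)  # guard against accidental cycles
        chain.append(current)
        current = by_hash.get(current.parent_hash) if current.parent_hash else None
    return chain


# Example: a raw export and a cleaned version derived from it (made-up data).
raw = b"id,text\n1,hello\n2,\n"
cleaned = b"id,text\n1,hello\n"

log: List[LineageRecord] = []
r0 = record_step(log, "reviews_raw", raw, source="vendor export (hypothetical)",
                 license="CC-BY-4.0", transformation="initial ingest")
r1 = record_step(log, "reviews_clean", cleaned, source="derived from reviews_raw",
                 license="CC-BY-4.0", transformation="dropped rows with empty text",
                 parent=r0)

for rec in trace(log, r1.content_hash):
    print(rec.dataset_name, "<-", rec.transformation)
```

Storing the hash of the exact dataset version alongside a trained model lets an auditor later call trace on that hash and see every recorded step back to the original provider. Hashing whole files is a simplification; large or frequently changing datasets may need per-partition hashes or a dedicated tool.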
