Can we confirm the legitimacy of the data sources that we need?

This page is a fallback for search engines and cases when javascript fails or is disabled.
Please view this card in the library, where you can also find the rest of the plot4ai cards.

Technique & Processes Category
Design PhaseInput Phase
Can we confirm the legitimacy of the data sources that we need?
  • Data lineage can be necessary to demonstrate trust as part of your information transparency policy, but it can also be very important when it comes to assessing impact on the data flow. If sources are not verified and legitimised you could run risks such as data being wrongly labelled for instance.
  • Do you know where you need to get the data from? Who is responsible for the collection, maintenance and dissemination? Are the sources verified? Do you have the right agreements in place? Are you allowed to receive or collect that data? Also keep ethical considerations in mind!

If you answered No then you are at risk

If you are not sure, then you might be at risk too

Recommendations

  • Develop a robust understanding of your relevant data feeds, flows and structures such that if any changes occur to the model data inputs, you can assess any potential impact on model performance. In case of third party AI systems contact your vendor to ask for this information.
  • If you are using synthetic data you should know how it was created and the properties it has. Also keep in mind that synthetic data might not be the answer to all your privacy related problems; synthetic data does not always provide a better trade-off between privacy and utility than traditional anonymisation techniques.
  • Do you need to share models and combine them? The usage of Model Cards and Datasheets can help providing the source information.