Is our data complete, up-to-date, and trustworthy?

Data & Data Governance Category
Design Phase, Input Phase, Monitor Phase

Can you avoid falling into the well-known trap of “garbage in, garbage out”? Your AI system is only as reliable as the data it works with.

If you answered No, then you are at risk.

If you are not sure, then you might be at risk too.

Recommendations

  • Verify the data sources:
    • Is any information missing from the dataset?
    • Can we verify that our training and input data hasn’t been tampered with or corrupted?
    • Are we using datasets that are outdated or no longer reflect the current environment?
    • Are all the necessary classes represented?
    • Does the data cover the correct time frame and geographic area?
    • Evaluate what additional data you need to collect or receive.
  • Carefully consider representation schemes, especially for text, video, APIs, and sensors. Not all text representation schemes are the same: if your system expects ASCII and receives Unicode (for example, UTF-8-encoded text), will it detect the unexpected encoding? Source: BerryVilleiML
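Two of the data-source questions in the list (missing information, and whether all necessary classes are represented) can be checked mechanically. The following is a minimal sketch, assuming records arrive as a list of dictionaries; the helper name `audit_completeness` and the field names are hypothetical. It counts empty, `None`, or NaN values per required field and lists expected classes that never appear.

```python
import math
from collections import Counter
from typing import Iterable, Mapping, Sequence


def _is_missing(value: object) -> bool:
    # Treat absent keys (None), empty strings, and float NaN as missing.
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def audit_completeness(
    records: Sequence[Mapping[str, object]],
    required_fields: Iterable[str],
    label_field: str,
    expected_classes: Iterable[str],
) -> dict:
    """Count missing values per required field and list classes with no examples.

    Hypothetical helper: field names and expected classes come from the caller.
    """
    missing = {}
    for field in required_fields:
        count = sum(1 for r in records if _is_missing(r.get(field)))
        if count:
            missing[field] = count
    # Tally how often each label occurs (a missing label is counted under None).
    class_counts = Counter(r.get(label_field) for r in records)
    absent = sorted(set(expected_classes) - set(class_counts))
    return {
        "missing_per_field": missing,
        "class_counts": dict(class_counts),
        "absent_classes": absent,
    }


# Usage with made-up records: one empty text, one missing label, no "phishing" examples.
records = [
    {"text": "win a prize", "label": "spam"},
    {"text": "", "label": "ham"},
    {"text": "meeting at 3", "label": None},
]
report = audit_completeness(records, ["text", "label"], "label", ["spam", "ham", "phishing"])
print(report)
```

A report like this only surfaces gaps; deciding whether a missing class or a sparse field matters is still a judgment about the system's intended use.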
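One common way to answer "has our training and input data been tampered with or corrupted?" is to record a cryptographic hash of each data file when the dataset is approved and compare against it later. The sketch below uses SHA-256 from Python's standard `hashlib`; the function names are hypothetical. It assumes the manifest itself is stored where an attacker cannot modify it (for example, signed or kept separately); otherwise an attacker could simply rewrite the hashes too.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record the SHA-256 of every file under root, keyed by relative POSIX path."""
    return {
        p.relative_to(root).as_posix(): sha256_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_tampered(manifest: dict[str, str], root: Path) -> list[str]:
    """List files that are missing, altered, or not present in the manifest."""
    current = build_manifest(root)
    # A file is flagged if its hash changed or it disappeared (current.get -> None).
    changed = [name for name, digest in manifest.items() if current.get(name) != digest]
    # Files that appeared after the manifest was written are flagged as well.
    added = [name for name in current if name not in manifest]
    return sorted(changed + added)
```

A hash mismatch says that a file changed, not why; accidental corruption and deliberate tampering look the same at this level.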
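The questions about outdated datasets and about time frame and geographic coverage can be partly automated when records carry a timestamp and a location. This is a minimal sketch under that assumption: the field names `timestamp` (a timezone-aware `datetime`) and `region`, the helper `audit_coverage`, and the thresholds in the usage example are all hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Mapping, Optional, Sequence


def audit_coverage(
    records: Sequence[Mapping[str, object]],
    window_start: datetime,
    window_end: datetime,
    expected_regions: Iterable[str],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> dict:
    """Flag records outside the time window, stale data, and uncovered regions.

    Assumes each record has a timezone-aware "timestamp" and a "region" field
    (hypothetical field names chosen for this sketch).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    stamps = [r["timestamp"] for r in records]
    out_of_window = sum(1 for t in stamps if not window_start <= t <= window_end)
    newest = max(stamps, default=None)
    # With no records at all, the dataset is treated as stale.
    stale = newest is None or now - newest > max_age
    seen_regions = {r["region"] for r in records}
    return {
        "out_of_window": out_of_window,
        "newest": newest,
        "stale": stale,
        "missing_regions": sorted(set(expected_regions) - seen_regions),
    }


# Usage with made-up data: one record predates the window, nothing from "US".
utc = timezone.utc
records = [
    {"timestamp": datetime(2023, 1, 10, tzinfo=utc), "region": "EU"},
    {"timestamp": datetime(2021, 6, 1, tzinfo=utc), "region": "EU"},
]
report = audit_coverage(
    records,
    window_start=datetime(2022, 1, 1, tzinfo=utc),
    window_end=datetime(2023, 12, 31, tzinfo=utc),
    expected_regions=["EU", "US"],
    max_age=timedelta(days=365),
    now=datetime(2024, 6, 1, tzinfo=utc),
)
```

Passing `now` explicitly keeps the check reproducible; in monitoring, leaving it as the default compares against the current time on every run.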
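The ASCII-versus-Unicode question from the list can be made concrete. UTF-8 is a superset of ASCII, so pure-ASCII input decodes identically under both; the problem only appears when non-ASCII characters arrive. A minimal sketch, with hypothetical helper names, is to decode with the expected encoding in strict mode so that unexpected bytes raise an error instead of being silently dropped or replaced.

```python
def non_ascii_bytes(raw: bytes) -> list[tuple[int, int]]:
    """Return (offset, value) for every byte outside the 7-bit ASCII range."""
    return [(i, b) for i, b in enumerate(raw) if b > 0x7F]


def decode_strict(raw: bytes, encoding: str = "ascii") -> str:
    """Decode with the expected encoding, failing loudly instead of guessing."""
    try:
        # errors="strict" is the default; "ignore" or "replace" would hide the problem.
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        bad = raw[exc.start]
        raise ValueError(
            f"input is not valid {encoding}: byte {bad:#04x} at offset {exc.start}"
        ) from exc


sample = "caf\u00e9".encode("utf-8")  # b"caf\xc3\xa9": UTF-8 text, not ASCII
print(non_ascii_bytes(sample))  # [(3, 195), (4, 169)]
try:
    decode_strict(sample)
except ValueError as err:
    print(err)  # input is not valid ascii: byte 0xc3 at offset 3
```

Rejecting the input is only one policy; a system might instead decode as UTF-8 and log the event, but either way the mismatch is detected rather than passed silently downstream.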