Have we protected our AI system against model sabotage?
Model sabotage involves deliberate manipulation or damage to AI systems at any stage, from development to deployment. This can include embedding backdoors, altering model behavior, or exploiting vulnerabilities in training data, third-party tools, or infrastructure.
- For AI providers: Risks include compromised training datasets, malicious code in open-source libraries, or backdoors introduced during development.
- For AI deployers: Threats arise from integrating tampered models, using insecure APIs, or applying updates that introduce vulnerabilities.
If you answered No, then you are at risk.
If you are not sure, then you might be at risk too.
Recommendations
- Implement strong security measures, including regular audits and penetration testing, to ensure the integrity of models and the platforms hosting them.
- Assess and monitor the security profile of third-party libraries, tooling, and providers to ensure they are not compromised.
- Develop and maintain a robust disaster recovery plan with explicit mitigation strategies for model sabotage scenarios.
- Use model inspection tools to detect backdoors and ensure that the model’s behavior aligns with its intended function (a simple trigger-probe sketch follows this list).
- Incorporate supply chain security principles by verifying the authenticity and integrity of the components used in model development and deployment.
- Maintain strict version control to detect and prevent unauthorized changes to libraries or model artifacts (a hash-verification sketch follows this list).
- Implement anomaly detection systems to identify unusual usage patterns that may indicate attempted sabotage or exploitation (a minimal sketch follows this list).
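
As a rough illustration of the model-inspection recommendation, the sketch below stamps a candidate trigger patch onto clean inputs and checks whether the predictions collapse onto a single class, which is a common symptom of a targeted backdoor. The `predict` callable, the trigger size, and its position are placeholders you would adapt to your own model; real backdoor scanners (for example, trigger-reconstruction approaches) are considerably more involved.

```python
import numpy as np

def trigger_shift_score(predict, inputs, trigger, position=(0, 0)):
    """Measure how strongly stamping `trigger` onto clean inputs pulls
    predictions toward one class. `predict` maps a batch of images
    (N, H, W, C) to class labels; it stands in for your real model."""
    stamped = inputs.copy()
    h, w = trigger.shape[:2]
    r, c = position
    stamped[:, r:r + h, c:c + w, :] = trigger

    clean_labels = predict(inputs)
    stamped_labels = predict(stamped)

    changed = stamped_labels != clean_labels
    if not changed.any():
        return 0.0, None

    # Fraction of changed predictions that collapse onto the single most
    # common post-trigger class: values near 1.0 hint at a targeted backdoor.
    values, counts = np.unique(stamped_labels[changed], return_counts=True)
    dominant = values[counts.argmax()]
    return counts.max() / changed.sum(), dominant


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Toy stand-in for a real model: a backdoored "classifier" that returns
    # class 7 whenever the top-left pixel is saturated, else a random class.
    def toy_predict(batch):
        labels = rng.integers(0, 10, size=len(batch))
        labels[batch[:, 0, 0, 0] > 0.99] = 7
        return labels

    images = rng.random((256, 32, 32, 3))
    trigger = np.ones((3, 3, 3))  # hypothetical candidate trigger patch

    score, target = trigger_shift_score(toy_predict, images, trigger)
    print(f"collapse ratio={score:.2f}, dominant class={target}")
```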
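
The supply-chain and version-control items can be backed by something as simple as pinning the SHA-256 digest of every model artifact and third-party archive you ship, and refusing to deploy anything whose digest has drifted. The manifest path and artifact names below are hypothetical; this is a minimal sketch using only the Python standard library.

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_artifacts(manifest_path: Path) -> list[str]:
    """Compare every artifact against the pinned digest recorded in a JSON
    manifest of the form {"relative/path.onnx": "<sha256 hex>", ...} and
    return the paths that are missing or have been tampered with."""
    manifest = json.loads(manifest_path.read_text())
    root = manifest_path.parent
    failures = []
    for rel_path, expected in manifest.items():
        artifact = root / rel_path
        if not artifact.exists() or sha256_of(artifact) != expected:
            failures.append(rel_path)
    return failures

if __name__ == "__main__":
    # Hypothetical manifest committed alongside the release; regenerate and
    # re-sign it whenever an artifact legitimately changes.
    bad = verify_artifacts(Path("release/model_manifest.json"))
    if bad:
        raise SystemExit(f"Refusing to deploy, tampered or missing artifacts: {bad}")
    print("All artifacts match their pinned digests.")
```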
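
For the anomaly-detection recommendation, even a robust z-score over per-client request volumes can surface sudden probing or extraction-style bursts before heavier tooling is in place. The threshold and the synthetic traffic below are illustrative assumptions, not tuned values.

```python
import numpy as np

def flag_unusual_windows(request_counts, threshold=3.5):
    """Flag time windows whose request volume deviates strongly from the
    median, using a robust (median/MAD-based) z-score so a few extreme
    windows do not inflate the baseline itself."""
    counts = np.asarray(request_counts, dtype=float)
    median = np.median(counts)
    mad = np.median(np.abs(counts - median))
    if mad == 0:
        return np.zeros(len(counts), dtype=bool)
    robust_z = 0.6745 * (counts - median) / mad
    return np.abs(robust_z) > threshold

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic per-minute request counts for one API client, with a burst
    # injected to mimic automated probing of the deployed model.
    traffic = rng.poisson(lam=20, size=120)
    traffic[60:65] += 400

    suspicious = flag_unusual_windows(traffic)
    print("suspicious minutes:", np.flatnonzero(suspicious))
```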
Interesting resources/references
- Securing Machine Learning Algorithms, ENISA
- STRIDE-AI: An Approach to Identifying Vulnerabilities of Machine Learning Assets
- Stride-ML Threat Model
- MITRE ATLAS™ - Adversarial Threat Landscape for Artificial-Intelligence Systems
- An Effective and Resilient Backdoor Attack Framework against Deep Neural Networks and Vision Transformers