Are we protected from perturbation attacks?
- In perturbation-style attacks, the attacker stealthily modifies the query to get a desired response.
- Examples:
- Image: Noise is added to an X-ray image, causing the prediction to flip from a normal scan to an abnormal one.
- Text translation: Specific characters are manipulated to produce an incorrect translation. The attack can suppress a specific word or even remove it completely. Source: Microsoft, Threat Modeling AI/ML Systems and Dependencies.
- Random perturbation of labels is also a possible attack. A related case is adversarial label noise: the intentional switching of classification labels, which leads to deterministic noise, an error the model cannot capture due to its generalization bias. Source: ENISA
If you answered No, then you are at risk
If you are not sure, then you might be at risk too
Recommendations
Reactive/Defensive Detection Actions:
- Implement a minimum time threshold between calls to the API providing classification results. This slows down multi-step attack testing by increasing the overall amount of time required to find a successful perturbation (see the sketch below).
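
A minimal sketch of such a threshold, assuming a per-caller identifier and an in-memory store; the 2-second interval and the `model_predict` stub are illustrative assumptions, not part of any specific product:

```python
import time


class MinIntervalLimiter:
    """Enforce a minimum interval between classification calls per client,
    slowing down the iterative probing needed to craft a perturbation."""

    def __init__(self, min_interval_seconds: float = 2.0):
        self.min_interval = min_interval_seconds
        self._last_call: dict[str, float] = {}

    def allow(self, client_id: str) -> bool:
        now = time.monotonic()
        last = self._last_call.get(client_id)
        if last is not None and (now - last) < self.min_interval:
            return False  # request arrived too soon: throttle it
        self._last_call[client_id] = now
        return True


def model_predict(payload):
    # Placeholder for the real classifier; returns a dummy label.
    return "normal"


limiter = MinIntervalLimiter(min_interval_seconds=2.0)


def classify(client_id: str, payload) -> dict:
    if not limiter.allow(client_id):
        # Keep the error generic so the caller learns as little as possible.
        return {"error": "too many requests"}
    return {"label": model_predict(payload)}
```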
Proactive/Protective Actions:
- Develop a new network architecture that increases adversarial robustness by performing feature denoising.
- Train with known adversarial samples to build resilience and robustness against malicious inputs (see the adversarial training sketch after this list).
- Invest in developing monotonic classification with selection of monotonic features. This ensures that the adversary will not be able to evade the classifier by simply padding features from the negative class.
- Feature squeezing can be used to harden DNN models by detecting adversarial examples (see the detection sketch after this list).
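
A minimal sketch of adversarial training, using FGSM-style perturbations on a toy NumPy logistic-regression classifier. The synthetic data, perturbation budget `eps`, and training loop are illustrative assumptions; a real system would use a deep-learning framework and its own model:

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Toy data: two Gaussian blobs standing in for real features.
X = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(1, 1, (200, 2))])
y = np.concatenate([np.zeros(200), np.ones(200)])

w, b = np.zeros(2), 0.0
lr, eps = 0.1, 0.2  # eps is the assumed perturbation budget

for epoch in range(100):
    # Craft FGSM adversarial examples against the current model:
    # x_adv = x + eps * sign(dLoss/dx), where dLoss/dx = (p - y) * w.
    p = sigmoid(X @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]
    X_adv = X + eps * np.sign(grad_x)

    # Train on the union of clean and adversarial samples.
    X_all = np.vstack([X, X_adv])
    y_all = np.concatenate([y, y])
    p_all = sigmoid(X_all @ w + b)
    w -= lr * (X_all.T @ (p_all - y_all)) / len(y_all)
    b -= lr * np.mean(p_all - y_all)

print("accuracy on clean data:", np.mean((sigmoid(X @ w + b) > 0.5) == y))
```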
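
A sketch of feature-squeezing-based detection: the prediction on the original input is compared with the prediction on a bit-depth-reduced copy, and a large shift is flagged. `predict_proba`, the bit depth, and the threshold are assumptions that would need calibration on legitimate traffic:

```python
import numpy as np


def reduce_bit_depth(x: np.ndarray, bits: int = 4) -> np.ndarray:
    """Squeeze input values in [0, 1] down to 2**bits discrete levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels


def looks_adversarial(x: np.ndarray, predict_proba, threshold: float = 0.3) -> bool:
    """Flag inputs whose prediction shifts strongly once the input is squeezed.

    predict_proba maps an input array to a vector of class probabilities.
    """
    p_original = predict_proba(x)
    p_squeezed = predict_proba(reduce_bit_depth(x))
    return float(np.abs(p_original - p_squeezed).sum()) > threshold


if __name__ == "__main__":
    def dummy_model(x):
        # Stand-in classifier for illustration only.
        m = float(x.mean())
        return np.array([m, 1.0 - m])

    sample = np.random.default_rng(1).random((28, 28))
    print("flagged as adversarial:", looks_adversarial(sample, dummy_model))
```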
Response Actions:
- Issue alerts on classification results with high variance between classifiers, especially when they come from a single user or a small group of users (a sketch of such an alerting rule follows below).
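
A sketch of such an alerting rule, assuming each query is scored by several classifiers that return class-probability vectors; the disagreement metric, thresholds, and per-user counters are illustrative assumptions:

```python
from collections import defaultdict

import numpy as np


def prediction_disagreement(prob_vectors: list[np.ndarray]) -> float:
    """Mean per-class variance of the probability vectors returned by
    several classifiers for the same input."""
    stacked = np.vstack(prob_vectors)  # shape: (n_classifiers, n_classes)
    return float(stacked.var(axis=0).mean())


# Thresholds are illustrative and need tuning on legitimate traffic.
DISAGREEMENT_THRESHOLD = 0.05
ALERT_AFTER_N_HITS = 5
suspicious_hits = defaultdict(int)


def check_and_alert(user_id: str, prob_vectors: list[np.ndarray]) -> None:
    """Count high-disagreement queries per user and alert on repeat offenders."""
    if prediction_disagreement(prob_vectors) > DISAGREEMENT_THRESHOLD:
        suspicious_hits[user_id] += 1
        if suspicious_hits[user_id] >= ALERT_AFTER_N_HITS:
            print(f"ALERT: user {user_id} repeatedly triggers "
                  f"high-variance classifications")
```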
Source: Microsoft, Threat Modeling AI/ML Systems and Dependencies.
Interesting resources/references
- Microsoft, Threat Modeling AI/ML Systems and Dependencies
- Adversarially Robust Malware Detection Using Monotonic Classification
- Reinforcing Adversarial Robustness using Model Confidence Induced by Adversarial Training
- Attribution-driven Causal Analysis for Detection of Adversarial Examples
- Feature Denoising for Improving Adversarial Robustness
- ENISA, Securing Machine Learning Algorithms
- STRIDE-AI: An Approach to Identifying Vulnerabilities of Machine Learning Assets
- Stride-ML Threat Model
- MITRE ATLAS™ - Adversarial Threat Landscape for Artificial-Intelligence Systems