Can we prevent concept and data drift?
- Data drift is a change in the statistical properties of the input data distribution (e.g., feature distributions shift over time). It weakens performance because the model receives data that differs from the data on which it was trained.
- With concept drift, the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways, causing accuracy issues. It is a change in the relationship between the input features and the target variable (e.g., customer behavior changes over time, impacting a predictive model).
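The difference between the two kinds of drift can be illustrated with a toy example. The sketch below is not taken from any of the referenced resources: the model, the two labelling rules, and the input ranges are all hypothetical and chosen only to make the effect visible. It assumes a frozen model that learned "x > 0" from inputs between -2 and 2, then scores it when only the inputs move (data drift) and when only the labelling rule moves (concept drift).

```python
def frozen_model(x):
    # Hypothetical model fitted on inputs in [-2, 2], where "x > 0" matched the labels.
    return 1 if x > 0 else 0


def original_concept(x):
    # Assumed true labelling rule at training time; agrees with the model on [-2, 2].
    return 1 if 0 < x < 3 else 0


def shifted_concept(x):
    # Assumed changed rule: the same inputs now map to the opposite labels.
    return 1 if x <= 0 else 0


def accuracy(inputs, concept):
    # Share of inputs where the frozen model agrees with the given labelling rule.
    return sum(frozen_model(x) == concept(x) for x in inputs) / len(inputs)


def grid(low, high, n=400):
    # Evenly spaced inputs, used instead of random draws so the run is repeatable.
    step = (high - low) / n
    return [low + step * (i + 0.5) for i in range(n)]


training_inputs = grid(-2, 2)
drifted_inputs = grid(2, 6)

baseline = accuracy(training_inputs, original_concept)      # no drift
data_drift = accuracy(drifted_inputs, original_concept)     # inputs changed, rule unchanged
concept_drift = accuracy(training_inputs, shifted_concept)  # inputs unchanged, rule changed

print(baseline, data_drift, concept_drift)  # prints: 1.0 0.25 0.0
```

In the concept drift case the inputs are identical to the training inputs, so a check that looks only at feature distributions would report nothing even though accuracy has collapsed.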
If you answered No, then you are at risk.
If you are not sure, then you might be at risk too.
Recommendations
- Implement robust monitoring tools to detect data and concept drift, and establish governance policies for regular data validation and model retraining.
- Select an appropriate drift detection algorithm and apply it separately to the labels, the model's predictions, and the input features.
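As one possible illustration of the second recommendation, the sketch below applies a two-sample Kolmogorov-Smirnov test separately to a feature, the model's predictions, and the labels. The choice of test is an assumption made for this example, not something the recommendation prescribes; the function names (`ks_statistic`, `ks_threshold`, `detect_drift`, `drift_report`), the 0.05 significance level, and the sample data are all hypothetical. The threshold uses the usual large-sample approximation, so it is only a rough guide for small windows, and for categorical values a chi-squared test is usually a better fit.

```python
import math
from bisect import bisect_right


def ks_statistic(reference, current):
    # Two-sample KS statistic: the largest gap between the two empirical CDFs.
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    for value in ref + cur:
        cdf_ref = bisect_right(ref, value) / len(ref)
        cdf_cur = bisect_right(cur, value) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def ks_threshold(n, m, alpha=0.05):
    # Large-sample critical value for sample sizes n and m at significance alpha.
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n + m) / (n * m))


def detect_drift(reference, current, alpha=0.05):
    # Flag drift when the observed gap exceeds the critical value.
    threshold = ks_threshold(len(reference), len(current), alpha)
    return ks_statistic(reference, current) > threshold


def drift_report(reference, current, alpha=0.05):
    # Run the test independently on each named stream (feature, prediction, label).
    return {name: detect_drift(reference[name], current[name], alpha)
            for name in reference}


# Hypothetical monitoring windows: the feature has shifted, the rest has not.
reference_window = {
    "feature": list(range(100)),
    "prediction": [0, 1] * 50,
    "label": [0, 1] * 50,
}
current_window = {
    "feature": [value + 50 for value in range(100)],
    "prediction": [0, 1] * 50,
    "label": [0, 1] * 50,
}

report = drift_report(reference_window, current_window)
print(report)  # prints: {'feature': True, 'prediction': False, 'label': False}
```

Reporting each stream separately shows which part of the pipeline moved: a flag on features alone points to data drift, while a flag on labels or predictions with stable features is a hint of concept drift.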
Interesting resources/references
- Data Drift vs. Concept Drift
- Characterizing Concept Drift
- Inferring Concept Drift Without Labeled Data
- Automatic Learning to Detect Concept Drift
- From concept drift to model degradation: An overview on performance-aware drift detectors
- Learning under Concept Drift: A Review
- Detect data drift (preview) on datasets