In the past decade, machine learning has begun to play an active role in our day-to-day lives. Some models drive our cars, complete emails for us, predict and diagnose our medical conditions, and even do our taxes. These models are deployed to generate real-time predictions and have to adhere to the highest levels of MLOps (Machine Learning Operations) standards to impact our lives positively. The teams overseeing these prediction models are tasked with constantly maintaining their quality, as poor predictions can significantly affect our day-to-day lives. This article highlights the challenges that drift poses in MLOps and explores best practices to detect and mitigate it.
What is Drift in Machine Learning?
Supervised machine learning models are typically trained on a finite amount of data called the training set. Once the model is deployed, two conditions must be satisfied for the model to continue generating great predictions. Firstly, any data the model encounters during deployment must have a distribution similar to that of the training data. Secondly, the mathematical relationship between the features and the target during deployment must be similar to the relationship between the features and the target during training.
If either of the above assumptions breaks down, the model performance will suffer, and the predictions will no longer be valid. In other words, the machine learning model has drifted. A deployed model that has drifted may generate incorrect predictions and have severe real-world implications, including filing false tax returns and providing patients with incorrect medical diagnoses.
What Are Data Drift and Concept Drift?
When either of the assumptions above breaks down, the result is a specific kind of model drift. A breakdown of the first assumption results in data drift, and a breakdown of the second results in concept drift. Let’s understand these terms with the help of some examples.
1. Data Drift: A global realty company has trained and deployed a model to predict the pricing of residential properties based on various features, including the number of bedrooms. When the model was first trained, buyers were predominantly interested in Studios and 1BR apartments. The training data used for the model was hence skewed towards such properties. Recently, analysts at the company have noticed an unexplained surge in demand for 3BR and 4BR properties and are concerned that their deployed model may generate inaccurate predictions for these properties. This is an example of a data drift, where the data distribution going into the model has changed since the model was last trained.
2. Concept Drift: Concept drift can be understood using the example of a spam email detector. Not only can it be challenging to ascertain what constitutes a spam email, but over the years, spam emails have changed and evolved to circumvent spam filters. This can make models within the spam filters highly susceptible to concept drift – where the relationship between the data and the target is constantly changing. Detecting concept drift can be more challenging than detecting data drift, but both drifts pose risks to deployed machine learning models.
Figure 1. This figure shows visual representations of data drift and concept drift. Please note that concept drift can be much more subtle in real life than a cat becoming a dog.
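The difference between the two kinds of drift can also be sketched with a toy version of the realty example. Everything in this sketch is an assumption made for illustration: the price formulas, the straight-line "model", and the bedroom counts are hypothetical and do not come from a real pricing system. The model is exact on the studios and 1BR apartments it was trained on, so any error that appears later is caused by drift.

```python
def true_price_at_training(bedrooms):
    # Hypothetical ground truth in $1000s (assumed for this sketch).
    return 100 + 50 * bedrooms + 10 * bedrooms ** 2


def true_price_after_concept_drift(bedrooms):
    # Same features, but the feature-to-price relationship has changed
    # (hypothetical: the whole market has become more expensive).
    return 120 + 70 * bedrooms + 10 * bedrooms ** 2


def model(bedrooms):
    # Straight line through the two training points (0 and 1 bedrooms).
    return 100 + 60 * bedrooms


def mean_abs_error(inputs, truth):
    """Average absolute gap between the model and the given ground truth."""
    return sum(abs(model(b) - truth(b)) for b in inputs) / len(inputs)


training_inputs = [0, 1, 0, 1, 1, 0]  # studios and 1BR apartments
drifted_inputs = [3, 4, 3, 4, 4, 3]   # demand has moved to 3BR and 4BR

# No drift: the model reproduces the training relationship.
print(mean_abs_error(training_inputs, true_price_at_training))          # 0.0
# Data drift: same relationship, new input distribution.
print(mean_abs_error(drifted_inputs, true_price_at_training))           # 90.0
# Concept drift: same inputs, new relationship.
print(mean_abs_error(training_inputs, true_price_after_concept_drift))  # 30.0
```

In the data drift case the relationship between bedrooms and price never changes; the model fails only because it is asked about properties it never saw. In the concept drift case the inputs look exactly like the training data, which is why monitoring the inputs alone would not reveal it.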
Why is Detecting Drift Challenging?
This section summarizes the most common reasons why detecting drift can be challenging.
1. Unavailability of Ground Truths: One way to detect drift is to monitor the performance of the deployed model. Deployed models are capable of generating hundreds of predictions every second. Verifying the quality of these predictions in real-time can be very challenging. A machine learning model that sets the credit limit for a customer’s credit card may suffer from drift, but this may not be detected until a few months later when a customer defaults on their payment. Those few months can prove very costly for the credit card company.
2. Drift-Time Dependency: Another way to detect drift is to monitor the distribution of the incoming data to the deployed model. The distribution of the incoming data is compared with the original training data distribution. Statistically significant differences in the distributions may indicate drift. The challenge here is accumulating enough data points during deployment to perform these statistical tests confidently. Collecting more data takes more time, and in real-time prediction models, catching drift quickly is critical to minimizing real-world impacts.
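The dependence on sample size can be made concrete with a two-sample Kolmogorov-Smirnov (KS) test, one common choice for comparing two distributions. The sketch below is a minimal standard-library version, not a production implementation: it uses the large-sample critical value rather than an exact p-value, and, to keep the result reproducible, it builds noise-free "samples" from evenly spaced quantiles instead of random draws. The size of the shift (0.3 standard deviations) and the sample sizes are assumptions chosen for illustration. In practice a library routine such as SciPy's `ks_2samp` would normally be used.

```python
from bisect import bisect_right
from math import log, sqrt
from statistics import NormalDist


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample rejection threshold for the two-sample KS test."""
    return sqrt(-log(alpha / 2) / 2) * sqrt((n + m) / (n * m))


def drift_detected(reference, live, alpha=0.05):
    """True if the KS statistic exceeds the critical value at level alpha."""
    d = ks_statistic(reference, live)
    return d > ks_critical_value(len(reference), len(live), alpha)


def quantile_sample(dist, n):
    """n evenly spaced quantiles of dist: an idealized, noise-free sample."""
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


training = NormalDist(mu=0.0, sigma=1.0)
shifted = NormalDist(mu=0.3, sigma=1.0)  # hypothetical drifted feature

for n in (20, 2000):
    ref = quantile_sample(training, n)
    live = quantile_sample(shifted, n)
    print(n, drift_detected(ref, live))
# 20 False
# 2000 True
```

The same shift goes unnoticed with 20 points per sample and is flagged with 2000, because the critical value shrinks as the samples grow. Waiting for the larger sample is exactly the delay described above.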
What Can You Do to Detect Drift?
Here are some best practices for detecting drift successfully. No single method can successfully detect drift at all times, but a combination of the following methods can detect drift under most circumstances.
1. Automated Checks on the Incoming Data: The data your deployed model encounters daily can be subjected to a series of automated checks to detect drift. Monitoring and logging summary statistics such as the mean, median, skew, and kurtosis can provide vital information on how the data changes over time. Statistical tests may also be performed frequently to compare the data with the original training data distribution. For live data being streamed, we recommend performing these checks on consecutive data windows. With these checks being done and logged regularly, triggers may be set up to warn your teams about changes to the data the model is encountering. Your team may then intervene to implement corrective measures.
2. Evaluating the Model against a Dynamic Validation Set: As discussed earlier, monitoring the deployed model’s performance can be challenging due to the unavailability of real-time ground truth labels. A validation set that is regularly maintained and updated with the latest ground truths can partially overcome this problem and allow model performance to be monitored on an ongoing basis. This method is not guaranteed to detect drift but can be used in conjunction with other methods to monitor drift. It should also be noted that a drop in model performance may indicate problems other than drift.
3. External Indicators of Drift: Sometimes, you may detect the onset of drift using metrics other than your model performance or the summary statistics of your data. External business metrics and Key Performance Indicators (KPIs) that you have in place may be able to detect drift sooner. For example, an insurance company wanting to predict total property damage claims for a certain month may be able to anticipate drift by monitoring occurrences of natural disasters worldwide. The company may not be able to retrain its model quickly enough to factor this in, but anticipating poor predictions from the model can still minimize the real-world consequences of drift.
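The automated checks from the first practice above can be sketched as follows. This is a minimal illustration under stated assumptions, not a recommended monitoring stack: the function names `summarize` and `monitor`, the window size, and the alert threshold of two training standard deviations are all hypothetical choices, and the bedroom counts are made-up data echoing the realty example. The sketch also assumes the training data for the feature is not constant.

```python
from statistics import fmean, median, pstdev


def summarize(values):
    """Summary statistics for one window of a numeric feature."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant window has no spread; report 0.0 rather than divide by zero.
        skew, kurtosis = 0.0, 0.0
    else:
        z = [(v - mu) / sigma for v in values]
        skew = fmean([x ** 3 for x in z])
        kurtosis = fmean([x ** 4 for x in z]) - 3.0  # excess kurtosis
    return {"mean": mu, "median": median(values), "skew": skew, "kurtosis": kurtosis}


def monitor(baseline, stream, window_size, max_shift=2.0):
    """Log statistics for consecutive windows and flag large mean shifts.

    The shift is measured in units of the training standard deviation;
    max_shift is an assumed threshold that would need tuning in practice.
    """
    base_mean = fmean(baseline)
    base_sigma = pstdev(baseline)  # assumed to be non-zero
    log = []
    for start in range(0, len(stream) - window_size + 1, window_size):
        stats = summarize(stream[start:start + window_size])
        shift = abs(stats["mean"] - base_mean) / base_sigma
        stats["alert"] = shift > max_shift
        log.append(stats)
    return log


# Hypothetical bedroom counts (0 = studio): training data, then live traffic.
training_bedrooms = [0, 1, 1, 0, 1, 2, 1, 0, 1, 1]
live_bedrooms = [1, 0, 1, 1, 0, 1, 1, 2, 0, 1, 3, 4, 3, 2, 4, 4, 3, 3, 4, 3]

log = monitor(training_bedrooms, live_bedrooms, window_size=5)
print([entry["alert"] for entry in log])  # [False, False, True, True]
```

The first two windows resemble the training data and pass quietly; the last two are dominated by larger properties and raise an alert. A check like this only sees the inputs, so it complements rather than replaces the validation-set and external-indicator practices.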
A Drift Has Been Detected. What Should You Do Now?
1. Drift without Consequences: You may accurately detect drift in your system, yet the drift does not seem to impact the performance of your deployed model. In this case, it is entirely up to the team managing the model to decide if retraining is necessary. The decision depends on how critical the model is and whether the drift is expected to be permanent. If it is, the team may choose to retrain the model with more representative training data to prevent the model performance from dropping in the future. Teams may also develop daily, weekly, or monthly schedules to retrain their models. The frequency of retraining will depend on the dynamics of the data the model is subject to and the model’s application. It is important to note that model retraining requires time and computing resources, and the resulting boost in performance may be marginal.
2. Drift with Consequences: In this case, the drift that has been detected is causing the model to generate incorrect predictions, thereby causing the model performance to suffer. Model retraining is almost always necessary to restore the model’s functionality when this happens. There are techniques to partially retrain the model using new, representative data. This may serve as a short-term solution, but a complete model retraining and deployment will be necessary for the long run.
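The two cases above can be condensed into a small triage rule. This is only a sketch of one possible policy: the `DriftReport` fields, the `triage` function, the returned action labels, and the tolerated performance drop of 0.02 are hypothetical, and a real team would weigh retraining cost and model criticality in ways a three-way rule cannot capture.

```python
from dataclasses import dataclass


@dataclass
class DriftReport:
    drift_detected: bool
    performance_drop: float   # baseline metric minus current metric
    expected_permanent: bool  # the team's judgment, not a measured quantity


def triage(report, tolerated_drop=0.02):
    """Map a drift report to an action (assumed policy for illustration)."""
    if not report.drift_detected:
        return "keep monitoring"
    if report.performance_drop > tolerated_drop:
        # Drift with consequences: predictions are already suffering.
        return "retrain now"
    if report.expected_permanent:
        # Drift without consequences, but likely to persist.
        return "schedule retraining"
    return "keep monitoring"


print(triage(DriftReport(True, 0.10, True)))    # retrain now
print(triage(DriftReport(True, 0.00, True)))    # schedule retraining
print(triage(DriftReport(True, 0.00, False)))   # keep monitoring
```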
Aimpoint Digital Is Here to Help You Detect and Manage Drift Effectively
At Aimpoint Digital, we are committed to helping all industries, big or small, to build, deploy and manage their machine-learning models in production. Our data science, data engineering, and MLOps experts are here to develop solutions to cater to your unique requirements.
If you would like us to evaluate your current infrastructure to start building and deploying production-level models for your organization, please contact us through the form below to speak with our experts.