MLOps Models in Production: Drift Tracking Notifications


Your model is in production—great! How is it performing? This is a larger question than simply whether it returns scoring responses quickly. Models (built and hosted by DataRobot, or built and/or hosted outside of DataRobot) can be tracked for decay over time via metrics such as data drift and target drift, among others. A deployment's drift can be reviewed in the DataRobot GUI. In the following example, the model is a binary classifier predicting whether Prosper D grade loans will be repaid.

[Screenshot: deployment Data Drift dashboard, showing Feature Drift vs. Feature Importance]

The left graph (Feature Drift vs. Feature Importance) plots the model's most important features against the Population Stability Index (PSI) of the scoring data relative to the data the model was originally trained on. This deployment is still happily serving scoring requests through its API; however, it is clear the world has changed, the model is stale, and a new model should be trained and deployed for this use case. The feature highlighted above is the Debt-to-Income Ratio (i.e., how much debt a borrower carries relative to their income), which affects the ability to repay; the scoring dataset has clearly skewed toward borrowers carrying much more debt than the borrowers on which the model was originally trained. What if we have thousands of models in production, or if we simply don't want to log in and manually review each deployment on the monitoring dashboard?
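For intuition, PSI compares how a feature's distribution in recent scoring data has shifted away from its distribution in the training data. Below is a minimal, illustrative sketch of the standard PSI calculation for a single numeric feature; it is not DataRobot's internal implementation, and the bin count and the debt_to_income column name are assumptions for the example.

import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Approximate PSI of an 'actual' (scoring) sample vs. an 'expected' (training) sample."""
    # bin edges are derived from the training distribution
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # guard against empty bins before taking the log
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# e.g., compare the training and scoring distributions of a feature like debt-to-income:
# psi = population_stability_index(train_df['debt_to_income'], scoring_df['debt_to_income'])

Values near zero indicate the two distributions are close; larger values indicate a more pronounced shift.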

Monitoring Notifications

Each deployment can have monitoring notifications configured for various metrics. For Data Drift, the frequency, comparison periods, PSI & importance thresholds, and number of features can all be leveraged as part of the notification process.

[Screenshot: deployment notification settings for data drift]

When the criteria set above are met, an email alert is sent to any user who has notifications configured on the deployment. As a best practice, models scored through production pipelines are typically deployed under a service account that uses a group email distribution list for interested parties. In this way, a single, central notification can alert multiple people and processes.
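Notifications only have something to report if drift tracking is enabled on the deployment. As a quick programmatic check, a sketch along the following lines reads the deployment settings over the REST API; the endpoint path and response keys shown here are assumptions to confirm against the DataRobot API documentation for your release.

import requests

DR_API_TOKEN = 'INSERT_API_TOKEN_HERE'
DR_MODELING_ENDPOINT = 'https://app.datarobot.com'
DEPLOYMENT_ID = 'INSERT_DEPLOYMENT_ID_HERE'

# read the deployment's current settings (assumed /settings/ endpoint and keys)
response = requests.get(
    url=DR_MODELING_ENDPOINT + '/api/v2/deployments/' + DEPLOYMENT_ID + '/settings/',
    headers={'Authorization': 'token %s' % DR_API_TOKEN},
)
settings = response.json()
print('target drift tracking enabled: ', settings.get('targetDrift', {}).get('enabled'))
print('feature drift tracking enabled:', settings.get('featureDrift', {}).get('enabled'))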

Custom Notifications

What if more complicated logic is required? The notification configuration above can only handle so much complexity; more complicated business logic that represents subject matter expert (SME) understanding of related features may be of greater value. Applying advanced logic can reduce noise from false positives and ensure that notifications are created only when meaningful observations are identified for further action.

The DataRobot API can be leveraged to collect drift data, apply more sophisticated logic, and integrate with notification channels beyond email (such as messaging a Slack channel, or creating a new project to train and deploy a replacement model via the Python SDK). The API also exposes additional drift metrics; the available options are psi, kl_divergence, dissimilarity, hellinger, and js_divergence. A PSI value of 0.2 or more is generally considered a significant population change. The example below applies the following custom logic to trigger an alerting event:

  • The period reviewed for drift will be the last 120 days.
  • At least 250 records must have been scored.
  • The LISTING_TERM feature will be excluded from consideration when triggering an alert.
  • If two or more features hit a PSI threshold of 0.2 or more, trigger an alert.
  • If any feature has hit a threshold of 0.8 or more, trigger an alert.

 

import pandas as pd
import requests
from datetime import datetime, timezone, timedelta

# connectivity values
DR_API_TOKEN = 'FF____INSERT_API_TOKEN_HERE_______TlU9'
DR_MODELING_ENDPOINT = 'https://app.datarobot.com'
DR_MODELING_HEADERS = {'Content-Type': 'application/json', 'Authorization': 'token %s' % DR_API_TOKEN}

# deployment retrieval
DEPLOYMENT_ID = '5c341c008b7d654f'
DRIFT_METRIC = 'psi' # psi - Population Stability Index is the default
PAST_DAYS_TO_RETRIEVE = 120

START_TM = (datetime.now(timezone.utc).replace(microsecond=0, second=0, minute=0) - timedelta(days=PAST_DAYS_TO_RETRIEVE)).isoformat()

# get drift data from the deployment
params = {
    'limit': 100, 
    'metric': DRIFT_METRIC, 
    'start': START_TM
}

response = requests.get(
    url = DR_MODELING_ENDPOINT + '/api/v2/deployments/' + DEPLOYMENT_ID + '/featureDrift/',
    headers=DR_MODELING_HEADERS,
    params=params,
)

if response.status_code != 200:
    print('Request failed; http error {code}: {content}'.format(code=response.status_code, content=response.content))
    raise SystemExit(1)

# flatten the per-feature drift records into a DataFrame
df_features = pd.json_normalize(response.json()['data'])
sample_size = df_features['sampleSize'].iloc[0]
df_features = df_features[['name', 'featureImpact', 'driftScore']]

# apply custom alerting logic
MIN_SAMPLE_SIZE = 250
PSI_THRESHOLD = 0.2
EXCESSIVE_PSI_THRESHOLD = 0.8
IGNORE_FEATURES = ['LISTING_TERM']

# get list of features with significant drift
df_psi_drifted_features = df_features[(df_features['driftScore'] >= PSI_THRESHOLD)]

# remove features we do not want to alert on
df_psi_drifted_features = df_psi_drifted_features[~df_psi_drifted_features.name.isin(IGNORE_FEATURES)]

alert = 0
alert_message = DR_MODELING_ENDPOINT + '/deployments/' + DEPLOYMENT_ID + '/data-drift\n'

if sample_size >= MIN_SAMPLE_SIZE:
    if len(df_psi_drifted_features) >= 2:
        alert = 1
        alert_message += '\nAlert: 2 or more features have exceeded a threshold of ' + str(PSI_THRESHOLD)
    if len(df_psi_drifted_features[(df_psi_drifted_features['driftScore'] >= EXCESSIVE_PSI_THRESHOLD)]) > 0:
        alert = 1
        alert_message += '\nAlert: 1 or more features have exceeded an excessive threshold of ' + str(EXCESSIVE_PSI_THRESHOLD)
    
if alert == 1:
    alert_message += '\n\n' + str(df_features.sort_values(by=['driftScore'], ascending=False))
    print(alert_message)
    # take action, e.g., send an email, kick off a new project to train a replacement model, etc.

 

This code is also available on the Community GitHub.

The trigger action to take is left to the developer; for example, the code could be scheduled to run every day or every week, or it could simply send an email if the alert is triggered. On AWS, you can use CloudWatch for scheduling, Lambda for running the code, and SES for sending an email.
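For example, if the alerting script runs in a Lambda function, a minimal sketch of sending the alert through SES with boto3 could look like the following; the sender and recipient addresses are placeholders and must be verified identities in your SES account.

import boto3

def send_drift_alert(alert_message, subject='DataRobot drift alert'):
    """Send the alert text via Amazon SES (addresses are placeholders)."""
    ses = boto3.client('ses', region_name='us-east-1')
    ses.send_email(
        Source='mlops-alerts@example.com',              # verified SES identity
        Destination={'ToAddresses': ['ml-team@example.com']},
        Message={
            'Subject': {'Data': subject},
            'Body': {'Text': {'Data': alert_message}},
        },
    )

# e.g., at the end of the alerting logic above:
# if alert == 1:
#     send_drift_alert(alert_message)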

Conclusion

The world is changing rapidly, along with the universe of scoring data being sent to deployed models. It’s risky to refresh models on an arbitrary schedule, and even riskier not to plan for model refreshes or not to monitor models at all. With DataRobot's drift tracking capabilities, any model, hosted anywhere, can be monitored. When drift is detected, the built-in tools or more sophisticated custom business logic can raise an alert so that action can be taken to ensure the needs of the use case are still being met.
