FAQs: Deploying Models (MLOps)

(Updated March 2021)

This section provides answers to frequently asked questions related to model deployment. If you don't find an answer to your question, ask it using Post your Comment (below).

Service Health

Data Drift

Accuracy

Predictions

Deployments / Governance

Service Health

Where can I see the prediction volume for my deployment?

You can see the Total Predictions in the Service Health tab of your deployment.

[Screenshot: Service Health tab showing Total Predictions]

More information for DataRobot users: search in-app Platform Documentation for Service Health tab.

What is 'cache hit rate'?

Cache hit rate is the percentage of prediction requests served by a model that was already in the prediction server's cache (that is, a model recently used by other predictions). If the model is not cached, DataRobot has to look it up and load it, which can cause delays. The prediction server cache holds 16 models by default and drops the least-used model when the limit is reached.

[Screenshot: Service Health tab showing Cache Hit Rate]
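DataRobot's internal cache isn't exposed, but the behavior described above can be pictured with a small bounded-cache sketch (purely illustrative, not DataRobot code):

```python
from collections import OrderedDict

class ModelCache:
    """Toy sketch of a bounded model cache; illustrative only, not DataRobot internals."""

    def __init__(self, capacity=16):           # prediction server default is 16 models
        self.capacity = capacity
        self._models = OrderedDict()            # model_id -> loaded model object
        self.hits = 0
        self.requests = 0

    def get(self, model_id, load_fn):
        self.requests += 1
        if model_id in self._models:
            self.hits += 1                      # cache hit: model is already loaded
            self._models.move_to_end(model_id)
        else:
            if len(self._models) >= self.capacity:
                self._models.popitem(last=False)  # drop the model that has gone unused longest
            self._models[model_id] = load_fn(model_id)  # slow path: look up and load the model
        return self._models[model_id]

    @property
    def hit_rate(self):
        return self.hits / self.requests if self.requests else 0.0
```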

More information for DataRobot users: search in-app Platform Documentation for Service Health tab, then locate information for “Cache Hit Rate.”

Can I see statistics of my deployment, month-by-month?

Yes. Go to the Service Health tab of your deployment. Change the Resolution to “Monthly” and adjust the blue slider at the top to narrow down the time frame.

[Screenshot: Service Health tab with Resolution set to Monthly]
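If you want the same numbers programmatically, the DataRobot Python client can retrieve service statistics for a time window. A minimal sketch, assuming the datarobot package's Deployment.get_service_stats method (check your client version's documentation); the endpoint, token, and deployment ID are placeholders:

```python
from datetime import datetime
import datarobot as dr

dr.Client(endpoint="https://app.datarobot.com/api/v2", token="<API_TOKEN>")  # placeholder credentials

deployment = dr.Deployment.get("<DEPLOYMENT_ID>")

# One call per month-long window; the returned object exposes the same metrics
# that the Service Health tab displays (total predictions, response time, and so on).
stats = deployment.get_service_stats(
    start_time=datetime(2021, 1, 1),
    end_time=datetime(2021, 2, 1),
)
print(stats.metrics)
```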

More information for DataRobot users: search in-app Platform Documentation for Service Health tab.

Does the response time in service health include the time to traverse the network?

The response time does not include the time due to network latency.

More information for DataRobot users: search in-app Platform Documentation for Service Health tab, then locate information for “Response Time.”

What is the difference between response time and execution time?

When an API call is made, the time it takes the caller to get a response back could be called "total time." Because it includes network latency outside of DataRobot's control, DataRobot cannot track it, although you can measure it yourself on the client side (see the sketch after this list). Once the request is received by DataRobot servers, you can measure two things:

  • Response time: how long was spent processing the prediction request (receiving the request and returning a response).
  • Execution time: the time spent scoring the prediction request.
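To see how much of the latency you observe is network overhead, you can time a prediction call on the client side and compare it with the Response Time reported in Service Health. A minimal sketch in Python with the requests package; the endpoint URL, deployment ID, token, and file name are placeholders, and your generated snippet may also require additional headers such as a DataRobot key:

```python
import time
import requests

# Placeholder values; copy the exact endpoint and headers from your deployment's Predictions tab.
URL = "https://<prediction-server>/predApi/v1.0/deployments/<DEPLOYMENT_ID>/predictions"
HEADERS = {
    "Content-Type": "text/csv; charset=UTF-8",
    "Authorization": "Bearer <API_TOKEN>",
}

with open("scoring_data.csv", "rb") as f:   # illustrative file of rows to score
    payload = f.read()

start = time.perf_counter()
response = requests.post(URL, data=payload, headers=HEADERS)
total_ms = (time.perf_counter() - start) * 1000

# total_ms is the client-observed "total time"; the Response Time shown in Service Health
# excludes the network, so the difference between the two is roughly network overhead.
print(f"Client-side total time: {total_ms:.0f} ms (HTTP {response.status_code})")
```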

More information for DataRobot users: search in-app Platform Documentation for Prediction API (How to), or for Service Health tab and locate information for “Response Time” or “Execution Time.”

Data Drift

What is Data Drift? How is this different from Model Drift?

Data Drift refers to changes in the distribution of prediction data versus training data. Any alerts indicating Data Drift mean that the data you're making predictions on looks different from the data the model was trained on. DataRobot uses PSI, or "Population Stability Index," to measure this. (This is an alert that you want to look into; perhaps you need to retrain your model to better align with the new population.) Models themselves cannot drift: once they are fit, they are static. Sometimes the term "Model Drift" is used to refer to drift in the predictions, which would simply be an indication that the average predicted value is changing over time.
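For reference, PSI compares the binned distribution of a feature at training time with its distribution at prediction time. A minimal sketch of the standard PSI formula (the binning and smoothing choices here are illustrative, not DataRobot's exact implementation):

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Standard PSI: sum((actual% - expected%) * ln(actual% / expected%)) over bins."""
    # Bin edges are derived from the training ("expected") distribution.
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)

    # Avoid division by zero and log(0) for empty bins.
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)

    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# Example: drift between training data and recent scoring data for one feature.
rng = np.random.default_rng(0)
training_values = rng.normal(50, 10, 10_000)
scoring_values = rng.normal(55, 12, 2_000)   # shifted distribution -> higher PSI
print(round(population_stability_index(training_values, scoring_values), 3))
```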

More information: https://www.listendata.com/2015/05/population-stability-index.html. More information for DataRobot users: search in-app Platform Documentation for Data Drift tab.

Why do I only see 7 features tracked by DataRobot in the Drift versus Importance graph (under the Drift section of the Deployment tab of my specific deployment), despite the documentation stating that DataRobot tracks the drift of the top 25 features? 

This graph plots only numerical, categorical, and text-based features from among the top 25 most impactful features. If your top 25 features include other variable types, such as percentages, dates, or currencies, DataRobot does not include them in the Feature Drift vs. Feature Importance plot, so you can end up with fewer than 25 features shown.

[Screenshot: Feature Drift vs. Feature Importance chart]

More information for DataRobot users: search in-app Platform Documentation for Data Drift tab.

How can I monitor high-cardinality categorical features? Will DataRobot include all of them?

For categorical features, DataRobot tracks the top 25 most frequent values in the Feature Details chart. If your feature has more than 25 values, DataRobot bins the remaining values under the "Other" label.
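To sanity-check which values of your own data will end up in "Other," here is a small pandas sketch (the file and column names are hypothetical):

```python
import pandas as pd

def top_n_with_other(series, n=25):
    """Keep the n most frequent category values; bin everything else as 'Other'."""
    top_values = series.value_counts().nlargest(n).index
    return series.where(series.isin(top_values), other="Other")

df = pd.read_csv("scoring_data.csv")                   # hypothetical scoring data
df["region_binned"] = top_n_with_other(df["region"])   # 'region' is a hypothetical column
print(df["region_binned"].value_counts())
```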

Can I see feature drift for all variables in my model?

No. The Feature Drift vs. Feature Importance chart currently monitors the most impactful numerical, categorical, and text-based features in your data. Other features such as percentages and currency are not included. 

More information for DataRobot users: search in-app Platform Documentation for Data Drift tab, then locate information for the “Feature Drift vs. Feature Importance chart” section.

What are the thresholds for the red/yellow/green indicators on the Feature Drift chart? Can I change these?

By default, the drift threshold is 0.15. The Y-axis scales from 0 to the higher of 0.25 and the highest observed drift value. If you are the project owner, you can click the gear icon in the upper right corner of the chart to change these values. 

[Screenshot: drift threshold settings (gear icon in the upper right corner of the chart)]

More information for DataRobot users: search in-app Platform Documentation for Data Drift tab.

What is the Feature Drift vs. Feature Importance chart, and what is its value?

The Feature Drift vs. Feature Importance chart monitors the 25 most impactful numerical, categorical, and text-based features in your data. Use the chart to see if data is different at one point in time compared to another. Differences may indicate problems with your model or in the data itself.

[Screenshot: Feature Drift vs. Feature Importance chart]

More information for DataRobot users: search in-app Platform Documentation for Data Drift tab, then locate information for the “Feature Drift vs. Feature Importance chart” section.

Can I compare data drift for my newly deployed model vs the prior deployed model?

Yes. You can change the data drift display to analyze the current, or any previous, version of a model in the deployment.

[Figure 1. Current data drift information for the model]

[Figure 2. Data drift information for the selected model version and time frame]

More information for DataRobot users: search in-app Platform Documentation for Data Drift tab, then locate information for the “Feature Drift vs. Feature Importance chart” section.

Accuracy

How can I tell whether the accuracy of my deployed model is deteriorating?

To monitor accuracy, predicted outcomes must be compared to actual outcomes. You can go to the Settings tab of your Deployment and upload the actuals data. 

[Screenshot: deployment Settings tab for uploading actuals]
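You can also upload actuals programmatically. A minimal sketch, assuming the DataRobot Python client's Deployment.submit_actuals method and that an association ID was configured for the deployment; the endpoint, token, IDs, column names, and values below are placeholders, so check the client documentation for the exact format your client version expects:

```python
import datarobot as dr
import pandas as pd

dr.Client(endpoint="https://app.datarobot.com/api/v2", token="<API_TOKEN>")  # placeholder credentials

deployment = dr.Deployment.get("<DEPLOYMENT_ID>")

# Actuals are joined back to earlier predictions via the association ID
# configured on the deployment's Settings tab.
actuals = pd.DataFrame(
    {
        "association_id": ["row-001", "row-002"],   # hypothetical IDs
        "actual_value": [1, 0],                     # observed outcomes
    }
)
deployment.submit_actuals(actuals)
```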

Then, you can check the accuracy of a deployed model over time under the Accuracy tab.

[Screenshot: Accuracy tab showing accuracy over time]

More information for DataRobot users: search in-app Platform Documentation for Accuracy tab.

Predictions

How do I make predictions on DataRobot models?

MLOps deployments support several methods for making predictions, depending on the use case.  

When hosting models on DataRobot prediction servers, predictions can be made in real time or in batch mode for high-volume datasets.

When the need arises to deploy models on infrastructure within your organization, several portable prediction options are available. The Portable Prediction Server is the preferred choice for container-based REST predictions. Java Scoring Code is also available for high-throughput or embedded predictions.

Can I see a sample of the API call to make predictions?

Yes. You can find this in the Predictions section of the Deployments tab for a specific deployment, as shown below. After clicking the Predictions API tab, you can specify the prediction type as either Batch or Real-time; DataRobot updates the Python snippet based on your selection and shows sample code for making API calls to score new data.

[Screenshot: Predictions > Prediction API tab with sample scoring code]
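Outside of the generated snippet, a real-time prediction call is essentially an HTTP POST of scoring data to the deployment's prediction endpoint. A minimal sketch using JSON records; the endpoint, headers, deployment ID, feature names, and response fields are placeholders, so copy the exact values and response handling from the snippet shown in the UI:

```python
import requests

# Placeholder values; copy the exact endpoint and headers from the Predictions API tab.
URL = "https://<prediction-server>/predApi/v1.0/deployments/<DEPLOYMENT_ID>/predictions"
HEADERS = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <API_TOKEN>",
}

# One record per row to score, using the original feature names and types.
rows = [
    {"age": 42, "income": 55000, "region": "West"},   # hypothetical features
    {"age": 31, "income": 72000, "region": "East"},
]

response = requests.post(URL, json=rows, headers=HEADERS)
response.raise_for_status()

for item in response.json().get("data", []):          # response shape may vary; see the UI sample
    print(item.get("rowId"), item.get("prediction"))
```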

More information for DataRobot users: search in-app Platform Documentation for Prediction API, and review the topic “How to: Prediction API.”

What data formats does the API support?

The data needs to be provided in a CSV or JSON file.

More information for DataRobot users: search in-app Platform Documentation for Prediction API (How to) or Prediction API Scripting Code.

What is the difference between Scoring Code JAR and DataRobot Prime?

Both are downloadable scoring code. Scoring Code JAR is not available for all models; however, if it is available, it allows you to download Java code that will match API predictions exactly. DataRobot Prime is a model that is run to approximate another model. It allows you to download Python or Java scoring code, but as this model is an approximation to another model, the predictions returned won't match exactly. 

DataRobot Prime is a good option when you need scoring code but the model you want to deploy doesn't support it. Note that neither Scoring Code JAR nor DataRobot Prime provides Prediction Explanations. DataRobot Prime models are accessible from the Leaderboard.

More information for DataRobot users: search in-app Platform Documentation for Downloads tab, Scoring Code, DataRobot Prime tab, or Prediction Explanations.

In which languages can I get downloadable scoring code? What are the drawbacks?

With Scoring Code JAR you get Java, as either source or binary. With DataRobot Prime you get either Java or Python. Scoring Code JAR produces exact predictions for a model, but isn't available for all models. DataRobot Prime is an approximation to a model, but it can be used to approximate any model.

More information for DataRobot users: search in-app Platform Documentation for Scoring Code, DataRobot Prime tab, or DataRobot Prime examples.

Does DataRobot "remember" any feature/variable transformations performed within the UI when predicting?

Yes. When DataRobot makes predictions, you only need to pass the original columns with their original variable types. DataRobot remembers automated feature engineering as well as user-defined transformations.

More information for DataRobot users: search Platform Documentation for Models in Production and Feature transformations.

How can I keep my predictions from getting caught in a modeling queue?

Use a dedicated prediction server. When you submit a dataset to DataRobot through either the UI or via the R or Python DataRobot client, you are submitting to modeling workers. If those workers are busy building models, your prediction job will be queued. Since the prediction server only makes predictions and is sized according to your needs, it will virtually never have a queue.

More information for DataRobot users: search in-app Platform Documentation for Models in Production, or search for Standalone Prediction Server and then look for “Using the model transfer feature” information.

Deployments / Governance

Can I see a log of the changes that have been made to a deployment or to the underlying models?

Yes, this is displayed in the Overview section of the Deployments tab for a specific deployment as shown in the following image:

[Screenshot: deployment Overview showing the change log]

More information for DataRobot users: search in-app Platform Documentation for Replace model action.

How do I use a different model in a deployment? Are there any requirements to do this?

Use the Replace model functionality found in the Actions menu. The menu is available from both the Deployments tab:

[Screenshot: Actions menu on the Deployments tab]

and the Overview page of your deployment:

[Screenshot: Actions menu on the deployment Overview page]

If the replacement model differs from the current model (for example, features with different names, or features with the same name but different data types), DataRobot issues a warning.
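Model replacement is also exposed by the DataRobot Python client. A minimal sketch, assuming the datarobot package's Deployment.replace_model method and MODEL_REPLACEMENT_REASON enum (check your client version's documentation); the endpoint, token, and IDs are placeholders:

```python
import datarobot as dr
from datarobot.enums import MODEL_REPLACEMENT_REASON

dr.Client(endpoint="https://app.datarobot.com/api/v2", token="<API_TOKEN>")  # placeholder credentials

deployment = dr.Deployment.get("<DEPLOYMENT_ID>")

# Swap in the new model; the reason you give is recorded in the deployment's change log.
deployment.replace_model("<NEW_MODEL_ID>", MODEL_REPLACEMENT_REASON.ACCURACY)
```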

More information for DataRobot users: search in-app Platform Documentation for Replace a deployed model.

What configurations do I need to make when deploying a model on the Leaderboard? Can I change them later?

1) Decide what the prediction threshold should be. This cannot be changed after the model is deployed. (Note this only applies to classification problems.)

2) Enable data drift tracking. This can be changed later.

3) Allow DataRobot to perform segment analysis of predictions. This can be changed later.

[Screenshot: deployment settings on the Deploy tab]
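When deploying through the DataRobot Python client instead of the UI, the same choices apply. A minimal sketch, assuming the datarobot package's Deployment.create_from_learning_model and update_drift_tracking_settings methods (check your client version's documentation); the endpoint, token, model ID, and label are placeholders:

```python
import datarobot as dr

dr.Client(endpoint="https://app.datarobot.com/api/v2", token="<API_TOKEN>")  # placeholder credentials

# Item 1: for classification, the prediction threshold comes from the model and is
# fixed once deployed, so confirm it before creating the deployment.
prediction_server = dr.PredictionServer.list()[0]
deployment = dr.Deployment.create_from_learning_model(
    model_id="<MODEL_ID>",
    label="Churn model - production",          # illustrative label
    default_prediction_server_id=prediction_server.id,
)

# Items 2 and 3: drift tracking (and, similarly, segment analysis) can be changed later.
deployment.update_drift_tracking_settings(target_drift_enabled=True, feature_drift_enabled=True)
```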

More information for DataRobot users: search in-app Platform Documentation for Deploy tab.
