This tutorial will explain how to build, select, and deploy unsupervised learning models for anomaly detection. We’re going to use an anti-money laundering dataset with the goal of identifying transactions that are considered anomalous. Finding abnormalities or outliers within transactional data can often lead to the identification of fraud.
Within this anti-money laundering dataset, each row represents a transaction and each column represents a feature about that transaction (Figure 1).
Figure 1. Dataset
In addition to the transaction itself, we also have attributes about each transaction, including demographic risk, customer spending, and other credit information. You can upload your dataset directly as a local file, or you can connect via JDBC, use a URL, the AI Catalog, or Paxata.
I'm just going to drag and drop this dataset (a local file) onto the DataRobot GUI. DataRobot initially performs a first-pass exploratory data analysis (EDA) that includes classifying feature types and providing summary statistics. Once the data is uploaded, you can scroll down and click on a feature to get a distribution of that field (Figure 2). This is useful for exploring your data prior to modeling.
Figure 2. Distribution for a feature
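DataRobot runs this EDA automatically; if you want to reproduce a similar first pass locally, here is a minimal sketch with pandas. The column names and values are invented for illustration and are not the real dataset's schema:

```python
import pandas as pd

# Hypothetical transactions with invented column names -- the real
# anti-money-laundering dataset's schema is not reproduced here.
df = pd.DataFrame({
    "amount": [120.0, 35.5, 9800.0, 42.0, 15000.0],
    "merchant_category": ["grocery", "fuel", "wire", "grocery", "wire"],
})

# Classify feature types, as a first-pass EDA does.
feature_types = df.dtypes.astype(str).to_dict()

# Summary statistics for a numeric field.
summary = df["amount"].describe()

# A simple distribution (value counts) for a categorical field.
distribution = df["merchant_category"].value_counts()

print(feature_types)
print(summary)
print(distribution)
```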
Unlike when creating supervised learning projects, with anomaly detection you are just going to leave the target blank and instead click the orange No target? link (below the What would you like to predict? field). This turns on unsupervised mode (Figure 3). Press the Start button to kick off the modeling process.
Figure 3. Unsupervised mode
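The GUI handles model building for you once you press Start. Purely to illustrate what an unsupervised anomaly detector does with unlabeled data, here is a sketch using scikit-learn's IsolationForest on synthetic "transaction amounts" (this is an open-source stand-in, not necessarily any blueprint DataRobot trains):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic "transactions": mostly typical amounts plus two extreme ones.
normal = rng.normal(loc=100, scale=20, size=(200, 1))
outliers = np.array([[5000.0], [7500.0]])
X = np.vstack([normal, outliers])

# Fit without any target -- the model learns structure from the data alone.
model = IsolationForest(random_state=0).fit(X)

# Lower decision_function values indicate more anomalous rows.
raw_scores = model.decision_function(X)

# The two injected outliers (rows 200 and 201) should rank most anomalous.
most_anomalous = np.argsort(raw_scores)[:2]
print(sorted(int(i) for i in most_anomalous))
```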
You can click the Models tab to see the Leaderboard (Figure 4). The models are ranked using an optimization metric called Synthetic AUC, which can help you determine which blueprint is best suited for your use case. Note that a Synthetic AUC of 0.9 does not mean the model is correct 90% of the time; it simply means that model is likely to outperform one with a Synthetic AUC of 0.6.
Figure 4. Leaderboard
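This tutorial doesn't spell out how Synthetic AUC is computed internally. One common way to build an AUC-style metric without labels is to generate synthetic rows and measure how well the detector separates them from the real data; the sketch below shows that general idea under those assumptions, and DataRobot's own construction may differ:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
real = rng.normal(loc=0.0, scale=1.0, size=(300, 2))

# Synthetic rows drawn uniformly over each feature's observed range --
# one plausible construction, assumed for illustration.
lo, hi = real.min(axis=0), real.max(axis=0)
synthetic = rng.uniform(lo, hi, size=(300, 2))

model = IsolationForest(random_state=42).fit(real)

X = np.vstack([real, synthetic])
y = np.r_[np.zeros(len(real)), np.ones(len(synthetic))]  # 1 = synthetic
# Negate decision_function so that higher = more anomalous.
scores = -model.decision_function(X)

# AUC near 1.0 means the model reliably flags synthetic rows as anomalous;
# 0.5 would mean it cannot tell them apart from real rows.
auc = roc_auc_score(y, scores)
print(round(auc, 3))
```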
For a summary of anomaly results, select Insights tab > Anomaly Detection (Figure 5). Anomaly scores range between 0 and 1, with higher scores indicating rows that are more likely to be anomalous.
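DataRobot reports scores already bounded in [0, 1]. If you ever work with a raw detector whose scores are unbounded, one simple convention for producing comparable 0-to-1 scores is min-max scaling (an illustrative convention, not DataRobot's internal transform):

```python
import numpy as np

def to_unit_interval(raw_scores):
    """Min-max scale raw anomaly scores so higher = more anomalous in [0, 1]."""
    raw = np.asarray(raw_scores, dtype=float)
    span = raw.max() - raw.min()
    if span == 0:
        return np.zeros_like(raw)  # all rows equally (non-)anomalous
    return (raw - raw.min()) / span

# The most anomalous raw score maps to 1.0, the least to 0.0.
scores = to_unit_interval([-0.2, 0.05, 0.4, 1.8])
print(scores)
```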
Figure 5. Insights tab

If you go back to the Models tab and select one of the models, the blueprint for that model is shown (Figure 6). You can use this blueprint to determine exactly how DataRobot built the model. If you click a box on any of these steps and follow the link, you see documentation that describes in detail what took place during that step, including the parameters as well as any reference material.
Figure 6. Model Blueprint
When you click the Understand tab, you find a number of evaluation tools. For example, Feature Impact measures how much a feature contributes to the overall accuracy of your model (Figure 7).
Figure 7. Feature Impact
In the example shown here, the number of credits for merchants in the past 90 days has a direct relationship with whether a transaction is anomalous. This insight can be invaluable for guiding an organization to focus on what matters most.
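The general idea behind a permutation-style feature impact can be sketched locally: shuffle one feature at a time and measure how much the model's output changes. For an unsupervised detector there is no accuracy to degrade, so this sketch uses the mean change in anomaly scores as an illustrative proxy; it is not DataRobot's exact computation:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)
# Two features: the first contains real outliers, the second is pure noise.
informative = np.r_[rng.normal(0, 1, 290), np.full(10, 8.0)]
noise = rng.normal(0, 1, 300)
X = np.column_stack([informative, noise])

model = IsolationForest(random_state=7).fit(X)
base_scores = model.decision_function(X)

impacts = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])  # break the feature's information
    shuffled_scores = model.decision_function(Xp)
    impacts.append(np.abs(base_scores - shuffled_scores).mean())

# The informative feature should show the larger impact.
print(impacts)
```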
Let’s look at the Feature Effects tab (Figure 8). This uses a model-agnostic approach called partial dependence to explain how features in the model are affecting whether or not a row is anomalous.
Figure 8. Feature Effects
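Partial dependence itself is easy to sketch for any model: force one feature to a fixed value across every row, average the model's predictions, and repeat over a grid of values. The sketch below does this for an IsolationForest's anomaly score (again an open-source illustration of the technique, not DataRobot's implementation):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)
X = np.column_stack([
    rng.normal(100, 20, 300),   # e.g., a transaction amount
    rng.normal(0, 1, 300),      # a second, unrelated feature
])
model = IsolationForest(random_state=3).fit(X)

def partial_dependence(model, X, feature, grid):
    """Average anomaly score when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        Xmod = X.copy()
        Xmod[:, feature] = value
        # Negate decision_function so higher = more anomalous.
        curve.append(-model.decision_function(Xmod).mean())
    return np.array(curve)

# Scores should rise as the amount moves far from typical training values.
grid = np.array([50.0, 100.0, 500.0])
curve = partial_dependence(model, X, feature=0, grid=grid)
print(curve)
```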
In general, Feature Impact and Feature Effects describe how features are impacting your full model.
Prediction Explanations, on the other hand, explains the effects of features on individual records (Figure 9).
Figure 9. Prediction Explanations
You can see a sample of a row-by-row analysis that explains the anomaly score results. You can download these for all rows in your dataset.
These Prediction Explanations are really useful if you have an end user who needs to use the model but isn't a data scientist. For example, somebody who is looking out for transactional fraud could take a look at these Prediction Explanations and understand the reason for the score and critically evaluate that case within the entire context of the transaction.
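The intuition behind a row-level explanation can be conveyed with a simple what-if perturbation: replace one feature of the row with a typical (median) value and see how much the anomaly score falls. DataRobot's own explanation algorithm is more sophisticated than this; the sketch below only illustrates the idea:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(5)
X = np.column_stack([rng.normal(100, 20, 300), rng.normal(0, 1, 300)])
model = IsolationForest(random_state=5).fit(X)

def explain_row(model, X, row):
    """Per-feature what-if: how much does replacing each feature value
    with the column median reduce the row's anomaly score?"""
    base = -model.decision_function(row.reshape(1, -1))[0]
    contributions = []
    for j in range(len(row)):
        what_if = row.copy()
        what_if[j] = np.median(X[:, j])
        neutral = -model.decision_function(what_if.reshape(1, -1))[0]
        contributions.append(base - neutral)
    return base, contributions

# An extreme first feature should dominate this row's explanation.
row = np.array([950.0, 0.1])
score, contribs = explain_row(model, X, row)
print(score, contribs)
```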
Deploying the anomaly detection model
After you select a model, the next step is to deploy it. Every DataRobot model is immediately ready for deployment. There are several ways to deploy an anomaly detection model.
You can simply go to the Predict tab > Make Predictions, upload a new dataset to DataRobot, and score it within the GUI.
If you go to Predict tab > Deploy, you can create a REST API endpoint to score data directly from applications. An independent prediction server is available to support low-latency, high-throughput prediction requirements. (You can do that by adding a new deployment.)
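Calling a deployment's REST endpoint amounts to an HTTP POST of your scoring data with an API token. The endpoint URL, token, and header set below are placeholders invented for illustration; copy the exact values from your own deployment's integration snippet:

```python
import json

# Placeholder values -- NOT real DataRobot values; use the URL, token,
# and headers shown for your deployment.
ENDPOINT = "https://example.datarobot.com/deployments/<id>/predictions"
API_TOKEN = "YOUR_API_TOKEN"

rows = [
    {"amount": 9800.0, "merchant_category": "wire"},
    {"amount": 42.0, "merchant_category": "grocery"},
]

payload = json.dumps(rows)
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_TOKEN}",
}

# With the `requests` library installed, the call itself would look like:
#   response = requests.post(ENDPOINT, data=payload, headers=headers)
#   scores = response.json()
print(payload)
print(headers["Content-Type"])
```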
The Predict tab > Hadoop option allows you to deploy to Hadoop.
Creating a deployment gives you additional functionality: you can monitor service health and data drift on the Deployments tab, alongside your other productionized models (Figure 10). This allows you to proactively monitor and manage all deployed machine learning models and ensure accurate and consistent results.
Figure 10. Deployments
If you’re a licensed DataRobot customer, search the in-app Platform Documentation for Unsupervised learning (anomaly detection).