Welcome to Using Python with DataRobot!


(Article updated November 2020)

This landing page will help you get started using the datarobot Python client package to interact with DataRobot. You can import data, build models, evaluate metrics, and make predictions right from the console.

There are several advantages to interacting with DataRobot programmatically:

  • You can set up a series of tasks and walk away while DataRobot and Python do the rest.
  • You can get more customized analyses by combining the power of Python with the various outputs you can get from DataRobot.
  • You can more easily reproduce prior results with code.

View the full documentation for the datarobot Python client package here.

Code samples

You can access code samples on our public DataRobot Community GitHub. This section lists the currently available samples and provides links to the related GitHub locations. For data scientists, there are two main code repositories for getting started: API Samples and Tutorials. The contents of both are listed below, but please visit the repos to get the latest details.

Usage

For each guide, follow the instructions in the related .ipynb or .Rmd file.

Also, pay attention to the DataRobot API endpoints. The endpoint you specify for accessing DataRobot depends on your deployment environment.

The DataRobot API Endpoint is used to connect your IDE to DataRobot.
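As a minimal sketch of how an endpoint might be configured, the helper below builds an API URL from a host name and hands it to the client. The `/api/v2` path and the `app.datarobot.com` default are assumptions based on the managed cloud; confirm the correct value for your own installation. `resolve_endpoint` and `connect` are hypothetical helper names, not part of the datarobot package.

```python
import os

# Assumption: managed-cloud default host; on-premise installs use their own host.
DEFAULT_HOST = "app.datarobot.com"
# Assumption: versioned API path expected by the Python client.
API_PATH = "/api/v2"


def resolve_endpoint(host=None):
    """Build an endpoint URL from a host name or a partial URL (hypothetical helper)."""
    host = host or os.environ.get("DATAROBOT_ENDPOINT") or DEFAULT_HOST
    host = host.strip().rstrip("/")
    if not host.startswith(("http://", "https://")):
        host = "https://" + host
    if not host.endswith(API_PATH):
        host += API_PATH
    return host


def connect(host=None, token=None):
    """Create a client connection. Requires `pip install datarobot`."""
    import datarobot as dr  # imported lazily so resolve_endpoint works without it

    token = token or os.environ["DATAROBOT_API_TOKEN"]
    return dr.Client(endpoint=resolve_endpoint(host), token=token)
```

Keeping the token in an environment variable rather than in the notebook avoids committing credentials alongside your code.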

API Samples for Data Scientists

This repository contains Python notebooks for achieving specific tasks using the API.

AI Catalog

  • AI Catalog API Demo: How to create and share datasets in AI Catalog and use them to create projects and run predictions. Python

Advanced Tuning

  • Advanced Tuning: How to do advanced tuning. Python
  • Datetime Partitioning: How to do date/time partitioning. Python

Compliance Documentation

  • Getting Compliance Documentation: How to get Compliance Documentation documents. Python

Feature Lists Manipulation

  • Advanced Feature Selection: How to do advanced feature selection using all of the models created during a run of DataRobot Autopilot. Python
  • Feature Lists Manipulation: How to create and manipulate custom feature lists and use them for training. Python
  • Transforming Feature Type: How to transform feature types. Python

Helper Functions

  • Modeling/Python: A function that helps you search for specific blueprints within a DataRobot project repository and then initiate all of the models. Python
  • Time Series/Python: A set of custom functions for AutoTS projects (advanced settings, data quality, filling dates, preprocessing, brute force, cloning, accuracy metrics, modeling, project lists). Python

Initiating Projects

  • Starting a Binary Classification Project: How to initiate a DataRobot project for a Binary Classification target. Python
  • Starting a Multiclass Project: How to initiate a DataRobot project for a Multiclass Classification target. Python
  • Starting a Project with Selected Blueprints: How to initiate a DataRobot project manually where the user has the option to choose which models/blueprints to initiate. Python
  • Starting a Regression Project: How to initiate a DataRobot project for a numerical target. Python
  • Starting a Time Series Project: How to initiate a DataRobot project for a Time Series problem. This notebook also covers calendars and feature settings for time series projects. Python

Making Predictions

  • Getting Predictions and Prediction Explanations: How to get predictions and prediction explanations out of a trained model. Python
  • Scoring Big Datasets—Batch Prediction API: How to use a DataRobot batch prediction script to get predictions out of a DataRobot deployed model. Python

Model Evaluation

  • Getting Confusion Chart: How to get the Confusion Matrix Chart. Python
  • Getting Feature Impact: How to get the Feature Impact scores. Python
  • Getting Lift Chart: How to get the Lift Chart. Python
  • Getting Partial Dependence: How to get partial dependence. Python
  • Getting ROC Curve: How to get the ROC Curve data. Python
  • Getting SHAP Values: How to get SHAP values. Python
  • Getting Word Cloud: How to pull the word cloud data. Python
  • Plotting Prediction Intervals: How to plot prediction intervals for time series projects (single and multi series). Python

Model Management

  • Model Management and Monitoring: How to manage models through the API. This includes deployment, replacement, deletion, and monitoring capabilities. Python
  • Sharing Projects: How to share projects with colleagues. Python
  • Uploading Actuals to a DataRobot Deployment: How to upload actuals into the DataRobot platform in order to calculate accuracy metrics. Python

Tutorials for Data Scientists

This repository contains various end-to-end use case examples using the DataRobot API. Each use case directory contains instructions for its own use.

Classification

  • Lead Scoring for selling online courses: Predict who is likely to become a customer using a binary classification strategy. Create a custom feature list. Get the ROC Curve, Feature Impact and Feature Effects. Plot them for analysis, train your model, and make predictions. Python
  • Predict Hospital admissions: Predict which patients are likely to be readmitted within 30 days after being discharged using binary classification. Install the software, find your API token, choose the best model, get the evaluation metrics, and make predictions. Python
  • Predict COVID-19 at the County Level: Predict high-risk counties with a look-alike modeling strategy. Build a binary classification model and rank each county by the probability of seeing cases. Set up the project, get evaluation and interpretability metrics, plot results, and get prediction explanations. Python
  • Predict Medical Fraud: Predict fraudulent medical claims with binary classification. Connect to a SQL database, create a data store, write custom functions to build multiple projects, conduct anomaly detection, and deploy the model using the prediction server. Save the results for a custom dashboard. Python

DRU

API Training: The DataRobot API Training is targeted at data scientists and motivated individuals with at least basic coding skills who want to take automation with DataRobot to the next level. Python

Here you will be able to learn how to use the DataRobot API through a series of exercises that will challenge you, and teach you how to solve some of the most common problems that people run into.

Start by carefully reading the "API Training - Introductory Notebook" (Python). This will help you learn the basics and provide a concrete overview of the API. Afterwards, go to the /exercises folder and start downloading and solving the exercises.

The list of exercises is as follows:

  • Exercise 1. Feature Selection Curves Python
  • Exercise 2. Advanced Feature Manipulation Python
  • Exercise 3. Model Documentation Python
  • Exercise 4. Beyond AutoPilot Python
  • Exercise 5. Model Factory Python
  • Exercise 6. Continuous Model Training Python
  • Exercise 7. Using a Database Python

Model Factories

  • Classification Model Factory: Create a model factory for a binary classification problem using our readmissions dataset. Predict the likelihood of patient readmission. Build a single project and find the best model. Then build more projects based on admission ID. Find the best model for each sub-project. Make this model ready for deployment. Python
  • Time Series Model Factory: Create a time series model factory using our store sales multiseries dataset. Set up a time series multiseries project. Get the best model and its performance. Cluster the data and create plots over time. Create a project for each cluster and evaluate the results. Python
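The model factory pattern described above can be sketched as a loop that splits the data on a key column and builds one project per subset. This is only an illustration under stated assumptions: `split_by_key`, `run_factory`, and the `build_project` callback are hypothetical names, and in a real factory the callback would call the datarobot client to create and train a project rather than return a stub value.

```python
from collections import defaultdict


def split_by_key(rows, key):
    """Group a list of dict rows by the value found in column `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def run_factory(rows, key, build_project):
    """Call build_project(group_value, subset) once per group and collect the results."""
    results = {}
    for value, subset in split_by_key(rows, key).items():
        # In practice build_project would create a DataRobot project from `subset`
        # and return, for example, its best model. Here it is a caller-supplied stub.
        results[value] = build_project(value, subset)
    return results
```

Passing the project-building step in as a callback keeps the grouping logic testable without a DataRobot connection.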

Model Management

  • Automated training and replacement of Models: Automatically retrain and replace models with an automated continuous training pipeline. Python/cURL
  • Monitoring Drift and replacing Models: Monitor your deployment for data drift and replace the model once criteria are met. Connect to a SQL server and create a data store. Create a project based on the data source. Deploy the recommended model and set up drift tracking. Upload and make predictions on a dataset with drift. Check the drift results and replace the model. Python

Multiclass

  • Multiclass one-vs-rest Modeling: Create a one-vs-rest model to do geophysical classification with 9 potential classes. Preprocess the data and split up the dataset. Use a loop to build nine projects and put the result into a dataframe. Then, get the predictions and plot them with an advanced visualization technique. Python
  • Predicting Product Type Based on Customer Complaints: Use the free text from customer complaints to predict which product the customers are addressing. Python
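The one-vs-rest strategy mentioned above can be sketched in plain Python: each class gets its own binary target, and the final prediction for a row is the class whose binary model scores highest. The function names and the class labels in the usage note are hypothetical and chosen for illustration; the notebook itself builds one DataRobot project per class instead of using these helpers.

```python
def one_vs_rest_targets(labels):
    """Return {class: binary target list}, with 1 where the row belongs to that class."""
    classes = sorted(set(labels))
    return {cls: [1 if y == cls else 0 for y in labels] for cls in classes}


def pick_class(scores_by_class):
    """Given {class: [score per row]}, return the highest-scoring class for each row."""
    classes = list(scores_by_class)
    n_rows = len(scores_by_class[classes[0]])
    return [
        max(classes, key=lambda cls: scores_by_class[cls][i])
        for i in range(n_rows)
    ]
```

For example, `one_vs_rest_targets(["sand", "shale", "sand"])` yields one binary column per class, each of which could serve as the target of a separate binary classification project.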

Out of Time Validation (OTV)

  • Predict CO2 levels of Mauna Loa: Create an OTV project to predict CO2 levels. This project trains on older data and then validates on newer data. In this use case, this strategy is useful because the scientists know that the data changes over time. Import your data, create lagged features, define date/time partitioning, select a model, and get Feature Impact. Python
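The two preparation steps named above, creating lagged features and splitting by time, can be sketched without any library. This is a simplified illustration, not the notebook's code: the helper names are hypothetical, dates are assumed to be ISO-formatted strings so they sort chronologically, and in DataRobot itself the split is configured through date/time partitioning rather than done by hand.

```python
def add_lag_features(series, lags):
    """Turn a date-sorted list of (date, value) pairs into rows with lagged columns.

    Rows without enough history for the largest lag are dropped.
    """
    max_lag = max(lags)
    rows = []
    for i in range(max_lag, len(series)):
        date, value = series[i]
        row = {"date": date, "target": value}
        for lag in lags:
            row[f"lag_{lag}"] = series[i - lag][1]
        rows.append(row)
    return rows


def out_of_time_split(rows, cutoff):
    """Train on rows dated before `cutoff`; validate on rows dated at or after it."""
    train = [r for r in rows if r["date"] < cutoff]
    valid = [r for r in rows if r["date"] >= cutoff]
    return train, valid
```

Because validation rows are always newer than training rows, the validation score reflects how the model would fare on data it has not yet seen in time.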

Regression

  • Double Pendulum with Eureqa Models: Solve a regression problem using Eureqa blueprints. Eureqa makes no prior assumptions about the dataset, instead fitting models to the data dynamically. The models are presented as mathematical equations, so end users can seamlessly understand results and recommendations. Set up a manual mode project and select Eureqa blueprints from the repository. Apply Advanced Tuning to the default model and print the mathematical expression. Python
  • Analyzing residuals to build better models: Use residuals created by DataRobot insights to evaluate your models and make them better. Python

Visual AI

  • Visual AI Heartbeats: Create a Visual AI project to classify images of sound. Heartbeats of people with normal and atypical heart conditions were recorded onto .wav files. This code shows you how to create spectrograms from the recordings and import them into DataRobot for Visual AI classification. Python

Anomaly Detection (Unsupervised Learning)

  • Anti-Money Laundering with Outlier Detection: Create an unsupervised model that can flag transactions related to money laundering. Use a small set of labeled data to evaluate how the different models perform. Python

Integrations

  • Database Connections and Writebacks: DataRobot provides a “self-service” JDBC product for database connectivity setup. Once configured, you can read data from production databases for model building and predictions. This allows you to quickly train and retrain models on that data, and avoid the unnecessary step of exporting data from your enterprise database to a CSV file for ingestion into DataRobot. Python

Feature Discovery

  • Feature Discovery with Instacart Dataset: An example of how to use Feature Discovery through the Python API. Python

The following video shows an example walkthrough of the Python client.

Comments
Blue LED

These notebooks are great!
This page is gonna have a dedicated spot on my bookmarks bar.
