Accelerate DataRobot AI with Data Prep


DataRobot Data Prep empowers both novice users and experienced analysts to own and drive the end-to-end AI lifecycle: from accessing raw data to deploying trained ML models into production.

AI Catalog Integration

Data Prep’s built-in connector to the AI Catalog provides bi-directional integration: you can select relevant datasets from the AI Catalog and publish “cleaned” results back to it.

  1. Browse for datasets in the AI Catalog.
  2. Select one or more files with one-click ingestion into the Data Prep Library. (See here for details on loading data from a database.)
  3. Select Parse options, select Columns, and create metadata Tags during import.
  4. Export prepared AnswerSets directly to the AI Catalog.
  5. Schedule imports and exports through Automated Project Flows.

(For more, here’s a quick overview of DataRobot Data Prep.)

Integrated Predict Tool

Data Prep lets you run predictions against a deployed machine learning model in real time while you prepare your data.

  • Invoke DataRobot predictions after any step in your data preparation project.
  • Perform joins and transformations to prepare training and prediction data.
  • Interactively review training results and add post-processing steps.
  • Schedule predictions to run periodically, and operationalize using Automated Project Flows.
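
As a rough illustration of the flow described above (prepare, predict, post-process), the following Python sketch joins two small tables, derives a feature, scores each row, and then adds a post-processing step. It does not use Data Prep or any DataRobot API: `score_row` is a hypothetical stand-in for a deployed model, and every table, column, and function name here is invented for the example.

```python
from typing import Any, Callable

Row = dict[str, Any]


def inner_join(left: list[Row], right: list[Row], key: str) -> list[Row]:
    """Join two lists of row dicts on a shared key column."""
    index: dict[Any, list[Row]] = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def add_utilization(rows: list[Row]) -> list[Row]:
    """Transformation step: derive a feature from two existing columns."""
    return [{**r, "utilization": r["balance"] / r["credit_limit"]} for r in rows]


def score_row(row: Row) -> float:
    """Hypothetical stand-in for a deployed model (assumption, not a real API)."""
    return min(1.0, round(row["utilization"] * 0.9, 3))


def predict_step(rows: list[Row], scorer: Callable[[Row], float]) -> list[Row]:
    """Prediction step: append the scorer's output as a new column."""
    return [{**r, "prediction": scorer(r)} for r in rows]


def flag_high_risk(rows: list[Row], threshold: float = 0.5) -> list[Row]:
    """Post-processing step applied to the prediction column."""
    return [{**r, "high_risk": r["prediction"] >= threshold} for r in rows]


# Invented sample data for the illustration.
customers = [
    {"customer_id": 1, "credit_limit": 1000.0},
    {"customer_id": 2, "credit_limit": 2000.0},
]
balances = [
    {"customer_id": 1, "balance": 800.0},
    {"customer_id": 2, "balance": 500.0},
]

prepared = add_utilization(inner_join(customers, balances, "customer_id"))
scored = flag_high_risk(predict_step(prepared, score_row))
```

Because the prediction is just another step that takes rows and returns rows, it can be placed after any preparation step and followed by further steps, which is the property the list above describes.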

A Unique Approach to Data Preparation

DataRobot Data Prep is available in multiple cloud-hybrid deployment options. A range of user personas—data scientists, data engineers, and business analysts—can perform data preparation tasks by themselves, aided by Data Prep features such as:

  • Visually interactive user interface that presents data in familiar tabular or spreadsheet style with no coding required.
  • Embedded AI-powered techniques that automatically profile, cluster, and clean data to reduce the time needed to prepare and shape the data.
  • Modern, cloud-native architecture that enables users to work interactively, or in batch, with large volumes of data (as opposed to product-enforced samples that require multiple data prep iterations to deliver data that is clean and ready).
  • End-to-end governance through auto-recording of every data preparation action, and automatic versioning of every step and project to provide trusted data lineage.
  • Enterprise-wide collaboration for data science teams to jointly develop projects and reuse datasets or projects to accelerate data preparation.


DataRobot Data Prep enables users to prepare machine learning data in a secure and governed fashion. 

  • Automatic versioning and recording of every Data Prep step provide deep lineage tracking and data transparency.
  • Editing, reordering, reusing, and easily adding steps lets you rapidly iterate on the dataset creation process for ML projects.
  • Annotation and auto-versioning of Data Prep projects enable multiple users to work on a single project simultaneously and collaboratively.
  • Permissions for datasets and projects can be controlled at the user and tenant level.
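
To make the idea of recorded, versioned steps more concrete, here is a minimal Python sketch of a step list in which every edit produces a new version. It is only an illustration of the general pattern, not how Data Prep is implemented; the `Step` and `Project` classes and all of their methods are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

Rows = list[dict[str, Any]]


@dataclass(frozen=True)
class Step:
    """One named, reusable preparation action."""
    name: str
    apply: Callable[[Rows], Rows]


@dataclass
class Project:
    """Hypothetical project: each edit to the step list is kept as a new version."""
    versions: list[tuple[Step, ...]] = field(default_factory=lambda: [()])

    @property
    def steps(self) -> tuple[Step, ...]:
        return self.versions[-1]

    def add_step(self, step: Step, position: Optional[int] = None) -> None:
        """Insert a step (at the end by default) and record a new version."""
        current = list(self.steps)
        current.insert(len(current) if position is None else position, step)
        self.versions.append(tuple(current))

    def move_step(self, old: int, new: int) -> None:
        """Reorder a step and record a new version."""
        current = list(self.steps)
        current.insert(new, current.pop(old))
        self.versions.append(tuple(current))

    def run(self, rows: Rows, version: int = -1) -> Rows:
        """Replay the steps of a given version against the input rows."""
        for step in self.versions[version]:
            rows = step.apply(rows)
        return rows

    def lineage(self, version: int = -1) -> list[str]:
        """Ordered step names that produced the output of a version."""
        return [s.name for s in self.versions[version]]


drop_nulls = Step(
    "drop rows with missing amount",
    lambda rows: [r for r in rows if r["amount"] is not None],
)
to_cents = Step(
    "convert amount to cents",
    lambda rows: [{**r, "amount": int(r["amount"] * 100)} for r in rows],
)

project = Project()
project.add_step(to_cents)
project.add_step(drop_nulls, position=0)  # inserted ahead of the earlier step

raw = [{"amount": 1.5}, {"amount": None}]
result = project.run(raw)
```

Earlier versions stay available in `project.versions`, so the step list behind any output can be inspected with `lineage()` or replayed with `run(rows, version=...)`.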

Accelerated End-to-End AI Workflow

Data Prep integration accelerates the entire lifecycle, from ingesting raw data to deploying your model to production.

  • Ingest data from over 50 different data sources, including the AI Catalog.
  • Use Data Prep’s built-in profiling feature to understand your data.
  • Build automated Data Prep pipelines using Project Flows.
  • Export your data to the DataRobot AutoML platform to build your AI/ML models.
  • Manage and maintain your ML models using DataRobot MLOps.

For more Data Prep resources in the community, see 
