The express goal of the DataRobot Platform is to empower anyone to quickly and easily build AI applications and maintain them over time. The quality of the data you use to create those applications and then feed into them for predictions is critical to your overall success.
DataRobot’s AI Catalog comprises three key functions:
Data can be ingested into DataRobot from your local system, from URLs, from Hadoop (if deployed in a Hadoop environment), and via Data Connections to common databases and data lakes. A critical part of the data ingestion process is Exploratory Data Analysis (EDA). EDA happens twice within DataRobot: once when data is ingested, and again once a target has been selected and modeling has begun. (For information about EDA and supported file types, see the community article Importing Data Overview. If you are a licensed customer, you can find more information in the in-app Platform Documentation by searching for Data connections, Overview and EDA, or Dataset requirements.)
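To make the idea of an ingest-time EDA pass more concrete, here is a minimal sketch of the kind of per-column summary such a pass might produce. It is not DataRobot's implementation; the function name summarize_columns, the sample data, and the choice of statistics are all assumptions made for illustration.

```python
import csv
import io
import statistics


def summarize_columns(csv_text):
    """Summarize each column: row count, missing count, unique count, and mean if numeric."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in (rows[0].keys() if rows else []):
        values = [row[column] for row in rows]
        # Treat empty strings (and short rows, which DictReader fills with None) as missing.
        present = [v for v in values if v not in ("", None)]
        stats = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "unique": len(set(present)),
        }
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            numbers = []
        if numbers:
            stats["type"] = "numeric"
            stats["mean"] = statistics.mean(numbers)
        else:
            stats["type"] = "categorical"
        summary[column] = stats
    return summary


# Hypothetical sample data, with one missing value in "age" and one in "income".
SAMPLE = """age,city,income
34,Boston,72000
,Denver,58000
45,Boston,
"""

EDA_SUMMARY = summarize_columns(SAMPLE)
for name, stats in EDA_SUMMARY.items():
    print(name, stats)
```

A summary like this is what lets you spot missing values or mistyped columns before you pick a target and start modeling.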
DataRobot’s AI Catalog is a centralized collaboration hub for working with data and related assets. The AI Catalog allows you to find, understand, share, tag, and reuse data. Data assets within the AI Catalog can either be materialized “snapshots” of tables or views, or be “dynamic,” meaning that the whole dataset is ingested from your data source only when you create a modeling project from it, so you always work with the most up-to-date data. If the data is snapshotted, those snapshots can be automatically refreshed periodically, and they are automatically versioned to preserve dataset lineage and enhance the overall governance capabilities of DataRobot. (If you are a licensed customer, you can find more information in the in-app Platform Documentation by searching for Load data/create projects.)
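The difference between a snapshot and a dynamic dataset can be sketched in a few lines of Python. This is a conceptual toy, not the DataRobot API: the CatalogEntry class, its methods, and the in-memory SOURCE_TABLE are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Rows = List[dict]


@dataclass
class CatalogEntry:
    """Hypothetical catalog entry (illustration only, not the DataRobot API)."""
    name: str
    fetch: Callable[[], Rows]  # reads the current rows from the data source
    snapshot: bool = True
    versions: List[Rows] = field(default_factory=list)

    def __post_init__(self):
        # A snapshot entry materializes its first version at registration time.
        if self.snapshot:
            self.refresh()

    def refresh(self):
        """Materialize a new version and return the version count (snapshot entries only)."""
        if not self.snapshot:
            raise ValueError("dynamic entries are not materialized")
        self.versions.append([dict(row) for row in self.fetch()])
        return len(self.versions)

    def data_for_project(self):
        """Return the rows a new modeling project would be built from."""
        if self.snapshot:
            return self.versions[-1]  # latest materialized version
        return self.fetch()  # read the source at project-creation time


SOURCE_TABLE = [{"order_id": 1, "amount": 20.0}]


def read_source():
    return SOURCE_TABLE


snapshot_entry = CatalogEntry("orders (snapshot)", read_source, snapshot=True)
dynamic_entry = CatalogEntry("orders (dynamic)", read_source, snapshot=False)

# The source changes after both entries were registered.
SOURCE_TABLE.append({"order_id": 2, "amount": 35.5})

rows_seen_by_snapshot = len(snapshot_entry.data_for_project())
rows_seen_by_dynamic = len(dynamic_entry.data_for_project())
version_after_refresh = snapshot_entry.refresh()
```

In this sketch the snapshot entry keeps serving the rows it captured until it is refreshed, and every refresh adds a version rather than overwriting the previous one, which is the property that preserves lineage.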
Data preparation plays a critical role in any AI/ML project. Raw data taken directly from the source is rarely clean enough, at the correct unit of analysis, or enriched enough to be useful as-is. DataRobot, in the true spirit of Automated Machine Learning, automates as much of the data cleaning and feature engineering as possible and does so in ways that are specific to each model type. Data enrichment is also easy within the DataRobot system: Feature Discovery can automatically join datasets and create new features for you.
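As a rough picture of what "join datasets and create new features" means, the following hand-written sketch aggregates a secondary table and joins the results onto a primary table. It does not reproduce how Feature Discovery works internally; the tables, column names, and the add_transaction_features helper are hypothetical.

```python
from collections import defaultdict

# Primary table: one row per customer (the unit of analysis).
customers = [
    {"customer_id": "c1", "churned": 0},
    {"customer_id": "c2", "churned": 1},
    {"customer_id": "c3", "churned": 0},
]

# Secondary table: zero or more rows per customer.
transactions = [
    {"customer_id": "c1", "amount": 20.0},
    {"customer_id": "c1", "amount": 30.0},
    {"customer_id": "c2", "amount": 5.0},
]


def add_transaction_features(primary, secondary, key="customer_id"):
    """Left-join per-key aggregates of `secondary` onto each row of `primary`."""
    amounts = defaultdict(list)
    for row in secondary:
        amounts[row[key]].append(row["amount"])

    enriched = []
    for row in primary:
        values = amounts.get(row[key], [])
        enriched.append({
            **row,
            "txn_count": len(values),
            "txn_sum": sum(values),
            # No transactions means the mean is undefined, so leave it missing.
            "txn_mean": sum(values) / len(values) if values else None,
        })
    return enriched


enriched_customers = add_transaction_features(customers, transactions)
for row in enriched_customers:
    print(row)
```

Note that the aggregation brings the secondary table to the same unit of analysis as the primary table (one row per customer) before the join, which is the step that raw, un-enriched data usually lacks.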
However, there is a point where manual data preparation is needed. DataRobot currently offers two types of data preparation:
For more information on best practices for machine learning data preparation, see Best Practices for Building ML “Learning Datasets.”