You can import data into DataRobot through the GUI or with the API. This article covers the GUI. (For examples that use the API, see the DataRobot Community GitHub.)
Through the GUI, there are five methods for getting data into DataRobot.
Data Source: Connect to a data source.
URL: Use a URL, such as an Amazon S3 bucket.
HDFS: Connect to Hadoop.
Local File: Upload a local file from your machine.
AI Catalog: Store, blend, and share your data through the AI Catalog.
For AutoML projects:
Unless you are using images with Visual AI, the data must be in a flat-file, tabular format.
You must have a column that includes the target you are trying to predict.
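Before uploading, the two AutoML requirements above can be checked locally. The following is a minimal sketch, not part of DataRobot: the function name `check_tabular_csv` and the sample column names are hypothetical, and it assumes the dataset is a CSV file with a header row.

```python
import csv
import io


def check_tabular_csv(text, target):
    """Return a list of problems found in CSV text; an empty list means none were found."""
    problems = []
    reader = csv.reader(io.StringIO(text))
    try:
        header = next(reader)
    except StopIteration:
        return ["file is empty"]
    # The target to predict must be one of the columns.
    if target not in header:
        problems.append(f"target column {target!r} not found in header")
    # In a flat, tabular file every record has as many fields as the header.
    # Records are counted from 1, with the header as record 1.
    for record_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(
                f"record {record_no}: expected {len(header)} fields, found {len(row)}"
            )
    return problems


# Hypothetical sample data with a target column named "churned".
sample = "age,income,churned\n34,52000,0\n51,61000,1\n"
print(check_tabular_csv(sample, "churned"))  # prints []
```

An empty list only means this sketch found nothing wrong; it does not guarantee that DataRobot will accept the file.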
For Time Series projects:
The data must be in a flat-file, tabular format.
You must include a date/time feature for each row.
You must have one row for each time step at the level of analysis you are forecasting. For example, if you are predicting 7 days into the future, your data must have one row per day for the entire date range; similarly, if you are forecasting 7 years out, your data must have one row per year for the entire date range.
You must have a column that includes the target that you are trying to predict.
You can find more information on time series setup here.
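The one-row-per-time-step requirement can be checked before upload. The sketch below assumes daily data with a single series and looks for calendar days missing from the date range; the function name `missing_days` and the sample dates are hypothetical, and this is a local pre-check rather than anything DataRobot runs.

```python
from datetime import date, timedelta


def missing_days(dates):
    """Return the calendar days absent between the earliest and latest date."""
    present = set(dates)
    if not present:
        return []
    current, end = min(present), max(present)
    gaps = []
    # Walk the full range one day at a time and record every day with no row.
    while current <= end:
        if current not in present:
            gaps.append(current)
        current += timedelta(days=1)
    return gaps


# Hypothetical series with no row for 3 January.
observed = [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 4)]
print(missing_days(observed))  # prints [datetime.date(2021, 1, 3)]
```

For yearly or other granularities the same idea applies with a different step size.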
For Visual AI projects:
You can only solve classification problems with Visual AI.
Set up one folder per class, name each folder for its class, and place that class's images inside it. Then create a ZIP archive of the folder that contains those class folders and upload it to DataRobot.
You can also add tabular data if you include the links to the images within the top folder. You can find more information on that here.
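The folder-of-folders archive described above can be built with Python's standard library. This is a minimal sketch under stated assumptions: the function name `zip_image_folders`, the `pets`, `cat`, and `dog` folder names, and the placeholder image bytes are all hypothetical, and keeping the top folder inside the archive is an assumption about the expected layout.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_image_folders(root, archive_path):
    """Zip a folder whose subfolders are class names; return the archived member names."""
    root = Path(root)
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store paths relative to the parent so the top folder is kept.
                arcname = path.relative_to(root.parent).as_posix()
                archive.write(path, arcname)
                names.append(arcname)
    return names


# Build a tiny example tree: pets/cat/001.jpg and pets/dog/001.jpg.
with tempfile.TemporaryDirectory() as tmp:
    top = Path(tmp) / "pets"
    for label in ("cat", "dog"):
        (top / label).mkdir(parents=True)
        (top / label / "001.jpg").write_bytes(b"placeholder")
    members = zip_image_folders(top, Path(tmp) / "pets.zip")

print(members)  # prints ['pets/cat/001.jpg', 'pets/dog/001.jpg']
```

The archive is written outside the image folder so that it is not picked up while the folder is being walked.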
Connecting to a Data Source
You can use almost anything with a JDBC connection to import data into DataRobot. You can find more details on this here.
Figure 1. Database
Importing Data with a URL
You can also use a URL, such as an Amazon S3 bucket. You can find more details on this here.
Figure 2. URL
Uploading a Local File
You can upload a local file by clicking the Local File button, or by dragging and dropping a dataset into the browser.
Figure 3. Local File
You can use the AI Catalog to store, blend, and share your data. From anywhere in DataRobot, click the AI Catalog tab at the top of the browser to access the tool. Then, on the page that opens, click the Add to catalog button to add a dataset using any of the previously described methods.
Figure 4. AI Catalog
If you click on an existing dataset, you can review information about that data. This includes the filename, filepath, number of rows and features, the create/modify dates, the owners, and the dataset ID.
Figure 5. Dataset Information
If you click on the Relationships tab, you can blend this data with another dataset. Simply click the + Add relationship link and select the secondary dataset from the catalog. Indicate the joining features in each dataset and the platform will add the relationship for you. This will allow you to use automated Feature Discovery among multiple datasets when you start your project. You can find more information on secondary datasets and automated Feature Discovery here.
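Before defining a relationship, it can help to confirm that the joining features actually line up. The sketch below is a local pre-check only and does not reproduce how Feature Discovery works; the function name `unmatched_keys`, the dataset contents, and the column names `customer_id` and `cust` are hypothetical.

```python
def unmatched_keys(primary_rows, secondary_rows, primary_key, secondary_key):
    """Return primary key values that have no matching row in the secondary dataset."""
    secondary_values = {row[secondary_key] for row in secondary_rows}
    primary_values = {row[primary_key] for row in primary_rows}
    return sorted(primary_values - secondary_values)


# Hypothetical primary and secondary datasets joined on customer_id = cust.
customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 3}]
orders = [
    {"cust": 1, "amount": 20.0},
    {"cust": 1, "amount": 5.0},
    {"cust": 3, "amount": 12.5},
]
print(unmatched_keys(customers, orders, "customer_id", "cust"))  # prints [2]
```

A long list of unmatched values usually points to a wrong joining feature or mismatched formatting between the two datasets.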
Figure 6. Relationships
If your dataset changes over time, you can track its Version History and add important Comments.
Figure 7. Version History
You can share your dataset with others in your organization using the Share button at the top right of the browser. Add their email address, indicate whether they can share the data further, assign a role for their use of the data, and click Share. (You can also share with defined groups or organizations, if any are available for selection.)
Figure 8. Sharing
While viewing information for a dataset in the AI Catalog, you can also create a new project. Click the Create project button at the top right of the browser.
Figure 9. Create Project
If you’re a licensed DataRobot customer, search the in-app documentation for Importing Data.