Importing Data Overview

(Article updated October 2020, Release 6.2.)

You can get your data into DataRobot through the GUI or with the API. This article explains how to do this in the GUI. (For API examples, see the DataRobot Community GitHub.)

The GUI offers five methods for getting data into DataRobot:

  • Data Source: Connect to a data source.
  • URL: Import from a URL, such as an Amazon S3 bucket.
  • HDFS: Connect to Hadoop (HDFS).
  • Local File: Upload a local file from your machine.
  • AI Catalog: Store, blend, and share your data through the AI Catalog.

Important Considerations

For AutoML projects:
  • Unless you are using images with Visual AI, the data must be in a flat-file, tabular format.
  • You must have a column that contains the target you are trying to predict.
For Time Series projects:
  • The data must be in a flat-file, tabular format.
  • You must include a date/time feature for each row.
  • You must have one row for each time step at your level of analysis. For example, if you are forecasting 7 days into the future, your data must have one row per day for the entire date range; similarly, if you are forecasting 7 years out, your data must have one row per year for the entire date range.
  • You must have a column that contains the target you are trying to predict.
  • For more information, see the DataRobot documentation on time series setup.
For Visual AI projects:
  • You can only solve classification problems with Visual AI.
  • Create one folder per class, name each folder after its class, and place that class's images in it. Then create a ZIP archive of the parent folder that holds these class folders and upload it to DataRobot.
  • You can also add tabular data if you include the links to the images within the top-level folder. For more information, see the DataRobot documentation on Visual AI.
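The time series requirements above (a date column, a target column, and one row per time step with no gaps) are easy to violate without noticing, so it can help to check a file locally before uploading it. The following is a minimal sketch, not a DataRobot tool: the function name `check_time_series_csv` and its parameters are hypothetical, and it assumes a single series in a CSV file with ISO-8601 dates (YYYY-MM-DD) at a fixed step of `step_days` days.

```python
import csv
import io
from datetime import date, timedelta


def check_time_series_csv(text, date_col, target_col, step_days=1):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical pre-upload check (not a DataRobot API). Assumes a single
    series, ISO-8601 dates (YYYY-MM-DD), and a fixed step of `step_days`
    days between consecutive rows.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    # Both the date/time feature and the target must be present as columns.
    problems = [f"missing column: {col}" for col in (date_col, target_col)
                if col not in header]
    if problems:
        return problems

    dates = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        value = (row[date_col] or "").strip()
        try:
            dates.append(date.fromisoformat(value))
        except ValueError:
            problems.append(f"line {line_no}: unparseable {date_col} {value!r}")

    # One row per time step: sorted dates must differ by exactly one step.
    dates.sort()
    step = timedelta(days=step_days)
    for prev, curr in zip(dates, dates[1:]):
        if curr == prev:
            problems.append(f"duplicate row for {curr}")
        elif curr - prev != step:
            problems.append(f"gap or irregular step between {prev} and {curr}")
    return problems


if __name__ == "__main__":
    sample = "date,sales\n2020-10-01,12\n2020-10-02,15\n2020-10-04,9\n"
    print(check_time_series_csv(sample, "date", "sales"))
    # → ['gap or irregular step between 2020-10-02 and 2020-10-04']
```

A fixed number of days only fits daily (or weekly) data; for monthly or yearly data you would compare calendar periods instead, and data containing multiple series would legitimately repeat dates, one row per series.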

Connecting to a Data Source

You can import data from almost any source that supports a JDBC connection. For more details, see the DataRobot documentation.

Figure 1. Database

Importing Data with a URL

You can also import data from a URL, such as an Amazon S3 bucket. For more details, see the DataRobot documentation.

Figure 2. URL

Uploading a Local File

You can upload a local file by clicking the Local File button or by dragging and dropping a dataset into the browser window.

Figure 3. Local File

AI Catalog

Use the AI Catalog to store, blend, and share your data. From anywhere in DataRobot, click the AI Catalog tab at the top of the page to open it. On the AI Catalog page, click the Add to catalog button to add a dataset using any of the methods described above.

Figure 4. AI Catalog

Click an existing dataset to review information about it, including the filename, file path, number of rows and features, creation and modification dates, owners, and dataset ID.


Figure 5. Dataset Information

On the Relationships tab, you can blend this data with another dataset. Click the + Add relationship link, select the secondary dataset from the catalog, and indicate the joining feature in each dataset; the platform then adds the relationship for you. This lets you use automated Feature Discovery across multiple datasets when you start your project. For more information on secondary datasets and automated Feature Discovery, see the DataRobot documentation.
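Conceptually, a relationship tells the platform which rows of a secondary dataset belong to each row of the primary dataset, so that new features can be aggregated across the join. The toy sketch below illustrates only that idea, under stated assumptions; it is not DataRobot's Feature Discovery, whose feature engineering is not shown here. The helper `aggregate_secondary` and the `customer_id`, `churned`, and `amount` columns are hypothetical, and rows are plain dicts.

```python
from collections import defaultdict


def aggregate_secondary(primary, secondary, key, value_col):
    """Hypothetical helper: add count and sum features from `secondary`
    to each row of `primary`, matching rows on the joining feature `key`.

    Returns new dicts; the input rows are not modified.
    """
    # Group the secondary values by the joining feature.
    grouped = defaultdict(list)
    for row in secondary:
        grouped[row[key]].append(float(row[value_col]))

    enriched = []
    for row in primary:
        values = grouped.get(row[key], [])  # no match -> empty list
        new_row = dict(row)
        new_row[f"count_{value_col}"] = len(values)
        new_row[f"sum_{value_col}"] = sum(values)
        enriched.append(new_row)
    return enriched


if __name__ == "__main__":
    customers = [{"customer_id": "A", "churned": 0},
                 {"customer_id": "B", "churned": 1}]
    transactions = [{"customer_id": "A", "amount": "20.0"},
                    {"customer_id": "A", "amount": "5.5"}]
    for row in aggregate_secondary(customers, transactions, "customer_id", "amount"):
        print(row)
    # {'customer_id': 'A', 'churned': 0, 'count_amount': 2, 'sum_amount': 25.5}
    # {'customer_id': 'B', 'churned': 1, 'count_amount': 0, 'sum_amount': 0}
```

The primary dataset keeps one row per entity (here, per customer), and the secondary dataset contributes summary values through the joining feature, which is why each dataset must identify that feature.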


Figure 6. Relationships

If your dataset changes over time, you can track its Version History and add Comments.


Figure 7. Version History

You can share your dataset with others in your organization using the Share button at the top right of the page. Enter each recipient's email address, indicate whether they can share the data further, assign a role that determines how they can use the data, and click Share. (You can also share with defined groups or organizations, if they are available for selection.)


Figure 8. Sharing

You can also create a new project directly while viewing a dataset's information in the AI Catalog: click the Create project button at the top right of the page.


Figure 9. Create Project

More Information

If you’re a licensed DataRobot customer, search the in-app documentation for Importing Data.

Version history: revision 8 of 8, last updated 10-30-2020 03:39 PM.