Importing from AWS S3


This article showcases how you can ingest data from an Amazon S3 bucket using DataRobot.

Identify the object

To start using an object saved in an S3 bucket, first navigate to the dataset you want to use. Then copy the object’s URL (Figure 1).

Figure 1. Identifying object URL

Next, select AI Catalog from the DataRobot GUI.

Figure 2. DataRobot GUI

Now click Add to catalog and select “URL” (Figure 3).

Figure 3. URL add to catalog option

In the URL box, paste the URL of the object and click Save. DataRobot reads the data and infers its data types and schema automatically, just as it does when you upload a CSV file from your local machine.

You can also ingest data into DataRobot from private S3 buckets. For example, a pre-signed S3 URL is a temporary link that DataRobot can use to retrieve the file. One of the easiest ways to create one is with the AWS Command Line Interface (CLI). After the CLI has been installed and configured, you can run a command similar to the following:

$ aws s3 presign --expires-in 600 s3://bucket-name/path/to/file.csv
https://bucket-name.s3.amazonaws.com/path/to/file.csv?AWSAccessKeyId=<key>


Anyone who has the URL produced in this example can read the private file.csv from the private bucket bucket-name. The signed link remains valid for 600 seconds (10 minutes) after it is created.
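Because the link is short-lived, it can help to confirm how much time is left before pasting it into DataRobot. The following Python sketch is illustrative only and is not part of DataRobot: the function names are hypothetical, `make_presigned_url` assumes the `boto3` package is installed and AWS credentials are configured, and the expiry check assumes the URL carries either an `Expires` query parameter (Signature Version 2 style) or `X-Amz-Date` plus `X-Amz-Expires` (Signature Version 4 style).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def make_presigned_url(bucket: str, key: str, expires_in: int = 600) -> str:
    """Rough Python analogue of `aws s3 presign --expires-in 600 s3://bucket/key`."""
    # Imported lazily so the parsing helpers below work without boto3 installed.
    import boto3

    s3 = boto3.client("s3")
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )


def presigned_expiry(url: str) -> Optional[datetime]:
    """Return the UTC expiry time encoded in a pre-signed URL, or None if absent."""
    query = parse_qs(urlparse(url).query)
    if "Expires" in query:
        # Signature Version 2 style: absolute Unix timestamp.
        return datetime.fromtimestamp(int(query["Expires"][0]), tz=timezone.utc)
    if "X-Amz-Date" in query and "X-Amz-Expires" in query:
        # Signature Version 4 style: signing time plus lifetime in seconds.
        signed_at = datetime.strptime(
            query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
        ).replace(tzinfo=timezone.utc)
        return signed_at + timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return None


def seconds_remaining(url: str, now: Optional[datetime] = None) -> Optional[float]:
    """Seconds until the link expires (negative if already expired)."""
    expiry = presigned_expiry(url)
    if expiry is None:
        return None
    now = now or datetime.now(timezone.utc)
    return (expiry - now).total_seconds()
```

If `seconds_remaining(url)` returns a small or negative number, generate a fresh link before adding it to the catalog.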

If you have your own DataRobot installation, you have the following additional options:

  • The datarobot service account that the application runs as can be granted IAM privileges to read private S3 buckets. DataRobot can then ingest from any S3 location that the account has privileges to access.
  • S3 impersonation of the user logged in to DataRobot can also be implemented for more limited access to S3 data. This requires that LDAP be used for authentication, with the user's authorized roles specified in LDAP attributes.

Both of the above options will accept an s3:// URI path.
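To relate the two address forms used in this article, the sketch below splits an s3:// URI into its bucket and key and builds the matching https:// object URL. It is a minimal illustration using only the Python standard library. The function names are hypothetical, and it assumes the virtual-hosted-style host name shown in the earlier example (bucket-name.s3.amazonaws.com); region-specific endpoints and keys containing `?` or `#` are not handled.

```python
from typing import Tuple
from urllib.parse import quote, urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # The path keeps its leading slash; the object key does not have one.
    return parsed.netloc, parsed.path.lstrip("/")


def to_https_url(uri: str) -> str:
    """Build the virtual-hosted-style HTTPS URL for an s3:// URI."""
    bucket, key = split_s3_uri(uri)
    # Percent-encode the key, leaving '/' separators intact.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"
```

For example, `split_s3_uri("s3://bucket-name/path/to/file.csv")` separates the bucket name from the object key, and `to_https_url` turns the same URI into the form that can be pasted into the URL box for a publicly readable object.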


Figure 4. Pasting URL

Initiating DataRobot Project

Now that your data has been successfully ingested, click Create project in the upper-right corner (Figure 5).

Figure 5. Created AI Catalog table

You can now start a project as you normally would in DataRobot.
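The same two steps (add the URL to the AI Catalog, then create a project from the resulting dataset) can also be scripted. The sketch below is a hedged illustration rather than an official recipe: it assumes the `datarobot` Python client is installed, that the API token and endpoint are supplied through the environment variables DATAROBOT_API_TOKEN and DATAROBOT_ENDPOINT (names chosen for this example), and that your client version provides `Dataset.create_from_url` and `Project.create_from_dataset`. The helper `check_import_url` and the set of accepted schemes are hypothetical conveniences, not DataRobot rules.

```python
import os
from urllib.parse import urlparse

# Assumption: schemes this sketch treats as importable. The article shows
# https:// object URLs and, for self-managed installations, s3:// URIs.
ALLOWED_SCHEMES = {"http", "https", "s3"}


def check_import_url(url: str) -> str:
    """Return the URL unchanged if its scheme is one this sketch accepts."""
    scheme = urlparse(url).scheme
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    return url


def catalog_and_create_project(url: str, project_name: str):
    """Register the URL in the AI Catalog, then start a project from it."""
    # Imported lazily so check_import_url works without the client installed.
    import datarobot as dr

    # Assumption: credentials come from these environment variables.
    dr.Client(
        token=os.environ["DATAROBOT_API_TOKEN"],
        endpoint=os.environ["DATAROBOT_ENDPOINT"],
    )
    # Equivalent of "Add to catalog" > "URL" in the GUI.
    dataset = dr.Dataset.create_from_url(check_import_url(url))
    # Equivalent of clicking "Create project" on the catalog entry.
    project = dr.Project.create_from_dataset(dataset.id, project_name=project_name)
    return dataset, project
```

A call such as `catalog_and_create_project("https://bucket-name.s3.amazonaws.com/path/to/file.csv", "S3 import demo")` would return the catalog dataset and the new project, provided the URL is readable by DataRobot.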

More Information

If you’re a licensed DataRobot customer, search the in-app documentation for Non-catalog import methods, then see the section “Importing files from S3” for more information.

Last update: 04-03-2020 09:29 AM