
Getting Your Image Data Ready for Visual AI

DataRobot Employee

Visual AI is a new capability within the Automated Machine Learning product that lets you build models from images and compute predictions on them. To better understand what Visual AI is, read our blog post introducing Visual AI.

This article focuses on how to prepare your image data so that DataRobot can ingest it. Your goal is to end up with a single ZIP file that contains a CSV file for structured data plus folders for all of your training images. In some cases, you can even forgo the CSV file. Once prepared, you can load your image data directly into a new project, or add it to DataRobot’s AI Catalog to share it with other users in your organization.

Using a CSV File with Images

This method relies on a CSV file to capture structured data; as usual, you can supply numeric, categorical, text, and date features, including the target. Organize your images however you like (in folders or not, it’s up to you), as long as they sit alongside the CSV file. In the CSV file, create one or more image columns that give the relative path to each image. Then create a single ZIP file that contains your CSV file and images, and you’re done.

[Image: Example CSV for skin lesion images]

In the above dataset, all images are in the same folder. The image column contains the image file names, and the other columns hold the remaining features in the dataset.

Here’s a quick example of the steps:

Let’s assume you have data about 600 articles of clothing. For each one, you know the brand, size, and category. You even have a text description and two pictures for each: one front image and one back image.

  1. Create a ClothesDataset folder.
  2. Add all the images to this folder. You can put them directly in the folder, in a single subfolder, or in two subfolders: one for front images and one for back images.
  3. In the same folder, create a CSV file containing four columns: Brand, Size, Category, and Description.
  4. Add two columns to the CSV file for images: Front and Back. The Front column will contain the relative path to the Front image; the Back column will contain the relative path to the Back image.
  5. Create a ZIP file from the ClothesDataset folder.
  6. Drag and drop, or upload, your ZIP file into DataRobot.

(If you get the error "Zip files must contain exactly one file", your DataRobot instance doesn't currently support Visual AI; for example, it's an older version or doesn't have the Visual AI feature enabled. Contact your DataRobot representative for help.)
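The preparation steps above can also be scripted. The following Python sketch uses only the standard library; the dataset.csv file name, the front/ and back/ subfolders, and the build_clothes_dataset function are hypothetical choices for illustration, not something DataRobot requires. It writes the CSV next to the images and stores the folder's contents at the top level of the archive, so the relative paths in the Front and Back columns resolve from the CSV's location (an assumed layout; adapt it to your data).

```python
from __future__ import annotations

import csv
import zipfile
from pathlib import Path

# Hypothetical column layout from the clothing example: four structured
# columns plus two image-path columns.
FIELDNAMES = ["Brand", "Size", "Category", "Description", "Front", "Back"]


def build_clothes_dataset(root: Path, items: list[dict]) -> Path:
    """Write dataset.csv into `root`, then zip everything under `root`.

    Returns the path of the ZIP file, created next to `root`
    (e.g. ClothesDataset -> ClothesDataset.zip).
    """
    # 1. The CSV sits next to the images, so relative paths resolve from it.
    csv_path = root / "dataset.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(items)

    # 2. One ZIP holding the CSV and all image subfolders (no nested ZIPs).
    zip_path = root.with_suffix(".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store paths relative to `root` with forward slashes,
                # e.g. "front/item_001.jpg".
                zf.write(path, arcname=path.relative_to(root).as_posix())
    return zip_path
```

For example, an item whose Front value is front/item_001.jpg needs a file at ClothesDataset/front/item_001.jpg before you call build_clothes_dataset(Path("ClothesDataset"), items).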

DataRobot will automatically identify and create a six-column dataset: four columns for Brand, Size, Category, and Description, two columns for images (Front and Back). From there, you’ll be able to let DataRobot build a model to predict the category from the brand, size, description, front, and back pictures of an item.

[Image: How a dataset with an associated CSV looks in the AI Catalog after upload]

Some other things to consider:

  • No whitespace in file or folder names.
  • Use / (not \) as the path separator.
  • Don’t start a path with /; paths are relative to the CSV file.
  • Supported image formats are JPG, PNG, BMP, and PPM.
  • Don’t zip the image subfolders separately; upload one ZIP file, not a ZIP containing other ZIP files.
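To catch these problems before uploading, you could run each value in your image columns through a small check like the one below. It is a sketch that encodes only the rules listed above; the check_image_path name is hypothetical, and the extension set mirrors the formats named in this article, so other spellings such as .jpeg are flagged conservatively.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Formats listed in this article; other spellings (e.g. ".jpeg") are
# flagged conservatively.
SUPPORTED_EXTENSIONS = {".jpg", ".png", ".bmp", ".ppm"}


def check_image_path(path: str) -> list[str]:
    """Return the problems found in one CSV image path (empty list if none)."""
    problems = []
    if any(ch.isspace() for ch in path):
        problems.append("contains whitespace")
    if "\\" in path:
        problems.append("uses backslash; use / instead")
    if path.startswith("/"):
        problems.append("starts with /; use a relative path")

    # Normalize separators only for the remaining checks.
    posix = PurePosixPath(path.replace("\\", "/"))
    # Any folder component ending in .zip means the image sits in a nested ZIP.
    if any(part.lower().endswith(".zip") for part in posix.parts[:-1]):
        problems.append("is inside a nested ZIP file")
    if posix.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension {posix.suffix!r}")
    return problems
```

For example, check_image_path("front/item 001.jpg") returns ["contains whitespace"], while a clean path such as "front/item_001.jpg" returns an empty list.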

Using File System Folders as Image Labels

If your modeling task is pure image classification, there is a simpler way to prepare your image dataset. Create folders in your file system, name each folder after the label you want its images to have in DataRobot, and put the right images in the right folders. That’s it! Then create a single ZIP file that contains your folders (and the images inside them), and you’re done.

Here’s a quick example of the steps:

Let’s assume you have 300 images: 100 images of oranges, 100 images of apples, and 100 images of grapefruit.

  1. Create three folders: Orange, Apple, and Grapefruit.
  2. Drop your images into the correct folders depending on the type of fruit.
  3. From the parent directory of the three folders, create a ZIP file that contains the three folders and the images inside them.
  4. Drag and drop, or upload, your ZIP file into DataRobot.
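The same steps in Python, as a sketch: zip_label_folders is a hypothetical helper that zips each label folder under a parent directory and returns (label, path) pairs, previewing how the folder names map to labels. DataRobot builds its own dataset on upload; the returned pairs are only for your own inspection. Files in formats other than the supported ones are skipped.

```python
from __future__ import annotations

import zipfile
from pathlib import Path

SUPPORTED_EXTENSIONS = {".jpg", ".png", ".bmp", ".ppm"}


def zip_label_folders(parent: Path, zip_path: Path) -> list[tuple[str, str]]:
    """Zip each label folder under `parent` and return (label, path) pairs.

    The label is simply the folder name, e.g. ("Apple", "Apple/apple_001.jpg").
    """
    rows = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Each direct subfolder of `parent` is one label (Apple, Orange, ...).
        for label_dir in sorted(p for p in parent.iterdir() if p.is_dir()):
            for image in sorted(label_dir.iterdir()):
                # Skip subfolders and files in unsupported formats.
                if image.is_file() and image.suffix.lower() in SUPPORTED_EXTENSIONS:
                    arcname = image.relative_to(parent).as_posix()
                    zf.write(image, arcname=arcname)
                    rows.append((label_dir.name, arcname))
    return rows
```

Writing the ZIP into the parent directory itself (as in step 3) is fine here, because only subfolders are treated as labels.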

[Image: Two folders, airplanes and helicopters; the ZIP file is simply the two folders zipped together.]

DataRobot will automatically identify and create a simple, three-column dataset: one for the label (Apple, Orange, Grapefruit), another for the image, and a third for the image path.

[Image: The resulting dataset in the AI Catalog]

Preparing data for in-app predictions

Within the app, you can compute predictions from the Make Predictions tab. Prepare the prediction data the same way you prepared the training dataset.

Preparing data for predictions via a dedicated prediction server

If you want to integrate predictions into your production environment, you’re probably better off deploying the Visual AI model first and then sending requests to the prediction server endpoint, one prediction request at a time. Images must currently be base64-encoded before they are sent; this is the standard way to pass images in API calls. Here's an example script.
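The encoding step itself is short. This sketch base64-encodes one image and places it in a one-row CSV payload; the helper names are hypothetical, and the column names in the usage below (Brand, Front) come from the clothing example and must match your own training columns. How the payload is sent to the prediction server (URL, headers, CSV versus JSON body) depends on your deployment and is not shown here; follow the example script for that part.

```python
from __future__ import annotations

import base64
import csv
import io
from pathlib import Path


def image_to_base64(path: Path) -> str:
    """Return the raw bytes of an image file as a base64 ASCII string."""
    return base64.b64encode(path.read_bytes()).decode("ascii")


def single_row_csv(row: dict[str, str]) -> str:
    """Serialize one prediction row (structured features plus base64 image
    columns) as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(row))
    writer.writeheader()
    writer.writerow(row)
    return buffer.getvalue()
```

Usage: single_row_csv({"Brand": "Acme", "Front": image_to_base64(Path("front/item_001.jpg"))}) produces a header line and one data row. Base64 output uses only letters, digits, +, / and =, so the CSV writer never needs to quote it.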

With this, you should be ready to kick-start your first Visual AI project with DataRobot. If you have any remaining questions, please reach out below!

Snap Circuit


Thank you for the blog post. I've been looking forward to using Visual AI in DataRobot. Can you make one or more of the zipped data sets from your blog post downloadable? Thank you.

DataRobot Employee

The Skin Lesion dataset is available for download here, and the Aircraft dataset here.

Great suggestion, @aballard , thanks!

Snap Circuit

Hi @Sylvain, I put my images for prediction in a single folder, zipped it, and used the API to make predictions. The problem is that when I output the pandas DataFrame of predictions, the image names do not appear in any field, and the prediction rows are not in the same order as the images in the original ZIP file, so I can't tell which image corresponds to which prediction. Do you know how to solve this? Thanks!

DataRobot Employee

Hello @aaronuv ,

The team tells me you can use the passthrough_columns argument to add the image_file_path column to the resulting dataset. When you use a ZIP file as a prediction dataset, DataRobot adds an image_file_path column to it.

Let me know if I can help further!

Snap Circuit


Thanks for your prompt response. I found that passthrough_columns is a parameter that can be used once the model is deployed. Unfortunately, I'm using a DataRobot academic account, which doesn't allow model deployment. If I'm right, is there any other way to do this? Thanks.

DataRobot Employee

@aaronuv ,

Thanks for adding color to this. I suggest including the images in the CSV file via Base64 encoding; you can find code to perform the encoding within this script. This lets you handle the image column like any other column via the API.
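A sketch of that suggestion, assuming the image column in your training data was named image (pass your own name otherwise): the hypothetical folder_to_prediction_csv helper writes one base64-encoded image per row, sorted by file name, plus a file_name column, and returns the file names in row order. Keeping that list lets you line up each prediction with its image, assuming the predictions come back in input order or carry a row identifier, and assuming the extra file_name column (absent from the training data) is ignored at scoring time.

```python
from __future__ import annotations

import base64
import csv
from pathlib import Path

SUPPORTED_EXTENSIONS = {".jpg", ".png", ".bmp", ".ppm"}


def folder_to_prediction_csv(
    image_dir: Path, csv_path: Path, image_column: str = "image"
) -> list[str]:
    """Write one CSV row per image, base64-encoded in `image_column`.

    Returns the image file names in row order, so row i of the prediction
    dataset corresponds to names[i].
    """
    images = sorted(
        p for p in image_dir.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # file_name is only a local key; image_column must match the name
        # of the image column in the training data.
        writer.writerow(["file_name", image_column])
        for image in images:
            encoded = base64.b64encode(image.read_bytes()).decode("ascii")
            writer.writerow([image.name, encoded])
    return [p.name for p in images]
```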

Happy to help further if needed.

