Getting Your Image Data Ready for Visual AI

DataRobot Employee

Visual AI is a new capability within the Automated Machine Learning product that lets you build models and compute predictions using images. To learn more about what Visual AI is, read our blog post introducing Visual AI.

This article focuses on how to prepare your image data so that DataRobot can ingest it. Your goal is a single ZIP file containing a CSV file with your structured data plus folders holding all of your training images. For image classification, you can even skip the CSV file. Once your data is prepared, you can load it directly into a new project or add it to DataRobot’s AI Catalog to share it with other users in your organization.

Using a CSV File with Images

This method relies on a CSV file to capture structured data: as usual, you can supply numeric, categorical, text, and date features, including the target. Organize your images however you like (in folders or not), as long as they sit alongside the CSV file. In the CSV file, create one or more image columns that give the relative path to each image. Then create a single ZIP file that contains your CSV file and images, and you’re done.

[Figure: Example CSV for skin lesion images]

In the dataset above, all images are in the same folder. The image column contains the image file names, and the other columns hold the remaining features in the dataset.
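Before uploading, it can help to confirm that every path in the image column(s) points to a file that is actually in the ZIP. The following is a minimal sketch using only Python's standard library. It assumes the CSV is UTF-8 encoded and that paths are resolved relative to the CSV's own folder inside the ZIP; that is our reading of the rule above, not a documented DataRobot behavior. The function name `missing_images` and its parameters are hypothetical.

```python
import csv
import io
import posixpath
import zipfile


def missing_images(zip_path: str, csv_name: str, image_columns: list[str]) -> list[str]:
    """List image paths referenced in the CSV that have no matching entry in the ZIP.

    Assumption: each path is resolved relative to the folder that holds the CSV
    inside the archive (ZIP entry names always use forward slashes).
    """
    with zipfile.ZipFile(zip_path) as zf:
        names = set(zf.namelist())
        base = posixpath.dirname(csv_name)  # "" when the CSV is at the ZIP root
        # utf-8-sig also strips the byte-order mark some spreadsheet tools add
        text = zf.read(csv_name).decode("utf-8-sig")

    missing = []
    for row in csv.DictReader(io.StringIO(text)):
        for column in image_columns:
            if posixpath.join(base, row[column]) not in names:
                missing.append(row[column])
    return missing
```

For example, `missing_images("skin_lesions.zip", "skin_lesions/data.csv", ["image"])` (hypothetical file names) returns the referenced paths with no matching file; an empty list means every reference resolved.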

Here’s a quick example of the steps:

Let’s assume you have data about 600 articles of clothing. For each one, you know the brand, size, and category. You even have a text description and two pictures for each: one front image and one back image.

  1. Create a ClothesDataset folder.
  2. Add all the images to this folder. You can put them directly in the folder, in a single subfolder, or in two subfolders: one for front images and one for back images.
  3. In the ClothesDataset folder, create a CSV file containing four columns: Brand, Size, Category, and Description.
  4. Add two columns to the CSV file for images: Front and Back. The Front column will contain the relative path to the Front image; the Back column will contain the relative path to the Back image.
  5. Create a ZIP file from the ClothesDataset folder.
  6. Drag and drop, or upload, your ZIP file into DataRobot.
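Steps 3 through 5 can be scripted. The sketch below is one way to do it with Python's standard `csv` and `zipfile` modules. It assumes the images are already inside `ClothesDataset` and that you have the per-item metadata as a list of dicts; the CSV file name `clothes.csv`, the dict keys, and the function name `build_clothes_dataset` are hypothetical.

```python
import csv
import zipfile
from pathlib import Path


def build_clothes_dataset(root: Path, items: list[dict]) -> Path:
    """Write a CSV with relative image paths into `root`, then zip the whole folder.

    Each item is a dict with the hypothetical keys brand, size, category,
    description, front, and back; front/back are paths relative to `root`,
    written with forward slashes and no leading slash.
    """
    csv_path = root / "clothes.csv"  # hypothetical file name
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Brand", "Size", "Category", "Description", "Front", "Back"])
        for item in items:
            writer.writerow([item["brand"], item["size"], item["category"],
                             item["description"], item["front"], item["back"]])

    # Write the ZIP next to the folder (not inside it) so it never includes itself.
    zip_path = root.parent / f"{root.name}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Entries look like ClothesDataset/front/001.jpg, as when zipping the folder.
                zf.write(path, arcname=path.relative_to(root.parent).as_posix())
    return zip_path
```

Zipping from the parent directory keeps the `ClothesDataset/` prefix on every entry, which mirrors compressing the folder in a file manager; the relative paths in the CSV still resolve because the CSV sits at the top of that folder.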

If you get the error "Zip files must contain exactly one file," your DataRobot instance doesn't currently support Visual AI (for example, it's an older version or the Visual AI feature isn't enabled). Contact your DataRobot representative for help.

DataRobot will automatically identify the columns and create a six-column dataset: four columns for Brand, Size, Category, and Description, and two columns for the images (Front and Back). From there, you can have DataRobot build a model that predicts an item’s category from its brand, size, description, and front and back pictures.

[Figure: How a dataset with an associated CSV looks in the AI Catalog after upload]

Some other things to consider:

  • No whitespace in file or folder names.
  • Use / (not \) for file paths.
  • Don’t add / at the beginning of the path.
  • JPEG, PNG, BMP, and PPM file formats are supported.
  • Don’t zip the image subfolders separately (no ZIP files inside the ZIP file).
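The path rules above are easy to check automatically. This sketch tests a single path string from the CSV against each rule. Mapping the JPEG format to the `.jpg`/`.jpeg` extensions is an assumption, and DataRobot's own validation may be stricter or looser than this; `check_image_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Assumed extensions for the supported formats (JPEG, PNG, BMP, PPM).
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".ppm"}


def check_image_path(path: str) -> list[str]:
    """Return the rules a CSV image path breaks (an empty list means it looks fine)."""
    problems = []
    if any(ch.isspace() for ch in path):
        problems.append("contains whitespace")
    if "\\" in path:
        problems.append("uses backslash instead of /")
    if path.startswith("/"):
        problems.append("starts with /")
    # PurePosixPath treats "\" as an ordinary character, so the suffix check still works.
    if PurePosixPath(path).suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append("unsupported file extension")
    return problems
```

Running it over every value in your image columns before zipping catches these issues locally instead of at upload time.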

Using File System Folders as Image Labels

If your modeling task is image classification, this is a simpler way to prepare your image dataset. Just create folders in your file system and name each folder after the label you want its images to have in DataRobot. Then put the right images in the right folders. That’s it! All you need to do after this is create a single ZIP file that contains your folders (and the images inside them).

Here’s a quick example of the steps:

Let’s assume you have 300 images: 100 images of oranges, 100 images of apples, and 100 images of grapefruit.

  1. Create three folders: Orange, Apple, and Grapefruit.
  2. Drop your images into the correct folders depending on the type of fruit.
  3. Create a ZIP file in the parent directory of the three folders. The ZIP file will contain the three folders and the images inside.
  4. Drag and drop, or upload, your ZIP file into DataRobot.
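Step 3 can also be done in Python. This sketch assumes the label folders sit directly under `parent` and writes the ZIP next to them, storing each image as `<label>/<file>` so the top-level folder name carries the label. The function name and the default `Archive.zip` file name are hypothetical.

```python
import zipfile
from pathlib import Path


def zip_label_folders(parent: Path, labels: list[str], zip_name: str = "Archive.zip") -> Path:
    """Zip the label folders (e.g. Orange, Apple, Grapefruit) found under `parent`."""
    zip_path = parent / zip_name
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for label in labels:
            for image in sorted((parent / label).rglob("*")):
                if image.is_file():
                    # Store as <label>/<file>, e.g. Apple/apple_001.jpg
                    zf.write(image, arcname=image.relative_to(parent).as_posix())
    return zip_path
```

Passing the labels explicitly, rather than zipping everything in `parent`, keeps stray files in the parent directory out of the archive.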

[Figure: Two folders, airplanes and helicopters. The Archive.zip file is simply the two folders zipped together.]

DataRobot will automatically identify and create a simple, three-column dataset: one for the label (Apple, Orange, Grapefruit), another for the image, and a third for the image path.
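To preview which label each image will get, and to check class balance before uploading, you can list the ZIP's contents. This hypothetical helper takes the top-level folder name as the label; it approximates the convention described above and is not DataRobot's actual ingestion code.

```python
import zipfile
from pathlib import PurePosixPath


def labels_from_zip(zip_path: str) -> list[tuple[str, str]]:
    """Return (label, image_path) pairs, using each entry's top-level folder as its label."""
    rows = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            parts = PurePosixPath(name).parts
            if len(parts) >= 2:  # files at the ZIP root have no folder, so no label
                rows.append((parts[0], name))
    return rows
```

With the fruit example, `collections.Counter(label for label, _ in labels_from_zip("Archive.zip"))` should report 100 images for each of Apple, Orange, and Grapefruit.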

[Figure: The resulting dataset in the AI Catalog]

Preparing Data for In-App Predictions

To compute predictions in the app, use the Make Predictions tab. Prepare the prediction data the same way you prepared the training dataset.

Preparing Data for Predictions via a Dedicated Prediction Server

If you want to integrate predictions into your production environment, you’re probably better off deploying the Visual AI model first and then sending requests to the prediction server endpoint, one prediction request at a time. Currently, images must be encoded in base64 first, which is the standard way to handle images in API calls. Here's an example script.
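The base64 step itself is short. This is a minimal sketch, assuming your deployment accepts a CSV payload in which the image column holds the base64 string instead of a file path. The column names and helper names are hypothetical, and the endpoint URL, headers, and accepted payload format should come from your deployment's own integration details. The sketch only builds the payload; it does not send it.

```python
import base64
import csv
import io
from pathlib import Path


def encode_image(path: Path) -> str:
    """Read an image file and return its bytes as a base64 ASCII string."""
    return base64.b64encode(path.read_bytes()).decode("ascii")


def build_prediction_csv(features: dict[str, str], image_column: str, image_path: Path) -> str:
    """Build a one-row CSV payload whose image column holds base64 data.

    `image_column` is assumed to match the image feature name used in training.
    """
    row = dict(features)
    row[image_column] = encode_image(image_path)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(row))
    writer.writeheader()
    writer.writerow(row)
    return buf.getvalue()
```

Standard base64 output contains only letters, digits, `+`, `/`, and `=`, so the encoded image needs no extra quoting inside the CSV.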

With this, you should be ready to kickstart your first Visual AI project with DataRobot. If any questions remain, please reach out below!

2 Comments
Snap Circuit

@Sylvain 

Thank you for the blog post. I've been looking forward to using Visual AI in DataRobot. Can you make one or more of the zipped data sets from your blog post downloadable? Thank you.

DataRobot Employee

The Skin Lesion dataset is available for download here, and the Aircraft dataset here.

Great suggestion, @aballard, thanks!
