Getting Your Image Data Ready for Visual AI

Sylvain
DataRobot Employee

Visual AI is a new capability within the Automated Machine Learning product that allows you to build models and to compute predictions against images. To better understand what Visual AI is, read our blog post introducing Visual AI.

This article focuses on how to get your image data prepared so that DataRobot can ingest it. Your goal is to end up with a single ZIP file that contains a CSV file for structured data plus folders for all of your training images. In some cases, you can even forgo the CSV file. Once prepared, you can load your image data directly into a new project or add your image data into DataRobot’s AI Catalog, to share with other users in your organization.

Using a CSV File with Images

This method uses a CSV file to capture structured data; you can supply numeric, categorical, text, and date features, including the target, as usual. Organize the images however you like (in folders or not), as long as they sit alongside the CSV file. In the CSV file, create one or more image columns that give the path to each image. Then create a single ZIP file that contains your CSV file and images, and you’re done.

Shows an example CSV for skin lesion images

In the dataset above, all images are in the same folder. The image column contains the name of each image file, and the other columns hold the remaining features in the dataset.

Here’s a quick example of the steps:

Let’s assume you have data about 600 articles of clothing. For each one, you know the brand, size, and category. You even have a text description and two pictures for each: one front image and one back image.

  1. Create a ClothesDataset folder.
  2. Add all the images to this folder. You can put them in a single subfolder (this is optional), or split them into two subfolders: one for front images, one for back images.
  3. Create a CSV file containing four columns: Brand, Size, Category, and Description.
  4. Add two columns to the CSV file for images: Front and Back. The Front column will contain the relative path to the Front image; the Back column will contain the relative path to the Back image.
  5. Create a ZIP file from the ClothesDataset folder.
  6. Drag and drop, or upload, your ZIP file into DataRobot.

(If you get the error "Zip files must contain exactly one file" then your DataRobot instance doesn't currently support Visual AI (e.g., it's an older version or doesn't have the Visual AI feature enabled). Contact your DataRobot representative for help.)

DataRobot will automatically identify the contents and create a six-column dataset: four columns for Brand, Size, Category, and Description, plus two image columns (Front and Back). From there, you can let DataRobot build a model that predicts the category from an item’s brand, size, description, and front and back pictures.
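The clothing example above can be scripted. The following is a minimal sketch, not an official DataRobot tool: the function name `build_dataset_zip`, the CSV file name `data.csv`, and the sample row are hypothetical, and the choice to keep the top-level folder inside the archive simply mirrors what zipping the ClothesDataset folder by hand would usually produce.

```python
import csv
import zipfile
from pathlib import Path

# Column names taken from the clothing example in the article.
FIELDS = ["Brand", "Size", "Category", "Description", "Front", "Back"]


def build_dataset_zip(dataset_dir, rows, zip_path):
    """Write rows to <dataset_dir>/data.csv, then zip the whole folder.

    zip_path should live outside dataset_dir so the archive does not
    try to include itself.
    """
    dataset_dir = Path(dataset_dir)
    csv_path = dataset_dir / "data.csv"  # hypothetical file name
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(dataset_dir.rglob("*")):
            if path.is_file():
                # Keep the top-level folder in the archive (an assumption);
                # as_posix() gives forward slashes on every platform.
                arcname = path.relative_to(dataset_dir.parent).as_posix()
                zf.write(path, arcname)
    return Path(zip_path)
```

Each row is a dictionary keyed by the six column names, where Front and Back hold paths relative to the CSV file, such as `front/shirt1.jpg`.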

Shows how a dataset with an associated CSV looks in the AI Catalog after upload

Some other things to consider:

  • Don’t use whitespace in file or folder names.
  • Use / (not \) for file paths.
  • Don’t add / at the beginning of the path.
  • JPG, PNG, BMP, and PPM file formats are supported.
  • Don’t zip the image subfolder on its own (no ZIP files inside the ZIP file).
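
The path rules in the list above are easy to check before uploading. Here is a rough sketch of such a check; `check_image_path` is a hypothetical helper, not part of DataRobot, and treating `.jpeg` as equivalent to `.jpg` is an assumption that the list above does not confirm.

```python
from pathlib import PurePosixPath

# Formats listed in the article; ".jpeg" is an assumed alias for JPG.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".ppm"}


def check_image_path(path):
    """Return a list of problems found in one image path from the CSV."""
    problems = []
    if any(ch.isspace() for ch in path):
        problems.append("contains whitespace")
    if "\\" in path:
        problems.append("uses backslash instead of /")
    if path.startswith("/"):
        problems.append("starts with /")
    if PurePosixPath(path).suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append("unsupported file format")
    return problems
```

An empty list means the path passed all four checks, so the function can be run over every value in the image columns before the ZIP file is created.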

Using File System Folders as Image Labels

If your modeling task is purely image classification, there is a simpler way to prepare your image dataset. Create folders in your file system and name each folder after the label you want its images to have in DataRobot. Then put each image in the matching folder. That’s it! All that remains is to create a single ZIP file that contains your folders (and the images inside).

Here’s a quick example of the steps:

Let’s assume you have 300 images: 100 images of oranges, 100 images of apples, and 100 images of grapefruit.

  1. Create three folders: Orange, Apple, and Grapefruit.
  2. Drop your images into the correct folders depending on the type of fruit.
  3. Create a ZIP file in the parent directory of the three folders. The ZIP file will contain the three folders and the images inside.
  4. Drag and drop, or upload, your ZIP file into DataRobot.

Shows 2 folders: airplanes and helicopters. The Archive.zip file is simply the 2 folders zipped together.

DataRobot will automatically identify and create a simple, three-column dataset: one for the label (Apple, Orange, Grapefruit), another for the image, and a third for the image path.
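The folder-per-label steps above can also be sketched in a few lines. This is an illustrative helper under stated assumptions, not a DataRobot utility: `zip_label_folders` is a hypothetical name, and it assumes every subfolder of the parent directory is a label folder.

```python
import zipfile
from pathlib import Path


def zip_label_folders(parent_dir, zip_path):
    """Zip every subfolder of parent_dir and return the label names found."""
    parent_dir = Path(parent_dir)
    labels = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # Only directories are treated as labels, so a ZIP file written
        # into parent_dir itself is never added to the archive.
        for folder in sorted(p for p in parent_dir.iterdir() if p.is_dir()):
            labels.append(folder.name)
            for image in sorted(folder.rglob("*")):
                if image.is_file():
                    # Store paths as Label/image.ext with forward slashes.
                    zf.write(image, image.relative_to(parent_dir).as_posix())
    return labels
```

Returning the label names gives a quick sanity check: for the fruit example, the result should list Apple, Grapefruit, and Orange and nothing else.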

Shows the resulting dataset in AI Catalog

Preparing data for in-app predictions

In-app, you can compute predictions via the Make Predictions tab. Just prepare data the same way you did for the training dataset.

Preparing data for predictions via a dedicated prediction server

If you want to integrate predictions into your production environment, you’re probably better off deploying the Visual AI model first, then sending requests to the prediction server endpoint. You have to do so one prediction request at a time. Currently, images must be encoded in base64 first, which is a common way to send images in API calls. Here’s an example script.
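
For orientation, the base64 encoding step itself needs only the Python standard library. The sketch below covers the encoding alone and is not the example script referred to above; `encode_image_base64` and `build_prediction_row` are hypothetical helper names, and the exact payload format the prediction server expects is not described here, so check it against your deployment.

```python
import base64


def encode_image_base64(image_path):
    """Read an image file and return its contents as a base64 string."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


def build_prediction_row(features, image_columns):
    """Copy a feature dict, replacing image paths with base64 strings.

    Assumes each name in image_columns maps to a readable image file.
    """
    row = dict(features)
    for column in image_columns:
        row[column] = encode_image_base64(row[column])
    return row
```

For the clothing example, `build_prediction_row(item, ["Front", "Back"])` would leave Brand, Size, and Description untouched and replace the two image paths with encoded image data.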

Alright, with this, you should be ready to kick off your first Visual AI project with DataRobot. If you have any remaining questions, please reach out below!

2 Comments
aballard

@Sylvain 

Thank you for the blog post. I've been looking forward to using Visual AI in DataRobot. Can you make one or more of the zipped data sets from your blog post downloadable? Thank you.

Sylvain
DataRobot Employee

The Skin Lesion dataset is available for download here, and the Aircraft dataset here.

Great suggestion, @aballard , thanks!