What are the recommendations or process for ingesting images, OCR data, or documents? How can Paxata be leveraged? Any reference documents would be helpful.
Hi @BJ,
When using DataRobot Data Prep (Paxata) for image data, there are two ways to ingest it, both described in full in the documentation.
In short: if you have more information than just the images (and their classes), use a CSV file with one row per image containing the image's path, its class, and any supporting information. If you only have image data, split the images into separate folders, one folder per class.
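If your images are already sorted into one folder per class but you also want a CSV (for example, to add supporting columns later), a small script can generate one. The sketch below is only an illustration, not a DataRobot tool: it assumes a `root/<class>/<image>` layout, and it assumes the upload is a single .zip with the CSV at its top level referencing images by relative path. The names `build_image_zip`, `images.csv`, the `image`/`class` column names, and the extension list are all hypothetical, so check the documentation for the exact layout DataRobot expects.

```python
import csv
import io
import zipfile
from pathlib import Path

# Hypothetical extension list; adjust to the image types you actually have.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def index_class_folders(root: Path) -> list[tuple[str, str]]:
    """Return (relative image path, class label) pairs for a root/<class>/<image> layout."""
    rows = []
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for image in sorted(class_dir.iterdir()):
            if image.suffix.lower() in IMAGE_EXTENSIONS:
                # Store paths relative to root so they stay valid inside the zip.
                rows.append((image.relative_to(root).as_posix(), class_dir.name))
    return rows


def build_image_zip(root: Path, out_zip: Path, csv_name: str = "images.csv") -> list[tuple[str, str]]:
    """Zip the images together with a CSV listing each image's path and class."""
    rows = index_class_folders(root)
    with zipfile.ZipFile(out_zip, "w") as zf:
        for rel_path, _ in rows:
            zf.write(root / rel_path, arcname=rel_path)
        # Build the CSV in memory and add it at the top level of the archive.
        buffer = io.StringIO()
        writer = csv.writer(buffer)
        writer.writerow(["image", "class"])
        writer.writerows(rows)
        zf.writestr(csv_name, buffer.getvalue())
    return rows
```

Extra supporting columns (dates, IDs, measurements) can be appended to each row before `writer.writerows(rows)` is called; non-image files such as notes are skipped by the extension filter.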
Besides images, DataRobot accepts .csv, .tsv, .dsv, .xls, .xlsx, .sas7bdat, .geojson, .gz, .bz2, .tar, .tgz and .zip files. Ingesting any of them is drag-and-drop, as long as the data is structured relationally (in rows and columns).
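Before dragging in a batch of files, a quick extension check against the list of formats given in this post can catch obvious mismatches. This is a hypothetical helper, not part of DataRobot; the accepted list is copied from this post and may change, so verify it against the current documentation.

```python
from pathlib import Path

# Non-image formats listed in this post; verify against the current docs.
ACCEPTED_SUFFIXES = {
    ".csv", ".tsv", ".dsv", ".xls", ".xlsx", ".sas7bdat",
    ".geojson", ".gz", ".bz2", ".tar", ".tgz", ".zip",
}


def is_accepted_format(filename: str) -> bool:
    """Return True if the file's last extension is in the accepted list (case-insensitive)."""
    return Path(filename).suffix.lower() in ACCEPTED_SUFFIXES


def unsupported_files(filenames: list[str]) -> list[str]:
    """Return the filenames whose extension is not in the accepted list."""
    return [name for name in filenames if not is_accepted_format(name)]
```

An extension check only looks at the filename; it cannot tell you whether the contents are actually structured relationally.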
Other useful links:
DataRobot University Course on Data Prep
DataRobot Data Prep Documentation
If this post answers your question, feel free to accept it as a solution to help others find the information.
All the best,
Ira