Recently, I ran a proof of concept (POC) to determine whether I could use DataRobot's new image classification capabilities to classify images derived from audio files. I used sound files recorded in clinical settings. Specifically, I took audio from patients with either normal or abnormal heartbeats, converted it into spectrograms (image files), and then used DataRobot to classify the images (heartbeats) as Normal or Abnormal.
In this post, I describe how to take sound data and prepare it for image classification. You can find the data for my POC here. You can also find code for this use case in this Community GitHub repo.
The images below illustrate the visual differences between spectrograms for a normal heartbeat versus an abnormal heartbeat (in this case, caused by a murmur).
Normal Heartbeat Spectrogram
Abnormal Heartbeat Spectrogram (murmur)
Create spectrograms of data for DataRobot
I only needed to run the conversion code once to generate the images, so I commented it out afterward. (Make sure to replace "/folder/" with the actual location of your WAV files.) Running the code produces PNG files with the same filenames as the WAV files, saved in the same folder as the WAV files. These images are the visual representation of your sound data.
Create a folder for your training data with one subfolder per class, and move each image into the subfolder for its class. Compress the whole training folder into a ZIP file. Then create a test folder, move the test images into it, and compress that folder into a ZIP file as well. You can upload these zipped folders of images directly into DataRobot for training and testing.
Below is an example of what the training image folder (and its subfolders) should look like before you zip it. Remember to create two ZIP files: one for the training dataset and one for the testing dataset.
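The conversion script itself is not reproduced in this post, so the following is only a minimal sketch of the WAV-to-spectrogram step. It assumes 16-bit PCM WAV files and uses Python's wave module with NumPy and Matplotlib. The function names, the FFT window settings, and the figure size are illustrative assumptions, not the settings used for the POC.

```python
import wave
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render straight to files; no display needed
import matplotlib.pyplot as plt
import numpy as np


def wav_to_spectrogram(wav_path):
    """Save a spectrogram PNG next to wav_path and return the PNG path."""
    wav_path = Path(wav_path)
    with wave.open(str(wav_path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        rate = wf.getframerate()
        channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    samples = np.frombuffer(raw, dtype="<i2")
    # Keep only the first channel if the recording is stereo.
    samples = samples.reshape(-1, channels)[:, 0].astype(np.float64)

    fig, ax = plt.subplots(figsize=(4, 4))
    # NFFT and noverlap are assumed values; tune them for your recordings.
    ax.specgram(samples, NFFT=256, Fs=rate, noverlap=128)
    ax.axis("off")  # the classifier only needs the image, not the axes
    png_path = wav_path.with_suffix(".png")  # same name, same folder
    fig.savefig(png_path, bbox_inches="tight", pad_inches=0)
    plt.close(fig)
    return png_path


def convert_folder(folder):
    """Convert every .wav file in folder; return the list of PNG paths."""
    return [wav_to_spectrogram(p) for p in sorted(Path(folder).glob("*.wav"))]


# Run once, then comment out again:
# convert_folder("/folder/")
```

Turning the axes off keeps tick labels and borders out of the saved image, so the model sees only the spectrogram itself.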
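If you would rather script the folder layout than build it by hand, a sketch along these lines could work. The helper names and the labels mapping (image filename to class name) are hypothetical; how you know each image's class depends on your dataset. Note that this version copies images instead of moving them, so the originals stay in place.

```python
import shutil
from pathlib import Path


def zip_folder(folder):
    """Zip folder, keeping it as the top-level entry; return the ZIP path."""
    folder = Path(folder).resolve()
    return shutil.make_archive(
        str(folder), "zip", root_dir=str(folder.parent), base_dir=folder.name
    )


def build_training_zip(labels, image_dir, train_dir):
    """Copy images into train_dir/<class>/ and zip the training folder.

    labels maps an image filename to its class, e.g. {"a.png": "normal"}.
    """
    for filename, label in labels.items():
        class_dir = Path(train_dir) / label
        class_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(Path(image_dir) / filename, class_dir / filename)
    return zip_folder(train_dir)


def build_test_zip(filenames, image_dir, test_dir):
    """Copy unlabeled test images into test_dir and zip it."""
    Path(test_dir).mkdir(parents=True, exist_ok=True)
    for filename in filenames:
        shutil.copy2(Path(image_dir) / filename, Path(test_dir) / filename)
    return zip_folder(test_dir)
```

Each call returns the path of the ZIP file it wrote, which sits next to the folder it compressed (for example, train.zip beside train).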
Run the project in DataRobot
First, set up the project as you would any other classification project.
Upload the zipped training file and type in “class” for the target.
You can also look at the images before you run Autopilot!
The Leaderboard populates in the same way as it does for other types of data. I decided to optimize on LogLoss for this classification problem.
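For readers unfamiliar with the metric, here is a small sketch of the standard multiclass log loss: the average negative log of the probability the model assigned to the true class. The probabilities in the usage comment are made up for illustration, and the clipping constant is an assumption; the post does not describe how DataRobot computes the metric internally.

```python
import math


def log_loss(true_labels, predicted_probs, eps=1e-15):
    """Mean negative log-probability assigned to each row's true class.

    predicted_probs is a list of dicts mapping class name -> probability.
    """
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        # Clip so a probability of exactly 0 or 1 cannot produce log(0).
        p = min(max(probs[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(true_labels)


# Illustrative (invented) predictions for two recordings:
# log_loss(
#     ["normal", "murmur"],
#     [{"normal": 0.9, "murmur": 0.1}, {"normal": 0.4, "murmur": 0.6}],
# )
```

Lower is better: a confident correct prediction contributes almost nothing, while a confident wrong prediction is penalized heavily.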
Blueprint for the best model
The best model in this case was a tuned Light Gradient Boosted Trees Classifier.
Global Confusion Matrix for each class
The model did fairly well at identifying the classes. If I were a clinician, I would rather have many false positives than false negatives, because missing a pathology is the worse error.
For the “normal” heartbeat recordings, you can see an F1 score of 0.91, with a very high recall (0.98) and a precision of 0.86.
For the “murmur” heartbeat recordings, the F1 score is okay (0.69). The recall is modest (0.59) and the precision is high (0.83).
For the “extrasystole” heartbeat recordings, the F1 score is 0.86, the recall is 0.75, and the precision is 1.0.
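As a quick sanity check, F1 is the harmonic mean of precision and recall, so the scores above can be recomputed from the reported precision and recall values. The sketch below uses only the numbers quoted in this post; the dictionary name is just for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as quoted above
reported = {
    "normal": (0.86, 0.98, 0.91),
    "murmur": (0.83, 0.59, 0.69),
    "extrasystole": (1.0, 0.75, 0.86),
}

for name, (precision, recall, f1) in reported.items():
    recomputed = f1_score(precision, recall)
    print(f"{name}: recomputed F1 = {recomputed:.3f}, reported = {f1}")
```

Because the precision and recall shown are themselves rounded to two decimals, the recomputed values can differ from the reported F1 scores by up to about 0.01.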
I uploaded the zipped prediction file and calculated the results. Then, I downloaded them and renamed the dataset to “scores.csv”.