
Image classification on audio? Yes.

emily
Data Scientist

Recently, I ran a proof of concept (POC) to determine whether I could use the new image classification capabilities of DataRobot to classify images derived from audio files. I used sound files that were recorded in clinical settings. Specifically, I took audio from patients with either normal or abnormal heartbeats, converted it into spectrograms (image files), and then used DataRobot to classify the images (heartbeats) as Normal or Abnormal.

In this blog, I describe how to take sound data and make it ready for image classification. You can find the data for my POC here. You can also find code for this use case in this Community GitHub repo.

The images below illustrate the visual differences between spectrograms for a normal heartbeat versus an abnormal heartbeat (in this case, caused by a murmur).

Normal Heartbeat Spectrogram

normal1Heart.png

Abnormal Heartbeat Spectrogram (murmur)

abnormalHeart1.png

Create spectrograms of data for DataRobot

I only needed to run this once to generate the images, so the code below is commented out. (Make sure to replace "/folder/" with the actual location of your WAV files.) Running it produces PNG files with the same filenames as the WAV files, saved in the same folder as the WAV files. These PNG files are the visual representation of your sound data.

 

 

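# Run once to generate the spectrogram PNGs, then leave commented out.
# Assumption: spectrogramFolder() comes from the soundgen package, i.e. library(soundgen).
#spectrogramFolder("/folder/", htmlPlots = TRUE, verbose = TRUE, step = NULL, overlap = 50, wn = "gaussian",
#                  zp = 0, ylim = NULL, osc = TRUE, xlab = "Time, ms",
#                  ylab = "kHz", width = 900, height = 500, units = "px",
#                  res = NA)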
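
Because each WAV file should end up with a PNG of the same name in the same folder, a quick check can confirm that nothing was skipped. The following is only a minimal base R sketch; `missing_spectrograms()` and the demo folder are hypothetical names that are not part of the original POC.

```r
# Hypothetical helper: list WAV files that have no same-named PNG next to them.
missing_spectrograms <- function(folder) {
  wavs <- list.files(folder, pattern = "\\.wav$", ignore.case = TRUE)
  pngs <- list.files(folder, pattern = "\\.png$", ignore.case = TRUE)
  stems <- function(x) tools::file_path_sans_ext(x)
  wavs[!(stems(wavs) %in% stems(pngs))]
}

# Demo with empty placeholder files in a temporary folder (not real audio).
demo_dir <- file.path(tempdir(), "wav_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("a.wav", "b.wav", "a.png")))

not_converted <- missing_spectrograms(demo_dir)
print(not_converted)
```

On your own data, you would pass the same folder you gave to the spectrogram step; an empty result means every WAV file has a matching image.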

 

 

Set up folders

Create a folder for your training data with one subfolder per class, and move each image into the subfolder for its class. Compress the whole training folder into a ZIP file. Then create a test folder, move the test images into it, and compress that folder into a ZIP file as well. You can upload these ZIP files directly into DataRobot for training and testing.

Below is an example of what the training image folder (and its subfolders) should look like before you zip it. Remember to create two ZIP files: one for the training dataset and one for the testing dataset.

files2.png
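
If you prefer to script the folder setup rather than move files by hand, something like the following base R sketch could work. It assumes you already have a vector of PNG paths and a matching vector of class labels; `organize_by_class()`, the placeholder files, and the `train` folder name are all hypothetical and are not taken from the original POC.

```r
# Hypothetical helper: copy each spectrogram into train_dir/<class>/.
organize_by_class <- function(png_files, classes, train_dir) {
  stopifnot(length(png_files) == length(classes))
  for (cls in unique(classes)) {
    dir.create(file.path(train_dir, cls), recursive = TRUE, showWarnings = FALSE)
  }
  dest <- file.path(train_dir, classes, basename(png_files))
  copied <- file.copy(png_files, dest, overwrite = TRUE)
  invisible(dest[copied])
}

# Demo with empty placeholder files in a temporary folder (not real spectrograms).
src_dir <- file.path(tempdir(), "spectrograms")
dir.create(src_dir, showWarnings = FALSE)
demo_files <- file.path(src_dir, c("a.png", "b.png", "c.png"))
file.create(demo_files)
demo_classes <- c("normal", "murmur", "normal")

train_dir <- file.path(tempdir(), "train")
placed <- organize_by_class(demo_files, demo_classes, train_dir)

# Zip the whole training folder. utils::zip() relies on a system zip program,
# so this step is skipped when none is found on the PATH.
if (nzchar(Sys.which("zip"))) {
  old_wd <- setwd(dirname(train_dir))
  utils::zip("train.zip", files = basename(train_dir))
  setwd(old_wd)
}
```

The test folder needs no class subfolders, so for that one you would only copy the images in and zip the folder the same way.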

Run the project in DataRobot

First, set up the project as you would any other classification project.

Upload the zipped training file and type in “class” for the target.

dr-ui-class-target-3.png

You can also look at the images before you run Autopilot!

zipped-images-viewing_2.png

Leaderboard

The Leaderboard populates in the same way as it does for other types of data. I chose LogLoss as the optimization metric for this classification problem.

leaderboard-logloss-opt2a.png
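
For readers who have not worked with the metric, multiclass log loss is the average negative log of the probability assigned to the true class, so lower is better. The sketch below is a plain R illustration with made-up probabilities; `log_loss()` is a hypothetical helper, and the clipping constant is an assumption rather than a description of how DataRobot computes the metric internally.

```r
# Multiclass log loss: mean of -log(probability assigned to the true class).
# probs:  numeric matrix, one column per class, column names are class labels
# actual: character vector of true labels, one per row of probs
# eps:    clipping constant to avoid log(0) (an assumption for this sketch)
log_loss <- function(probs, actual, eps = 1e-15) {
  idx <- cbind(seq_along(actual), match(actual, colnames(probs)))
  p_true <- pmin(pmax(probs[idx], eps), 1 - eps)
  -mean(log(p_true))
}

# Two made-up predictions over the three heartbeat classes.
probs <- matrix(
  c(0.8, 0.1, 0.1,
    0.2, 0.7, 0.1),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("normal", "murmur", "extrasystole"))
)
actual <- c("normal", "murmur")

ll <- log_loss(probs, actual)
print(ll)
```

A confident wrong prediction is penalized much more heavily than a hesitant one, which is why log loss rewards well-calibrated probabilities and not just correct labels.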

Blueprint for the best model

The best model in this case was a tuned Light Gradient Boosted Trees Classifier.

bestmodel-bp3.png

Global Confusion Matrix for each class

The model did pretty well at identifying the classes. If I were a clinician, I would rather have many false positives than false negatives, because missing a pathology is worse than flagging a healthy heartbeat for review.

  • For the “normal” heartbeat recordings, the F1 score is 0.91, with very high recall (0.98) and good precision (0.86).
  • For the “murmur” heartbeat recordings, the F1 score is only moderate (0.69): the recall is low (0.59), while the precision is high (0.83).
  • For the “extrasystole” heartbeat recordings, the F1 score is 0.86, with a recall of 0.75 and a precision of 1.
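
The F1 score is the harmonic mean of precision and recall, so the figures above can be cross-checked with a few lines of R. This is only a sketch that plugs in the rounded precision and recall values quoted in the list; `f1_score()` is a hypothetical helper.

```r
# F1 is the harmonic mean of precision and recall.
f1_score <- function(precision, recall) {
  2 * precision * recall / (precision + recall)
}

# Rounded precision and recall values quoted in the list above.
metrics <- data.frame(
  class     = c("normal", "murmur", "extrasystole"),
  precision = c(0.86, 0.83, 1.00),
  recall    = c(0.98, 0.59, 0.75)
)
metrics$f1 <- f1_score(metrics$precision, metrics$recall)
print(metrics)
```

Because the inputs are already rounded to two decimals, a recomputed F1 can differ from the reported one in the last digit.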

results2.png

Prediction

I uploaded the zipped prediction file and calculated the results. Then, I downloaded them and renamed the file to “scores.csv”. The code below compares these predictions with the actual labels stored in “scoreB.csv”.

 

 

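library(stringr)  # provides str_sub() and str_length()

# Predicted class probabilities downloaded from DataRobot
Pred <- read.csv('scores.csv')
# Actual labels; columns 2 to 4 are assumed to hold one column per class
Actual <- read.csv('scoreB.csv')

# Highest predicted probability for each row
Pred$pred <- pmax(Pred$Prediction.extrasystole, Pred$Prediction.murmur, Pred$Prediction.normal)

Pred$row_id <- NULL

# Pick the class column with the largest value in each row.
# Restricting max.col() to the probability columns keeps other columns out of the comparison.
prob_cols <- c("Prediction.extrasystole", "Prediction.murmur", "Prediction.normal")
Pred$Class <- prob_cols[max.col(Pred[, prob_cols], ties.method = "first")]
Actual$Class <- colnames(Actual[, 2:4])[max.col(Actual[, 2:4], ties.method = "first")]

Pred$Actual <- tolower(Actual$Class)

# Strip the 11-character "Prediction." prefix so the labels match the actual class names
Pred$Class <- str_sub(Pred$Class, 12, str_length(Pred$Class))

# Confusion table: rows are actual classes, columns are predicted classes
table(Pred$Actual, Pred$Class)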

 

 

predictionReport-results2.png
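
A confusion table like the one above can be turned into per-class precision and recall directly. The sketch below uses a small made-up set of labels, not the POC results; `per_class_metrics()` is a hypothetical helper.

```r
# Per-class precision and recall from actual and predicted labels.
per_class_metrics <- function(actual, predicted) {
  classes <- sort(unique(c(actual, predicted)))
  # Rows are actual classes, columns are predicted classes.
  cm <- table(factor(actual, levels = classes),
              factor(predicted, levels = classes))
  tp <- diag(cm)  # correctly classified counts per class
  data.frame(
    class     = classes,
    precision = as.numeric(tp / colSums(cm)),  # TP / everything predicted as the class
    recall    = as.numeric(tp / rowSums(cm)),  # TP / everything actually in the class
    row.names = NULL
  )
}

# Made-up labels for illustration only.
actual    <- c("normal", "normal", "normal", "murmur", "murmur", "extrasystole")
predicted <- c("normal", "normal", "murmur", "murmur", "normal", "extrasystole")

m <- per_class_metrics(actual, predicted)
print(m)
```

With the real data, you could call it as `per_class_metrics(Pred$Actual, Pred$Class)`. A class that is never predicted yields a precision of NaN, which is worth checking for on small test sets.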

Concluding Remarks

This POC demonstrated that it is possible to use DataRobot to classify spectral images of sound.

Computer vision solutions are becoming more and more prevalent. The ability to automate the classification of spectrograms opens up a new range of opportunities for the DataRobot Community.
