Image classification on audio? Yes.

Data Scientist

Recently, I ran a proof of concept (POC) to determine whether I could use DataRobot's new image classification capabilities to classify images derived from audio files. I used sound files recorded in clinical settings. Specifically, I took audio from patients with either normal or abnormal heartbeats, converted it into spectrograms (image files), and then used DataRobot to classify the images (heartbeats) as Normal or Abnormal.

In this post, I describe how to prepare sound data for image classification. You can find the data for my POC here.

The images below illustrate the visual differences between spectrograms for a normal heartbeat versus an abnormal heartbeat (in this case, caused by a murmur).

Normal Heartbeat Spectrogram

normal1Heart.png

Abnormal Heartbeat Spectrogram (murmur)

abnormalHeart1.png

Create spectrograms of data for DataRobot

I only needed to run this once to generate the images, so the call is commented out. (Make sure to replace "/folder/" with the actual location of your WAV files.) Running it produces PNG files with the same filenames as the WAV files, saved in the same folder as the WAV files. These images are the visual representation of your sound data.

 

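# spectrogramFolder() is assumed here to come from the soundgen package; load it first with library(soundgen).
# Uncomment the call below to generate the PNG files.
#spectrogramFolder("/folder/", htmlPlots = TRUE, verbose = TRUE, step = NULL, overlap = 50, wn = "gaussian",
# zp = 0, ylim = NULL, osc = TRUE, xlab = "Time, ms",
# ylab = "kHz", width = 900, height = 500, units = "px",
# res = NA)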
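To make the idea of a spectrogram concrete, here is a minimal base R sketch of a short-time Fourier transform. It is not how spectrogramFolder() works internally; the helper name `stft_magnitude`, the synthetic 440 Hz test tone, and the window settings are all assumptions chosen for illustration.

```r
# Synthetic input: a 1-second, 440 Hz sine sampled at 8000 Hz (illustrative only)
sample_rate <- 8000
window_length <- 256
t <- (0:(sample_rate - 1)) / sample_rate
signal <- sin(2 * pi * 440 * t)

# Hypothetical helper: returns a matrix of magnitudes (frequency bins x frames)
stft_magnitude <- function(x, window_length = 256, overlap = 0.5) {
  hop <- floor(window_length * (1 - overlap))
  starts <- seq(1, length(x) - window_length + 1, by = hop)
  # Gaussian taper, loosely mirroring wn = "gaussian" in the call above
  n <- seq_len(window_length) - (window_length + 1) / 2
  window <- exp(-0.5 * (n / (0.4 * window_length / 2))^2)
  sapply(starts, function(s) {
    segment <- x[s:(s + window_length - 1)] * window
    Mod(fft(segment))[1:(window_length / 2)]  # keep non-negative frequencies
  })
}

spec <- stft_magnitude(signal, window_length = window_length, overlap = 0.5)

# Centre frequency of each bin in Hz, and the bin with the most energy
freqs <- (0:(nrow(spec) - 1)) * sample_rate / window_length
peak_freq <- freqs[which.max(rowMeans(spec))]
```

The strongest bin should sit within one bin width of the 440 Hz tone. To draw the result as an image, something like `image(x = seq_len(ncol(spec)), y = freqs, z = t(spec))` puts time on the horizontal axis and frequency on the vertical axis.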

 

Set up folders

Create a folder for your training data with a subfolder for each class, and move each image into the subfolder for its class. Compress the whole training folder into a ZIP file. Then create a test folder, move the test images into it, and compress that folder into a ZIP file as well. You can upload these zipped folders of images directly into DataRobot for training and testing.

Below is an example of what the training image folder (and its subfolders) should look like before you zip it. Remember to create two ZIP files: one for the training dataset and one for the testing dataset.
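If you prefer to script the folder layout rather than move files by hand, the following base R sketch shows one way to do it. The helper `organize_by_class`, the placeholder file names, and the labels are hypothetical; in practice the labels would come from wherever your class annotations live.

```r
# Hypothetical helper: copy each image into <out_dir>/<label>/ (one subfolder per class)
organize_by_class <- function(files, labels, out_dir) {
  stopifnot(length(files) == length(labels))
  for (label in unique(labels)) {
    dir.create(file.path(out_dir, label), recursive = TRUE, showWarnings = FALSE)
  }
  targets <- file.path(out_dir, labels, basename(files))
  copied <- file.copy(files, targets, overwrite = TRUE)
  invisible(targets[copied])
}

# Demo with placeholder files in a temporary directory (file names are made up)
src_dir <- file.path(tempdir(), "spectrograms")
dir.create(src_dir, showWarnings = FALSE)
demo_files <- file.path(src_dir, c("a.png", "b.png", "c.png"))
for (f in demo_files) writeLines("placeholder", f)
demo_labels <- c("normal", "murmur", "normal")

train_dir <- file.path(tempdir(), "train")
placed <- organize_by_class(demo_files, demo_labels, train_dir)
layout <- list.files(train_dir, recursive = TRUE)
```

Once the layout looks right, the folder can be compressed with any ZIP tool. R's `utils::zip()` is one option, though it relies on an external zip program being installed.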

files2.png

Run the project in DataRobot

First, set up the project as you would any other classification project.

Upload the zipped training file and type in “class” for the target.

dr-ui-class-target-3.png

You can also look at the images before you run Autopilot!

zipped-images-viewing_2.png

Leaderboard

The Leaderboard populates in the same way as it does for other types of data. I decided to optimize for LogLoss for this classification problem.
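For readers unfamiliar with the metric, here is a minimal sketch of the standard multiclass log loss formula in base R. The helper `log_loss` and the toy probabilities are assumptions for illustration; they are not output from the POC, and DataRobot's own computation may differ in details such as clipping.

```r
# Hypothetical helper: mean multiclass log loss
# probs:  matrix of predicted probabilities (rows = samples, columns = classes)
# actual: character vector of true class names matching colnames(probs)
log_loss <- function(probs, actual, eps = 1e-15) {
  idx <- cbind(seq_along(actual), match(actual, colnames(probs)))
  p_true <- pmin(pmax(probs[idx], eps), 1 - eps)  # clip to avoid log(0)
  -mean(log(p_true))
}

# Toy probabilities (made up for illustration)
probs <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.8, 0.1,
    0.3, 0.3, 0.4),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("normal", "murmur", "extrasystole"))
)
actual <- c("normal", "murmur", "normal")

loss <- log_loss(probs, actual)
```

Lower values are better: the metric rewards assigning high probability to the true class and penalizes confident mistakes heavily.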

leaderboard-logloss-opt2a.png

Blueprint for the best model

The best model in this case was a tuned Light Gradient Boosted Trees Classifier.

bestmodel-bp3.png

Global Confusion Matrix for each class

The model did reasonably well at identifying the classes. If I were a clinician, I would rather have many false positives than false negatives, since a missed pathology is worse than a false alarm.

  • For the “normal” heartbeat recordings, the F1 score is 0.91, with very high recall (0.98) and good precision (0.86).
  • For the “murmur” heartbeat recordings, the F1 score is only fair (0.69): recall is modest (0.59), while precision is high (0.83).
  • For the “extrasystole” heartbeat recordings, the F1 score is 0.86, with a recall of 0.75 and a precision of 1.
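As a quick check on how these numbers relate, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the precision and recall values quoted above; the helper name `f1_score` is my own. Because the quoted inputs are already rounded to two decimals, a recomputed F1 can differ from the reported one in the last digit.

```r
# F1 is the harmonic mean of precision and recall
f1_score <- function(precision, recall) {
  2 * precision * recall / (precision + recall)
}

# Precision and recall as reported above (rounded to two decimals)
metrics <- data.frame(
  class     = c("normal", "murmur", "extrasystole"),
  precision = c(0.86, 0.83, 1.00),
  recall    = c(0.98, 0.59, 0.75)
)
metrics$f1 <- round(f1_score(metrics$precision, metrics$recall), 2)
metrics
```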

results2.png

Prediction

I uploaded the zipped prediction file and calculated the results. Then, I downloaded them and renamed the file to “scores.csv.” The code below compares these predictions with the actual labels, which it reads from “scoreB.csv.”

 

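library(stringr)  # provides str_sub() and str_length()

# Predicted class probabilities downloaded from DataRobot
Pred <- read.csv('scores.csv')
# Actual labels
Actual <- read.csv('scoreB.csv')

# Highest predicted probability in each row
Pred$pred <- pmax(Pred$Prediction.extrasystole, Pred$Prediction.murmur, Pred$Prediction.normal)

Pred$row_id <- NULL

# Name of the column holding the row maximum; ties.method = "first" picks
# the class column ahead of the tied Pred$pred column added above
Pred$Class <- colnames(Pred)[max.col(Pred, ties.method = "first")]
# Same idea for the actual labels, using columns 2 to 4 of Actual
Actual$Class <- colnames(Actual[, 2:4])[max.col(Actual[, 2:4], ties.method = "first")]

Pred$Actual <- tolower(Actual$Class)

# Drop the 11-character "Prediction." prefix to leave the bare class name
Pred$Class <- str_sub(Pred$Class, 12, str_length(Pred$Class))

# Confusion table: actual classes in rows, predicted classes in columns
table(Pred$Actual, Pred$Class)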

 

predictionReport-results2.png
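To see what the comparison step does without the real files, here is a self-contained toy version. The probabilities and labels are made up; only the `Prediction.*` column names follow the code above. It uses base R's `sub()` to strip the prefix instead of stringr, and fixes the factor levels so the table stays square even if a class is never predicted.

```r
# Toy prediction frame (values made up; column names follow the code above)
toy <- data.frame(
  Prediction.extrasystole = c(0.1, 0.6, 0.2, 0.1),
  Prediction.murmur       = c(0.2, 0.3, 0.7, 0.3),
  Prediction.normal       = c(0.7, 0.1, 0.1, 0.6)
)
actual <- c("normal", "extrasystole", "murmur", "murmur")

# Index of the largest probability per row, mapped to the class name
winner <- max.col(as.matrix(toy), ties.method = "first")
predicted <- sub("^Prediction\\.", "", colnames(toy)[winner])

# Shared factor levels keep the table square
classes <- c("extrasystole", "murmur", "normal")
confusion <- table(
  Actual    = factor(actual, levels = classes),
  Predicted = factor(predicted, levels = classes)
)

# Overall accuracy and per-class recall read straight off the table
accuracy <- sum(diag(confusion)) / sum(confusion)
recall_by_class <- diag(confusion) / rowSums(confusion)
```

In this toy data the last row is a murmur predicted as normal, which is exactly the kind of false negative a clinician would want to minimize.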

Concluding Remarks

This POC demonstrated that it is possible to use DataRobot to classify spectral images of sound.

Computer vision solutions are becoming more and more prevalent. The ability to automate the classification of spectrograms opens up a new range of opportunities for the DataRobot Community.
