Basics of Audio File Processing in R

This post provides basic information on audio processing using R as the programming language, and walks through some basics of sound and digital audio along the way.



By Taposh Dutta Roy, Kaiser Permanente

Figure: Audio File Processing

 

In today's world, digital audio is part and parcel of our lives. One can talk to Siri, Alexa, or Google Assistant ("Ok Google") to search for information. These assistants know who is asking, and can search for the request and reply contextually.

The idea of this post is to provide basic information on audio processing using R as the programming language. However, before we get to R as our language of choice, let's walk through and understand some basics of sound and digital audio.

 

What is Sound?

 
Sound is a pressure change of air molecules created by a vibrating object.

This pressure change creates a wave, which is how sound propagates. Sound is thus a mechanical wave that results from the back-and-forth vibration of the particles of the medium through which the sound wave is moving [4]. Sound is processed through our ears via the auditory sense, so it can also be called audio. Audio processing is a heavily researched domain, and many very good papers cover it in depth. In this post we will discuss only basic but helpful information to develop an intuitive understanding.

Figure: Parts of an audio signal

 

Sound waves can be described by the number of waves per second and the size of the waves. The number of waves per second is the frequency of the sound. As shown in the figure above, the horizontal distance between any two successive equivalent points on the wave (for example, between two crests) is called the wavelength. Thus, the wavelength is the horizontal length of one cycle of the wave.

The period of a wave is the time required for one complete cycle of the wave to pass a point, i.e., the time it takes the wave to travel a distance of one wavelength. This understanding is important for sound analysis, as we will move from the time domain to the frequency domain.

Thus, frequency of sound = velocity of propagation / wavelength
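As a quick numeric sketch of this relation (the speed of sound of roughly 343 m/s in air and the wavelength are assumed example values, not from this article):

```r
# frequency of sound = velocity of propagation / wavelength
velocity <- 343      # speed of sound in air, m/s (assumed)
wavelength <- 0.78   # hypothetical wavelength in metres
frequency <- velocity / wavelength  # ~439.7 Hz, close to the A4 pitch
period <- 1 / frequency             # time for one complete cycle, in seconds
frequency
```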

The size of each wave is described by the amplitude. Amplitude determines how loud or soft a sound will be.

 

Components of Sound or Audio

 
Sound can be divided into multiple components depending on how you want to analyze it. For the purpose of this article, we will classify sound into 2 main components — Amplitude and Frequency. The frequency components can be further divided into — Pitch, Formant, Bandwidth, Sampling Rate, and others (overtone, harmonics etc.)

Figure: Basic Components of Sound

 

Amplitude:

As noted, amplitude determines how loud or soft a sound will be. Loudness is a measure of sound wave intensity, and intensity is the amount of energy a sound has over an area. The same sound is more intense if you hear it in a smaller area. In general, we call sounds with a higher intensity louder. Amplitude is thus a measure of energy: the more energy a wave has, the higher its amplitude, and as amplitude increases, intensity also increases.

Some Basic Frequency Components:

The literature describes a variety of frequency components; for the purpose of this article we will talk about pitch, sampling rate, formant, and bandwidth.

Sample rate (or sampling frequency) is the number of samples per second in a sound. For example, if the sampling rate is 4000 hertz, a recording with a duration of 5 seconds will contain 20,000 samples.
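The arithmetic in this example can be verified directly in R:

```r
sampling_rate <- 4000  # samples per second (Hz)
duration <- 5          # recording length in seconds
n_samples <- sampling_rate * duration
n_samples  # 20000 samples
```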

Pitch is the frequency of the fundamental component in the sound, that is, the frequency with which the waveform repeats itself. Pitch depends on the frequency of a sound wave. Frequency is the number of wavelengths that fit into one unit of time.

Formant is a concentration of acoustic energy around a particular frequency in the speech wave. Thus, they are the peaks that are observed in the spectrum envelope.

Bandwidth is the range of frequencies within a given band, in particular that used for transmitting a signal.

 

Basic Audio Analysis in R

 
There are a few packages in R that do audio analysis. The key ones we have seen are tuneR, wrassp, and audio. We use tuneR's readWave() function to read the wave file.

 

tuneR Package:

 
Documentation: https://www.rdocumentation.org/packages/tuneR/versions/1.3.3

Before we go ahead and analyze the entire training dataset, let's analyze a single wave.

# Read a wave file
library(tuneR)
# path of file
file_audio_path <- 'audio_file.wav'
# read the file
train_audio <- readWave(file_audio_path)
# inspect the structure of the audio object
str(train_audio)

 

Figure: Audio file after reading

 

Observations:

The wav file has one channel (@left) containing 18593 sample points. Given the sample rate (train_audio@samp.rate = 4000), this corresponds to a duration of about 4.65 s:

18593 / train_audio@samp.rate ≈ 4.65 sec
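This duration calculation is a one-liner in R (the sample count and rate are the values reported by str(train_audio) above):

```r
n_samples <- 18593
samp_rate <- 4000
duration_sec <- n_samples / samp_rate
round(duration_sec, 2)  # 4.65 seconds
```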

 

Our wav file has a 16-bit depth (train_audio@bit), this means that the sound pressure values are mapped to integer values that can range from -2¹⁵ to (2¹⁵)-1. We can convert our sound array to floating point values ranging from -1 to 1 as follows:

s1 <- train_audio@left
s1 <- s1 / 2^(train_audio@bit - 1)
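To see what this mapping does without needing an audio file, here is a small self-contained sketch using hypothetical 16-bit sample values:

```r
bit <- 16
raw <- c(-32768, -16384, 0, 16384, 32767)  # hypothetical raw sample values
normalized <- raw / 2^(bit - 1)
normalized  # -1.0 -0.5 0.0 0.5 ~0.99997
```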

 

Plotting the Wave:

A time representation of the sound can be obtained by plotting the pressure values against the time axis. However, we first need to create an array containing the time points:

timeArray <- (0:(18593 - 1)) / train_audio@samp.rate
# plot the wave
plot(timeArray, s1, type = 'l', col = 'black', xlab = 'Time (s)', ylab = 'Amplitude')

 

Figure: Output of the plot using tuneR

 

Advanced:

The tuneR package also provides complex frequency-domain analysis functions such as melfcc [8,9,10], audspec, etc. There are some good articles on MFCC, such as "The dummy's guide to MFCC" and "Mel Frequency Cepstral Coefficient (MFCC) tutorial".

# tuneR
m2 <- melfcc(train_audio, numcep = 9, usecmp = TRUE, modelorder = 8,
             spec_out = TRUE, frames_in_rows = FALSE)

 

 

wrassp Package:

 
Documentation: https://ips-lmu.github.io/The-EMU-SDMS-Manual/chap-wrassp.html

The package wrassp is capable of more than the mere reading and writing of specific signal file formats. One can use wrassp to calculate formant values, their corresponding bandwidths, the fundamental frequency contour, and the RMS energy contour of an audio file.

library(wrassp)
# create path to wav file
file_audio_path <- 'audio_file.wav'
# read audio file
au <- read.AsspDataObj(file_audio_path)
str(au)

 

Figure: Output of wrassp package, audio file

 

Observations:

Review the similarities and differences between the outputs of the tuneR and wrassp packages. Both report the same sample rate, number of bits, etc. However, wrassp returns an object of class AsspDataObj for further use.

# only plot every 10th element to accelerate plotting
plot(seq(0, numRecs.AsspDataObj(au) - 1, 10) / rate.AsspDataObj(au),
     au$audio[c(TRUE, rep(FALSE, 9))],
     type = 'l',
     xlab = 'time (s)',
     ylab = 'Audio samples')

 

Figure: Output of audio file using wrassp package

 

Calculate Formant and Bandwidth:

In the initial part of this tutorial we talked about frequency components: pitch, formant, and bandwidth. Let's compute the formant and bandwidth values with wrassp.

# calculate formants and corresponding bandwidth values
fmBwVals <- forest(file_audio_path, toFile = FALSE)
fmBwVals

 

Figure: Formant and Bandwidth

 

# plot the first 100 F1 values over time
plot(fmBwVals$fm[1:100, 1], type = 'l')

 

# plot all the formant values
matplot(seq(0,numRecs.AsspDataObj(fmBwVals) — 1) / rate.AsspDataObj(fmBwVals) +
attr(fmBwVals, ‘startTime’),
fmBwVals$fm,
type=’l’,
xlab=’time (s)’,
ylab=’Formant frequency (Hz)’)

 

Figure: Formant frequencies vs. time

 

#plot Bandwidth
plot(fmBwVals$bw)

 

Figure: Distribution of bandwidths

 

#plot the formant
plot(fmBwVals$fm)

 

Figure: Distribution of formant values

 

Advanced Functions:

There are many advanced functions that can be explored, such as dftSpectrum, the RMS energy contour (rmsana), acfana, rfcana, etc., to add as features to your model.

Feature Set Development

Let's use both the tuneR and wrassp R packages to develop an initial set of features for our audio signal.

library(tools)  # for file_path_sans_ext

extract_audio_features <- function(x) {
  # tuneR: load the file
  tr <- readWave(x)
  # wrassp: load the file
  ar <- read.AsspDataObj(x)
  # file name (without extension)
  fname <- file_path_sans_ext(basename(x))
  # feature: number of samples
  num_samples <- numRecs.AsspDataObj(ar)
  # calculate formants and corresponding bandwidth values
  fmBwVals <- forest(x, toFile = FALSE)
  fmVals <- fmBwVals$fm
  bwVals <- fmBwVals$bw
  # feature: sample rate
  sample_rate <- tr@samp.rate
  left <- tr@left
  range_audio <- range(left)
  # feature: min amplitude
  min_range <- range_audio[1]
  # feature: max amplitude
  max_range <- range_audio[2]
  # normalize the samples to [-1, 1)
  normvalues <- left / 2^(tr@bit - 1)
  normal_range <- range(normvalues)
  # feature: normalized min amplitude
  normal_min_ampl_range <- normal_range[1]
  # feature: normalized max amplitude
  normal_max_ampl_range <- normal_range[2]
  mylist <- c(fname = fname, num_samples = num_samples, sample_rate = sample_rate,
              min_range = min_range, max_range = max_range,
              normal_min_ampl_range = normal_min_ampl_range,
              normal_max_ampl_range = normal_max_ampl_range,
              fmVals = fmVals, bwVals = bwVals)
  return(as.data.frame(mylist))
}

 

Now, call this feature-extractor function and review the output:

file_audio_path <- '../audio_file.wav'
output <- extract_audio_features(file_audio_path)
head(output, 10)

 

Note: I have used a heartbeat audio file from Kaggle for this tutorial. Finally, use this data for any processing you might need to do. I have shared the code and audio file in my GitHub account. Let me know your thoughts.

 
References

Heartbeat Sounds dataset, Kaggle

https://cran.r-project.org/web/packages/seewave/vignettes/seewave_IO.pdf

Individual sound files for each selection (or how to create a warbleR function)

The dummy's guide to MFCC: https://medium.com/prathena/the-dummys-guide-to-mfcc-aceab2450fd

http://www.cs.toronto.edu/~gpenn/csc401/soundASR.pdf

https://theproaudiofiles.com/understanding-sound-what-is-sound-part-1

What are the differences between audio and sound? (Quora)

Sound as a Mechanical Wave

https://www.nde-ed.org/EducationResources/HighSchool/Sound/components.htm

https://arxiv.org/pdf/1207.5104.pdf

https://www.r-bloggers.com/intro-to-sound-analysis-with-r/

https://towardsdatascience.com/getting-to-know-the-mel-spectrogram-31bca3e2d9d0

http://www.practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/

https://blogs.rstudio.com/tensorflow/posts/2019-02-07-audio-background/

 
Bio: Taposh Dutta Roy leads the Innovation Team of KPInsight at Kaiser Permanente. These are his thoughts based on his personal research; these thoughts and recommendations are not those of Kaiser Permanente, and Kaiser Permanente is not responsible for the content. If you have questions, Mr. Dutta Roy can be reached via LinkedIn.

Original. Reposted with permission.
