This PyTorch for audio and music processing course overview introduces PyTorch and TorchAudio, highlighting TorchAudio's GPU-accelerated audio processing capabilities. The course will cover building, training, and evaluating deep learning models in PyTorch, using custom datasets, performing GPU-accelerated feature extraction with TorchAudio, and applying CNNs for sound classification (using the UrbanSound8K dataset as a practical example). Intermediate Python programming knowledge and familiarity with deep learning concepts are recommended, but not strictly required. The focus is practical coding, with minimal theory.

So what is it? Well, TorchAudio is an audio processing library for PyTorch, and it's part of the PyTorch ecosystem, if you will. The great thing about TorchAudio is that it takes advantage of GPU acceleration, so it is very efficient when transforming audio data.

Now, TorchAudio has a lot of different components and functionalities. First of all, it has functionality for performing input and output with all sorts of audio data. It also ships with audio datasets that you can easily query and download in your Python environment. You can also perform data augmentation, things like time stretching or pitch shifting. And finally, and probably most importantly, TorchAudio has a lot of native feature extraction facilities that will allow you to extract audio features like spectrograms, mel spectrograms, and MFCCs. The great thing is that all of these transforms take advantage of GPU acceleration as well.

And as I mentioned in the previous slide, this project is going to be all about urban sound classification, and it will let us put into practice everything we learn about PyTorch and TorchAudio. Now, what is urban sound classification as a task? Well, it's quite straightforward: you have some sound that has been captured with a mic, for example, on the street.
You then pass that over to a deep learning model, and the model tells you what type of sound you have in that recording, right? So, for example, in this case we pass in this audio file, the model runs inference on it, and the sound is classified as an ambulance, or the sound of a siren. Cool.

Now, we can think of urban sound classification as a multi-class classification problem, because, of course, we have a bunch of different classes of sound that a recording can belong to. And we'll be using a dataset that has been used extensively in academia, called the UrbanSound8K dataset. Here, not surprisingly, we have more than 8,000 sound samples from 10 different sound classes: for example, the sound of a siren, dogs barking, and a bunch of other sounds like that.
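A multi-class setup like this maps each recording to one of 10 logits. Below is a deliberately tiny CNN sketch for that, assuming the input is a one-channel mel spectrogram; the layer sizes and the input shape `(batch, 1, 64, 44)` are illustrative, not the course's actual architecture:

```python
import torch
import torch.nn as nn

class SoundCNN(nn.Module):
    """Toy CNN sketch for 10-class urban sound classification."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),     # collapse frequency/time dims
            nn.Flatten(),
            nn.Linear(32, num_classes),  # one logit per sound class
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SoundCNN()
# Batch of 4 fake one-channel mel spectrograms (64 mel bins, 44 frames)
logits = model(torch.randn(4, 1, 64, 44))
print(logits.shape)  # one logit per class, ready for nn.CrossEntropyLoss
```

During training, the logits would be fed to `nn.CrossEntropyLoss` together with integer class labels (0 to 9), which is the standard PyTorch recipe for multi-class problems.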