Mel-Frequency Cepstral Coefficients Explained Easily | Highlights and Annotations by Gistr.

This video provides a detailed explanation of Mel-Frequency Cepstral Coefficients (MFCCs), a crucial audio feature in machine learning. It begins by reviewing Mel spectrograms, a foundational concept, then delves into the mathematical formulation and visualization of cepstrums, explaining their connection to speech production and perception. The video then outlines the multi-step process of computing MFCCs, highlighting the use of Mel scaling and the Discrete Cosine Transform. Finally, it discusses the advantages, disadvantages, and applications of MFCCs in speech and music processing.

But before we get to MFCCs, I would just like to remind you about what we did in the previous couple of videos, where we focused on Mel spectrograms. Now, Mel spectrograms are going to be an important building block for understanding MFCCs. So if you are not that familiar with them, I highly suggest you go check out my previous couple of videos on Mel spectrograms.

Okay, but now let's get started with MFCCs, which, as I said, build on top of the concept of the Mel spectrogram to a certain extent. So now we have this audio feature. It's called Mel-frequency cepstral coefficients, right? In this name we have many different words, so let's try to understand which word means what.

Okay, so Mel-frequency. Well, Mel-frequency, as I said, refers somewhat to the concept of a Mel spectrogram. Basically, the idea is that we are using the Mel scale here, which is a perceptually relevant scale for pitch, so there's something in MFCCs that has to do with Mel spectrograms and the Mel scale. Okay, so we know that, and we know what Mel spectrograms are from previous videos.

Okay, so now let's move on. The last point here is coefficients. Well, this isn't really that difficult to understand, because the idea that you can probably guess from the name is that out of this feature you're going to get a number of coefficients, a number of values, and those
coefficients will describe some characteristic of a piece of sound, right? That's all it is.

Okay, and finally we have probably the most interesting part here, and that's cepstral, right? So this is a weird word. Cepstral is the adjective, but if we want to move to the noun, the noun is cepstrum. Does this word ring a bell at all? No? If not, I'll give you a hint: "ceps". Just focus on these four letters here. Any idea? If not, I'll give you the answer: it's spectrum, right? If you just take "ceps" and spell it backwards, you'll have "spec", as in spectrum. Okay, so cepstrum is somewhat related to spectrum.

So here we clearly have a wordplay, and it's going to take us some time to understand why this is relevant and why the researchers who came up with the idea of the cepstrum used this word and made this kind of wordplay on spectrum. So I suggest you bear with me, because this is going to be quite an intense and in-depth session to understand the cepstrum. Then, once we understand the cepstrum, we're going to use this concept to see how we can build MFCCs on top of it.

Okay, so now let's put cepstrum and spectrum down there. But when we're talking about the cepstrum, it's not the only weird word that these researchers came up with. There are a bunch of other concepts there: quefrency, liftering, and rahmonic, for example. Now, I guess you have an idea of how to translate these things into stuff that makes sense. And indeed, quefrency is a wordplay on frequency, liftering is connected to some sort of filtering, and rahmonic is connected to harmonic. Okay, so now we are entering the world of the cepstrum, where we don't have frequencies but we have quefrencies, we don't have filtering but we have liftering, and we don't have harmonics but we have rahmonics. Sounds a little bit
weird, right? Yeah, and it is. So bear with me to understand what all of these things really mean. Okay, so now let's get a historical understanding of the concept of the cepstrum, where it came from and how it developed over time.

Okay, let's get started with the math behind it. So how do we compute the cepstrum? Well, we compute it like this. Here we have our cepstrum, which we indicate with a capital C, and the cepstrum is given by this formula here.

07:50 So let's get started with x(t). Well, x(t) is just a normal signal in the time domain, right? It's just a normal waveform. Then, out of this waveform, we take the discrete Fourier transform, which here I've indicated with this capital F.

08:10 And when we do that, we come up with a spectrum, and we move from the time domain to the frequency domain. Okay, now the next step is to apply a logarithm to the spectrum. In this way, we get the log amplitude spectrum.

08:29 So in other words, we are applying the logarithm to the amplitudes of the spectrum. Now, if you are not familiar with the Fourier transform, the log amplitude spectrum, and all of this kind of stuff, I highly suggest you go check out my previous videos on the Fourier transform, because I've addressed all of these things time and again in my previous videos.

08:56 Okay, good. So we said we start from the signal and we take the Fourier transform, so we move to a spectrum. Then we take the log amplitude spectrum. And finally, at this point, we do the key step to get to a cepstrum, which is basically applying an inverse Fourier transform to the log amplitude spectrum.
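
The steps just described can be written compactly as C(x(t)) = F^-1[log(|F[x(t)]|)]: Fourier transform, log of the amplitudes, then inverse Fourier transform. The formula shown in the video is not part of this transcript, so this notation and the Python sketch below are a reconstruction from the spoken description, not the author's own code. The function name `real_cepstrum`, the small `eps` constant, and the toy signal are assumptions made purely for illustration.

```python
import numpy as np

def real_cepstrum(signal, eps=1e-10):
    """Inverse DFT of the log amplitude spectrum of a 1-D real signal.

    `eps` is an illustrative safeguard against log(0); it is not part of
    the definition given in the video.
    """
    spectrum = np.fft.fft(signal)                    # time domain -> frequency domain
    log_amplitude = np.log(np.abs(spectrum) + eps)   # logarithm of the spectrum's amplitudes
    cepstrum = np.fft.ifft(log_amplitude)            # inverse DFT of the log amplitude spectrum
    # For a real input the log amplitude spectrum is symmetric, so the
    # imaginary part of the result is only numerical noise and is dropped.
    return cepstrum.real

# Toy input (hypothetical): a 200 Hz tone plus four harmonics, sampled at 8 kHz.
sample_rate = 8000
t = np.arange(1024) / sample_rate
signal = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 6))

cepstrum = real_cepstrum(signal)
```

The result has one value per input sample, and its axis is no longer frequency but the "quefrency" mentioned earlier. Some definitions square the magnitude or use a different logarithm base; this sketch follows only the three steps named in the transcript.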