This video provides a theoretical introduction to audio processing for machine learning. It covers the physics of sound as a mechanical wave, waveform representation, periodic and aperiodic sounds, the relationship between frequency and pitch (including octaves and cents), and the mapping of MIDI notes to frequencies. The video concludes by previewing future topics such as intensity, power, loudness, and timbre.
So, what is sound? Well, sound is produced by vibrating objects. These objects vibrate, and these vibrations cause air molecules to oscillate and bump into each other. By bumping into each other, these air molecules change the air pressure in the local region where they are acting, and in this process they create a wave. In other words, we can think of sound as a wave that transfers energy from one point to another through air molecules.
Okay, but the question is, what is a mechanical wave? I'm talking about mechanical waves here because sound is a mechanical wave. A mechanical wave is an oscillation that travels through space, with energy traveling, as I said, from one point to another. The particularity of mechanical waves is that they need a medium through which the wave can propagate. In the case of sound, most of the time this medium is just air, right? When we have a sound wave, or any mechanical wave, the medium gets deformed. In the case of sound, what happens, as I mentioned before, is that the molecules bump into each other. When that happens, we have points of higher pressure, and when they move away from each other, we have points of rarefaction, with less air pressure. And obviously, we can visualize all of this by using a pressure plot.
Okay, so obviously this is not the type of pressure that I want to talk about here. This is, like, high pressure, right? Okay. So basically, the idea is that we can visualize a sound wave as, in this case, a simple sine wave. At the center here we have the average atmospheric pressure. Then, over time, we have points of compression, which correspond to the denser points where air molecules collide with each other, and we have points of rarefaction, where the air molecules are more spaced out. So this is the whole idea of a sound wave that travels through space using air as a medium.
Okay, so now we can represent a complex sound using a waveform, which, once again, is basically a pressure plot. We are plotting the deviation of air pressure from this zero level against time. With a waveform like this, which I'm sure you'll be familiar with, we can represent a whole piece of music, some noise, whatever we want, really. It's a nice way of visualizing a sound and getting a quick understanding of what it looks like. It's not just about frequency information, but also about intensity, timing, and other types of temporal information like durations. For example, if we have a complex waveform for a piece of music, we can identify onsets, the durations of the notes, and all of these kinds of cues. And it's kind of mesmerizing that all of that comes from a very simple 2D graph. Cool.
Okay, so now we can divide sound into a couple of classes or categories. We have periodic and aperiodic sound. A periodic sound is a sound where the compressions and rarefactions repeat regularly. So you can fix a period at which you have peaks or you have dips.
Now, the simplest form of periodic sound is a single sine wave. We know the math behind sine waves really, really well, so that is very convenient for us. A more complex type of sound, for example the sound of an orchestra playing, is the so-called complex sound. As we'll see in a few videos, it is the result of multiple sine waves that get combined, or superimposed, together. So that's it for periodic sounds.
Then we have aperiodic sound, and obviously the name here is a clear hint. With aperiodic sound, we don't have periodicity in the audio signal. We can differentiate between two types of aperiodic sounds: continuous and transient. Continuous aperiodic sound is just noise. Basically, you have a jumble of points in the waveform which don't follow any pattern whatsoever in the air pressure. It's just like some random points sampled along the air pressure axis. Transient aperiodic sound is a little bit different. All of these popping sounds, clicks, and things like that can be thought of as transient. These are just bursts of energy which change the air pressure suddenly. It's kind of like a pulse. And again, there you don't have any type of periodicity whatsoever.
Okay, so now let's start with the simplest thing first, which is a simple sine wave. Here we have the waveform for a simple sine wave, and down here we also have the math for it. As you can see, we can easily write the equation of this sine wave by using a bunch of different parameters. The capital A here is the amplitude, then we have f, which is the frequency, t is just time, and then phi is the phase.
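The exact equation on the slide isn't captured in the transcript, but the parameters named above (amplitude A, frequency f, time t, phase phi) match the standard form y(t) = A * sin(2 * pi * f * t + phi). Assuming that form, here is a minimal sketch of how such a sine wave could be sampled in Python. The function name `sine_wave` and the `sample_rate` parameter are illustrative choices for this sketch, not something taken from the video.

```python
import math


def sine_wave(amplitude, frequency, phase, duration, sample_rate=8000):
    """Sample y(t) = A * sin(2*pi*f*t + phi) at evenly spaced times.

    amplitude   -- A, the peak deviation from zero
    frequency   -- f, in hertz (cycles per second)
    phase       -- phi, in radians
    duration    -- length of the signal in seconds
    sample_rate -- samples per second (an assumption made for this sketch)
    """
    num_samples = int(round(duration * sample_rate))
    samples = []
    for n in range(num_samples):
        t = n / sample_rate  # time of the n-th sample, in seconds
        samples.append(amplitude * math.sin(2 * math.pi * frequency * t + phase))
    return samples


# One second of a 440 Hz sine wave with amplitude 0.5 and zero phase.
wave = sine_wave(amplitude=0.5, frequency=440.0, phase=0.0, duration=1.0)
```

With zero phase the first sample sits at zero, and no sample ever exceeds the amplitude in absolute value. Changing `phase` moves the starting point of the wave at time zero without changing its shape.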
Now, let's take a look at each of these parameters in isolation so that you get an idea of how they influence the waveform itself. Okay, so frequency is connected with the idea of period. The period, T, is very simple to understand as a concept, right? It's just the amount of time that needs to elapse between two peaks or, for example, two dips. So this is the period. Now, the frequency is just the inverse of the period, where we've indicated the period with T here. And frequency is expressed in hertz, which is the number of cycles per second.
Okay, so the amplitude, instead, is quite a simple, intuitive concept to understand as well. It's basically how high or low this perturbation in air pressure goes. The higher it goes, the higher the amplitude, obviously. We can get this information by starting from zero and looking at the difference between the value at the peak and zero. And so there we have the amplitude.
Okay, then the final parameter that we want to look at is the phase, which is indicated with the letter phi from the Greek alphabet. Phase enables us to shift the waveform to the right or to the left, and it basically tells us the position of the waveform at time zero, right? So with amplitude, with frequency, and with phase we are able to determine all the parameters and have a complete understanding of a sine wave. And as we'll see in future videos, this is extremely important, because complex sounds can be formed as a combination, or superimposition, of many sine waves together. Okay, cool.
Okay, now let's take a look at frequency and amplitude a little bit more. As you can see here, with this red graph, we have two sine waves.
The one above here has this frequency, or actually this period, right? And this one down here has this period, which is a lot shorter. So the frequency, being the inverse, is higher. So what's the relationship between frequency and sound? Well, the higher the frequency, the higher the sound that we perceive, right?
Now, a similar thing happens with amplitude. We see here, with this purple graph in the top right, a sine wave where the amplitude is quite low. And down here, bottom right, we have a sine wave with a higher amplitude, which perceptually translates to a louder sound. So larger amplitudes are connected with louder sounds. This makes sense intuitively, because the amplitude just measures the amount of perturbation that happens in the air pressure, right? The higher the perturbation, the higher the energy that we transfer, and the louder it will sound. Okay, so now we have a basic understanding of how frequency and amplitude map onto perceptual aspects.
Now, different animals usually have very different hearing ranges. Humans, for example, have a hearing range between 20 hertz and 20,000 hertz. But if we take a look at the hearing range for cats and dogs, we see that they're also capable of hearing higher frequencies, the frequencies that we humans call ultrasounds. I guess everything is anthropocentric, so whatever goes beyond the highest frequency we can hear is called ultrasound. And basically both cats and dogs, and definitely bats, can hear ultrasounds, right? So now let's take a look at a few examples of sounds and where they are mapped into the hearing range. And now, pitch is the concept that we use for the perception of frequency, right?
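To make the period and frequency relationship concrete, here is a small sketch. It only relies on what was said above: frequency is the inverse of the period, and the human hearing range is roughly 20 Hz to 20,000 Hz. The function and constant names are made up for illustration.

```python
# Human hearing range as quoted in the video, in hertz.
HUMAN_HEARING_MIN_HZ = 20.0
HUMAN_HEARING_MAX_HZ = 20_000.0


def frequency_from_period(period_seconds):
    """Frequency in hertz is the inverse of the period in seconds."""
    return 1.0 / period_seconds


def period_from_frequency(frequency_hz):
    """Period in seconds is the inverse of the frequency in hertz."""
    return 1.0 / frequency_hz


def is_audible_to_humans(frequency_hz):
    """True if the frequency falls inside the 20 Hz to 20,000 Hz range."""
    return HUMAN_HEARING_MIN_HZ <= frequency_hz <= HUMAN_HEARING_MAX_HZ


# A shorter period means a higher frequency.
slow = frequency_from_period(0.01)   # 10 ms period
fast = frequency_from_period(0.001)  # 1 ms period
print(slow, fast, is_audible_to_humans(fast), is_audible_to_humans(40_000.0))
```

The wave with the 1 ms period has ten times the frequency of the one with the 10 ms period, which is the same comparison made with the two red graphs. A 40,000 Hz tone falls outside the quoted human range, so it would count as ultrasound.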
Okay, so the interesting thing about pitch is that we don't hear frequency in a linear way, but rather in a logarithmic way. Another interesting thing is that two frequencies are perceived as more or less the same if they differ by a power of two. This is the concept of the octave coming into play. We'll cover this, but for now I want you to understand that the way we perceive frequency through pitch is very different from frequency itself, because we have this logarithmic perception of frequency.
Okay, so to understand the concept of pitch, we need to understand the concept of MIDI notes. MIDI notes are a very common and very handy convention for transferring information about musical notes and things like that. What we want to understand here is how they map onto pitch and onto a keyboard. So here we have a piano keyboard, right? The idea of MIDI notes is that we can attach a number, a MIDI number, to each key. For example, the middle C down here has a MIDI note which is equal to 60. Now, we can map these MIDI notes to note names. MIDI note 60 is equal to C4. What does that stand for? We have two parameters in a note name, like C4. One is a letter, and that's just the note name, right? C, D, E, F, F sharp, all these things. And the other one is a number. So what's that number? Well, that number represents the octave we are at.
Now, I guess most of you are familiar with the concept of a scale or an octave, and perhaps you've played around with a keyboard or a piano yourself. But if you're not, the basic idea is that we have a pattern of notes that always repeats itself.
So it starts with C, we go to C sharp, D, and so on, E, F, G, A, B, and then we can go up an octave. And here we have the very same pattern, the very same 12 notes, which we call semitones. So there are 12 semitones in the whole octave. Okay. So now what's the difference between this C, this one, and this one? Well, basically the notes that have the same name, like C, will sound basically the same, but they'll sound somehow higher. It's difficult to explain, but the note itself is the same, and it will be perceived as higher. And the number in the note name, like C4, the four, just expresses the octave we are at.
This segment provides a clear explanation of sound as a mechanical wave, emphasizing its dependence on a medium (like air) for propagation. It describes how vibrating objects create oscillations in air molecules, leading to changes in air pressure that form sound waves.
This segment explains how complex sounds are represented using waveforms, which plot air pressure changes over time. It highlights the waveform's ability to provide information about frequency, intensity, and temporal aspects like note onsets and durations, demonstrating its importance in audio processing.
This segment discusses the hearing range of humans and other animals, highlighting the differences in their ability to perceive sound frequencies. It provides examples of various sounds and their positions within the human hearing range, emphasizing the importance of understanding the hearing range for audio processing.
This segment delves into the parameters of a sine wave (amplitude, frequency, and phase), explaining their individual effects on the waveform's shape and how they influence the sound's characteristics. The clear visual aids enhance understanding.
This segment connects the objective measures of frequency and amplitude with their subjective perceptual counterparts: pitch and loudness.
It clearly explains how higher frequencies correspond to higher perceived pitches and larger amplitudes to louder sounds.
This segment explains the concept of MIDI notes and their mapping to musical notes on a keyboard, clarifying how MIDI numbers represent notes and octaves. It connects MIDI notes to pitch and frequency, illustrating the relationship between these concepts in musical scales.
Okay, so this is a very interesting aspect. It seems that the way we perceive sound across the board, not just for frequency but also, for example, for amplitude and intensity, is logarithmic. And not only that, but if we're talking about music, even the very concept of time and rhythm is somehow logarithmic and based on powers of two. But yeah, I guess we'll look into that in the coming videos.
Okay, so how do we map pitch onto frequency? Here we can use a very simple equation. You have a frequency F as a function of p, which is the pitch, and we'll pass the pitch in as a MIDI note. And here you have this nice function: two raised to the power of (p minus 69) divided by 12, and all of this is multiplied by 440. You may recognize this 440: it's just the frequency of pitch 69, MIDI note 69. If you plug 69 into this p here, this whole thing will become 1, and if you multiply that by 440 you'll just get 440, which is the frequency for pitch 69, or A4.
Okay, so now let's plug in an example number. Here we're plugging in 60, which is middle C, or C4, right? When it's all said and done, we get back 261.6, and that's the frequency in hertz for pitch 60. Now, as we said, an octave is divided into twelve semitones. So twelve different notes that all have the same distance among themselves, right?
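The mapping just described can be written as F(p) = 2^((p - 69) / 12) * 440, where p is the MIDI note number. Here is a hedged sketch of that formula in Python, together with the note-name convention from earlier (MIDI note 60 is C4). The helper names are hypothetical, spelling every black key as a sharp is a simplification, and the octave offset is chosen so that 60 maps to C4 as in the video.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_frequency(midi_note):
    """F(p) = 2 ** ((p - 69) / 12) * 440, with p the MIDI note number."""
    return 2 ** ((midi_note - 69) / 12) * 440.0


def midi_to_note_name(midi_note):
    """Letter plus octave number, using the convention that MIDI 60 is C4."""
    octave = midi_note // 12 - 1
    return f"{NOTE_NAMES[midi_note % 12]}{octave}"


# Middle C, the A4 reference, and the C one octave above middle C.
for p in (60, 69, 72):
    print(p, midi_to_note_name(p), round(midi_to_frequency(p), 1))
```

MIDI note 60 comes out at about 261.6 Hz, matching the number in the video, and MIDI note 72 (one octave up) lands at twice that frequency. The same formula also shows that moving up by one semitone multiplies the frequency by 2^(1/12), which is roughly 1.0595.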
Okay, so they basically divide the whole octave into twelve equal parts. So what's the relationship between two subsequent pitches? This is mainly about music perception. The idea is basically that we have quite a decent resolution when it comes to pitch. Obviously we're capable of picking up different semitones and understanding that there's a difference in pitch there. But we are better than that: we are capable of appreciating pitch differences that are smaller than a semitone. So how can we measure that? Well, here is where cents come into play. The idea of cents is that the whole octave is divided into 1200 cents, so we have a hundred cents for each semitone. Okay. The cool thing is that, obviously, there's a threshold below which we can't really tell the difference between two pitches. That threshold, the so-called noticeable pitch difference, is between ten and twenty-five cents, depending on who you are, I guess your age and your background. If you're a musician, you are probably capable of appreciating pitch differences way better than non-musicians. Okay, cool.
So by now you should have a fair understanding of what sound is, what a waveform is, and everything that has to do with frequency and pitch. Next time we'll continue delving into sound and all the different parameters which describe it. Precisely, we're going to talk about intensity, power, and loudness, which are all features that are somehow correlated or connected with the idea of amplitude. And then we'll also venture into a very cool topic, which is timbre, the timbre of sound. This is a very difficult one to grasp because it's very, I don't know, very ambiguous, I would say, and subjective. Okay. So I hope you enjoyed this video.
If that's the case, please remember to leave a like, and if you haven't subscribed to the channel, please do, so you'll get all the videos that I'm posting. And finally, I want to remind you that we have a community on a Slack workspace called The Sound of AI, where if you join, you're