This video explains convolutions, a fundamental operation for combining lists or functions. It starts with probability (dice rolls), illustrating convolution as a sliding window of pairwise products and sums, then extends the idea to image processing (blurring and edge detection with kernels) and polynomial multiplication. The key insight is that multiplying two polynomials, which amounts to pointwise multiplication of their sampled values, is equivalent to convolving their coefficient lists. This leads to a faster convolution algorithm using fast Fourier transforms (FFTs), reducing the complexity from O(n²) to O(n log n).

Suppose I give you two different lists of numbers, or maybe two different functions, and I ask you to think of all the ways you might combine those two lists to get a new list of numbers, or combine the two functions to get a new function. Convolutions are a fundamental, yet less-discussed, way of doing this. Unlike termwise addition or multiplication, a convolution is a genuinely new operation, not inherited from operations on numbers, and it is ubiquitous in image processing, probability theory, and solving differential equations; even polynomial multiplication can be viewed as a convolution.

The video opens with rolling a pair of dice. Arranging all pairs of outcomes in a grid, the pairs with the same sum lie along diagonals; flipping one array and sliding it along the other makes the pairs that sum to a given value align vertically. Once each die face gets its own non-uniform probability, finding the probability of each sum means multiplying and adding the corresponding probabilities, and this process of flipping, sliding, multiplying, and summing is exactly a discrete convolution.

The possible sums run up to 12, and we got them by mixing together one list of values, a, and another list of values, b. In the lingo, we'd say the convolution of those two sequences gives us this new sequence, the new sequence of 11 values, each of which looks like some sum of pairwise products. If you prefer, another way you could think about the same operation is to first create a table of all the pairwise products, and then add up along all the diagonals. Again, that's a way of mixing together these two sequences of numbers to get a new sequence of 11 numbers. It's the same operation as the sliding-window thought, just another perspective.

Putting a little notation to it, here's how you might see it written down. The convolution of a and b, denoted with a little asterisk, is a new list, and the nth element of that list looks like a sum over all pairs of indices i and j whose sum equals n:

(a * b)_n = sum of a_i * b_j over all pairs i, j with i + j = n

It's kind of a mouthful, but for example, if n were 6, the pairs we'd go over are 1 and 5, 2 and 4, 3 and 3, 4 and 2, 5 and 1: all the different pairs that add up to 6. Honestly, though, however you write it down, the notation is secondary in importance to the visual you might hold in your head for the process.

Maybe it helps to do a super simple example, where I might ask you: what's the convolution of the list 1, 2, 3 with the list 4, 5, 6? You might picture taking both of these lists, flipping around the second one, and starting with it slid all the way over to the left. The pair of values which align are 1 and 4; multiply them together, and that gives us the first term of our output. Slide that bottom array one unit to the right, and the pairs which align are 1 and 5, and 2 and 4; multiply those pairs and add the results, and that gives us 13, the next entry in our output. Slide things over once more, and we'll take 1 times 6, plus 2 times 5, plus 3 times 4, which happens to be 28. One more slide and we get 2 times 6 plus 3 times 5, which gives us 27. And finally, the last term will look like 3 times 6. If you'd like, you can pull up whatever your favorite programming language is, with your favorite library that includes numerical operations, and confirm I'm not lying to you: if you take the convolution of 1, 2, 3 against 4, 5, 6, this is indeed the result you'll get.
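Taking up that invitation, here's the check in Python with NumPy; any library with a convolution routine would do.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# np.convolve slides the flipped copy of b across a, multiplying the
# aligned pairs and summing at each offset, exactly as described above.
print(np.convolve(a, b))  # [ 4 13 28 27 18]
```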
We've seen one case where this is a natural and desirable operation, adding up two probability distributions, and another common example would be a moving average. Imagine you have some long list of numbers, and you take another, smaller list of numbers that all add up to 1. In this case, I just have a little list of 5 values, all equal to one fifth. Then if we do this sliding-window convolution process, and kind of close our eyes and sweep under the rug what happens at the very beginning, once our smaller list of values entirely overlaps with the bigger one, think about what each term in this convolution really means. At each iteration, you're multiplying each of the values from your data by one fifth and adding them all together, which is to say you're taking an average of your data inside this little window. Overall, the process gives you a smoothed-out version of the original data. You could modify this, starting with a different little list of numbers, and as long as that little list adds up to 1, you can still interpret it as a moving average. In the example shown here, that moving average gives more weight toward the central value, and this also results in a smoothed-out version of the data.
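As a concrete sketch of that moving average in NumPy, with hypothetical noisy data standing in for the long list of numbers:

```python
import numpy as np

# Hypothetical noisy data standing in for the "long list of numbers".
rng = np.random.default_rng(0)
data = np.sin(np.linspace(0, 4 * np.pi, 200)) + 0.3 * rng.standard_normal(200)

# Five equal weights, each one fifth, so the window sums to 1.
window = np.ones(5) / 5

# mode="valid" keeps only the offsets where the window fully overlaps
# the data, sweeping the edge effects under the rug as described above.
smoothed = np.convolve(data, window, mode="valid")
print(len(data), len(smoothed))  # 200 196
```

Swapping that window for any other non-negative weights that sum to 1, say values sampled from a bell curve, gives the center-weighted moving average just described.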
If you do a two-dimensional analog of this, it gives you a fun algorithm for blurring a given image. I should say that the animations I'm about to show are modified from something I originally made for a set of lectures I did with the Julia lab at MIT, for an open courseware class that included an image-processing unit. There we did a bit more to dive into the code behind all of this, so if you're curious, I'll leave you links to that.

The first demonstration blurs the image with a 3x3 grid of equal values, a kernel: slide the kernel across the image, multiply each of its values by the pixel it sits on top of, and sum the results to get the corresponding blurred pixel. (In the strict mathematical definition of a convolution, the kernel is also rotated 180 degrees, though for a symmetric kernel this changes nothing.)

If we modify this slightly, we can get a much more elegant blurring effect by choosing a different grid of values. In this case, I have a little 5x5 grid, but the distinction is not so much its size. If we zoom in, we notice that the value in the middle is a lot bigger than the values toward the edges. Where this comes from is that they're all sampled from a bell curve, known as a Gaussian distribution. That way, when we multiply all of these values by the corresponding pixels they're sitting on top of, we're giving a lot more weight to that central pixel, and much less to the ones out at the edges. Just as before, the corresponding pixel on the right is defined to be this sum. As we do this process for every single pixel, it gives a blurring effect which much more authentically simulates the notion of putting your lens out of focus, or something like that.

But blurring is far from the only thing you can do with this idea. For instance, take a look at this little grid of values, which involves some positive numbers on the left and some negative numbers on the right, which I'll color blue and red, respectively. Take a moment to see if you can predict what effect this will have on the final image.

In this case, I'll be thinking of the image as grayscale instead of colored, so each pixel is represented by one number instead of three. One thing worth noticing is that as we do this convolution, it's possible to get negative values. For example, at this point here, if we zoom in, the left half of our little grid sits entirely on top of black pixels, which would have a value of zero, but the right half of negative values all sit on top of white pixels, which would have a value of one. So when we multiply corresponding terms and add them together, the result will be very negative. The way I'm displaying this in the image on the right is to color negative values red and positive values blue. Another thing to notice is that when you're on a patch that's all the same color, everything goes to zero, since the sum of the values in our little grid is zero. This is very different from the previous two examples, where the sum of our little grid was one, which let us interpret it as a moving average, and hence a blur. All in all, this little process basically detects wherever there's variation in the pixel value as you move from left to right, and so it gives you a way to pick up on all the vertical edges in your image. Similarly, if we rotated that grid around so that it varies as you move from top to bottom, it would pick up on all the horizontal edges, which in the case of our little pi creature image results in some pretty demonic eyes. This smaller grid, by the way, is often called a kernel.
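Here's a minimal sketch of both effects with SciPy. The image is random stand-in data, and the edge kernel's exact values are one plausible choice (the video doesn't pin them down here), picked so that the positives sit on the left, the negatives on the right, and everything sums to zero.

```python
import numpy as np
from scipy.signal import convolve2d

# Hypothetical grayscale image, values in [0, 1].
image = np.random.default_rng(1).random((64, 64))

# Equal weights summing to 1: a two-dimensional moving average, i.e. a blur.
box_blur = np.ones((3, 3)) / 9

# Positive on the left, negative on the right, summing to zero: flat
# patches map to zero, left-to-right variation stands out, so this
# picks up vertical edges.
edge_kernel = np.array([[0.25, 0.0, -0.25],
                        [0.50, 0.0, -0.50],
                        [0.25, 0.0, -0.25]])

# convolve2d rotates the kernel 180 degrees, matching the mathematical
# definition; for the symmetric blur kernel this changes nothing.
blurred = convolve2d(image, box_blur, mode="same")
vertical_edges = convolve2d(image, edge_kernel, mode="same")
```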
And the beauty here is how, just by choosing a different kernel, you can get different image-processing effects: not just blurring and edge detection, but also things like sharpening. For those of you who have heard of a convolutional neural network, the idea there is to use data to figure out what the kernels should be in the first place, as determined by whatever the neural network wants to detect.

Another thing I should bring up is the length of the output. For something like the moving-average example, you might only want to think about the terms where both of the windows fully align with each other; or, in the image-processing example, you might want the final output to have the same size as the original image.

From there, the video turns to computing convolutions quickly. It connects convolutions to polynomial multiplication: multiplying two polynomials and collecting like terms is exactly convolving their coefficient lists, which links the convolution to a seemingly simpler operation. That suggests a roundabout approach: treat the two number lists as polynomial coefficients, sample each polynomial's outputs at enough points, multiply those samples pointwise, and solve the resulting linear system to recover the coefficients of the product, which are the convolution. At first this is no improvement, since sampling the polynomials and solving the system each take O(n²) operations.

The trick is to evaluate the polynomials at the roots of unity, evenly spaced points on the unit circle in the complex plane. The cyclic pattern of the powers of the roots of unity introduces redundancy into the system, and the fast Fourier transform (FFT) exploits that redundancy to reduce the cost from O(n²) to O(n log n). The result is a three-step convolution algorithm: compute the FFT of each input list, multiply the results pointwise, and apply an inverse FFT to that product. Comparing runtimes, the video shows fftconvolve from the SciPy library beating a standard convolution by about three orders of magnitude on large inputs.
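A minimal sketch of those three steps with NumPy's FFT routines (scipy.signal.fftconvolve packages the same idea):

```python
import numpy as np

def fft_convolve(a, b):
    """Convolution via the three FFT steps above: O(n log n) rather
    than the O(n^2) cost of the sliding-window approach."""
    n = len(a) + len(b) - 1          # length of the full convolution
    fa = np.fft.rfft(a, n)           # step 1: FFT both inputs,
    fb = np.fft.rfft(b, n)           #         zero-padded to length n
    return np.fft.irfft(fa * fb, n)  # steps 2 and 3: pointwise product, inverse FFT

print(np.round(fft_convolve([1, 2, 3], [4, 5, 6])))  # [ 4. 13. 28. 27. 18.]
```

The zero-padding to the full output length matters: without it, the FFT would compute a cyclic convolution, wrapping the tail of the output around to the front.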
Finally, the video emphasizes how widely this algorithm applies, well beyond polynomial multiplication: any of the convolutions discussed, from adding probability distributions to image processing, can be computed this way, one mathematical technique reaching across seemingly unrelated domains. As one last application, it points out that ordinary multiplication of numbers is essentially a convolution of their digits, so the existence of a fast convolution algorithm implies a faster method for multiplying large integers, requiring O(n log n) operations instead of O(n²), although this is practically useful only for extremely large numbers.
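To make that last point concrete, here's a sketch of digit-wise multiplication. It uses np.convolve for clarity; a genuinely fast big-integer multiply would use the FFT routine above, with careful attention to numerical precision.

```python
import numpy as np

def multiply_via_convolution(x, y):
    # Digits of each number, least significant first.
    dx = [int(d) for d in str(x)][::-1]
    dy = [int(d) for d in str(y)][::-1]

    # Long multiplication before carrying is exactly a convolution
    # of the two digit lists.
    columns = np.convolve(dx, dy)

    # Propagate carries to recover ordinary base-10 digits.
    digits, carry = [], 0
    for c in columns:
        carry, d = divmod(int(c) + carry, 10)
        digits.append(d)
    while carry:
        carry, d = divmod(carry, 10)
        digits.append(d)
    return int("".join(map(str, digits[::-1])))

print(multiply_via_convolution(1234, 5678), 1234 * 5678)  # both print 7006652
```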