This video explains Variational Autoencoders (VAEs), a type of generative neural network. Unlike standard autoencoders, VAEs generate new data samples by learning a probability distribution over the input data's latent representation. This allows for the generation of novel outputs (e.g., images) by sampling from this learned distribution. The video contrasts VAEs with Generative Adversarial Networks (GANs), highlighting their differences in training methods and in the quality of generated images (VAEs often produce blurrier images than GANs). I'm also going to throw in some technical jargon for you extra-curious viewers. This is CodeEmporium, so let's get started.

Let's start out with a broad concept: generative modeling. Generative models are also just neural networks themselves. Normal neural network models usually take some sample as input, and this sample is raw data: it could be an image, text, or audio. Generative models, on the other hand, produce a sample as an output. Because of this flip, I think you can see how and why this is so interesting; there is so much potential with this technology. For example, you can train a model to understand how dogs look by feeding it hundreds of dog images. Then, during test time, we can just ask the model for an image and it'll spit out a dog image. The cool thing is that every time we ask our model to generate a dog, it'll generate a different dog. So you can create an unlimited gallery of your favorite animal. Doggos, sweet!

But what does this generative-model black box look like? Let's take a variational autoencoder as an example. As mentioned before, variational autoencoders are a type of generative model. They are based on another type of architecture called autoencoders. These autoencoders consist of two parts: an encoder and a decoder.
The encoder takes an input sample and converts its information into some vector, basically a set of numbers, and the decoder takes this vector and expands it out to reconstruct the input sample. Now you may be thinking: why are we doing this? What is the point of trying to generate an output that is the same as the input? And the answer is: there is no point. While using autoencoders, we don't tend to care about the output itself, but rather about the vector constructed in the middle. This vector is important because it is a representation of the input image or audio, and it's in a form that the computer understands.

So, another question: what is so great about this vector on its own? I'd say the vector itself has limited use, but we can feed it to complex architectures to solve some really cool problems. Here's an example of a paper that uses autoencoders to infer the location of an individual based on his or her tweet. The architecture they use consists of three stacked autoencoders to represent the input text from the tweet. This is then piped to two output layers: one is used to determine the state in the United States where the tweet was made, and the other estimates the latitude and longitude of the user when the tweet was made. I'll link the paper below in case you're extra curious.

Back to the variational autoencoder: it consists of an encoder and a decoder. During training time, we feed in the input images and make the model learn the encoder and decoder parameters required to reconstruct the image. During testing time, we only need the decoder part, because this is the part that generates the image. To do this, we need to input some vector. However, we have no idea about the nature of this vector. If we just give it some random values, more likely than not we will end up with an image that looks like garbage, so that's pointless. We need some method to determine this hidden vector. Here's some more intuition about how to determine it.
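To make the encoder-to-vector-to-decoder flow concrete, here is a minimal sketch of a toy linear autoencoder in numpy. All dimensions, weights, and names here are illustrative assumptions, not the video's actual model; a real autoencoder would learn `W_enc` and `W_dec` by gradient descent rather than leaving them random.

```python
import numpy as np

# Toy linear autoencoder sketch (hypothetical shapes, not the video's model).
# The encoder compresses a 64-dim input to an 8-dim latent vector;
# the decoder expands that vector back to 64 dims to reconstruct the input.
rng = np.random.default_rng(0)
input_dim, latent_dim = 64, 8

W_enc = rng.normal(scale=0.1, size=(latent_dim, input_dim))  # encoder weights
W_dec = rng.normal(scale=0.1, size=(input_dim, latent_dim))  # decoder weights

def encode(x):
    return W_enc @ x   # the "vector in the middle": the latent representation

def decode(z):
    return W_dec @ z   # attempt to reconstruct the input from that vector

x = rng.normal(size=input_dim)   # stand-in for an image/audio sample
z = encode(x)
x_hat = decode(z)

# Training would tune the weights to make this reconstruction error small.
reconstruction_error = np.mean((x - x_hat) ** 2)
print(z.shape, x_hat.shape)  # (8,) (64,)
```

The point of the sketch is only the shape of the computation: everything about the 64-dimensional input must squeeze through the 8-dimensional vector, which is why that vector ends up being a compact representation of the input.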
The way we determine this vector is by sampling from a distribution. I'll explain these basic concepts of sampling and distributions, but I'll also translate them into more technical terms for those of you who are more advanced in probability theory.

Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) are both generative models, but they differ significantly in their approach to generating data and in their training stability. VAEs utilize an encoder-decoder architecture with two loss functions: a reconstruction loss and a latent loss. The reconstruction loss ensures the decoder reconstructs the input data accurately, while the latent loss encourages the latent vector (the representation of the input in a lower-dimensional space) to follow a specific distribution (often Gaussian). By optimizing these two losses, the VAE learns to generate new data points by sampling from the learned latent distribution and decoding the sample. Training VAEs is generally more stable than training GANs because the loss functions are well-defined and differentiable, allowing standard optimization techniques.

GANs, on the other hand, consist of two components: a generator and a discriminator. The generator creates synthetic data, while the discriminator tries to distinguish between real and fake data. These two components engage in a minimax game: the generator aims to fool the discriminator, and the discriminator aims to correctly identify fake data. Training involves updating both the generator and the discriminator iteratively, based on their performance in this game. GAN training is notoriously unstable, often suffering from issues like mode collapse (the generator producing only a limited variety of samples) and vanishing gradients. While VAEs have a more stable training process, GANs, when successfully trained, can often generate higher-quality samples.
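The two VAE losses described above can be sketched numerically. This is an illustrative numpy fragment, not the video's implementation: it assumes the encoder outputs a per-dimension mean and log-variance for a Gaussian latent, uses the standard reparameterization trick to sample the latent vector, and takes mean squared error as a stand-in reconstruction loss on placeholder arrays.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 8

# Assumed encoder outputs: a mean and a log-variance per latent dimension.
mu = rng.normal(size=latent_dim)
log_var = rng.normal(size=latent_dim)

# Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1),
# so gradients can flow through the sampling step during training.
eps = rng.normal(size=latent_dim)
z = mu + np.exp(0.5 * log_var) * eps

# Latent loss: KL divergence from N(mu, sigma^2) to the standard normal N(0, 1).
kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

# Reconstruction loss: mean squared error between the input and the decoder's
# output (placeholder arrays here; a real model would decode z into an image).
x = rng.normal(size=64)
x_hat = rng.normal(size=64)
recon = np.mean((x - x_hat) ** 2)

total_loss = recon + kl  # the quantity gradient descent would minimize
```

Pushing the latent distribution toward a standard normal is what makes generation possible at test time: after training you can discard the encoder, draw `z` from N(0, 1), and run only the decoder.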