VAE In Stable Diffusion – The Complete Overview



What Is VAE In Stable Diffusion?

VAE comparison.

Variational Autoencoder (VAE) is an autoencoder with regularization during training to prevent overfitting. It ensures that the latent space possesses desirable properties for effective generative processes.

VAEs, in simple terms, are powerful tools used together with Stable Diffusion checkpoint models to enhance the quality of images, enriching them with vibrant colors and sharper details.

You can find VAEs in the older .ckpt format as well as the newer, safer, and faster .safetensors format.

VAEs offer the additional advantage of enhancing the appearance of hands and faces. While all models come with built-in VAEs, there are instances where external VAEs can outperform the default option, delivering superior results.

How to Use VAE In Stable Diffusion?

When using Stable Diffusion, there’s no need to install a separate VAE file, as all the models, whether v1, v2, or custom, already have a default VAE.

When downloading a custom or separate VAE, place it in the following folder (when using AUTOMATIC1111’s WebUI): *\stable-diffusion-webui\models\VAE

You can also manually select VAEs from AUTOMATIC1111 WebUI’s Settings tab.

Selecting VAE manually from AUTOMATIC1111 WebUI’s Settings. Settings -> Stable Diffusion -> SD VAE -> Choose your preferred VAE

When people mention downloading and using a VAE, they are referring to an improved, fine-tuned version of it. Sometimes a checkpoint model’s creator fine-tunes the VAE component with additional data, in which case you may need to download it separately.

This is the case when you are using, for example, the NyanMix diffusion model. The creator instructs users to download an additional VAE file for the model to work correctly.

The left side image uses the NyanMix VAE. The right side image was created without VAE. Checkpoint model: NyanMix.

Instead of releasing an entirely new checkpoint model, which can be a large file, only the updated portion (VAE), which is small in size, is made available.

The impact of using a VAE is typically subtle but significant. An improved VAE excels at decoding and encoding images from the latent space, resulting in better recovery of fine details. This enhancement becomes particularly valuable in rendering elements like eyes and text, where preserving intricate features is crucial.

Image showing the dimensionality reduction principle with encoder and decoder.

Stability AI has introduced two variants of fine-tuned VAE decoders:

  • EMA (Exponential Moving Average)
  • MSE (Mean Square Error)

Despite the names, these are training choices rather than metrics: the EMA decoder uses an exponential moving average of the weights accumulated during fine-tuning, while the MSE decoder was fine-tuned with extra emphasis on the mean-squared-error reconstruction loss, which tends to produce smoother outputs.
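To make the EMA idea concrete, here is a minimal toy sketch (not Stability AI’s actual training code) of how an exponential moving average of model weights is maintained. The weight values and decay rate are made up for illustration; the point is that the EMA copy drifts smoothly even when the raw weights jump around between steps.

```python
# Toy sketch of an exponential moving average (EMA) over model weights,
# the idea behind the "EMA" variant of the fine-tuned VAE decoder.

def ema_update(ema_weights, new_weights, decay=0.99):
    """Blend the running average toward the latest weights."""
    return [decay * e + (1 - decay) * w for e, w in zip(ema_weights, new_weights)]

# Simulate a few training steps where the raw weights jump around;
# the EMA copy changes much more gradually.
ema = [0.0, 0.0]
for step_weights in ([1.0, -1.0], [0.8, -1.2], [1.2, -0.9]):
    ema = ema_update(ema, step_weights, decay=0.9)

print(ema)  # each value has moved only part of the way toward the raw weights
```

At the end of fine-tuning, the EMA copy of the weights (rather than the final raw weights) is what gets released as the decoder.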

Where to Find VAEs For Stable Diffusion?

Hugging Face and Civitai are the most popular places to find VAEs. When browsing Civitai, remember to filter by VAE.


How Do VAEs Work In Stable Diffusion?

Regularization is applied to the output distributions of variational autoencoders (VAEs) to ensure the latent space possesses desirable properties.

Variational autoencoders (VAEs) are neural network architectures that address the issue of irregular latent spaces in traditional autoencoders. Autoencoders perform dimensionality reduction by encoding data into a bottleneck and decoding it back with minimal information loss.

However, the latent space of an autoencoder can suffer from overfitting, resulting in inconsistencies and meaningless content.

VAEs mitigate this problem by having the encoder output a distribution over the latent space instead of a single point. They also include a regularization term in the loss function to promote a more organized latent space. This regularization term ensures that the returned distribution exhibits desirable properties.
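In code, “outputting a distribution” usually means the encoder predicts a mean and a log-variance for each latent dimension, and a latent vector is then sampled from that Gaussian. Below is a minimal sketch of that sampling step (the so-called reparameterization trick), using made-up encoder outputs rather than a real network:

```python
import math
import random

def sample_latent(mu, logvar, rng):
    """Sample z ~ N(mu, sigma^2) via z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]

# Hypothetical encoder outputs for one image: per-dimension mean and log-variance.
mu = [0.5, -1.0, 0.0]
logvar = [-2.0, -2.0, -2.0]   # sigma = exp(-1) ~= 0.37 in every dimension

rng = random.Random(0)
z = sample_latent(mu, logvar, rng)
print(z)  # a point near mu, jittered by the learned variance
```

Because a whole neighborhood of latent points around `mu` is decoded during training, nearby points in the latent space end up decoding to similar images, which is exactly the “organized latent space” property described above.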

The loss function of VAEs combines a reconstruction term and a regularization term, derived using variational inference, a statistical technique. This approach assumes a simple underlying probabilistic model for the data.
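As a rough sketch (not Stable Diffusion’s actual training code), the two terms can look like this for a diagonal-Gaussian latent: a mean-squared reconstruction error plus the closed-form KL divergence between the encoder’s distribution and a standard normal prior. All numbers here are illustrative.

```python
import math

def vae_loss(x, x_recon, mu, logvar):
    """Toy VAE loss: MSE reconstruction + KL(N(mu, sigma^2) || N(0, 1))."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon)) / len(x)
    # Closed-form KL divergence of a diagonal Gaussian from the unit Gaussian.
    kl = -0.5 * sum(1 + lv - m ** 2 - math.exp(lv) for m, lv in zip(mu, logvar))
    return recon + kl

# A perfect reconstruction whose latent already matches the prior costs nothing.
loss = vae_loss([1.0, 2.0], [1.0, 2.0], mu=[0.0, 0.0], logvar=[0.0, 0.0])
print(loss)  # 0.0
```

The reconstruction term pushes the decoder toward faithful images, while the KL term penalizes latent distributions that stray from the prior; training balances the two.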

In comparison to generative adversarial networks (GANs), VAEs have received less scientific attention, partly due to the perceived complexity of their theoretical foundations, such as probabilistic models and variational inference. However, efforts are being made to make VAEs more accessible by providing valuable insights and strong theoretical explanations, similar to what has been done for GANs.

VAE Simplified

Imagine lowering a set of 512×512 images to 256×256 resolution and then returning the same set to its original size without losing details. This encoding and decoding can be done with a trained neural network.

By examining millions of images, the neural network can calculate probabilities and make accurate guesses about the original 512×512 data encoded in the 256×256 “latent” images.

A variational autoencoder can compress 512×512 images to 256×256 and further to 128×128 latent images, then reconstruct near-approximations of the originals. Chaining a third such halving step in a three-layer VAE takes the images all the way down to 64×64 latents.

Reversing this process through the three layers of the VAE can produce an image that resembles the original.
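For a sense of scale: in the actual Stable Diffusion v1 VAE, the spatial downsampling factor is 8, so a 512×512 RGB image becomes a 64×64 latent with 4 channels. A quick back-of-the-envelope calculation shows how much smaller the latent is:

```python
# Rough compression arithmetic for Stable Diffusion v1's VAE
# (8x spatial downsampling, 3 image channels -> 4 latent channels).
image_values = 512 * 512 * 3      # pixel values in the RGB image
latent_values = 64 * 64 * 4       # values in the latent tensor
print(image_values, latent_values, image_values // latent_values)
# 786432 16384 48  -> the latent holds 48x fewer values than the image
```

This is why the diffusion process itself can run in the latent space so much faster than it could on raw pixels.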

In summary, this technique is what the variational autoencoder performs.



