How to Use Stable Diffusion For AI Art Generation – A Beginner's Guide to AI Art


If you click on a link and make a purchase, I may receive a small commission. As an Amazon affiliate partner, I may earn from qualifying purchases.
Read our disclosure.


Key Takeaways

  • Stable Diffusion enables you to create more varied, personal, and unique works of AI art.
  • With different models, such as checkpoints, textual inversions, LoRAs, etc., you can create AI art that is specific to your taste.
  • You can use Stable Diffusion for text-to-image and image-to-image AI art generation.
  • Use ControlNet, depth-to-image, and image-to-image to control the image’s composition.
  • Use an image upscaler such as Nightmare AI to upscale AI images to a higher resolution.
  • Use inpainting, VAE, Easynegative, etc., to correct mistakes in AI images.
  • If a local install is not your thing, check out the best AI art generators article, especially the Mage.Space AI art generator.

What Is Stable Diffusion and How It Works

Stable Diffusion is a machine learning model that generates photo-realistic images from any text input using latent text-to-image diffusion.

The model is based on a particular type of diffusion model called Latent Diffusion, which reduces the memory and compute complexity by applying the diffusion process over a lower dimensional latent space instead of using the actual pixel space.
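In Stable Diffusion v1.x, for example, a 512×512×3-pixel image is compressed into a 64×64×4 latent tensor, roughly 48 times fewer values to run the diffusion process over.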

The model consists of three main components: an autoencoder, a U-Net, and a text-encoder, e.g., CLIP’s Text Encoder (Stable Diffusion v1.5). In Stable Diffusion 2.x, the text encoder used is OpenCLIP.
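If you are curious, you can inspect these three components yourself with Hugging Face's diffusers library (a separate, code-based way to run Stable Diffusion, not part of the WebUI workflow covered below; the model ID is the public v1.5 checkpoint):

```python
# A minimal sketch: load Stable Diffusion v1.5 with diffusers and print
# the classes of its three main components.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

print(type(pipe.vae).__name__)           # autoencoder (AutoencoderKL)
print(type(pipe.unet).__name__)          # U-Net (UNet2DConditionModel)
print(type(pipe.text_encoder).__name__)  # text encoder (CLIPTextModel)
```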

Downloading And Installing Stable Diffusion

To run Stable Diffusion locally, you need the following: Python, Git, the AUTOMATIC1111 WebUI, and at least one model checkpoint file.

Check out: Beginners guide to AUTOMATIC1111 WebUI

Hardware requirements for Stable Diffusion

Minimum hardware requirements for using Stable Diffusion are roughly: a discrete GPU (NVIDIA recommended) with at least 4 GB of VRAM, 8 GB of system RAM, and 10-20 GB of free disk space for the WebUI and model files.

Troubleshooting

Below are links to additional material that can help you get started and troubleshoot issues with Stable Diffusion:

Stable Diffusion Models

Civitai is currently the leading Stable Diffusion model repository/library.

Stable Diffusion is extremely popular among AI art enthusiasts because it can produce AI art based on different user-generated AI art models.

You can think of models, embeddings, LoRAs, etc., as stylization and data guides for the AI art algorithm. Models enable AI artists to produce unique-looking images with a certain artistic style and vibe. User-generated AI art models are pretty much why Stable Diffusion can produce unique AI art pieces that differ greatly from what commercial AI art generators produce.

Checkpoint models are the same thing as models, just named differently; they are usually fine-tuned versions of a base model (Stable Diffusion v1.5, v2.0, etc.).

Downloading and installing a Stable Diffusion model

Currently, the best source for Stable Diffusion models is Civitai. From the site, you can explore all sorts of models: Checkpoints, LoRAs, Hypernetworks, Textual Inversions (embeddings), etc. Download the model you are most interested in and place it in the appropriate folder:

Model | Directory/Folder | File types | How to use in prompt
Checkpoints | *\stable-diffusion-webui\models\Stable-diffusion | *.ckpt, *.safetensors | Select a checkpoint model from the upper left-hand corner of the Web UI
Hypernetworks | *\stable-diffusion-webui\models\hypernetworks | *.pt, *.ckpt, *.safetensors | <hypernet:filename:multiplier>
LoRA | *\stable-diffusion-webui\models\Lora | *.pt | <lora:filename:multiplier>
Textual Inversion | *\stable-diffusion-webui\embeddings | *.pt, *.safetensors, images | The embedding's filename
ControlNet Models (preprocessor) | *\stable-diffusion-webui\extensions\sd-webui-controlnet\models | *.pth, *.yaml | Select from the WebUI
Wildcards | *\stable-diffusion-webui\extensions\sd-dynamic-prompts\wildcards | *.txt | Write a normal prompt using words related to the wildcard
Table showing models, where to copy them, and how to use them.

For further guidance on how to use models in Stable Diffusion specifically, read GitHub's How to use models article. Also, read the best Stable Diffusion models article to find the best models to use with Stable Diffusion.
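If you ever want to use a downloaded checkpoint outside the WebUI, diffusers can also load single-file checkpoints; a minimal sketch, where the filename is a placeholder for whatever model you downloaded:

```python
# Sketch: load a downloaded .safetensors checkpoint directly with diffusers.
from diffusers import StableDiffusionPipeline

# "comicsVision.safetensors" is a placeholder filename, not a real release.
pipe = StableDiffusionPipeline.from_single_file("comicsVision.safetensors")
```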

Types of Stable Diffusion Models

  • Checkpoint: Full Stable Diffusion models that contain everything you need to create an image; no additional files are required. Checkpoint models are usually 2-7 GB in size.
  • Hypernetworks: Network modules added on top of and used with checkpoint models. File sizes range from 5 to 300 MB.
  • LoRAs: Small patch files used with checkpoint models to modify styles (see the sketch after this list). Files tend to be 10-500 MB in size.
  • Textual inversions: Also called embeddings. Small files used with checkpoint models that define new keywords for generating new styles or objects. Files are roughly 10-100 KB in size.
  • ControlNet: Helps you create better compositions in AI art.
  • Aesthetic Gradient: Embeddings that nudge generations toward a particular aesthetic.
  • LyCORIS: A family of fine-tuning methods that extends the LoRA technique.
  • Poses: Pose reference files used with ControlNet/OpenPose to control a figure's posture.
  • Wildcards: Plain text files containing lists of terms that the Dynamic Prompts extension picks from at random.
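For reference, here is how a LoRA and a textual inversion attach to a checkpoint when scripting with diffusers; a hedged sketch with placeholder file names:

```python
# Sketch: attach a LoRA patch and a textual-inversion embedding to a
# checkpoint model using diffusers. File names are placeholders for
# files downloaded from Civitai.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.load_lora_weights(".", weight_name="my_style_lora.safetensors")
pipe.load_textual_inversion("easynegative.safetensors", token="easynegative")
```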

Using Stable Diffusion with AUTOMATIC1111 WebUI

You can launch Stable Diffusion by locating and running the webui-user file (a Windows batch file) in the following folder: *\stable-diffusion-webui

Once the script finishes executing, it automatically opens the AUTOMATIC1111 WebUI in your browser. Do not close the Command Prompt window, however, as AI image generation needs it to stay open.

Using the Stable Diffusion Web UI

The easiest way to get started and get some sort of results with SD (Stable Diffusion) is to go to Civitai, download a model, and select that model to be used with Stable Diffusion.

Notice that I’ve selected the comicsVision AI art model as the Stable Diffusion checkpoint model (upper left-hand corner).

When you’ve selected a model (upper left-hand corner in the Web UI), the next step is to copy and paste the text prompt and negative text prompt found from Civitai for that specific image and model.

By doing this, you can instantly get results that look like something, rather than typing random words and getting mediocre results.

Check out: The Best Stable Diffusion Prompts

Remember that SD v1.4 and v1.5 do not respond to negative text prompts as strongly as SD 2.x versions do (because of the OpenCLIP text encoder used in 2.x).

When you hit the Generate button, the Command Prompt and the Web UI show the generation progress as a percentage. After the image is done generating, you can find it in the following folder: *\stable-diffusion-webui\outputs\txt2img-images
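The Generate button corresponds to a single pipeline call under the hood; if you prefer scripting, a minimal text-to-image sketch with diffusers (the prompts and settings are my own examples, and a CUDA GPU is assumed):

```python
# Sketch: text-to-image generation with diffusers, mirroring the WebUI's
# prompt, negative prompt, Steps, and CFG Scale settings.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="comic style portrait of a detective, dramatic lighting",
    negative_prompt="blurry, low quality, extra fingers",
    num_inference_steps=25,   # Steps in the WebUI
    guidance_scale=7.0,       # CFG Scale in the WebUI
).images[0]
image.save("txt2img-output.png")
```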

You will quickly notice that the output images are pretty small in resolution. You will need to upscale the image; luckily, there are plenty of free and paid image upscalers.

How to write Stable Diffusion text prompts

The text prompt is shown next to the image in Civitai. Also, notice that it has the Sampler, CFG Scale, and Steps information for you to copy and experiment with.

The best way to get the hang of text prompts is to go to Civitai, open a model, and then select an image showcasing what kind of images the model can generate.

On the right side of the image, you can see what text prompt and negative text prompt were used to generate the image.

Writing text prompts from scratch is difficult for a beginner, so start with ready-made text prompts and gradually learn how they work.

Use parentheses () to define the weight of a specific part of the prompt: (green eyes:1.6) means the AI puts 60% more emphasis on including green eyes in the image.

You can also type certain “trigger words” into the text prompt to activate certain textual inversions. For example, typing (easynegative) into the negative text prompt will activate the EasyNegative embedding as part of image generation in Stable Diffusion.

You can find these trigger words (specific to each model) from Civitai in the Details panel.
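Putting the weight and trigger-word syntax together, here is a hypothetical prompt pair (the content is my own illustration, not from a specific Civitai model):

```python
# Hypothetical A1111-style prompts showing weights and a trigger word.
prompt = "portrait of a woman, (green eyes:1.6), detailed face, comic style"
# 'easynegative' triggers the EasyNegative embedding;
# (blurry:1.3) puts 30% more emphasis on avoiding blur.
negative_prompt = "easynegative, (blurry:1.3), extra fingers"
```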

Check out: The Best Stable Diffusion Prompts

Image Upscaling And Image Enhancements

AI art generators usually output low-resolution images compared to artists who produce handmade digital art. It’s common to see 7000×9000-pixel images from digital artists, whereas AI art generators tend to produce 512-1024 px images by default.

Upscaling AI art is important if you are thinking of selling AI art in any form.

Upscaling your AI art for free


Go to https://replicate.com/nightmareai/real-esrgan, create a GitHub account, upload your AI art, and hit the Submit button; that’s it.

You can upscale an image up to 10x the original size.
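If you prefer to call the same model from code, Replicate also has a Python client; a hedged sketch, assuming your REPLICATE_API_TOKEN environment variable is set, and noting that the version hash after the colon is a placeholder you must copy from the model page:

```python
# Sketch: upscale an image 4x through Replicate's hosted Real-ESRGAN.
# The input fields ("image", "scale") follow the model page's schema.
import replicate

output = replicate.run(
    "nightmareai/real-esrgan:<version-hash>",  # placeholder version hash
    input={"image": open("my-ai-art.png", "rb"), "scale": 4},
)
print(output)  # URL of the upscaled image
```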

Check out: Best AI Image Upscalers

Local AI art upscaling

The base image (original) had a resolution of 320×448 pixels. It was upscaled twice through the Real-ESRGAN model. The middle image has a resolution of 1280×1792 pixels, and the biggest image has a resolution of 5120×7168 pixels.
Note: the base image isn’t that good to begin with, so the upscaled version isn’t that good either (but it’s quite sharp and has enough pixels for monetization).

There are many ways to upscale and enhance image quality locally on your computer; one is using Real-ESRGAN. You can download the Windows executable files from the GitHub site (which also has setup instructions and an instructional YouTube video).


Steps for using Real-ESRGAN:

  • Create a new folder named, for example, REAL-ESRGAN on any drive you want. I created the folder at this path: C:\Users\***\OneDrive\Desktop\***\REAL-ESRGAN
  • Extract the downloaded file into this folder
  • Name the extracted folder something like Real-ESRGAN-Master
  • Open Command Prompt (Windows->Search->CMD)
  • Paste the following path into the Command Prompt: cd C:\Users\***\OneDrive\Desktop\***\REAL-ESRGAN\Real-ESRGAN-Master
    • You basically paste the folder path where the files are into the Command Prompt, with the cd command in front of it.
  • Have an image ready (that you would like to upscale) and paste it into the folder: Real-ESRGAN-Master
  • Name it input (it’s OK to overwrite the extracted input.jpg)
  • Paste the following command into the Command Prompt: realesrgan-ncnn-vulkan.exe -i input.png -o output.png
    • If your image is a PNG file, name the input input.png
  • Hit ENTER
  • The output appears in the folder Real-ESRGAN-Master as output.png.
    • The model basically overwrites the extracted sample output.png with the upscaled version of your input image.
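If you upscale images often, renaming everything to input.png gets tedious; here is a hypothetical Python helper that runs the same executable over every PNG in the folder (the path is a placeholder, adjust it to where you extracted the release):

```python
# Hypothetical batch wrapper around realesrgan-ncnn-vulkan.exe.
import subprocess
from pathlib import Path

folder = Path(r"C:\REAL-ESRGAN\Real-ESRGAN-Master")  # placeholder path
exe = folder / "realesrgan-ncnn-vulkan.exe"

for img in folder.glob("*.png"):
    if img.stem.endswith("-upscaled"):
        continue  # skip results from earlier runs
    out = img.with_name(img.stem + "-upscaled.png")
    subprocess.run([str(exe), "-i", str(img), "-o", str(out)], check=True)
```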

If manual upscaling feels intimidating, there are plenty of image upscaling services you can use to upscale your AI art. You need high-resolution images if you are thinking about a print-on-demand business model using AI art.

Fixing errors in AI art and enhancing images

AI art isn’t perfect, and you will frequently see errors in the output images. Common mistakes such as disfigured faces, weird-looking eyes, double heads, extra hands, and missing body parts pop up often when generating AI art.

Common fixes for Stable Diffusion art mistakes:

Problem | Fix | Resources
Two heads / double head | Use a 1:1 image ratio. Generate multiple images and discard the ones that have two heads. Generate a full-body image. | -
Full body missing (while included in the text prompt) | Add keywords to the text prompt that describe the body of the figure: jumping, standing, legs, shoes, beautiful dress, etc. Use a portrait image ratio (2:3, for example). | -
Disfigured face and weird-looking eyes | Use SD’s built-in ‘Restore faces’ option. Use post-processing tools such as GFPGAN or CodeFormer. Use a VAE (variational autoencoder) for weird-looking eyes. | GFPGAN, CodeFormer, VAE
Disfigured anime fingers and hands, extra hands, and missing hands | Use the EasyNegative textual inversion. Use inpainting (enable the model by selecting it from the ‘Stable Diffusion checkpoint’ dropdown in the upper left-hand corner of the Web UI). | EasyNegative, inpainting model for SD
Table showing common problems in Stable Diffusion and the fix for the problem, including resources.
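For the inpainting route, the WebUI has a dedicated Inpaint tab under img2img; for completeness, here is what the same idea looks like in diffusers (a sketch using the public runwayml inpainting checkpoint, with placeholder image and mask files):

```python
# Sketch: repaint a masked region (white = repaint) with an inpainting model.
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting"
)
init_image = Image.open("portrait.png").convert("RGB")  # placeholder file
mask = Image.open("hand_mask.png").convert("RGB")       # placeholder file

result = pipe(
    prompt="a well-formed hand, five fingers",
    image=init_image,
    mask_image=mask,
).images[0]
result.save("portrait-fixed.png")
```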

Check out: Midjourney vs. Stable Diffusion

Feature image credits.


Okuha

Digital Artist

I’m a digital artist who is passionate about anime and manga art. My artist journey pretty much started with CTRL+Z. When I experienced that, the limitless color choices, and the number of tools I could use with art software, I was sold. Drawing digital anime art is the thing that makes me happy, along with eating cheeseburgers in between veggie meals.


Thank You!

Thank you for visiting the page! If you want to build your next creative product business, I suggest you check out Kittl!