
AI Art Models – Everything You Need to Know (Incl. SD Models)


If you click on a link and make a purchase, I may receive a small commission. As an Amazon affiliate partner, I may earn from qualifying purchases.
Read our disclosure.


Key Takeaways

  • AI art models determine what kind of images the AI art generator can produce.
  • AI art generators can only produce images of concepts that exist in their model’s training data.
  • If a model was not trained on any images of dogs, for example, the AI art generator can’t produce an image with a dog in it.
  • A model can’t reproduce a specific style, aesthetic, or concept unless it was trained on a dataset containing it.
  • The dataset and the training determine the model’s output.
  • You can find the best checkpoint models, LoRAs, Hypernetworks, and Embeddings on Civitai.
  • Midjourney and Lexica, for example, have developed their own models, such as Midjourney v6 and Lexica Aperture v4.

How AI Art Models Work

AI art models determine what kind of images the AI art generator can produce.

If a model was trained only on images of people, it couldn’t produce an image of a cat.

Likewise, if a model is trained specifically on anime artwork, the generator can’t produce photorealistic people.

When you see images produced with Midjourney and consider the scale and variety of images it can produce, realize that there are 400M+ images behind the model: Midjourney is reportedly based on the LAION-400M dataset, which contains 400,000,000 image-text pairs.

Image-text pairs, at their simplest, mean that each image is paired with a text that describes what the image contains or is about.

With these image-text pairs, the AI algorithm learns the relationship between words and visual features, which it then uses to generate new images (a simplified explanation).
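
As an illustration, an image-text pair is just a caption attached to an image reference. Here is a minimal sketch in Python; the field names are illustrative, not LAION’s exact schema:

```python
# A minimal sketch of an image-text pair as stored in a dataset like LAION.
# Field names are illustrative, not the exact LAION schema.
image_text_pair = {
    "url": "https://example.com/images/golden-retriever.jpg",
    "caption": "a golden retriever puppy playing in the grass",
}
```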

Image-text pairs that you can query from the LAION-5B and LAION-400M datasets.

The Different Types of AI Art Models

Below are the different types of models you can find online: base models, checkpoint models, Hypernetworks, Textual Inversion (Embeddings), and LoRAs.

On top of the mentioned models, AI art generators like Midjourney and Lexica have created their own models that produce images with a signature art style (the Midjourney style, for example).

Base Models (which the other model types either extend or are applied on top of)

  • SD (Stable Diffusion) 1.4
  • SD 1.5
  • SD 2.0
  • SD 2.1
  • SDXL
  • SD 3

Midjourney and other AI art generators may or may not use Stable Diffusion as their base model. Some AI art generators have stated which base model they use, while others, like Midjourney, keep that information private.

Stable Diffusion has some of the best AI art models available because users worldwide are creating them as a hobby.

Checkpoint

Checkpoint models improve the base model with an additional training set.

In the context of Stable Diffusion, checkpoint models, models, and checkpoint files all mean the same thing.

You can think of checkpoint models as fine-tuned models (though there is always a base model underneath). While LAION-400M and LAION-5B serve as the base datasets, a checkpoint model fine-tunes an existing base model a bit further, toward a certain aesthetic. The model becomes biased toward generating certain types of images.

Checkpoint model file sizes usually range between 2 and 7 GB, and the files use the .ckpt or .safetensors format.

Custom checkpoint models can be available in two precisions: fp16 and fp32. These refer to the format in which the weights are stored inside the model, with fp16 being 16-bit floating point (2 bytes per value) and fp32 being 32-bit (4 bytes per value).

Unless you’re interested in training or creating your own custom model, the smaller fp16 version, typically around 2 GB, is sufficient. For casual users, the difference in image quality between fp16 and fp32 is usually insignificant.
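
As an example, here is a minimal sketch of loading a downloaded checkpoint in fp16 with the Hugging Face diffusers library; the file name is a placeholder for whatever checkpoint you downloaded:

```python
# Minimal sketch: loading a single-file checkpoint in half precision (fp16)
# with Hugging Face diffusers. "model.safetensors" is a placeholder.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "model.safetensors",        # checkpoint downloaded from e.g. Civitai
    torch_dtype=torch.float16,  # fp16: 2 bytes per weight instead of 4
).to("cuda")

image = pipe("a watercolor painting of a fox").images[0]
image.save("fox.png")
```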

Why do checkpoint models exist?

The Stable Diffusion (SD) base model (SD v1.5) was trained on the large-scale dataset LAION-2B-EN. While the dataset has roughly 2 billion image-text pairs, it might not contain enough anime-related artwork, or it might lack certain stylized images, such as fantasy or cyberpunk.

To make the AI art generator produce, for example, more fantasy-like images, the model has to be trained with images related to that art style using image-text pairs.

Understanding that there is no “one best model” and recognizing that each model has its strengths can greatly enhance your experience. Take the time to experiment with different models using simple prompts and observe the style of images they generate.

Then, choose the model that aligns closest with the requirements of your project.

As noted before, a model can’t produce images of things it has no knowledge of. Funnily enough, the same applies to digital artists: an artist can’t produce Inkpunk images without having seen what that style looks like (the artist can have an idea of the style, but most likely not an exact understanding of it). The same applies to AI.

How are checkpoint models created?

You have two options: either create a model with additional training or use the DreamBooth technique.

Additional training starts from a base model, such as SD 2.1, which you train further with an additional dataset you are interested in. Your dataset could include images of flowers, marbles, etc., that you would then use to train the AI algorithm. If you are interested, Hugging Face has a lot of tutorials on training an AI model, such as Fine-tune a pretrained model.
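
To give a feel for what that training does under the hood, here is a heavily simplified sketch of the core loop used when fine-tuning a latent diffusion model. It assumes `unet`, `vae`, `scheduler`, `optimizer`, and a `dataloader` of image-text pairs already exist; the Hugging Face text-to-image training examples cover that setup:

```python
# Simplified sketch of the core fine-tuning loop of a latent diffusion model.
# Assumes `unet`, `vae`, `scheduler`, `optimizer`, and a `dataloader` yielding
# (images, text_embeddings) batches already exist.
import torch
import torch.nn.functional as F

for images, text_embeddings in dataloader:
    # Encode images into the latent space (0.18215 is SD's scaling factor).
    latents = vae.encode(images).latent_dist.sample() * 0.18215

    # Add random noise at a random diffusion timestep.
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, scheduler.config.num_train_timesteps,
        (latents.shape[0],), device=latents.device,
    )
    noisy_latents = scheduler.add_noise(latents, noise, timesteps)

    # The U-Net learns to predict the noise that was added.
    noise_pred = unet(noisy_latents, timesteps,
                      encoder_hidden_states=text_embeddings).sample
    loss = F.mse_loss(noise_pred, noise)

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```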

DreamBooth is a technique that enables the customization of text-to-image models, such as Stable Diffusion, by utilizing only a handful (3-5) of images of a particular subject. With this method, the model can create context-specific images of the subject in various settings, poses, and angles.

You can find a lot of tutorials, how-to guides, and general information from Hugging Face on their Transformers page.

Hypernetworks

Hypernetworks are a method for improving the output of Stable Diffusion models by utilizing additional Hypernetwork files alongside checkpoint models to achieve a specific theme or aesthetic. For instance, using the InCase style Hypernetwork with any checkpoint model can help produce results resembling the art style of the artist InCase.

InCase style (Hypernetwork) helps achieve the art style of the artist InCase.

Compared to alternatives such as LoRAs and Textual Inversion (Embeddings), Hypernetworks differ in their underlying training methodology but are similarly effective at refining the quality of the generated images.

While LoRAs are more widely used and produce better results at the cost of larger file sizes, Textual Inversion (Embeddings) are preferred for their small file size. Hypernetwork files range from 5 to 300 MB.

Textual Inversion (Embeddings)

Textual inversion for fixing “bad hands” in anime image generation.

Textual inversion (TI) is a technique where a few reference images can be converted into a new “word” that represents those images. This “word” can then be used in prompts to generate images that accurately match the reference images and other semantic concepts.

By using only 3-5 images of a concept, Textual Inversion can capture unique and varied concepts and reduce biases in the training data.

The new embeddings can be linked to new pseudo-words, which can be incorporated into new sentences, enabling the creation of novel scenes and compositions. Textual Inversion can also be used to represent visual art styles, and it works well with downstream models.

Textual inversion (embeddings) files are roughly 10-100 KB in size and use the .pt or .safetensors file extension.
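
As a sketch of how such an embedding is used in practice with the diffusers library (the file name and trigger token are placeholders for an embedding you downloaded):

```python
# Minimal sketch: loading a textual inversion embedding with diffusers.
# "bad-hands.pt" and the token name are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

pipe.load_textual_inversion("bad-hands.pt", token="bad-hands")

# The new pseudo-word can now be used in prompts, e.g. as a negative prompt:
image = pipe("anime girl waving at the camera",
             negative_prompt="bad-hands").images[0]
```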

LoRAs

LoRA model to align the anime image generation with the style of Makima from Chainsaw Man.

LoRAs (Low-Rank Adaptation) are files that modify the output of Stable Diffusion models to align with a particular concept or theme, such as an art style, character, real-life person, or object. Examples of LoRAs include Yae Miko | Realistic Genshin LORA, Makima (Chainsaw Man) LoRA, and blindbox.

While LoRAs can be used with any Stable Diffusion model, it is recommended to use them with the AnyLoRA checkpoint merge model, which is neutral enough to work well with any LoRA.

The usual file size of a LoRA is 10-500 MB.
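
For illustration, applying a downloaded LoRA on top of a base checkpoint with diffusers might look like this; the file name and prompt are placeholders:

```python
# Minimal sketch: applying a LoRA file on top of a base model with diffusers.
# "makima_lora.safetensors" is a placeholder for a LoRA from e.g. Civitai.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

pipe.load_lora_weights("makima_lora.safetensors")

# cross_attention_kwargs["scale"] controls how strongly the LoRA is applied.
image = pipe("makima, red hair, office, portrait",
             cross_attention_kwargs={"scale": 0.8}).images[0]
```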

How to Use AI Art Models

Most models are used with some kind of WebUI (AUTOMATIC1111, for example), but Midjourney and some other AI art generators only let you use the specific models built into their generator.

To experiment with the different models found on Civitai, you must download and install a WebUI and a diffusion model (SDXL, Realistic Vision, GhostMix, etc.) on your computer.

Midjourney, Getimg, and other AI art generators offer some models you can experiment with, but you can’t upload your own model to be used inside the AI art generator.

Using AI Art Models And Stable Diffusion

The easiest way to use AI art models is to go to Civitai or Hugging Face Models (and select the text-to-image pipeline; the link already has that filter enabled).

Explore the different models available and download the one that interests you the most. Note that to protect your computer from malicious code, you should only download SafeTensors files (on Civitai, the format is shown below the download button).

Notice the SafeTensor mark below the download button.

When you’ve downloaded the model you like, copy it to the following folder inside your Stable Diffusion installation: stable-diffusion-webui\models\Stable-diffusion.

Restart Stable Diffusion, and you will see the model is available for use in the dropdown menu:

In the upper left-hand corner, you will notice the Stable Diffusion checkpoint dropdown, from which you can select different checkpoints to be used in image generation.

In the stable-diffusion-webui\models\ folder, you will notice a specific subfolder for each model type, such as hypernetworks, Lora, etc. (textual inversion embeddings go to the embeddings folder, found directly in the stable-diffusion-webui\ folder). Each model file should be copied to its appropriate folder, as sketched below.
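
A sketch of the resulting folder layout in a default AUTOMATIC1111 installation:

```
stable-diffusion-webui/
├── embeddings/                  ← textual inversion files (.pt / .safetensors)
└── models/
    ├── Stable-diffusion/        ← checkpoint models (.ckpt / .safetensors)
    ├── Lora/                    ← LoRA files
    └── hypernetworks/           ← Hypernetwork files
```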

Using AI Art Models In AI Art Generators

AI art generators like Getimg, Lexica, and Midjourney have their own models that you can play around with. You can’t use the downloaded checkpoint models with Midjourney, for example. You can only use models the AI art generator provides.

AI Art Generator | Models
Midjourney       | Midjourney v1, v2, v3, v4, v5, etc.; Niji v5; MJ Test.
Lexica           | Lexica Aperture v2 and v3.
Getimg           | Stable Diffusion v2.1, SD v1.5, Realistic Vision, Anime Diffusion, Inkpunk Diffusion, NeverEnding Dream, etc.
Table showcasing the different models inside each AI art generator.

AI art generators make model selection easy and intuitive. Usually, AI art generators have buttons that you just click, and the model is activated for use.

One of the reasons why AI art generators are popular is that they require zero configuration knowledge from you to get started with AI art models.

What Are Safetensors in Stable Diffusion

Safetensors is a file format (and the library that implements it) for storing tensors safely and quickly. It allows for safe file usage by preventing arbitrary code from running on load and has no limit on file size. It also supports lazy loading and layout control, which is important for loading specific parts of a model in distributed settings.

Safetensors supports bfloat16, which is becoming increasingly important in the ML (machine learning) world. Compared to other ML formats like Pickle, H5, and NumPy (npz), Safetensors is faster and safer to use: it prevents DoS attacks and enables faster load times and lazy loading in distributed settings.
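
A minimal sketch of what reading and writing the format looks like with the safetensors Python library:

```python
# Minimal sketch: saving and loading tensors with the safetensors library.
# Unlike pickle-based .ckpt files, loading cannot execute arbitrary code.
import torch
from safetensors.torch import save_file, load_file

tensors = {"weight": torch.randn(4, 4), "bias": torch.zeros(4)}
save_file(tensors, "example.safetensors")

loaded = load_file("example.safetensors")
print(loaded["weight"].shape)  # torch.Size([4, 4])
```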

Filters

Multiple AI art generators give users filters instead of models. While the two can be the same thing, sometimes filters are not models but words added to your text prompt. Filters such as oil painting, retro anime, fashion magazine, etc., work the same way as models in that they apply a certain look to the image.

However, the results might not be as strong as using a model created specifically for achieving a certain look or theme.

In some AI art generators, the filter works more like added words than an actual model with pre-trained material. In these cases, the filter helps you generate a certain look by extending your text prompt when you don’t know what words to write.
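
As an illustration of that second kind of filter, the “filter” boils down to appending preset words to your prompt. A hypothetical sketch, with made-up filter names and keyword lists:

```python
# Hypothetical sketch: a "filter" implemented as plain prompt augmentation.
# The filter names and keyword lists are made up for illustration.
FILTERS = {
    "oil painting": "oil painting, thick brushstrokes, canvas texture",
    "retro anime": "retro anime, 90s cel shading, film grain",
}

def apply_filter(prompt: str, filter_name: str) -> str:
    return f"{prompt}, {FILTERS[filter_name]}"

print(apply_filter("a castle on a hill", "retro anime"))
# -> "a castle on a hill, retro anime, 90s cel shading, film grain"
```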

The Weaknesses of AI Art Models

An AI art generator and its models are only as strong as the dataset they have been trained with. If a model lacks images of dogs, horses, water, fire, 90s anime, etc., it can’t produce images containing those elements.

Also, the reason there are already thousands of user-made models worldwide is that the current base models from Stable Diffusion, Midjourney, Lexica, and other AI art generators are not enough on their own.

When models are trained, they are trained with certain images or image-text pairs, and in most cases, the users who train these models do not hold the rights to the images they found online. This creates a dilemma of whether such use is fair use or copyright infringement.

Feature image credits.
