Textual Inversion In Stable Diffusion – Guide to Fine-Tuning

If you click on a link and make a purchase, I may receive a small commission. As an Amazon affiliate partner, I may earn from qualifying purchases.
Read our disclosure.

What Is Textual Inversion?

Textual inversion is a technique used in text-to-image models to add new styles or objects without modifying the underlying model. It involves defining a new keyword that represents the desired concept and learning a corresponding embedding vector for it in the model’s text-encoder embedding space.

This allows the model to generate images of the user-provided concept. Remarkably, textual inversion can learn a concept from as few as 3-5 sample images. Because the learned embeddings behave like new “words,” you can compose them into ordinary natural-language prompts to create personalized images. A single word embedding is often enough to capture unique and varied concepts.
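To make “adding a word without modifying the model” concrete, here is a toy Python sketch; it is not Stable Diffusion’s actual code. It assumes a tiny made-up vocabulary with 4-dimensional vectors (real text encoders use hundreds of dimensions) and a hypothetical pseudo-word `<my-concept>`. The pretrained table is never touched; the new word is served from a separate table of learned vectors.

```python
from typing import Dict, List

Vector = List[float]

# Frozen, pretrained token embeddings (toy values; never modified below).
PRETRAINED: Dict[str, Vector] = {
    "a": [0.1, 0.0, 0.2, 0.0],
    "photo": [0.5, 0.1, 0.0, 0.3],
    "of": [0.0, 0.2, 0.1, 0.0],
}

# Textual inversion only adds entries here: one learned vector per pseudo-word.
LEARNED: Dict[str, Vector] = {
    "<my-concept>": [0.9, -0.4, 0.7, 0.2],  # hypothetical learned vector
}


def embed_prompt(prompt: str) -> List[Vector]:
    """Look up each whitespace-separated token, checking learned pseudo-words first."""
    vectors = []
    for token in prompt.lower().split():
        if token in LEARNED:
            vectors.append(LEARNED[token])
        elif token in PRETRAINED:
            vectors.append(PRETRAINED[token])
        else:
            raise KeyError(f"unknown token: {token!r}")
    return vectors


if __name__ == "__main__":
    for vec in embed_prompt("a photo of <my-concept>"):
        print(vec)
```

The prompt `a photo of <my-concept>` resolves to four vectors, the last one being the learned vector, while `PRETRAINED` stays exactly as it was. Real tokenizers split text into sub-word pieces rather than on whitespace, which this sketch ignores.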

Textual inversion (embedding) files are typically 10-100KB in size and use the *.pt or *.safetensors file extension.
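A rough calculation shows why these files are so small. An embedding stores only a handful of vectors, each as wide as the text encoder’s token embeddings: 768 values for Stable Diffusion 1.x (CLIP ViT-L/14) and 1024 for 2.x (OpenCLIP ViT-H/14). The sketch below assumes 32-bit floats and ignores the file container’s own overhead; the function name is only for illustration.

```python
# Width of one token embedding for each model family's text encoder.
EMBEDDING_DIM = {"sd1": 768, "sd2": 1024}

BYTES_PER_FLOAT32 = 4


def payload_kib(num_vectors: int, family: str = "sd1",
                bytes_per_value: int = BYTES_PER_FLOAT32) -> float:
    """Raw tensor size in KiB for an embedding made of `num_vectors` vectors."""
    dim = EMBEDDING_DIM[family]
    return num_vectors * dim * bytes_per_value / 1024


if __name__ == "__main__":
    for n in (1, 4, 8, 16):
        print(f"{n:>2} vector(s), SD 1.x: {payload_kib(n, 'sd1'):.1f} KiB")
```

One SD 1.x vector works out to 3 KiB of raw data and eight vectors to 24 KiB; storing the values as 16-bit floats halves those numbers.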

Where to Find Textual Inversions?

The best place to find textual inversions is Civitai; remember to filter the results to include only textual inversions. Hugging Face is another place to find them.

How to Use Textual Inversion?

Example of a textual inversion listing on Civitai: the type is Textual Inversion, the trigger word is easynegative, and the file format is Safetensors.

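You can use textual inversions with Stable Diffusion models, but each embedding only works with the model family it was trained for: Stable Diffusion 1.x and 2.x use different text encoders, so an embedding made for one generally won’t work with the other. The most common way to use textual inversions is with AUTOMATIC1111’s WebUI.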
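One way to reason about compatibility is by vector width. The hypothetical helper below (not part of any WebUI) guesses an embedding’s model family from the length of its vectors, assuming 768 values per vector for Stable Diffusion 1.x and 1024 for 2.x. Loading the vectors from an actual *.pt or *.safetensors file is left out.

```python
from typing import Sequence

# Width of a single token embedding for each model family's text encoder.
DIM_TO_FAMILY = {768: "Stable Diffusion 1.x", 1024: "Stable Diffusion 2.x"}


def model_family(vectors: Sequence[Sequence[float]]) -> str:
    """Guess the model family an embedding was trained for from its vector width."""
    if not vectors:
        raise ValueError("embedding contains no vectors")
    widths = {len(v) for v in vectors}
    if len(widths) != 1:
        raise ValueError("all vectors in one embedding should have the same width")
    width = widths.pop()
    return DIM_TO_FAMILY.get(width, f"unknown ({width}-dimensional)")


def is_compatible(vectors: Sequence[Sequence[float]], target_family: str) -> bool:
    """True if the embedding's vector width matches the target model family."""
    return model_family(vectors) == target_family
```

For example, a two-vector embedding whose vectors each hold 768 values is reported as Stable Diffusion 1.x and flagged as incompatible with 2.x.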

If you use AUTOMATIC1111’s Stable Diffusion WebUI, place the downloaded textual inversion (embedding) file in the *\stable-diffusion-webui\embeddings folder, where * is the location of your WebUI installation.
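If you prefer to script this step, the sketch below copies a downloaded file into the WebUI’s embeddings folder using only Python’s standard library. The function name is hypothetical, and the paths you pass in should point at your own download and your own stable-diffusion-webui installation. It only accepts the *.pt and *.safetensors extensions.

```python
import shutil
from pathlib import Path

ALLOWED_SUFFIXES = {".pt", ".safetensors"}


def install_embedding(downloaded_file: Path, webui_dir: Path) -> Path:
    """Copy an embedding into <webui_dir>/embeddings and return its new path."""
    downloaded_file = Path(downloaded_file)
    if downloaded_file.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"not an embedding file: {downloaded_file.name}")
    target_dir = Path(webui_dir) / "embeddings"
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    # copy2 keeps the original filename (and file timestamps) in the new location
    return Path(shutil.copy2(downloaded_file, target_dir / downloaded_file.name))
```

A call such as `install_embedding(Path.home() / "Downloads" / "easynegative.safetensors", Path.home() / "stable-diffusion-webui")` (example paths) would copy the file into the embeddings folder.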

Each textual inversion is activated by a keyword, or trigger word, which is usually shown on the page where you downloaded the embedding.

Include the trigger word(s) (tokens) in your text prompt to apply the textual inversion during image generation.
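In AUTOMATIC1111’s WebUI, an embedding is referred to by its filename without the extension, so `easynegative.safetensors` is triggered by `easynegative`. The sketch below uses that convention to list which installed embeddings a prompt would activate. The helper names are hypothetical, and the word splitting is a deliberate simplification of how the WebUI actually tokenizes prompts.

```python
import re
from pathlib import PurePath
from typing import Iterable, List

EMBEDDING_SUFFIXES = {".pt", ".safetensors"}


def trigger_words(filenames: Iterable[str]) -> List[str]:
    """Filename stems of embedding files, e.g. 'easynegative.safetensors' -> 'easynegative'."""
    return sorted(
        PurePath(f).stem for f in filenames
        if PurePath(f).suffix.lower() in EMBEDDING_SUFFIXES
    )


def active_embeddings(prompt: str, filenames: Iterable[str]) -> List[str]:
    """Trigger words from `filenames` that appear as whole words in `prompt`."""
    # Simplification: split on whitespace, commas, parentheses, brackets and colons.
    words = set(re.split(r"[\s,()\[\]:]+", prompt.lower()))
    return [w for w in trigger_words(filenames) if w.lower() in words]
```

With `easynegative.safetensors` and `bad-hands-5.pt` installed, the negative prompt `lowres, easynegative` activates only `easynegative`.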

Image showing where to find textual inversions in AUTOMATIC1111’s WebUI.

Click the little “image” icon (Show/hide extra networks) under the Generate button to show the available textual inversions. Clicking a textual inversion adds its trigger word to the text prompt; in the example image, the easynegative trigger word was added to the negative prompt.

Text prompt results without textual inversion applied to the image-generation process. Notice the abnormal-looking three-fingered hand.

Text prompt results without textual inversion applied to the image-generation process. Notice the hand with three or four fingers.

Text prompt results when textual inversion is applied to the image-generation process.

Textual inversion generally works well, but it doesn’t guarantee good outputs.

Text prompt results when textual inversion is applied to the image-generation process. Notice how there are still mistakes in the hand.

Text prompt results when textual inversion is applied to the image-generation process. There are still visible mistakes.

Text prompt results when two different textual inversions are applied to the image-generation process.

The easynegative and bad-hands-5 textual inversions were both included in the negative text prompt.

Both text prompts (normal and negative) were intentionally kept minimal to show the “strengths” of different textual inversions (embeddings). The more specific and detailed your text prompt is, the better your chances of getting the exact results you are after.

Textual inversions save you from writing long, detailed descriptions of what you want from the AI by giving you a shortcut to specific results. In that respect they are similar to LoRAs, although a LoRA adds small trained weight updates to the model, while a textual inversion leaves the model untouched and only adds a new embedding.

If you are not getting the outputs you are looking for, add a weight to the trigger word to strengthen the textual inversion’s effect. For example, (easynegative:1.4) gives easynegative 1.4 times the default weight of 1.0, i.e. 40% more emphasis during image generation.
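The `(word:weight)` syntax can be illustrated with a simplified parser. This sketch only understands the explicit `(text:number)` form and gives everything else the default weight of 1.0; AUTOMATIC1111’s real parser also handles nesting, plain parentheses (×1.1), square brackets (÷1.1) and escaped brackets. Roughly speaking, the WebUI then scales the text encoder’s output for the weighted tokens by that factor. The function name is hypothetical.

```python
import re
from typing import List, Tuple

# Matches "(some text:1.4)": text without parentheses or colons, then a numeric weight.
WEIGHTED = re.compile(r"\(([^():]+):\s*([0-9]*\.?[0-9]+)\)")


def parse_weights(prompt: str) -> List[Tuple[str, float]]:
    """Split a comma-separated prompt into (text, weight) pairs; default weight is 1.0."""
    parts = []
    for chunk in prompt.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        match = WEIGHTED.fullmatch(chunk)
        if match:
            parts.append((match.group(1).strip(), float(match.group(2))))
        else:
            parts.append((chunk, 1.0))
    return parts
```

Parsing the negative prompt `(easynegative:1.4), bad-hands-5` yields `easynegative` at weight 1.4 and `bad-hands-5` at 1.0. Because it splits on commas first, this sketch would mis-handle a weighted group that itself contains a comma.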

How Do Textual Inversions Work?

Image showing how textual inversion works. Image credits.

Textual inversion incorporates new concepts without modifying the model itself. It learns a new token embedding v* for a special placeholder token S* in the diagram above; whenever S* appears in a prompt, the model uses v* and can render the learned concept.
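For readers who want the formal version, the textual inversion paper (Gal et al., 2022, “An Image is Worth One Word”) states the goal as an optimization over the embedding alone. Here $x$ is a training image, $\mathcal{E}$ the frozen image encoder, $z_t$ the noised latent at timestep $t$, $y$ a prompt containing S* (for example “A photo of S*”), $c_\theta$ the frozen text encoder and $\epsilon_\theta$ the frozen denoising network:

$$
v_* = \arg\min_{v} \; \mathbb{E}_{z \sim \mathcal{E}(x),\, y,\, \epsilon \sim \mathcal{N}(0,1),\, t} \Big[ \big\lVert \epsilon - \epsilon_\theta\big(z_t, t, c_\theta(y)\big) \big\rVert_2^2 \Big]
$$

Only $v$ changes during the minimization; $\epsilon_\theta$ and $c_\theta$ keep their pretrained weights.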

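During optimization, the pretrained generator stays frozen. Noise is added to the (encoded) training images, the model predicts that noise while conditioned on a prompt containing S*, and only the embedding v* is updated to reduce the prediction error.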
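The toy Python sketch below mimics the “frozen model, trainable embedding” setup with made-up stand-ins: a fixed linear map `W` plays the frozen network, and a squared-error loss against a fixed target plays the denoising loss. It is not the real diffusion objective, only an illustration that gradient descent updates the new vector `v` while `W` never changes.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product m @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def loss(w: Matrix, v: Vector, target: Vector) -> float:
    """Squared error between the frozen model's output and the target."""
    return sum((o - t) ** 2 for o, t in zip(matvec(w, v), target))


def train_embedding(w: Matrix, target: Vector, v: Vector,
                    lr: float = 0.1, steps: int = 200) -> Vector:
    """Gradient descent on the embedding only; `w` is read but never updated."""
    v = list(v)
    for _ in range(steps):
        residual = [o - t for o, t in zip(matvec(w, v), target)]
        # d(loss)/dv = 2 * W^T (W v - target)
        grad = [2 * sum(w[i][j] * residual[i] for i in range(len(w)))
                for j in range(len(v))]
        v = [vj - lr * gj for vj, gj in zip(v, grad)]
    return v


if __name__ == "__main__":
    W = [[1.0, 0.5], [0.0, 1.0]]  # frozen "model"
    target = [2.0, 1.0]           # stands in for the training images
    v = train_embedding(W, target, [0.0, 0.0])  # start from a zero embedding
    print(v, loss(W, v, target))
```

In this example the embedding converges to about `[1.5, 1.0]`, the vector for which the frozen map reproduces the target, and `W` is identical before and after training.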

Feature image credits.

Okuha

Digital Artist

I’m a digital artist who is passionate about anime and manga art. My true artist journey pretty much started with CTRL+Z. When I experienced that, along with the limitless color choices and the number of tools I could use with art software, I was sold. Drawing digital anime art is the thing that makes me happy, along with eating cheeseburgers in between veggie meals.
