Tutorial | Weird Wonderful AI Art (https://weirdwonderfulai.art)

Flux LoRA Training Tutorial by Araminta
https://weirdwonderfulai.art/tutorial/flux-lora-training-tutorial-by-araminta/ (13 Aug 2024)
Video Tutorial by Araminta

Araminta, who has been featured on this blog before, has just released a tutorial video where she trains a new Flux LoRA on her own computer with an RTX 3090. She walks through the entire process using the AI Toolkit by @ostrisai.

The toolkit seems straightforward to use and requires a Hugging Face token to download the models and other components. This first step will take time. I have not explored it myself yet, but there may be a way to re-use a model you have already downloaded so you won’t have to download it again; as far as I can tell, the download is only necessary for the initial run.
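
One general note (mine, not from the video): Hugging Face downloads are cached locally and re-used on later runs, so the big download should only happen once. If you already keep models in a particular cache folder, you can point the toolkit at it with the standard HF_HOME environment variable; a minimal sketch, with an assumed placeholder path:

# Assumption/illustration only: point the Hugging Face cache at an existing folder
# before the toolkit starts downloading models. The path is a placeholder.
import os
os.environ["HF_HOME"] = r"D:\huggingface-cache"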

Another observation from the video is that the toolkit does not generate the prompts (captions) for your dataset of images, so you need to do that first with other tools such as Kohya SS.

Once you have the images and captions ready, you can create the YAML configuration file. A sample, train_lora_flux_24gb.yaml, is provided in the config/samples folder of AI Toolkit, and you can copy it to make your own with the settings you want. It is nicely commented and therefore easy to follow; Araminta explains and walks through it in the video.
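
If you want to sanity-check a copy of the sample config before training, here is a minimal sketch of my own (not from the video); the paths are assumptions based on the config/samples folder mentioned above, and it needs PyYAML installed:

# Copy AI Toolkit's sample Flux LoRA config and dump its structure so you can
# see which sections (dataset folder, trigger word, steps, learning rate,
# sample prompts) to edit. Paths are assumptions -- adjust to your checkout.
import shutil
import yaml  # pip install pyyaml

src = "config/samples/train_lora_flux_24gb.yaml"
dst = "config/my_flux_lora.yaml"
shutil.copy(src, dst)

with open(dst) as f:
    cfg = yaml.safe_load(f)

print(yaml.dump(cfg, sort_keys=False))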

It is great to see Araminta expanding and sharing her knowledge via YouTube.

How to install ComfyUI Manager – Step by Step Video
https://weirdwonderfulai.art/tutorial/how-to-install-comyui-manager-step-by-step-video/ (10 Mar 2024)

ComfyUI is very powerful when it comes to building custom workflows to generate images, videos and all kinds of AI-created media.
In this video tutorial I show you how to quickly and easily install ComfyUI Manager in your existing ComfyUI install. The steps are very easy, and hopefully with this video you are able to get started and benefit from the vast ecosystem of ComfyUI extensions/custom nodes.

Installation

  • Navigate to the custom nodes folder (ComfyUI/custom_nodes) inside your ComfyUI folder.
  • On Windows, type CMD in the Explorer address bar and hit Enter to open a command prompt in this folder. On Linux, open a terminal and change into the same folder.
  • Type git clone https://github.com/ltdrdata/ComfyUI-Manager.git; this will clone the repository onto your computer.
  • Restart ComfyUI, which will install ComfyUI Manager and any prerequisites automatically.

Conclusion

Now you have a new "Manager" button in the ComfyUI interface, which you can use to install new custom nodes and to install missing nodes (which can happen when you download a pre-built workflow). You can keep ComfyUI and its extensions updated, and do much more, via ComfyUI Manager without having to run command lines and scripts.

How to Run Stable Cascade Locally
https://weirdwonderfulai.art/tutorial/how-to-run-stable-cascade-locally/ (19 Feb 2024)

Last week was incredible, with some really cool releases in the AI space; notably, the one getting the least attention at the moment is Stable Cascade. To install it locally, clone the Hugging Face Space, create a Python virtual environment and install the dependencies:

git clone https://huggingface.co/spaces/multimodalart/stable-cascade
cd Stable-Cascade
python -m venv env
env\Scripts\activate
pip install gradio
pip install torch torchvision torchaudio xformers --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt

Once the installation is complete, review any errors. If you are on Windows you are likely to get ModuleNotFoundError: No module named 'triton'; if you do, don’t worry, it will still work and run. This is a known limitation, as Triton is not compiled for Windows. I got this error and my instance still runs.

Next, edit and replace app.py with my modified version below, which includes the changes needed for your local instance to run.

#Modified version of the original App.py from https://huggingface.co/spaces/multimodalart/stable-cascade by https://weirdwonderfulai.art team
import os
import random
import gradio as gr
import numpy as np
import PIL.Image
import torch
from typing import List
from diffusers.utils import numpy_to_pil
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline
from diffusers.pipelines.wuerstchen import DEFAULT_STAGE_C_TIMESTEPS
#import spaces
from previewer.modules import Previewer
#import user_history

os.environ['TOKENIZERS_PARALLELISM'] = 'false'

DESCRIPTION = "# Stable Cascade"
DESCRIPTION += "\n<p style=\"text-align: center\">Unofficial demo for <a href='https://huggingface.co/stabilityai/stable-cascade' target='_blank'>Stable Cascade</a>, a new high resolution text-to-image model by Stability AI, built on the Würstchen architecture - <a href='https://huggingface.co/stabilityai/stable-cascade/blob/main/LICENSE' target='_blank'>non-commercial research license</a></p>"
if not torch.cuda.is_available():
    DESCRIPTION += "\n<p>Running on CPU 🥶</p>"

MAX_SEED = np.iinfo(np.int32).max
CACHE_EXAMPLES = False
MAX_IMAGE_SIZE = int(os.getenv("MAX_IMAGE_SIZE", "1536"))
USE_TORCH_COMPILE = False
ENABLE_CPU_OFFLOAD = os.getenv("ENABLE_CPU_OFFLOAD") == "1"
PREVIEW_IMAGES = True

dtype = torch.bfloat16
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
if torch.cuda.is_available():
    prior_pipeline = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", torch_dtype=dtype)#.to(device)
    decoder_pipeline = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade",  torch_dtype=dtype)#.to(device) 

    if ENABLE_CPU_OFFLOAD:
        prior_pipeline.enable_model_cpu_offload()
        decoder_pipeline.enable_model_cpu_offload()
    else:
        prior_pipeline.to(device)
        decoder_pipeline.to(device)

    if USE_TORCH_COMPILE:
        prior_pipeline.prior = torch.compile(prior_pipeline.prior, mode="reduce-overhead", fullgraph=True)
        decoder_pipeline.decoder = torch.compile(decoder_pipeline.decoder, mode="max-autotune", fullgraph=True)
    
    if PREVIEW_IMAGES:
        previewer = Previewer()
        previewer_state_dict = torch.load("previewer/previewer_v1_100k.pt", map_location=torch.device('cpu'))["state_dict"]
        previewer.load_state_dict(previewer_state_dict)
        def callback_prior(i, t, latents):
            output = previewer(latents)
            output = numpy_to_pil(output.clamp(0, 1).permute(0, 2, 3, 1).float().cpu().numpy())
            return output
        callback_steps = 1
    else:
        previewer = None
        callback_prior = None
        callback_steps = None
else:
    prior_pipeline = None
    decoder_pipeline = None


def randomize_seed_fn(seed: int, randomize_seed: bool) -> int:
    if randomize_seed:
        seed = random.randint(0, MAX_SEED)
    return seed

#@spaces.GPU
def generate(
    prompt: str,
    negative_prompt: str = "",
    seed: int = 0,
    width: int = 1024,
    height: int = 1024,
    prior_num_inference_steps: int = 30,
    # prior_timesteps: List[float] = None,
    prior_guidance_scale: float = 4.0,
    decoder_num_inference_steps: int = 12,
    # decoder_timesteps: List[float] = None,
    decoder_guidance_scale: float = 0.0,
    num_images_per_prompt: int = 2,
    #profile: gr.OAuthProfile | None = None,
) -> PIL.Image.Image:
    previewer.eval().requires_grad_(False).to(device).to(dtype)
    prior_pipeline.to(device)
    decoder_pipeline.to(device)
    
    generator = torch.Generator().manual_seed(seed)
    #print("prior_num_inference_steps: ", prior_num_inference_steps)
    prior_output = prior_pipeline(
        prompt=prompt,
        height=height,
        width=width,
        num_inference_steps=prior_num_inference_steps,
        timesteps=DEFAULT_STAGE_C_TIMESTEPS,
        negative_prompt=negative_prompt,
        guidance_scale=prior_guidance_scale,
        num_images_per_prompt=num_images_per_prompt,
        generator=generator,
        callback=callback_prior,
        callback_steps=callback_steps
    )

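    # Preview handling: the pipeline call above is consumed like a generator.
    # Each next() returns either a list of preview frames (yielded to the Gradio
    # image output below) or, on the final step, the prior output object that the
    # decoder stage consumes.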
    if PREVIEW_IMAGES:
        for _ in range(len(DEFAULT_STAGE_C_TIMESTEPS)):
            r = next(prior_output)
            if isinstance(r, list):
                yield r[0]
        prior_output = r

    decoder_output = decoder_pipeline(
        image_embeddings=prior_output.image_embeddings,
        prompt=prompt,
        num_inference_steps=decoder_num_inference_steps,
        # timesteps=decoder_timesteps,
        guidance_scale=decoder_guidance_scale,
        negative_prompt=negative_prompt,
        generator=generator,
        output_type="pil",
    ).images

    #Save images
    #for image in decoder_output:
    #    user_history.save_image(
    #        profile=profile,
    #        image=image,
    #        label=prompt,
    #        metadata={
    #            "negative_prompt": negative_prompt,
    #            "seed": seed,
    #            "width": width,
    #            "height": height,
    #            "prior_guidance_scale": prior_guidance_scale,
    #            "decoder_num_inference_steps": decoder_num_inference_steps,
    #            "decoder_guidance_scale": decoder_guidance_scale,
    #            "num_images_per_prompt": num_images_per_prompt,
    #        },
    #    )

    yield decoder_output[0]


examples = [
    "An astronaut riding a green horse",
    "A mecha robot in a favela by Tarsila do Amaral",
    "The sprirt of a Tamagotchi wandering in the city of Los Angeles",
    "A delicious feijoada ramen dish"
]

with gr.Blocks() as demo:
    gr.Markdown(DESCRIPTION)
    gr.DuplicateButton(
        value="Duplicate Space for private use",
        elem_id="duplicate-button",
        visible=os.getenv("SHOW_DUPLICATE_BUTTON") == "1",
    )
    with gr.Group():
        with gr.Row():
            prompt = gr.Text(
                label="Prompt",
                show_label=False,
                max_lines=1,
                placeholder="Enter your prompt",
                container=False,
            )
            run_button = gr.Button("Run", scale=0)
        result = gr.Image(label="Result", show_label=False)
    with gr.Accordion("Advanced options", open=False):
        negative_prompt = gr.Text(
            label="Negative prompt",
            max_lines=1,
            placeholder="Enter a Negative Prompt",
        )

        seed = gr.Slider(
            label="Seed",
            minimum=0,
            maximum=MAX_SEED,
            step=1,
            value=0,
        )
        randomize_seed = gr.Checkbox(label="Randomize seed", value=True)
        with gr.Row():
            width = gr.Slider(
                label="Width",
                minimum=1024,
                maximum=MAX_IMAGE_SIZE,
                step=512,
                value=1024,
            )
            height = gr.Slider(
                label="Height",
                minimum=1024,
                maximum=MAX_IMAGE_SIZE,
                step=512,
                value=1024,
            )
            num_images_per_prompt = gr.Slider(
                label="Number of Images",
                minimum=1,
                maximum=2,
                step=1,
                value=1,
            )
        with gr.Row():
            prior_guidance_scale = gr.Slider(
                label="Prior Guidance Scale",
                minimum=0,
                maximum=20,
                step=0.1,
                value=4.0,
            )
            prior_num_inference_steps = gr.Slider(
                label="Prior Inference Steps",
                minimum=10,
                maximum=30,
                step=1,
                value=20,
            )

            decoder_guidance_scale = gr.Slider(
                label="Decoder Guidance Scale",
                minimum=0,
                maximum=0,
                step=0.1,
                value=0.0,
            )
            decoder_num_inference_steps = gr.Slider(
                label="Decoder Inference Steps",
                minimum=4,
                maximum=12,
                step=1,
                value=10,
            )

    gr.Examples(
        examples=examples,
        inputs=prompt,
        outputs=result,
        fn=generate,
        cache_examples=CACHE_EXAMPLES,
    )

    inputs = [
            prompt,
            negative_prompt,
            seed,
            width,
            height,
            prior_num_inference_steps,
            # prior_timesteps,
            prior_guidance_scale,
            decoder_num_inference_steps,
            # decoder_timesteps,
            decoder_guidance_scale,
            num_images_per_prompt,
    ]
    gr.on(
        triggers=[prompt.submit, negative_prompt.submit, run_button.click],
        fn=randomize_seed_fn,
        inputs=[seed, randomize_seed],
        outputs=seed,
        queue=False,
        api_name=False,
    ).then(
        fn=generate,
        inputs=inputs,
        outputs=result,
        api_name="run",
    )
    
with gr.Blocks(css="style.css") as demo_with_history:
    with gr.Tab("App"):
        demo.render()
    #with gr.Tab("Past generations"):
    #    user_history.render()

if __name__ == "__main__":
    demo_with_history.queue(max_size=20).launch()

After saving app.py you can launch the application by running the commands below; you can also copy and paste them into a .BAT file and run it as a script.

cd Stable-Cascade
env\Scripts\activate
python app.py

The command line will start the Gradio app at http://127.0.0.1:7860/; open that URL in your browser and start creating.
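
If port 7860 is already in use (for example by Automatic1111), Gradio lets you pick another port at launch. A small optional tweak, changing only the last line of app.py:

# Optional: run on a different port if 7860 is already taken.
demo_with_history.queue(max_size=20).launch(server_name="127.0.0.1", server_port=7861)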

If you are curious about how to use this UI, check out my video where I show how (it demonstrates the free online version, but the local and online versions work the same).

Video showing how to use the Stable Cascade UX

Conclusion

I'm really enjoying the aesthetics and prompt coherence of Stable Cascade; it seems to understand natural language better and produces better-composed images as a result. I have to say it's pretty good at handling text, as you can see below. I'd love to experiment more as it gets further implemented in Automatic1111 and ComfyUI in the coming weeks.

A few samples from this implementation which I have running on my PC.

Midjourney Hallucination and more by MANΞKI NΞKO
https://weirdwonderfulai.art/tutorial/midjourney-hallucination-and-more-by-maneki-neko/ (4 Feb 2024)

I came across an innovative technique where @manekinekoAIArt (Neko) describes how he discovered a prompt that lets you take Midjourney on a hallucinogenic trip: using a simple prompt that Neko shares, and abstract images as a starting point, you can create some unique images.

Here are some of the results from my experimentation using this method.

The Process (by Neko)

As described by Neko in his post on X, I will walk you through the process in more detail, along with the tweaks of my own that I have added to this method.

In Midjourney, first start by creating an abstract image of your choice. I created what you see below using the prompt: "abstract map of Fukushima, duotone velvet grey and cream". I am using the Midjourney website to create images, as I now have access to the Alpha site, but if you don't have it you can do this in Midjourney Discord using the same process.

Now select the abstract image you like and grab its URL from Discord; here is how:

Run a new prompt: "and more and more and more" + the URL of the abstract image you created. This forces Midjourney to hallucinate, trying to generate something out of the image you supplied plus the "and more" text.

In my usage of this process, I also tweak the --stylize parameter, which further helps with the hallucination. You can try with or without it, but know that a default value of 200 is applied to every generation, so I am exaggerating it with higher values.

Here is the result of my initial run.

Now create variations ("Vary Strong" or "Vary Subtle") of the images you like. Below are results from the images in positions 3 and 4 above.

This variation is "Vary Subtle", which creates subtle variations from the original prompt + image and the selected result.

Here is an example of "Vary Strong", which introduces a lot more new detail, and the face is much bigger in the resulting image.

You don't need to stop here: you can take the prompt further with "and more and more and more" + the URL of the abstract image + the URL of one of your other generations. Get creative at this stage, because Midjourney adds more details and elements to the resulting image based on the supplied reference files.

Here are some tips that Neko quotes in the original post (credit to Neko):

  • Use with an image url
  • Best used with version 6
  • Best used with an image that is somewhat maximalist and/or abstract. Even better if it’s an image that was not created in MJ because that seems to introduce even more variety.
  • Remixes and Vary (Strong) can “lock in” the subject of a first gen result and amplify it. This means to get lots of different hallucinations, start “fresh” with a new run of the same prompt.
  • Experiment with the number of “and more”s: 3 is a good start but can get even more crazy with higher numbers!
  • You can replace the “more”s with specific things, but leaving it ambiguous gets more varied results

I have more examples just below that I did like this. You can also change the number of times "and more" is used in your text prompt; you don't have to have three repeats of it, you can use more.

In the image below I changed the prompt and also added weights to it, using ::2 as the strength, and you start to get some intriguing results.

Vary Strong of the #4 image above (see the large version above at the start of the post)

Another Example

I generated some abstract smoke and then used the exaggerated "and more" prompt to let Midjourney go crazy.

I created variations of the #1 and #3 images above and started to get these super awesome results.

YouTube Tutorial

Following the interest in this post, I created a YT tutorial around this method so you can check it out and follow along as you create your own images.

Example with multiple URLs

Here I mixed the two Abstract images created for this post and let Midjourney go to town.

I got some pretty awesome results once I ran Vary Strong on the #1 and #3 images.

Getting started with PhotoMaker – Stacked ID Embedding
https://weirdwonderfulai.art/tutorial/getting-started-with-photomaker-stacked-id-embedding/ (21 Jan 2024)

PhotoMaker is one of the new AI tools that many people say is going to replace LoRA training and IPAdapter when it comes to re-creating a face consistently. It does not require any training like a LoRA and can re-generate the reference face with ease, or so they say!

In this post I explore how PhotoMaker works and share samples and findings from my testing. You can access the PhotoMaker repo and follow the instructions to spin up your own instance. However, I had all sorts of issues when installing this on my Windows 11 PC. I later discovered through its Issues list that a fork of the repo was created for Windows users, which worked the first time when followed.

Installation

Installing this on your PC is quite straightforward as long as you are using this repo. There are a few main things to install, which are listed in the Installation section of the repo.

  • Install Python
  • Install Git
  • Install Visual Studio Re-distributable
  • Run the commands listed (grab a tea or coffee while this happens)

To run it, simply execute the GUI.bat file, which will download the models on first execution (again, tea or coffee helps while you wait). Subsequent startups are faster.

Click on the local URL: http://127.0.0.1:7860 (ctrl+click) to launch the Gradio User Interface.

Running PhotoMaker

To start using it you need to upload a few images of the sample face; ideally the images uploaded should be mostly of the face. In my testing I used 4 cropped images of a face (AI-generated in Midjourney) as reference images.

Once uploaded, enter the prompt and make sure you use the trigger/class word "img" in it, e.g. man img or woman img. You can then apply any of the pre-defined style templates provided in the Gradio app; they simply enhance your prompt to stylize it.

Generation takes some time depending on your GPU; in my case it takes about 30 seconds on my RTX 4080 16GB PC.

Advanced options let you customise a few of the generation parameters. I left most of them at their defaults, except the size, which I changed to generate portrait images.

Once your generations are done you need to click the download icon to save them, because there is no output directory where they are saved by default. Otherwise you will lose the previous images if you run it again.

You can also check out my full walk-through video on this below.

Assessment of the Results

Now the keen eye comes in to compare the original reference vs the generated image. What do you see and think?

Having compared several examples, my finding is that the generations are consistent in their interpretation of the face, but they are not exactly the same face. The result has some resemblance to the original reference face, but it is not an exact copy.

I tried many different faces to reach this conclusion, including real human faces and my own. The results did not regenerate the same face. I find that a trained LoRA is still a better solution, because a LoRA stores a lot more information and therefore stays closer to the original training images.

This no-training approach might be a good concept, but it still needs work before it can produce faces that are a true clone of the reference. However, as with anything new, the first cut is never the last, so hopefully future versions of this approach to regenerating a face will improve the results.

Where’s Waldo Illustration by Framer
https://weirdwonderfulai.art/tutorial/wheres-waldo-illustration-by-framer/ (13 Jan 2024)

I came across the very talented artist @0xFramer, who has made some really amazing illustrations and animations, while reviewing the artist's X feed. Framer created their own version of Where's Waldo style illustrations and then shared the process of how it was all done.

It is so cool that I had to reach out and ask for approval to post the tutorial and some of their work on the blog.

Framer's version of Waldo looks something like this, and the tutorial on the making of these illustrations is sourced from this X post.

Tools used

Framer used various tools to create these illustrations.

Can you spot Waldo in the images shared in this blog post? It's not easy, but Waldo is there!

Framer’s Process

First, a simple prompt is used in Midjourney to create multiple wide-aspect images.

The resulting images look something like this.

Framer explains that for Midjourney to create the image in the style of Where's Waldo, you need to provide at least 5 reference images of actual Where's Waldo illustrations from the web. The reference images also need to be similar to what you want to create; as explained, if you want a spaceship but supply farm images as reference, you will end up with farm-like images.

So one of the prompts used and shared is:

/imagine prompt: https://s.mj.run/GXMZK-s9ZWk https://s.mj.run/7p2EHmWNFW0 https://s.mj.run/VeyBGZHo7uE https://s.mj.run/bN19fHxQOTY https://s.mj.run/MsU8xsH7gTk https://s.mj.run/e0yihGydIiM https://s.mj.run/4V6uWA2EJq4 Where is Waldo in LOCATION --ar 16:9

Substitute LOCATION with your desired location.

Framer then employs ChatGPT to produce some further creative ideas to inspire the creations.

The next step is to generate some base images from these ideas with Midjourney, by constructing variations of the prompt above.

To enhance the Midjourney images, Framer uses Magnific AI (I have done several posts on this great tool, which you can find here).

Using the Magnific "Art and Illustration" preset with the same prompt from Midjourney, the image is enhanced with a 2x upscale: Creativity 3-5, HDR 1 and Resemblance -1 (these are Magnific settings, in case you are wondering).

To enhance the details, another 2x upscale is applied, which adds more detail to the image; this time the settings are Creativity 0, HDR 1 and Resemblance -1.

The next step is to clean up each image in Photoshop, to make sure everything fits well and any artifacts are removed.

The final step, of course, is to add Framer's version of Waldo in a secret spot of the image and blend it in; this is all done in Photoshop at the end.

You end up with some wonderful creations like these. Make sure you check out Framer’s socials and follow.

Generative AI for Krita
https://weirdwonderfulai.art/tutorial/generative-ai-for-krita/ (26 Nov 2023)

Oddly enough, seeing live LCM popping up everywhere (especially Krea.ai, which has already implemented it) got me curious, and I somehow discovered Krita as a result. Krita is a standalone open-source image editing application, so it's free and, as I'm finding, feature rich; it also has a plug-in called Krita AI Diffusion.

This plug-in is awesome and super easy to get working. I was able to set it up with my existing ComfyUI installation and use it to generate images.

Setup

Download and install Krita from krita.org

Next, grab Krita AI Diffusion and download the zip file onto your computer.

Extract it into the pykrita folder on Windows (c:\Users\%username%\AppData\Roaming\krita\pykrita) or Linux (~/.local/share/krita/pykrita).

Once extracted, enable the plug-in by launching Krita and enabling the AI Image Diffusion plugin via Configure Krita > Python Plugin Manager. If you cannot access the Configure Krita menu, it is likely because you don't have a document open; just create a New Document and you will be able to access it.

Next, enable the docker (Settings > Dockers > AI Image Generation), which will open the AI Image Diffusion panel where you can configure the settings.

If you are not already running ComfyUI (i.e. you don't have it installed), you can spin up an instance using this plug-in. In my case I already have it installed and configured, so I simply fire up my ComfyUI instance and then point the plug-in to the URL it is running on.

Click Connect to connect to your existing server. If the connection does not establish, make sure you have the required custom nodes installed in the ComfyUI instance you are running; you can check this link for the details.

Once connected you can start to generate images inside Krita.

Image Generation

There are three options for image generation: Generate (text to image), Upscale (upscale an existing image) and Live (real-time generation).

The key focus, and my reason for exploring Krita, is Live mode, where images are produced as you draw in real time by linking your drawing layer to the image prompt specified in the plug-in. Be sure to watch the video demonstrating this capability.

You can see below that the drawing on Paint Layer 1 is linked to Scribble mode in the plug-in, which uses the drawing and the prompt to compose the resulting image.

This is a really cool way to create different drawings and bring them to life; I'm sure if you give this to your kids, nieces and nephews they will have a blast bringing their drawings to life. Fun aside, it shows how fast AI image generation technology is evolving, that we can take a rudimentary drawing and convert it into a more refined vision.

You can further explore the various options that are demonstrated by the author in the documentation page of this plug-in.

Training my first SD LoRA
https://weirdwonderfulai.art/tutorial/training-my-first-sd-lora/ (8 Oct 2023)

This post was inspired by @rainisto, who shared how he created an AI character (a woman) using Midjourney and then trained a Stable Diffusion (SD) v1.5 Low Rank Adaptation (LoRA) so that he could generate a consistent character in different settings.

This rekindled an idea I've been meaning to pursue: create a fictional character to play the lead in stories I will write with my daughter and eventually convert into comics using SD-created images. The challenge has always been how to get a consistent character in SD.

So I followed @rainisto's post to start creating my character in Midjourney. She is a 5-year-old girl, so I created many images in Midjourney. Here are a handful; I ended up with a total of 30 images, of which I chose 14 to train my LoRA.

Next I searched the web for a video tutorial to help me decipher the Kohya SS LoRA training WebUI.

I found this very well explained and detailed video. It covers all aspects of installation, setup and how to get going with Kohya.

The settings explained in this video were really handy, and I opted to use the recommended values. I share my settings below; you can download and save them as a .JSON file and then load it in Kohya.

{
  "LoRA_type": "Standard",
  "adaptive_noise_scale": 0,
  "additional_parameters": "",
  "block_alphas": "",
  "block_dims": "",
  "block_lr_zero_threshold": "",
  "bucket_no_upscale": true,
  "bucket_reso_steps": 64,
  "cache_latents": true,
  "cache_latents_to_disk": false,
  "caption_dropout_every_n_epochs": 0.0,
  "caption_dropout_rate": 0,
  "caption_extension": "",
  "clip_skip": "1",
  "color_aug": false,
  "conv_alpha": 1,
  "conv_block_alphas": "",
  "conv_block_dims": "",
  "conv_dim": 1,
  "decompose_both": false,
  "dim_from_weights": false,
  "down_lr_weight": "",
  "enable_bucket": true,
  "epoch": 12,
  "factor": -1,
  "flip_aug": false,
  "full_bf16": false,
  "full_fp16": false,
  "gradient_accumulation_steps": "1",
  "gradient_checkpointing": false,
  "keep_tokens": "0",
  "learning_rate": 0.0001,
  "logging_dir": "C:/AI/Training/Girl-lora/log",
  "lora_network_weights": "",
  "lr_scheduler": "cosine",
  "lr_scheduler_args": "",
  "lr_scheduler_num_cycles": "",
  "lr_scheduler_power": "",
  "lr_warmup": "10",
  "max_bucket_reso": 2048,
  "max_data_loader_n_workers": "0",
  "max_resolution": "512,512",
  "max_timestep": 1000,
  "max_token_length": "75",
  "max_train_epochs": "",
  "max_train_steps": "",
  "mem_eff_attn": false,
  "mid_lr_weight": "",
  "min_bucket_reso": 256,
  "min_snr_gamma": 0,
  "min_timestep": 0,
  "mixed_precision": "fp16",
  "model_list": "custom",
  "module_dropout": 0,
  "multires_noise_discount": 0,
  "multires_noise_iterations": 0,
  "network_alpha": 1,
  "network_dim": 128,
  "network_dropout": 0,
  "no_token_padding": false,
  "noise_offset": 0,
  "noise_offset_type": "Original",
  "num_cpu_threads_per_process": 2,
  "optimizer": "AdamW8bit",
  "optimizer_args": "",
  "output_dir": "C:/AI/Training/Girl-lora/model",
  "output_name": "wwaa-g13l-1",
  "persistent_data_loader_workers": false,
  "pretrained_model_name_or_path": "\"C:\\AI\\models\\Stable-diffusion\\v1-5-pruned-emaonly.safetensors\"",
  "prior_loss_weight": 1.0,
  "random_crop": false,
  "rank_dropout": 0,
  "reg_data_dir": "C:/AI/Training/Girl-lora/reg",
  "resume": "",
  "sample_every_n_epochs": 0,
  "sample_every_n_steps": 0,
  "sample_prompts": "",
  "sample_sampler": "euler_a",
  "save_every_n_epochs": 1,
  "save_every_n_steps": 0,
  "save_last_n_steps": 0,
  "save_last_n_steps_state": 0,
  "save_model_as": "safetensors",
  "save_precision": "fp16",
  "save_state": false,
  "scale_v_pred_loss_like_noise_pred": false,
  "scale_weight_norms": 0,
  "sdxl": false,
  "sdxl_cache_text_encoder_outputs": false,
  "sdxl_no_half_vae": true,
  "seed": "",
  "shuffle_caption": false,
  "stop_text_encoder_training": 0,
  "text_encoder_lr": 5e-05,
  "train_batch_size": 1,
  "train_data_dir": "C:/AI/Training/Girl-lora/img",
  "train_on_input": true,
  "training_comment": "",
  "unet_lr": 0.0001,
  "unit": 1,
  "up_lr_weight": "",
  "use_cp": false,
  "use_wandb": false,
  "v2": false,
  "v_parameterization": false,
  "v_pred_like_loss": 0,
  "vae_batch_size": 0,
  "wandb_api_key": "",
  "weighted_captions": false,
  "xformers": "xformers"
}

Using these settings I set up Kohya and clicked Start training.

Memory usage on my RTX 4080 was quite low with these settings and a 512x512px dataset. It was hovering around 40% when I checked Task Manager, which leads me to believe around 6-7 GB of my 16 GB total GPU memory.

This makes me feel confident that I could push LoRA training to SDXL as well if I want to explore it, but that will be a future post.

Once the LoRA training is finished you need to move the model files generated by Kohya (.safetensors) into the Stable Diffusion Web UI directory under the models\Lora folder. If you want to learn how I organise my models, check out my tips and tricks on SD.
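
If you end up with many versions, here is a tiny helper sketch of my own (not from the tutorial) to copy every .safetensors file Kohya produced into the WebUI LoRA folder; the destination path is an assumption, so adjust it to your install:

# Copy all LoRA versions from the Kohya output folder to the A1111 Lora folder.
# The source matches the output_dir in my settings above; the destination is an
# assumed WebUI location -- change it to yours.
import glob, os, shutil

src_dir = r"C:\AI\Training\Girl-lora\model"
dst_dir = r"C:\AI\stable-diffusion-webui\models\Lora"
os.makedirs(dst_dir, exist_ok=True)
for path in glob.glob(os.path.join(src_dir, "*.safetensors")):
    shutil.copy(path, dst_dir)
    print("copied", os.path.basename(path))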

Once you have copied these files, it is worth generating an X/Y/Z plot to see what difference the different LoRA versions make to your images. As I had 12 versions, I decided to try only the even-numbered ones and to compare them against various CFG scale values.

Defining the Prompt S/R for my X axis: chk,2,4,6,8,10,12

Defining the CFG Scale for my Y axis: 7,8,9,10

The prompt is: anime, comic, cartoon, portrait of a girl <lora:wwaa-g13l-1-chk:1>, illustration, dark hair

The chk token is what Prompt S/R (search and replace) will look for and replace with the values 2, 4, 6 and so on.

The resulting grid looks something like this. The first row is where the value is NULL, so it is the standard model output at various CFG scales. From the second row onwards you see the character come to life and closely resemble the girl I had generated using Midjourney. With more experimentation and fine-tuning of the LoRA it could probably get even closer.

However, the goal was to create a LoRA that gives me a consistent character from one seed to the next. The grid below uses a locked seed, but you will see samples further below that use varying seeds.

Reviewing the images above, you can decide which version of the LoRA you wish to use in your generations. In my opinion version #8 looks pretty nice, so I will keep it and archive the other versions.

To verify that I am getting the same character in each image, I ran the above prompt as a batch with my chosen LoRA #8.

Conclusion

Finally to summarise, I’m happy with my very first experiment that achieved two results for me: one – I got to learn and discover how to train my own LoRA using a set of images; and two – I got to create a fictitious character that will help me develop further projects that I had been meaning to do.

Thanks to @rainisto for sharing his wonderful tutorial on X. If you are on X (aka Twitter) then go give him a follow and check out his content. There is such a wonderful community there which shares its learning in the AI space, so we all learn & grow together.

Questions or comments welcome!

Create Infinite Kids Colouring Pages with Midjourney AI
https://weirdwonderfulai.art/tutorial/create-infinite-kids-colouring-pages-with-midjourney-ai/ (4 Oct 2023)

Colouring in is something my daughter loved when she was younger and still enjoys today, giving herself challenges by limiting the number of distinct colours she uses. I saw a post by @chaseleantj on X where the creator used Midjourney and ChatGPT-4 to create such images, which challenged me to see if I could create colouring pages purely using Midjourney alone.

Surely, I wondered, with the right prompt combo Midjourney is capable of creating black-and-white illustrations without colours.

In case you were wondering how to create colouring sheets with Midjourney, you are on the right post.

Prompt Construct

The construction of the prompt needs to enforce a few key concepts so that the images produced comply with our needs:

  • Black and white
  • White background
  • Comical and Cartoon
  • Kids colouring sheet/page

The prompt therefore needs to specify what's required, starting with: [main subject], cartoon, kids coloring page, black and white, white background

Starting with a cute cottage with flowers in the garden, cartoon, kids coloring page, black and white, white background we get a half-decent result, which makes for an easier start.

a cute cottage with flowers in the garden, cartoon, kids coloring page, black and white, white background

Noticeably, there are a few more greys and shadows in the image. They are okay, but typically we want a cleaner image with cleaner lines, which leaves more surface area for the child to colour and lets their imagination go wild.

Adding some --no arguments to the prompt allows us to further fine-tune and control the images created. So let's update the prompt to: a cute cottage with flowers in the garden, cartoon, kids coloring page, black and white, white background --no shading, gradient, colors, saturation, colored, shadow

a cute cottage with flowers in the garden, cartoon, kids coloring page, black and white, white background --no shading, gradient, colors, saturation, colored, shadow

Now the resulting image is free from any grey areas, shading and hints of colour. NOTE: in the prompt I use the US spellings (coloring, colors), but you should achieve the same result if you don't.

Finally, as I experimented with more images, I noticed that some results still had hints of colour and some had a 3D feel. Refining the prompt further with some weights and a few additional words, the final prompt looks like: [SUBJECT] illustration, cartoon, outlines, kid's coloring page, black and white:1.5, white png background, flat 2d --no shading, gradient, colors:1.5, saturation:1.2, colored, shadow:1.1, 3d

Adapt it further and try the following variations. You can also add more detail to the images by using the --stylize argument, which you can discover more about in this dedicated post.

a drawing of [SUBJECT] illustration, cartoon, outlines, kid's coloring page, black and white:1.5, white png background, flat 2d --no shading, gradient, colors:1.5, saturation:1.2, colored, shadow:1.1, 3d

Results

Experimenting with many different kinds of images that would interest kids, I found the prompt to work very well across various subjects. If you have access, you might want to let your child have a go at creating their own colouring pages, which can be a great fun activity for them. Let them come up with their own preferred subject and then paste in the remainder of the prompt.

In the meantime take inspiration from the images below, download and print them for your children if you like.

a drawing of spiderman in the city, illustration, cartoon, outlines, kid's coloring page, black and white:1.5, white png background, flat 2d --no shading, gradient, colors:1.5, saturation:1.2, colored, shadow:1.1, 3d --ar 3:2
a simple drawing of a cute mermaid in the underwater ocean, illustration, cartoon, outlines, kid's coloring page, black and white:1.5, white png background, flat 2d --no shading, gradient, colors:1.5, saturation:1.2, colored, shadow:1.1, 3d --ar 3:2
a simple drawing of the marvel avengers, illustration, cartoon, outlines, kid's coloring page, black and white:1.5, white png background, flat 2d --no repeating character, shading, gradient, colors:1.5, saturation:1.2, colored, shadow:1.1, 3d --ar 5:4
a drawing of ironman in the city, illustration, cartoon, outlines, kid's coloring page, black and white:1.5, white png background, flat 2d --no shading, gradient, colors:1.5, saturation:1.2, colored, shadow:1.1, 3d --ar 3:2
a bouquet of flowers, illustration, cartoon, bold outlines, black and white:1.5, white png background --stylize 250 --no shading, gradient, colors:1.5, saturation, colored, shadows --ar 3:2
a flower bed with butterfly, cartoon, bold lines, black and white:1.5, white background --stylize 250 --no shading, gradient, colors, saturation, colored, shadows

a drawing of a cute princess in front of a castle, illustration, cartoon, outlines, kid's coloring page, black and white:1.5, white png background, flat 2d --no shading, gradient, colors:1.5, saturation:1.2, colored, shadow:1.1, 3d --ar 2:3
Spiral Optical Illusion Image Generation – Stable Diffusion
https://weirdwonderfulai.art/tutorial/spiral-optical-illusion-image-generation-stable-diffusion/ (17 Sep 2023)

There is a whole flurry of images trending right now that have a spiral optical illusion embedded inside an image generated with Stable Diffusion. The result looks quite intriguing, and it is not limited to spirals: you can use other kinds of shapes too, as you will discover in this post.

What do you need?

You need to be running your own local or hosted Stable Diffusion instance. For a local instance you can follow these posts (or this one for SDXL); for hosted instances, follow this post.

You also need the ControlNet extension installed in the Stable Diffusion WebUI, which is quite easy; follow this post if you are new to it. Next, download the ControlNet model called "Controlnet QR Code Monster v2 for SD 1.5" and place the .safetensors file in the "models/ControlNet" folder.

The original model was designed to help create cool-looking QR codes. I have covered that technique in the past using another ControlNet model, which you can explore here.

However, let's continue with the spiral optical illusion and how to create the image.

Image Generation

The process is actually quite straightforward; we will follow the steps using Automatic1111 (v1.6.0), the WebUI interface for Stable Diffusion.

Follow these steps to set up and generate your own images:

  • Enter your positive and negative prompt. For example: a medieval village scene with busy streets and castle in the distance
  • Enable Hires. fix so that your image is upscaled, which brings the illusion to life and produces a higher-resolution image. I used steps: 30 and denoising strength: 0.7 with upscale by 2.
  • Adjust the CFG Scale to taste; I suggest between 8 and 10.5. Experiment with different values.
  • Open the ControlNet panel (you will see this if you have the ControlNet extension installed, read this to learn how) and make sure you check the Enable button. This is a rookie mistake many people make: they set up all the settings, forget to enable it and then wonder why it didn't work.
  • Load the image in the Single Image section, or draw a spiral if you can. I found a spiral vector image (PNG) and used that; you can find many images online, just google them.
  • Next, select the model "control_v1p_sd15_qrcode_monster" from the dropdown. You should have this downloaded if you read all the instructions above.
  • Control Weight should be set between 0.5 and 0.75, as you don't want the vector image to overpower the result. You want the effect to be subtle, which draws the viewer in.
  • Click Generate to generate your image, and that's all there is to it! (If you prefer to script this instead of using the WebUI, see the sketch after this list.)
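
If you prefer to script this outside the WebUI, the same idea can be reproduced with the diffusers library. This is only a rough sketch of my own, not the workflow above: it assumes the QR Code Monster checkpoint is available on Hugging Face as monster-labs/control_v1p_sd15_qrcode_monster, that spiral.png is your control image, and that any SD 1.5 base checkpoint will do.

# Rough diffusers equivalent of the WebUI steps above (assumptions noted in the text).
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "monster-labs/control_v1p_sd15_qrcode_monster", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # swap in any SD 1.5 checkpoint or local path
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

spiral = load_image("spiral.png")  # your black-and-white spiral / shape image

image = pipe(
    "a medieval village scene with busy streets and castle in the distance",
    image=spiral,
    controlnet_conditioning_scale=0.6,  # keep the pattern subtle, like the Control Weight above
    guidance_scale=9.0,
    num_inference_steps=30,
).images[0]
image.save("spiral_village.png")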

Now that you have the basics working well, you can experiment with various other vector PNG images to feed to ControlNet and enjoy the results it produces.

You can create images in different aspect ratios and explore the results.

Why stick to just landscapes? You can explore portraits and other abstract creations. Let your imagination run wild and get creative with these images.

Conclusion

Yet again the world of AI keeps surprising and intriguing us; the uses of the tools being developed are endless. Something that was originally designed for QR code generation can now be used to make some amazing and intriguing images as well.

Share your images and tag me @harmeetgabha on 𝕏, as I’d love to see what you create. You can also find me on other socials with the same handle. If you liked this tutorial I suggest you share it with your friends and spread the word.
