General | Weird Wonderful AI Art – https://weirdwonderfulai.art – ART of the future, now!

Flux – A New Open Source Model to Compete with Midjourney
https://weirdwonderfulai.art/general/flux-a-new-open-source-model-to-compete-with-midjourney/ – Sun, 04 Aug 2024

On August 1, 2024, Black Forest Labs released three new Flux.1 models: Pro, Dev and Schnell. The Pro version is not open source and is available only through their API, but Dev and Schnell are both open source and can be downloaded from their Hugging Face page.

Dev is a higher quality model than Schnell, but Schnell is much faster (4 steps). These are big models though: both weigh a whopping 23.8GB each and require a high amount of VRAM to run. It is recommended that you have 32GB of RAM.

However, don’t be sad, because there is a way to run them on lower-VRAM GPUs. I have an RTX 4080 with 16GB and I can run both Dev and Schnell; the only difference is that Dev takes about 3 minutes to generate a 1024px by 1536px image, while Schnell takes only 30–40 seconds to generate the same.

The buzz at the moment is that these models are on par with Midjourney, and in my testing I have to agree. In fact, they are better in many aspects:

  • Resolution – the model can handle any image size you want, from extremely wide to extremely tall; there is no set resolution you have to adhere to
  • Prompts – it is much better at following the prompt and adhering to its various nuances
  • Quality – quality is much higher in this initial release: hands are better formed, composition is almost always spot on, and facial features are well defined
  • Text – it renders text better than any other model out there, even SD3

Most importantly, it doesn’t apply its own recipe or secret sauce to “improve” your image, so it stays as close to your prompt as possible. With Midjourney, by contrast, the model always adds its own influence to make the image “better”, which can often make it hard to control the image with just a text prompt.

Download

In order to run this, you need ComfyUI (updated to the latest version); then download these files.

Place the model in the models\unet folder, the VAE in models\vae, and the CLIP files in models\clip within your ComfyUI directory. Make sure you restart ComfyUI and refresh your browser.
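As a sketch, the placement step can be scripted. The ComfyUI install path and the Flux file names below are assumptions based on the Hugging Face release at the time of writing, so adjust them to your own setup:

```python
import shutil
from pathlib import Path

# Assumed ComfyUI install location and Flux file names -- adjust to your setup.
COMFY = Path.home() / "ComfyUI"
DESTS = {
    "flux1-dev.safetensors": COMFY / "models" / "unet",  # the diffusion model
    "ae.safetensors":        COMFY / "models" / "vae",   # the VAE
    "clip_l.safetensors":    COMFY / "models" / "clip",  # text encoder
}

for name, dest in DESTS.items():
    dest.mkdir(parents=True, exist_ok=True)      # create the folder if missing
    src = Path("downloads") / name               # wherever you downloaded to
    if src.exists():
        shutil.move(str(src), str(dest / name))  # move the file into place
```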

The default workflows are provided by ComfyAnonymous on their GitHub page.

My adapted workflows are also available for download. I provide two workflows, Text 2 Image and Image 2 Image; just drag the PNG files from the zip into ComfyUI and install any missing nodes using ComfyUI Manager.

Flux.1 Txt2Img and Img2Img Workflows (download)
Preview of Text to Image workflow in ComfyUI (download the zip above)
Preview of Image to Image workflow in ComfyUI (download the zip above)

My Image to Image workflow utilises the Florence 2 LLM and Clip Interrogator (I got the original version online somewhere I can’t recall) to generate an accompanying prompt to help guide Flux. So you have an image influencing the generation plus a text prompt, which together make the result super!

Sample Results

It’s been a wonderful breath of fresh air to get a model that can produce such high-quality, coherent results, which has kicked off the month of August with a bang. I wonder what other excitement awaits us next. For my part, I keep exploring Flux and have already downscaled my Midjourney subscription, but is it time to ditch Midjourney entirely? We will see.

Rouge Noir SDXL LoRA
https://weirdwonderfulai.art/general/rouge-noir-sdxl-lora/ – Thu, 27 Jun 2024

Inspired by the recent post about some really cool LoRAs by Araminta, I got back into training my own LoRA, which I had not done for some time.

The aesthetic is based on black and red tones, creating high-contrast images with silhouettes and reflections of the main subjects. Hence the name Rouge Noir, which is French for Red Black.

I had collected my dataset a long time ago and was using 50+ images in the original style training, but my poor RTX 4080 could not handle a dataset of that size. I then explored RunPod to train my images, which worked well, but at the time I trained a different model instead: one trained on an Octane Render token, resulting in high-quality 3D render images.

However, seeing Araminta’s work re-ignited my passion for training a specific style in SDXL. I updated my Kohya_ss and resumed training my LoRA model. The final training, which took several attempts and lots of evaluation testing, used only 26 diverse images. In the final run I used some tips shared by Araminta in this post to further refine my training settings. The result was a much cleaner trained style: Rouge Noir for SDXL.

I am also considering training this for the SD 1.5 model, but let’s see how you creative people like my LoRA first.

Here are a few more sample results from this trained LoRA model. You can download it here.

These samples were created using base SDXL and other fine-tuned SDXL models that I use in my workflow.

This is personally one of my favourite LoRAs I’ve produced so far and I hope you will enjoy creating some fun images like I have. Don’t hesitate to leave your questions or comments below.

Stable Diffusion 3 in ComfyUI
https://weirdwonderfulai.art/general/stable-diffusion-3-in-comfyui/ – Fri, 19 Apr 2024

Stable Diffusion 3 (SD3) is close to being released, I can sense, as Stability AI have just released APIs that you can access. The APIs themselves are free to access, but you need credits to generate images. You can access the SD3 and SD3 Turbo models, which use 6.5 and 4 credits per image respectively.

And of course, upon its release my favourite ComfyUI creator Zho got to work and produced, on the same day, ComfyUI nodes that let you access the APIs. In this post we will explore these further. Access the English translation if you have trouble understanding the instructions.

Use ComfyUI Manager to install these custom nodes by using the Install from Git option and providing this URL for installation. Once installed, restart ComfyUI using ComfyUI Manager and refresh the ComfyUI page.

The workflow is available too, but you can also create a new two-node workflow: the Stable Diffusion 3 API node and a Save Image node. If you do use the provided workflow, note that Zho uses a Preview node, which means the generated image is not stored/saved in the Output folder. It caught me off guard, and I lost some of the initial images that were generated because they weren’t saved to my Output folder.

The Node features:

  • positive: positive prompt text
  • negative: negative prompt text (not supported by the Turbo model)
  • aspect_ratio: aspect ratio of the images, 9 options in total: “21:9”, “16:9”, “5:4”, “3:2”, “1:1”, “2:3”, “4:5”, “9:16”, “9:21” (not applicable to image-to-image)
  • mode: text to image or image to image
  • model: SD3 or SD3 Turbo
  • seed: seed
  • image: optional, only used for image-to-image
  • strength: optional, only used for image-to-image
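To make those parameters concrete, here is a rough sketch of the kind of request the node presumably assembles behind the scenes. The endpoint path and field names are my assumptions based on Stability AI's public v2beta API, not taken from the node's source, so verify them before relying on this:

```python
def build_sd3_request(api_key, positive, negative="", aspect_ratio="1:1",
                      model="sd3", seed=0):
    """Assemble the headers and form fields for an SD3 generation call."""
    headers = {
        "authorization": f"Bearer {api_key}",
        "accept": "image/*",          # ask for the raw image bytes back
    }
    data = {
        "prompt": positive,
        "negative_prompt": negative,  # ignored by the Turbo model
        "aspect_ratio": aspect_ratio,
        "model": model,               # "sd3" or "sd3-turbo"
        "seed": str(seed),
    }
    return headers, data

headers, data = build_sd3_request("paste-your-key", "two cats on a sofa")
# The actual POST (multipart form) would then go to something like:
#   https://api.stability.ai/v2beta/stable-image/generate/sd3
# e.g. requests.post(url, headers=headers, files={"none": ""}, data=data)
```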

Before you can start using it, you need to get an API key from the Stability AI developer site: navigate to API Keys to create a new one or copy an existing one. You can also top up credits via the Billing page if you need to; $10 gets you 1000 credits, which lets you generate 153 images with SD3 or 250 with SD3 Turbo.
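The per-image counts follow directly from the per-image credit costs (a quick sanity check):

```python
credits = 1000             # credits per US$10 top-up
print(int(credits / 6.5))  # SD3 at 6.5 credits/image -> 153 images
print(int(credits / 4))    # SD3 Turbo at 4 credits/image -> 250 images
```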

Once you have the API key, edit the config.json file of the custom node, found under the “custom_nodes/ComfyUI-StableDiffusion3-API/” folder. Enter the API key you copied into the “STABILITY_KEY” tag as shown below.
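If you prefer not to edit the file by hand, a small script can patch it. The path and the “STABILITY_KEY” field come from the instructions above, but treat the exact file layout as an assumption and check the node's readme:

```python
import json
from pathlib import Path

cfg_path = Path("custom_nodes/ComfyUI-StableDiffusion3-API/config.json")

# Load the existing config if present, otherwise start from an empty one.
cfg = json.loads(cfg_path.read_text()) if cfg_path.exists() else {}
cfg["STABILITY_KEY"] = "paste-your-api-key-here"

cfg_path.parent.mkdir(parents=True, exist_ok=True)
cfg_path.write_text(json.dumps(cfg, indent=2))  # save and close
```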

Save and close the file.

Now when you run the workflow you should get images coming from the API into the ComfyUI Preview/Save node. The custom node also supports image2image, where an input image is used alongside your prompt to generate the resulting image.

Here in this example you can see image2image generation, where the colours are somewhat influenced by the input image.

Overall I see big improvements in text handling, but in terms of image quality it is quite similar to SDXL. SD3 is also more coherent when it comes to understanding counts, so when you specify two cats you will get two cats in your image. I ran out of credits before I could experiment further, so I will wait for the full release of the model, which can be run locally.

For the time being, if you are curious and cannot wait, it’s best to use the ComfyUI workflow for the SD3 API to generate images and experiment.

Top 5 RunPod Templates for AI Image Generation
https://weirdwonderfulai.art/general/top-5-runpod-templates-for-ai-image-generation/ – Wed, 17 Apr 2024

Three weeks away from my desktop forced me to explore the RunPod.io environment more and more. This led to getting comfortable with, and testing, the various template builds available on RunPod.io. So here I will share my opinion on the top 5 RunPod templates that I have found to be feature-rich and versatile.

What is a RunPod Template?

A template is a pre-built script which, when deployed on a RunPod GPU, will execute all the code to build the environment (generally a version of Linux) and run the necessary pre-requisite installations for AI image generation software like ComfyUI and Stable Diffusion. So it’s pretty much select and go!

Access Templates in RunPod

Once logged into your RunPod account, navigate to the Explore menu on the left hand side and in the Search box type in the names of the templates below. I have also provided the name of the template creator so make sure you verify that as well before selecting a template.

RunPod Portal where you will find these Templates

All of the community templates also come with standard Stable Diffusion models pre-downloaded, so you can start creating right away. You can expect to see Stable Diffusion 1.5, Base XL and Refiner XL included. As you will discover below, they are more feature-rich than the official RunPod templates, which have just the standard application and no extra bells and whistles. I don’t recommend using the standard ones.

All the templates provide a readme file, so make sure you review it; you can find it marked by the |?| icon.

1. A1111 Stable Diffusion

Number one for me. This template, provided by ashleykza, is feature-packed to make your experience of using Automatic1111 (a Stable Diffusion web UI) easy. It can run any Stable Diffusion model and provides plenty of controls for managing your environment.

At the time of writing this blog post the version was 1.9.0 (the template name often ends with a version number x.x.x).

It provides:

  • A1111 Stable Diffusion Web UI – the main user interface for running Stable Diffusion and generating images. It is feature-packed, with lots of extensions pre-installed so you can try them without having to install them yourself. It also has a Civitai extension that makes it easy to install community models, LoRAs, etc.
Automatic1111 Web UI version 1.9.0
  • Application Manager – lets you restart different components of the environment. Use it in case you get a CUDA error or something gets stuck; restarting the WebUI should clear any errors and get you back in a few seconds.
Application Manager interface
  • Jupyter Lab – file explorer, command-line terminal and IPYNB notebook support. This comes in very handy if you need to upload a specific model or file.
Jupyter Lab with File Explorer, Command Line Terminal, and Notebook support
  • RunPod File Uploader – client software provided by the RunPod team to let you upload large files easily.
RunPod file uploader

There are many more details, and the provider keeps updating the template, so refer to its readme.

2. ComfyUI – Jupyter

At number two, this template is provided by ghcr.io and will always contain the latest version of ComfyUI. ComfyUI is a wonderful interface that lets you build custom workflows and easily access/share these workflows with the community. It is backed by a huge community of users who are always innovating and building new extensions for it.

It provides:

  • ComfyUI – main comfy interface with ComfyUI Manager already installed. This UI lets you design and execute advanced stable diffusion pipelines using a graph/nodes/flowchart based interface. You can use ComfyUI Manager to also download models and install extensions.
ComfyUI interface
  • JupyterLab – file explorer, command-line terminal and IPYNB notebook support. This comes in very handy if you need to upload a specific model or file.
Jupyter Lab with File Explorer, Command Line Terminal, and Notebook support
  • Service Manager – a service manager application that lets you stop and start ComfyUI. It also lets you monitor the ComfyUI logs.

Make sure you review the readme file provided with the template, represented by the |?| icon.

3. Stable Diffusion Kohya_ss ComfyUI Ultimate

At number three, this template is provided by ashleykza and combines three very powerful tools in a single instance: Automatic1111 (Stable Diffusion Web UI), ComfyUI and Kohya_ss (used to train models, LoRAs, etc.). As this template is by the same creator as #1, all the features and functions mentioned above are included, plus the additions listed below.

It includes:

  • All the goodness of the above template #1 and more.
  • Kohya_ss – a web UI that can be used to prepare and build a dataset from collected images to train your own model or LoRA. I used this template when I trained my own LoRA, as covered in my earlier post.
  • ComfyUI – main comfy interface with ComfyUI Manager already installed. This UI lets you design and execute advanced stable diffusion pipelines using a graph/nodes/flowchart based interface. You can use ComfyUI Manager to also download models and install extensions.

4. Kohya_ss

This template is provided by ashleykza and provides a standalone instance of Kohya_ss, so you can run a dedicated instance for training alone. You’d need another RunPod or local instance to test the trained model or LoRA you were working on.

It includes:

  • Kohya_ss – a web UI that can be used to prepare and build a dataset from collected images to train your own model or LoRA. I didn’t use this one when I trained my own LoRA, as covered in my earlier post, because I was testing the models in RunPod as well.
  • JupyterLab – file explorer, command-line terminal and IPYNB notebook support. This comes in very handy if you need to upload a specific model or file.
  • Application Manager – lets you restart different components of the environment. Use it in case you get a CUDA error or something gets stuck; restarting the WebUI should clear any errors and get you back in a few seconds.

5. Stable Diffusion WebUI Forge

This template is provided by ashleykza and provides a standalone instance of Stable Diffusion WebUI Forge. Forge is a version of Automatic1111 optimised for low VRAM usage. However, being an optimised version makes it quite niche, and as a result some extensions have not been developed or modified to work on it. For example TileDiffusion, which I have been using extensively in Automatic1111 and have documented my workflow around, is not available in Forge 🙁

However, if you don’t need those extensions and just want to generate images at low cost, you can run this on a 16GB VRAM GPU on RunPod, which costs $0.35 per hour. It’s the cheapest GPU.

Stable Diffusion Forge UI

It includes:

  • Forge
  • Jupyter Lab
  • RunPod File Uploader

Conclusion

In closing, I have enjoyed exploring these templates and created some cool images that I would not have been able to make on my local GPU due to its 16GB limit. For less than a dollar an hour I can access RunPod GPUs with 48GB of VRAM, which is incredible value for money. Newer models will definitely require even more VRAM, which makes it difficult for an individual to justify the cash for a 24GB or 48GB GPU.

I wanted to share my top 5 recommendations that you can try on RunPod, and to benefit from the work of the community members who built these templates. I’m very thankful for it.

Unlimited Smartphone Wallpapers with Midjourney + ChatGPT
https://weirdwonderfulai.art/general/unlimited-smartphone-wallpapers-with-midjourney-chatgpt/ – Thu, 04 Apr 2024

Midjourney is by far the leading text-to-image service and can create the most realistic images possible. Now combine it with the power of ChatGPT (the free v3.5 version) and you have a very powerful combination.

I came across this post on X by Umesh which is the inspiration behind this blog post.

ChatGPT Prompt

Submit the prompt below to ChatGPT, which will get to work and create 10 prompts for you to begin with. Don’t have ChatGPT? Don’t worry, just scroll down to the prompts below.

Imagine you are a world renowned artist who focuses on minimalistic visual design for background wallpaper. Get creative and write a prompt for text to image that describe such visual creations that are serene, beautiful, award winning images. Append at the end --ar 9:16 --stylize 400 --style raw Give me 10 prompts to begin with, they must be at least 50 words, use sophisticated language.

ChatGPT will produce 10 prompts which you can take and run in Midjourney. The --stylize (--s) parameter controls the stylization of the images (you can read more about this in this post) and --ar is the aspect ratio; a tall ratio suits most smartphones.

Once you have the first 10 generated, you can simply prompt “10 more” and it will give you 10 new prompts. You can keep going as long as you like.
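If you want to batch this rather than use the chat UI, the same loop can be sketched against OpenAI's Chat Completions HTTP API. The endpoint and JSON shape below reflect the public API as I understand it, and the network call itself is left commented out; treat it as a template, not tested client code:

```python
import json
import urllib.request

SYSTEM = ("Imagine you are a world renowned artist who focuses on minimalistic "
          "visual design for background wallpaper.")

def make_request(n_prompts=10, api_key="paste-your-openai-key"):
    """Build a Chat Completions request asking for wallpaper prompts."""
    body = {
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content":
                f"Give me {n_prompts} prompts of at least 50 words each, "
                "ending with --ar 9:16 --stylize 400 --style raw"},
        ],
    }
    return urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
    )

req = make_request(10)
# resp = urllib.request.urlopen(req)   # performs the actual call
# print(json.load(resp)["choices"][0]["message"]["content"])
```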

Midjourney Prompts

Now for the really fun part, where we get to see the results of the prompts generated with ChatGPT. I’m using the Midjourney Alpha site, but you can do the same in their Discord or by directly messaging the Midjourney Bot.

Here is a collection of prompts I generated with ChatGPT; feel free to use them and modify them to your liking. But do have fun creating new prompts with ChatGPT.

Sample prompts

Delicate intertwining lines resembling a dance of ribbons, evoking a sense of graceful movement and unity. Soft pastel hues gently blend, creating a serene ambiance. Each curve whispers tranquility, inviting contemplation and calm. 8k, hd, high resolution, stunning --ar 9:16 --stylize 400 --style raw

Abstract geometric shapes intersect in harmonious balance, radiating modern elegance. Subtle gradients shift seamlessly, infusing depth and dimension. Every angle captivates with its precision and sophistication. A minimalist masterpiece, embodying refined simplicity and timeless allure. 8k, hd, high resolution, stunning --ar 9:16 --stylize 400 --style raw

Ethereal clouds drift across a canvas of muted tones, casting a dreamlike aura. Soft edges blur boundaries, blurring the line between reality and imagination. Layers of translucent veils suggest hidden depths, inviting exploration. A tranquil escape, offering solace and serenity to the soul. 8k, hd, high resolution, stunning --ar 9:16 --stylize 400 --style raw

Fluid waves ripple in a symphony of light and shadow, mesmerizing the senses. Subdued colors dance in perfect harmony, exuding quiet sophistication. Each crest and trough tells a silent story of motion and grace. A captivating composition, capturing the essence of fluidity and grace. 8k, hd, high resolution, stunning --ar 9:16 --stylize 400 --style raw

Whispers of nature emerge in an abstract tapestry of organic forms, imbued with gentle vitality. Earthy hues mingle with soft whispers of green, evoking a sense of natural tranquility. Every curve and contour echoes the rhythm of life, celebrating the beauty of imperfection. A serene sanctuary, harmonizing with the soul's longing for connection. 8k, hd, high resolution, stunning --ar 9:16 --stylize 400 --style raw

Bold strokes of monochrome elegance converge in a study of contrast and composition. Stark black lines carve pathways through a sea of pristine white, commanding attention with their stark simplicity. Negative space breathes life into the design, inviting reflection and introspection. A timeless statement of minimalist beauty, resonating with understated power and poise. 8k, hd, high resolution, stunning --ar 9:16 --stylize 400 --style raw

Ethereal petals cascade in a gentle ballet of color and light, evoking the delicate beauty of a spring morning. Soft pastel hues dance across the canvas, imbuing the scene with a sense of enchantment. Each petal seems to shimmer with a whispered promise of renewal and hope. A serene ode to the ephemeral beauty of nature's embrace. 8k, hd, high resolution, stunning --ar 9:16 --stylize 400 --style raw

Gleaming metallic strands intertwine in a mesmerizing display of industrial elegance. Polished surfaces catch the light, casting shimmering reflections that dance across the screen. Each line and curve exudes a sense of meticulous craftsmanship, inviting admiration for its intricate detail. A modern marvel, blending sophistication with a touch of avant-garde allure. 8k, hd, high resolution, stunning --ar 9:16 --stylize 400 --style raw

Celestial bodies converge in a cosmic ballet of light and shadow, painting the night sky with a symphony of color. Soft gradients meld seamlessly, capturing the ethereal beauty of distant galaxies. Each star seems to shimmer with the promise of endless possibility, inviting contemplation of the vastness of the universe. A celestial masterpiece, whispering secrets of the cosmos. 8k, hd, high resolution, stunning --ar 9:16 --stylize 400 --style raw

Geometric patterns intersect and overlap in a kaleidoscope of vibrant hues, pulsating with dynamic energy. Bold lines carve pathways through the chaos, guiding the eye on a journey of discovery. Every shape and color seems to vibrate with a playful intensity, infusing the scene with a sense of joyful exuberance. A visual symphony, celebrating the beauty of spontaneity and creativity. 8k, hd, high resolution, stunning --ar 9:16 --stylize 400 --style raw

Crystalline shards shimmer with iridescent hues, casting prismatic reflections that dance across the screen. Each facet glows with an inner light, illuminating the darkness with its ethereal beauty. Soft gradients blend seamlessly, lending an air of otherworldly enchantment to the scene. A captivating kaleidoscope, capturing the magic of the infinite cosmos. 8k, hd, high resolution, stunning --ar 9:16 --stylize 400 --style raw

Serene ripples cascade across a tranquil pool, echoing the gentle whispers of a secluded oasis. Subtle gradients blend seamlessly, evoking a sense of calm and serenity. Each ripple seems to hold a secret, inviting the viewer to lose themselves in its tranquil depths. A serene sanctuary, offering solace and renewal to the weary soul. 8k, hd, high resolution, stunning --ar 9:16 --stylize 400 --style raw

Interlocking circles spiral in an endless dance of symmetry and harmony, exuding a sense of timeless elegance. Soft pastel hues mingle with shimmering metallic accents, casting a subtle glow that mesmerizes the senses. Each curve and contour whispers of infinite possibility, inviting contemplation and introspection. A visual symphony, celebrating the beauty of balance and unity. 8k, hd, high resolution, stunning --ar 9:16 --stylize 400 --style raw

Wisps of fog drift through a forest of towering trees, cloaking the landscape in an ethereal haze. Soft gradients blend seamlessly, casting a dreamlike aura over the scene. Each tree stands as a silent sentinel, bearing witness to the passage of time. A tranquil retreat, offering respite from the chaos of the world. 8k, hd, high resolution, stunning --ar 9:16 --stylize 400 --style raw

Swirling galaxies converge in a cosmic ballet of light and shadow, painting the night sky with a symphony of color. Soft gradients meld seamlessly, capturing the ethereal beauty of distant stars. Each nebula seems to shimmer with a whispered promise of wonder, inviting contemplation of the mysteries of the universe. A celestial masterpiece, echoing the secrets of the cosmos. 8k, hd, high resolution, stunning --ar 9:16 --stylize 400 --style raw

Intricate mandala patterns unfurl in a mesmerizing display of symmetry and precision, drawing the eye into a hypnotic trance. Vibrant hues dance across the screen, casting a kaleidoscope of color that captivates the senses. Each intricate detail seems to pulse with a rhythmic energy, inviting the viewer to lose themselves in its intricate beauty. A visual symphony, celebrating the artistry of the cosmos. 8k, hd, high resolution, stunning --ar 9:16 --stylize 400 --style raw

Gossamer threads intertwine in an intricate web of delicate beauty, casting shimmering reflections that dance across the screen. Soft pastel hues blend seamlessly, imbuing the scene with a sense of ethereal grace. Each thread seems to shimmer with a whispered promise of wonder, inviting contemplation of the interconnectedness of all things. A serene sanctuary, offering solace to the weary soul. 8k, hd, high resolution, stunning --ar 9:16 --stylize 400 --style raw

Abstract waves ripple across the surface of a tranquil lake, casting shimmering reflections that dance in the moonlight. Soft gradients blend seamlessly, lending an air of serenity to the scene. Each ripple seems to hold a secret, inviting the viewer to lose themselves in its tranquil embrace. A serene sanctuary, offering respite from the chaos of the world. 8k, hd, high resolution, stunning --ar 9:16 --stylize 400 --style raw

Vibrant blooms unfurl in a riot of color and texture, painting the landscape with the vibrant hues of spring. Soft pastel tones mingle with bold strokes of color, casting a kaleidoscope of light that dances across the screen. Each petal seems to shimmer with a whispered promise of renewal, inviting contemplation of the beauty of nature's ever-changing canvas. A floral symphony, celebrating the joy of new beginnings. 8k, hd, high resolution, stunning --ar 9:16 --stylize 400 --style raw

Bold strokes of monochrome elegance intersect in a mesmerizing display of contrast and composition, drawing the eye into a hypnotic trance. Stark black lines carve pathways through a sea of pristine white, commanding attention with their stark simplicity. Each curve and contour exudes a sense of meticulous craftsmanship, inviting admiration for its intricate detail. A timeless statement of minimalist beauty, resonating with understated power and poise. 8k, hd, high resolution, stunning --ar 9:16 --stylize 400 --style raw

Resulting Wallpapers

I have to say the results you get out of Midjourney using these ChatGPT-generated prompts are pretty incredible.

Conclusion

Incredible results from combining the power of these two awesome tools. But you need not be limited to smartphone wallpapers: change the aspect ratio to --ar 16:9 and you get widescreen wallpapers for your computer or tablet. Make sure you upscale them to create higher-resolution images for the best experience.

I really enjoyed generating a ton of wallpapers using this technique. You should also check out my ScreenCandi wallpapers, dedicated to AI-generated images for your smartphone.

Stable Diffusion 3 – Summary of the Paper
https://weirdwonderfulai.art/general/stable-diffusion-3-summary-of-the-paper/ – Tue, 05 Mar 2024

Improving Text-to-Image Synthesis with Multi-Modal Diffusion Models

Text-to-image synthesis is a rapidly growing field that has gained significant attention in recent years. The goal of text-to-image synthesis is to generate an image based on a given text description. This task has numerous applications, such as in computer vision, natural language processing, and multimedia processing.

In this blog post, we’ll explore how Stable Diffusion 3’s Multi-modal Diffusion Transformer (MMDiT) can be used to improve text-to-image synthesis. We’ll discuss the current state-of-the-art methods, the limitations of existing approaches, and the potential benefits of using multi-modal diffusion models. All of this is based on the recently released Stable Diffusion 3 paper.

Current State-of-the-Art Methods

Current state-of-the-art methods for text-to-image synthesis involve using generative adversarial networks (GANs) or variational autoencoders (VAEs) to generate images based on text descriptions. These methods have achieved impressive results, but they have limitations. For example, they often produce images that are not semantically consistent with the text description or that lack detail and realism.

Limitations of Existing Approaches

One of the main limitations of existing approaches to text-to-image synthesis is that they rely solely on text-to-image models. These models are trained on large datasets of text-image pairs and learn to generate images based on the text descriptions. However, these models often produce images that are not semantically consistent with the text description or that lack detail and realism.

Another limitation of existing approaches is that they do not take into account the multi-modal nature of the text-to-image task. Text-to-image synthesis involves generating an image based on a text description, but it also involves understanding the text and the image simultaneously. This requires a deep understanding of the text and the image, as well as their relationships with each other.

Multi-Modal Diffusion Models

Multi-modal diffusion models offer a promising solution to the limitations of existing approaches to text-to-image synthesis. These models are trained on large datasets of text-image pairs and learn to generate images based on the text descriptions. However, they also take into account the multi-modal nature of the text-to-image task.

Multi-modal diffusion models use a combination of text and image features to generate images. These features are learned during training and are used to generate images that are semantically consistent with the text description. The models also use a combination of text and image features to generate images that are visually plausible and aesthetically pleasing.
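The “combination of text and image features” is easiest to see in miniature. The toy sketch below runs plain dot-product attention over one concatenated sequence of text and image tokens, which is the core idea behind MMDiT’s joint attention; real MMDiT additionally uses separate learned projections per modality, which this illustration omits:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def joint_attention(text_tokens, image_tokens):
    """Attention over the concatenation of both modalities' tokens."""
    tokens = text_tokens + image_tokens  # one joint sequence
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Each output is a weighted mix of *all* tokens, text and image alike.
        out.append([sum(w * v[d] for w, v in zip(weights, tokens))
                    for d in range(dim)])
    return out

mixed = joint_attention([[1.0, 0.0]], [[0.0, 1.0], [1.0, 1.0]])
```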

Advantages of Multi-Modal Diffusion Models

There are several advantages to using multi-modal diffusion models for text-to-image synthesis. One advantage is that these models can generate images that are semantically consistent with the text description. This means that the images generated by these models are more likely to be accurate and realistic.

Another advantage of multi-modal diffusion models is that they can generate images that are visually plausible and aesthetically pleasing. This means that the images generated by these models are more likely to be attractive and engaging.

Future Research Directions

There are several future research directions for improving text-to-image synthesis with multi-modal diffusion models. One is to explore the use of other modalities, such as audio or video; another is to apply techniques such as reinforcement learning or transfer learning.

Conclusion

In conclusion, multi-modal diffusion models address key limitations of earlier text-to-image approaches: they model the multi-modal nature of the task and generate images that are both semantically consistent with the prompt and visually plausible. Future work includes exploring additional modalities and training techniques.

10 RunwayML Multi Motion brush examples
https://weirdwonderfulai.art/general/10-runwayml-multi-motion-brush-examples/ Sat, 20 Jan 2024 05:43:00 +0000

2024 is accelerating the wave of video tools that can animate and create motion in still images using AI. RunwayML has released some new motion brushes (yes, plural): there are five motion brushes that you can use to select different objects in your image and then define the motion you want each to take.

In this video I show you my first attempt, which is pretty ordinary and doesn't count towards the examples below; those are much better. I must mention that I have not played much with RunwayML's video tools, but I think that time has come.

Framer

A wonderful example of a product showcase using RunwayML's new Motion Brush. Make sure you follow @0xFramer

Blaine Brown

Blaine demonstrates how, using multiple brushes, you can bring a still image to life. Can you feel the terror in this clip? Follow @blizaine

Christopher Fryant

Chris takes the classic Inception city-bending scene to RunwayML and is easily able to replicate the movements as originally done in the movie. Follow @cfryant

Abel Art

Multi object movement using the brushes from RunwayML, wonderfully demonstrated by Abel. Check out @a_b_e_l_a_r_t profile and give Abel a follow.

Shimayuz

Check out a simple yet very nice example of movement between two subjects in a scene. Follow @shimayus

Dan Dawson

Dan creates an action-packed scene using RunwayML; there's smoke, fire, explosions and a chase! Follow @dan_dawson_

Always Editing

A very cute application of the RunwayML brushes where 5 ducks are animated above and under the water, very clever. Follow @notiansans

Rory Flynn

In this clip Rory takes a few images and shows the animations that can be created using RunwayML. Check it out and try these simple things yourself; I know I will be playing with them. Make sure you follow @ror_fly

Vincenzo Cosenza

Vince plays with fire on water in this wonderful display of animation he created. Follow @vincos

RunwayML

Last but not least we have a super cool example by RunwayML team themselves shared on their X account. Check out RunwayML yourself and start your free account. Follow them on X @runwayml

How I re-created a T-Shirt design with Midjourney
https://weirdwonderfulai.art/general/how-i-re-created-a-t-shirt-design-with-midjourney/ Fri, 05 Jan 2024 19:40:00 +0000

On a recent trip to Queenstown, New Zealand I came across this cool T-shirt at one of Global Culture's downtown stores and took a photo of the design. I realise now, as I write this, that the image is also available on their website.

I ran the image through Midjourney to see how it would describe it, and it suggested a few prompts. Note that the Midjourney version used for this post was v6 alpha.

I started with the first one, which seemed the easiest to begin with, to see what I'd get. The initial Midjourney prompt was: a sheep is wearing sunglasses with a beanie, sunglasses have reflection of mountainous vistas, in the style of t-shirt printing, digitally enhanced, blink-and-you-miss-it detail, wildstyle --ar 3:4 --v 6.0

The result I got was okay and let me identify what was missing. I decided to add more detail about the specific location and to make the image more illustrative. So my second attempt was with this Midjourney prompt: an illustration of a sheep wearing sunglasses with a beanie, sunglasses have reflection of mountainous vistas of “the remarkables” from Queenstown New Zealand, in the style of t-shirt printing, digitally enhanced, blink-and-you-miss-it detail, wildstyle --ar 3:4 --v 6.0

The Remarkables is the magnificent mountain range that forms the backdrop south of Queenstown in New Zealand, and its beauty gives the European Alps a run for their money, in my opinion. It's also only 3 hours away from Sydney (where I'm based), as opposed to the 20-24 hour journey by air to Europe.

Now the mountain range starts to resemble The Remarkables, and the image produced is illustrative in its look and feel. However, because “the remarkables” appears in quotes, Midjourney starts to render it as text in the images above; if you look closely there is something that resembles lettering. Also, for the t-shirt design to match the original I needed a black background and aviator-style sunglasses.

So the third Midjourney prompt became: an illustration of a sheep wearing aviator sunglasses with a beanie, sunglasses have reflection of mountainous vistas of the remarkables from Queenstown New Zealand, in the style of t-shirt print, digitally enhanced, blink-and-you-miss-it detail, black background --ar 3:4 --v 6.0

I felt very happy that I managed to reproduce a design similar to the original t-shirt with Midjourney; the only slight variation is the coloured image above versus the mostly black-and-white original (with its neon green frame). I could stop here and do the rest of the edits in Photoshop, but why not go a little further and get closer to the original? Let's see what Midjourney can do if I describe more specific details about the desired image, since v6 alpha is supposed to be more language-aware and coherent.

The final adjusted Midjourney prompt was: a black and white illustration of a sheep head wearing aviator sunglasses with a beanie, beanie is plain knitted, aviator sunglasses have a neon green frame, sunglasses have reflection of mountainous vistas with view is Queenstown city in New Zealand, in the style of t-shirt print, digitally enhanced, blink-and-you-miss-it detail, black background --ar 3:4 --v 6.0

I'm pretty happy with the result, even though some of the aviator style has been lost; even after several re-creations it couldn't stay consistent with the sunglasses frame, or maybe there is a more accurate way to describe that style of sunglasses. Anyway, I am quite happy with the results and impressed with the possibilities this presents to any avid Midjourney user: they can create their own t-shirt designs.

Review of Midjourney v6 Alpha
https://weirdwonderfulai.art/general/review-of-midjourney-v6-alpha/ Sun, 24 Dec 2023 07:29:23 +0000

Midjourney has been making small incremental improvements over the last few versions, but the expectation with version 6 is that a bigger leap and change is coming. In some of their shares the Midjourney team have highlighted several aspects that will improve, such as:

  • Prompt interpretation
  • Details across the image – fewer artefacts
  • Upscaled quality improvements
  • Text Generation
  • Upscale Subtle or Creative

I played around with the rating party earlier in the week and found some really nice gems that I saved, but now the alpha release is out so you can try it out yourself. You can use the switch --v 6, or use the /settings command to set v6 alpha as your default.

Prompt Interpretation

A simple test is describing various objects next to each other with different colours assigned to them. My test prompt for this: a red book on a wooden table with a white cup

Version 6 alpha understands this well and is able to render the desired image consistently: the book is red and the cup is white in all the images produced, and the table is wooden.

Result produced with Midjourney v6 alpha

The image below is the v5.2 result with the same prompt, but as you can see the cup is white in only 1 of 4 images. The book is red in 3 of 4, and yet the background is somehow red in 2 of 4 images.

Results produced by Midjourney v5.2

So the consistency and coherence to the prompt is much stronger in the version 6 alpha than its predecessors.

To further evaluate prompt coherence I employed a good friend called ChatGPT to describe a more detailed scene, and it produced a prompt that went something like: A visually captivating still life features a vibrant bouquet of flowers, a bowl of assorted fruit, an aged book, a ceramic teapot, and a dynamic abstract painting. The carefully balanced composition showcases a harmonious interplay of colors, textures, and forms, inviting viewers to appreciate the beauty in the ordinary and extraordinary.

There are a lot of subjects described here, and I wondered how the model would interpret this. Check out the comparison below by sliding the slider left to right: the left side is the v5.2 image and the right side is v6.0 alpha.

Surprisingly, the v6 alpha images are more consistent with the prompt: the book appears in all of them and the variety of fruit is there, whereas the v5.2 images struggle to stay consistent with the prompt; the book is missing and there are only a couple of fruit types. And as you can see, v5.2 interprets the whole thing as a painting in grid position 1.

People

Human beings are much more realistic and the details are vastly improved: skin is no longer imitated by small squiggles and artifacts but rendered as actual skin with pores and texture. There are also two kinds of upscalers available: Subtle, which adds fine detail, and the more pronounced Creative upscale.

I downsized the images for the web after upscaling them using the Subtle Upscale option. However, it's not hard to notice the detail in the eyes, eyebrows, skin pores, imperfections and lips. If you didn't know you were looking at an AI-generated human, you wouldn't have a clue that this is not a real person.

You can also see skin folds and creases that should naturally be there. Look at her right shoulder in the image above: you can see the pores and tiny hairs on the skin. The only odd thing in the image is that one of the eyelashes starts to mash into the sunglasses, but you only notice that at 1:1 zoom on the full high-res image.

You can get more creative with your images and the details continue to stick.

Here is another example which demonstrates the power of the new model: the hairs in the beard and on the head are more pronounced than ever before, along with the skin pores and a mole or skin tag on the forehead. Due to the depth of field the skin is softened a touch, though this could also be due to the Subtle upscale. The jacket has a denim texture and weave like the fabric should, and the double stitching is present as it should be.

When I upscaled with Creative I saw a lot more detail in the resulting image. Here is a 1:1 zoom snippet of some sections.

Notice the pores on the nose and skin, and the eyebrow hair. The eyelashes are well formed and the iris is also very natural. There are even tiny details in the nose bridge of the eyeglasses.

Text Creation

As I write this post on the eve of Christmas, I thought I would create some images that have the theme of the moment. The first creation I made was using the prompt: a Christmas theme wallpaper for your smartphone, with text “Merry Christmas” written in white color, beautiful and elegant, vibrant color tones, centered --ar 9:16 --style raw --v 6.0

The results were surprisingly good: the text is written in fancy seasonal fonts that fit the moment and the lettering is correctly spelled. However, after that initial success the next few generations fell apart, with the lettering no longer correct.

Overall the images are very beautiful and very well composed, but the text and lettering are not correct. I went on to try generating some Christmas and New Year's cards.

The prompt for the new year: a cityscape night scene with fireworks overhead, with text “2024” written in white, beautiful elegant, vibrant colors, centered --ar 3:2 --style raw --v 6.0

Text rendering seems to be getting better with Midjourney v6 alpha, and it is a clear improvement over version 5.2, but my repeated attempts kept failing and the resulting images did not have correctly spelled text. Although the characters form better, the new model still struggles to spell the text correctly. It's hit and miss, it seems.

Cars

Another passion of mine is cars. I like my luxuries and the engineering of European cars, so I had to try out Midjourney v6 alpha and see how good its cars would be.

First we start off with a muscle car from the States. It correctly renders the shapes and lines of the car, with the logo and the 5.0 lettering appearing in the correct locations and easily recognizable.

Then my current ride, a Mercedes-Benz C43, looks nice: the double fin on the front correctly resembles the AMG version, and although very tiny, the AMG lettering is wedged in between the two fins. A normal C200-C300 (non-AMG) model would have two separated fins.

Let's get some Porsches from Midjourney, and man, are these lines and shapes nice or what. The logo, however, is not correctly rendered in the version I upscaled.

As we are in the woods, and maybe we've been having a bit of fun sliding the car around, you can see some dirt on the rear bumper/fender and on the tires. Dried pine sticks are on the ground with some grass and moss. Just gorgeous!

Conclusion

Even though the Midjourney model can do a lot more, I particularly wanted to focus on the detail and quality of the images in a few areas I wanted to explore. The v6 alpha model is certainly much improved over its predecessor and has better prompt coherence when interpreting context. It is consistent in imagery and quality, but still lags in text rendering, which is not quite there yet; perhaps improvements will come in the final version or in future versions.

The details have certainly improved when you upscale an image, and this is apparent in the people images generated above. I'd love to compare this quality in a future post against Magnific AI, which I have already covered on this blog (Magnific.AI Upscaler) and compared head to head in Upscaler Comparison Midjourney vs Magnific AI.

Stable Video Diffusion – Cyberpunk Cityscape
https://weirdwonderfulai.art/general/stable-video-diffusion-cyberpunk-cityscape/ Fri, 01 Dec 2023 10:42:00 +0000

I came across this video while scrolling through my daily feeds, and the author (videojongleur) shared the tools used to create it. Among a combination of things, the potential of Stable Video Diffusion (SVD) is showcased here.

Teamed with a creative mind, a concept and the right execution, you can create something really cool with little effort. The result is truly wonderful.

This cyberpunk video titled “Night Pulse: A Cyberpunk Cityscape” is based on a poem that the author claims was created with GPT-4, including the prompts.

Stable Video – Night Pulse: A Cyberpunk Cityscape, by u/Videojongleur in r/StableDiffusion

Videojongleur explains that the tools used were as below:

  • Concept, image prompts, poem: ChatGPT (GPT-4)
  • Image generation: Stable Diffusion XL (JuggernautXL model)
  • Video generation: Stable Video Diffusion
  • Upscale and interpolation: Waifu2x GUI
  • Voiceover: Elevenlabs
  • Animated text: After Effects
  • Video Editing and Sound Design: Premiere Pro

So far this is the best work I have come across online demonstrating the wonderful application of Stable Video Diffusion.
