Stable diffusion 3.5 support. #2578

LaurentMazare · 2024-10-26T18:56:48Z

Image generated with 3.5 turbo.

LaurentMazare · 2024-10-26T19:04:46Z

Another image generated with 3.5 large (non-turbo).

super-fun-surf · 2024-10-26T21:26:33Z

nice work cleaning up and integrating thank you.
where do you learn about how to make these 3.5 specific changes?

LaurentMazare · 2024-10-26T22:21:44Z

where do you learn about how to make these 3.5 specific changes?

Mostly by looking at the python reference implementation here: https://github.com/Stability-AI/sd3.5

Czxck001

I looked into sd3.5/mmdit-x.py in comparison with sd3-ref/mmdit.py. Looks like so far this PR is missing some changes to MMDiT blocks, including

an extra attention for the MMDiT-X block is placed here.
modified pre_attention for x_block (code here)
modified post-attention for x_block (code here)
and different block-joining (between x and context) is present here.

I noticed you already finished adapting the projection layer. Looks like this PR is still WIP or maybe changes in some files are missing in the pushing? Let me know if I got it right and if/how I can help.

Interestingly, I am still able to reproduce generating beautiful images, though. I'm not sure if my guess is right, but perhaps the original SD3.0's MMDiT could simply benefit from stacking more layers without having to change the architecture.

Czxck001 · 2024-10-26T23:34:46Z

candle-examples/examples/stable-diffusion-3/main.rs

-    #[arg(long, default_value_t = 4.0)]
-    cfg_scale: f64,
+    #[arg(long)]
+    cfg_scale: Option<f64>,

    // Time shift factor (alpha).
    #[arg(long, default_value_t = 3.0)]


I noticed SAI's reference implementation of sd3.5 seems to change the recommended time_shift for sd3.0-medium from 3.0 to 1.0, but it's for dpmpp_2m sampler (code here). This wasn't the case in their original reference impl for sd3.0-medium, which is always 3.0 but for euler sampling (code here).

Not sure if it's worth to change the default behavior here. The difference should be small, though.

Right, as both the scheduler and the parameter changed I preferred keeping everything as they were before and not modify the existing behavior.

Czxck001 · 2024-10-26T23:38:22Z

candle-examples/examples/stable-diffusion-3/main.rs

+        {
+            let vb_vae = vb_fp16.rename_f(sd3_vae_vb_rename).pp("first_stage_model");
+            let autoencoder = build_sd3_vae_autoencoder(vb_vae)?;
+
+            // Apply TAESD3 scale factor. Seems to be significantly improving the quality of the image.
+            // https://github.com/comfyanonymous/ComfyUI/blob/3c60ecd7a83da43d694e26a77ca6b93106891251/nodes.py#L721-L723
+            autoencoder.decode(&((x / 1.5305)? + 0.0609)?)?
+        }


Good to know the VAE hasn't changed so this part could share across sd3.0 and sd3.5. Maybe it's worth to wrap those lines into a small function to avoid duplication, in case we need to change or tweak this part in the future. We can still keep the call site into a curly brace scope to control the lifecycle of VAE model.

Good point, I've pushed some changes to avoid a bit of the duplication (so the vae application and creation as well as the text model/mmdit application)

LaurentMazare · 2024-10-27T05:38:38Z

Thanks for looking into this, note that the new attention mechanism that you mentioned is I think triggered based an the existence of the attn2 weights here, but these do not exist for the published model so all the logic predicated by x_block_self_attn that you mentioned doesn't get used here. So overall I feel that we can avoid supporting this for the time being and revisit in a couple days when sd3.5 medium gets released in case it actually includes the new attention layers.

Czxck001 · 2024-10-27T07:07:16Z

@LaurentMazare Got it! I didn't expect that. Seems the new archetecture is for SD3.5-medium only. Yes I agree we should wait until SD3.5-medium comes out, which should be around the corner on Oct 29. I think I should have the bandwidth to do the support and testing over the next weekend. Let me know if you want me to be on it.

This PR LGTM. Thanks for the prompt support!

LaurentMazare · 2024-10-27T09:09:06Z

Thanks, yeah let's have a look on the 29th and if it needs the new cross attention it would be great if you can add it.

LaurentMazare · 2024-10-29T16:34:02Z

The medium model is out now https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/tree/main and it seems to have some attn2 layers so I guess we will need to support it properly, would be great if you have the time to look at it.

Czxck001 · 2024-10-30T04:47:28Z

@LaurentMazare Done in #2587

* Stable diffusion 3.5 support. * Clippy fixes. * CFG fix. * Remove some unnecessary clones. * Avoid duplicating some of the code.

Stable diffusion 3.5 support.

27e3a4f

LaurentMazare mentioned this pull request Oct 26, 2024

Add Stable Diffusion 3 Example #2558

Merged

LaurentMazare added 2 commits October 26, 2024 20:59

Clippy fixes.

2de2747

CFG fix.

cb74318

Remove some unnecessary clones.

6a60fe0

LaurentMazare mentioned this pull request Oct 26, 2024

Support for Stable Diffusion 3.5 Large #2574

Open

Czxck001 reviewed Oct 27, 2024

View reviewed changes

Avoid duplicating some of the code.

a7e3733

LaurentMazare merged commit 37e0ab8 into main Oct 27, 2024
10 of 11 checks passed

LaurentMazare deleted the sd3.5 branch October 27, 2024 09:01

EricLBuehler pushed a commit to EricLBuehler/candle that referenced this pull request Nov 26, 2024

Stable diffusion 3.5 support. (huggingface#2578)

81364b3

* Stable diffusion 3.5 support. * Clippy fixes. * CFG fix. * Remove some unnecessary clones. * Avoid duplicating some of the code.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stable diffusion 3.5 support. #2578

Stable diffusion 3.5 support. #2578

LaurentMazare commented Oct 26, 2024

LaurentMazare commented Oct 26, 2024

super-fun-surf commented Oct 26, 2024

LaurentMazare commented Oct 26, 2024

Czxck001 left a comment •

edited

Loading

Czxck001 Oct 26, 2024

LaurentMazare Oct 27, 2024

Czxck001 Oct 26, 2024 •

edited

Loading

LaurentMazare Oct 27, 2024

LaurentMazare commented Oct 27, 2024

Czxck001 commented Oct 27, 2024 •

edited

Loading

LaurentMazare commented Oct 27, 2024

LaurentMazare commented Oct 29, 2024

Czxck001 commented Oct 30, 2024

Stable diffusion 3.5 support. #2578

Stable diffusion 3.5 support. #2578

Conversation

LaurentMazare commented Oct 26, 2024

LaurentMazare commented Oct 26, 2024

super-fun-surf commented Oct 26, 2024

LaurentMazare commented Oct 26, 2024

Czxck001 left a comment • edited Loading

Choose a reason for hiding this comment

Czxck001 Oct 26, 2024

Choose a reason for hiding this comment

LaurentMazare Oct 27, 2024

Choose a reason for hiding this comment

Czxck001 Oct 26, 2024 • edited Loading

Choose a reason for hiding this comment

LaurentMazare Oct 27, 2024

Choose a reason for hiding this comment

LaurentMazare commented Oct 27, 2024

Czxck001 commented Oct 27, 2024 • edited Loading

LaurentMazare commented Oct 27, 2024

LaurentMazare commented Oct 29, 2024

Czxck001 commented Oct 30, 2024

Czxck001 left a comment •

edited

Loading

Czxck001 Oct 26, 2024 •

edited

Loading

Czxck001 commented Oct 27, 2024 •

edited

Loading