Support sd3.5 medium and MMDiT-X #2587

Czxck001 · 2024-10-30T04:46:58Z

Stable Diffusion 3.5 Medium was released on Oct 29 with a modified architecture named MMDiT-X. This PR adds support for Stable Diffusion 3.5 Medium and the MMDiT-X model.

The change is based on the reference design sd3.5/mmdit-x.py, compared against sd3-ref/mmdit.py. The differences are:

an extra self-attention in the MMDiT-X block
a modified pre_attention for x_block
a modified post-attention for x_block
a different block-joining step (between x and context)
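To make the extra self-attention concrete, here is a minimal sketch of how the x-stream residual might be updated after attention. It is a toy on flat `f32` slices rather than tensors, and it assumes the extra branch is added as a second gated residual next to the joint-attention branch. The function name, argument names, and scalar gates are hypothetical and are not taken from the PR's code; the MLP branch that follows in a real block is omitted.

```rust
/// Toy stand-in for the x-block post-attention step (assumption: both
/// attention branches are added to the residual stream with their own gate).
///
/// `attn_out`  : x-stream part of the joint attention output, already projected.
/// `attn2_out` : output of the extra self-attention over x only, already projected.
/// `gate_msa`, `gate_msa2` : modulation gates, simplified here to scalars.
fn post_attention_x(
    x: &[f32],
    attn_out: &[f32],
    attn2_out: &[f32],
    gate_msa: f32,
    gate_msa2: f32,
) -> Vec<f32> {
    assert_eq!(x.len(), attn_out.len());
    assert_eq!(x.len(), attn2_out.len());
    x.iter()
        .zip(attn_out)
        .zip(attn2_out)
        // residual + gated joint attention + gated extra self-attention
        .map(|((x, a), a2)| x + gate_msa * a + gate_msa2 * a2)
        .collect()
}
```

With `gate_msa2` set to zero this reduces to a single gated residual update, which is how the sketch relates to a block without the side track.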

Note: sd3.5 was changed after the release of Stable Diffusion 3.5 Medium on Oct 29 to fix some bugs in the original reference design.

Implementation-wise:

Trait polymorphism is kept between the old and new JointBlock, but the individual DiTBlock is re-implemented to avoid coupling. An ad-hoc adaptation of the original DiTBlock was attempted and dropped, as it seemed less sound from a software-engineering standpoint.
In SD3.5 Medium, the X-block in the first 12 of the 24 JointBlock layers gains an extra attention side track, attn2 (namely "Self Attention"). None of the context blocks have this extra attention, so MMDiTXJointBlock is hard-coded to this specification without further generalization.
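The layer layout described above can be sketched with a small trait and two implementors. This is only an illustration of the polymorphism: the real blocks operate on tensors, and `has_x_self_attn`, `build_joint_blocks`, and `num_x_self_attn_layers` are hypothetical names introduced here, not the API added by this PR.

```rust
/// Common interface for a joint block. Real blocks would expose a forward
/// pass over (context, x); this sketch only exposes the structural difference.
trait JointBlock {
    /// Whether the x-stream carries the extra `attn2` self-attention.
    fn has_x_self_attn(&self) -> bool;
}

/// Stand-in for the original SD3 joint block.
struct MMDiTJointBlock;

/// Stand-in for the MMDiT-X joint block with the extra x-stream attention.
struct MMDiTXJointBlock;

impl JointBlock for MMDiTJointBlock {
    fn has_x_self_attn(&self) -> bool {
        false
    }
}

impl JointBlock for MMDiTXJointBlock {
    fn has_x_self_attn(&self) -> bool {
        true
    }
}

/// Build `depth` blocks where the first `num_x_self_attn_layers` use the
/// MMDiT-X variant and the rest use the original variant.
fn build_joint_blocks(depth: usize, num_x_self_attn_layers: usize) -> Vec<Box<dyn JointBlock>> {
    (0..depth)
        .map(|i| -> Box<dyn JointBlock> {
            if i < num_x_self_attn_layers {
                Box::new(MMDiTXJointBlock)
            } else {
                Box::new(MMDiTJointBlock)
            }
        })
        .collect()
}
```

Calling `build_joint_blocks(24, 12)` gives the 12-plus-12 split described for SD3.5 Medium; because callers only see `dyn JointBlock`, the surrounding model loop does not need to know which variant a layer uses.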

References:

Diagram of the MMDiT-X architecture in the official Hugging Face Hub repository: https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/blob/main/mmdit-x.png

Sample image generated with Stable Diffusion 3.5 Medium:

Czxck001 · 2024-10-30T04:51:26Z

candle-transformers/src/models/mmdit/model.rs

+            depth: 24,
+            head_size: 64,
+            adm_in_channels: 2048,
+            pos_embed_max_size: 384,


Notably, pos_embed_max_size (the position embedding size) has been increased for SD3.5-medium, making it even larger than that of SD3.5-large, which kept the original size from SD3-medium...
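For a rough sense of what the larger value means, the sketch below assumes the position embedding is a square grid with `pos_embed_max_size` entries per side. The struct mirrors only the four fields shown in the diff above; `sd3_5_medium` and `pos_embed_positions` are hypothetical helpers written for illustration, and the value 192 used for comparison is an assumption that does not appear in this thread.

```rust
/// Only the fields visible in the diff above; the real config has more.
#[derive(Debug, Clone, PartialEq)]
struct Config {
    depth: usize,
    head_size: usize,
    adm_in_channels: usize,
    pos_embed_max_size: usize,
}

impl Config {
    /// Values taken from the diff in this review comment.
    fn sd3_5_medium() -> Self {
        Config {
            depth: 24,
            head_size: 64,
            adm_in_channels: 2048,
            pos_embed_max_size: 384,
        }
    }

    /// Number of grid positions, assuming a square
    /// pos_embed_max_size x pos_embed_max_size table.
    fn pos_embed_positions(&self) -> usize {
        self.pos_embed_max_size * self.pos_embed_max_size
    }
}

/// ASSUMPTION: the earlier SD3-medium size; not stated in this thread.
const ASSUMED_PREVIOUS_POS_EMBED_MAX_SIZE: usize = 192;
```

Under these assumptions, doubling the side length quadruples the number of grid positions, so the new table would be four times larger than the earlier one.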

LaurentMazare · 2024-10-30T05:19:03Z

Amazing, thanks!

* extract attn out of joint_attn
* further adjust attn and joint_attn
* add mmdit-x support
* support sd3.5-medium in the example
* update README.md

Czxck001 added 5 commits October 29, 2024 20:42

extract attn out of joint_attn

3b8eb05

further adjust attn and joint_attn

4faeb21

add mmdit-x support

8200886

support sd3.5-medium in the example

5193f4d

update README.md

c5974fd

Czxck001 mentioned this pull request Oct 30, 2024

Stable diffusion 3.5 support. #2578

Merged

LaurentMazare merged commit d232e13 into huggingface:main Oct 30, 2024
10 checks passed

Czxck001 deleted the support-sd3.5-medium branch October 30, 2024 05:19
