
Fix parallel vae #281

Merged 6 commits on Sep 26, 2024
Conversation

@gty111 (Contributor) commented Sep 21, 2024

related to #271

@gty111 gty111 changed the title Fix parallel vae Fix parallel vae and flux model Sep 23, 2024
@gty111 gty111 changed the title Fix parallel vae and flux model Fix parallel vae Sep 24, 2024
@gty111 (Contributor, Author) commented Sep 24, 2024

Related PR xdit-project/DistVAE#3

To support DP, we add gather_broadcast_latents and is_dp_last_group in xFuserPipelineBaseWrapper.

gather_broadcast_latents:

  1. create the dp last group
  2. gather latents within the dp last group and concatenate them along the batch dim
  3. broadcast the latents to all ranks

Then the default process group is used to complete parallel vae.
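The three steps above can be sketched in-process, without torch.distributed, to show the intended data movement. This is a hypothetical illustration based on the description in this PR, not the actual xFuserPipelineBaseWrapper code; the "dp last group" is assumed to consist of the last rank of each DP replica, and per_rank_latents stands in for each rank's local tensor.

```python
import numpy as np


def gather_broadcast_latents(per_rank_latents, dp_last_ranks):
    """Simulate gather_broadcast_latents: gather the latents held by the
    dp last group, concatenate along the batch dim, broadcast to all ranks."""
    # 1. "create dp last group": pick out the ranks assumed to form it
    gathered = [per_rank_latents[r] for r in dp_last_ranks]
    # 2. gather and concatenate the latents in the batch dim (dim 0)
    full_batch = np.concatenate(gathered, axis=0)
    # 3. "broadcast": every rank ends up holding the full batch
    return [full_batch for _ in per_rank_latents]


# Two DP replicas of two ranks each; ranks 1 and 3 form the dp last group.
latents = [np.ones((1, 4)) * r for r in range(4)]
out = gather_broadcast_latents(latents, dp_last_ranks=[1, 3])
assert out[0].shape == (2, 4)                   # batch dim is 2 after concat
assert all((o == out[0]).all() for o in out)    # identical on every "rank"
```

In the real pipeline, step 2 would be a gather (or all_gather) over a process group created with torch.distributed.new_group, and step 3 a broadcast over the default group.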

When using parallel vae, only one process needs to post-process the image after the VAE, so is_dp_last_group exists to correctly identify which process should do that work.
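A minimal sketch of what such a predicate could look like, consistent with the "dp last group" assumed above (the last rank of each DP replica). The signature and rank layout are assumptions for illustration, not the actual xDiT implementation.

```python
def is_dp_last_group(rank, ranks_per_dp_group):
    """Return True if this rank is assumed to belong to the dp last group,
    i.e. it is the last rank within its DP replica (hypothetical layout
    where consecutive blocks of ranks form one replica)."""
    return (rank + 1) % ranks_per_dp_group == 0


# With 4 ranks and 2 ranks per DP replica, ranks 1 and 3 qualify.
flags = [is_dp_last_group(r, ranks_per_dp_group=2) for r in range(4)]
# flags == [False, True, False, True]
```

A guard like `if is_dp_last_group(...): save_image(...)` then keeps the post-VAE work off the other ranks.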

@Eigensystem (Collaborator) commented:
I think it is not an elegant implementation, because you combined all the dp latents into one tensor instead of using different dp groups to process different latents. It will increase the communication load.

@gty111 (Contributor, Author) commented Sep 26, 2024

> I think it is not an elegant implementation, because you combined all the dp latents into one tensor instead of using different dp groups to process different latents. It will increase the communication load.

It is indeed not the best implementation. Using different dp groups would require refactoring DistVAE. Since parallel vae is optional, maybe we can first fix it in this PR and further optimize it in the future.

@Eigensystem (Collaborator) left a review:
LGTM.

@Eigensystem Eigensystem merged commit 48e3633 into xdit-project:main Sep 26, 2024
3 checks passed