Prodigy is not working well with Stable Diffusion 3.5 Medium training #27

Closed
Bocchi-Chan2023 opened this issue Nov 4, 2024 · 4 comments

Comments

@Bocchi-Chan2023

I have been trying to train with the stock settings of this optimizer and have not been successful yet. Specifically, it seems that it is not learning nearly as well as it should.

By comparison, AdamW8bit seems to work with a learning rate of 4e-4.

@LoganBooker

Hey Bocchi-Chan2023, are you able to take note of the value of d during training? If it rises very slowly or not at all, you might need to bump up d0 (to say, 1e-5 or 1e-4). I've found sometimes Prodigy just needs to get a larger read of the gradients to start with, otherwise it can take quite a few steps before it finds a good LR, by which point you're already a good portion through training.

For example, here are the results of an SDXL LoRA training run, batch size 8, with a modified Prodigy that treats each parameter group independently (#20). From left to right, the graphs show the d value for TE1, TE2 and the Unet.

[Image: graphs of the d value over training steps for TE1, TE2 and the Unet]

As you can see, both TEs hit a good LR quickly, but the Unet took until steps 200-300 to find a decent LR, and even then it continued to search. I've been experimenting with ways to combat this but haven't been successful so far.
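Taking note of d during training, as suggested above, could be sketched roughly like this. It assumes the optimizer keeps its current estimate under the `d` key of each entry in `param_groups` (which I believe the prodigyopt implementation does, but check your version). The helper names and the 10x growth threshold are made up for illustration and are not part of Prodigy.

```python
def read_d(optimizer):
    """Return the current d estimate for each parameter group.

    Assumption: the estimate lives under the "d" key of every entry in
    optimizer.param_groups. Nothing here imports or depends on Prodigy itself.
    """
    return [group["d"] for group in optimizer.param_groups]


def d_is_stalled(d_history, d0, min_growth=10.0):
    """Hypothetical heuristic: True if d never grew past min_growth * d0.

    The 10x default is an arbitrary illustration, not a recommended value.
    """
    return max(d_history) < d0 * min_growth
```

In a training loop you could append `read_d(optimizer)[0]` to a list every few steps and log it. If `d_is_stalled(history, d0)` is still true a good way into the run, that is the situation where bumping up d0 may help.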

Also double check you're setting the regular LRs to 1 (as the LR is multiplied by d), and I'd also suggest using betas of (0.9,0.99) if you're not already (as suggested here: #8 (comment)). If beta3 is not set explicitly, then beta2 ** 0.5 is used in its place, so beta2 affects more than just the second moment.

Not sure if any of this will help, but sharing my experiences while playing around with the internals.
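To make the settings above concrete, here is a small sketch of the arithmetic. The keyword names (`lr`, `betas`, `beta3`, `d0`) are the ones I understand the Prodigy constructor to take; the two helper functions are hypothetical and only mirror the rules described above (beta3 falls back to `beta2 ** 0.5`, and the step size is the LR multiplied by d). Schedulers and bias correction are ignored.

```python
import math


def effective_beta3(betas, beta3=None):
    """beta3 used when none is set explicitly: the square root of beta2."""
    return math.sqrt(betas[1]) if beta3 is None else beta3


def effective_lr(lr, d):
    """Step size before any scheduler: the LR is multiplied by d."""
    return lr * d


# Settings suggested in this thread, as keyword arguments (assumed names).
prodigy_kwargs = dict(lr=1.0, betas=(0.9, 0.99), d0=1e-5)

print(effective_beta3(prodigy_kwargs["betas"]))  # sqrt(0.99), a little under 0.995
print(effective_lr(prodigy_kwargs["lr"], prodigy_kwargs["d0"]))  # equals d0 while lr is 1
```

With the LR left at 1, the effective step size is just d, which is why d0 sets how large the first steps can be.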

@Bocchi-Chan2023
Author

Okay, I will try to record the value of d while adjusting the value of d0.

@Bocchi-Chan2023
Author

I changed d0 from 1e-6 to 1e-5 and Prodigy seems to be working well!

@konstmish
Owner

Thanks for sharing your experience, and special thanks to @LoganBooker for providing a solution to the problem. Since the problem seems to be solved, I'm closing the issue, but feel free to reopen it if you have more questions. I'll also add a note on changing d0 to the readme, with a link to this discussion.
