Hi! I would like to know whether it is possible to apply the 8-bit strategy to Prodigy to reduce the VRAM used, or whether that would break something it needs in order to work. I don't understand the paper very well, and I'd really like to know your thoughts. Thanks in advance.
Hi @KonokoAz! We have added a hyperparameter called slice_p to reduce the memory needed to store Prodigy's extra vectors used to estimate the learning rate. With a value like slice_p = 11, it should reduce the memory consumption by about 45%.
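As a rough check of the 45% figure, here is a minimal sketch of the arithmetic. It assumes four per-parameter state tensors of the same dtype (state['s'], state['p0'], state['exp_avg'], state['exp_avg_sq']) and that slice_p keeps only every slice_p-th element of the two learning-rate-estimation vectors. The helper name prodigy_state_elements is hypothetical and not part of the library.

```python
import math


def prodigy_state_elements(numel: int, slice_p: int = 1) -> int:
    """Count optimizer-state elements for one parameter tensor.

    Assumption: exp_avg and exp_avg_sq are stored in full, while
    s and p0 keep only every slice_p-th element.
    """
    sliced = math.ceil(numel / slice_p)  # length of a [::slice_p] slice
    return 2 * numel + 2 * sliced


n = 1_000_000
full = prodigy_state_elements(n, slice_p=1)
reduced = prodigy_state_elements(n, slice_p=11)
saving = 1 - reduced / full
print(f"state memory saved with slice_p=11: {saving:.1%}")
```

Under these assumptions the saving approaches 0.5 * (1 - 1/slice_p), so larger values of slice_p give diminishing returns: the two momentum tensors still account for half of the original state.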
An 8-bit version can still be added to additionally compress the vectors used in the algorithm. That would mean compressing the momentum estimates too, so we'd need to store state['s'], state['p0'], state['exp_avg'], and state['exp_avg_sq'] in 8-bit format. We think it'd be a good addition to the main optimizer and we'd be happy to review a pull request if you have the time to implement it. It's best to base the implementation on bitsandbytes.
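To illustrate what storing a state vector in 8-bit format involves, here is a simplified pure-Python sketch of blockwise absmax quantization. It is not the bitsandbytes API or its actual quantization scheme, which uses its own code maps and GPU kernels; the function names, the block size, and the linear mapping to the range [-127, 127] are all assumptions made for illustration.

```python
from typing import List, Tuple


def quantize_blockwise(values: List[float], block_size: int = 4) -> Tuple[List[int], List[float]]:
    """Map each block of floats to int8-range codes plus one float scale per block."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        # Use the block's largest magnitude as its scale; fall back to 1.0 for all-zero blocks.
        scale = max(abs(v) for v in block) or 1.0
        scales.append(scale)
        codes.extend(round(v / scale * 127) for v in block)
    return codes, scales


def dequantize_blockwise(codes: List[int], scales: List[float], block_size: int = 4) -> List[float]:
    """Recover approximate floats from the codes and per-block scales."""
    return [c / 127 * scales[i // block_size] for i, c in enumerate(codes)]


# Hypothetical stand-in for a small slice of state['exp_avg'].
exp_avg = [0.5, -0.25, 0.0, 1.0, 1e-3, -2e-3, 0.0, 0.0]
codes, scales = quantize_blockwise(exp_avg)
restored = dequantize_blockwise(codes, scales)
max_error = max(abs(a - b) for a, b in zip(exp_avg, restored))
print(codes, scales, max_error)
```

Each value then costs one byte plus a share of the per-block scale, instead of four bytes in float32. Using a separate scale per block keeps small-magnitude entries, such as the second block above, from being rounded to zero by a large value elsewhere in the tensor.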
If you don't have time to work on a PR, we might do it in the future ourselves, though we're not currently working on it. Let us know what you think.
Thank you so much for your answer! I'd really like to make a PR. In fact, I gave it a try a few months ago, but I gave up after several attempts because they were consuming more VRAM than the original one, haha... I'm still learning about this world. With this information, I'll try again and see if I can get something working (but don't wait for me ; u;)