Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lion 8 bit #188

Merged
merged 21 commits into from
Apr 11, 2023
Merged

Lion 8 bit #188

merged 21 commits into from
Apr 11, 2023

Conversation

lucidrains
Copy link
Contributor

@lucidrains lucidrains commented Mar 9, 2023

per advice from Tim, the plan will be to closely follow the 1-state logic of RMSProp and conditionally branch out for the lion logic

status - Lion8bit is successfully training a small autoregressive transformer for character level enwik8 on my machine

some remaining todos before merging

Screenshot from 2023-03-09 12-43-31

in pytorch

def update_fn(p, grad, exp_avg, lr, wd, beta1, beta2):
    # stepweight decay

    p.data.mul_(1 - lr * wd)

    # weight update

    update = exp_avg.clone().mul_(beta1).add(grad, alpha = 1 - beta1).sign_()
    p.add_(update, alpha = -lr)

    # decay the momentum running average coefficient

    exp_avg.mul_(beta2).add_(grad, alpha = 1 - beta2)

to test

$ git clone https://github.com/lucidrains/bitsandbytes
$ cd bitsandbytes
# follow the make from source instructions above
  • add tests
  • add some comments
  • figure out why tests break and make it pass
  • consider just copy pasting lion into the repo

@lucidrains lucidrains marked this pull request as draft March 9, 2023 16:09
@lucidrains lucidrains changed the title initial commit, slowly work from interface into the kernel (wip) Lion 8 bit Mar 9, 2023
@lucidrains lucidrains changed the title (wip) Lion 8 bit Lion 8 bit Mar 9, 2023
@lucidrains lucidrains marked this pull request as ready for review March 9, 2023 20:35
@lucidrains
Copy link
Contributor Author

lucidrains commented Mar 9, 2023

autoregressive enwik8 with lion8bit converged successfully (sans weight decay), even though there's some inaccuracies in kernel.cu

@lucidrains lucidrains mentioned this pull request Mar 9, 2023
@lucidrains
Copy link
Contributor Author

lucidrains commented Mar 9, 2023

Screen Shot 2023-03-09 at 1 20 12 PM

it may not be hitting the 8bit1state blockwise path though, where state is being smoothed before the update

@lucidrains
Copy link
Contributor Author

Screen Shot 2023-03-09 at 2 36 48 PM

weight decay looks good

@christallire
Copy link

resolves #150

@lucidrains
Copy link
Contributor Author

@christallire welcome you to build from source and import the Lion8bit and give it a try

seems to be working great, thanks to Tim's surrounding scaffold

@lucidrains
Copy link
Contributor Author

ok, will let this PR sit for a while to gather feedback. probably will revisit it at end of this month and see if we can get it merged

@lucidrains
Copy link
Contributor Author

i think there may still be an issue with the blockwise version, where the momentum update with beta2 is occuring before the actual parameter updates

Copy link
Collaborator

@TimDettmers TimDettmers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work on this! This looks almost good to me. The only thing that is off is the order of the updates that you point out. We can get around this without rewriting the CUDA code by using the gradient variable as temporary variable to store the current momentum value + the update value for Lion (see other comments). Please add a comment to these lines to help people understand the use of the gradient as a temporary variable (adding more comments is appreciated, sorry for the mess with the undocumented code).

The enwiki test runs are great! I think this shows that it works. It would be great to add a test in test_optim.py for Lion. You need three things for that: (1) define lion str2optimizer dictionary entry, (2) define str2statenames dictionary entry, (3) add the optimizer name to the optimizer_names list for the test_optimzier8bit function. That should be all that is needed. The only question is which 32-bit baseline to use. You might want to use your own Lion 32-bit repo. I think a fair test could also be to compare against the 32-bit bnb optimizer (since you have shown that 8-bit already replicates enwiki performance). This might be simpler and quick to write up.

csrc/kernels.cu Show resolved Hide resolved
@lucidrains
Copy link
Contributor Author

@TimDettmers thank you Tim for the feedback! 🙏 will address everything you brought up and poke you when it is ready later this week

@TimDettmers
Copy link
Collaborator

Thank you, this looks good to me. The issue with the test is expected. The error from Lion is expected to be higher at times due to its noisy update. I will merge and will have a look at the test. This is an excellent PR, thank you for all the work. I think it will be invaluable to the community!

@TimDettmers TimDettmers merged commit b0ec20c into bitsandbytes-foundation:main Apr 11, 2023
@lucidrains
Copy link
Contributor Author

lucidrains commented May 1, 2024

@TimDettmers oh hey Tim! glad to see this merged

thank you for reviewing it and getting it out there! totally forgot about it, my bad

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants