
Questions about the sensitivity function #3

MoeinSorkhei opened this issue May 23, 2024 · 1 comment

@MoeinSorkhei

Hello, thanks for providing the code.
I have some questions about calculating sensitivity, and I would appreciate it if you could clarify them for me.

  1. What values of alpha and beta should generally be used?
  2. In your experience, how many batches should be processed for a reliable estimate of sensitivity? (See the sketch at the end of this message for how I understand the accumulation.)
  3. In L181, what do the values denote? Are they the total numbers of tunable parameters to select?
  4. Could you explain how the sweep is performed, and why the value of 80 is chosen in L189?
  5. Can you explain the condition in L282 of your code? When I run it, it only returns results for 1.0, 0.8, and 0.6; for smaller values the condition is apparently never satisfied.
  6. In L279, can you explain why the parameter count is calculated in this way? Why is the division by 1e6 performed?
  7. In L191 and L196, why is param_num multiplied by 0.02 and 1e6, respectively?
  8. When using LoRA, I assume the additional parameters are merged into the original parameters after training is done. Is the code for that available?

Thank you in advance.
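For reference, this is roughly how I understand the sensitivity accumulation (a minimal sketch, assuming a first-order |gradient × weight| proxy accumulated over a fixed number of mini-batches; `estimate_sensitivity` and the exact criterion are my assumptions, not the repo's code):

```python
import torch

def estimate_sensitivity(model, loader, criterion, num_batches=400):
    """Accumulate a per-parameter sensitivity score over `num_batches` batches."""
    sensitivity = {n: torch.zeros_like(p) for n, p in model.named_parameters()
                   if p.requires_grad}
    model.train()
    for i, (x, y) in enumerate(loader):
        if i >= num_batches:
            break
        model.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None and n in sensitivity:
                # Assumed first-order proxy: magnitude of gradient x weight.
                sensitivity[n] += (p.grad * p.detach()).abs()
    return sensitivity
```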

@Charleshhy
Collaborator

Hi MoeinSorkhei,

Thanks for your interest in our work!

  1. We generally set alpha to 10 and beta to 5. See the scripts here for more examples.
  2. Normally 400-800 training samples are enough. See Table 8.
  3. Yes.
  4. 80 controls how you sweep the desired model parameter counts. A larger value would just make your searched result land a bit farther from your desired number of trainable parameters. (See the sketch after this list.)
  5. That's the condition that checks whether the swept result is close enough to your desired number of trainable parameters.
  6. 1e6 represents M, e.g., 0.8M trainable parameters.
  7. 0.02 also controls the sweep.
  8. This part is not implemented. (A minimal sketch of the standard merge is included at the end of this reply.)
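To make answers 4-7 concrete, here is a minimal sketch of the sweep, under the assumed mapping of 80 to the sweep step and 0.02 to a relative tolerance on the budget; `sweep_budget` and its signature are illustrative, not the repo's actual function:

```python
import torch

def sweep_budget(flat_scores, target_params_m, step=80, tol=0.02):
    """Pick a sensitivity threshold whose trainable-parameter count
    lands within `tol` of the desired budget.

    flat_scores:     1-D tensor of per-parameter sensitivity scores
    target_params_m: desired trainable parameters, in millions
    step:            sweep granularity (larger -> coarser, farther from target)
    tol:             relative tolerance on the matched budget
    """
    target = target_params_m * 1e6               # millions -> raw parameter count
    sorted_scores, _ = torch.sort(flat_scores, descending=True)
    for k in range(step, flat_scores.numel(), step):
        # Selecting the top-k most sensitive parameters yields k trainables;
        # keep the candidate only if it is close enough to the target.
        if abs(k - target) <= tol * target:
            threshold = sorted_scores[k - 1]
            return threshold, k / 1e6            # report the budget back in M
    return None, None
```

A coarser `step` makes the loop faster but, as the answer to Q4 notes, the accepted count can then sit farther from the requested budget.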

Cheers,
Haoyu
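Since the merge in answer 8 is not implemented in the repo, here is a minimal sketch of the standard LoRA merge, which folds the trained low-rank update back into the frozen base weight as W' = W + (alpha / r) * B A; `merge_lora` is a hypothetical helper, not part of the repo:

```python
import torch

@torch.no_grad()
def merge_lora(weight, lora_A, lora_B, alpha, r):
    """Fold a trained LoRA update into the frozen base weight.

    weight: (out, in) frozen base weight
    lora_A: (r, in)  down-projection
    lora_B: (out, r) up-projection
    """
    # Standard LoRA merge: scale the low-rank product and add it in place,
    # so inference no longer needs the adapter matrices.
    weight.add_((alpha / r) * (lora_B @ lora_A))
    return weight
```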
