Storing both float32 and int parameters #22

Open
huu4ontocord opened this issue Mar 22, 2022 · 0 comments
Labels
bug Something isn't working

Comments

@huu4ontocord

Hi

It looks like, at least in the HF code, you are storing both the float32 and the int weights, which increases the memory footprint. Wouldn't it be better to load only one or the other, or at least to offer an option that quantizes the model, clears whichever version (float32 or int) is not needed, and then moves the rest to CUDA, thus lowering the memory footprint? Alternatively, you could override 'to' (or 'cuda', or whatever method is used to move the model to CUDA) so that it moves only the parameters that are actually needed.
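A minimal sketch of what I have in mind, assuming PyTorch and a simple symmetric per-row int8 scheme, which is probably not the scheme this repository uses. `QuantLinear`, `quantize`, and `footprint_bytes` are hypothetical names made up for illustration, not part of this project or of HF:

```python
import torch
import torch.nn as nn


class QuantLinear(nn.Module):
    """Hypothetical layer: holds float32 weights until quantize() is called."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features))
        # Placeholders; filled in by quantize().
        self.register_buffer("qweight", None)
        self.register_buffer("scale", None)

    def quantize(self):
        # Assumed scheme: symmetric int8 with one scale per output row.
        w = self.weight.detach()
        scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
        self.qweight = torch.round(w / scale).to(torch.int8)
        self.scale = scale
        # Drop the float32 copy so it is neither kept in memory nor moved later.
        self.weight = None
        return self

    def forward(self, x):
        if self.weight is not None:
            return x @ self.weight.t()
        # Dequantize on the fly from the int8 weights.
        return x @ (self.qweight.to(x.dtype) * self.scale).t()


def footprint_bytes(module):
    """Total bytes held by the module's parameters and buffers."""
    tensors = list(module.parameters()) + list(module.buffers())
    return sum(t.numel() * t.element_size() for t in tensors)


layer = QuantLinear(64, 32)
bytes_before = footprint_bytes(layer)   # float32 weights only
layer.quantize()
bytes_after = footprint_bytes(layer)    # int8 weights plus float32 scales

# Only the int8 weights and scales are left, so only they get moved.
if torch.cuda.is_available():
    layer.to("cuda")
```

Because the float32 parameter is set to `None` before the move, the stock `to` / `cuda` methods have nothing else to transfer, so no override is needed in this variant.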

Thanks

huu4ontocord added the bug label Mar 22, 2022