Storing both float32 and int parameters #22

huu4ontocord · 2022-03-22T01:25:04Z

Hi

It looks like, at least in the HF code, you are storing both the float32 AND the int weights, which increases the memory footprint. Wouldn't you want to either load only one or the other, or at least have an option to quantize and then move to CUDA, clearing whichever version (float32 or int) is not needed before the move, thus lowering the memory footprint? Alternatively, you could overload the 'to' method (or 'cuda', or whatever method is used to move to CUDA) so that it moves over only the right parameters.
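To make the suggestion concrete, here is a rough sketch of the "quantize, drop the float32 copy, then move" option. It assumes a plain PyTorch module. `QuantLinear`, `quantize_`, and `footprint_bytes` are hypothetical names made up for illustration and are not taken from this repo, and the per-row absmax int8 scheme is only a stand-in for whatever quantization the project actually uses.

```python
import torch
import torch.nn as nn


class QuantLinear(nn.Module):
    """Hypothetical linear layer that holds float32 weights until quantized."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features))
        # Filled in by quantize_(); None means "not quantized yet".
        self.register_buffer("qweight", None)
        self.register_buffer("scale", None)

    def quantize_(self) -> "QuantLinear":
        """Build the int8 copy, then release the float32 copy."""
        w = self.weight.detach()
        # Per-row absmax scaling (an assumed scheme); the clamp avoids
        # dividing by zero when a row is all zeros.
        scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
        self.qweight = torch.round(w / scale).to(torch.int8)
        self.scale = scale
        # Drop the float32 parameter so it is neither kept nor moved later.
        self.weight = None
        return self

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.qweight is None:
            return x @ self.weight.t()
        # Dequantize on the fly; a real int8 kernel would avoid this step.
        return x @ (self.qweight.to(x.dtype) * self.scale).t()


def footprint_bytes(module: nn.Module) -> int:
    """Sum the storage of all parameters and buffers that are still set."""
    tensors = list(module.parameters()) + list(module.buffers())
    return sum(t.numel() * t.element_size() for t in tensors)


layer = QuantLinear(1024, 1024)
before = footprint_bytes(layer)   # float32 weights only
layer.quantize_()
after = footprint_bytes(layer)    # int8 weights plus per-row scales

# Only the tensors that are still registered get moved to the device.
device = "cuda" if torch.cuda.is_available() else "cpu"
layer.to(device)
```

Because the float32 parameter is set to `None` before the call to `to`, the standard `nn.Module.to` already moves only the int8 weights and scales, so no override of 'to' or 'cuda' is needed in this variant.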

Thanks

huu4ontocord added the bug label on Mar 22, 2022
