support more dtypes #256
Hey, thanks. First of all: yes, I think safetensors needs AI-specific dtypes; one of the reasons to justify its existence was the absence of such dtypes in other formats. So adding dtypes is definitely on the table. Why isn't it already merged:
The fact that 2/ exists (and yields much better qualitative results than qx_y_z in our internal tests) and that 4/ is true suggests to me that maybe these dtypes aren't needed. That's my current personal reasoning around delaying full support for them. Now, arbitrary dtypes. Allowing an arbitrary dtype means:
Overall I think supporting this deserves more discussion. TL;DR: GPTQ does quantization with a different layout than GGML, which already works out of the box with safetensors. Unless there's a proven sizable benefit to doing otherwise, I'd hold off. NB: GPTQ lies about having int32 tensors, which are in fact uint4; adding int4 would be OK in my book. NBB: Also, for adding dtypes, I think having several distinct, widely used libraries helps with justifying new dtypes. Currently safetensors doesn't support any sparse tensor; there are also a lot of different layouts in different libraries, with none (afaik) having super widely used models on top.
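The "int32 tensors which are in fact uint4" point can be illustrated with a small sketch. The helper names are made up, and GPTQ's exact bit order and zero-point handling may differ; this only shows the general idea of eight 4-bit values packed into each 32-bit word.

```python
def pack_uint4(values):
    """Pack 4-bit values (0..15) into 32-bit words, 8 values per word.
    Illustrative only; a real GPTQ file may use a different bit order."""
    assert len(values) % 8 == 0
    words = []
    for i in range(0, len(values), 8):
        word = 0
        for j, v in enumerate(values[i:i + 8]):
            assert 0 <= v < 16
            word |= v << (4 * j)  # value j occupies bits [4j, 4j+4)
        words.append(word)
    return words

def unpack_uint4(words, count):
    """Inverse of pack_uint4: recover the first `count` 4-bit values."""
    values = []
    for word in words:
        for j in range(8):
            values.append((word >> (4 * j)) & 0xF)
    return values[:count]
```

So a checkpoint that declares an int32 tensor of N words may really be carrying 8N uint4 weights, which is why an official int4 dtype would be less of a lie.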
First, thanks for your time and for a very detailed answer (lots of interesting information which I was not aware of). I see where you are heading and I sort of agree but what I'm requesting should be fine with everything you said. Let me explain briefly (and honestly).
To be a bit clearer, I don't request any changes in the Rust implementation except extending the enum, and that should be it. You can even fail if someone tries to read that data; I don't request full support. EDIT: I have a feeling that people in the other thread might be fine with this solution as well. I believe they also never manipulate the tensors and only do the pass-through, because GGML does everything.
@Narsil do you have any comments on this? GGML just got support for its own format in the core C lib, so the choice is currently:
I would love to use the safetensors format, but I cannot just sit and wait...
The only thing which is supported right now is the export/import of the computational graph. Everything else (versions, hparams, vocabs) is getting discussed in the thread (but it will probably happen). I think we can close this issue.
You can do this without breaking the spec, although tools that deal with the data would have to be aware of the approach I'm about to describe. You can just put your data in a raw byte tensor and record the real dtype in the file's metadata.
Yes, that should work, but it's a hack, not a solution :)
True... I'd say adding dtypes for stuff like GGML would probably have to involve something similar, though. I mean, formats get added/removed relatively often. Also, it seems like it would be impractical for container formats to try to manage something like that. I feel like the only way to do it in a practical way would be to make the "hack" official and allow the dtype to be a string and the data to be whatever. You might skip having to rely on the K/V metadata stuff that way, but generally it wouldn't really be a much different concept.
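Making the "hack" official could look something like the following header entry. This is purely a hypothetical sketch: the `ggml:Q8_0` dtype string and the `format` metadata key are not part of the spec, and the byte count assumes Q8_0's 34-bytes-per-32-elements block layout.

```json
{
  "__metadata__": { "format": "ggml" },
  "blk.0.attn_q.weight": {
    "dtype": "ggml:Q8_0",
    "shape": [4096, 4096],
    "data_offsets": [0, 17825792]
  }
}
```

A reader that doesn't recognize the string would refuse to build a typed view but could still pass the raw bytes through, which is exactly the pass-through use case discussed above.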
Yes, I believe this is the best approach. I think whoever is using GGML is simply passing that pointer somewhere else, and because of that, I don't think it's necessary to be able to obtain "valid" views. I know Rust people care a lot about safety, but I think this is inevitable in the long run.
If I'm understanding correctly, I don't really think this makes a difference from a Rust safety perspective. Whether there's an official dtype or not, the library only ever hands back raw bytes. So that's not a Rust safety thing.
Oh, I've just checked the code and you are probably right. I thought that the dtype was bound to the TensorView, but looking at it now, it only returns the raw bytes. Sorry for the confusion.
Support for torch's quantized dtypes would be useful too.
Very interesting, I wasn't aware of those! It seems quantization is in beta: https://pytorch.org/docs/stable/quantization.html. I see a need to figure out where all those items are currently stored, and things like that.
Afaiu they're currently just metadata on the tensors, which you can access with the above-mentioned functions. I started some preliminary work on supporting them before realizing how hard-coded the current dtypes are, and that you can't easily attach metadata to each tensor.
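For context, torch's per-tensor quantized tensors carry exactly two such metadata items, a scale and a zero point, alongside the integer data. The affine scheme they implement can be sketched in a few lines (illustrative helpers in plain Python, not torch's API):

```python
def quantize_per_tensor(xs, scale, zero_point):
    """Per-tensor affine quantization to unsigned 8-bit:
    q = clamp(round(x / scale) + zero_point, 0, 255)."""
    return [max(0, min(255, round(x / scale) + zero_point)) for x in xs]

def dequantize_per_tensor(qs, scale, zero_point):
    """Approximate inverse: x ≈ (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in qs]
```

This is why "just metadata on the tensor" is an accurate description: the int8 buffer plus (scale, zero_point) is all a reader needs to reconstruct approximate float values.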
It looks like the format is tightly coupled to what can be returned as a TensorView. This is problematic if you want to introduce new dtypes, like `Q8_0` from GGML, which is not strictly just `&[u8]` (it also includes one `f16` for every 32 bytes). Other `QX_X` formats would be even more problematic. If safetensors is supposed to be the one, shared format, then it needs to support unknown dtypes.
This is a little bit related to #254 but the discussion there went in a very different way so I think it's more appropriate to open a new issue.
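For reference, the Q8_0 layout described above stores each block of 32 values as one `f16` scale (2 bytes) plus 32 `int8` quants, so a tensor's byte size is not simply element count times a fixed width. A small sketch (block size taken from ggml at the time of writing; the helper name is illustrative):

```python
def q8_0_nbytes(n_elements, block_size=32):
    """Bytes needed for a GGML Q8_0 tensor: each block stores a 2-byte
    f16 scale followed by block_size int8 values (34 bytes per 32 elements)."""
    assert n_elements % block_size == 0, "Q8_0 tensors are whole blocks"
    return (n_elements // block_size) * (2 + block_size)
```

This mixed-width blocking is exactly what a `dtype -> fixed byte width` enum cannot express, which is the coupling the issue complains about.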