Better 1.5 bit quantization #5971
Merged
+1,153
−394
Commits on Mar 11, 2024
iq1s_blocks16: going to blocks of 32
with 2048 lattice points, so same bpw. This is even better than blocks of 16. Should I try blocks of 64? But to keep the same bpw, when I go to 4096 lattice points, I need to remove blocks altogether and just have superblocks of 256 weights.
Commit 4c4404a
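The bpw trade-off described in this commit message can be checked with a little arithmetic. The sketch below is illustrative only and rests on assumptions the message does not state: that each lattice index encodes a group of 8 weights, that each block carries 4 bits of metadata (scale and similar), and that each 256-weight superblock stores one 16-bit scale. The function name `bits_per_weight` and its parameters are hypothetical and are not taken from the code in this PR.

```python
# Illustrative bpw arithmetic; all layout constants except SUPERBLOCK are assumptions.
SUPERBLOCK = 256        # weights per superblock (stated in the commit message)
GROUP = 8               # assumption: one lattice index per group of 8 weights
SUPER_SCALE_BITS = 16   # assumption: one 16-bit scale per superblock


def bits_per_weight(lattice_points, block_size=None, block_meta_bits=4):
    """Return bits per weight for a hypothetical lattice-codebook layout.

    lattice_points  -- codebook size; must be a power of two
    block_size      -- weights per block, or None for "no blocks, superblock only"
    block_meta_bits -- assumed per-block metadata bits (ignored when block_size is None)
    """
    if lattice_points & (lattice_points - 1):
        raise ValueError("lattice_points must be a power of two")
    index_bits = lattice_points.bit_length() - 1          # log2 of the codebook size
    total = (SUPERBLOCK // GROUP) * index_bits + SUPER_SCALE_BITS
    if block_size is not None:
        total += (SUPERBLOCK // block_size) * block_meta_bits
    return total / SUPERBLOCK


if __name__ == "__main__":
    print(bits_per_weight(2048, block_size=32))   # blocks of 32, 2048 lattice points
    print(bits_per_weight(4096, block_size=64))   # blocks of 64, 4096 lattice points
    print(bits_per_weight(4096, block_size=None)) # 4096 lattice points, superblock only
```

Under these assumptions, 2048 lattice points with blocks of 32 and 4096 lattice points with no blocks both come to 1.5625 bpw, while 4096 lattice points with blocks of 64 comes to 1.625 bpw. That is consistent with the remark that moving to 4096 lattice points at the same bpw leaves no room for per-block data.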
iq1s_blocks16: Metal works, Neon does not
Metal works but TG is dog slow (35 t/s). PP is OKish (493 t/s). Not seeing the bug in the Neon implementation for now.
Commit fbb001e
iq1s_blocks16: very slightly faster TG on Metal
Still pathetic at 37 t/s
Commit 8561139
iq1s_blocks16: uint32_t codebook is also better in CUDA
TG-128 is now 204 t/s, up from 194 t/s. PP-512 is 5890 t/s, so significantly better than other quants.
Commit 156220f