Better 1.5 bit quantization #5971

Merged on Mar 11, 2024 (15 commits)

Commits on Mar 11, 2024

  1. Commit c9e9acf
  2. Commit cd83a7d
  3. iq1s_blocks16: going to blocks of 32

    with 2048 lattice points, so same bpw.
    This is even better than blocks of 16.
    Should I try blocks of 64? But to keep the same
    bpw when going to 4096 lattice points, I would need to
    remove blocks altogether and just have superblocks of
    256 weights.
    Kawrakow committed Mar 11, 2024
    Commit 4c4404a
  4. Commit c55e66f
  5. Commit 864a5c2
  6. Commit f092d04
  7. iq1s_blocks16: Metal works, Neon does not

    Metal works but TG is dog slow (35 t/s). PP is OKish (493 t/s).
    Not seeing the bug in the Neon implementation for now.
    Kawrakow committed Mar 11, 2024
    Commit fbb001e
  8. iq1s_blocks16: fixed Neon

    Kawrakow committed Mar 11, 2024
    Commit 15acc79
  9. iq1s_blocks16: very slightly faster TG on Metal

    Still pathetic at 37 t/s
    Kawrakow committed Mar 11, 2024
    Commit 8561139
  10. Commit d3da9d1
  11. Formatting

    Kawrakow committed Mar 11, 2024
    Commit 7545d69
  12. iq1s_blocks16: uint32_t codebook is also better in CUDA

    TG-128 is now 204 t/s up from 194 t/s.
    PP-512 is 5890 t/s, so significantly better than other quants
    Kawrakow committed Mar 11, 2024
    Commit 156220f
  13. Commit 101b18d
  14. Commit 34bc21f
  15. Commit 9d83171
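
The bits-per-weight (bpw) remark in commit 3 (4c4404a) can be checked with a little arithmetic. The sketch below is not taken from the PR. It assumes that each lattice index encodes a group of 8 weights, that every block carries 4 bits of metadata (scale plus shift), that each 256-weight superblock has one 16-bit scale, and that the earlier blocks-of-16 variant used 1024 lattice points. Only the block sizes 32 and 64, the 2048 and 4096 lattice points, and the 256-weight superblock come from the commit message. The helper names are hypothetical.

```python
from fractions import Fraction

GROUP = 8              # assumption: one lattice index covers 8 weights
SUPERBLOCK = 256       # from the commit message: superblocks of 256 weights
SUPER_SCALE_BITS = 16  # assumption: one 16-bit scale per superblock
BLOCK_META_BITS = 4    # assumption: per-block scale/shift bits


def index_bits(lattice_points: int) -> int:
    """Bits needed to address one entry of a codebook with this many points."""
    return (lattice_points - 1).bit_length()


def bits_per_weight(block_size: int, lattice_points: int,
                    block_meta_bits: int = BLOCK_META_BITS) -> Fraction:
    """Average storage cost per weight for one superblock."""
    groups_per_block = block_size // GROUP
    block_bits = groups_per_block * index_bits(lattice_points) + block_meta_bits
    blocks_per_superblock = SUPERBLOCK // block_size
    total_bits = blocks_per_superblock * block_bits + SUPER_SCALE_BITS
    return Fraction(total_bits, SUPERBLOCK)


if __name__ == "__main__":
    for block_size, points in [(16, 1024), (32, 2048), (64, 4096)]:
        bpw = bits_per_weight(block_size, points)
        print(f"blocks of {block_size:2d}, {points} points: {float(bpw)} bpw")
    # Dropping the per-block metadata restores the budget for 4096 points.
    no_blocks = bits_per_weight(64, 4096, block_meta_bits=0)
    print(f"4096 points, no per-block metadata: {float(no_blocks)} bpw")
```

Under these assumptions, doubling both the block size and the codebook leaves the cost unchanged, because each extra index bit per group is paid for by halving the number of per-block metadata fields. At 4096 points the 12 index bits per group of 8 already use 1.5 bits per weight, so the same budget leaves nothing for per-block metadata. That matches the observation that blocks would have to be removed in favour of plain 256-weight superblocks.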