Investigate alternative approach for Q4 quantization #397
Might be worth reading A Survey of Quantization Methods for Efficient Neural Network Inference. |
Low-bit Quantization of Neural Networks for Efficient Inference deals with 4-bit quantization specifically. As a smaller step, I can think of these optimizations:
|
This paper is also relevant: Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, although this one deals with int8 |
This should read |
I came up with a script that's able to compute RMS for various quantization methods - maybe it will come in handy for experimenting: https://gist.github.com/prusnak/f54f8f33503458ca1aa9883f71897072 |
I was experimenting with grid search to find a better offset and scaling factor, but it does not seem to produce much better results than simply doing Q4_1, so it doesn't justify making the whole process much slower. Pseudocode:

n_search = 30
data_min = min(data)
data_max = max(data)
search_step = (data_max - data_min) / n_search
for min_value in range(data_min, data_max, search_step):
    for max_value in range(min_value + search_step, data_max, search_step):
        perform Q4_1 but use min_value as offset and (max_value - min_value) as scaling_factor
        measure RMS
        when RMS is better than everything we've seen so far, store the result of this Q4_1 run
return the best Q4_1 run

Maybe someone can come up with a better grid search? |
I also found the Lloyd-Max algorithm, but this one creates a non-uniform quantization, which is a no-go for our use case, I assume. Is that correct? Resources: |
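For reference, a minimal sketch of what a Lloyd-Max fit (essentially 1-D k-means) for one block could look like is below; the function and variable names are mine, not from the codebase, and as noted the resulting 16 levels are non-uniform, so dequantizing them would require the kind of per-block lookup table discussed later in the thread:

```c
#include <float.h>
#include <math.h>

// Minimal Lloyd-Max sketch: fit 16 non-uniform reconstruction levels to one
// block of n weights (essentially 1-D k-means). Names are illustrative only.
static void lloyd_max_16(const float *x, int n, float levels[16]) {
    float lo = FLT_MAX, hi = -FLT_MAX;
    for (int i = 0; i < n; i++) {
        if (x[i] < lo) lo = x[i];
        if (x[i] > hi) hi = x[i];
    }
    // start from a uniform placement of the 16 levels
    for (int j = 0; j < 16; j++) {
        levels[j] = lo + (hi - lo) * (j + 0.5f) / 16.0f;
    }
    for (int iter = 0; iter < 20; iter++) {
        float sum[16] = {0};
        int   cnt[16] = {0};
        for (int i = 0; i < n; i++) {
            // assign each weight to its nearest level
            int   best  = 0;
            float bestd = fabsf(x[i] - levels[0]);
            for (int j = 1; j < 16; j++) {
                const float d = fabsf(x[i] - levels[j]);
                if (d < bestd) { bestd = d; best = j; }
            }
            sum[best] += x[i];
            cnt[best] += 1;
        }
        // move each level to the mean of the weights assigned to it
        for (int j = 0; j < 16; j++) {
            if (cnt[j] > 0) levels[j] = sum[j] / cnt[j];
        }
    }
}
```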
I'm playing around with local search for the q4_1 parameters now, with something like the following approximately in place of the inner loop of the Q4_1 quantization routine:

round_block(pp, x + i*QK, min, d);
float err = sq_error(pp, x + i*QK, min, d), err0=err;
int step_count = 0;
while(1) {
++step_count;
// const float next_mins[4] = { min*1.001f, min/1.001f, min, min };
// const float next_ds[4] = { d, d, d*1.001f, d/1.001f };
for (int j=0; j<16; ++j) { // j here, so that i below still refers to the block index
// const float next_min = next_mins[j];
// const float next_d = next_ds[j];
const float next_min = min * (0.99f + 0.0002f*(rand()%100)); //next_mins[j];
const float next_d = d * (0.99f + 0.0002f*(rand()%100));//next_ds[j];
round_block(pp, x + i*QK, next_min, next_d);
float next_err = sq_error(pp, x + i*QK, next_min, next_d);
if (next_err < err) {
min = next_min;
d = next_d;
err = next_err;
goto quantize_row_q4_1_opt_next;
}
}
break;
quantize_row_q4_1_opt_next:;
}
static float rer = 0.0f;
rer = 0.001*(err/err0) + 0.999*rer;
printf("q: %d steps, err ratio %.3f, running %.3f\n", step_count, err/err0, rer);
round_block(pp, x + i*QK, min, d);

I found that the square error is indeed reduced by this, in a way that's quite sensitive to the parameters of the loop. (Yes, I'm aware I could do better picking a random direction, or even N deterministic ones, than that. I promise I'll make it less silly if it ever makes it into a PR.) |
@prusnak Do you know which distribution the llama weights have? Why do you use a uniform distribution for tests? Here is my suggestion: what we could have is QK=128 (or even 256?) and 16 independent fp16 values. These fp16 values form a lookup table. Each weight is quantized to 4 bits, but its value is used as a key into the lookup table (I know the lookup table might be implemented using AVX). The 16 fp16 values need to be adjusted to minimize the RMS error. |
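A minimal sketch of what dequantizing such a block could look like (the struct, the names, and the use of plain float instead of fp16 for the table are my simplifications, not existing llama.cpp code):

```c
#include <stdint.h>

#define QK_LUT 128  // hypothetical block size for this scheme

// Hypothetical block layout: a 16-entry reconstruction table (stored as fp16
// in practice, kept as float here for simplicity) plus QK_LUT 4-bit indices.
typedef struct {
    float   lut[16];        // per-block lookup table, tuned to minimize RMS error
    uint8_t qs[QK_LUT/2];   // two 4-bit indices per byte
} block_q4_lut;

// Dequantize one block: each 4-bit code simply indexes into the table.
static void dequantize_block_q4_lut(const block_q4_lut *b, float *y) {
    for (int i = 0; i < QK_LUT/2; i++) {
        const uint8_t byte = b->qs[i];
        y[2*i + 0] = b->lut[byte & 0x0F];
        y[2*i + 1] = b->lut[byte >> 4];
    }
}
```

Storage-wise, with fp16 table entries this would be 16*16 + 128*4 = 768 bits per block, i.e. 6 bits per weight at QK=128, dropping to 5 bpw at QK=256.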
Any quantization method that can be evaluated efficiently works
This is interesting - do you know if it can be implemented with ARM NEON too? I had some similar ideas - instead of the existing linear Q4_0 mapping of the "normally" distributed weights, make a simple transform of the data so you get a more uniform distribution and quantize that instead. The transform has to be efficient to evaluate, so some approximation of the uniform distribution would work. Currently, the "outer" quantization bins are much less "utilized" compared to the "inner" ones (i.e. those around zero). |
I'm not familiar with NEON, but it does have the vtbl/vtbx instructions, which permute bytes like vpshufb. I've used the latter to do in-register lookup tables. |
If weights have a normal distribution, then I believe this approach is worth trying:
That way our quantisation will cover ~95% of the values, and we can try different values of the multiplier. |
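If the suggestion is to derive the quantization range from the standard deviation and clamp the outliers, a minimal sketch could look like the following; the function name and the choice of n_sigma are my illustration, not code from this thread:

```c
#include <math.h>
#include <stdint.h>

#define QK 32

// Sketch: Q4_0-style quantization of one block, but with the range derived
// from n_sigma standard deviations instead of the absolute maximum.
static void quantize_block_sigma_clamp(const float *x, uint8_t qs[QK/2], float *d_out, float n_sigma) {
    // estimate sigma of the block (weights assumed roughly zero-mean)
    float ss = 0.0f;
    for (int l = 0; l < QK; l++) {
        ss += x[l]*x[l];
    }
    const float sigma = sqrtf(ss / QK);
    const float range = n_sigma * sigma;   // e.g. n_sigma = 2 covers ~95% of a normal
    const float d  = range / 7.0f;
    const float id = d != 0.0f ? 1.0f/d : 0.0f;
    *d_out = d;
    for (int l = 0; l < QK; l += 2) {
        // clamp the outliers into the representable range [-7, 7]
        float v0 = x[l + 0]*id;
        float v1 = x[l + 1]*id;
        v0 = v0 > 7.0f ? 7.0f : (v0 < -7.0f ? -7.0f : v0);
        v1 = v1 > 7.0f ? 7.0f : (v1 < -7.0f ? -7.0f : v1);
        const uint8_t vi0 = (uint8_t)((int8_t)roundf(v0) + 8);
        const uint8_t vi1 = (uint8_t)((int8_t)roundf(v1) + 8);
        qs[l/2] = vi0 | (vi1 << 4);
    }
}
```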
Reading the comments above - yeah, if we can efficiently implement a lookup table |
I'm stuck with an older codebase, so I modified the old quantize method there.
This works, although the process is about 100 times slower.
The output of the 7B model is nearly identical. At least it's not broken, but I don't know if it's an improvement. This is just an experiment, so you can decide if it's worth doing. |
For the case of int4 on AVX, you can (ab)use the byte-shuffle instructions for the lookup. Let me come up with an instruction sequence for AVX2... |
Maybe of interest to us: https://github.com/TimDettmers/bitsandbytes |
The CDF of a zero-mean normal distribution with standard deviation sig is 0.5*(1 + erf(x/(sig*sqrt(2)))).
So if we map the weights through this transform before quantizing (shifted to be zero-centered, as in the TF macro below), the quantized values become much more uniformly distributed:

diff --git a/ggml.c b/ggml.c
index c9a4e86..cc62e49 100644
--- a/ggml.c
+++ b/ggml.c
@@ -449,6 +449,8 @@ static inline __m128i packNibbles( __m256i bytes )
// blocks of QK elements
// represented with a single float (delta) and QK/2 8-bit ints (i.e QK 4-bit signed integer factors)
+#define TF(x, sig) (0.5*(1.0f + erf((x/sig)/sqrtf(2.0f))) - 0.5f)
+
// reference implementation for deterministic creation of model files
static void quantize_row_q4_0_reference(const float * restrict x, void * restrict y, int k) {
assert(k % QK == 0);
@@ -461,11 +463,17 @@ static void quantize_row_q4_0_reference(const float * restrict x, void * restric
uint8_t pp[QK/2];
+ double sig = 0.0;
+ for (int i = 0; i < k; i++) {
+ sig += x[i]*x[i];
+ }
+ sig = sqrt(sig/k);
+
for (int i = 0; i < nb; i++) {
float amax = 0.0f; // absolute max
for (int l = 0; l < QK; l++) {
- const float v = x[i*QK + l];
+ const float v = TF(x[i*QK + l], sig);
amax = MAX(amax, fabsf(v));
}
@@ -476,8 +484,8 @@ static void quantize_row_q4_0_reference(const float * restrict x, void * restric
pd += bs;
for (int l = 0; l < QK; l += 2) {
- const float v0 = x[i*QK + l + 0]*id;
- const float v1 = x[i*QK + l + 1]*id;
+ const float v0 = TF(x[i*QK + l + 0], sig)*id;
+ const float v1 = TF(x[i*QK + l + 1], sig)*id;
const uint8_t vi0 = ((int8_t) (round(v0))) + 8;
const uint8_t vi1 = ((int8_t) (round(v1))) + 8;

./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
tok_embeddings.weight - [ 4096, 32000], type = f16 quantizing .. size = 500.00 MB -> 78.12 MB | hist: 0.000 0.050 0.069 0.069 0.069 0.069 0.069 0.069 0.069 0.069 0.069 0.069 0.069 0.069 0.069 0.050
norm.weight - [ 4096, 1], type = f32 size = 0.016 MB
output.weight - [ 4096, 32000], type = f16 quantizing .. size = 500.00 MB -> 78.12 MB | hist: 0.000 0.050 0.068 0.069 0.069 0.069 0.070 0.070 0.070 0.070 0.070 0.070 0.069 0.069 0.068 0.049
layers.0.attention.wq.weight - [ 4096, 4096], type = f16 quantizing .. size = 64.00 MB -> 10.00 MB | hist: 0.000 0.042 0.057 0.064 0.069 0.074 0.076 0.078 0.079 0.078 0.076 0.073 0.069 0.064 0.057 0.042
layers.0.attention.wk.weight - [ 4096, 4096], type = f16 quantizing .. size = 64.00 MB -> 10.00 MB | hist: 0.000 0.044 0.061 0.066 0.070 0.072 0.074 0.075 0.075 0.075 0.074 0.072 0.070 0.066 0.061 0.044
layers.0.attention.wv.weight - [ 4096, 4096], type = f16 quantizing .. size = 64.00 MB -> 10.00 MB | hist: 0.000 0.050 0.067 0.067 0.068 0.069 0.070 0.072 0.073 0.072 0.070 0.069 0.068 0.067 0.067 0.050
layers.0.attention.wo.weight - [ 4096, 4096], type = f16 quantizing .. size = 64.00 MB -> 10.00 MB | hist: 0.000 0.052 0.056 0.058 0.064 0.071 0.077 0.081 0.082 0.081 0.077 0.071 0.064 0.058 0.056 0.052
layers.0.feed_forward.w1.weight - [ 4096, 11008], type = f16 quantizing .. size = 172.00 MB -> 26.88 MB | hist: 0.000 0.050 0.069 0.069 0.069 0.069 0.069 0.070 0.070 0.069 0.069 0.069 0.069 0.069 0.069 0.050
layers.0.feed_forward.w2.weight - [11008, 4096], type = f16 quantizing .. size = 172.00 MB -> 26.88 MB | hist: 0.000 0.050 0.069 0.069 0.069 0.069 0.070 0.070 0.070 0.070 0.069 0.069 0.069 0.069 0.069 0.050
layers.0.feed_forward.w3.weight - [ 4096, 11008], type = f16 quantizing .. size = 172.00 MB -> 26.88 MB | hist: 0.000 0.050 0.069 0.069 0.069 0.069 0.070 0.070 0.070 0.070 0.069 0.069 0.069 0.069 0.069 0.050
layers.0.attention_norm.weight - [ 4096, 1], type = f32 size = 0.016 MB
layers.0.ffn_norm.weight - [ 4096, 1], type = f32 size = 0.016 MB
layers.1.attention.wq.weight - [ 4096, 4096], type = f16 quantizing .. size = 64.00 MB -> 10.00 MB | hist: 0.000 0.049 0.068 0.069 0.069 0.070 0.070 0.070 0.070 0.070 0.070 0.070 0.069 0.069 0.068 0.049
layers.1.attention.wk.weight - [ 4096, 4096], type = f16 quantizing .. size = 64.00 MB -> 10.00 MB | hist: 0.000 0.049 0.067 0.069 0.069 0.070 0.070 0.070 0.070 0.070 0.070 0.070 0.069 0.069 0.068 0.049 The bins are now much more evenly utilized. |
I am wondering how we could update the algorithm so that the first bin is also utilized; currently it is unused.
Maybe we can try computing the mean too and store both values, though most probably the extra storage would not pay off. |
Basic idea for an int4->f16 lookup table on AVX2: Low nibbles:
High nibbles:
Result:
You can get away without the final unshuffle if you preshuffle the input int4s instead. I'm familiar with ARM, but not so much NEON. (The perils of embedded programming.) That being said, it looks very similar at a first glance, with the following wrinkles:
It looks like you can swap in the corresponding NEON table-lookup instructions. A quick attempt is at https://godbolt.org/z/G74oYP8nK. Not perfect - shuffles tend to be slow (throughput of 1/2), so I suggest storing the int4 table interleaved - and I haven't actually tested this, just stared at the output assembly for a bit - but it may be good enough. |
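To make the vpshufb trick concrete, here is a minimal (untested) sketch of the nibble-lookup step with AVX2 intrinsics, using an int4-to-int8 table rather than the full int4-to-f16 version described above; the f16 variant would do two such lookups per nibble (one for each byte of the fp16 value) and then interleave:

```c
#include <immintrin.h>
#include <stdint.h>

// Look up 64 int4 codes (packed two per byte in `packed`, 32 bytes) in a
// 16-entry signed 8-bit table. vpshufb uses only the low 4 bits of each index
// byte within each 128-bit lane, which is exactly what is needed here.
static void lookup_q4_avx2(const uint8_t *packed, const int8_t table16[16],
                           int8_t out_lo[32], int8_t out_hi[32]) {
    const __m128i tbl128 = _mm_loadu_si128((const __m128i *) table16);
    const __m256i tbl    = _mm256_broadcastsi128_si256(tbl128);  // same table in both lanes
    const __m256i bytes  = _mm256_loadu_si256((const __m256i *) packed);
    const __m256i mask   = _mm256_set1_epi8(0x0F);

    const __m256i lo = _mm256_and_si256(bytes, mask);                        // low nibbles
    const __m256i hi = _mm256_and_si256(_mm256_srli_epi16(bytes, 4), mask);  // high nibbles

    _mm256_storeu_si256((__m256i *) out_lo, _mm256_shuffle_epi8(tbl, lo));
    _mm256_storeu_si256((__m256i *) out_hi, _mm256_shuffle_epi8(tbl, hi));
}
```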
The reason why the first bin is not utilized is the following. Ignore floating-point rounding for a moment: with the current scheme we have d = amax/7 and vi = round(v/d) + 8, and then something will be assigned to bin 0 only if round(v/d) = -8, i.e. only if v <= -7.5*d = -(7.5/7)*amax.
...but here we have a contradiction, because |v| <= amax < (7.5/7)*amax, so bin 0 can never be hit. This is because we're dealing with a signed bin range, not an unsigned one, and so the correct divisor is 15/2, not 7. See below.

For an optimal placement, assuming that the inputs have been transformed into a uniformish distribution with maximum absolute value amax:
TL;DR: try this:

const float d = amax * (2.0f / 15.0f);
const float id2 = amax ? 8.0f/amax : 0.0f; // not 1/d!
[...]
const float v0 = TF(x[i*QK + l + 0], sig)*id2; // [-amax..amax] -> [-8..=8]
const float v1 = TF(x[i*QK + l + 1], sig)*id2; // [-amax..amax] -> [-8..=8]
// Edge case handling: if v0 == 8 (because input value == amax exactly), then we'd end up with +16 as a result.
// Deal with it by rounding this case down to 7.
// Ditto, due to rounding abs(v0) can end up slightly larger than 8. Preemptively fix up if so.
// Any value in [7..<8] works.
const float BELOW8 = 7.99999952316f; // nextbefore(8.0f) // 7.0f
const float v02 = min(max(v0, -8.0f), BELOW8); // [-8..=8] -> [-8..8]
const float v12 = min(max(v1, -8.0f), BELOW8); // [-8..=8] -> [-8..8]
const uint8_t vi0 = ((int8_t) (floor(v02))) + 8; // [-8..8] -> [0..16]
const uint8_t vi1 = ((int8_t) (floor(v12))) + 8; // [-8..8] -> [0..16]

Note the use of floor rather than round here. (This does end up "always" shrinking the maximum or minimum value by up to half a quantization step, but in exchange every bin can be used.) |
Just in case this detail was left unnoticed: with the code I shared above, the histogram is still a bit skewed to the right, but it's much more symmetric. |
I tried to run the previously mentioned Q4_1 quantization method with some number of local relaxation steps to reduce the square error (down to 83% of the naive computation's error on average), but the result did not appear to improve perplexity on Wikitext, being within 0.01 of naive Q4_1's after 30 steps (which I argued here to be sufficient for a preliminary estimate):
Along with this, I had some lower-quality data suggesting that just throwing out one min and one max outlier, when this improved the square error, actually made perplexity worse (by about 0.05). My current hypothesis is that perhaps it matters more to accurately represent weights that are further away from 0, as those wind up influencing the final dot product more. I want to try only throwing away the value closest to 0 next. |
Perhaps Posit arithmetic could be valuable? http://www.johngustafson.net/pdfs/BeatingFloatingPoint.pdf |
I used the following patch to compute histograms of the model:

diff --git a/convert-pth-to-ggml.py b/convert-pth-to-ggml.py
index ccf2c57..a17a3c2 100644
--- a/convert-pth-to-ggml.py
+++ b/convert-pth-to-ggml.py
@@ -22,6 +22,9 @@ import struct
import numpy as np
import torch
+from matplotlib import pyplot as plt
+idx = 0
+
from sentencepiece import SentencePieceProcessor
def parse_args():
@@ -124,6 +127,13 @@ def process_and_write_variables(fout, model, ftype):
fout.write(sname)
# data output to file
+ hist, bins = np.histogram(data, bins=100)
+ plt.stairs(hist, bins)
+ global idx
+ plt.savefig(f"hist_{idx:08}.png")
+ plt.clf()
+ idx += 1
+
data.tofile(fout)
def main(): From quickly inspecting the results it seems that most of the layers indeed have normal distribution around mean 0.0, but there are also around 20% of layers which have mean != 0.0. Attaching the zipped histograms: hist.zip |
Hey guys, what is the official name for the compression method used for Q4? |
Page 7 of this paper has a relatively efficient iterative procedure based on discrete calculus to find the scale factor for the minimum RMS quantisation error. They use it for an FPGA-friendly two-step log quantiser, but the math should work for a scaled uniform quantiser too. https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136710657.pdf |
I've finally had the time to implement some of the ideas I mentioned previously, and though it may be of purely academic interest, I'd like to share some results. I implemented 3 types of significance coding strategies for unary exponent coding:
Until a pre-determined percentage of weights is deemed significant, 4 options (delta, bitset, and binary partitioning with 2 different thresholds) are tried and the best one is chosen. Afterwards, the bitset encoding strategy is always used. Testing on the 7B model, an optional preprocessing step that performs variable rounding was also implemented for the 65 small layers with 4096 weights that don't really follow the same weight distribution. Results below:
As we go down in average bits-per-weight, we see that even though as expected RMSE scales almost perfectly, the maximum absolute error explodes rather quickly, influenced by those small layers. If we choose to skip quantizing them, the results are much better:
Now, obviously, the most interesting aspect of this approach is not using it in CBR mode, but instead to use a VBR mode where the encoder stops whenever a certain metric is achieved. Possible useful metrics to try are RMSE, maximum and mean absolute errors, all divided by the range for each layer. Assuming a metric that translates well to perplexity degradation is used, that would allow us to get the smallest possible model size that still retains the quality we want. The obvious elephant in the room here is that this encoding would have to be decoded and stored in memory in a custom sparse-capable format, so in practice it would probably only be useful with a very sparse and heavily quantized 65B model. For future reference, doing a lossless encoding of the 7B model with this method requires an average of 13.76 bits-per-weight, so better than using the usual general-purpose compression algorithms, and only slightly behind a simple context-mixing compression algorithm. |
Now that the significance coding method (SCM) provides a good baseline for what should be achievable, I ran a few experiments to see how close to it I could get with a simple q4_1-like encoding. The range of values per block can be encoded in 10 bits (5 bits for the exponent and 5 bits for the mantissa) and the minimum value per block can be encoded in 12 bits (1 sign bit, 5 exponent bits, 6 mantissa bits). That leaves us 2 bits, which I'm using to index into a lookup table of quantizer step mappings, so that we can pick the one that either provides the smallest squared error or the smallest absolute error. The first entry in the LUT is the original linear mapping, and then I'm using 3 different logit-like mappings:
At the default block size (QK) of 32, this works out to 4.75 bits-per-weight (bpw). Now for some quick results, when optimizing the mapping choice for RMSE:
At 4.75 bpw, the RMSE is ever so slightly better than the improved q4_0 method with 2 fp16 at 5 bpw, and significantly outperforms it in terms of MAE. However, it is still far from the result of SCM, until we start increasing the block size. At QK = 128, it still performs better than the original Q4_0 method, and is much closer to SCM. Crucially, when compared to SCM, dequantization is much faster, and the LUT size can be cut in half by making use of the symmetry in the mappings. |
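For reference, the bpw figures follow directly from the header sizes described above (my arithmetic, not from the thread): 32 weights x 4 bits = 128 payload bits, plus 10 (range) + 12 (minimum) + 2 (mapping index) = 24 header bits, so (128 + 24) / 32 = 4.75 bpw at QK = 32, and (512 + 24) / 128 = 4.1875 bpw at QK = 128.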
Thank you for the detailed analysis. A few notes:
The 1D normalization layers (i.e.
How often do we end up choosing each of the 4 mappings? I will need some time to get into all the details provided here (and also from other users), but I like the various ideas that are being generated. Just an update: the short-term plan is to try to implement #995 efficiently. After that, we can try to apply some RMSE-optimizing strategy to additionally bring the perplexity down. |
As the block size increases, we're getting a better approximation of the distribution, and hence the most "skewed" mapping (map3) is increasingly favored, and the "backup" linear mapping is almost never used. However, it's important to note that very skewed mappings may not always be better, especially if MAE is a main factor for perplexity. Here's a run at QK=128 with the most "conservative" non-linear mapping (map1) replaced with an even more skewed mapping than map3:
Here, even if we get a very small improvement to RMSE (0.002036 => 0.002017), the MAE increases by a non-negligible amount (0.139648 => 0.158203). It should also be considered that until proper perplexity measurements taken in controlled, reproducible runs are available, comparing on RMSE and/or MAE alone might not reflect the characteristics of all the different quantization strategies proposed. |
I ran some perplexity tests on SCM and the non-linear mapping quantization method (NLM); here are some results for LLaMA-7B:
SCM @ 6bpw[1]4.2472,[2]4.7509,[3]5.6108,[4]6.2010,[5]6.3296,[6]6.2916,[7]6.4857,[8]6.5790,[9]6.8988,[10]7.1445,[11]7.3559,[12]7.3710,[13]7.2867,[14]7.3399,[15]7.5808,[16]7.2083,[17]7.0967,[18]7.0441,[19]6.6939,[20]6.6843,[21]6.5953,[22]6.4216,[23]6.3900,[24]6.2996,[25]6.3004,[26]6.1424,[27]5.9721,[28]5.8749,[29]5.7887,[30]5.6339,[31]5.6051,[32]5.6264,[33]5.5719,[34]5.6020,[35]5.6253,[36]5.6611,[37]5.6657,[38]5.6747,[39]5.7071,[40]5.7572,[41]5.7669,[42]5.8043,[43]5.7666,[44]5.8232,[45]5.8260,[46]5.8001,[47]5.8202,[48]5.7954,[49]5.7969,[50]5.7581,[51]5.7545,[52]5.7447,[53]5.7894,[54]5.7738,[55]5.7517,[56]5.7805,[57]5.7997,[58]5.8189,[59]5.8355,[60]5.8766,[61]5.8701,[62]5.9277,[63]5.9586,[64]5.9716,[65]6.0135,[66]6.0206,[67]6.0374,[68]6.0511,[69]6.0743,[70]6.1041,[71]6.1249,[72]6.1559,[73]6.2136,[74]6.2178,[75]6.2313,[76]6.2431,[77]6.2540,[78]6.2395,[79]6.2666,[80]6.2598,[81]6.2708,[82]6.2754,[83]6.2259,[84]6.2086,[85]6.1956,[86]6.1747,[87]6.1102,[88]6.0848,[89]6.0654,[90]6.0511,[91]6.0730,[92]6.0675,[93]6.0673,[94]6.0647,[95]6.0922,[96]6.0920,[97]6.0869,[98]6.0810,[99]6.0674,[100]6.0663,[101]6.0895,[102]6.0843,[103]6.1045,[104]6.1118,[105]6.1116,[106]6.1277,[107]6.1267,[108]6.1401,[109]6.1353,[110]6.1319,[111]6.1540,[112]6.1741,[113]6.1762,[114]6.1727,[115]6.1784,[116]6.1699,[117]6.1750,[118]6.2028,[119]6.2240,[120]6.2582,[121]6.2727,[122]6.2969,[123]6.3327,[124]6.3497,[125]6.3408,[126]6.3793,[127]6.4147,[128]6.4441,[129]6.4292,[130]6.4376,[131]6.4341,[132]6.4267,[133]6.4133,[134]6.4230,[135]6.4191,[136]6.4090,[137]6.4018,[138]6.3845,[139]6.3743,[140]6.3706,[141]6.3415,[142]6.3379,[143]6.3086,[144]6.2888,[145]6.2794,[146]6.2676,[147]6.2710,[148]6.2711,[149]6.2658,[150]6.2617,[151]6.2637,[152]6.2540,[153]6.2378,[154]6.2297,[155]6.2363,[156]6.2317,[157]6.2484,[158]6.2528,[159]6.2572,[160]6.2596,[161]6.2711,[162]6.2433,[163]6.2318,[164]6.2089,[165]6.1788,[166]6.1523,[167]6.1161,[168]6.0859,[169]6.0724,[170]6.0619,[171]6.0360,[172]6.0196,[173]6.0035,[174]5.9743,[175]5.9532,[176]5.9418,[177]5.9224,[178]5.9003,[179]5.8839,[180]5.8748,[181]5.8540,[182]5.8367,[183]5.8233,[184]5.8224,[185]5.8152,[186]5.8159,[187]5.8220,[188]5.8182,[189]5.8352,[190]5.8360,[191]5.8564,[192]5.8723,[193]5.8886,[194]5.8995,[195]5.9201,[196]5.9352,[197]5.9557,[198]5.9703,[199]5.9734,[200]5.9782,[201]5.9731,[202]5.9916,[203]5.9989,[204]5.9977,[205]6.0079,[206]6.0146,[207]6.0108,[208]6.0189,[209]6.0228,[210]6.0280,[211]6.0383,[212]6.0452,[213]6.0556,[214]6.0580,[215]6.0603,[216]6.0741,[217]6.0921,[218]6.1053,[219]6.1050,[220]6.1015,[221]6.0970,[222]6.0947,[223]6.0852,[224]6.0781,[225]6.0744,[226]6.0946,[227]6.1025,[228]6.1076,[229]6.1136,[230]6.1104,[231]6.1266,[232]6.1152,[233]6.0991,[234]6.0848,[235]6.0653,[236]6.0590,[237]6.0497,[238]6.0527,[239]6.0385,[240]6.0285,[241]6.0306,[242]6.0340,[243]6.0325,[244]6.0216,[245]6.0188,[246]6.0081,[247]5.9967,[248]5.9897,[249]5.9874,[250]5.9918,[251]5.9849,[252]5.9814,[253]5.9719,[254]5.9667,[255]5.9558,[256]5.9384,[257]5.9265,[258]5.9187,[259]5.9167,[260]5.9088,[261]5.9047,[262]5.8994,[263]5.8939,[264]5.8717,[265]5.8712,[266]5.8699,[267]5.8634,[268]5.8724,[269]5.8704,[270]5.8714,[271]5.8789,[272]5.8822,[273]5.8823,[274]5.8848,[275]5.8930,[276]5.8988,[277]5.9143,[278]5.9238,[279]5.9330,[280]5.9359,[281]5.9455,[282]5.9513,[283]5.9655,[284]5.9734,[285]5.9819,[286]5.9950,[287]5.9946,[288]6.0004,[289]5.9923,[290]5.9772,[291]5.9627,[292]5.9482,[293]5.9351,[294]5.9372,[295]5.9362,[296]5.9409,[297]5.9397,[298]5.9427,[299]5.9401,[300]5.9297,[301]5.9299,[302]5.9222,[303]5.9137,[304]5.9056,[305]
5.9022,[306]5.8899,[307]5.8920,[308]5.8951,[309]5.8798,[310]5.8746,[311]5.8682,[312]5.8704,[313]5.8650,[314]5.8634,[315]5.8483,[316]5.8433,[317]5.8275,[318]5.8076,[319]5.8193,[320]5.8312,[321]5.8358,[322]5.8319,[323]5.8254,[324]5.8227,[325]5.8327,[326]5.8328,[327]5.8347,[328]5.8384,[329]5.8440,[330]5.8464,[331]5.8584,[332]5.8558,[333]5.8625,[334]5.8571,[335]5.8511,[336]5.8547,[337]5.8524,[338]5.8518,[339]5.8468,[340]5.8426,[341]5.8504,[342]5.8530,[343]5.8577,[344]5.8579,[345]5.8585,[346]5.8561,[347]5.8599,[348]5.8631,[349]5.8655,[350]5.8623,[351]5.8631,[352]5.8632,[353]5.8577,[354]5.8577,[355]5.8627,[356]5.8656,[357]5.8621,[358]5.8711,[359]5.8736,[360]5.8701,[361]5.8698,[362]5.8767,[363]5.8876,[364]5.8936,[365]5.8986,[366]5.8999,[367]5.9081,[368]5.9057,[369]5.9067,[370]5.9082,[371]5.9029,[372]5.9077,[373]5.9121,[374]5.9105,[375]5.9106,[376]5.9171,[377]5.9128,[378]5.9154,[379]5.9212,[380]5.9134,[381]5.9101,[382]5.9051,[383]5.9044,[384]5.9039,[385]5.9030,[386]5.9026,[387]5.9025,[388]5.8990,[389]5.8941,[390]5.8873,[391]5.8799,[392]5.8759,[393]5.8741,[394]5.8768,[395]5.8756,[396]5.8685,[397]5.8756,[398]5.8793,[399]5.8870,[400]5.8872,[401]5.8887,[402]5.8897,[403]5.8917,[404]5.8981,[405]5.8889,[406]5.8857,[407]5.8853,[408]5.8869,[409]5.8982,[410]5.9089,[411]5.9198,[412]5.9353,[413]5.9458,[414]5.9532,[415]5.9587,[416]5.9662,[417]5.9778,[418]5.9814,[419]5.9881,[420]5.9969,[421]6.0083,[422]6.0122,[423]6.0191,[424]6.0296,[425]6.0380,[426]6.0443,[427]6.0487,[428]6.0567,[429]6.0617,[430]6.0697,[431]6.0833,[432]6.0871,[433]6.0864,[434]6.0824,[435]6.0832,[436]6.0857,[437]6.0951,[438]6.1025,[439]6.0994,[440]6.0985,[441]6.0936,[442]6.0922,[443]6.0935,[444]6.0939,[445]6.0921,[446]6.0944,[447]6.0973,[448]6.1012,[449]6.0988,[450]6.0996,[451]6.0957,[452]6.0821,[453]6.0738,[454]6.0682,[455]6.0693,[456]6.0739,[457]6.0758,[458]6.0736,[459]6.0741,[460]6.0826,[461]6.0799,[462]6.0785,[463]6.0824,[464]6.0812,[465]6.0787,[466]6.0710,[467]6.0712,[468]6.0710,[469]6.0730,[470]6.0734,[471]6.0687,[472]6.0730,[473]6.0679,[474]6.0689,[475]6.0629,[476]6.0646,[477]6.0573,[478]6.0562,[479]6.0619,[480]6.0664,[481]6.0683,[482]6.0639,[483]6.0599,[484]6.0619,[485]6.0600,[486]6.0543,[487]6.0540,[488]6.0517,[489]6.0472,[490]6.0448,[491]6.0420,[492]6.0364,[493]6.0338,[494]6.0321,[495]6.0317,[496]6.0279,[497]6.0225,[498]6.0207,[499]6.0165,[500]6.0075,[501]6.0010,[502]6.0012,[503]6.0006,[504]5.9922,[505]5.9943,[506]5.9950,[507]5.9892,[508]5.9853,[509]5.9847,[510]5.9880,[511]5.9925,[512]5.9960,[513]5.9980,[514]6.0042,[515]5.9989,[516]5.9980,[517]5.9990,[518]5.9988,[519]6.0016,[520]6.0040,[521]6.0054,[522]6.0081,[523]6.0087,[524]6.0145,[525]6.0177,[526]6.0185,[527]6.0204,[528]6.0155,[529]6.0160,[530]6.0110,[531]6.0099,[532]6.0145,[533]6.0168,[534]6.0152,[535]6.0173,[536]6.0120,[537]6.0099,[538]6.0147,[539]6.0157,[540]6.0194,[541]6.0197,[542]6.0208,[543]6.0223,[544]6.0234,[545]6.0215,[546]6.0222,[547]6.0182,[548]6.0135,[549]6.0136,[550]6.0108,[551]6.0075,[552]6.0053,[553]6.0017,[554]5.9998,[555]5.9968,[556]5.9965,[557]5.9987,[558]5.9949,[559]5.9944,[560]5.9943,[561]5.9945,[562]5.9923,[563]5.9920,[564]5.9960,[565]5.9981,[566]5.9980,[567]5.9959,[568]5.9964,[569]5.9951,[570]5.9979,[571]5.9983,[572]5.9993,[573]5.9993,[574]5.9959,[575]5.9954,[576]5.9954,[577]5.9940,[578]5.9921,[579]5.9927,[580]5.9864,[581]5.9828,[582]5.9817,[583]5.9826,[584]5.9829,[585]5.9755,[586]5.9690,[587]5.9695,[588]5.9743,[589]5.9795,[590]5.9825,[591]5.9846,[592]5.9834,[593]5.9802,[594]5.9813,[595]5.9791,[596]5.9823,[597]5.9803,[598]5.9775,[599]5.9797,[600]5.9792,[601]5.9
778,[602]5.9787,[603]5.9816,[604]5.9824,[605]5.9858,[606]5.9879,[607]5.9862,[608]5.9831,[609]5.9839,[610]5.9873,[611]5.9856,[612]5.9881,[613]5.9845,[614]5.9797,[615]5.9727,[616]5.9754,[617]5.9695,[618]5.9648,[619]5.9596,[620]5.9463,[621]5.9397,[622]5.9381,[623]5.9397,[624]5.9402,[625]5.9403,[626]5.9392,[627]5.9415,[628]5.9416,[629]5.9412,[630]5.9442,[631]5.9498,[632]5.9554,[633]5.9540,[634]5.9574,[635]5.9580,[636]5.9547,[637]5.9513,[638]5.9538,[639]5.9506,[640]5.9516,[641]5.9518,[642]5.9583,[643]5.9604,[644]5.9617,[645]5.9599,[646]5.9638,[647]5.9599,[648]5.9607,[649]5.9609,[650]5.9646,[651]5.9698,[652]5.9709,[653]5.9748,[654]5.9686,[655]5.9681,
SCM @ 5bpwsystem_info: n_threads = 12 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | NLM @ 4.75bpw[1]4.3701,[2]4.8432,[3]5.7421,[4]6.3178,[5]6.4298,[6]6.3957,[7]6.5841,[8]6.6749,[9]7.0199,[10]7.2736,[11]7.5004,[12]7.5294,[13]7.4493,[14]7.4927,[15]7.7488,[16]7.3650,[17]7.2446,[18]7.2010,[19]6.8428,[20]6.8327,[21]6.7444,[22]6.5722,[23]6.5339,[24]6.4387,[25]6.4364,[26]6.2679,[27]6.0888,[28]5.9846,[29]5.8991,[30]5.7407,[31]5.7090,[32]5.7299,[33]5.6773,[34]5.7099,[35]5.7314,[36]5.7620,[37]5.7697,[38]5.7763,[39]5.8055,[40]5.8532,[41]5.8603,[42]5.9043,[43]5.8646,[44]5.9230,[45]5.9263,[46]5.8992,[47]5.9203,[48]5.8953,[49]5.8961,[50]5.8534,[51]5.8493,[52]5.8408,[53]5.8857,[54]5.8698,[55]5.8493,[56]5.8784,[57]5.8976,[58]5.9187,[59]5.9383,[60]5.9786,[61]5.9680,[62]6.0267,[63]6.0577,[64]6.0717,[65]6.1134,[66]6.1234,[67]6.1414,[68]6.1548,[69]6.1806,[70]6.2106,[71]6.2326,[72]6.2656,[73]6.3216,[74]6.3258,[75]6.3399,[76]6.3521,[77]6.3635,[78]6.3497,[79]6.3775,[80]6.3709,[81]6.3816,[82]6.3895,[83]6.3373,[84]6.3192,[85]6.3067,[86]6.2862,[87]6.2247,[88]6.1990,[89]6.1798,[90]6.1658,[91]6.1892,[92]6.1841,[93]6.1842,[94]6.1817,[95]6.2099,[96]6.2100,[97]6.2054,[98]6.1995,[99]6.1855,[100]6.1843,[101]6.2077,[102]6.2035,[103]6.2237,[104]6.2306,[105]6.2317,[106]6.2479,[107]6.2466,[108]6.2576,[109]6.2530,[110]6.2489,[111]6.2716,[112]6.2924,[113]6.2953,[114]6.2916,[115]6.2973,[116]6.2880,[117]6.2925,[118]6.3208,[119]6.3430,[120]6.3780,[121]6.3931,[122]6.4184,[123]6.4530,[124]6.4705,[125]6.4605,[126]6.5001,[127]6.5366,[128]6.5651,[129]6.5515,[130]6.5603,[131]6.5570,[132]6.5489,[133]6.5377,[134]6.5467,[135]6.5432,[136]6.5330,[137]6.5263,[138]6.5092,[139]6.4988,[140]6.4957,[141]6.4666,[142]6.4628,[143]6.4344,[144]6.4148,[145]6.4058,[146]6.3948,[147]6.3976,[148]6.3977,[149]6.3926,[150]6.3888,[151]6.3906,[152]6.3797,[153]6.3642,[154]6.3559,[155]6.3626,[156]6.3575,[157]6.3739,[158]6.3777,[159]6.3822,[160]6.3856,[161]6.3983,[162]6.3699,[163]6.3581,[164]6.3348,[165]6.3042,[166]6.2767,[167]6.2398,[168]6.2103,[169]6.1973,[170]6.1870,[171]6.1604,[172]6.1438,[173]6.1275,[174]6.0972,[175]6.0757,[176]6.0635,[177]6.0439,[178]6.0216,[179]6.0054,[180]5.9956,[181]5.9743,[182]5.9560,[183]5.9423,[184]5.9426,[185]5.9357,[186]5.9364,[187]5.9428,[188]5.9394,[189]5.9570,[190]5.9584,[191]5.9799,[192]5.9954,[193]6.0123,[194]6.0232,[195]6.0445,[196]6.0606,[197]6.0811,[198]6.0966,[199]6.0995,[200]6.1058,[201]6.1011,[202]6.1193,[203]6.1269,[204]6.1265,[205]6.1366,[206]6.1436,[207]6.1404,[208]6.1486,[209]6.1527,[210]6.1577,[211]6.1685,[212]6.1758,[213]6.1863,[214]6.1888,[215]6.1906,[216]6.2034,[217]6.2218,[218]6.2359,[219]6.2355,[220]6.2315,[221]6.2253,[222]6.2230,[223]6.2130,[224]6.2069,[225]6.2036,[226]6.2237,[227]6.2323,[228]6.2376,[229]6.2428,[230]6.2393,[231]6.2559,[232]6.2449,[233]6.2280,[234]6.2128,[235]6.1927,[236]6.1859,[237]6.1770,[238]6.1795,[239]6.1648,[240]6.1539,[241]6.1554,[242]6.1587,[243]6.1572,[244]6.1464,[245]6.1430,[246]6.1321,[247]6.1210,[248]6.1140,[249]6.1117,[250]6.1164,[251]6.1093,[252]6.1054,[253]6.0959,[254]6.0904,[255]6.0791,[256]6.0611,[257]6.0487,[258]6.0402,[259]6.0373,[260]6.0291,[261]6.0256,[262]6.0200,[263]6.0145,[264]5.9949,[265]5.9940,[266]5.9929,[267]5.9865,[268]5.9954,[269]5.9938,[270]5.9946,[271]6.0022,[272]6.0059,[273]6.0060,[274]6.0081,[275]6.0165,[276]6.0219,[277]6.0374,[278]6.0470,[279]6.0557,[280]6.0590,[281]6.0684,[282]6.0742,[283]6.0891,[284]6.0971,[285]6.1060,[286]6.
1194,[287]6.1191,[288]6.1243,[289]6.1157,[290]6.1004,[291]6.0863,[292]6.0716,[293]6.0580,[294]6.0599,[295]6.0582,[296]6.0632,[297]6.0618,[298]6.0647,[299]6.0625,[300]6.0523,[301]6.0525,[302]6.0452,[303]6.0361,[304]6.0273,[305]6.0242,[306]6.0114,[307]6.0139,[308]6.0162,[309]6.0003,[310]5.9943,[311]5.9878,[312]5.9900,[313]5.9846,[314]5.9830,[315]5.9675,[316]5.9626,[317]5.9457,[318]5.9255,[319]5.9373,[320]5.9495,[321]5.9545,[322]5.9506,[323]5.9437,[324]5.9413,[325]5.9520,[326]5.9524,[327]5.9543,[328]5.9581,[329]5.9636,[330]5.9663,[331]5.9784,[332]5.9757,[333]5.9824,[334]5.9770,[335]5.9712,[336]5.9747,[337]5.9722,[338]5.9719,[339]5.9672,[340]5.9634,[341]5.9715,[342]5.9741,[343]5.9788,[344]5.9791,[345]5.9795,[346]5.9769,[347]5.9811,[348]5.9842,[349]5.9864,[350]5.9836,[351]5.9844,[352]5.9850,[353]5.9791,[354]5.9798,[355]5.9848,[356]5.9878,[357]5.9844,[358]5.9932,[359]5.9959,[360]5.9922,[361]5.9919,[362]5.9991,[363]6.0101,[364]6.0168,[365]6.0216,[366]6.0226,[367]6.0312,[368]6.0287,[369]6.0294,[370]6.0312,[371]6.0257,[372]6.0301,[373]6.0347,[374]6.0328,[375]6.0325,[376]6.0393,[377]6.0346,[378]6.0370,[379]6.0424,[380]6.0348,[381]6.0318,[382]6.0264,[383]6.0260,[384]6.0256,[385]6.0249,[386]6.0247,[387]6.0246,[388]6.0208,[389]6.0155,[390]6.0088,[391]6.0013,[392]5.9969,[393]5.9952,[394]5.9976,[395]5.9964,[396]5.9893,[397]5.9955,[398]5.9993,[399]6.0071,[400]6.0068,[401]6.0083,[402]6.0098,[403]6.0119,[404]6.0183,[405]6.0094,[406]6.0065,[407]6.0064,[408]6.0081,[409]6.0198,[410]6.0306,[411]6.0418,[412]6.0574,[413]6.0688,[414]6.0766,[415]6.0821,[416]6.0900,[417]6.1021,[418]6.1057,[419]6.1124,[420]6.1211,[421]6.1323,[422]6.1363,[423]6.1435,[424]6.1541,[425]6.1624,[426]6.1689,[427]6.1734,[428]6.1814,[429]6.1864,[430]6.1945,[431]6.2085,[432]6.2122,[433]6.2111,[434]6.2073,[435]6.2083,[436]6.2111,[437]6.2210,[438]6.2285,[439]6.2251,[440]6.2243,[441]6.2193,[442]6.2180,[443]6.2195,[444]6.2199,[445]6.2183,[446]6.2206,[447]6.2237,[448]6.2279,[449]6.2256,[450]6.2262,[451]6.2222,[452]6.2100,[453]6.2017,[454]6.1962,[455]6.1972,[456]6.2022,[457]6.2044,[458]6.2023,[459]6.2027,[460]6.2111,[461]6.2083,[462]6.2071,[463]6.2113,[464]6.2103,[465]6.2076,[466]6.1998,[467]6.2004,[468]6.2005,[469]6.2026,[470]6.2030,[471]6.1987,[472]6.2034,[473]6.1979,[474]6.1992,[475]6.1932,[476]6.1951,[477]6.1885,[478]6.1876,[479]6.1937,[480]6.1986,[481]6.2004,[482]6.1957,[483]6.1916,[484]6.1939,[485]6.1918,[486]6.1860,[487]6.1857,[488]6.1833,[489]6.1784,[490]6.1761,[491]6.1733,[492]6.1679,[493]6.1653,[494]6.1636,[495]6.1633,[496]6.1596,[497]6.1541,[498]6.1525,[499]6.1480,[500]6.1388,[501]6.1325,[502]6.1326,[503]6.1324,[504]6.1239,[505]6.1260,[506]6.1270,[507]6.1215,[508]6.1175,[509]6.1169,[510]6.1202,[511]6.1252,[512]6.1287,[513]6.1309,[514]6.1376,[515]6.1321,[516]6.1309,[517]6.1319,[518]6.1314,[519]6.1347,[520]6.1371,[521]6.1388,[522]6.1417,[523]6.1425,[524]6.1481,[525]6.1515,[526]6.1524,[527]6.1541,[528]6.1491,[529]6.1499,[530]6.1451,[531]6.1440,[532]6.1489,[533]6.1512,[534]6.1494,[535]6.1519,[536]6.1467,[537]6.1446,[538]6.1496,[539]6.1505,[540]6.1543,[541]6.1545,[542]6.1555,[543]6.1571,[544]6.1582,[545]6.1561,[546]6.1567,[547]6.1525,[548]6.1479,[549]6.1478,[550]6.1451,[551]6.1414,[552]6.1394,[553]6.1356,[554]6.1334,[555]6.1304,[556]6.1302,[557]6.1327,[558]6.1289,[559]6.1289,[560]6.1289,[561]6.1293,[562]6.1267,[563]6.1263,[564]6.1304,[565]6.1326,[566]6.1325,[567]6.1304,[568]6.1309,[569]6.1297,[570]6.1322,[571]6.1325,[572]6.1333,[573]6.1330,[574]6.1293,[575]6.1288,[576]6.1287,[577]6.1273,[578]6.1253,[579]6.1261,[580]6.1198,[581]6.1160,[582]6.114
9,[583]6.1156,[584]6.1158,[585]6.1085,[586]6.1018,[587]6.1023,[588]6.1072,[589]6.1126,[590]6.1157,[591]6.1179,[592]6.1167,[593]6.1135,[594]6.1146,[595]6.1123,[596]6.1156,[597]6.1135,[598]6.1104,[599]6.1125,[600]6.1121,[601]6.1109,[602]6.1121,[603]6.1150,[604]6.1156,[605]6.1192,[606]6.1213,[607]6.1197,[608]6.1162,[609]6.1170,[610]6.1204,[611]6.1186,[612]6.1211,[613]6.1175,[614]6.1127,[615]6.1054,[616]6.1081,[617]6.1020,[618]6.0971,[619]6.0919,[620]6.0780,[621]6.0712,[622]6.0696,[623]6.0713,[624]6.0718,[625]6.0718,[626]6.0708,[627]6.0729,[628]6.0730,[629]6.0728,[630]6.0758,[631]6.0814,[632]6.0873,[633]6.0858,[634]6.0894,[635]6.0900,[636]6.0867,[637]6.0834,[638]6.0860,[639]6.0834,[640]6.0841,[641]6.0841,[642]6.0908,[643]6.0928,[644]6.0942,[645]6.0925,[646]6.0966,[647]6.0925,[648]6.0938,[649]6.0941,[650]6.0981,[651]6.1034,[652]6.1044,[653]6.1084,[654]6.1019,[655]6.1014,
SCM @ 4bpw[1]4.4865,[2]5.0649,[3]5.9553,[4]6.5977,[5]6.6739,[6]6.6580,[7]6.8753,[8]6.9782,[9]7.3275,[10]7.5823,[11]7.8070,[12]7.8091,[13]7.7428,[14]7.8138,[15]8.0488,[16]7.6375,[17]7.5148,[18]7.4648,[19]7.0842,[20]7.0715,[21]6.9824,[22]6.7951,[23]6.7563,[24]6.6686,[25]6.6695,[26]6.5019,[27]6.3196,[28]6.2198,[29]6.1306,[30]5.9768,[31]5.9504,[32]5.9709,[33]5.9172,[34]5.9501,[35]5.9790,[36]6.0208,[37]6.0232,[38]6.0346,[39]6.0646,[40]6.1271,[41]6.1429,[42]6.1844,[43]6.1433,[44]6.1992,[45]6.2006,[46]6.1722,[47]6.1918,[48]6.1613,[49]6.1626,[50]6.1177,[51]6.1108,[52]6.0969,[53]6.1424,[54]6.1246,[55]6.0978,[56]6.1274,[57]6.1486,[58]6.1747,[59]6.1911,[60]6.2333,[61]6.2236,[62]6.2859,[63]6.3197,[64]6.3336,[65]6.3800,[66]6.3886,[67]6.4047,[68]6.4183,[69]6.4427,[70]6.4764,[71]6.4994,[72]6.5315,[73]6.5948,[74]6.5980,[75]6.6105,[76]6.6247,[77]6.6375,[78]6.6222,[79]6.6487,[80]6.6393,[81]6.6492,[82]6.6578,[83]6.6023,[84]6.5835,[85]6.5726,[86]6.5491,[87]6.4848,[88]6.4559,[89]6.4371,[90]6.4232,[91]6.4475,[92]6.4427,[93]6.4444,[94]6.4396,[95]6.4721,[96]6.4705,[97]6.4670,[98]6.4608,[99]6.4449,[100]6.4465,[101]6.4708,[102]6.4650,[103]6.4874,[104]6.4948,[105]6.4960,[106]6.5094,[107]6.5056,[108]6.5188,[109]6.5147,[110]6.5108,[111]6.5338,[112]6.5558,[113]6.5567,[114]6.5527,[115]6.5603,[116]6.5519,[117]6.5591,[118]6.5893,[119]6.6121,[120]6.6472,[121]6.6638,[122]6.6879,[123]6.7271,[124]6.7447,[125]6.7346,[126]6.7749,[127]6.8140,[128]6.8449,[129]6.8264,[130]6.8337,[131]6.8300,[132]6.8214,[133]6.8049,[134]6.8161,[135]6.8122,[136]6.8005,[137]6.7928,[138]6.7776,[139]6.7681,[140]6.7639,[141]6.7347,[142]6.7303,[143]6.7020,[144]6.6827,[145]6.6753,[146]6.6635,[147]6.6711,[148]6.6727,[149]6.6658,[150]6.6624,[151]6.6642,[152]6.6546,[153]6.6366,[154]6.6264,[155]6.6329,[156]6.6280,[157]6.6460,[158]6.6501,[159]6.6555,[160]6.6571,[161]6.6693,[162]6.6388,[163]6.6283,[164]6.6030,[165]6.5695,[166]6.5411,[167]6.5028,[168]6.4699,[169]6.4558,[170]6.4442,[171]6.4149,[172]6.3963,[173]6.3787,[174]6.3465,[175]6.3240,[176]6.3120,[177]6.2899,[178]6.2652,[179]6.2476,[180]6.2379,[181]6.2162,[182]6.1963,[183]6.1824,[184]6.1813,[185]6.1737,[186]6.1750,[187]6.1810,[188]6.1763,[189]6.1954,[190]6.1957,[191]6.2179,[192]6.2345,[193]6.2537,[194]6.2664,[195]6.2863,[196]6.3023,[197]6.3251,[198]6.3400,[199]6.3420,[200]6.3466,[201]6.3417,[202]6.3628,[203]6.3698,[204]6.3706,[205]6.3825,[206]6.3900,[207]6.3856,[208]6.3932,[209]6.3972,[210]6.4021,[211]6.4116,[212]6.4189,[213]6.4300,[214]6.4333,[215]6.4391,[216]6.4542,[217]6.4729,[218]6.4869,[219]6.4874,[220]6.4833,[221]6.4799,[222]6.4768,[223]6.4660,[224]6.4588,[225]6.4549,[226]6.4761,[227]6.4848,[228]6.4900,[229]6.4969,[230]6.4930,[231]6.5098,[232]6.4969,[233]6.4798,[234]6.4643,[235]6.4483,[236]6.4412,[237]6.4313,[238]6.4342,[239]6.4189,[240]6.4082,[241]6.4117,[242]6.4153,[243]6.4133,[244]6.4018,[245]6.3985,[246]6.3859,[247]6.3737,[248]6.3653,[249]6.3635,[250]6.3669,[251]6.3596,[252]6.3565,[253]6.3455,[254]6.3408,[255]6.3287,[256]6.3095,[257]6.2969,[258]6.2874,[259]6.2854,[260]6.2765,[261]6.2724,[262]6.2671,[263]6.2621,[264]6.2421,[265]6.2416,[266]6.2398,[267]6.2325,[268]6.2414,[269]6.2398,[270]6.2404,[271]6.2479,[272]6.2513,[273]6.2512,[274]6.2533,[275]6.2623,[276]6.2689,[277]6.2850,[278]6.2953,[279]6.3048,[280]6.3078,[281]6.3173,[282]6.3227,[283]6.3370,[284]6.3457,[285]6.3542,[286]6.3688,[287]6.3682,[288]6.3751,[289]6.3660,[290]6.3502,[291]6.3347,[292]6.3190,[293]6.3051,[294]6.3072,[295]6.3070,[296]6.3115,[297]6.3108,[298]6.3145,[299]6.3118,[300]6.3008,[301]6.3005,[302]6.2931,[303]6.2848,[304]6.2756,[305]
6.2725,[306]6.2592,[307]6.2610,[308]6.2652,[309]6.2488,[310]6.2425,[311]6.2361,[312]6.2387,[313]6.2332,[314]6.2312,[315]6.2145,[316]6.2104,[317]6.1937,[318]6.1723,[319]6.1852,[320]6.1978,[321]6.2019,[322]6.1974,[323]6.1912,[324]6.1881,[325]6.1987,[326]6.1982,[327]6.2005,[328]6.2050,[329]6.2111,[330]6.2140,[331]6.2262,[332]6.2238,[333]6.2312,[334]6.2253,[335]6.2190,[336]6.2227,[337]6.2197,[338]6.2190,[339]6.2133,[340]6.2084,[341]6.2162,[342]6.2181,[343]6.2239,[344]6.2240,[345]6.2242,[346]6.2212,[347]6.2255,[348]6.2295,[349]6.2320,[350]6.2281,[351]6.2288,[352]6.2294,[353]6.2231,[354]6.2236,[355]6.2290,[356]6.2320,[357]6.2285,[358]6.2377,[359]6.2408,[360]6.2366,[361]6.2362,[362]6.2436,[363]6.2547,[364]6.2605,[365]6.2664,[366]6.2672,[367]6.2766,[368]6.2743,[369]6.2753,[370]6.2764,[371]6.2706,[372]6.2758,[373]6.2812,[374]6.2797,[375]6.2795,[376]6.2873,[377]6.2827,[378]6.2850,[379]6.2916,[380]6.2830,[381]6.2790,[382]6.2740,[383]6.2731,[384]6.2725,[385]6.2722,[386]6.2722,[387]6.2718,[388]6.2669,[389]6.2612,[390]6.2532,[391]6.2447,[392]6.2405,[393]6.2386,[394]6.2409,[395]6.2391,[396]6.2314,[397]6.2392,[398]6.2424,[399]6.2500,[400]6.2496,[401]6.2517,[402]6.2525,[403]6.2542,[404]6.2616,[405]6.2521,[406]6.2487,[407]6.2483,[408]6.2502,[409]6.2621,[410]6.2730,[411]6.2844,[412]6.3009,[413]6.3125,[414]6.3214,[415]6.3272,[416]6.3350,[417]6.3478,[418]6.3518,[419]6.3588,[420]6.3683,[421]6.3818,[422]6.3865,[423]6.3937,[424]6.4058,[425]6.4150,[426]6.4217,[427]6.4262,[428]6.4345,[429]6.4389,[430]6.4481,[431]6.4618,[432]6.4659,[433]6.4649,[434]6.4604,[435]6.4608,[436]6.4631,[437]6.4736,[438]6.4814,[439]6.4782,[440]6.4773,[441]6.4720,[442]6.4707,[443]6.4726,[444]6.4729,[445]6.4709,[446]6.4730,[447]6.4759,[448]6.4804,[449]6.4774,[450]6.4774,[451]6.4731,[452]6.4614,[453]6.4529,[454]6.4467,[455]6.4477,[456]6.4528,[457]6.4545,[458]6.4522,[459]6.4526,[460]6.4610,[461]6.4584,[462]6.4560,[463]6.4606,[464]6.4595,[465]6.4570,[466]6.4487,[467]6.4493,[468]6.4491,[469]6.4507,[470]6.4510,[471]6.4459,[472]6.4505,[473]6.4448,[474]6.4462,[475]6.4403,[476]6.4424,[477]6.4345,[478]6.4337,[479]6.4401,[480]6.4453,[481]6.4468,[482]6.4425,[483]6.4380,[484]6.4402,[485]6.4389,[486]6.4329,[487]6.4324,[488]6.4300,[489]6.4249,[490]6.4223,[491]6.4191,[492]6.4128,[493]6.4097,[494]6.4081,[495]6.4078,[496]6.4046,[497]6.3986,[498]6.3968,[499]6.3920,[500]6.3821,[501]6.3748,[502]6.3747,[503]6.3745,[504]6.3658,[505]6.3684,[506]6.3694,[507]6.3645,[508]6.3605,[509]6.3597,[510]6.3635,[511]6.3679,[512]6.3715,[513]6.3734,[514]6.3800,[515]6.3745,[516]6.3732,[517]6.3748,[518]6.3746,[519]6.3774,[520]6.3798,[521]6.3814,[522]6.3839,[523]6.3841,[524]6.3900,[525]6.3935,[526]6.3941,[527]6.3962,[528]6.3912,[529]6.3917,[530]6.3868,[531]6.3856,[532]6.3909,[533]6.3937,[534]6.3918,[535]6.3948,[536]6.3891,[537]6.3868,[538]6.3919,[539]6.3930,[540]6.3964,[541]6.3970,[542]6.3981,[543]6.3991,[544]6.4000,[545]6.3979,[546]6.3990,[547]6.3940,[548]6.3884,[549]6.3884,[550]6.3853,[551]6.3815,[552]6.3795,[553]6.3754,[554]6.3727,[555]6.3695,[556]6.3690,[557]6.3710,[558]6.3671,[559]6.3668,[560]6.3665,[561]6.3662,[562]6.3641,[563]6.3641,[564]6.3684,[565]6.3704,[566]6.3700,[567]6.3677,[568]6.3681,[569]6.3663,[570]6.3693,[571]6.3694,[572]6.3705,[573]6.3703,[574]6.3664,[575]6.3663,[576]6.3664,[577]6.3650,[578]6.3628,[579]6.3637,[580]6.3568,[581]6.3529,[582]6.3520,[583]6.3526,[584]6.3527,[585]6.3449,[586]6.3383,[587]6.3389,[588]6.3436,[589]6.3492,[590]6.3520,[591]6.3541,[592]6.3527,[593]6.3486,[594]6.3496,[595]6.3471,[596]6.3511,[597]6.3488,[598]6.3458,[599]6.3479,[600]6.3476,[601]6.3
459,[602]6.3479,[603]6.3511,[604]6.3520,[605]6.3555,[606]6.3574,[607]6.3560,[608]6.3527,[609]6.3532,[610]6.3566,[611]6.3548,[612]6.3575,[613]6.3534,[614]6.3482,[615]6.3406,[616]6.3432,[617]6.3368,[618]6.3315,[619]6.3259,[620]6.3113,[621]6.3041,[622]6.3022,[623]6.3034,[624]6.3042,[625]6.3041,[626]6.3031,[627]6.3053,[628]6.3058,[629]6.3053,[630]6.3084,[631]6.3146,[632]6.3201,[633]6.3186,[634]6.3220,[635]6.3224,[636]6.3191,[637]6.3159,[638]6.3190,[639]6.3158,[640]6.3167,[641]6.3167,[642]6.3236,[643]6.3258,[644]6.3272,[645]6.3249,[646]6.3293,[647]6.3258,[648]6.3266,[649]6.3265,[650]6.3302,[651]6.3358,[652]6.3367,[653]6.3407,[654]6.3341,[655]6.3334,
SCM @ 3bpw[1]6.0942,[2]6.7662,[3]7.6038,[4]8.3524,[5]8.2915,[6]8.2419,[7]8.4662,[8]8.5407,[9]8.9734,[10]9.3232,[11]9.6352,[12]9.6208,[13]9.5982,[14]9.7618,[15]10.0687,[16]9.5089,[17]9.3348,[18]9.3130,[19]8.7935,[20]8.7400,[21]8.6163,[22]8.4302,[23]8.3799,[24]8.2907,[25]8.2851,[26]8.0742,[27]7.8200,[28]7.7135,[29]7.6222,[30]7.4274,[31]7.4010,[32]7.4245,[33]7.3594,[34]7.3935,[35]7.4241,[36]7.5015,[37]7.5204,[38]7.5385,[39]7.5784,[40]7.6567,[41]7.6877,[42]7.7363,[43]7.6716,[44]7.7293,[45]7.7264,[46]7.6833,[47]7.7103,[48]7.6628,[49]7.6592,[50]7.5926,[51]7.5754,[52]7.5577,[53]7.6009,[54]7.5788,[55]7.5386,[56]7.5680,[57]7.5956,[58]7.6293,[59]7.6458,[60]7.7060,[61]7.6870,[62]7.7644,[63]7.8026,[64]7.8172,[65]7.8757,[66]7.8862,[67]7.9111,[68]7.9323,[69]7.9644,[70]8.0057,[71]8.0350,[72]8.0740,[73]8.1527,[74]8.1452,[75]8.1574,[76]8.1727,[77]8.1926,[78]8.1813,[79]8.2132,[80]8.2059,[81]8.2202,[82]8.2356,[83]8.1614,[84]8.1447,[85]8.1409,[86]8.1103,[87]8.0484,[88]8.0102,[89]7.9842,[90]7.9620,[91]8.0019,[92]7.9939,[93]7.9899,[94]7.9864,[95]8.0278,[96]8.0277,[97]8.0191,[98]8.0111,[99]7.9845,[100]7.9780,[101]8.0098,[102]8.0020,[103]8.0304,[104]8.0333,[105]8.0344,[106]8.0579,[107]8.0568,[108]8.0709,[109]8.0685,[110]8.0648,[111]8.0911,[112]8.1192,[113]8.1227,[114]8.1192,[115]8.1330,[116]8.1195,[117]8.1280,[118]8.1694,[119]8.1971,[120]8.2447,[121]8.2704,[122]8.3007,[123]8.3528,[124]8.3769,[125]8.3618,[126]8.4117,[127]8.4544,[128]8.4892,[129]8.4616,[130]8.4697,[131]8.4624,[132]8.4480,[133]8.4311,[134]8.4462,[135]8.4397,[136]8.4282,[137]8.4204,[138]8.4024,[139]8.3956,[140]8.3884,[141]8.3676,[142]8.3623,[143]8.3396,[144]8.3209,[145]8.3169,[146]8.3003,[147]8.3116,[148]8.3126,[149]8.3067,[150]8.3081,[151]8.3101,[152]8.2988,[153]8.2741,[154]8.2627,[155]8.2697,[156]8.2616,[157]8.2823,[158]8.2877,[159]8.2935,[160]8.2932,[161]8.3068,[162]8.2680,[163]8.2548,[164]8.2194,[165]8.1754,[166]8.1387,[167]8.0855,[168]8.0441,[169]8.0257,[170]8.0104,[171]7.9727,[172]7.9486,[173]7.9269,[174]7.8846,[175]7.8547,[176]7.8342,[177]7.8085,[178]7.7779,[179]7.7565,[180]7.7442,[181]7.7159,[182]7.6891,[183]7.6711,[184]7.6641,[185]7.6563,[186]7.6562,[187]7.6613,[188]7.6530,[189]7.6788,[190]7.6767,[191]7.7025,[192]7.7204,[193]7.7429,[194]7.7624,[195]7.7856,[196]7.8058,[197]7.8328,[198]7.8502,[199]7.8540,[200]7.8571,[201]7.8510,[202]7.8813,[203]7.8903,[204]7.8973,[205]7.9138,[206]7.9242,[207]7.9183,[208]7.9300,[209]7.9340,[210]7.9371,[211]7.9514,[212]7.9609,[213]7.9722,[214]7.9795,[215]7.9847,[216]8.0041,[217]8.0265,[218]8.0424,[219]8.0414,[220]8.0366,[221]8.0298,[222]8.0248,[223]8.0090,[224]8.0012,[225]7.9969,[226]8.0198,[227]8.0342,[228]8.0421,[229]8.0493,[230]8.0485,[231]8.0668,[232]8.0503,[233]8.0279,[234]8.0076,[235]7.9944,[236]7.9857,[237]7.9726,[238]7.9767,[239]7.9548,[240]7.9407,[241]7.9492,[242]7.9537,[243]7.9534,[244]7.9381,[245]7.9340,[246]7.9183,[247]7.9039,[248]7.8908,[249]7.8898,[250]7.8929,[251]7.8840,[252]7.8798,[253]7.8658,[254]7.8590,[255]7.8460,[256]7.8218,[257]7.8062,[258]7.7943,[259]7.7919,[260]7.7803,[261]7.7746,[262]7.7672,[263]7.7592,[264]7.7408,[265]7.7394,[266]7.7401,[267]7.7302,[268]7.7402,[269]7.7371,[270]7.7373,[271]7.7471,[272]7.7536,[273]7.7517,[274]7.7548,[275]7.7663,[276]7.7733,[277]7.7928,[278]7.8073,[279]7.8191,[280]7.8232,[281]7.8359,[282]7.8408,[283]7.8568,[284]7.8677,[285]7.8786,[286]7.8955,[287]7.8936,[288]7.9036,[289]7.8939,[290]7.8754,[291]7.8536,[292]7.8324,[293]7.8133,[294]7.8144,[295]7.8116,[296]7.8156,[297]7.8145,[298]7.8204,[299]7.8160,[300]7.8020,[301]7.8014,[302]7.7913,[303]7.7797,[304]7.7671,[305
]7.7634,[306]7.7465,[307]7.7468,[308]7.7509,[309]7.7298,[310]7.7215,[311]7.7144,[312]7.7159,[313]7.7076,[314]7.7068,[315]7.6859,[316]7.6841,[317]7.6650,[318]7.6366,[319]7.6549,[320]7.6693,[321]7.6754,[322]7.6688,[323]7.6630,[324]7.6598,[325]7.6713,[326]7.6703,[327]7.6741,[328]7.6798,[329]7.6892,[330]7.6935,[331]7.7098,[332]7.7047,[333]7.7171,[334]7.7100,[335]7.7006,[336]7.7031,[337]7.6972,[338]7.6971,[339]7.6904,[340]7.6860,[341]7.6953,[342]7.6974,[343]7.7051,[344]7.7057,[345]7.7049,[346]7.7016,[347]7.7060,[348]7.7119,[349]7.7144,[350]7.7094,[351]7.7087,[352]7.7093,[353]7.7014,[354]7.7039,[355]7.7111,[356]7.7149,[357]7.7117,[358]7.7232,[359]7.7273,[360]7.7192,[361]7.7180,[362]7.7272,[363]7.7396,[364]7.7480,[365]7.7544,[366]7.7544,[367]7.7650,[368]7.7614,[369]7.7634,[370]7.7642,[371]7.7557,[372]7.7626,[373]7.7681,[374]7.7655,[375]7.7658,[376]7.7760,[377]7.7690,[378]7.7718,[379]7.7807,[380]7.7695,[381]7.7639,[382]7.7575,[383]7.7548,[384]7.7526,[385]7.7528,[386]7.7535,[387]7.7529,[388]7.7460,[389]7.7382,[390]7.7299,[391]7.7202,[392]7.7156,[393]7.7154,[394]7.7176,[395]7.7144,[396]7.7044,[397]7.7126,[398]7.7159,[399]7.7248,[400]7.7240,[401]7.7266,[402]7.7291,[403]7.7306,[404]7.7405,[405]7.7308,[406]7.7271,[407]7.7284,[408]7.7301,[409]7.7433,[410]7.7576,[411]7.7708,[412]7.7912,[413]7.8041,[414]7.8138,[415]7.8216,[416]7.8309,[417]7.8459,[418]7.8497,[419]7.8587,[420]7.8696,[421]7.8874,[422]7.8922,[423]7.9027,[424]7.9171,[425]7.9283,[426]7.9378,[427]7.9431,[428]7.9530,[429]7.9582,[430]7.9689,[431]7.9871,[432]7.9905,[433]7.9884,[434]7.9804,[435]7.9792,[436]7.9808,[437]7.9941,[438]8.0029,[439]7.9992,[440]7.9960,[441]7.9884,[442]7.9850,[443]7.9869,[444]7.9875,[445]7.9849,[446]7.9866,[447]7.9894,[448]7.9931,[449]7.9890,[450]7.9878,[451]7.9826,[452]7.9743,[453]7.9646,[454]7.9582,[455]7.9588,[456]7.9654,[457]7.9689,[458]7.9673,[459]7.9670,[460]7.9759,[461]7.9721,[462]7.9691,[463]7.9758,[464]7.9752,[465]7.9720,[466]7.9633,[467]7.9652,[468]7.9675,[469]7.9699,[470]7.9708,[471]7.9652,[472]7.9707,[473]7.9631,[474]7.9663,[475]7.9617,[476]7.9658,[477]7.9567,[478]7.9553,[479]7.9661,[480]7.9735,[481]7.9758,[482]7.9713,[483]7.9656,[484]7.9694,[485]7.9689,[486]7.9614,[487]7.9608,[488]7.9583,[489]7.9503,[490]7.9475,[491]7.9441,[492]7.9361,[493]7.9317,[494]7.9284,[495]7.9285,[496]7.9240,[497]7.9165,[498]7.9147,[499]7.9074,[500]7.8953,[501]7.8868,[502]7.8868,[503]7.8858,[504]7.8748,[505]7.8775,[506]7.8780,[507]7.8736,[508]7.8685,[509]7.8679,[510]7.8724,[511]7.8778,[512]7.8813,[513]7.8836,[514]7.8924,[515]7.8852,[516]7.8835,[517]7.8862,[518]7.8858,[519]7.8890,[520]7.8910,[521]7.8928,[522]7.8958,[523]7.8961,[524]7.9030,[525]7.9072,[526]7.9088,[527]7.9119,[528]7.9057,[529]7.9064,[530]7.8991,[531]7.8963,[532]7.9030,[533]7.9074,[534]7.9037,[535]7.9083,[536]7.9025,[537]7.8983,[538]7.9056,[539]7.9061,[540]7.9109,[541]7.9136,[542]7.9145,[543]7.9169,[544]7.9185,[545]7.9175,[546]7.9181,[547]7.9125,[548]7.9039,[549]7.9034,[550]7.9008,[551]7.8952,[552]7.8933,[553]7.8881,[554]7.8836,[555]7.8795,[556]7.8788,[557]7.8825,[558]7.8783,[559]7.8778,[560]7.8761,[561]7.8754,[562]7.8718,[563]7.8715,[564]7.8764,[565]7.8795,[566]7.8792,[567]7.8757,[568]7.8764,[569]7.8734,[570]7.8766,[571]7.8762,[572]7.8776,[573]7.8760,[574]7.8720,[575]7.8716,[576]7.8719,[577]7.8690,[578]7.8667,[579]7.8665,[580]7.8576,[581]7.8526,[582]7.8514,[583]7.8522,[584]7.8514,[585]7.8437,[586]7.8359,[587]7.8363,[588]7.8421,[589]7.8488,[590]7.8522,[591]7.8542,[592]7.8524,[593]7.8479,[594]7.8488,[595]7.8457,[596]7.8522,[597]7.8495,[598]7.8459,[599]7.8480,[600]7.8468,[601]7.
8443,[602]7.8485,[603]7.8524,[604]7.8534,[605]7.8571,[606]7.8580,[607]7.8575,[608]7.8525,[609]7.8527,[610]7.8566,[611]7.8549,[612]7.8580,[613]7.8534,[614]7.8477,[615]7.8371,[616]7.8404,[617]7.8315,[618]7.8244,[619]7.8167,[620]7.7970,[621]7.7866,[622]7.7837,[623]7.7845,[624]7.7866,[625]7.7862,[626]7.7850,[627]7.7888,[628]7.7887,[629]7.7871,[630]7.7914,[631]7.7982,[632]7.8050,[633]7.8036,[634]7.8078,[635]7.8089,[636]7.8061,[637]7.8033,[638]7.8082,[639]7.8039,[640]7.8056,[641]7.8059,[642]7.8140,[643]7.8159,[644]7.8171,[645]7.8146,[646]7.8204,[647]7.8180,[648]7.8202,[649]7.8205,[650]7.8255,[651]7.8332,[652]7.8352,[653]7.8398,[654]7.8316,[655]7.8300, SCM @ 3bpw sample inferences
|
These are the latest results; the RMSE results are from PR #1106.

Vanilla
RMSE opt
Full range
Logs: ppl-7b-q4_0.txt |
@MarcioPais : can you share your code? I'd be interested in how well the NLM/4.75bpw method lends itself to SIMD optimization. |
Thanks, I've updated my post with your results.
Sure, I fully intend to, it's just that for now it's mostly a badly jumbled-together mess from trying so many ideas. As for NLM, it should be easily amenable to SIMD optimization, as it's basically just Q4_1 with a lookup table:

/* quantization */
// loop to find max, min and index of min
...
ggml_fp16_t const range_fp16 = GGML_FP32_TO_FP16((max - min) / 2.f) & 0x7FE0u; // 10 bits, implicit sign
ggml_fp16_t const min_fp16 = GGML_FP32_TO_FP16(x[i*QK4_1 + index_min]) & 0xFFF0u; // 12 bits
float const range = GGML_FP16_TO_FP32(range_fp16) * 2.f;
float const min_w = GGML_FP16_TO_FP32(min_fp16);
// try the 4 candidate mappings, keep track of square errors (or max absolute error), pick the best
// finally encode range_fp16, min_fp16 and best mapping in 24 bits
/* dequantization */
// decode range, min and best mapping
...
float const v0 = range * LUT[map_id][vi0] + min_w; // should possibly be pre-computed and just indexed into
float const v1 = range * LUT[map_id][vi1] + min_w;

It is, however, not worth it at all, as the non-linear mappings help with RMSE and MAE but do basically nothing to improve perplexity, which is disappointing. My next run will simply be vanilla Q4_1 at 4.75bpw as well (11 bits for range and 13 bits for min_w), just to confirm it. I plan on testing it out at QK=128 (hence 4.1875bpw), but I doubt it will be worth it against Q4_1 at the same bpw, and I'm not sure increasing QK is on the menu, though it would certainly open up more possibilities to get a lower bpw. Now for some preliminary remarks, based on my limited testing so far:
|
But your perplexity measurements seemed to show that NLM is better at comparable bit rates? q4_0 -> 6.2103, NLM -> 6.1014 |
Yes, but you shouldn't compare it to Q4_0 (which is really just useful for its speed) or Q4_2 (probably not really needed), but to Q4_1 at the same 4.75bpw (like I mentioned, it's my next run, using 11 bits for the range and 13 bits for the minimum weight value), which should be almost identical in perplexity (probably actually better once RMSE-optimized) whilst being faster (no need for a LUT). And if you want to skip the odd bit-packing, just use Q4_1 at 5bpw (2x FP16 instead of 2x FP32). |
@sw Now, we could also use RMSE-optimization for NLM, tweak the non-linear mappings and probably get it to about 6.07 ppl or thereabouts, which at 4.75bpw is not bad, but I don't think an extra 0.03 ppl gain is enough to justify having the additional complexity of a lookup table. Considering that something similar is possible for the other quantization methods, we'll probably be looking at something like this:
|
Here are the latest results after implementing the new quantization formats. The times per token are measured on an M1 Pro, after a restart with no background software running, 2 runs per config, using the following parameters:

Commit: 982bfce

# 4 threads
-p "I believe the meaning of life is" --no-mmap -c 2048 --ignore-eos -s 3 -n 64 -t 4
# 8 threads
-p "I believe the meaning of life is" --no-mmap -c 2048 --ignore-eos -s 3 -n 64 -t 8
# bpw - effective bits per weight
I plan to merge the above. Everyone is welcome to continue to contribute and explore alternative quantization approaches, but I think the methods above will serve as the baseline for now. I really want to start applying them to other models and seeing how they perform. Things that are left to wrap up the project:
I'm looking for contributions for the first part. I hope we can finish these in the next few days. |
Not sure if this is the best place to put this, but I've been experimenting with compression and seeing if quantized models could be compressed losslessly for storage (decompression on the fly wouldn't work well). As expected, the answer is not really. Using tANS compression (a table-based variant of entropy coding) on the integer parts of the symbols and compressing each tensor with its own distribution and decoding table, I can get fairly close to the theoretical optimal compression given the distribution's entropy, but that theoretical optimal isn't very good (the distribution of quantized symbols is pretty much flat with some spikes near the edges and a slight rise in the middle). I might have been able to get a few fractions of a percent closer to optimal by varying the tANS block size per-tensor instead of per quantization type, but I didn't feel it was worth it. The block sizes were chosen based on whatever gave the largest decrease overall in the model for that quant format. Here are some of the per-tensor data for different quantizations:

Q2_K (41.20 MiB saved)
Q3_K_M (58.98 MiB saved)
Q4_0 (198.81 MiB saved)
Q4_K_M (120.49 MiB saved)
Q5_K_M (42.68 MiB saved)
I also tried a larger model to see if it made a difference. The compression rates were similar to the smaller model, but there are some outlier layers which can be compressed a fair bit.

Qwen2.5-Coder-32B-Instruct Q4_K_M (576.59 MiB saved)
Overall this didn't really go anywhere, as the distribution of symbols doesn't leave a lot to be compressed, but it was worth a shot. |
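For context, a minimal sketch of the entropy bound being referred to (the best any symbol-wise entropy coder such as tANS can do for a given nibble distribution); the function name is mine, and block scales/mins are not included:

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

// Shannon entropy (bits per symbol) of the 4-bit quantized codes in a tensor:
// a lower bound on the average code length of any symbol-wise entropy coder.
static double q4_entropy_bits(const uint8_t *packed, size_t n_weights) {
    uint64_t hist[16] = {0};
    for (size_t i = 0; i < n_weights/2; i++) {
        hist[packed[i] & 0x0F] += 1;
        hist[packed[i] >> 4]   += 1;
    }
    double h = 0.0;
    for (int s = 0; s < 16; s++) {
        if (hist[s] == 0) continue;
        const double p = (double) hist[s] / (double) n_weights;
        h -= p * log2(p);
    }
    return h;  // close to 4.0 for a near-flat distribution, i.e. little to gain
}
```

An estimate of the achievable payload size is then roughly n_weights * h / 8 bytes, which is what the per-tensor savings above are effectively being compared against.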
Currently, in Q4_0 quantization we choose the scaling factor for each group of 32 weights as abs(max(x_i))/7. It is easy to see that this is suboptimal.

Consider quantization of the following 4 numbers: 0.1 0.2 0.3 0.6

Currently, we would determine a scaling factor of 0.6 / 7 ~= 0.0857 and the dequantized numbers will be: 0.0857 0.1714 0.3428 0.6

So the RMS error between the dequantized and original values will be non-zero: sqrt((0.1 - 0.0857)^2 + (0.2 - 0.1714)^2 + (0.3 - 0.3428)^2 + (0.6 - 0.6)^2) > 0.0

However, if we choose the scaling factor to be 0.1 instead, then it is easy to see that the original numbers will be quantized perfectly.

So the scaling factor is better chosen as the one that minimises some error (e.g. RMS or whatever is more meaningful and easy to compute). Doing that, we will certainly achieve better accuracy compared to the existing approach. The question is: how much better?
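A minimal sketch of the brute-force version of this idea (try a grid of candidate scales for each block and keep the one with the smallest squared error) could look like the following; the function name, the grid of candidates, and the clamping are illustrative assumptions, not an existing implementation:

```c
#include <math.h>
#include <stdint.h>

#define QK 32

// Q4_0-style quantization of one block, but with the scale chosen by searching
// a grid of candidates and keeping the one with the smallest squared error.
// Returns the chosen scale d; quantized values go to q[] in [-7, 7].
static float quantize_block_q4_0_rmse(const float *x, int8_t q[QK]) {
    float amax = 0.0f;
    for (int l = 0; l < QK; l++) {
        amax = fmaxf(amax, fabsf(x[l]));
    }
    if (amax == 0.0f) {
        for (int l = 0; l < QK; l++) q[l] = 0;
        return 0.0f;
    }
    float best_d   = amax / 7.0f;   // the current Q4_0 choice as the starting point
    float best_err = INFINITY;
    const int n_candidates = 60;    // illustrative; a finer grid costs more time
    for (int c = 0; c <= n_candidates; c++) {
        // candidate scales d = amax/s for s in [4, 10]; s = 7 is the current choice
        const float s  = 4.0f + 6.0f * c / n_candidates;
        const float d  = amax / s;
        const float id = 1.0f / d;
        float err = 0.0f;
        for (int l = 0; l < QK; l++) {
            int v = (int) roundf(x[l] * id);
            v = v < -7 ? -7 : (v > 7 ? 7 : v);   // clamp to the representable range
            const float diff = x[l] - v * d;
            err += diff * diff;
        }
        if (err < best_err) { best_err = err; best_d = d; }
    }
    // final rounding pass with the chosen scale
    const float id = 1.0f / best_d;
    for (int l = 0; l < QK; l++) {
        int v = (int) roundf(x[l] * id);
        q[l] = (int8_t) (v < -7 ? -7 : (v > 7 ? 7 : v));
    }
    return best_d;
}
```

On the 4-number example above (with a hypothetical block of 4 values), the grid contains s = 6, i.e. d = 0.6/6 = 0.1, which reproduces the values essentially exactly, so the search would pick it over amax/7.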
The goal of this task is to implement the quantization described above and evaluate the perplexity using the new approach. In simple terms, the approach boils down to making a linear regression of the data with a fixed zero point. This new quantization might be a bit heavier to compute compared to Q4_0, so for a start we can do it just on the model tensors. The intermediate tensors during the evaluation can remain quantized using the existing approach, so that the evaluation stays efficient. If the results look promising, we can put effort into optimising the new approach and replacing Q4_0 with it completely.

Whoever demonstrates the results of this quantization will get the chance to give it a name and publish a paper (just kidding 😆 )

A similar strategy for determining the scale factor and offset factor can be applied to Q4_1.