AVX2 implementation of ggml_vec_dot_q4_1_q8_0 #1051
Conversation
Perplexity 7B q4_1: 6.1293
./perplexity -m models/7B/ggml-model-q4_1.bin -f wikitext-2-raw/wiki.test.raw -t 12
main: seed = 1681867565
llama.cpp: loading model from models/7B/ggml-model-q4_1.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 3 (mostly Q4_1)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 59.11 KB
llama_model_load_internal: mem required = 6612.57 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size = 256.00 MB
system_info: n_threads = 12 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
llama_print_timings: load time = 35775.80 ms
It's just about right. So it is actually better than what I expected. I'll run the perplexity later on my M1 to confirm the results.
* ggml : use 8-bit precision for Q4_1 intermediate results (ARM)
* ggml : optimize ggml_vec_dot_q4_1_q8_0() via vmlaq_n_f32 (56 ms/token with Q4_1!)
* ggml : AVX2 implementation of ggml_vec_dot_q4_1_q8_0 (#1051)
* gitignore : ignore ppl-*.txt files
---------
Co-authored-by: slaren <[email protected]>
This is about 20%-30% slower than q4_0_q8_0, depending on the batch size.
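For reference, ggml_vec_dot_q4_1_q8_0 computes the dot product between a row stored as Q4_1 blocks and an activation vector quantized to Q8_0. Below is a minimal scalar sketch of the math that the AVX2 kernel vectorizes; the block structs, field names, and nibble layout are assumptions modeled on ggml's format at the time (32 values per block, two nibbles per byte) and may not match the tree exactly.

```c
// Scalar reference sketch (assumed layouts, not the actual ggml code).
#include <stdint.h>

#define QK 32

typedef struct {
    float   d;          // scale
    float   m;          // min offset (the term Q4_0 does not have)
    uint8_t qs[QK / 2]; // 32 x 4-bit values, two per byte
} block_q4_1;

typedef struct {
    float  d;      // scale
    int8_t qs[QK]; // 32 x 8-bit values
} block_q8_0;

// A Q4_1 value restores to d*q + m and a Q8_0 value to d8*q8, so per block:
//   sum_i (d*q_i + m) * d8*q8_i = d*d8 * sum_i q_i*q8_i + m*d8 * sum_i q8_i
static void vec_dot_q4_1_q8_0_ref(const int n, float * s,
                                  const block_q4_1 * x, const block_q8_0 * y) {
    const int nb = n / QK;
    float sumf = 0.0f;

    for (int i = 0; i < nb; ++i) {
        int sumi  = 0; // accumulates q4 * q8 products
        int sumq8 = 0; // accumulates q8 values, later scaled by the Q4_1 min

        for (int j = 0; j < QK / 2; ++j) {
            const int v0 = x[i].qs[j] & 0x0F; // low nibble
            const int v1 = x[i].qs[j] >> 4;   // high nibble

            sumi  += v0 * y[i].qs[2*j + 0] + v1 * y[i].qs[2*j + 1];
            sumq8 += y[i].qs[2*j + 0] + y[i].qs[2*j + 1];
        }

        sumf += x[i].d * y[i].d * sumi + x[i].m * y[i].d * sumq8;
    }

    *s = sumf;
}
```

The second accumulator (the min term m * sum(q8)) is the extra per-block work relative to the Q4_0 path, which is consistent with the 20%-30% slowdown versus q4_0_q8_0 noted above.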