
AVX2 implementation of ggml_vec_dot_q4_1_q8_0 #1051

Merged: 1 commit merged into ggerganov:q4_1xq8_0 from slaren:pr1047 on Apr 19, 2023

Conversation

@slaren (Collaborator) commented Apr 19, 2023

This is about 20%-30% slower than q4_0_q8_0, depending on the batch size.
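
For orientation, here is a minimal sketch of the kind of AVX2 kernel involved. The struct layouts, field names, and nibble packing (low nibbles = elements 0..15, high nibbles = elements 16..31) are illustrative assumptions rather than the exact ggml definitions; compile with `-mavx2 -mfma`.

```c
// Sketch only: simplified Q4_1/Q8_0 block layouts (QK = 32 quants per block).
#include <immintrin.h>
#include <stdint.h>

#define QK 32

typedef struct { float d, m; uint8_t qs[QK/2]; } block_q4_1; // x = d*q + m, q in 0..15
typedef struct { float d;    int8_t  qs[QK];   } block_q8_0; // y = d*q, q in -128..127

// Horizontal sum of the 8 floats in an AVX register.
static float hsum_f32x8(__m256 v) {
    __m128 r = _mm_add_ps(_mm256_castps256_ps128(v), _mm256_extractf128_ps(v, 1));
    r = _mm_hadd_ps(r, r);
    r = _mm_hadd_ps(r, r);
    return _mm_cvtss_f32(r);
}

// dot(x, y) = sum over blocks of d0*d1*sum(q4*q8) + m0*d1*sum(q8)
static void vec_dot_q4_1_q8_0_sketch(int n, float *s,
                                     const block_q4_1 *x, const block_q8_0 *y) {
    const __m256i low_mask = _mm256_set1_epi8(0x0F);
    const __m256i ones16   = _mm256_set1_epi16(1);
    __m256 acc   = _mm256_setzero_ps();
    float  summs = 0.0f;

    for (int i = 0; i < n/QK; ++i) {
        // Expand 16 packed bytes into 32 unsigned 4-bit quants (0..15).
        const __m128i packed = _mm_loadu_si128((const __m128i *)x[i].qs);
        const __m256i q4 = _mm256_and_si256(low_mask,
            _mm256_insertf128_si256(_mm256_castsi128_si256(packed),
                                    _mm_srli_epi16(packed, 4), 1));
        const __m256i q8 = _mm256_loadu_si256((const __m256i *)y[i].qs);

        // Unsigned x signed byte multiplies, summed pairwise to 16-bit, then to 32-bit.
        const __m256i prod16 = _mm256_maddubs_epi16(q4, q8);
        const __m256i prod32 = _mm256_madd_epi16(prod16, ones16);

        // Scale the integer dot product by d0*d1 and accumulate in f32.
        acc = _mm256_fmadd_ps(_mm256_set1_ps(x[i].d * y[i].d),
                              _mm256_cvtepi32_ps(prod32), acc);

        // The Q4_1 min contributes m0 * d1 * sum(q8) per block.
        int32_t q8sum = 0;
        for (int j = 0; j < QK; ++j) q8sum += y[i].qs[j];
        summs += x[i].m * y[i].d * (float)q8sum;
    }
    *s = hsum_f32x8(acc) + summs;
}
```

The scalar sum(q8) loop is kept for clarity; a production kernel would vectorize it or precompute the per-block sum during Q8 quantization.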

@slaren (Collaborator, Author) commented Apr 19, 2023

Perplexity 7B q4_1: 6.1293
Seems a bit higher than expected.

./perplexity -m models/7B/ggml-model-q4_1.bin -f wikitext-2-raw/wiki.test.raw -t 12
main: seed = 1681867565
llama.cpp: loading model from models/7B/ggml-model-q4_1.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 3 (mostly Q4_1)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 59.11 KB
llama_model_load_internal: mem required = 6612.57 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size = 256.00 MB

system_info: n_threads = 12 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
35.00 seconds per pass - ETA 6.37 hours
[1]4.4236,[2]4.8796,[3]5.7700,[4]6.3763,[5]6.4859,[6]6.4581,[7]6.6521,[8]6.7559,[9]7.0840,[10]7.3382,[11]7.5618,[12]7.6052,[13]7.5327,[14]7.5960,[15]7.8411,[16]7.4414,[17]7.3184,[18]7.2612,[19]6.8884,[20]6.8677,[21]6.7710,[22]6.5991,[23]6.5700,[24]6.4803,[25]6.4823,[26]6.3179,[27]6.1374,[28]6.0309,[29]5.9410,[30]5.7791,[31]5.7497,[32]5.7677,[33]5.7104,[34]5.7440,[35]5.7658,[36]5.8059,[37]5.8089,[38]5.8125,[39]5.8471,[40]5.8951,[41]5.9080,[42]5.9488,[43]5.9081,[44]5.9667,[45]5.9737,[46]5.9468,[47]5.9661,[48]5.9397,[49]5.9383,[50]5.8972,[51]5.8914,[52]5.8794,[53]5.9281,[54]5.9112,[55]5.8898,[56]5.9180,[57]5.9370,[58]5.9556,[59]5.9739,[60]6.0172,[61]6.0062,[62]6.0632,[63]6.0930,[64]6.1035,[65]6.1483,[66]6.1586,[67]6.1775,[68]6.1908,[69]6.2143,[70]6.2426,[71]6.2631,[72]6.2945,[73]6.3512,[74]6.3546,[75]6.3698,[76]6.3830,[77]6.3949,[78]6.3813,[79]6.4100,[80]6.4041,[81]6.4209,[82]6.4265,[83]6.3741,[84]6.3583,[85]6.3459,[86]6.3247,[87]6.2633,[88]6.2404,[89]6.2191,[90]6.2035,[91]6.2272,[92]6.2207,[93]6.2198,[94]6.2167,[95]6.2456,[96]6.2447,[97]6.2391,[98]6.2328,[99]6.2182,[100]6.2163,[101]6.2412,[102]6.2350,[103]6.2547,[104]6.2630,[105]6.2621,[106]6.2788,[107]6.2791,[108]6.2905,[109]6.2842,[110]6.2803,[111]6.3018,[112]6.3221,[113]6.3258,[114]6.3222,[115]6.3277,[116]6.3181,[117]6.3235,[118]6.3511,[119]6.3737,[120]6.4096,[121]6.4245,[122]6.4485,[123]6.4858,[124]6.5043,[125]6.4947,[126]6.5342,[127]6.5710,[128]6.6031,[129]6.5870,[130]6.5969,[131]6.5929,[132]6.5842,[133]6.5717,[134]6.5814,[135]6.5778,[136]6.5665,[137]6.5595,[138]6.5430,[139]6.5320,[140]6.5280,[141]6.4994,[142]6.4971,[143]6.4690,[144]6.4480,[145]6.4400,[146]6.4282,[147]6.4322,[148]6.4325,[149]6.4282,[150]6.4242,[151]6.4271,[152]6.4162,[153]6.4006,[154]6.3919,[155]6.3982,[156]6.3933,[157]6.4097,[158]6.4130,[159]6.4185,[160]6.4217,[161]6.4337,[162]6.4060,[163]6.3946,[164]6.3711,[165]6.3397,[166]6.3126,[167]6.2747,[168]6.2445,[169]6.2317,[170]6.2203,[171]6.1943,[172]6.1773,[173]6.1616,[174]6.1318,[175]6.1110,[176]6.0994,[177]6.0797,[178]6.0565,[179]6.0399,[180]6.0300,[181]6.0085,[182]5.9912,[183]5.9779,[184]5.9771,[185]5.9699,[186]5.9705,[187]5.9762,[188]5.9721,[189]5.9900,[190]5.9917,[191]6.0128,[192]6.0285,[193]6.0453,[194]6.0569,[195]6.0787,[196]6.0946,[197]6.1156,[198]6.1315,[199]6.1346,[200]6.1399,[201]6.1354,[202]6.1543,[203]6.1621,[204]6.1611,[205]6.1720,[206]6.1790,[207]6.1757,[208]6.1842,[209]6.1887,[210]6.1931,[211]6.2041,[212]6.2121,[213]6.2223,[214]6.2255,[215]6.2279,[216]6.2421,[217]6.2608,[218]6.2747,[219]6.2752,[220]6.2713,[221]6.2654,[222]6.2632,[223]6.2529,[224]6.2461,[225]6.2422,[226]6.2625,[227]6.2713,[228]6.2771,[229]6.2830,[230]6.2799,[231]6.2963,[232]6.2846,[233]6.2679,[234]6.2525,[235]6.2338,[236]6.2277,[237]6.2179,[238]6.2201,[239]6.2049,[240]6.1942,[241]6.1966,[242]6.1995,[243]6.1975,[244]6.1864,[245]6.1834,[246]6.1726,[247]6.1607,[248]6.1532,[249]6.1497,[250]6.1539,[251]6.1470,[252]6.1429,[253]6.1336,[254]6.1290,[255]6.1179,[256]6.1001,[257]6.0875,[258]6.0791,[259]6.0768,[260]6.0686,[261]6.0641,[262]6.0586,[263]6.0526,[264]6.0322,[265]6.0316,[266]6.0303,[267]6.0235,[268]6.0315,[269]6.0304,[270]6.0303,[271]6.0381,[272]6.0415,[273]6.0418,[274]6.0442,[275]6.0528,[276]6.0586,[277]6.0737,[278]6.0836,[279]6.0923,[280]6.0949,[281]6.1053,[282]6.1110,[283]6.1261,[284]6.1337,[285]6.1417,[286]6.1546,[287]6.1538,[288]6.1597,[289]6.1509,[290]6.1350,[291]6.1194,[292]6.1044,[293]6.0915,[294]6.0934,[295]6.0927,[296]6.0977,[297]6.0971,[298]6.1007,[299]6.0984,[300]6.0873,[301]6.0869,[302]6.0792,[303]6.0703,[304]6.0617,[305]6.0581,
[306]6.0458,[307]6.0478,[308]6.0506,[309]6.0346,[310]6.0288,[311]6.0226,[312]6.0249,[313]6.0190,[314]6.0173,[315]6.0016,[316]5.9970,[317]5.9807,[318]5.9602,[319]5.9721,[320]5.9843,[321]5.9885,[322]5.9843,[323]5.9774,[324]5.9741,[325]5.9852,[326]5.9851,[327]5.9872,[328]5.9904,[329]5.9961,[330]5.9990,[331]6.0111,[332]6.0084,[333]6.0155,[334]6.0098,[335]6.0035,[336]6.0067,[337]6.0044,[338]6.0034,[339]5.9981,[340]5.9940,[341]6.0019,[342]6.0047,[343]6.0094,[344]6.0096,[345]6.0097,[346]6.0067,[347]6.0107,[348]6.0144,[349]6.0166,[350]6.0138,[351]6.0146,[352]6.0147,[353]6.0085,[354]6.0089,[355]6.0142,[356]6.0172,[357]6.0141,[358]6.0234,[359]6.0259,[360]6.0228,[361]6.0223,[362]6.0292,[363]6.0403,[364]6.0467,[365]6.0517,[366]6.0535,[367]6.0622,[368]6.0595,[369]6.0607,[370]6.0625,[371]6.0572,[372]6.0623,[373]6.0668,[374]6.0653,[375]6.0655,[376]6.0722,[377]6.0677,[378]6.0701,[379]6.0760,[380]6.0682,[381]6.0649,[382]6.0604,[383]6.0595,[384]6.0590,[385]6.0580,[386]6.0577,[387]6.0578,[388]6.0541,[389]6.0489,[390]6.0424,[391]6.0347,[392]6.0305,[393]6.0291,[394]6.0318,[395]6.0304,[396]6.0230,[397]6.0297,[398]6.0335,[399]6.0412,[400]6.0408,[401]6.0423,[402]6.0435,[403]6.0454,[404]6.0518,[405]6.0427,[406]6.0397,[407]6.0393,[408]6.0412,[409]6.0527,[410]6.0638,[411]6.0752,[412]6.0912,[413]6.1023,[414]6.1100,[415]6.1151,[416]6.1229,[417]6.1351,[418]6.1385,[419]6.1459,[420]6.1550,[421]6.1664,[422]6.1704,[423]6.1773,[424]6.1877,[425]6.1964,[426]6.2031,[427]6.2077,[428]6.2159,[429]6.2214,[430]6.2294,[431]6.2432,[432]6.2472,[433]6.2465,[434]6.2419,[435]6.2429,[436]6.2455,[437]6.2553,[438]6.2629,[439]6.2596,[440]6.2586,[441]6.2537,[442]6.2518,[443]6.2527,[444]6.2534,[445]6.2513,[446]6.2535,[447]6.2565,[448]6.2607,[449]6.2583,[450]6.2591,[451]6.2552,[452]6.2430,[453]6.2347,[454]6.2289,[455]6.2296,[456]6.2348,[457]6.2371,[458]6.2351,[459]6.2357,[460]6.2442,[461]6.2415,[462]6.2401,[463]6.2444,[464]6.2432,[465]6.2404,[466]6.2330,[467]6.2337,[468]6.2335,[469]6.2358,[470]6.2363,[471]6.2316,[472]6.2366,[473]6.2313,[474]6.2326,[475]6.2269,[476]6.2288,[477]6.2218,[478]6.2209,[479]6.2265,[480]6.2309,[481]6.2327,[482]6.2281,[483]6.2240,[484]6.2257,[485]6.2238,[486]6.2177,[487]6.2174,[488]6.2155,[489]6.2106,[490]6.2084,[491]6.2057,[492]6.2001,[493]6.1973,[494]6.1955,[495]6.1952,[496]6.1915,[497]6.1859,[498]6.1844,[499]6.1799,[500]6.1704,[501]6.1640,[502]6.1640,[503]6.1635,[504]6.1547,[505]6.1569,[506]6.1578,[507]6.1524,[508]6.1486,[509]6.1479,[510]6.1515,[511]6.1563,[512]6.1601,[513]6.1619,[514]6.1683,[515]6.1629,[516]6.1622,[517]6.1631,[518]6.1628,[519]6.1660,[520]6.1680,[521]6.1695,[522]6.1723,[523]6.1731,[524]6.1790,[525]6.1823,[526]6.1831,[527]6.1846,[528]6.1797,[529]6.1803,[530]6.1750,[531]6.1734,[532]6.1783,[533]6.1806,[534]6.1790,[535]6.1812,[536]6.1760,[537]6.1738,[538]6.1789,[539]6.1797,[540]6.1834,[541]6.1836,[542]6.1842,[543]6.1858,[544]6.1869,[545]6.1849,[546]6.1857,[547]6.1818,[548]6.1769,[549]6.1767,[550]6.1739,[551]6.1702,[552]6.1679,[553]6.1643,[554]6.1620,[555]6.1589,[556]6.1584,[557]6.1606,[558]6.1568,[559]6.1566,[560]6.1565,[561]6.1570,[562]6.1546,[563]6.1544,[564]6.1589,[565]6.1611,[566]6.1611,[567]6.1592,[568]6.1596,[569]6.1581,[570]6.1609,[571]6.1612,[572]6.1617,[573]6.1614,[574]6.1579,[575]6.1574,[576]6.1573,[577]6.1554,[578]6.1532,[579]6.1533,[580]6.1470,[581]6.1433,[582]6.1425,[583]6.1433,[584]6.1436,[585]6.1361,[586]6.1293,[587]6.1298,[588]6.1345,[589]6.1401,[590]6.1430,[591]6.1452,[592]6.1440,[593]6.1406,[594]6.1417,[595]6.1393,[596]6.1427,[597]6.1405,[598]6.1380,[599]6.1402,[600]6.1402,[601]6.1389,[602]6.1408,
[603]6.1433,[604]6.1442,[605]6.1479,[606]6.1500,[607]6.1484,[608]6.1448,[609]6.1453,[610]6.1489,[611]6.1475,[612]6.1500,[613]6.1464,[614]6.1417,[615]6.1341,[616]6.1367,[617]6.1306,[618]6.1258,[619]6.1202,[620]6.1064,[621]6.0996,[622]6.0980,[623]6.0997,[624]6.1002,[625]6.1003,[626]6.0994,[627]6.1020,[628]6.1022,[629]6.1018,[630]6.1048,[631]6.1104,[632]6.1162,[633]6.1147,[634]6.1181,[635]6.1186,[636]6.1151,[637]6.1117,[638]6.1143,[639]6.1111,[640]6.1121,[641]6.1122,[642]6.1187,[643]6.1206,[644]6.1217,[645]6.1200,[646]6.1243,[647]6.1205,[648]6.1216,[649]6.1218,[650]6.1259,[651]6.1313,[652]6.1325,[653]6.1363,[654]6.1300,[655]6.1293,

llama_print_timings: load time = 35775.80 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 22707221.62 ms / 335360 tokens ( 67.71 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 22741802.40 ms
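
(For reference: the bracketed numbers above are, as I understand the tool, the running perplexity after each of the 655 chunks, i.e. exp of the mean negative log-likelihood over the first k 512-token chunks, so the last value, [655]6.1293, is the reported result.)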

@ggerganov (Owner) commented Apr 19, 2023

> Perplexity 7B q4_1: 6.1293
> Seems a bit higher than expected.

It's just about right.

The ppl delta for Q4_0 between BLAS and non-BLAS is -0.006 (ref #951), i.e. the non-BLAS result is 0.006 higher.

The ppl for Q4_1 using BLAS (i.e. dequantizing to F32) is 6.1286.
Applying the same delta as observed for Q4_0, the expected value for Q4_1 without BLAS (using 8-bit quantization) is:
6.1286 + 0.0060 = 6.1346

So it is actually slightly better than expected.

I'll run the perplexity later on my M1 to confirm the results.
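
Spelling the estimate out as a one-line check (values from the comment above; the 0.0060 delta is the Q4_0 BLAS/non-BLAS gap from #951):

```c
#include <stdio.h>

int main(void) {
    const double ppl_q4_1_blas = 6.1286; // Q4_1 with BLAS (dequantized to F32)
    const double delta         = 0.0060; // non-BLAS minus BLAS, observed for Q4_0 (#951)
    const double expected      = ppl_q4_1_blas + delta; // 6.1346
    const double measured      = 6.1293;                // this PR's result
    printf("expected %.4f vs measured %.4f\n", expected, measured);
    return 0;
}
```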

@ggerganov ggerganov merged commit 04a6b36 into ggerganov:q4_1xq8_0 Apr 19, 2023
@slaren slaren deleted the pr1047 branch April 19, 2023 16:39
ggerganov added a commit that referenced this pull request Apr 19, 2023
* ggml : use 8-bit precision for Q4_1 intermediate results (ARM)

* ggml : optimize ggml_vec_dot_q4_1_q8_0() via vmlaq_n_f32

56 ms/token with Q4_1 !

* ggml : AVX2 implementation of ggml_vec_dot_q4_1_q8_0 (#1051)

* gitignore : ignore ppl-*.txt files

---------

Co-authored-by: slaren <[email protected]>
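
As a side note on the vmlaq_n_f32 bullet above: that intrinsic fuses a multiply-by-scalar with the accumulate, saving an explicit broadcast/multiply/add sequence in the block loop. A minimal sketch of the pattern (fragment and names are illustrative, not the actual ggml code):

```c
#include <arm_neon.h>
#include <stdint.h>

// Per-block accumulation step: 'dot' is the integer dot product of the
// 4-bit and 8-bit quants for one block; d0*d1 is the combined block scale.
static inline float32x4_t acc_block(float32x4_t acc, int32x4_t dot,
                                    float d0, float d1) {
    // vmlaq_n_f32(a, b, s) computes a + b*s with the scalar broadcast
    // inside the instruction, instead of vdupq_n_f32 + vmulq_f32 + vaddq_f32.
    return vmlaq_n_f32(acc, vcvtq_f32_s32(dot), d0 * d1);
}
```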