Measure perplexity delta between Q4_0 and F16 "output" tensor #1003

Closed
ggerganov opened this issue Apr 15, 2023 · 2 comments
Labels: generation quality, good first issue, help wanted, high priority

ggerganov commented Apr 15, 2023

The last tensor of the transformer (called output in llama.cpp) is one of the biggest ones:

model.output = ml->get_tensor("output.weight", {n_embd, n_vocab});

I wonder how much the perplexity improves by keeping it in F16 format instead of quantizing that particular tensor.
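
For reference, a minimal sketch of the tensor-selection logic this experiment implies, assuming a quantization loop over named tensors like the one in llama.cpp's quantize tool; the `TensorInfo` struct and the tensor list below are illustrative assumptions, not the actual llama.cpp code:

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical tensor descriptor; the real model loader tracks more metadata.
struct TensorInfo {
    std::string name;
    int n_dims;
};

int main() {
    // Illustrative subset of the 7B tensor names.
    std::vector<TensorInfo> tensors = {
        {"tok_embeddings.weight",        2},
        {"layers.0.attention.wq.weight", 2},
        {"norm.weight",                  1},
        {"output.weight",                2},
    };
    for (const auto & t : tensors) {
        // Quantize only 2D weight matrices, and additionally keep the
        // "output" tensor in F16 to measure the perplexity delta.
        const bool quantize = t.n_dims == 2 && t.name != "output.weight";
        printf("%-32s -> %s\n", t.name.c_str(), quantize ? "Q4_0" : "F16");
    }
    return 0;
}
```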

Results

Q4_0 M1 Pro (with BLAS) [655]6.2838 (i.e. reference)
$  make clean && make -j perplexity && time ./perplexity -m ./models/7B/ggml-model-q4_0.bin -f ./build/wiki.test.raw -t 8
I llama.cpp build info: 
I UNAME_S:  Darwin
I UNAME_P:  arm
I UNAME_M:  arm64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread
I LDFLAGS:   -framework Accelerate
I CC:       Apple clang version 14.0.3 (clang-1403.0.22.14.1)
I CXX:      Apple clang version 14.0.3 (clang-1403.0.22.14.1)

rm -vf *.o main quantize quantize-stats perplexity embedding benchmark-q4_0-matmult
common.o
ggml.o
llama.o
main
quantize
perplexity
embedding
I llama.cpp build info: 
I UNAME_S:  Darwin
I UNAME_P:  arm
I UNAME_M:  arm64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread
I LDFLAGS:   -framework Accelerate
I CC:       Apple clang version 14.0.3 (clang-1403.0.22.14.1)
I CXX:      Apple clang version 14.0.3 (clang-1403.0.22.14.1)

cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -DGGML_USE_ACCELERATE   -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -c llama.cpp -o llama.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -c examples/common.cpp -o common.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread examples/perplexity/perplexity.cpp ggml.o llama.o common.o -o perplexity  -framework Accelerate
main: seed = 1681463663
llama.cpp: loading model from ./models/7B/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  59.11 KB
llama_model_load_internal: mem required  = 5809.32 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 
perplexity : calculating perplexity over 655 chunks, batch_size=512
10.60 seconds per pass - ETA 1.93 hours
[1]4.3802,[2]4.9555,[3]5.8269,[4]6.4692,[5]6.5435,[6]6.5411,[7]6.7174,[8]6.8069,[9]7.1756,[10]7.4121,[11]7.6567,[12]7.6957,[13]7.6058,[14]7.6820,[15]7.9366,[16]7.5419,[17]7.4189,[18]7.3798,[19]7.0077,[20]6.9948,[21]6.8969,[22]6.7125,[23]6.6744,[24]6.5868,[25]6.5871,[26]6.4149,[27]6.2349,[28]6.1341,[29]6.0498,[30]5.8938,[31]5.8659,[32]5.8839,[33]5.8189,[34]5.8537,[35]5.8795,[36]5.9232,[37]5.9272,[38]5.9443,[39]5.9825,[40]6.0412,[41]6.0482,[42]6.0826,[43]6.0397,[44]6.0944,[45]6.0989,[46]6.0729,[47]6.0967,[48]6.0674,[49]6.0745,[50]6.0351,[51]6.0309,[52]6.0200,[53]6.0641,[54]6.0476,[55]6.0250,[56]6.0593,[57]6.0824,[58]6.1043,[59]6.1182,[60]6.1647,[61]6.1536,[62]6.2166,[63]6.2502,[64]6.2653,[65]6.3119,[66]6.3220,[67]6.3401,[68]6.3541,[69]6.3790,[70]6.4113,[71]6.4327,[72]6.4625,[73]6.5276,[74]6.5330,[75]6.5474,[76]6.5637,[77]6.5770,[78]6.5618,[79]6.5914,[80]6.5839,[81]6.5967,[82]6.6005,[83]6.5468,[84]6.5322,[85]6.5208,[86]6.4997,[87]6.4344,[88]6.4059,[89]6.3853,[90]6.3687,[91]6.3948,[92]6.3909,[93]6.3935,[94]6.3910,[95]6.4198,[96]6.4177,[97]6.4105,[98]6.4035,[99]6.3895,[100]6.3895,[101]6.4154,[102]6.4091,[103]6.4308,[104]6.4376,[105]6.4361,[106]6.4538,[107]6.4525,[108]6.4648,[109]6.4595,[110]6.4550,[111]6.4779,[112]6.4969,[113]6.4983,[114]6.4949,[115]6.5032,[116]6.4958,[117]6.5014,[118]6.5298,[119]6.5507,[120]6.5872,[121]6.6035,[122]6.6282,[123]6.6672,[124]6.6850,[125]6.6762,[126]6.7153,[127]6.7524,[128]6.7798,[129]6.7629,[130]6.7725,[131]6.7672,[132]6.7584,[133]6.7456,[134]6.7568,[135]6.7534,[136]6.7402,[137]6.7322,[138]6.7151,[139]6.7035,[140]6.7005,[141]6.6707,[142]6.6658,[143]6.6379,[144]6.6178,[145]6.6092,[146]6.5957,[147]6.6031,[148]6.6054,[149]6.5994,[150]6.5953,[151]6.5965,[152]6.5870,[153]6.5703,[154]6.5613,[155]6.5680,[156]6.5630,[157]6.5813,[158]6.5849,[159]6.5890,[160]6.5916,[161]6.6041,[162]6.5739,[163]6.5619,[164]6.5357,[165]6.5039,[166]6.4751,[167]6.4377,[168]6.4051,[169]6.3916,[170]6.3791,[171]6.3502,[172]6.3322,[173]6.3136,[174]6.2829,[175]6.2607,[176]6.2505,[177]6.2295,[178]6.2059,[179]6.1887,[180]6.1798,[181]6.1574,[182]6.1382,[183]6.1239,[184]6.1238,[185]6.1165,[186]6.1182,[187]6.1236,[188]6.1200,[189]6.1384,[190]6.1393,[191]6.1597,[192]6.1760,[193]6.1938,[194]6.2054,[195]6.2263,[196]6.2434,[197]6.2655,[198]6.2810,[199]6.2840,[200]6.2885,[201]6.2844,[202]6.3049,[203]6.3115,[204]6.3114,[205]6.3224,[206]6.3302,[207]6.3262,[208]6.3346,[209]6.3398,[210]6.3449,[211]6.3547,[212]6.3620,[213]6.3727,[214]6.3762,[215]6.3802,[216]6.3951,[217]6.4129,[218]6.4264,[219]6.4267,[220]6.4231,[221]6.4168,[222]6.4133,[223]6.4024,[224]6.3958,[225]6.3910,[226]6.4125,[227]6.4212,[228]6.4271,[229]6.4338,[230]6.4294,[231]6.4462,[232]6.4332,[233]6.4160,[234]6.4004,[235]6.3845,[236]6.3768,[237]6.3664,[238]6.3698,[239]6.3536,[240]6.3433,[241]6.3466,[242]6.3503,[243]6.3488,[244]6.3368,[245]6.3342,[246]6.3221,[247]6.3097,[248]6.3030,[249]6.3010,[250]6.3057,[251]6.2980,[252]6.2946,[253]6.2844,[254]6.2804,[255]6.2688,[256]6.2496,[257]6.2385,[258]6.2299,[259]6.2279,[260]6.2197,[261]6.2154,[262]6.2095,[263]6.2050,[264]6.1858,[265]6.1850,[266]6.1835,[267]6.1766,[268]6.1862,[269]6.1843,[270]6.1850,[271]6.1928,[272]6.1974,[273]6.1969,[274]6.1983,[275]6.2073,[276]6.2128,[277]6.2288,[278]6.2397,[279]6.2483,[280]6.2518,[281]6.2617,[282]6.2678,[283]6.2825,[284]6.2902,[285]6.2997,[286]6.3144,[287]6.3138,[288]6.3198,[289]6.3107,[290]6.2956,[291]6.2802,[292]6.2644,[293]6.2505,[294]6.2530,[295]6.2524,[296]6.2567,[297]6.2553,[298]6.2579,[299]6.2551,[300]6.2439,[301]6.2440,[302]6.2359,[303]6.2282,[304]6.2204,[305]6.2180,[30
6]6.2047,[307]6.2072,[308]6.2104,[309]6.1941,[310]6.1880,[311]6.1816,[312]6.1838,[313]6.1782,[314]6.1769,[315]6.1604,[316]6.1562,[317]6.1395,[318]6.1179,[319]6.1298,[320]6.1428,[321]6.1466,[322]6.1421,[323]6.1355,[324]6.1331,[325]6.1431,[326]6.1430,[327]6.1451,[328]6.1494,[329]6.1554,[330]6.1579,[331]6.1703,[332]6.1671,[333]6.1741,[334]6.1682,[335]6.1617,[336]6.1655,[337]6.1625,[338]6.1612,[339]6.1555,[340]6.1511,[341]6.1589,[342]6.1613,[343]6.1669,[344]6.1668,[345]6.1667,[346]6.1638,[347]6.1686,[348]6.1727,[349]6.1746,[350]6.1712,[351]6.1717,[352]6.1717,[353]6.1665,[354]6.1664,[355]6.1718,[356]6.1749,[357]6.1712,[358]6.1802,[359]6.1833,[360]6.1795,[361]6.1791,[362]6.1858,[363]6.1970,[364]6.2035,[365]6.2093,[366]6.2100,[367]6.2188,[368]6.2165,[369]6.2175,[370]6.2185,[371]6.2125,[372]6.2178,[373]6.2234,[374]6.2220,[375]6.2217,[376]6.2301,[377]6.2252,[378]6.2277,[379]6.2338,[380]6.2254,[381]6.2211,[382]6.2154,[383]6.2144,[384]6.2137,[385]6.2124,[386]6.2119,[387]6.2111,[388]6.2066,[389]6.2012,[390]6.1943,[391]6.1862,[392]6.1821,[393]6.1802,[394]6.1828,[395]6.1812,[396]6.1738,[397]6.1814,[398]6.1852,[399]6.1935,[400]6.1931,[401]6.1944,[402]6.1950,[403]6.1969,[404]6.2032,[405]6.1937,[406]6.1903,[407]6.1895,[408]6.1905,[409]6.2029,[410]6.2139,[411]6.2264,[412]6.2427,[413]6.2542,[414]6.2618,[415]6.2670,[416]6.2750,[417]6.2881,[418]6.2916,[419]6.2990,[420]6.3076,[421]6.3197,[422]6.3255,[423]6.3326,[424]6.3446,[425]6.3537,[426]6.3602,[427]6.3647,[428]6.3730,[429]6.3775,[430]6.3865,[431]6.4011,[432]6.4054,[433]6.4041,[434]6.3995,[435]6.4002,[436]6.4026,[437]6.4121,[438]6.4200,[439]6.4164,[440]6.4158,[441]6.4108,[442]6.4099,[443]6.4112,[444]6.4115,[445]6.4095,[446]6.4118,[447]6.4147,[448]6.4190,[449]6.4164,[450]6.4167,[451]6.4124,[452]6.4005,[453]6.3922,[454]6.3862,[455]6.3869,[456]6.3917,[457]6.3934,[458]6.3912,[459]6.3922,[460]6.4009,[461]6.3981,[462]6.3965,[463]6.4015,[464]6.4006,[465]6.3976,[466]6.3895,[467]6.3898,[468]6.3897,[469]6.3919,[470]6.3924,[471]6.3876,[472]6.3922,[473]6.3866,[474]6.3880,[475]6.3821,[476]6.3844,[477]6.3773,[478]6.3764,[479]6.3827,[480]6.3879,[481]6.3899,[482]6.3854,[483]6.3813,[484]6.3835,[485]6.3818,[486]6.3763,[487]6.3763,[488]6.3744,[489]6.3694,[490]6.3667,[491]6.3637,[492]6.3579,[493]6.3548,[494]6.3530,[495]6.3528,[496]6.3493,[497]6.3440,[498]6.3422,[499]6.3372,[500]6.3275,[501]6.3206,[502]6.3204,[503]6.3202,[504]6.3109,[505]6.3133,[506]6.3142,[507]6.3081,[508]6.3038,[509]6.3027,[510]6.3066,[511]6.3113,[512]6.3148,[513]6.3166,[514]6.3232,[515]6.3177,[516]6.3169,[517]6.3180,[518]6.3181,[519]6.3211,[520]6.3238,[521]6.3255,[522]6.3283,[523]6.3294,[524]6.3357,[525]6.3393,[526]6.3405,[527]6.3426,[528]6.3372,[529]6.3376,[530]6.3329,[531]6.3319,[532]6.3367,[533]6.3390,[534]6.3372,[535]6.3395,[536]6.3341,[537]6.3318,[538]6.3366,[539]6.3378,[540]6.3417,[541]6.3426,[542]6.3433,[543]6.3447,[544]6.3459,[545]6.3437,[546]6.3443,[547]6.3398,[548]6.3343,[549]6.3345,[550]6.3318,[551]6.3280,[552]6.3260,[553]6.3217,[554]6.3195,[555]6.3166,[556]6.3163,[557]6.3186,[558]6.3146,[559]6.3142,[560]6.3137,[561]6.3139,[562]6.3120,[563]6.3120,[564]6.3163,[565]6.3180,[566]6.3177,[567]6.3155,[568]6.3160,[569]6.3144,[570]6.3170,[571]6.3176,[572]6.3186,[573]6.3188,[574]6.3151,[575]6.3147,[576]6.3145,[577]6.3135,[578]6.3114,[579]6.3122,[580]6.3056,[581]6.3018,[582]6.3008,[583]6.3016,[584]6.3020,[585]6.2943,[586]6.2875,[587]6.2877,[588]6.2927,[589]6.2985,[590]6.3015,[591]6.3037,[592]6.3022,[593]6.2985,[594]6.2996,[595]6.2973,[596]6.3010,[597]6.2987,[598]6.2949,[599]6.2971,[600]6.2969,[601]6.2954,[602]6
.2972,[603]6.3001,[604]6.3012,[605]6.3044,[606]6.3065,[607]6.3048,[608]6.3013,[609]6.3019,[610]6.3056,[611]6.3037,[612]6.3062,[613]6.3026,[614]6.2975,[615]6.2898,[616]6.2928,[617]6.2865,[618]6.2814,[619]6.2757,[620]6.2615,[621]6.2542,[622]6.2525,[623]6.2540,[624]6.2545,[625]6.2544,[626]6.2529,[627]6.2550,[628]6.2555,[629]6.2552,[630]6.2586,[631]6.2650,[632]6.2704,[633]6.2687,[634]6.2721,[635]6.2726,[636]6.2694,[637]6.2659,[638]6.2686,[639]6.2657,[640]6.2667,[641]6.2669,[642]6.2738,[643]6.2760,[644]6.2772,[645]6.2751,[646]6.2793,[647]6.2755,[648]6.2762,[649]6.2763,[650]6.2801,[651]6.2858,[652]6.2865,[653]6.2908,[654]6.2844,[655]6.2838,

llama_print_timings:        load time = 11216.03 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 4989892.61 ms / 335360 tokens (   14.88 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 5024616.43 ms

real	83m45.024s
user	126m54.284s
sys	4m10.884s
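
A note on reading these logs: each bracketed entry is the running perplexity after that many chunks, i.e. exp of the mean negative log-likelihood accumulated so far, so the final entry [655] is the reported result. A simplified sketch of the accumulation (the per-chunk NLL values here are made up; the real tool computes them by evaluating the model on each 512-token chunk of wiki.test.raw):

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    // Hypothetical per-chunk mean negative log-likelihoods; in practice these
    // come from evaluating the model on 512-token chunks of the test set.
    const std::vector<double> chunk_nll = {1.4767, 1.7249, 2.0258};

    double nll_sum = 0.0;
    int n_chunks = 0;
    for (const double nll : chunk_nll) {
        nll_sum += nll;
        ++n_chunks;
        // Prints a running stream in the same "[n]x.xxxx," format as the logs above.
        printf("[%d]%.4f,", n_chunks, std::exp(nll_sum / n_chunks));
    }
    printf("\n");
    return 0;
}
```
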
Q4_0 + F16 "output" M1 Pro (with BLAS) [655]6.2355
$  make clean && make -j perplexity && time ./perplexity -m ./models/7B/ggml-model-q4_0-output-f16.bin -f ./build/wiki.test.raw -t 8
I llama.cpp build info: 
I UNAME_S:  Darwin
I UNAME_P:  arm
I UNAME_M:  arm64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread
I LDFLAGS:   -framework Accelerate
I CC:       Apple clang version 14.0.3 (clang-1403.0.22.14.1)
I CXX:      Apple clang version 14.0.3 (clang-1403.0.22.14.1)

rm -vf *.o main quantize quantize-stats perplexity embedding benchmark-q4_0-matmult
common.o
ggml.o
llama.o
perplexity
I llama.cpp build info: 
I UNAME_S:  Darwin
I UNAME_P:  arm
I UNAME_M:  arm64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread
I LDFLAGS:   -framework Accelerate
I CC:       Apple clang version 14.0.3 (clang-1403.0.22.14.1)
I CXX:      Apple clang version 14.0.3 (clang-1403.0.22.14.1)

cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -DGGML_USE_ACCELERATE   -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -c llama.cpp -o llama.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -c examples/common.cpp -o common.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread examples/perplexity/perplexity.cpp ggml.o llama.o common.o -o perplexity  -framework Accelerate
main: seed = 1681643028
llama.cpp: loading model from ./models/7B/ggml-model-q4_0-output-f16.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  59.11 KB
llama_model_load_internal: mem required  = 5981.20 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 
perplexity : calculating perplexity over 655 chunks, batch_size=512
8.00 seconds per pass - ETA 1.46 hours
[1]4.3820,[2]4.9320,[3]5.7997,[4]6.4555,[5]6.5526,[6]6.5317,[7]6.6993,[8]6.7881,[9]7.1519,[10]7.3934,[11]7.6302,[12]7.6638,[13]7.5714,[14]7.6539,[15]7.9047,[16]7.5068,[17]7.3838,[18]7.3480,[19]6.9755,[20]6.9635,[21]6.8675,[22]6.6824,[23]6.6426,[24]6.5537,[25]6.5529,[26]6.3839,[27]6.2037,[28]6.1026,[29]6.0173,[30]5.8628,[31]5.8366,[32]5.8546,[33]5.7921,[34]5.8260,[35]5.8504,[36]5.8945,[37]5.9009,[38]5.9162,[39]5.9532,[40]6.0109,[41]6.0177,[42]6.0521,[43]6.0101,[44]6.0655,[45]6.0683,[46]6.0426,[47]6.0653,[48]6.0340,[49]6.0399,[50]6.0009,[51]5.9970,[52]5.9861,[53]6.0296,[54]6.0120,[55]5.9897,[56]6.0228,[57]6.0441,[58]6.0658,[59]6.0792,[60]6.1234,[61]6.1133,[62]6.1763,[63]6.2096,[64]6.2248,[65]6.2699,[66]6.2788,[67]6.2958,[68]6.3095,[69]6.3344,[70]6.3670,[71]6.3873,[72]6.4174,[73]6.4828,[74]6.4876,[75]6.5007,[76]6.5153,[77]6.5276,[78]6.5126,[79]6.5407,[80]6.5326,[81]6.5463,[82]6.5502,[83]6.4981,[84]6.4830,[85]6.4714,[86]6.4496,[87]6.3847,[88]6.3565,[89]6.3362,[90]6.3209,[91]6.3459,[92]6.3416,[93]6.3440,[94]6.3410,[95]6.3697,[96]6.3678,[97]6.3612,[98]6.3548,[99]6.3409,[100]6.3413,[101]6.3673,[102]6.3601,[103]6.3811,[104]6.3873,[105]6.3867,[106]6.4034,[107]6.4011,[108]6.4136,[109]6.4074,[110]6.4026,[111]6.4245,[112]6.4435,[113]6.4448,[114]6.4416,[115]6.4494,[116]6.4420,[117]6.4473,[118]6.4765,[119]6.4978,[120]6.5334,[121]6.5496,[122]6.5749,[123]6.6139,[124]6.6311,[125]6.6222,[126]6.6614,[127]6.6977,[128]6.7255,[129]6.7096,[130]6.7191,[131]6.7134,[132]6.7047,[133]6.6922,[134]6.7029,[135]6.6991,[136]6.6870,[137]6.6790,[138]6.6625,[139]6.6515,[140]6.6482,[141]6.6195,[142]6.6150,[143]6.5875,[144]6.5680,[145]6.5592,[146]6.5460,[147]6.5535,[148]6.5555,[149]6.5491,[150]6.5451,[151]6.5464,[152]6.5363,[153]6.5191,[154]6.5099,[155]6.5168,[156]6.5117,[157]6.5297,[158]6.5331,[159]6.5372,[160]6.5394,[161]6.5517,[162]6.5219,[163]6.5102,[164]6.4844,[165]6.4525,[166]6.4245,[167]6.3876,[168]6.3551,[169]6.3419,[170]6.3298,[171]6.3012,[172]6.2834,[173]6.2650,[174]6.2349,[175]6.2133,[176]6.2031,[177]6.1820,[178]6.1591,[179]6.1420,[180]6.1333,[181]6.1111,[182]6.0922,[183]6.0782,[184]6.0781,[185]6.0709,[186]6.0727,[187]6.0783,[188]6.0745,[189]6.0930,[190]6.0941,[191]6.1149,[192]6.1311,[193]6.1486,[194]6.1601,[195]6.1811,[196]6.1974,[197]6.2193,[198]6.2349,[199]6.2384,[200]6.2431,[201]6.2386,[202]6.2590,[203]6.2657,[204]6.2656,[205]6.2766,[206]6.2843,[207]6.2805,[208]6.2886,[209]6.2935,[210]6.2989,[211]6.3091,[212]6.3164,[213]6.3269,[214]6.3305,[215]6.3342,[216]6.3496,[217]6.3675,[218]6.3807,[219]6.3809,[220]6.3774,[221]6.3711,[222]6.3680,[223]6.3574,[224]6.3502,[225]6.3459,[226]6.3671,[227]6.3756,[228]6.3817,[229]6.3879,[230]6.3838,[231]6.4007,[232]6.3874,[233]6.3703,[234]6.3544,[235]6.3387,[236]6.3314,[237]6.3205,[238]6.3237,[239]6.3076,[240]6.2971,[241]6.3001,[242]6.3037,[243]6.3022,[244]6.2904,[245]6.2879,[246]6.2758,[247]6.2636,[248]6.2570,[249]6.2548,[250]6.2590,[251]6.2516,[252]6.2482,[253]6.2383,[254]6.2341,[255]6.2226,[256]6.2035,[257]6.1923,[258]6.1839,[259]6.1820,[260]6.1740,[261]6.1699,[262]6.1641,[263]6.1597,[264]6.1406,[265]6.1398,[266]6.1385,
[267]6.1316,[268]6.1408,[269]6.1385,[270]6.1393,[271]6.1470,[272]6.1509,[273]6.1505,[274]6.1521,[275]6.1611,[276]6.1664,[277]6.1824,[278]6.1929,[279]6.2015,[280]6.2048,[281]6.2142,[282]6.2199,[283]6.2342,[284]6.2422,[285]6.2512,[286]6.2658,[287]6.2654,[288]6.2714,[289]6.2624,[290]6.2471,[291]6.2316,[292]6.2164,[293]6.2029,[294]6.2052,[295]6.2047,[296]6.2090,[297]6.2076,[298]6.2104,[299]6.2073,[300]6.1962,[301]6.1960,[302]6.1882,[303]6.1805,[304]6.1727,[305]6.1703,[306]6.1573,[307]6.1594,[308]6.1631,[309]6.1469,[310]6.1408,[311]6.1346,[312]6.1368,[313]6.1314,[314]6.1301,[315]6.1137,[316]6.1092,[317]6.0928,[318]6.0714,[319]6.0833,[320]6.0960,[321]6.0998,[322]6.0953,[323]6.0888,[324]6.0866,[325]6.0963,[326]6.0964,[327]6.0985,[328]6.1025,[329]6.1084,[330]6.1112,[331]6.1233,[332]6.1204,[333]6.1275,[334]6.1218,[335]6.1153,[336]6.1190,[337]6.1161,[338]6.1149,[339]6.1091,[340]6.1047,[341]6.1127,[342]6.1152,[343]6.1207,[344]6.1209,[345]6.1209,[346]6.1181,[347]6.1226,[348]6.1266,[349]6.1285,[350]6.1251,[351]6.1258,[352]6.1258,[353]6.1204,[354]6.1204,[355]6.1256,[356]6.1286,[357]6.1249,[358]6.1337,[359]6.1366,[360]6.1329,[361]6.1324,[362]6.1391,[363]6.1503,[364]6.1564,[365]6.1621,[366]6.1629,[367]6.1715,[368]6.1690,[369]6.1699,[370]6.1713,[371]6.1655,[372]6.1705,[373]6.1760,[374]6.1746,[375]6.1742,[376]6.1823,[377]6.1774,[378]6.1798,[379]6.1859,[380]6.1776,[381]6.1735,[382]6.1681,[383]6.1671,[384]6.1664,[385]6.1653,[386]6.1647,[387]6.1640,[388]6.1596,[389]6.1542,[390]6.1473,[391]6.1394,[392]6.1355,[393]6.1340,[394]6.1364,[395]6.1347,[396]6.1273,[397]6.1348,[398]6.1387,[399]6.1469,[400]6.1465,[401]6.1481,[402]6.1487,[403]6.1504,[404]6.1567,[405]6.1474,[406]6.1442,[407]6.1434,[408]6.1446,[409]6.1569,[410]6.1678,[411]6.1800,[412]6.1962,[413]6.2076,[414]6.2152,[415]6.2203,[416]6.2281,[417]6.2409,[418]6.2445,[419]6.2519,[420]6.2605,[421]6.2724,[422]6.2779,[423]6.2848,[424]6.2968,[425]6.3056,[426]6.3121,[427]6.3166,[428]6.3249,[429]6.3297,[430]6.3385,[431]6.3528,[432]6.3569,[433]6.3558,[434]6.3512,[435]6.3519,[436]6.3545,[437]6.3639,[438]6.3717,[439]6.3684,[440]6.3674,[441]6.3625,[442]6.3614,[443]6.3627,[444]6.3630,[445]6.3610,[446]6.3632,[447]6.3660,[448]6.3703,[449]6.3677,[450]6.3680,[451]6.3638,[452]6.3522,[453]6.3438,[454]6.3377,[455]6.3386,[456]6.3433,[457]6.3450,[458]6.3429,[459]6.3436,[460]6.3522,[461]6.3495,[462]6.3479,[463]6.3529,[464]6.3519,[465]6.3489,[466]6.3410,[467]6.3414,[468]6.3413,[469]6.3435,[470]6.3440,[471]6.3390,[472]6.3435,[473]6.3379,[474]6.3393,[475]6.3335,[476]6.3353,[477]6.3282,[478]6.3274,[479]6.3338,[480]6.3389,[481]6.3408,[482]6.3364,[483]6.3322,[484]6.3343,[485]6.3329,[486]6.3274,[487]6.3274,[488]6.3255,[489]6.3205,[490]6.3179,[491]6.3145,[492]6.3086,[493]6.3057,[494]6.3041,[495]6.3037,[496]6.3003,[497]6.2949,[498]6.2930,[499]6.2881,[500]6.2787,[501]6.2720,[502]6.2719,[503]6.2714,[504]6.2622,[505]6.2649,[506]6.2658,[507]6.2600,[508]6.2560,[509]6.2550,[510]6.2591,[511]6.2636,[512]6.2669,[513]6.2686,[514]6.2751,[515]6.2697,[516]6.2690,[517]6.2701,[518]6.2702,[519]6.2731,[520]6.2760,[521]6.2776,[522]6.2806,[523]6.2815,[524]6.2875,[525]6.2910,[526]6.2922,[527]6.2943,[528]6.2891,[529]6.2897,[530]6.2849,[531]6.2839,[532]6.2887,[533]6.2910,[534]6.2893,[535]6.2916,[536]6.2861,[537]6.2838,[538]6.2888,[539]6.2900,[540]6.2940,[541]6.2948,[542]6.2956,[543]6.2970,[544]6.2981,[545]6.2960,[546]6.2967,[547]6.2922,[548]6.2868,[549]6.2868,[550]6.2841,[551]6.2805,[552]6.2784,[553]6.2743,[554]6.2720,[555]6.2691,[556]6.2689,[557]6.2712,[558]6.2674,[559]6.2667,[560]6.2662,[561]6.2662,[562]6.2640,[56
3]6.2640,[564]6.2682,[565]6.2698,[566]6.2696,[567]6.2675,[568]6.2679,[569]6.2663,[570]6.2689,[571]6.2694,[572]6.2703,[573]6.2703,[574]6.2667,[575]6.2664,[576]6.2661,[577]6.2651,[578]6.2630,[579]6.2639,[580]6.2572,[581]6.2534,[582]6.2524,[583]6.2531,[584]6.2534,[585]6.2457,[586]6.2390,[587]6.2393,[588]6.2442,[589]6.2499,[590]6.2530,[591]6.2549,[592]6.2535,[593]6.2501,[594]6.2511,[595]6.2488,[596]6.2525,[597]6.2503,[598]6.2466,[599]6.2486,[600]6.2482,[601]6.2469,[602]6.2486,[603]6.2516,[604]6.2526,[605]6.2557,[606]6.2576,[607]6.2561,[608]6.2527,[609]6.2531,[610]6.2568,[611]6.2549,[612]6.2574,[613]6.2537,[614]6.2487,[615]6.2411,[616]6.2440,[617]6.2379,[618]6.2330,[619]6.2274,[620]6.2133,[621]6.2061,[622]6.2043,[623]6.2058,[624]6.2062,[625]6.2061,[626]6.2047,[627]6.2068,[628]6.2073,[629]6.2069,[630]6.2102,[631]6.2166,[632]6.2219,[633]6.2203,[634]6.2238,[635]6.2244,[636]6.2212,[637]6.2179,[638]6.2206,[639]6.2176,[640]6.2187,[641]6.2189,[642]6.2256,[643]6.2279,[644]6.2290,[645]6.2271,[646]6.2314,[647]6.2275,[648]6.2282,[649]6.2282,[650]6.2320,[651]6.2377,[652]6.2384,[653]6.2426,[654]6.2361,[655]6.2355,

llama_print_timings:        load time =  8543.28 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 4971705.35 ms / 335360 tokens (   14.82 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 5006225.35 ms

real	83m26.348s
user	126m29.113s
sys	4m14.981s

Perplexity delta: -0.0542

Q4_0 + F16 "tok_embd" M1 Pro (with BLAS) [655]6.2838
$  make clean && make -j perplexity && time ./perplexity -m ./models/7B/ggml-model-q4_0-tok-f16.bin -f ./build/wiki.test.raw -t 8
I llama.cpp build info: 
I UNAME_S:  Darwin
I UNAME_P:  arm
I UNAME_M:  arm64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread
I LDFLAGS:   -framework Accelerate
I CC:       Apple clang version 14.0.3 (clang-1403.0.22.14.1)
I CXX:      Apple clang version 14.0.3 (clang-1403.0.22.14.1)
rm -vf *.o main quantize quantize-stats perplexity embedding benchmark-q4_0-matmult
common.o
ggml.o
llama.o
perplexity
I llama.cpp build info: 
I UNAME_S:  Darwin
I UNAME_P:  arm
I UNAME_M:  arm64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread
I LDFLAGS:   -framework Accelerate
I CC:       Apple clang version 14.0.3 (clang-1403.0.22.14.1)
I CXX:      Apple clang version 14.0.3 (clang-1403.0.22.14.1)

cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -DGGML_USE_ACCELERATE   -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -c llama.cpp -o llama.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -c examples/common.cpp -o common.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread examples/perplexity/perplexity.cpp ggml.o llama.o common.o -o perplexity  -framework Accelerate
main: seed = 1681660693
llama.cpp: loading model from ./models/7B/ggml-model-q4_0-tok-f16.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  59.11 KB
llama_model_load_internal: mem required  = 5981.20 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 
perplexity : calculating perplexity over 655 chunks, batch_size=512
7.95 seconds per pass - ETA 1.45 hours
[1]4.3739,[2]4.9529,[3]5.8213,[4]6.4643,[5]6.5415,[6]6.5390,[7]6.7169,[8]6.8060,[9]7.1773,[10]7.4132,[11]7.6578,[12]7.6972,[13]7.6086,[14]7.6865,[15]7.9420,[16]7.5476,[17]7.4239,[18]7.3840,[19]7.0114,[20]6.9981,[21]6.9015,[22]6.7166,[23]6.6774,[24]6.5905,[25]6.5920,[26]6.4192,[27]6.2383,[28]6.1364,[29]6.0522,[30]5.8955,[31]5.8674,[32]5.8854,[33]5.8205,[34]5.8555,[35]5.8813,[36]5.9244,[37]5.9287,[38]5.9455,[39]5.9833,[40]6.0422,[41]6.0498,[42]6.0843,[43]6.0413,[44]6.0960,[45]6.1002,[46]6.0737,[47]6.0975,[48]6.0685,[49]6.0755,[50]6.0361,[51]6.0320,[52]6.0210,[53]6.0650,[54]6.0485,[55]6.0261,[56]6.0603,[57]6.0835,[58]6.1052,[59]6.1190,[60]6.1657,[61]6.1545,[62]6.2175,[63]6.2511,[64]6.2662,[65]6.3128,[66]6.3228,[67]6.3410,[68]6.3551,[69]6.3804,[70]6.4129,[71]6.4345,[72]6.4645,[73]6.5293,[74]6.5346,[75]6.5489,[76]6.5648,[77]6.5779,[78]6.5630,[79]6.5924,[80]6.5849,[81]6.5977,[82]6.6011,[83]6.5474,[84]6.5327,[85]6.5212,[86]6.5001,[87]6.4346,[88]6.4062,[89]6.3857,[90]6.3691,[91]6.3952,[92]6.3912,[93]6.3939,[94]6.3915,[95]6.4203,[96]6.4183,[97]6.4113,[98]6.4044,[99]6.3902,[100]6.3903,[101]6.4162,[102]6.4100,[103]6.4319,[104]6.4385,[105]6.4372,[106]6.4550,[107]6.4537,[108]6.4665,[109]6.4612,[110]6.4567,[111]6.4796,[112]6.4986,[113]6.5000,[114]6.4966,[115]6.5048,[116]6.4974,[117]6.5029,[118]6.5313,[119]6.5523,[120]6.5888,[121]6.6050,[122]6.6297,[123]6.6688,[124]6.6867,[125]6.6778,[126]6.7170,[127]6.7540,[128]6.7816,[129]6.7644,[130]6.7740,[131]6.7687,[132]6.7600,[133]6.7470,[134]6.7581,[135]6.7546,[136]6.7414,[137]6.7333,[138]6.7160,[139]6.7043,[140]6.7012,[141]6.6713,[142]6.6666,[143]6.6387,[144]6.6187,[145]6.6100,[146]6.5964,[147]6.6037,[148]6.6060,[149]6.5999,[150]6.5957,[151]6.5970,[152]6.5875,[153]6.5708,[154]6.5616,[155]6.5684,[156]6.5634,[157]6.5818,[158]6.5855,[159]6.5897,[160]6.5923,[161]6.6047,[162]6.5744,[163]6.5625,[164]6.5363,[165]6.5045,[166]6.4757,[167]6.4382,[168]6.4056,[169]6.3921,[170]6.3796,[171]6.3507,[172]6.3326,[173]6.3140,[174]6.2833,[175]6.2612,[176]6.2510,[177]6.2299,[178]6.2064,[179]6.1893,[180]6.1803,[181]6.1579,[182]6.1388,[183]6.1245,[184]6.1242,[185]6.1170,[186]6.1187,[187]6.1241,[188]6.1205,[189]6.1390,[190]6.1398,[191]6.1601,[192]6.1765,[193]6.1943,[194]6.2060,[195]6.2269,[196]6.2439,[197]6.2659,[198]6.2815,[199]6.2843,[200]6.2889,[201]6.2848,[202]6.3055,[203]6.3122,[204]6.3121,[205]6.3231,[206]6.3309,[207]6.3268,[208]6.3353,[209]6.3404,[210]6.3455,[211]6.3550,[212]6.3623,[213]6.3731,[214]6.3768,[215]6.3809,[216]6.3957,[217]6.4137,[218]6.4273,[219]6.4276,[220]6.4240,[221]6.4179,[222]6.4145,[223]6.4036,[224]6.3969,[225]6.3922,[226]6.4136,[227]6.4223,[228]6.4281,[229]6.4348,[230]6.4305,[231]6.4474,[232]6.4344,[233]6.4172,[234]6.4016,[235]6.3858,[236]6.3780,[237]6.3676,[238]6.3710,[239]6.3548,[240]6.3445,[241]6.3477,[242]6.3515,[243]6.3499,[244]6.3380,[245]6.3354,[246]6.3233,[247]6.3108,[248]6.3040,[249]6.3020,[250]6.3067,[251]6.2991,[252]6.2957,[253]6.2855,[254]6.2814,[255]6.2698,[256]6.2507,[257]6.2396,[258]6.2310,[259]6.2289,[260]6.2208,[261]6.2165,[262]6.2106,[263]6.2061,[264]6.1868,[265]6.1861,[266]6.1845,[267]6.1776,[268]6.1874,[269]6.1855,[270]6.1861,[271]6.1940,[272]6.1986,[273]6.1981,[274]6.1995,[275]6.2085,[276]6.2140,[277]6.2300,[278]6.2408,[279]6.2495,[280]6.2530,[281]6.2629,[282]6.2689,[283]6.2836,[284]6.2914,[285]6.3008,[286]6.3155,[287]6.3148,[288]6.3208,[289]6.3116,[290]6.2965,[291]6.2811,[292]6.2652,[293]6.2514,[294]6.2539,[295]6.2533,[296]6.2575,[297]6.2561,[298]6.2587,[299]6.2559,[300]6.2446,[301]6.2448,[302]6.2367,[303]6.2290,[304]6.2211,[305]6.2187,[30
6]6.2055,[307]6.2078,[308]6.2111,[309]6.1947,[310]6.1887,[311]6.1823,[312]6.1845,[313]6.1790,[314]6.1777,[315]6.1612,[316]6.1569,[317]6.1402,[318]6.1186,[319]6.1304,[320]6.1436,[321]6.1474,[322]6.1431,[323]6.1365,[324]6.1340,[325]6.1439,[326]6.1439,[327]6.1460,[328]6.1503,[329]6.1564,[330]6.1589,[331]6.1712,[332]6.1682,[333]6.1751,[334]6.1693,[335]6.1628,[336]6.1665,[337]6.1635,[338]6.1623,[339]6.1565,[340]6.1521,[341]6.1598,[342]6.1623,[343]6.1678,[344]6.1677,[345]6.1676,[346]6.1647,[347]6.1695,[348]6.1736,[349]6.1755,[350]6.1721,[351]6.1726,[352]6.1726,[353]6.1674,[354]6.1673,[355]6.1727,[356]6.1757,[357]6.1721,[358]6.1812,[359]6.1842,[360]6.1804,[361]6.1800,[362]6.1867,[363]6.1979,[364]6.2044,[365]6.2102,[366]6.2109,[367]6.2196,[368]6.2174,[369]6.2183,[370]6.2194,[371]6.2134,[372]6.2186,[373]6.2242,[374]6.2229,[375]6.2225,[376]6.2309,[377]6.2260,[378]6.2285,[379]6.2345,[380]6.2261,[381]6.2219,[382]6.2161,[383]6.2151,[384]6.2144,[385]6.2132,[386]6.2126,[387]6.2118,[388]6.2073,[389]6.2019,[390]6.1949,[391]6.1869,[392]6.1828,[393]6.1809,[394]6.1835,[395]6.1819,[396]6.1745,[397]6.1821,[398]6.1859,[399]6.1942,[400]6.1937,[401]6.1951,[402]6.1957,[403]6.1976,[404]6.2039,[405]6.1943,[406]6.1909,[407]6.1902,[408]6.1912,[409]6.2035,[410]6.2145,[411]6.2269,[412]6.2432,[413]6.2546,[414]6.2622,[415]6.2674,[416]6.2755,[417]6.2886,[418]6.2921,[419]6.2994,[420]6.3081,[421]6.3202,[422]6.3259,[423]6.3331,[424]6.3451,[425]6.3542,[426]6.3606,[427]6.3651,[428]6.3734,[429]6.3779,[430]6.3869,[431]6.4015,[432]6.4059,[433]6.4045,[434]6.3999,[435]6.4006,[436]6.4030,[437]6.4124,[438]6.4203,[439]6.4167,[440]6.4161,[441]6.4111,[442]6.4102,[443]6.4115,[444]6.4118,[445]6.4098,[446]6.4122,[447]6.4151,[448]6.4194,[449]6.4167,[450]6.4170,[451]6.4127,[452]6.4009,[453]6.3925,[454]6.3865,[455]6.3872,[456]6.3919,[457]6.3936,[458]6.3914,[459]6.3924,[460]6.4011,[461]6.3982,[462]6.3966,[463]6.4018,[464]6.4009,[465]6.3978,[466]6.3898,[467]6.3901,[468]6.3900,[469]6.3922,[470]6.3927,[471]6.3879,[472]6.3925,[473]6.3869,[474]6.3883,[475]6.3824,[476]6.3848,[477]6.3776,[478]6.3767,[479]6.3830,[480]6.3882,[481]6.3902,[482]6.3856,[483]6.3815,[484]6.3838,[485]6.3821,[486]6.3765,[487]6.3765,[488]6.3747,[489]6.3697,[490]6.3670,[491]6.3640,[492]6.3582,[493]6.3552,[494]6.3534,[495]6.3532,[496]6.3497,[497]6.3444,[498]6.3426,[499]6.3376,[500]6.3279,[501]6.3209,[502]6.3208,[503]6.3206,[504]6.3112,[505]6.3138,[506]6.3147,[507]6.3086,[508]6.3042,[509]6.3031,[510]6.3070,[511]6.3116,[512]6.3152,[513]6.3170,[514]6.3236,[515]6.3181,[516]6.3173,[517]6.3184,[518]6.3185,[519]6.3215,[520]6.3242,[521]6.3258,[522]6.3287,[523]6.3297,[524]6.3360,[525]6.3397,[526]6.3408,[527]6.3428,[528]6.3375,[529]6.3379,[530]6.3331,[531]6.3321,[532]6.3370,[533]6.3393,[534]6.3375,[535]6.3399,[536]6.3345,[537]6.3321,[538]6.3369,[539]6.3381,[540]6.3420,[541]6.3429,[542]6.3436,[543]6.3450,[544]6.3462,[545]6.3440,[546]6.3446,[547]6.3402,[548]6.3347,[549]6.3348,[550]6.3321,[551]6.3283,[552]6.3262,[553]6.3220,[554]6.3197,[555]6.3168,[556]6.3165,[557]6.3188,[558]6.3148,[559]6.3143,[560]6.3138,[561]6.3140,[562]6.3121,[563]6.3121,[564]6.3164,[565]6.3181,[566]6.3178,[567]6.3156,[568]6.3161,[569]6.3145,[570]6.3171,[571]6.3177,[572]6.3187,[573]6.3189,[574]6.3152,[575]6.3148,[576]6.3147,[577]6.3136,[578]6.3115,[579]6.3123,[580]6.3056,[581]6.3018,[582]6.3009,[583]6.3016,[584]6.3020,[585]6.2944,[586]6.2876,[587]6.2878,[588]6.2928,[589]6.2985,[590]6.3015,[591]6.3037,[592]6.3022,[593]6.2985,[594]6.2996,[595]6.2973,[596]6.3010,[597]6.2987,[598]6.2949,[599]6.2971,[600]6.2968,[601]6.2953,[602]6
.2971,[603]6.3001,[604]6.3011,[605]6.3044,[606]6.3065,[607]6.3048,[608]6.3013,[609]6.3019,[610]6.3055,[611]6.3037,[612]6.3062,[613]6.3026,[614]6.2975,[615]6.2898,[616]6.2927,[617]6.2865,[618]6.2813,[619]6.2757,[620]6.2614,[621]6.2542,[622]6.2525,[623]6.2540,[624]6.2544,[625]6.2543,[626]6.2529,[627]6.2550,[628]6.2555,[629]6.2552,[630]6.2586,[631]6.2650,[632]6.2704,[633]6.2686,[634]6.2720,[635]6.2726,[636]6.2694,[637]6.2659,[638]6.2686,[639]6.2657,[640]6.2666,[641]6.2669,[642]6.2738,[643]6.2759,[644]6.2772,[645]6.2750,[646]6.2793,[647]6.2755,[648]6.2761,[649]6.2762,[650]6.2801,[651]6.2858,[652]6.2865,[653]6.2908,[654]6.2844,[655]6.2838,

llama_print_timings:        load time =  8491.14 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 4938363.87 ms / 335360 tokens (   14.73 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 4971712.41 ms

real	82m51.839s
user	125m49.041s
sys	4m12.038s

Perplexity delta: -0.0059

Q4_0 + F16 "output" + F16 "tok_embd" M1 Pro (with BLAS) [655]6.2357
$  make clean && make -j perplexity && time ./perplexity -m ./models/7B/ggml-model-q4_0-output-f16-tok-f16.bin -f ./build/wiki.test.raw -t 8
I llama.cpp build info: 
I UNAME_S:  Darwin
I UNAME_P:  arm
I UNAME_M:  arm64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread
I LDFLAGS:   -framework Accelerate
I CC:       Apple clang version 14.0.3 (clang-1403.0.22.14.1)
I CXX:      Apple clang version 14.0.3 (clang-1403.0.22.14.1)
rm -vf *.o main quantize quantize-stats perplexity embedding benchmark-q4_0-matmult
common.o
ggml.o
llama.o
perplexity
I llama.cpp build info: 
I UNAME_S:  Darwin
I UNAME_P:  arm
I UNAME_M:  arm64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread
I LDFLAGS:   -framework Accelerate
I CC:       Apple clang version 14.0.3 (clang-1403.0.22.14.1)
I CXX:      Apple clang version 14.0.3 (clang-1403.0.22.14.1)

cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -DGGML_USE_ACCELERATE   -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -c llama.cpp -o llama.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -c examples/common.cpp -o common.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread examples/perplexity/perplexity.cpp ggml.o llama.o common.o -o perplexity  -framework Accelerate
main: seed = 1681648653
llama.cpp: loading model from ./models/7B/ggml-model-q4_0-output-f16-tok-f16.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  59.11 KB
llama_model_load_internal: mem required  = 6153.07 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 
perplexity : calculating perplexity over 655 chunks, batch_size=512
15.48 seconds per pass - ETA 2.82 hours
[1]4.3765,[2]4.9302,[3]5.7950,[4]6.4520,[5]6.5518,[6]6.5306,[7]6.6998,[8]6.7882,[9]7.1546,[10]7.3953,[11]7.6318,[12]7.6658,[13]7.5747,[14]7.6587,[15]7.9104,[16]7.5129,[17]7.3892,[18]7.3527,[19]6.9798,[20]6.9674,[21]6.8726,[22]6.6872,[23]6.6463,[24]6.5580,[25]6.5584,[26]6.3886,[27]6.2075,[28]6.1053,[29]6.0200,[30]5.8647,[31]5.8386,[32]5.8566,[33]5.7941,[34]5.8282,[35]5.8526,[36]5.8961,[37]5.9029,[38]5.9178,[39]5.9545,[40]6.0124,[41]6.0197,[42]6.0541,[43]6.0122,[44]6.0675,[45]6.0701,[46]6.0438,[47]6.0666,[48]6.0354,[49]6.0414,[50]6.0023,[51]5.9984,[52]5.9875,[53]6.0309,[54]6.0133,[55]5.9911,[56]6.0240,[57]6.0455,[58]6.0671,[59]6.0803,[60]6.1247,[61]6.1146,[62]6.1776,[63]6.2109,[64]6.2261,[65]6.2713,[66]6.2799,[67]6.2971,[68]6.3109,[69]6.3362,[70]6.3690,[71]6.3895,[72]6.4198,[73]6.4849,[74]6.4895,[75]6.5025,[76]6.5167,[77]6.5289,[78]6.5140,[79]6.5421,[80]6.5340,[81]6.5476,[82]6.5512,[83]6.4990,[84]6.4838,[85]6.4722,[86]6.4504,[87]6.3854,[88]6.3571,[89]6.3370,[90]6.3217,[91]6.3466,[92]6.3422,[93]6.3448,[94]6.3418,[95]6.3706,[96]6.3687,[97]6.3623,[98]6.3560,[99]6.3420,[100]6.3425,[101]6.3685,[102]6.3614,[103]6.3825,[104]6.3886,[105]6.3881,[106]6.4049,[107]6.4026,[108]6.4156,[109]6.4094,[110]6.4046,[111]6.4266,[112]6.4455,[113]6.4468,[114]6.4436,[115]6.4513,[116]6.4439,[117]6.4491,[118]6.4783,[119]6.4997,[120]6.5353,[121]6.5514,[122]6.5767,[123]6.6158,[124]6.6331,[125]6.6241,[126]6.6633,[127]6.6996,[128]6.7275,[129]6.7114,[130]6.7209,[131]6.7152,[132]6.7066,[133]6.6939,[134]6.7044,[135]6.7006,[136]6.6885,[137]6.6804,[138]6.6637,[139]6.6526,[140]6.6492,[141]6.6204,[142]6.6161,[143]6.5886,[144]6.5692,[145]6.5603,[146]6.5471,[147]6.5544,[148]6.5564,[149]6.5499,[150]6.5458,[151]6.5472,[152]6.5370,[153]6.5199,[154]6.5106,[155]6.5175,[156]6.5124,[157]6.5304,[158]6.5340,[159]6.5381,[160]6.5404,[161]6.5526,[162]6.5227,[163]6.5111,[164]6.4853,[165]6.4534,[166]6.4254,[167]6.3884,[168]6.3559,[169]6.3427,[170]6.3306,[171]6.3019,[172]6.2840,[173]6.2657,[174]6.2357,[175]6.2140,[176]6.2038,[177]6.1827,[178]6.1598,[179]6.1428,[180]6.1340,[181]6.1119,[182]6.0930,[183]6.0791,[184]6.0789,[185]6.0717,[186]6.0734,[187]6.0791,[188]6.0753,[189]6.0939,[190]6.0949,[191]6.1156,[192]6.1319,[193]6.1494,[194]6.1610,[195]6.1818,[196]6.1981,[197]6.2201,[198]6.2356,[199]6.2390,[200]6.2437,[201]6.2393,[202]6.2599,[203]6.2666,[204]6.2665,[205]6.2775,[206]6.2852,[207]6.2814,[208]6.2895,[209]6.2944,[210]6.2998,[211]6.3097,[212]6.3169,[213]6.3276,[214]6.3313,[215]6.3351,[216]6.3505,[217]6.3684,[218]6.3818,[219]6.3821,[220]6.3785,[221]6.3725,[222]6.3694,[223]6.3587,[224]6.3516,[225]6.3472,[226]6.3684,[227]6.3769,[228]6.3829,[229]6.3891,[230]6.3850,[231]6.4020,[232]6.3888,[233]6.3716,[234]6.3558,[235]6.3401,[236]6.3328,[237]6.3219,[238]6.3252,[239]6.3091,[240]6.2985,[241]6.3015,[242]6.3051,[243]6.3035,[244]6.2918,[245]6.2893,[246]6.2772,[247]6.2648,[248]6.2582,[249]6.2559,[250]6.2602,[251]6.2529,[252]6.2495,[253]6.2396,[254]6.2354,[255]6.2239,[256]6.2048,[257]6.1935,[258]6.1852,[259]6.1833,[260]6.1753,[261]6.1711,[262]6.1654,[263]6.1610,[264]6.1418,[265]6.1411,[266]6.1397,[267]6.1329,[268]6.1421,[269]6.1399,[270]6.1406,[271]6.1484,[272]6.1524,[273]6.1519,[274]6.1535,[275]6.1624,[276]6.1677,[277]6.1837,[278]6.1943,[279]6.2029,[280]6.2061,[281]6.2156,[282]6.2212,[283]6.2356,[284]6.2435,[285]6.2525,[286]6.2670,[287]6.2665,[288]6.2726,[289]6.2635,[290]6.2482,[291]6.2327,[292]6.2175,[293]6.2039,[294]6.2063,[295]6.2057,[296]6.2100,[297]6.2086,[298]6.2113,[299]6.2083,[300]6.1971,[301]6.1970,[302]6.1892,[303]6.1815,[304]6.1736,[305]6.1713,[30
6]6.1582,[307]6.1603,[308]6.1640,[309]6.1478,[310]6.1417,[311]6.1355,[312]6.1377,[313]6.1324,[314]6.1310,[315]6.1147,[316]6.1102,[317]6.0938,[318]6.0723,[319]6.0841,[320]6.0970,[321]6.1008,[322]6.0964,[323]6.0899,[324]6.0876,[325]6.0974,[326]6.0974,[327]6.0995,[328]6.1037,[329]6.1096,[330]6.1124,[331]6.1245,[332]6.1216,[333]6.1287,[334]6.1230,[335]6.1165,[336]6.1202,[337]6.1173,[338]6.1161,[339]6.1103,[340]6.1059,[341]6.1138,[342]6.1163,[343]6.1218,[344]6.1220,[345]6.1220,[346]6.1191,[347]6.1236,[348]6.1276,[349]6.1296,[350]6.1262,[351]6.1269,[352]6.1268,[353]6.1215,[354]6.1214,[355]6.1267,[356]6.1296,[357]6.1259,[358]6.1347,[359]6.1376,[360]6.1340,[361]6.1334,[362]6.1401,[363]6.1513,[364]6.1575,[365]6.1632,[366]6.1639,[367]6.1725,[368]6.1701,[369]6.1710,[370]6.1723,[371]6.1665,[372]6.1715,[373]6.1770,[374]6.1756,[375]6.1751,[376]6.1832,[377]6.1783,[378]6.1807,[379]6.1868,[380]6.1785,[381]6.1744,[382]6.1690,[383]6.1679,[384]6.1672,[385]6.1661,[386]6.1656,[387]6.1649,[388]6.1604,[389]6.1550,[390]6.1481,[391]6.1402,[392]6.1363,[393]6.1348,[394]6.1373,[395]6.1356,[396]6.1281,[397]6.1356,[398]6.1396,[399]6.1478,[400]6.1473,[401]6.1489,[402]6.1495,[403]6.1512,[404]6.1575,[405]6.1482,[406]6.1450,[407]6.1442,[408]6.1454,[409]6.1576,[410]6.1686,[411]6.1807,[412]6.1968,[413]6.2082,[414]6.2157,[415]6.2208,[416]6.2287,[417]6.2415,[418]6.2451,[419]6.2525,[420]6.2611,[421]6.2730,[422]6.2785,[423]6.2854,[424]6.2974,[425]6.3062,[426]6.3127,[427]6.3172,[428]6.3255,[429]6.3302,[430]6.3390,[431]6.3533,[432]6.3575,[433]6.3564,[434]6.3518,[435]6.3524,[436]6.3550,[437]6.3643,[438]6.3721,[439]6.3688,[440]6.3678,[441]6.3629,[442]6.3618,[443]6.3631,[444]6.3634,[445]6.3615,[446]6.3636,[447]6.3664,[448]6.3708,[449]6.3681,[450]6.3685,[451]6.3642,[452]6.3527,[453]6.3442,[454]6.3381,[455]6.3390,[456]6.3436,[457]6.3454,[458]6.3432,[459]6.3439,[460]6.3525,[461]6.3498,[462]6.3482,[463]6.3532,[464]6.3522,[465]6.3493,[466]6.3414,[467]6.3418,[468]6.3417,[469]6.3439,[470]6.3444,[471]6.3394,[472]6.3440,[473]6.3383,[474]6.3398,[475]6.3339,[476]6.3358,[477]6.3287,[478]6.3279,[479]6.3343,[480]6.3393,[481]6.3412,[482]6.3368,[483]6.3326,[484]6.3347,[485]6.3333,[486]6.3277,[487]6.3278,[488]6.3259,[489]6.3210,[490]6.3183,[491]6.3150,[492]6.3091,[493]6.3062,[494]6.3046,[495]6.3042,[496]6.3009,[497]6.2954,[498]6.2935,[499]6.2886,[500]6.2792,[501]6.2724,[502]6.2724,[503]6.2718,[504]6.2627,[505]6.2654,[506]6.2664,[507]6.2605,[508]6.2565,[509]6.2556,[510]6.2596,[511]6.2640,[512]6.2675,[513]6.2692,[514]6.2757,[515]6.2703,[516]6.2695,[517]6.2707,[518]6.2707,[519]6.2737,[520]6.2765,[521]6.2781,[522]6.2811,[523]6.2820,[524]6.2880,[525]6.2915,[526]6.2926,[527]6.2947,[528]6.2896,[529]6.2901,[530]6.2853,[531]6.2843,[532]6.2891,[533]6.2914,[534]6.2897,[535]6.2921,[536]6.2866,[537]6.2843,[538]6.2892,[539]6.2905,[540]6.2944,[541]6.2952,[542]6.2960,[543]6.2974,[544]6.2985,[545]6.2965,[546]6.2971,[547]6.2926,[548]6.2873,[549]6.2873,[550]6.2846,[551]6.2809,[552]6.2788,[553]6.2747,[554]6.2723,[555]6.2694,[556]6.2692,[557]6.2716,[558]6.2676,[559]6.2670,[560]6.2665,[561]6.2665,[562]6.2643,[563]6.2642,[564]6.2685,[565]6.2701,[566]6.2698,[567]6.2677,[568]6.2682,[569]6.2666,[570]6.2691,[571]6.2696,[572]6.2705,[573]6.2706,[574]6.2669,[575]6.2666,[576]6.2664,[577]6.2653,[578]6.2632,[579]6.2640,[580]6.2574,[581]6.2536,[582]6.2526,[583]6.2532,[584]6.2536,[585]6.2459,[586]6.2392,[587]6.2395,[588]6.2444,[589]6.2501,[590]6.2531,[591]6.2551,[592]6.2537,[593]6.2502,[594]6.2512,[595]6.2489,[596]6.2526,[597]6.2504,[598]6.2467,[599]6.2487,[600]6.2483,[601]6.2470,[602]6
.2487,[603]6.2517,[604]6.2526,[605]6.2558,[606]6.2577,[607]6.2562,[608]6.2528,[609]6.2533,[610]6.2569,[611]6.2550,[612]6.2575,[613]6.2538,[614]6.2488,[615]6.2412,[616]6.2441,[617]6.2380,[618]6.2330,[619]6.2275,[620]6.2134,[621]6.2062,[622]6.2044,[623]6.2060,[624]6.2063,[625]6.2062,[626]6.2048,[627]6.2070,[628]6.2075,[629]6.2071,[630]6.2104,[631]6.2166,[632]6.2220,[633]6.2204,[634]6.2239,[635]6.2245,[636]6.2214,[637]6.2181,[638]6.2207,[639]6.2178,[640]6.2188,[641]6.2190,[642]6.2258,[643]6.2280,[644]6.2291,[645]6.2272,[646]6.2315,[647]6.2276,[648]6.2283,[649]6.2283,[650]6.2321,[651]6.2379,[652]6.2386,[653]6.2427,[654]6.2362,[655]6.2357,

llama_print_timings:        load time = 16073.04 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings: prompt eval time = 4977968.90 ms / 335360 tokens (   14.84 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 5012024.06 ms

real	83m32.235s
user	126m43.253s
sys	4m15.665s

Perplexity delta: -0.0540

M1 Pro results

| tok_embd | output | Perplexity (with BLAS) | Delta   | Size |
|----------|--------|------------------------|---------|------|
| Q4_0     | Q4_0   | 6.2897                 | 0.0000  | 3.9G |
| Q4_0     | F16    | 6.2355                 | -0.0542 | 4.1G |
| F16      | Q4_0   | 6.2838                 | -0.0059 | 4.1G |
| F16      | F16    | 6.2357                 | -0.0540 | 4.3G |
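
The size and memory numbers are consistent with back-of-the-envelope arithmetic: output.weight holds n_embd × n_vocab = 4096 × 32000 = 131,072,000 weights, so F16 (2 bytes per weight) needs 250 MB, while Q4_0 at the time (one F32 scale plus 32 packed 4-bit values per 32-weight block, 20 bytes per block) needs about 78 MB. The ~172 MB difference matches the "mem required" jump from 5809.32 MB to 5981.20 MB between the first two runs. A quick check:

```cpp
#include <cstdio>

int main() {
    const long long n = 4096LL * 32000LL;              // output.weight elements
    const double f16_mb = n * 2.0 / (1024 * 1024);     // 2 bytes per weight
    // Q4_0 block layout (at the time): 32 weights -> 4-byte F32 scale + 16 bytes
    // of packed 4-bit values = 20 bytes per block.
    const double q4_mb = n / 32.0 * 20.0 / (1024 * 1024);
    printf("F16 %.2f MB, Q4_0 %.2f MB, delta %.2f MB\n",
           f16_mb, q4_mb, f16_mb - q4_mb);             // 250.00, 78.12, 171.88
    return 0;
}
```
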
ggerganov added the help wanted, good first issue, and generation quality labels on Apr 15, 2023
ggerganov added the high priority label on Apr 15, 2023
MarkSchmidty commented

Leaving 1 or 2 layers fully uncompressed has very good results in the literature, from what I recall. This is a good idea to test.

ggerganov commented

Looks like the quantization of the output tensor has a more significant impact on quality than quantization of the tok_embeddings tensor (which is the same size, but sits at the start of the transformer). It might be worth keeping output in F16 format.
