
Build failure on Jetson Orin #2004

Closed
malv-c opened this issue Jun 26, 2023 · 8 comments

malv-c commented Jun 26, 2023

Both llama.cpp, built with: % cmake .. -DLLAMA_CUBLAS=ON -DLLAMA_CUDA_DMMV_F16=ON -DLLAMA_CUDA_DMMV_Y=16
and koboldcpp, built with: % cmake .. -DLLAMA_CUBLAS=1
fail with: ggml.h(218): error: identifier "__fp16" is undefined
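
For context, the identifier being rejected comes from the 16-bit float typedef in ggml.h, which at the time looked roughly like the following (a sketch; the exact line number and comments may differ between checkouts):

// ggml.h, around line 218 (sketch, not the exact upstream source)
#ifdef __ARM_NEON
// on ARM with NEON, ggml uses the compiler's native 16-bit float type
typedef __fp16 ggml_fp16_t;
#else
typedef uint16_t ggml_fp16_t;
#endif

__fp16 is an Arm C language extension provided by gcc/clang; the error message suggests the nvcc front end used for the CUDA build does not accept it here.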

@manbehindthemadness

I came here just for this:
exact same problem on an AGX Orin (JetPack 5.1.1, L4T 35.3.1).

/usr/src/llama.cpp/ggml.h(218): error: identifier "__fp16" is undefined

@manbehindthemadness

Ahhhh, Cortex ARMv8+ processors no longer support NEON; the library must be fully 64-bit. They can support 32-bit, but only when running within a 32-bit operating system / kernel.

@manbehindthemadness

@malv-c If you replace __fp16 with uint16_t on line 218 of ggml.h, the project builds and cuBLAS works without issue.
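
As a sketch, that workaround amounts to editing the typedef shown earlier so that nvcc never sees the ARM-specific __fp16 keyword (at the cost of also disabling the native fp16 type for the regular compiler):

// ggml.h, workaround sketch
// was: typedef __fp16 ggml_fp16_t;
typedef uint16_t ggml_fp16_t;  // store fp16 values as raw 16-bit integers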

@manbehindthemadness

Even though this builds successfully, it does seem to be attempting to use NEON; I am unsure whether this will have a performance impact...

llama.cpp: loading model from /opt/gpt-models/vicuna-7b-1.1.ggmlv3.q8_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 7 (mostly Q8_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: using CUDA for GPU acceleration
llama_model_load_internal: mem required  = 1924.88 MB (+ 1026.00 MB per state)
llama_model_load_internal: allocating batch_size x 1 MB = 512 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 32 repeating layers to GPU
llama_model_load_internal: offloading non-repeating layers to GPU
llama_model_load_internal: offloading v cache to GPU
llama_model_load_internal: offloading k cache to GPU
llama_model_load_internal: offloaded 35/35 layers to GPU
llama_model_load_internal: total VRAM used: 8234 MB
llama_new_context_with_model: kv self size  =  256.00 MB
AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 
llama_print_timings:        load time =   593.09 ms

swittk (Contributor) commented Jun 26, 2023

Does this thread help? #1455

@manbehindthemadness

Oh! This here looks like it might be the silver bullet: #1455 (comment)
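
(For anyone landing here later: one illustrative shape of that kind of fix, an assumption and not necessarily the exact patch linked above, is to keep __fp16 for ordinary ARM compilers and fall back to uint16_t only when nvcc, which defines __CUDACC__, is compiling the header:)

// illustrative guard (assumption, not necessarily the linked patch)
#if defined(__ARM_NEON) && !defined(__CUDACC__)
typedef __fp16 ggml_fp16_t;   // native 16-bit float for gcc/clang on ARM
#else
typedef uint16_t ggml_fp16_t; // raw 16-bit storage when nvcc compiles ggml.h
#endif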

malv-c (Author) commented Jun 27, 2023 via email

This issue was closed because it has been inactive for 14 days since being marked as stale.
