[SYCL] Segmentation fault after #5411 #5469

Closed
qnixsynapse opened this issue Feb 13, 2024 · 30 comments · Fixed by #5624
Comments

@qnixsynapse
Contributor

qnixsynapse commented Feb 13, 2024

System: Arch Linux,
CPU: Intel i3 12th gen
GPU: Intel Arc A750
RAM: 16GB

llama.cpp version: b2134

Previously the build was failing with -DLLAMA_SYCL_F16=ON, which was fixed in #5411. Upon running this build, it crashes with a segmentation fault.

logs:

bin/main -m ~/Public/Models/Weights/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf  -p "hello " -n 1000 -ngl 99
Log start
main: build = 2134 (099afc62)
main: built with Intel(R) oneAPI DPC++/C++ Compiler 2024.0.0 (2024.0.0.20231017) for x86_64-unknown-linux-gnu
main: seed  = 1707789832
GGML_SYCL_DEBUG=0
ggml_init_sycl: GGML_SYCL_F16:   yes
ggml_init_sycl: SYCL_USE_XMX: yes
found 4 SYCL devices:
  Device 0: Intel(R) Arc(TM) A750 Graphics,	compute capability 1.3,
	max compute_units 448,	max work group size 1024,	max sub group size 32,	global mem size 8096681984
  Device 1: Intel(R) FPGA Emulation Device,	compute capability 1.2,
	max compute_units 4,	max work group size 67108864,	max sub group size 64,	global mem size 16577347584
  Device 2: 12th Gen Intel(R) Core(TM) i3-12100F,	compute capability 3.0,
	max compute_units 4,	max work group size 8192,	max sub group size 64,	global mem size 16577347584
  Device 3: Intel(R) Arc(TM) A750 Graphics,	compute capability 3.0,
	max compute_units 448,	max work group size 1024,	max sub group size 32,	global mem size 8096681984
Using device 0 (Intel(R) Arc(TM) A750 Graphics) as main device
llama_model_loader: loaded meta data with 23 key-value pairs and 201 tensors from /home/tensorblast/Public/Models/Weights/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = tinyllama_tinyllama-1.1b-chat-v1.0
llama_model_loader: - kv   2:                       llama.context_length u32              = 2048
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv   4:                          llama.block_count u32              = 22
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 5632
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 64
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 4
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  11:                          general.file_type u32              = 15
llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  16:                      tokenizer.ggml.merges arr[str,61249]   = ["▁ t", "e r", "i n", "▁ a", "e n...
llama_model_loader: - kv  17:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  18:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  20:            tokenizer.ggml.padding_token_id u32              = 2
llama_model_loader: - kv  21:                    tokenizer.chat_template str              = {% for message in messages %}\n{% if m...
llama_model_loader: - kv  22:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   45 tensors
llama_model_loader: - type q4_K:  135 tensors
llama_model_loader: - type q6_K:   21 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 2048
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 4
llm_load_print_meta: n_layer          = 22
llm_load_print_meta: n_rot            = 64
llm_load_print_meta: n_embd_head_k    = 64
llm_load_print_meta: n_embd_head_v    = 64
llm_load_print_meta: n_gqa            = 8
llm_load_print_meta: n_embd_k_gqa     = 256
llm_load_print_meta: n_embd_v_gqa     = 256
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 5632
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 2048
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 1B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 1.10 B
llm_load_print_meta: model size       = 636.18 MiB (4.85 BPW) 
llm_load_print_meta: general.name     = tinyllama_tinyllama-1.1b-chat-v1.0
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: PAD token        = 2 '</s>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.15 MiB
llm_load_tensors: offloading 22 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 23/23 layers to GPU
llm_load_tensors:            buffer size =   601.02 MiB
llm_load_tensors:        CPU buffer size =    35.16 MiB
.....................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:            KV buffer size =    11.00 MiB
llama_new_context_with_model: KV self size  =   11.00 MiB, K (f16):    5.50 MiB, V (f16):    5.50 MiB
llama_new_context_with_model:        CPU input buffer size   =     5.01 MiB
zsh: segmentation fault (core dumped)  bin/main -m  -p "hello " -n

The build without -DLLAMA_SYCL_F16=ON works.

Confirmed: This crash started happening after #5411

@qnixsynapse changed the title from "[SYCL] Segmentation fault with GGML_SYCL_F16" to "[SYCL] Segmentation fault after #5411" on Feb 13, 2024
@chsasank

Can confirm that I too got a segfault when built with -DLLAMA_SYCL_F16=ON. I will rebuild with it OFF and report if that fails too.

@chsasank

Segfaults even without that option.

@abhilash1910
Collaborator

@akarshanbiswas @chsasank could you please re-try with this branch: https://github.com/abhilash1910/llama.cpp/tree/fix_sycl_arc (branch: fix_sycl_arc) and let me know if it addresses the issue?

@abhilash1910 self-assigned this on Feb 14, 2024
@qnixsynapse
Contributor Author

qnixsynapse commented Feb 14, 2024

@abhilash1910 Nope. Still crashing with a segmentation fault, with or without -DLLAMA_SYCL_F16=ON.

Here is what I can get:

(gdb) bt
#0  0x000000000060cd14 in ggml_backend_sycl_buffer_type_name(ggml_backend_buffer_type*)
    ()
#1  0x000000000046e830 in llama_new_context_with_model ()
#2  0x000000000042f660 in llama_init_from_gpt_params(gpt_params&) ()
#3  0x000000000041c211 in main ()

mudler added a commit to mudler/LocalAI that referenced this issue Feb 14, 2024
sycl support is broken otherwise.

See upstream issue: ggerganov/llama.cpp#5469

Signed-off-by: Ettore Di Giacinto <[email protected]>
@mudler
Contributor

mudler commented Feb 14, 2024

I can confirm here, JFYI pinning to commit f026f81 seems to work for me (tested with Intel Arc a770)

@abhilash1910
Collaborator

I can confirm here, JFYI pinning to commit f026f81 seems to work for me (tested with Intel Arc a770)

Thanks @mudler, could you please check if this commit works? 4a46d2b
That would help speed up the resolution.

@mudler
Contributor

mudler commented Feb 14, 2024

@abhilash1910 that commit fails here with a core-dump

@qnixsynapse
Contributor Author

qnixsynapse commented Feb 14, 2024

Got better backtrace this time:

(gdb) bt
#0  ggml_backend_sycl_buffer_type_name (
    buft=0xcd5f20 <ggml_backend_sycl_buffer_type::ggml_backend_sycl_buffer_types>)
    at /home/tensorblast/Public/Models/llama.cpp/ggml-sycl.cpp:14765
#1  0x0000000000519869 in ggml_backend_buft_name (
    buft=0xcd5f20 <ggml_backend_sycl_buffer_type::ggml_backend_sycl_buffer_types>)
    at /home/tensorblast/Public/Models/llama.cpp/ggml-backend.c:19
#2  0x0000000000517613 in ggml_gallocr_reserve_n (galloc=0x36ad020, 
    graph=0x7fffa420efe0, node_buffer_ids=0x39a7fe0)
    at /home/tensorblast/Public/Models/llama.cpp/ggml-alloc.c:707
#3  0x000000000051bcf3 in ggml_backend_sched_reserve (sched=0x7fffa4200010, 
    measure_graph=0x7fffc0200030)
    at /home/tensorblast/Public/Models/llama.cpp/ggml-backend.c:1563
#4  0x0000000000470293 in llama_new_context_with_model (model=0x3669dd0, params=...)
    at /home/tensorblast/Public/Models/llama.cpp/llama.cpp:11461
#5  0x000000000042f9a0 in llama_init_from_gpt_params (params=...)
    at /home/tensorblast/Public/Models/llama.cpp/common/common.cpp:1300
#6  0x000000000041bce9 in main (argc=<optimized out>, argv=0x7fffffffcfb8)
    at /home/tensorblast/Public/Models/llama.cpp/examples/main/main.cpp:198
    

@abhilash1910
Collaborator

Got better backtrace this time:

(gdb) bt
#0  ggml_backend_sycl_buffer_type_name (
    buft=0xcd5f20 <ggml_backend_sycl_buffer_type::ggml_backend_sycl_buffer_types>)
    at /home/tensorblast/Public/Models/llama.cpp/ggml-sycl.cpp:14765
#1  0x0000000000519869 in ggml_backend_buft_name (
    buft=0xcd5f20 <ggml_backend_sycl_buffer_type::ggml_backend_sycl_buffer_types>)
    at /home/tensorblast/Public/Models/llama.cpp/ggml-backend.c:19
#2  0x0000000000517613 in ggml_gallocr_reserve_n (galloc=0x36ad020, 
    graph=0x7fffa420efe0, node_buffer_ids=0x39a7fe0)
    at /home/tensorblast/Public/Models/llama.cpp/ggml-alloc.c:707
#3  0x000000000051bcf3 in ggml_backend_sched_reserve (sched=0x7fffa4200010, 
    measure_graph=0x7fffc0200030)
    at /home/tensorblast/Public/Models/llama.cpp/ggml-backend.c:1563
#4  0x0000000000470293 in llama_new_context_with_model (model=0x3669dd0, params=...)
    at /home/tensorblast/Public/Models/llama.cpp/llama.cpp:11461
#5  0x000000000042f9a0 in llama_init_from_gpt_params (params=...)
    at /home/tensorblast/Public/Models/llama.cpp/common/common.cpp:1300
#6  0x000000000041bce9 in main (argc=<optimized out>, argv=0x7fffffffcfb8)
    at /home/tensorblast/Public/Models/llama.cpp/examples/main/main.cpp:198
    

Thanks for the backtrace. As @mudler confirmed, f026f81 builds correctly and already includes #5411. For the time being I would recommend rolling back to that commit until a fix is applied.

@qnixsynapse
Contributor Author

@abhilash1910 Yes, it builds correctly but ends up in a segfault with or without -DLLAMA_SYCL_F16=ON. I am using a build from the commit before that, which works well as long as SYCL_F16 is not enabled. Before #5411, building with -DLLAMA_SYCL_F16=ON failed with a compilation error.

If you need any help with testing, please do ping me.

@mudler
Contributor

mudler commented Feb 14, 2024

@abhilash1910 Yes, it builds correctly but ends up in a segfault with or without -DLLAMA_SYCL_F16=ON. I am using a build from the commit before that, which works well as long as SYCL_F16 is not enabled. Before #5411, building with -DLLAMA_SYCL_F16=ON failed with a compilation error.

If you need any help with testing, please do ping me.

That is quite weird. It actually works here, it doesn't just build. If you want to try to reproduce, this is the LocalAI container image with llama.cpp pinned at f026f81: quay.io/go-skynet/local-ai@sha256:c6b5dfaff64c24a02f1be8f8e1cb5c0837b130b438753e49b349d70e3d6d1916, and it runs inference correctly. Note that I'm testing with an Intel Arc A770, so it might be related to that; however, llama.cpp's current master also fails with segfaults on my Arc A770.

You can run phi-2 configured for SYCL (f32) with:

docker run -e DEBUG=true -ti -v $PWD/models:/build/models -p 8080:8080  -v /dev/dri:/dev/dri quay.io/go-skynet/local-ai@sha256:c6b5dfaff64c24a02f1be8f8e1cb5c0837b130b438753e49b349d70e3d6d1916 https://gist.githubusercontent.com/mudler/103de2576a8fd4b583f9bd53f4e4cefd/raw/9181d4add553326806b8fdbf4ff0cd65d2145bff/phi-2-sycl.yaml

to test it:

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
          "model": "phi-2",
          "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}]
      }'

To double-check the version, you can run this in the container:

cat /build/Makefile  | grep CPPLLAMA_VERSION 
CPPLLAMA_VERSION?=f026f8120f97090d34a52b3dc023c82e0ede3f7d

I am actually running this in Kubernetes; any images from master are pinned to that commit. I'm also leaving my deployment here for reference:

apiVersion: v1
kind: Namespace
metadata:
  name: local-ai
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: models-pvc
  namespace: local-ai
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: local-ai
  namespace: local-ai
  labels:
    app: local-ai
spec:
  selector:
    matchLabels:
      app: local-ai
  replicas: 1
  template:
    metadata:
      labels:
        app: local-ai
      name: local-ai
    spec:
      containers:
        - env:
          - name: DEBUG
            value: "true"
          name: local-ai
          args:
          # phi-2 configuration
          - https://gist.githubusercontent.com/mudler/103de2576a8fd4b583f9bd53f4e4cefd/raw/9181d4add553326806b8fdbf4ff0cd65d2145bff/phi-2-sycl.yaml
          image: quay.io/go-skynet/local-ai:master-sycl-f32-core
          imagePullPolicy: Always
          resources:
            limits:
              gpu.intel.com/i915: 1
          volumeMounts:
            - name: models-volume
              mountPath: /build/models
      volumes:
        - name: models-volume
          persistentVolumeClaim:
            claimName: models-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: local-ai
  namespace: local-ai
spec:
  selector:
    app: local-ai
  type: NodePort
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080

@abhilash1910
Collaborator

@akarshanbiswas @chsasank could you please re-try with this branch: https://github.com/abhilash1910/llama.cpp/tree/fix_sycl_arc (branch: fix_sycl_arc) and let me know if it addresses the issue?

@mudler could you please try with this branch and let me know if it fixes the segfault issue? If not, then changes in the ggml backend may have caused this.

@qnixsynapse
Contributor Author

qnixsynapse commented Feb 15, 2024

I'm testing with an Intel Arc A770, so it might be related to that,

@mudler Yes, that is possible. I tried running with -ngl 0 and it worked. What I deduce from the backtrace is that the buft object or its associated context (ctx) is not properly initialized, or contains invalid data, which leads to the segfault. Honestly, my knowledge in these areas is rusty; @abhilash1910 dada may know better. :)
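
For illustration, here is a minimal, self-contained C++ sketch of the failure pattern described above. The struct and function names are simplified stand-ins, not the actual ggml types: a buffer-type object whose context pointer is never initialized makes the "name" callback dereference a null pointer, which is consistent with the crash landing inside ggml_backend_sycl_buffer_type_name.

#include <cstdio>
#include <string>

// Simplified stand-ins for the real ggml structs (hypothetical names).
struct sycl_buffer_context {
    std::string name;          // the name the callback is supposed to return
};

struct buffer_type {
    void *context = nullptr;   // stays null if device/buffer-type init is skipped
};

// Mirrors the shape of a buffer-type "name" callback.
static const char *buffer_type_name(const buffer_type *buft) {
    auto *ctx = static_cast<const sycl_buffer_context *>(buft->context);
    return ctx->name.c_str(); // ctx is null here -> segfault inside basic_string
}

int main() {
    buffer_type buft;          // context deliberately left uninitialized
    std::printf("%s\n", buffer_type_name(&buft)); // crashes, like the backtraces above
}

The obvious guard is checking buft->context before dereferencing it, but the real question is why the context was never initialized in the first place; this sketch does not claim to be the actual fix.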

@channeladam

I too am getting the segfault. I don't know how to help though... I can test things if there is something to try.

@abhilash1910
Collaborator

@akarshanbiswas @mudler @channeladam Could you please try building from the latest master and see if that works? Thanks

@qnixsynapse
Contributor Author

@abhilash1910 Still fails at the same location:

[screenshot]

@abhilash1910
Collaborator

Thanks @akarshanbiswas, could you try building again with https://github.com/abhilash1910/llama.cpp/tree/fix_sycl_arc

@qnixsynapse
Contributor Author

@abhilash1910
[screenshot]

@channeladam

channeladam commented Feb 20, 2024

@abhilash1910

./server -m /mnt/data/llama/models/Magicoder-S-DS-6.7B_q8_0.gguf -ngl 41 -c 4096

Core dump backtrace (of your fork's fix_sycl_arc branch):

(gdb) backtrace
#0  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_data (this=0x8)
    at /usr/lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/basic_string.h:223
#1  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::c_str (this=0x8)
    at /usr/lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/basic_string.h:2584
#2  ggml_backend_sycl_buffer_type_name (buft=0xdd02e0 <ggml_backend_sycl_buffer_type::ggml_backend_sycl_buffer_types>)
    at /mnt/data/llama/forks/llama.cpp/ggml-sycl.cpp:14765
#3  0x00000000005d2a19 in ggml_backend_buft_name (buft=0xdd02e0 <ggml_backend_sycl_buffer_type::ggml_backend_sycl_buffer_types>)
    at /mnt/data/llama/forks/llama.cpp/ggml-backend.c:19
#4  0x00000000005d07c3 in ggml_gallocr_reserve_n (galloc=0x46160c0, graph=0x7488ac0, node_buffer_ids=0x4634720)
    at /mnt/data/llama/forks/llama.cpp/ggml-alloc.c:707
#5  0x00000000005d4ea3 in ggml_backend_sched_reserve (sched=0x7479af0, measure_graph=0x7149a70) at /mnt/data/llama/forks/llama.cpp/ggml-backend.c:1564
#6  0x00000000005288f3 in llama_new_context_with_model (model=0x464cb50, params=...) at /mnt/data/llama/forks/llama.cpp/llama.cpp:11540
#7  0x00000000005026b0 in llama_init_from_gpt_params (params=...) at /mnt/data/llama/forks/llama.cpp/common/common.cpp:1328
#8  0x00000000004232d4 in llama_server_context::load_model (this=0x7fff1c3ae270, params_=...)
    at /mnt/data/llama/forks/llama.cpp/examples/server/server.cpp:377
#9  main (argc=<optimized out>, argv=<optimized out>) at /mnt/data/llama/forks/llama.cpp/examples/server/server.cpp:2735

@abhilash1910
Collaborator

Thanks @channeladam @akarshanbiswas. Could you re-try the same branch, if possible? Thanks

@qnixsynapse
Contributor Author

@abhilash1910 This time the build failed:

[ 42%] Building CXX object CMakeFiles/ggml.dir/ggml-sycl.cpp.o
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:1123:55: warning: cast from 'const void *' to 'unsigned char *' drops const qualifier [-Wcast-qual]
 1123 |                 auto it = m_map.upper_bound((byte_t *)ptr);
      |                                                       ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:3659:31: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'const int' [-Wsign-compare]
 3659 |     if (item_ct1.get_group(0) < ne02) { // src0
      |         ~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:3701:31: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'const int' [-Wsign-compare]
 3701 |         item_ct1.get_group(0) < ne02) {
      |         ~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:3700:46: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'const int' [-Wsign-compare]
 3700 |     if (nidx < ne00 && item_ct1.get_group(1) < ne01 &&
      |                        ~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8330:23: error: use of undeclared identifier 'nb1'
 8330 |     const size_t s1 = nb1 / ggml_element_size(dst);
      |                       ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8331:23: error: use of undeclared identifier 'nb2'
 8331 |     const size_t s2 = nb2 / ggml_element_size(dst);
      |                       ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8332:23: error: use of undeclared identifier 'nb3'
 8332 |     const size_t s3 = nb3 / ggml_element_size(dst);
      |                       ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8383:23: error: use of undeclared identifier 'nb1'
 8383 |     const size_t s1 = nb1 / ggml_element_size(dst);
      |                       ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8384:23: error: use of undeclared identifier 'nb2'
 8384 |     const size_t s2 = nb2 / ggml_element_size(dst);
      |                       ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8385:23: error: use of undeclared identifier 'nb3'
 8385 |     const size_t s3 = nb3 / ggml_element_size(dst);
      |                       ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8435:24: error: use of undeclared identifier 'ne0'; did you mean 'new'?
 8435 |         int nr0 = ne10/ne0;
      |                        ^~~
      |                        new
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8435:27: error: expected a type
 8435 |         int nr0 = ne10/ne0;
      |                           ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8436:24: error: use of undeclared identifier 'ne1'; did you mean 'new'?
 8436 |         int nr1 = ne11/ne1;
      |                        ^~~
      |                        new
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8436:27: error: expected a type
 8436 |         int nr1 = ne11/ne1;
      |                           ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8437:24: error: use of undeclared identifier 'ne2'
 8437 |         int nr2 = ne12/ne2;
      |                        ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8438:19: error: use of undeclared identifier 'ne13'
 8438 |         int nr3 = ne13/ne3;
      |                   ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8438:24: error: use of undeclared identifier 'ne3'
 8438 |         int nr3 = ne13/ne3;
      |                        ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8443:27: error: use of undeclared identifier 'ne0'; did you mean 'new'?
 8443 |         int64_t cne0[] = {ne0, ne1, ne2, ne3};
      |                           ^~~
      |                           new
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8443:30: error: expected a type
 8443 |         int64_t cne0[] = {ne0, ne1, ne2, ne3};
      |                              ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8444:45: error: use of undeclared identifier 'ne13'
 8444 |         int64_t cne1[] = {ne10, ne11, ne12, ne13};
      |                                             ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8445:26: error: use of undeclared identifier 'nb0'
 8445 |         size_t cnb0[] = {nb0, nb1, nb2, nb3};
      |                          ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8445:31: error: use of undeclared identifier 'nb1'
 8445 |         size_t cnb0[] = {nb0, nb1, nb2, nb3};
      |                               ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8445:36: error: use of undeclared identifier 'nb2'
 8445 |         size_t cnb0[] = {nb0, nb1, nb2, nb3};
      |                                    ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
4 warnings and 20 errors generated.
make[3]: *** [CMakeFiles/ggml.dir/build.make:132: CMakeFiles/ggml.dir/ggml-sycl.cpp.o] Error 1
make[2]: *** [CMakeFiles/Makefile2:758: CMakeFiles/ggml.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:2491: examples/main/CMakeFiles/main.dir/rule] Error 2
make: *** [Makefile:998: main] Error 2
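
For reference, errors of the shape "use of undeclared identifier 'nb1'/'ne0'" in a kernel launcher usually mean the block that unpacks per-tensor locals was dropped from the function (in ggml these bare names typically come from GGML_TENSOR_*_LOCALS-style macros); this is an assumption about the branch, not a confirmed diagnosis. A tiny self-contained C++ sketch of the pattern, with hypothetical names:

#include <cstdio>

// Hypothetical stand-in for a ggml-style tensor with per-dimension byte strides.
struct tensor {
    unsigned long nb[4];            // byte stride of each dimension
};

// Bare names like nb1 only exist if a locals-unpacking block runs first.
#define TENSOR_LOCALS(t) const unsigned long nb1 = (t)->nb[1];

static unsigned long row_stride(const tensor *dst, unsigned long elem_size) {
    TENSOR_LOCALS(dst)              // remove this line and the next one fails with
    return nb1 / elem_size;         // "error: use of undeclared identifier 'nb1'"
}

int main() {
    tensor t{{4, 256, 65536, 0}};
    std::printf("%lu\n", row_stride(&t, 4)); // prints 64
}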

@abhilash1910
Collaborator

Yes @akarshanbiswas, the recent build error was an obvious mistake on my part; could you re-try with the latest commit on the branch? Thanks. It might throw some other exception, but the results shown above rule out the possibility of the crash arising in the SYCL code; it might be coming from another forced typecast inside the core headers.

@qnixsynapse
Contributor Author

@abhilash1910 It failed again with a different error this time.

/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:14538:12: error: no viable conversion from returned value of type 'std::string' (aka 'basic_string<char>') to function return type 'const char *'
 14538 |     return ctx->name;
       |            ^~~~~~~~~
/usr/lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/basic_string.h:932:7: note: candidate function
  932 |       operator __sv_type() const noexcept
      |       ^
103 warnings and 1 error generated.
make[3]: *** [CMakeFiles/ggml.dir/build.make:132: CMakeFiles/ggml.dir/ggml-sycl.cpp.o] Error 1
make[2]: *** [CMakeFiles/Makefile2:758: CMakeFiles/ggml.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:2491: examples/main/CMakeFiles/main.dir/rule] Error 2
make: *** [Makefile:998: main] Error 2
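
The error above is the classic mismatch between a std::string member and a const char * return type; the usual shape of the fix is to return the underlying C string, assuming the owning context outlives the caller's use of the pointer. A minimal sketch with hypothetical names, not the actual patch:

#include <cstdio>
#include <string>

// Hypothetical context holding the buffer-type name as a std::string.
struct buffer_context {
    std::string name;
};

static const char *buffer_type_name(const buffer_context *ctx) {
    // `return ctx->name;` does not compile (no conversion to const char *);
    // c_str() does, and the pointer stays valid while ctx->name is alive and unmodified.
    return ctx->name.c_str();
}

int main() {
    buffer_context ctx{"SYCL0"};
    std::printf("%s\n", buffer_type_name(&ctx));
}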

@abhilash1910
Collaborator

@akarshanbiswas @channeladam please try #5624 and let us know if the issue persists.

@channeladam

Works for me

@channeladam

channeladam commented Feb 21, 2024

Actually, with #5624 I am now getting another crash upon usage (the previous crash was on startup).

Available slots:
 -> Slot 0 - max context: 512
{"timestamp":1708526744,"level":"INFO","function":"main","line":2713,"message":"model loaded"}
all slots are idle and system prompt is empty, clear the KV cache
slot 0 is processing [task id: 0]
slot 0 : kv cache rm - [0, end)
ggml_gallocr_needs_realloc: node inp_embd is not valid
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched: failed to allocate graph, reserving
/usr/lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/random.tcc:2665: void std::discrete_distribution<>::param_type::_M_initialize() [_IntType = int]: Assertion '__sum > 0' failed.
zsh: IOT instruction (core dumped)  /mnt/data/llama/llama.cpp/build/bin/server -m  -ngl 33

Backtrace:

(gdb) backtrace
#0  0x00007f44b2cac83c in ?? () from /usr/lib/libc.so.6
#1  0x00007f44b2c5c668 in raise () from /usr/lib/libc.so.6
#2  0x00007f44b2c44542 in abort () from /usr/lib/libc.so.6
#3  0x00007f44b32dd3b2 in std::__glibcxx_assert_fail (file=<optimized out>, line=line@entry=2665, function=<optimized out>, condition=<optimized out>)
    at /usr/src/debug/gcc/gcc/libstdc++-v3/src/c++11/debug.cc:61
#4  0x000000000051cef4 in std::discrete_distribution<int>::param_type::_M_initialize (this=0x7ffe716969b0)
    at /usr/lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/random.tcc:2665
#5  std::discrete_distribution<int>::param_type::param_type<__gnu_cxx::__normal_iterator<float*, std::vector<float, std::allocator<float> > > > (
    this=<optimized out>, __wbegin=..., __wend=...) at /usr/lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/random.h:5467
#6  std::discrete_distribution<int>::discrete_distribution<__gnu_cxx::__normal_iterator<float*, std::vector<float, std::allocator<float> > > > (
    this=<optimized out>, __wbegin=..., __wend=...) at /usr/lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/random.h:5510
#7  llama_sample_token (ctx=0x3bff380, candidates=0x7ffe71696ab8) at /mnt/data/llama/llama.cpp/llama.cpp:9909
#8  0x000000000050d5f2 in llama_sampling_sample_impl (ctx_sampling=<optimized out>, ctx_main=<optimized out>, ctx_cfg=ctx_cfg@entry=0x0, idx=0, 
    is_resampling=false) at /mnt/data/llama/llama.cpp/common/sampling.cpp:256
#9  0x000000000050cc25 in llama_sampling_sample (ctx_sampling=0x11c115, ctx_main=0x11c115, ctx_cfg=0x6, ctx_cfg@entry=0x0, idx=-1295333316)
    at /mnt/data/llama/llama.cpp/common/sampling.cpp:304
#10 0x0000000000494983 in llama_server_context::update_slots (this=0x7ffe71697ba0) at /mnt/data/llama/llama.cpp/examples/server/server.cpp:1792
#11 0x000000000042751e in std::function<void ()>::operator()() const (this=<optimized out>)
    at /usr/lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/std_function.h:591
#12 llama_server_queue::start_loop (this=<optimized out>) at /mnt/data/llama/llama.cpp/examples/server/utils.hpp:327
#13 main (argc=<optimized out>, argv=<optimized out>) at /mnt/data/llama/llama.cpp/examples/server/server.cpp:3197
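
For context, the assertion in frames #3 and #4 comes from libstdc++'s std::discrete_distribution, which requires the candidate weights to sum to a positive value; if the sampler is handed all-zero (or NaN-poisoned) probabilities, the constructor aborts exactly like this when libstdc++ assertions are enabled. A minimal reproduction of the assertion itself (an illustration of the failing precondition, not a diagnosis of the server crash):

#include <random>
#include <vector>

int main() {
    // Candidate probabilities that have collapsed to zero (NaNs would trip it too).
    std::vector<float> probs(32000, 0.0f);
    std::mt19937 rng(42);

    // With libstdc++ assertions enabled (-D_GLIBCXX_ASSERTIONS, the default on some
    // distros) this aborts with "Assertion '__sum > 0' failed."; without them the
    // behaviour is undefined, since the weights must sum to a positive value.
    std::discrete_distribution<int> dist(probs.begin(), probs.end());
    return dist(rng);
}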

@qnixsynapse
Contributor Author

This seems unrelated to SYCL (although symbols aren't properly loaded here). Please open a new issue.

@mudler
Contributor

mudler commented Feb 23, 2024

I can confirm that it works here too; just tested with my Arc A770 against 201294a. Thanks @abhilash1910 @airMeng!

@djstraylight

Does master have these fixes now, or do I still need to use this specific commit?
I'm still getting a core dump with SYCL on an Arc A770 on Linux.

@airMeng
Collaborator

airMeng commented Mar 25, 2024

Does master have these fixes now, or do I still need to use this specific commit? I'm still getting a core dump with SYCL on an Arc A770 on Linux.

It's in master already. Can you open a new issue?
