[SYCL] Segmentation fault after #5411 #5469

Closed
qnixsynapse opened this issue Feb 13, 2024 · 30 comments · Fixed by #5624
Comments

@qnixsynapse
Contributor

qnixsynapse commented Feb 13, 2024

System: Arch Linux,
CPU: Intel i3 12th gen
GPU: Intel Arc A750
RAM: 16GB

llama.cpp version: b2134

Previously the build was failing with -DLLAMA_SYCL_F16=ON, which was fixed in #5411. Upon running this build, it crashes with a segmentation fault.

logs:

bin/main -m ~/Public/Models/Weights/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf  -p "hello " -n 1000 -ngl 99
Log start
main: build = 2134 (099afc62)
main: built with Intel(R) oneAPI DPC++/C++ Compiler 2024.0.0 (2024.0.0.20231017) for x86_64-unknown-linux-gnu
main: seed  = 1707789832
GGML_SYCL_DEBUG=0
ggml_init_sycl: GGML_SYCL_F16:   yes
ggml_init_sycl: SYCL_USE_XMX: yes
found 4 SYCL devices:
  Device 0: Intel(R) Arc(TM) A750 Graphics,	compute capability 1.3,
	max compute_units 448,	max work group size 1024,	max sub group size 32,	global mem size 8096681984
  Device 1: Intel(R) FPGA Emulation Device,	compute capability 1.2,
	max compute_units 4,	max work group size 67108864,	max sub group size 64,	global mem size 16577347584
  Device 2: 12th Gen Intel(R) Core(TM) i3-12100F,	compute capability 3.0,
	max compute_units 4,	max work group size 8192,	max sub group size 64,	global mem size 16577347584
  Device 3: Intel(R) Arc(TM) A750 Graphics,	compute capability 3.0,
	max compute_units 448,	max work group size 1024,	max sub group size 32,	global mem size 8096681984
Using device 0 (Intel(R) Arc(TM) A750 Graphics) as main device
llama_model_loader: loaded meta data with 23 key-value pairs and 201 tensors from /home/tensorblast/Public/Models/Weights/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = tinyllama_tinyllama-1.1b-chat-v1.0
llama_model_loader: - kv   2:                       llama.context_length u32              = 2048
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv   4:                          llama.block_count u32              = 22
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 5632
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 64
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 4
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  11:                          general.file_type u32              = 15
llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  16:                      tokenizer.ggml.merges arr[str,61249]   = ["▁ t", "e r", "i n", "▁ a", "e n...
llama_model_loader: - kv  17:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  18:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  20:            tokenizer.ggml.padding_token_id u32              = 2
llama_model_loader: - kv  21:                    tokenizer.chat_template str              = {% for message in messages %}\n{% if m...
llama_model_loader: - kv  22:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   45 tensors
llama_model_loader: - type q4_K:  135 tensors
llama_model_loader: - type q6_K:   21 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 2048
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 4
llm_load_print_meta: n_layer          = 22
llm_load_print_meta: n_rot            = 64
llm_load_print_meta: n_embd_head_k    = 64
llm_load_print_meta: n_embd_head_v    = 64
llm_load_print_meta: n_gqa            = 8
llm_load_print_meta: n_embd_k_gqa     = 256
llm_load_print_meta: n_embd_v_gqa     = 256
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 5632
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 2048
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 1B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 1.10 B
llm_load_print_meta: model size       = 636.18 MiB (4.85 BPW) 
llm_load_print_meta: general.name     = tinyllama_tinyllama-1.1b-chat-v1.0
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: PAD token        = 2 '</s>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.15 MiB
llm_load_tensors: offloading 22 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 23/23 layers to GPU
llm_load_tensors:            buffer size =   601.02 MiB
llm_load_tensors:        CPU buffer size =    35.16 MiB
.....................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:            KV buffer size =    11.00 MiB
llama_new_context_with_model: KV self size  =   11.00 MiB, K (f16):    5.50 MiB, V (f16):    5.50 MiB
llama_new_context_with_model:        CPU input buffer size   =     5.01 MiB
zsh: segmentation fault (core dumped)  bin/main -m  -p "hello " -n

The build without -DLLAMA_SYCL_F16=ON works.

Confirmed: This crash started happening after #5411

@qnixsynapse changed the title from "[SYCL] Segmentation fault with GGML_SYCL_F16" to "[SYCL] Segmentation fault after #5411" on Feb 13, 2024
@chsasank

Can confirm that I too got a segfault when built with -DLLAMA_SYCL_F16=ON. I will rebuild with it OFF and report if that fails too.

@chsasank

Segfaults even without that option.

@abhilash1910
Collaborator

@akarshanbiswas @chsasank could you please re-try with this branch: https://github.com/abhilash1910/llama.cpp/tree/fix_sycl_arc (branch: fix_sycl_arc) and let me know if it addresses the issue?

@abhilash1910 self-assigned this on Feb 14, 2024
@qnixsynapse
Contributor Author

qnixsynapse commented Feb 14, 2024

@abhilash1910 Nope. Still crashing with a segmentation fault, with or without -DLLAMA_SYCL_F16=ON.

Here is what I can get:

(gdb) bt
#0  0x000000000060cd14 in ggml_backend_sycl_buffer_type_name(ggml_backend_buffer_type*)
    ()
#1  0x000000000046e830 in llama_new_context_with_model ()
#2  0x000000000042f660 in llama_init_from_gpt_params(gpt_params&) ()
#3  0x000000000041c211 in main ()

mudler added a commit to mudler/LocalAI that referenced this issue Feb 14, 2024
sycl support is broken otherwise.

See upstream issue: ggerganov/llama.cpp#5469

Signed-off-by: Ettore Di Giacinto <[email protected]>
@mudler
Contributor

mudler commented Feb 14, 2024

I can confirm here, JFYI pinning to commit f026f81 seems to work for me (tested with Intel Arc a770)

@abhilash1910
Collaborator

I can confirm here, JFYI pinning to commit f026f81 seems to work for me (tested with Intel Arc a770)

Thanks @mudler, could you please check if this commit works? 4a46d2b
That would help speed up the resolution.

@mudler
Contributor

mudler commented Feb 14, 2024

@abhilash1910 that commit fails here with a core-dump

@qnixsynapse
Contributor Author

qnixsynapse commented Feb 14, 2024

Got better backtrace this time:

(gdb) bt
#0  ggml_backend_sycl_buffer_type_name (
    buft=0xcd5f20 <ggml_backend_sycl_buffer_type::ggml_backend_sycl_buffer_types>)
    at /home/tensorblast/Public/Models/llama.cpp/ggml-sycl.cpp:14765
#1  0x0000000000519869 in ggml_backend_buft_name (
    buft=0xcd5f20 <ggml_backend_sycl_buffer_type::ggml_backend_sycl_buffer_types>)
    at /home/tensorblast/Public/Models/llama.cpp/ggml-backend.c:19
#2  0x0000000000517613 in ggml_gallocr_reserve_n (galloc=0x36ad020, 
    graph=0x7fffa420efe0, node_buffer_ids=0x39a7fe0)
    at /home/tensorblast/Public/Models/llama.cpp/ggml-alloc.c:707
#3  0x000000000051bcf3 in ggml_backend_sched_reserve (sched=0x7fffa4200010, 
    measure_graph=0x7fffc0200030)
    at /home/tensorblast/Public/Models/llama.cpp/ggml-backend.c:1563
#4  0x0000000000470293 in llama_new_context_with_model (model=0x3669dd0, params=...)
    at /home/tensorblast/Public/Models/llama.cpp/llama.cpp:11461
#5  0x000000000042f9a0 in llama_init_from_gpt_params (params=...)
    at /home/tensorblast/Public/Models/llama.cpp/common/common.cpp:1300
#6  0x000000000041bce9 in main (argc=<optimized out>, argv=0x7fffffffcfb8)
    at /home/tensorblast/Public/Models/llama.cpp/examples/main/main.cpp:198
    

@abhilash1910
Collaborator

Got better backtrace this time:

(gdb) bt
#0  ggml_backend_sycl_buffer_type_name (
    buft=0xcd5f20 <ggml_backend_sycl_buffer_type::ggml_backend_sycl_buffer_types>)
    at /home/tensorblast/Public/Models/llama.cpp/ggml-sycl.cpp:14765
#1  0x0000000000519869 in ggml_backend_buft_name (
    buft=0xcd5f20 <ggml_backend_sycl_buffer_type::ggml_backend_sycl_buffer_types>)
    at /home/tensorblast/Public/Models/llama.cpp/ggml-backend.c:19
#2  0x0000000000517613 in ggml_gallocr_reserve_n (galloc=0x36ad020, 
    graph=0x7fffa420efe0, node_buffer_ids=0x39a7fe0)
    at /home/tensorblast/Public/Models/llama.cpp/ggml-alloc.c:707
#3  0x000000000051bcf3 in ggml_backend_sched_reserve (sched=0x7fffa4200010, 
    measure_graph=0x7fffc0200030)
    at /home/tensorblast/Public/Models/llama.cpp/ggml-backend.c:1563
#4  0x0000000000470293 in llama_new_context_with_model (model=0x3669dd0, params=...)
    at /home/tensorblast/Public/Models/llama.cpp/llama.cpp:11461
#5  0x000000000042f9a0 in llama_init_from_gpt_params (params=...)
    at /home/tensorblast/Public/Models/llama.cpp/common/common.cpp:1300
#6  0x000000000041bce9 in main (argc=<optimized out>, argv=0x7fffffffcfb8)
    at /home/tensorblast/Public/Models/llama.cpp/examples/main/main.cpp:198
    

Thanks for the backtrace. As @mudler confirmed, f026f81 builds correctly and already includes #5411. For the time being I would recommend rolling back to that commit until a fix is applied.

@qnixsynapse
Contributor Author

@abhilash1910 Yes, it builds correctly but ends up in a segfault with or without -DLLAMA_SYCL_F16=ON. I am using a build from the commit before that, which works well as long as SYCL_F16 is not enabled. Before #5411, building with -DLLAMA_SYCL_F16=ON failed with a compilation error.

If you need any help with testing, please do ping me.

@mudler
Contributor

mudler commented Feb 14, 2024

@abhilash1910 Yes, it builds correctly but ends up in a segfault with or without -DLLAMA_SYCL_F16=ON. I am using a build from the commit before that, which works well as long as SYCL_F16 is not enabled. Before #5411, building with -DLLAMA_SYCL_F16=ON failed with a compilation error.

If you need any help with testing, please do ping me.

That is quite weird. It actually works here, it doesn't just build. If you want to try to reproduce, this is the LocalAI container image with llama.cpp pinned at f026f81: quay.io/go-skynet/local-ai@sha256:c6b5dfaff64c24a02f1be8f8e1cb5c0837b130b438753e49b349d70e3d6d1916, and it runs inference correctly. Note that I'm testing with an Intel Arc A770, so it might be related to that; however, llama.cpp's current master also fails with segfaults on my Arc A770.

You can run phi-2 configured for SYCL (f32) with:

docker run -e DEBUG=true -ti -v $PWD/models:/build/models -p 8080:8080  -v /dev/dri:/dev/dri quay.io/go-skynet/local-ai@sha256:c6b5dfaff64c24a02f1be8f8e1cb5c0837b130b438753e49b349d70e3d6d1916 https://gist.githubusercontent.com/mudler/103de2576a8fd4b583f9bd53f4e4cefd/raw/9181d4add553326806b8fdbf4ff0cd65d2145bff/phi-2-sycl.yaml

to test it:

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
          "model": "phi-2",
          "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}]
      }'

To double-check the version, you can run this in the container:

cat /build/Makefile  | grep CPPLLAMA_VERSION 
CPPLLAMA_VERSION?=f026f8120f97090d34a52b3dc023c82e0ede3f7d

I am actually running this in Kubernetes; any images from master are pinned to that commit. I'm also leaving my deployment here for reference:

apiVersion: v1
kind: Namespace
metadata:
  name: local-ai
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: models-pvc
  namespace: local-ai
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: local-ai
  namespace: local-ai
  labels:
    app: local-ai
spec:
  selector:
    matchLabels:
      app: local-ai
  replicas: 1
  template:
    metadata:
      labels:
        app: local-ai
      name: local-ai
    spec:
      containers:
        - env:
          - name: DEBUG
            value: "true"
          name: local-ai
          args:
          # phi-2 configuration
          - https://gist.githubusercontent.com/mudler/103de2576a8fd4b583f9bd53f4e4cefd/raw/9181d4add553326806b8fdbf4ff0cd65d2145bff/phi-2-sycl.yaml
          image: quay.io/go-skynet/local-ai:master-sycl-f32-core
          imagePullPolicy: Always
          resources:
            limits:
              gpu.intel.com/i915: 1
          volumeMounts:
            - name: models-volume
              mountPath: /build/models
      volumes:
        - name: models-volume
          persistentVolumeClaim:
            claimName: models-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: local-ai
  namespace: local-ai
spec:
  selector:
    app: local-ai
  type: NodePort
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080

@abhilash1910
Collaborator

@akarshanbiswas @chsasank could you please re-try with this branch: https://github.com/abhilash1910/llama.cpp/tree/fix_sycl_arc (branch: fix_sycl_arc) and let me know if it addresses the issue?

@mudler could you please try with this branch and let me know if it fixes the segfault issue? If not, then changes in the ggml backend may have caused this.

@qnixsynapse
Contributor Author

qnixsynapse commented Feb 15, 2024

I'm testing with an Intel Arc A770, so it might be related to that,

@mudler Yes, that is possible. I tried running with -ngl 0 and it worked. What I deduce from the backtrace is that the buft object or its associated context (ctx) is not properly initialized, or contains invalid data, which leads to the segfault. Honestly, my knowledge in these areas is rusty; @abhilash1910 dada may know better. :)
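
For illustration, here is a minimal, self-contained C++ sketch of the failure pattern described above. The struct and function names are simplified stand-ins, not the actual ggml types: a buffer-type object whose context pointer is never initialized makes the "name" callback dereference a null pointer, which is consistent with the crash landing inside ggml_backend_sycl_buffer_type_name.

#include <cstdio>
#include <string>

// Simplified stand-ins for the real ggml structs (hypothetical names).
struct sycl_buffer_context {
    std::string name;          // the name the callback is supposed to return
};

struct buffer_type {
    void *context = nullptr;   // stays null if device/buffer-type init is skipped
};

// Mirrors the shape of a buffer-type "name" callback.
static const char *buffer_type_name(const buffer_type *buft) {
    auto *ctx = static_cast<const sycl_buffer_context *>(buft->context);
    return ctx->name.c_str(); // ctx is null here -> segfault inside basic_string
}

int main() {
    buffer_type buft;          // context deliberately left uninitialized
    std::printf("%s\n", buffer_type_name(&buft)); // crashes, like the backtraces above
}

The obvious guard is checking buft->context before dereferencing it, but the real question is why the context was never initialized in the first place; this sketch does not claim to be the actual fix.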

@channeladam

I too am getting the segfault. I don't know how to help though... I can test things if there is something to try.

@abhilash1910
Collaborator

@akarshanbiswas @mudler @channeladam Could you please try building from the latest master and see if that works? Thanks

@qnixsynapse
Contributor Author

@abhilash1910 Still fails at the same location:

[screenshot]

@abhilash1910
Collaborator

Thanks @akarshanbiswas, could you try building again with https://github.com/abhilash1910/llama.cpp/tree/fix_sycl_arc

@qnixsynapse
Contributor Author

@abhilash1910
[screenshot]

@channeladam

channeladam commented Feb 20, 2024

@abhilash1910

./server -m /mnt/data/llama/models/Magicoder-S-DS-6.7B_q8_0.gguf -ngl 41 -c 4096

Core dump backtrace (of your fork's fix_sycl_arc branch):

(gdb) backtrace
#0  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_data (this=0x8)
    at /usr/lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/basic_string.h:223
#1  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::c_str (this=0x8)
    at /usr/lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/basic_string.h:2584
#2  ggml_backend_sycl_buffer_type_name (buft=0xdd02e0 <ggml_backend_sycl_buffer_type::ggml_backend_sycl_buffer_types>)
    at /mnt/data/llama/forks/llama.cpp/ggml-sycl.cpp:14765
#3  0x00000000005d2a19 in ggml_backend_buft_name (buft=0xdd02e0 <ggml_backend_sycl_buffer_type::ggml_backend_sycl_buffer_types>)
    at /mnt/data/llama/forks/llama.cpp/ggml-backend.c:19
#4  0x00000000005d07c3 in ggml_gallocr_reserve_n (galloc=0x46160c0, graph=0x7488ac0, node_buffer_ids=0x4634720)
    at /mnt/data/llama/forks/llama.cpp/ggml-alloc.c:707
#5  0x00000000005d4ea3 in ggml_backend_sched_reserve (sched=0x7479af0, measure_graph=0x7149a70) at /mnt/data/llama/forks/llama.cpp/ggml-backend.c:1564
#6  0x00000000005288f3 in llama_new_context_with_model (model=0x464cb50, params=...) at /mnt/data/llama/forks/llama.cpp/llama.cpp:11540
#7  0x00000000005026b0 in llama_init_from_gpt_params (params=...) at /mnt/data/llama/forks/llama.cpp/common/common.cpp:1328
#8  0x00000000004232d4 in llama_server_context::load_model (this=0x7fff1c3ae270, params_=...)
    at /mnt/data/llama/forks/llama.cpp/examples/server/server.cpp:377
#9  main (argc=<optimized out>, argv=<optimized out>) at /mnt/data/llama/forks/llama.cpp/examples/server/server.cpp:2735

@abhilash1910
Collaborator

Thanks @channeladam @akarshanbiswas. Could you re-try the same branch, if possible? Thanks

@qnixsynapse
Contributor Author

@abhilash1910 This time the build failed:

[ 42%] Building CXX object CMakeFiles/ggml.dir/ggml-sycl.cpp.o
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:1123:55: warning: cast from 'const void *' to 'unsigned char *' drops const qualifier [-Wcast-qual]
 1123 |                 auto it = m_map.upper_bound((byte_t *)ptr);
      |                                                       ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:3659:31: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'const int' [-Wsign-compare]
 3659 |     if (item_ct1.get_group(0) < ne02) { // src0
      |         ~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:3701:31: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'const int' [-Wsign-compare]
 3701 |         item_ct1.get_group(0) < ne02) {
      |         ~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:3700:46: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'const int' [-Wsign-compare]
 3700 |     if (nidx < ne00 && item_ct1.get_group(1) < ne01 &&
      |                        ~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8330:23: error: use of undeclared identifier 'nb1'
 8330 |     const size_t s1 = nb1 / ggml_element_size(dst);
      |                       ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8331:23: error: use of undeclared identifier 'nb2'
 8331 |     const size_t s2 = nb2 / ggml_element_size(dst);
      |                       ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8332:23: error: use of undeclared identifier 'nb3'
 8332 |     const size_t s3 = nb3 / ggml_element_size(dst);
      |                       ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8383:23: error: use of undeclared identifier 'nb1'
 8383 |     const size_t s1 = nb1 / ggml_element_size(dst);
      |                       ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8384:23: error: use of undeclared identifier 'nb2'
 8384 |     const size_t s2 = nb2 / ggml_element_size(dst);
      |                       ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8385:23: error: use of undeclared identifier 'nb3'
 8385 |     const size_t s3 = nb3 / ggml_element_size(dst);
      |                       ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8435:24: error: use of undeclared identifier 'ne0'; did you mean 'new'?
 8435 |         int nr0 = ne10/ne0;
      |                        ^~~
      |                        new
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8435:27: error: expected a type
 8435 |         int nr0 = ne10/ne0;
      |                           ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8436:24: error: use of undeclared identifier 'ne1'; did you mean 'new'?
 8436 |         int nr1 = ne11/ne1;
      |                        ^~~
      |                        new
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8436:27: error: expected a type
 8436 |         int nr1 = ne11/ne1;
      |                           ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8437:24: error: use of undeclared identifier 'ne2'
 8437 |         int nr2 = ne12/ne2;
      |                        ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8438:19: error: use of undeclared identifier 'ne13'
 8438 |         int nr3 = ne13/ne3;
      |                   ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8438:24: error: use of undeclared identifier 'ne3'
 8438 |         int nr3 = ne13/ne3;
      |                        ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8443:27: error: use of undeclared identifier 'ne0'; did you mean 'new'?
 8443 |         int64_t cne0[] = {ne0, ne1, ne2, ne3};
      |                           ^~~
      |                           new
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8443:30: error: expected a type
 8443 |         int64_t cne0[] = {ne0, ne1, ne2, ne3};
      |                              ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8444:45: error: use of undeclared identifier 'ne13'
 8444 |         int64_t cne1[] = {ne10, ne11, ne12, ne13};
      |                                             ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8445:26: error: use of undeclared identifier 'nb0'
 8445 |         size_t cnb0[] = {nb0, nb1, nb2, nb3};
      |                          ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8445:31: error: use of undeclared identifier 'nb1'
 8445 |         size_t cnb0[] = {nb0, nb1, nb2, nb3};
      |                               ^
/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:8445:36: error: use of undeclared identifier 'nb2'
 8445 |         size_t cnb0[] = {nb0, nb1, nb2, nb3};
      |                                    ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
4 warnings and 20 errors generated.
make[3]: *** [CMakeFiles/ggml.dir/build.make:132: CMakeFiles/ggml.dir/ggml-sycl.cpp.o] Error 1
make[2]: *** [CMakeFiles/Makefile2:758: CMakeFiles/ggml.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:2491: examples/main/CMakeFiles/main.dir/rule] Error 2
make: *** [Makefile:998: main] Error 2
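
For reference, errors of the shape "use of undeclared identifier 'nb1'/'ne0'" in a kernel launcher usually mean the block that unpacks per-tensor locals was dropped from the function (in ggml these bare names typically come from GGML_TENSOR_*_LOCALS-style macros); this is an assumption about the branch, not a confirmed diagnosis. A tiny self-contained C++ sketch of the pattern, with hypothetical names:

#include <cstdio>

// Hypothetical stand-in for a ggml-style tensor with per-dimension byte strides.
struct tensor {
    unsigned long nb[4];            // byte stride of each dimension
};

// Bare names like nb1 only exist if a locals-unpacking block runs first.
#define TENSOR_LOCALS(t) const unsigned long nb1 = (t)->nb[1];

static unsigned long row_stride(const tensor *dst, unsigned long elem_size) {
    TENSOR_LOCALS(dst)              // remove this line and the next one fails with
    return nb1 / elem_size;         // "error: use of undeclared identifier 'nb1'"
}

int main() {
    tensor t{{4, 256, 65536, 0}};
    std::printf("%lu\n", row_stride(&t, 4)); // prints 64
}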

@abhilash1910
Collaborator

Yes @akarshanbiswas, the recent build error was an obvious mistake on my part; could you re-try with the latest commit on the branch? Thanks. It might throw some other exception, but the results shown above rule out the possibility of the crash arising in the SYCL code; it might be coming from another forced typecast inside the core headers.

@qnixsynapse
Contributor Author

@abhilash1910 It failed again with a different error this time.

/home/tensorblast/Public/Models/debug/llama.cpp/ggml-sycl.cpp:14538:12: error: no viable conversion from returned value of type 'std::string' (aka 'basic_string<char>') to function return type 'const char *'
 14538 |     return ctx->name;
       |            ^~~~~~~~~
/usr/lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/basic_string.h:932:7: note: candidate function
  932 |       operator __sv_type() const noexcept
      |       ^
103 warnings and 1 error generated.
make[3]: *** [CMakeFiles/ggml.dir/build.make:132: CMakeFiles/ggml.dir/ggml-sycl.cpp.o] Error 1
make[2]: *** [CMakeFiles/Makefile2:758: CMakeFiles/ggml.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:2491: examples/main/CMakeFiles/main.dir/rule] Error 2
make: *** [Makefile:998: main] Error 2
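
The error above is the classic mismatch between a std::string member and a const char * return type; the usual shape of the fix is to return the underlying C string, assuming the owning context outlives the caller's use of the pointer. A minimal sketch with hypothetical names, not the actual patch:

#include <cstdio>
#include <string>

// Hypothetical context holding the buffer-type name as a std::string.
struct buffer_context {
    std::string name;
};

static const char *buffer_type_name(const buffer_context *ctx) {
    // `return ctx->name;` does not compile (no conversion to const char *);
    // c_str() does, and the pointer stays valid while ctx->name is alive and unmodified.
    return ctx->name.c_str();
}

int main() {
    buffer_context ctx{"SYCL0"};
    std::printf("%s\n", buffer_type_name(&ctx));
}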

@abhilash1910
Collaborator

@akarshanbiswas @channeladam please try #5624 and let us know if the issue persists.

@channeladam

Works for me

@channeladam

channeladam commented Feb 21, 2024

Actually, with #5624 I am now getting another crash upon usage (the previous crash was on startup).

Available slots:
 -> Slot 0 - max context: 512
{"timestamp":1708526744,"level":"INFO","function":"main","line":2713,"message":"model loaded"}
all slots are idle and system prompt is empty, clear the KV cache
slot 0 is processing [task id: 0]
slot 0 : kv cache rm - [0, end)
ggml_gallocr_needs_realloc: node inp_embd is not valid
ggml_gallocr_alloc_graph: cannot reallocate multi buffer graph automatically, call reserve
ggml_backend_sched: failed to allocate graph, reserving
/usr/lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/random.tcc:2665: void std::discrete_distribution<>::param_type::_M_initialize() [_IntType = int]: Assertion '__sum > 0' failed.
zsh: IOT instruction (core dumped)  /mnt/data/llama/llama.cpp/build/bin/server -m  -ngl 33

Backtrace:

(gdb) backtrace
#0  0x00007f44b2cac83c in ?? () from /usr/lib/libc.so.6
#1  0x00007f44b2c5c668 in raise () from /usr/lib/libc.so.6
#2  0x00007f44b2c44542 in abort () from /usr/lib/libc.so.6
#3  0x00007f44b32dd3b2 in std::__glibcxx_assert_fail (file=<optimized out>, line=line@entry=2665, function=<optimized out>, condition=<optimized out>)
    at /usr/src/debug/gcc/gcc/libstdc++-v3/src/c++11/debug.cc:61
#4  0x000000000051cef4 in std::discrete_distribution<int>::param_type::_M_initialize (this=0x7ffe716969b0)
    at /usr/lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/random.tcc:2665
#5  std::discrete_distribution<int>::param_type::param_type<__gnu_cxx::__normal_iterator<float*, std::vector<float, std::allocator<float> > > > (
    this=<optimized out>, __wbegin=..., __wend=...) at /usr/lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/random.h:5467
#6  std::discrete_distribution<int>::discrete_distribution<__gnu_cxx::__normal_iterator<float*, std::vector<float, std::allocator<float> > > > (
    this=<optimized out>, __wbegin=..., __wend=...) at /usr/lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/random.h:5510
#7  llama_sample_token (ctx=0x3bff380, candidates=0x7ffe71696ab8) at /mnt/data/llama/llama.cpp/llama.cpp:9909
#8  0x000000000050d5f2 in llama_sampling_sample_impl (ctx_sampling=<optimized out>, ctx_main=<optimized out>, ctx_cfg=ctx_cfg@entry=0x0, idx=0, 
    is_resampling=false) at /mnt/data/llama/llama.cpp/common/sampling.cpp:256
#9  0x000000000050cc25 in llama_sampling_sample (ctx_sampling=0x11c115, ctx_main=0x11c115, ctx_cfg=0x6, ctx_cfg@entry=0x0, idx=-1295333316)
    at /mnt/data/llama/llama.cpp/common/sampling.cpp:304
#10 0x0000000000494983 in llama_server_context::update_slots (this=0x7ffe71697ba0) at /mnt/data/llama/llama.cpp/examples/server/server.cpp:1792
#11 0x000000000042751e in std::function<void ()>::operator()() const (this=<optimized out>)
    at /usr/lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/std_function.h:591
#12 llama_server_queue::start_loop (this=<optimized out>) at /mnt/data/llama/llama.cpp/examples/server/utils.hpp:327
#13 main (argc=<optimized out>, argv=<optimized out>) at /mnt/data/llama/llama.cpp/examples/server/server.cpp:3197
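
For context, the assertion in frames #3 and #4 comes from libstdc++'s std::discrete_distribution, which requires the candidate weights to sum to a positive value; if the sampler is handed all-zero (or NaN-poisoned) probabilities, the constructor aborts exactly like this when libstdc++ assertions are enabled. A minimal reproduction of the assertion itself (an illustration of the failing precondition, not a diagnosis of the server crash):

#include <random>
#include <vector>

int main() {
    // Candidate probabilities that have collapsed to zero (NaNs would trip it too).
    std::vector<float> probs(32000, 0.0f);
    std::mt19937 rng(42);

    // With libstdc++ assertions enabled (-D_GLIBCXX_ASSERTIONS, the default on some
    // distros) this aborts with "Assertion '__sum > 0' failed."; without them the
    // behaviour is undefined, since the weights must sum to a positive value.
    std::discrete_distribution<int> dist(probs.begin(), probs.end());
    return dist(rng);
}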

@qnixsynapse
Contributor Author

This seems unrelated to SYCL (although symbols aren't properly loaded here). Please open a new issue.

@mudler
Contributor

mudler commented Feb 23, 2024

I can confirm that it works here too; just tested with my Arc A770 against 201294a. Thanks @abhilash1910 @airMeng!

@djstraylight

Does master have these fixes now, or do I still need to use this specific commit?
I'm still getting a core dump with SYCL on an Arc A770 on Linux.

@airMeng
Collaborator

airMeng commented Mar 25, 2024

Does master have these fixes now, or do I still need to use this specific commit? I'm still getting a core dump with SYCL on an Arc A770 on Linux.

It's in master already. Can you open a new issue?
