leoyuppieqnew
changed the title
[Bug]: loc("/ossfs/workspace/Minference/minference/ops/pit_sparse_flash_attention_v2.py":110:23): error: operation scheduled before its operands
[Bug]: loc("Minference/minference/ops/pit_sparse_flash_attention_v2.py":110:23): error: operation scheduled before its operands
Sep 18, 2024
Describe the bug
I encountered this error when running benchmark_e2e_vllm_tp.py:
loc("/ossfs/workspace/Minference/minference/ops/pit_sparse_flash_attention_v2.py":110:23): error: operation scheduled before its operands
The message is printed several times, with the copies interleaved on the console, but the run still completed. Will this cause any problems?
Steps to reproduce
`$ python benchmark_e2e_vllm_tp.py --model_name /mntfn/yanyi/qwen2_72b_tuwen_mix_per_tensor_dynamic --attn_type minference --context_window 100_000 --tensor_parallel_size 4
Loading safetensors checkpoint shards: 0% Completed | 0/16 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 6% Completed | 1/16 [00:53<13:23, 53.54s/it]
Loading safetensors checkpoint shards: 12% Completed | 2/16 [01:44<12:08, 52.05s/it]
Loading safetensors checkpoint shards: 19% Completed | 3/16 [02:34<11:06, 51.23s/it]
Loading safetensors checkpoint shards: 25% Completed | 4/16 [05:17<19:01, 95.11s/it]
Loading safetensors checkpoint shards: 31% Completed | 5/16 [06:32<16:07, 87.98s/it]
Loading safetensors checkpoint shards: 38% Completed | 6/16 [07:20<12:23, 74.38s/it]
Loading safetensors checkpoint shards: 44% Completed | 7/16 [09:15<13:10, 87.81s/it]
Loading safetensors checkpoint shards: 50% Completed | 8/16 [10:15<10:30, 78.80s/it]
Loading safetensors checkpoint shards: 56% Completed | 9/16 [11:17<08:33, 73.42s/it]
Loading safetensors checkpoint shards: 62% Completed | 10/16 [13:16<08:46, 87.76s/it]
Loading safetensors checkpoint shards: 69% Completed | 11/16 [17:17<11:13, 134.66s/it]
Loading safetensors checkpoint shards: 75% Completed | 12/16 [17:39<06:40, 100.20s/it]
Loading safetensors checkpoint shards: 81% Completed | 13/16 [21:17<06:47, 135.83s/it]
Loading safetensors checkpoint shards: 88% Completed | 14/16 [24:16<04:57, 148.99s/it]
Loading safetensors checkpoint shards: 94% Completed | 15/16 [27:16<02:38, 158.20s/it]
Loading safetensors checkpoint shards: 100% Completed | 16/16 [27:19<00:00, 111.50s/it]
Loading safetensors checkpoint shards: 100% Completed | 16/16 [27:19<00:00, 102.44s/it]
Patched model for minference with vLLM..
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]loc("/ossfs/workspace/Minference/minference/ops/pit_sparse_flash_atloc(t"e/ntioon_vs2.psy":f110s/:w23)or: error: ksoperation scheduled before its operandsp
loc(a"c/e/Moinsfserefnces/min/fwerenocre/opsk/psipatce/Minfsepreanrscee/flamsh_atitenntifon_evre2nce./pyo"ps:/pi110t:s23p)a: rerror: soperation scheduled before its operandse
flash_attention_v2.py":110:23): error: operation scheduled before its operands
loc("/ossfs/workspace/Minference/minference/ops/pit_sparse_flash_attention_v2.py":110:23): error: operation scheduled before its operands
Processed prompts: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:57<00:00, 57.96s/it, est. speed input: 1724.83 toks/s, output: 0.02 toks/s]
Processed prompts: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:55<00:00, 55.87s/it, est. speed input: 1789.22 toks/s, output: 0.02 toks/s]
Processed prompts: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:55<00:00, 55.76s/it, est. speed input: 1792.78 toks/s, output: 0.02 toks/s]
Processed prompts: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:55<00:00, 55.78s/it, est. speed input: 1792.20 toks/s, output: 0.02 toks/s]
Processed prompts: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:55<00:00, 55.78s/it, est. speed input: 1792.07 toks/s, output: 0.02 toks/s]
Processed prompts: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:55<00:00, 55.79s/it, est. speed input: 1791.81 toks/s, output: 0.02 toks/s]
Processed prompts: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:55<00:00, 55.76s/it, est. speed input: 1792.69 toks/s, output: 0.02 toks/s]
Processed prompts: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:55<00:00, 55.77s/it, est. speed input: 1792.53 toks/s, output: 0.02 toks/s]
Processed prompts: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:55<00:00, 55.76s/it, est. speed input: 1792.75 toks/s, output: 0.02 toks/s]
Processed prompts: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:55<00:00, 55.78s/it, est. speed input: 1792.03 toks/s, output: 0.02 toks/s]
Processed prompts: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:55<00:00, 55.78s/it, est. speed input: 1792.13 toks/s, output: 0.02 toks/s]
minference 100000 55.92022843360901`
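The error lines in the log above appear to be written by several processes at once, so most copies are interleaved character by character. A minimal sketch, assuming the log has been saved as text, of how the intact diagnostics could be pulled out and counted. The names `DIAG` and `intact_diagnostics` are hypothetical helpers for illustration and are not part of MInference or vLLM, and this only tidies the log; it does not say whether the error affects results.

```python
import re
from collections import Counter

# Matches an intact diagnostic of the form seen in the log:
#   loc("<file>":<line>:<col>): error: <message>
DIAG = re.compile(r'loc\("([^"]+)":(\d+):(\d+)\): error: ([^\n]+)')


def intact_diagnostics(log_text):
    """Count diagnostics that survived interleaving.

    Keys are (file name, line, column, message). Fragments that were
    scrambled by concurrent writes do not match the pattern and are skipped.
    """
    counts = Counter()
    for path, line, col, msg in DIAG.findall(log_text):
        file_name = path.rsplit("/", 1)[-1]  # drop the directory part
        counts[(file_name, int(line), int(col), msg.strip())] += 1
    return counts
```

Running it over the captured output gives one entry per distinct source location, which makes it easier to see that every copy points at the same line (110:23) of pit_sparse_flash_attention_v2.py.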
Expected Behavior
No response
Logs
No response
Additional Information
MInference Version: hjiang/support_vllm_tp
GPU: L20*4
Python Version: 3.8