[Bug] User side cancellation #179

sunggg · 2024-01-30T15:43:37Z

User-side cancellation does not take effect. We also need to log properly when a request has been cancelled.

This PR changes the GPT-NeoX KV cache creation function to allocate the cache at full size up front, so no memory allocation is required at run time.

yelite · 2024-01-31T21:01:51Z

How is the user-side cancellation triggered? When I pressed Ctrl-C on a running curl command, I could see the cancellation being processed.

script:

payload='{
  "model": "llama-2",
  "messages": [
      {
        "role": "user",
        "content": "Hello! what is the answer to life, the universe, and everything? give me a long answer"
      }
    ],
  "max_tokens": 1000,
  "stream": true,
  "temperature": 1.0,
  "top_p": 1,
  "presence_penalty": 0,
  "frequency_penalty": 0
}'

echo "======="
echo "Request"
echo "======="
echo "$payload" | jq

echo "========"
echo "Response"
echo "========"

curl -s -X 'POST' \
  'http://127.0.0.1:8000/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer abc" \
  -d "$payload"

log:

2024-01-31 20:58:40 [info     ] StagingInferenceEngine.add     [mlc_serve.engine.staging_engine] func_name=add lineno=106 pathname=/opt/dlami/nvme/liteye/mlc-llm/serve/mlc_serve/engine/staging_engine.py process=2803754 requests=[Request(request_id='cmpl-71e9e27ce9f842108e3e820b1b6d63c8', messages=[ChatMessage(role='user', content='Hello! what is the answer to life, the universe, and everything? give me a long answer')], num_sequences=1, best_of=1, sampling_params=SamplingParams(presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1.0, top_p=1.0, top_k=-1, logit_bias=None, appeared_tokens_freq={}, logit_bias_index=None, logit_bias_value=None, logprobs=False, top_logprobs=0), stopping_criteria=StoppingCriteria(max_tokens=1000, stop_sequences=[]), debug_options=DebugOptions(ignore_eos=False, prompt=None, prompt_token_ids=None), validate_tokens=None, contextvars={})]
2024-01-31 20:58:40 [info     ] AsyncEngineConnector.generate iterator cancelled. [mlc_serve.engine.async_connector] func_name=generate lineno=90 pathname=/opt/dlami/nvme/liteye/mlc-llm/serve/mlc_serve/engine/async_connector.py process=2803754 request_id=cmpl-71e9e27ce9f842108e3e820b1b6d63c8
2024-01-31 20:58:40 [info     ] StagingInferenceEngine.cancel  [mlc_serve.engine.staging_engine] func_name=cancel lineno=133 pathname=/opt/dlami/nvme/liteye/mlc-llm/serve/mlc_serve/engine/staging_engine.py process=2803754 request_id=cmpl-71e9e27ce9f842108e3e820b1b6d63c8
2024-01-31 20:58:40 [info     ] AsyncEngineConnector.generate request sucessfully cancelled. [mlc_serve.engine.async_connector] func_name=generate lineno=93 pathname=/opt/dlami/nvme/liteye/mlc-llm/serve/mlc_serve/engine/async_connector.py process=2803754 request_id=cmpl-71e9e27ce9f842108e3e820b1b6d63c8
2024-01-31 20:58:40 [info     ] AsyncEngineConnector.generate removing request from result queue. [mlc_serve.engine.async_connector] func_name=generate lineno=98 pathname=/opt/dlami/nvme/liteye/mlc-llm/serve/mlc_serve/engine/async_connector.py process=2803754 request_id=cmpl-71e9e27ce9f842108e3e820b1b6d63c8
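
For a trigger that does not depend on Ctrl-C timing, the same request could be sent from a small Python client that reads a few streamed lines and then closes the connection. This is only a sketch: the endpoint, headers, and payload are copied from the script above, while `build_request`, `read_then_disconnect`, and `main` are hypothetical helper names, and it assumes that closing the socket mid-stream looks the same to the server as an interrupted curl.

```python
import json
import urllib.request

# Endpoint taken from the curl script above.
URL = "http://127.0.0.1:8000/v1/chat/completions"


def build_request(url=URL, max_tokens=1000):
    """Build the same streaming chat request as the shell script."""
    payload = {
        "model": "llama-2",
        "messages": [
            {
                "role": "user",
                "content": "Hello! what is the answer to life, the universe, "
                "and everything? give me a long answer",
            }
        ],
        "max_tokens": max_tokens,
        "stream": True,
        "temperature": 1.0,
        "top_p": 1,
        "presence_penalty": 0,
        "frequency_penalty": 0,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": "Bearer abc",
        },
        method="POST",
    )


def read_then_disconnect(lines, max_lines=5):
    """Collect up to max_lines non-empty lines, then stop reading.

    `lines` is any iterable of bytes, e.g. the HTTP response object.
    """
    seen = []
    for raw in lines:
        line = raw.decode("utf-8", errors="replace").strip()
        if not line:
            continue
        seen.append(line)
        if len(seen) >= max_lines:
            break
    return seen


def main():
    # Leaving the `with` block closes the connection while the server
    # is (presumably) still generating; this is the simulated cancellation.
    with urllib.request.urlopen(build_request()) as resp:
        for line in read_then_disconnect(resp, max_lines=5):
            print(line)
```

Calling `main()` against a running server and then checking the server log for the `StagingInferenceEngine.cancel` entry would show whether the disconnect was picked up.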

sunggg · 2024-01-31T21:06:47Z

Hmm, interesting. That is pretty much what I did. I was printing all the token_ids and saw new tokens being printed even after cancellation. Is it possible that the request is cancelled correctly but somehow keeps printing from the buffer?

yelite · 2024-02-01T16:53:05Z

> Is it possible that the request is cancelled correctly but somehow keeps printing from the buffer?

No, it's not. If the request is cancelled correctly, it shouldn't be able to print new tokens.

Can you show me your steps to trigger the problem? Then I can try to reproduce it on my side.
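
One way to check the claim that a correctly cancelled request produces no further tokens is to record the token count at the moment of cancellation and compare it with the count a little later. The sketch below is a toy asyncio model, not mlc_serve code: `stream_tokens` and `run_and_cancel` are hypothetical names, and the sleep stands in for a decode step.

```python
import asyncio


async def stream_tokens(total, produced):
    """Toy stand-in for a generation loop: appends one token id per step."""
    for token_id in range(total):
        await asyncio.sleep(0.01)  # pretend to run one decode step
        produced.append(token_id)


async def run_and_cancel(total=1000, cancel_after=0.05, settle=0.05):
    """Start generation, cancel it, and report token counts.

    Returns (count at cancellation, count after waiting `settle` seconds).
    """
    produced = []
    task = asyncio.create_task(stream_tokens(total, produced))

    await asyncio.sleep(cancel_after)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected: the task was cancelled on purpose

    count_at_cancel = len(produced)
    # Wait a little longer; a producer that ignored the cancellation
    # would keep appending tokens during this window.
    await asyncio.sleep(settle)
    return count_at_cancel, len(produced)


at_cancel, later = asyncio.run(run_and_cancel())
print(f"tokens at cancel: {at_cancel}, tokens after waiting: {later}")
```

If the two counts match, generation really stopped. If the second count is larger, tokens were still being produced after the cancel call, which would point at the cancellation path and not at stale buffered output.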

sunggg added the "bug" label (Something isn't working) on Jan 30, 2024
