[Bug] User side cancellation #179
Comments
This PR changes the GPT-NeoX KV cache creation function to allocate the cache at full size up front, so no memory allocation is needed on the fly while running.
How is the user-side cancellation triggered? When I tried pressing Ctrl-C on a running curl command, I could see the cancellation being processed.
script:
payload='{
"model": "llama-2",
"messages": [
{
"role": "user",
"content": "Hello! what is the answer to life, the universe, and everything? give me a long answer"
}
],
"max_tokens": 1000,
"stream": true,
"temperature": 1.0,
"top_p": 1,
"presence_penalty": 0,
"frequency_penalty": 0
}'
echo "======="
echo "Request"
echo "======="
echo "$payload" | jq
echo "========"
echo "Response"
echo "========"
curl -s -X 'POST' \
'http://127.0.0.1:8000/v1/chat/completions' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer abc" \
-d "$payload"
log:
Hmm, interesting. That is pretty much what I did. I was printing all the token_ids and saw that it kept printing new tokens even after cancellation. Is it possible that the request is cancelled correctly but it somehow keeps printing from the buffer?
No, it's not. If it's cancelled correctly, it shouldn't be able to print new tokens. Can you show me your steps to trigger the problem? Then I can try to reproduce it on my side.
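To make the reproduction steps repeatable, the manual Ctrl-C could be scripted. The following is only a sketch: `cancel_after` is a hypothetical helper name, not part of this project, and it assumes the `timeout` utility from coreutils is available. It sends SIGINT (the signal Ctrl-C delivers) to the wrapped command after a fixed number of seconds and notes on stderr when the interrupt actually fired.

```shell
# Hypothetical helper: run a command and interrupt it with SIGINT
# after N seconds, mimicking a user pressing Ctrl-C.
cancel_after() {
  seconds="$1"
  shift
  # timeout exits with status 124 when it had to signal the command.
  timeout -s INT "$seconds" "$@"
  status=$?
  if [ "$status" -eq 124 ]; then
    echo "client cancelled after ${seconds}s" >&2
  fi
  return "$status"
}
```

With the script above, the curl call could be wrapped as `cancel_after 3 curl -s -X 'POST' 'http://127.0.0.1:8000/v1/chat/completions' ... -d "$payload"`. If the server keeps printing new token_ids after the "client cancelled" line appears, the cancellation did not take effect.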
User side cancellation does not take effect. We also need to log properly when it has been cancelled.