[Bug] TTL starts counting from the beginning of a request instead of the end #25

Closed
Mushoz opened this issue Dec 8, 2024 · 2 comments

Mushoz commented Dec 8, 2024

I only noticed this bug recently, when I started using shorter TTL values together with a chatty model (QwQ). It is easy to reproduce with a very short TTL (such as 10 seconds) and a prompt that takes longer than the TTL to run.

Steps to reproduce:

  1. Set the TTL of a sufficiently large model to 10 seconds
  2. Ask the model to tell a story. Make sure the story takes longer than 10 seconds to generate

Expected outcome:

  1. The model finishes generating the story, and only then does the TTL start counting, giving you 10 seconds to ask a follow-up question

Actual outcome:

  1. llama-swap prints a "!!! Unloading model Qwen2.5-Coder-32B-Instruct-Q4_K_S, TTL of 10 reached." message midway through the generation. Thankfully it does not unload the model while it's still generating.
  2. But it unloads the model the instant the response is done, so asking a follow-up question forces the model to reload.

Suggested fix:

  1. Consider the model idle when it finishes processing all requests, and start counting towards the TTL when that happens.
  2. Consider the model busy as soon as a new request comes in. The model is considered busy until it finishes. Only then will the TTL start counting.
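
To make the suggested fix concrete, here is a minimal sketch of the idle-tracking logic in Python. It is not llama-swap's actual code; the class `IdleTracker` and its method names are hypothetical, and the clock is injectable only so the example can be stepped through deterministically.

```python
import threading
import time


class IdleTracker:
    """Hypothetical helper: counts in-flight requests and remembers when the model went idle."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock              # injectable so a fake clock can drive the example
        self._lock = threading.Lock()
        self.in_flight = 0
        self.idle_since = clock()       # the model starts out idle

    def request_started(self):
        # Busy as soon as a request comes in: the TTL countdown is suspended.
        with self._lock:
            self.in_flight += 1
            self.idle_since = None

    def request_finished(self):
        with self._lock:
            self.in_flight -= 1
            if self.in_flight == 0:
                # Only when the last request is done does the countdown begin.
                self.idle_since = self.clock()

    def should_unload(self):
        with self._lock:
            if self.in_flight > 0 or self.idle_since is None:
                return False
            return self.clock() - self.idle_since >= self.ttl_seconds


# Replay the scenario from this report with a fake clock: TTL of 10 s, a 25 s generation.
now = [0.0]
tracker = IdleTracker(ttl_seconds=10, clock=lambda: now[0])

tracker.request_started()                        # t=0: story generation begins
now[0] = 25.0
during_generation = tracker.should_unload()      # False: still busy, even though 25 s > TTL
tracker.request_finished()                       # t=25: generation ends, countdown starts
now[0] = 30.0
five_seconds_idle = tracker.should_unload()      # False: only 5 s idle
now[0] = 35.0
ten_seconds_idle = tracker.should_unload()       # True: 10 s idle, TTL reached
```

With this approach the time spent generating never counts against the TTL, so a follow-up question asked within 10 seconds of the response finishing would not trigger a reload.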
@mostlygeek
Owner

Thanks for reporting this. I think I know exactly where the issue is. I’ll take a look.

@mostlygeek
Owner

This should be fixed in v0.1.5.
