Infinity-Runpod-Hyperspeed #4

Merged
merged 2 commits into from
May 30, 2024

Conversation

Contributor

@michaelfeil michaelfeil commented May 19, 2024

Docker for testing: michaelf34/runpod-infinity-worker:0.0.4

I recently added multi-model deployment:

Adds:

  • Pins the Docker image
  • Multiple-model deployment now works natively
  • The async event loop is started once (on the first request), which gives better performance
  • No more warmups
  • Model path is cached
  • Private models run by setting HF_TOKEN
  • Env variables are padded with ; for convenience
  • Optimum (ONNX) / CTranslate2 should also work, but are slightly less performant
  • fp8 inference is supported if you rent an L40S, an H100, or an MI300X+. Needs NVIDIA compute capability sm>=89.

Something that could be useful:

  • Each engine has a queue, and calling .embed adds the request to this queue. To handle backpressure, it may be better to reject new requests instead of queueing them, which gives the runpod-serverless runtime the opportunity to retry, potentially hitting a different worker or scaling out to more workers.
  • A "query" bypass could also be useful: potentially spawning a second, CPU-only duplicate of the model that can handle quick, latency-sensitive queries. Let me know if this would be useful and I will try to prioritize it.
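The backpressure idea in the first bullet could look roughly like the sketch below: a bounded queue that rejects new work when full instead of letting it pile up. This is only an illustration under assumptions, not the engine's actual queue. `BoundedEngineQueue`, `QueueFullError`, and `max_pending` are hypothetical names, and how a rejection is reported back to the serverless runtime is left out because that depends on the runtime.

```python
import asyncio
from typing import List


class QueueFullError(RuntimeError):
    """Hypothetical error raised when the engine queue is at capacity."""


class BoundedEngineQueue:
    """Hypothetical bounded queue that rejects work rather than buffering it."""

    def __init__(self, max_pending: int) -> None:
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=max_pending)

    def submit(self, sentence: str) -> None:
        try:
            # put_nowait never blocks: it raises asyncio.QueueFull when full.
            self._queue.put_nowait(sentence)
        except asyncio.QueueFull:
            raise QueueFullError("engine queue is full, retry later") from None

    def pending(self) -> int:
        return self._queue.qsize()


async def demo() -> List[str]:
    queue = BoundedEngineQueue(max_pending=2)
    outcomes = []
    for sentence in ["a", "b", "c"]:
        try:
            queue.submit(sentence)
            outcomes.append("accepted")
        except QueueFullError:
            # A caller could surface this so the request is retried elsewhere.
            outcomes.append("rejected")
    return outcomes


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Rejecting early keeps per-worker latency bounded, at the cost of pushing retry logic to the caller.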

@michaelfeil michaelfeil mentioned this pull request May 19, 2024
@michaelfeil
Contributor Author

@alpayariyak Ready for review / merge.

@alpayariyak
Contributor

Incredible work, thank you so much @michaelfeil! Will review shortly

@michaelfeil
Contributor Author

@alpayariyak Sorry for pinging, but it would be great to merge this PR as is and add any additional features later if needed, so as not to overload this PR.

@alpayariyak alpayariyak merged commit f40926b into runpod-workers:main May 30, 2024
1 of 3 checks passed
@alpayariyak
Contributor

Hey @michaelfeil, lmk if there's anything you'd like to see before I cut an official release, but should be all good!

@michaelfeil michaelfeil deleted the runpod-hyperspeed branch May 30, 2024 16:16
2 participants