
Implement rate limiting #256

Open
Teneroy opened this issue Nov 13, 2024 · 1 comment

Comments

Teneroy commented Nov 13, 2024

Reasoning and Description:
To prevent high spending and to avoid hitting AI Core rate limits, we need to introduce our own rate limiting per cluster that uses our companion.

Tasks:

  1. Make sure our traces in Langfuse carry identifiers of the clusters that use our companion.

  Whenever we receive a request, we need to:

  2. Pull the number of tokens used by the cluster from 00:00 UTC to 23:59 UTC (possible through the Langfuse API) or within a 24-hour window (whichever is easier).
  3. If the number of tokens is higher than a constant we set, return a message to the user saying that they have exceeded their token allowance and should come back after 23:59 UTC, or after X minutes (if we go with the 24-hour approach). See the sketch after the acceptance criteria below.

Acceptance criteria:

  • Traces are identified per cluster
  • Total token usage is pulled within the agreed time range and compared against the constant
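A minimal sketch of how the per-request check could look on the companion side. Assumptions: the companion serves requests over HTTP with FastAPI, a cluster ID is available per request, and `enforce_token_limit`, `usage_lookup`, and `TOKEN_LIMIT_PER_CLUSTER_PER_DAY` are hypothetical names; the actual limit constant still has to be agreed on.

```python
from datetime import datetime, timezone
from typing import Callable

from fastapi import HTTPException  # assumption: the companion exposes its API via FastAPI

# Placeholder daily budget per cluster; the real constant still needs to be agreed on.
TOKEN_LIMIT_PER_CLUSTER_PER_DAY = 1_000_000


def enforce_token_limit(
    cluster_id: str,
    usage_lookup: Callable[[str, datetime, datetime], int],
) -> None:
    """Reject the request if the cluster has already used up today's token budget.

    `usage_lookup` is expected to return the total tokens consumed by the cluster
    between the two timestamps, e.g. by querying the Langfuse API (see the sketch
    in the comment below).
    """
    now = datetime.now(timezone.utc)
    window_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    window_end = window_start.replace(hour=23, minute=59)

    used_tokens = usage_lookup(cluster_id, window_start, now)
    if used_tokens > TOKEN_LIMIT_PER_CLUSTER_PER_DAY:
        minutes_left = max(int((window_end - now).total_seconds()) // 60, 0)
        raise HTTPException(
            status_code=429,
            detail=(
                "Daily token usage for this cluster has been exceeded. "
                f"Please come back after 23:59 UTC (in about {minutes_left} minutes)."
            ),
        )
```

If we go with the rolling 24-hour window instead, only `window_start`/`window_end` change (now minus 24 hours, and "try again in X minutes" derived from the oldest counted observation).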
Teneroy commented Nov 13, 2024

https://api.reference.langfuse.com/#get-/api/public/observations
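A rough sketch of how the day's token usage could be summed through this endpoint, assuming the cluster identifier from task 1 is attached to traces as the `userId` (it could equally be a tag or metadata field). The exact query parameters and the shape of the `usage` object should be verified against the reference above; `get_cluster_token_usage` is just an illustrative name.

```python
import os
from datetime import datetime

import requests

LANGFUSE_HOST = os.environ.get("LANGFUSE_HOST", "https://cloud.langfuse.com")
AUTH = (os.environ["LANGFUSE_PUBLIC_KEY"], os.environ["LANGFUSE_SECRET_KEY"])


def get_cluster_token_usage(cluster_id: str, start: datetime, end: datetime) -> int:
    """Sum the token usage reported by Langfuse observations for one cluster."""
    total, page = 0, 1
    while True:
        # Assumption: the cluster ID is stored as the trace/observation userId.
        # Parameter and field names should be double-checked against the API reference.
        resp = requests.get(
            f"{LANGFUSE_HOST}/api/public/observations",
            auth=AUTH,
            params={
                "userId": cluster_id,
                "fromStartTime": start.isoformat(),
                "toStartTime": end.isoformat(),
                "page": page,
                "limit": 100,
            },
            timeout=30,
        )
        resp.raise_for_status()
        body = resp.json()
        for obs in body.get("data", []):
            usage = obs.get("usage") or {}
            total += usage.get("total") or usage.get("totalTokens") or 0
        if page >= body.get("meta", {}).get("totalPages", 1):
            return total
        page += 1
```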
