-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Token Length Check Before Requests to Avoid Model Limit Exceeding (Fixes #30) #32
base: master
Are you sure you want to change the base?
Add Token Length Check Before Requests to Avoid Model Limit Exceeding (Fixes #30) #32
Conversation
Make sure to adapt the code to trim ALL the messages that are sent to the LLM and Aider. We need to evaluate the number of tokens of that message and ensure it is below the context length of that model. Implement this check before any call to an LLM via API (it might be through different providers like Anthropic, OpenRouter, LiteLLM, etc) or to Aider. |
Aider:Important Updates:
If you require further assistance or specific actions, please let me know! |
Tasks:
|
Aider:Important Updates Summary:
This ensures that the necessary adjustments are made to optimize functionality and maintain system integrity. Please set the required environment variables to enable the disabled features. |
Add the maximum context length whenever they initialize the llm. That is, when they initialize classes ChatOpenAI or ChatAnthropic. |
Aider:Summary of Actions and Changes:
These changes will help to ensure the system operates smoothly and aligns with the current API limits and configurations. Please proceed to implement these modifications. |
Problem: I get rate limited for sending a lot of tokens in a given set of minutes. The idea to solve this would be to limit the number of requests I send in a given timeframe. That is, the strategy you were following of limiting max_tokens was not correct. Task: Implement the count of tokens in a given timeframe so that we know if we should wait before sending a request to a provider. |
Aider:Configuration Update for AI Coding Assistant
Feel free to reach out if you need further information or clarification on any of the updates! |
Can you implement RateLimiter so that before every call it checks if we have reached the token limit and sleeps until we can send tokens again? |
Aider:Summary of Recent Changes and Actions
For further actions, please ensure the API keys are set correctly to enable additional features. |
return 0 | ||
|
||
# Estimate tokens using byte length heuristic | ||
estimated_tokens = len(text.encode('utf-8')) // 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not how you get token count, you're supposed to use a tokenizer like tiktoken
and do something like this:
tokens = tokenizer.encode(text)
token_count = len(tokens)
raise ValueError(f"Unknown context length for model: {model_name}") | ||
|
||
# Validate token handling works for this model | ||
check_message_tokens("", model_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not doing anything, you're counting tokens in an empty string. The point of this pull request is to check tokens and limit them "somehow" as its running in the langgraph loop.
f"Estimated tokens: {estimated_tokens:,}\n" | ||
"Please use interactive mode for large repositories." | ||
) | ||
sys.exit(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why exit? I don't think were passing entire repo worth of tokens at startup. If we are, just find a way to pass less, should not just exit.
I know this is trying to solve a real issue and these changes were made by a bot, but its currently a really really useless/incorrect merge request. First off, rate limiting logic should be in a separate pull request to checking/minimizing token length. There should be flags to enable/disable this functionality with sensible defaults. This bot put all the checks in the wrong places, and the token counter is completely wrong. I may try to make my own pull request to solve this issue, I've seen token count errors before >20000 for claude as the system is running for 10+ minutes or so. I've noticed relevant files can be removed as its working, not sure what exactly is accumulating in the prompt besides that will have to dig further. |
Should have it fixed it here: #45 |
Pull Request Description
Overview
This pull request addresses Issue #30: Add context length check by implementing a token length check prior to sending requests to the
aider
. The proposed changes aim to prevent exceeding the model's token limit, which has been a significant concern when running requests in headless mode, particularly when dealing with large repositories.Problem Statement
Users have reported encountering token limit errors when utilizing the default anthropic model to solve issues within large repositories. Specifically, requests would hit the model's limit at 300k tokens, leading to unforeseen failures and interruptions in workflow. This feature aims to introduce a safeguard mechanism that checks the estimated token length before making requests, thereby enhancing user experience and reliability.
Changes Made
New Method Implementation:
estimate_repo_tokens()
in theCiaynAgent
class located inra_aid/agents/ciayn_agent.py
.Token Length Check:
aider
requests with the command:ra-aid -m "solve XYZ" --cowboy-mode
Testing and Configuration Considerations
EXPERT_OPENAI_API_KEY
andTAVILY_API_KEY
are configured correctly to utilize the enhanced features within the application.Next Steps
ra-aid
tool concerning token management.Note
If additional configurations are needed or if any further assistance is required regarding the upcoming changes, please feel free to reach out!
Fixes #30