
Azure Embedding Quota Limit #936

Closed
danieldekay opened this issue Oct 21, 2024 · 8 comments

Comments

@danieldekay
Contributor

Describe the bug
I am running a detailed report with Azure OpenAI and am hitting quota limits. Although I have a rate limit of 500k tokens per minute configured, GPT Researcher still throws an error and does not handle the throttling gracefully.

openai.RateLimitError: Error code: 429 - {'error': {'code': '429', 'message': 'Requests to the Embeddings_Create Operation under Azure OpenAI API version 2024-02-15-preview have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 86400 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.'}}

The error message itself is presumably misleading, as my rate limits are not per day, and waiting 24 hours is not an option.

Expected behavior

  • The log to the user (e.g. websocket) should indicate that there is throttling in place.
  • The embedding should resume after a while, or indicate what the user can do instead.
@roninio
Contributor

roninio commented Oct 27, 2024

I have the same issue.

@ElishaKay
Collaborator

ElishaKay commented Oct 30, 2024

Fair point, we'll have to think about how to make this smoother.

A) @danieldekay, are these the same docs that you're running reports on?

Have a look at this PR: "Documents, crawled urls, and website will be chunked and loaded to the inputted vector store if vector_store is not None."

#838

Meaning, if you run GPTR with the same LangChain vector store, it may cut down on repeated embedding calls.

B) The "cooling off" feature is also a good idea. Did you mention somewhere that there's a LangChain method we can leverage to get the required "cool off" period?

Once we have that, we can go about adding the websocket message. Adding an exception handler block that publishes a websocket message to the frontend would also be a good first step.
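
For illustration, a minimal sketch of what such an exception handler could look like. Everything below is an assumption: the embed_with_retry helper, the websocket.send_json call, and the message shape are hypothetical stand-ins, not GPTR's existing API.

import asyncio

import openai

async def embed_with_retry(embeddings, texts, websocket=None, max_retries=5):
    # Retry on 429s with exponential backoff, notifying the frontend each time.
    delay = 2.0
    for _ in range(max_retries):
        try:
            # embed_documents is synchronous, so run it off the event loop.
            return await asyncio.to_thread(embeddings.embed_documents, texts)
        except openai.RateLimitError:
            if websocket is not None:
                # Hypothetical message shape; adapt to the real websocket protocol.
                await websocket.send_json(
                    {"type": "logs", "output": f"Embedding rate limit hit, retrying in {delay:.0f}s"}
                )
            await asyncio.sleep(delay)
            delay *= 2  # exponential backoff
    raise RuntimeError("Embeddings still rate limited after retries")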

@danieldekay
Contributor Author

@ElishaKay - it's a standard web research report based on Bing.

LangChain has support for a rate limiter:
https://python.langchain.com/docs/how_to/chat_model_rate_limiting/

Maybe this is also an option:
https://www.perplexity.ai/search/when-i-am-embedding-documents-zjPsfHmgRk.KVOIf4xXaQQ#0
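
From that doc, the chat-model side looks roughly like the sketch below. The deployment name and rate numbers are made up and would need tuning to the actual Azure quota.

import os

from langchain_core.rate_limiters import InMemoryRateLimiter
from langchain_openai import AzureChatOpenAI

# Roughly one request every two seconds; tune to the deployment's quota.
rate_limiter = InMemoryRateLimiter(
    requests_per_second=0.5,
    check_every_n_seconds=0.1,
    max_bucket_size=10,
)

llm = AzureChatOpenAI(
    azure_deployment="gpt-4o",  # hypothetical deployment name
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    rate_limiter=rate_limiter,
)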

@ElishaKay
Collaborator

Awesome.

Adding to the resilience channel on Discord.

For anyone reading who hasn't joined the Discord, join here to access the above link.

@roninio
Contributor

roninio commented Oct 31, 2024

I solved the issue by constructing the embeddings client explicitly:

import os
from langchain_openai import AzureOpenAIEmbeddings

_embeddings = AzureOpenAIEmbeddings(
    model=model,
    timeout=60,
    chunk_size=1000,
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    openai_api_key=os.environ["AZURE_OPENAI_API_KEY"],
    openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    **embedding_kwargs,  # any extra keyword arguments passed through unchanged
)

and changing the embedding setting to azure_openai:text-embedding-3-large.

If you think this is the correct solution, I can open a pull request.
The suggested solution (chat_model_rate_limiting) does not work for AzureOpenAIEmbeddings in the current version of LangChain: https://python.langchain.com/docs/how_to/chat_model_rate_limiting/
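
As a stopgap for embeddings, wrapping the call in a retry with backoff also works. A sketch using tenacity; the helper name and backoff numbers are my own, not something GPTR ships:

import openai

from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

@retry(
    retry=retry_if_exception_type(openai.RateLimitError),
    wait=wait_exponential(multiplier=2, min=4, max=120),  # wait from 4s up to 2 minutes between attempts
    stop=stop_after_attempt(6),
)
def embed_documents_throttled(embeddings, texts):
    # embeddings is the AzureOpenAIEmbeddings instance built above.
    return embeddings.embed_documents(texts)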

@Youbiquitous

Has anyone solved this?

@ElishaKay
Collaborator

ElishaKay commented Nov 9, 2024

Sure @roninio,

Green light for the PR - maybe we should also set a default azure embedding model in the config?

There's a good chance this is also the cause of a problem for the OpenAI API - i.e. we should upgrade the embedding model there as well.

Sounds like we should edit that file to:

match os.environ["EMBEDDING_PROVIDER"]:
    case "openai":
        self.embedding_model = "text-embedding-3-large"
    case "azure_openai":
        self.embedding_model = "text-embedding-3-large"

@roninio
Contributor

roninio commented Nov 11, 2024

Hi, I created pull request #979.
In the pull request I also updated the documentation regarding Azure.
