Specify a "Content-Type" header in response streams so that Cloudflar… #305
By default, the upstream "ai" module (https://github.com/vercel/ai/) returns the response from the Azure OpenAI API with a "text/plain" Content-Type. If your azurechat instance is hosted behind a cloudflared tunnel (possibly Cloudflare in general, though I haven't tested that), the response is buffered by cloudflare(d) and returned all at once.
Besides not being the expected experience, this can lead to timeouts (the Cloudflare default is 100s) if the response is long or slow to generate. When the response is streamed instead, there is enough data flowing to avoid this timeout.
Cloudflared inspects the Content-Type response header; if it is "text/event-stream", it knows not to buffer the response and forwards each chunk to the requestor immediately. Reference.
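A minimal sketch of the fix, assuming Node 18+ Web APIs: the "ai" package's `StreamingTextResponse` accepts a standard `ResponseInit`, so the route handler only needs to supply explicit headers. The `streamHeaders` name and the plain `Response` used for illustration here are hypothetical, not part of the actual diff.

```typescript
// Headers to attach to the streaming response. "text/event-stream"
// signals cloudflared to pass chunks through instead of buffering.
const streamHeaders: HeadersInit = {
  "Content-Type": "text/event-stream",
  // Also discourage intermediary caching of the stream.
  "Cache-Control": "no-cache",
};

// In the actual route handler this would look roughly like:
//   return new StreamingTextResponse(stream, { headers: streamHeaders });
// Here we build a plain Response to show the header lands on the wire.
const res = new Response("data: hello\n\n", { headers: streamHeaders });
console.log(res.headers.get("Content-Type")); // "text/event-stream"
```

Setting the header on the `ResponseInit` rather than patching the upstream module keeps the change local to azurechat's route handler.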