Azure OpenAI: remove SseReader's use of subtly blocking StreamReader.EndOfStream #41844
Conversation
API change check: API changes are not detected in this pull request.
Thanks for the fix
The more general impact is scalability: even though it looks like it's async, it's actually doing synchronous I/O, blocking the thread while waiting for data to arrive rather than allowing the thread to do other things in the meantime. What you cite is also impactful to streaming latency and is how I first noticed an issue: returning one of these IAsyncEnumerables from an ASP.NET minimal API results in that blocking behavior negatively impacting how buffering happens in the implicit JSON serialization, causing the streamed tokens to be batched rather than written to the wire individually.
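The pitfall described above is not specific to .NET, so here is a small Python `asyncio` sketch of the same failure mode: a blocking call hidden inside an "async" iterator stalls the event loop, so nothing else (including output flushing) can interleave with the stream. The generator and probe names are illustrative, not from the PR.

```python
# Analogy (Python, illustrative names): blocking I/O inside an async iterator
# freezes the event loop, just as StreamReader.EndOfStream did in the SSE reader.
import asyncio
import time

async def tokens_blocking():
    for t in ["a", "b", "c"]:
        time.sleep(0.05)           # synchronous wait: the event loop is stuck here
        yield t

async def tokens_async():
    for t in ["a", "b", "c"]:
        await asyncio.sleep(0.05)  # truly asynchronous wait: the loop stays free
        yield t

async def probe(ticks):
    # Ticks whenever the event loop is free; a stalled loop produces few ticks.
    while True:
        ticks.append(1)
        await asyncio.sleep(0.01)

async def consume(gen):
    ticks = []
    probe_task = asyncio.create_task(probe(ticks))
    async for _ in gen:
        pass
    probe_task.cancel()
    return len(ticks)

blocked_ticks = asyncio.run(consume(tokens_blocking()))
free_ticks = asyncio.run(consume(tokens_async()))
print(blocked_ticks, free_ticks)
```

With the blocking generator, the probe only gets to run at the yield points; with the async one, it ticks throughout, which is the difference between batched and per-token streaming output.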
@trrwilson I'm trying to find a nightly build that includes this fix, but it seems this feed from the documentation only carries the beta.13 version without it. Could you point me to where the nightly builds are published?
@joakimriedel the package feeds seemingly prioritize the beta/alpha tagging in semantic versioning over recency, which I've always found a bit odd. If you check the full list of versions, you'll find the nightly/alpha/dev builds beneath the published beta builds. E.g. here's the latest:
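The ordering quirk comes from SemVer precedence rules: pre-release identifiers are compared alphanumerically, so every `alpha.*` build sorts below every `beta.*` build regardless of how recent it is. A naive Python sketch of that comparison (hypothetical helper, covering only the identifiers used here, not the full SemVer spec):

```python
# Simplified SemVer ordering sketch: "alpha" < "beta" alphanumerically, so
# newer alpha/nightly builds sort below older beta builds on the feed.
def prerelease_key(version):
    # Split "1.2.3-tag.n" into ((1, 2, 3), ("tag", n)); real SemVer has more
    # rules (e.g. a release with no pre-release tag sorts above all pre-releases).
    core, _, pre = version.partition("-")
    nums = tuple(int(x) for x in core.split("."))
    ids = tuple(int(p) if p.isdigit() else p for p in pre.split(".")) if pre else ()
    return (nums, ids)

versions = ["1.0.0-beta.13", "1.0.0-alpha.20240101"]
ordered = sorted(versions, key=prerelease_key)
print(ordered)  # the much newer alpha still sorts first (i.e. "lower")
```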
@trrwilson oh 🤦♂️ thanks a lot, that does make sense given the alpha < beta sort order.
Problem

`StreamReader.EndOfStream` introduces a subtle but important bit of blocking I/O into the project's server-sent event reader, causing the underlying `IAsyncEnumerator` for streaming chat completions to invariably report a completed state for its `MoveNext`. This causes issues in downstream consumption scenarios that block on this, e.g. via `JsonSerializer`.

Fix

Rather than use `EndOfStream`, we replace the end-of-stream check with an implicit one performed via a null return from `ReadLine`/`ReadLineAsync`. This produces the desired behavior without inducing the blocking I/O that erroneously makes streaming output appear blocking.

Fixes #41838. Thanks/credit to @stephentoub for the investigation and fix confirmation.
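The shape of the fix, reading until the line-read call itself returns its end-of-stream sentinel instead of asking a separate (and potentially blocking) EOF probe first, can be sketched in Python, where `readline()` returns `""` at end of stream much like .NET's `ReadLine` returns null. This is an analogy with a hypothetical reader function, not the actual `SseReader` code:

```python
# Sketch of the fix's pattern: no separate "are we at EOF?" probe (which would
# itself have to read, and therefore block); instead, loop on the read call and
# treat its sentinel return value as end-of-stream.
import io

def read_sse_lines(stream):
    while True:
        line = stream.readline()
        if line == "":               # sentinel return == end of stream
            return
        yield line.rstrip("\n")

events = list(read_sse_lines(io.StringIO("data: hello\n\ndata: world\n")))
print(events)  # ['data: hello', '', 'data: world']
```

The blank line (an SSE event delimiter) comes through as an empty string, distinct from the `""` returned with no trailing newline at true end-of-stream, which is why the sentinel check must run before stripping.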