Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ClientModel: Change approach to buffering response.Content #41080

Closed
annelo-msft opened this issue Jan 8, 2024 · 3 comments · Fixed by #41772
Closed

ClientModel: Change approach to buffering response.Content #41080

annelo-msft opened this issue Jan 8, 2024 · 3 comments · Fixed by #41772
Assignees
Labels
Azure.Core Client This issue points to a problem in the data-plane of the library. CoreDi 🚀 🔮 System.ClientModel Base Core library
Milestone

Comments

@annelo-msft
Copy link
Member

annelo-msft commented Jan 8, 2024

Problem Description

The APIs we have today in Azure.Core around Response.Content and Response.ContentStream and buffering are not transparent. It is not clear to the end-user which responses are buffered and which are not. The only way for users to discover this is to read the docs for a service method.

High-level proposal

Make a change to the current implementation of the PipelineResponse.Content property such that we no longer throw if the response has not been buffered (i.e. PipelineMessage.BufferResponse has been set to false by the client-author). This means that end-users can always access PipelineResponse.Content and if it is not buffered, it will buffer the response in the response.Content field. In the rare situation where response.Content is larger than what we can fit in memory, the user will either get an IOException or -- since we will have wrapped the live network stream in a ReadTimeoutStream -- the buffering will throw a TaskCanceledException due to the network timeout being applied.

Implementation proposal

(From @KrzysztofCwalina and under investigation per feasibility in #41693 and related PRs)

Could we do the following?

In the case of unbuffered responses:

  1. Service method creates, composes, and sends an instance of HttpMessage.
  2. The transport stashed HttpRequestMessage in the Response
  3. Service method returns Response to the user.
  4. If the user accesses Content first, we read the live stream, buffer into byte[], and return BinaryData wrapping the byte[]. If a subsequent call accesses ContentStream we either throw or return MemoryStream wrapping the byte[]
  5. If the user accesses ContentStream first, any subsequent access to Content or ContentStream will throw InvalidOperationException

In the case of buffered responses:

  1. Service method creates, composes, and sends an instance of HttpMessage.
  2. The transport reads the live stream into a byte[], stashes the byte[] in Response, disposes the HttpRequestMessage
  3. Service method returns Response to the user.
  4. If the caller accesses Content, we return BinaryData wrapping the byte[]
  5. If the caller accesses ContetnStream, we return MemoryStream wrapping the byte[]

Prior notes

See this comment thread for details: #41016 (comment)

This issue tracks the work to change how response buffering works for ClientModel responses. @KrzysztofCwalina would like to optimize for buffered content in the BinaryData response.Content property and not retain the MemoryStream we used to buffer the content in the ContentStream property. In a greenfield scenario, this would be great, but with a lot of code built on top of Azure.Core's response.ContentStream property and the idea that Content throws if ContentStream doesn't hold the buffered MemoryStream, we would likely need to go through breaking change review with the arch board to assess the feasibility of this change.

A proposed implementation that helps identify the CI tests that are affected by this change can be found here: #41047

@annelo-msft
Copy link
Member Author

annelo-msft commented Feb 1, 2024

Initial notes from work on #41693 and annelo-msft#13:

  • The proposal assumes we have moved buffering into the transport (tracked by ClientModel: Investigate feasibility of moving response buffering into the transport #41198)
  • Buffering the network stream in the response.Content property means that we must do this synchronously. We may be able to do this with the cancellation token on the message if we store it as an internal property on the response.
  • We may not be able to provide a new implementation for response.ContentStream in the base type PipelineResponse, because this is an abstract property in Azure.Core's Response, which means all existing implementations have overridden any implementation we might provide there if we changed this to virtual. So, a lazily-created stream need to be created in the derived type.
  • It would be great if we could have the simplified story around disposing the streams -- if we've buffered the response (message.BufferResponse was true), the transport called Dispose on the network stream; if we didn't buffer the response -- we either Dispose the network stream when the user calls response.Content or it is the responsibility of the end-user to dispose response.ContentStream after reading from it (the streaming APIs where we set message.BufferRespose to false are uncommon and low-level/advanced). This is complicated by the public method ExtractResponseContent on HttpMessage and the assumption that that will be called if/when we pass the ContentStream to the end user in a Response from a service method. Today, if that is not called, the response is expected to dispose the ContentStream if it is not buffered in its own Dispose method.

I am still working on an end-to-end solution to complete the feasibility assessment.

@annelo-msft
Copy link
Member Author

annelo-msft commented Feb 7, 2024

Response.Content

Problem

Azure.Core-based client protocol methods return Response. Without reading the docs, a caller of such a method may not know whether the response content has been buffered or not. Today, in Azure.Core, if the response content has not been buffered, accessing the Response.Content property will throw an InvalidOperationException.

The problem with throwing from the Response.Content property is that it is not idiomatic to .NET. In .NET, APIs are either fully polymorphic (i.e. they just work), or we supply "testers" to know in advance whether calling the API will throw.

Since we plan to ship PipelineResponse in the System.ClientModel namespace, we must make the type align with design principles the BCL team uses for .NET types in System.* namespaces. When we integrate Azure.Core with System.ClientModel, and PipelineResponse becomes the base type of Azure.Core's Response type, changes we make to the Content property will take effect in Azure.Core as well.

We discuss proposals to address this problem below and propose a solution based on this discussion at the end of this comment.

Proposals

1. Make Response.Content "always work"

In this proposal:

  • For buffered responses

    • If the caller accesses Content, we return a BinaryData wrapping the buffered byte[] content
    • If the caller accesses ContentStream, we return a MemoryStream wrapping the buffered byte[] content
  • For unbuffered responses

    • If the caller accesses Content first, we read the live stream, buffer into byte[] and return a BinaryData wrapping the byte[] content
      • If a subsequent call accesses ContentStream, we return a MemoryStream wrapping the byte[] content
    • If the caller accesses ContentStream first, we return the live stream.
      • If a subsequent call accesses ContentStream again, we return the same stream, but if it was a network stream, it can no longer be read from.
      • If a subsequent call accesses Content, the Content property will throw InvalidOperationException because we cannot buffer the content after the network stream was read.

Advantages:

  • In most cases, callers of protocol methods can access response.Content and it will just work. This is idiomatic to .NET.

Disadvantages:

  • To read the bytes from the network stream in the Content property in the unbuffered case, we must read the stream synchronously.
  • If the network stream is too large to fit in memory in the unbuffered case, we will throw an IOException. This is a different exception type than we throw today in Azure.Core.

2. Continue to throw from Response.Content

  • For buffered responses

    • If the caller accesses Content, we return a BinaryData wrapping the byte[] content
    • If the caller accesses ContentStream, we return a MemoryStream wrapping the byte[] content
  • For unbuffered responses

    • If the caller accesses Content, throw an InvalidOperationException
    • If the caller accesses ContentStream, we return the live stream.

Advantages:

  • We do not try to synchronously read from a network stream, or risk loading too much data into memory.

Disadvantages:

  • We ship a type in System.ClientModel that is not idiomatic to .NET because it is not fully polymorphic and it does not provide a tester.

3. Add a tester to Response

In this approach, we use either option 1 or option 2 above, and we add a property IsBuffered to PipelineResponse that is inherited by Response. Callers of protocol methods who are uncertain whether to call Content or ContentStream can call IsBuffered to understand whether it is safe to access Content.

Advantages:

  • The type is idiomatic to .NET in that it provides a "tester" API to prevent accessing a member on the type that might throw an exception

Disadvantages:

  • End users might think they always have to check IsBuffered before accessing the Content property when it is uncommon that responses are unbuffered. This could make call-sites overly complex.

4. Introduce a StreamingResponse type

In this approach, we introduce a StreamingPipelineResponse type with only the ContentStream property, and make it the base type of the Response hierarchy. Azure.Core-based clients would return StreamingPipelineResponse from protocol methods. ClientModel-based clients would continue to return ClientResult from protocol methods.

This would look as follows:

public abstract class StreamingPipelineResponse {
    Stream? ContentStream { get; set; }
}

public abstract class PipelineResponse : StreamingPipelineResponse { 

    // Inherits ContentStream
    BinaryData Content { get; }
}

public abstract class Response : PipelineResponse { 
    // Inherits both ContentStream and Content
}

Azure.Core protocol methods would need to return StreamingPipelineResponse, which would have different header types than Azure.Core Response header types, which could lead to confusion.

It may also not work with protocol methods on ClientModel-based clients. This is because ClientModel-based clients return ClientResult from protocol methods, not PipelineResponse. ClientResult.GetRawResponse() would have to return the base type -- i.e. a StreamingPipelineResponse, which the user would have to down-cast to a PipelineResponse for the majority of use cases.

Advantages:

  • Elegant solution for Azure.Core-based libraries

Disadvantages:

  • Azure.Core users must use a ClientModel response type with this class of protocol methods, which may be confusing
  • Unclear what Response<T>.GetRawResponse() would return in this case -- we would still have a Content property on Response to deal with
  • Does not solve discoverability for ClientModel-based clients, and makes use of protocol methods much harder for the 99% use case

Recommendation

I recommend going with option 1 for the following reasons:

  • If users run into problems with this approach, we simply add the tester described in option 3, and we are done.
  • The instance of protocol methods for streaming APIs in clients is incredibly rare. We only provide these for scenarios where a service wants to expose a web API and needs to stream the response content from one service (like blob storage or container registry) through their service to an end user. Users of these methods are likely to use convenience methods instead of protocol methods, and if they do use protocol methods, they not only need to know to use the lower-level ContentStream API, but also to handle the response returned from the protocol method differently -- i.e. they must dispose the response so that the network stream gets closed. Given this, the likelihood of issues with using the Content property in the vast majority of cases is very small, and if it does truly become a problem for customers, we add a tester as described above.

@annelo-msft
Copy link
Member Author

We settled on this as the public API for PipelineResponse:

public abstract class PipelineResponse
{
    public abstract Stream? ContentStream { get; set; }

    public abstract BinaryData Content { get; }

    public abstract BinaryData ReadContent(CancellationToken cancellationToken = default);

    public abstract ValueTask<BinaryData> ReadContentAsync(CancellationToken cancellationToken = default);
}

The tester provided is message.BufferResponse. This is sufficient for policies to check in order to know whether or not the response was buffered so they don't call ReadContent when holding a live network stream and load too much data into memory. End-users of protocol methods exposing live network streams are expected to read the documentation of the protocol method to know that it exposes a live network stream, and only call ReadContent to buffer it if they expect the response content to be small enough to load into memory.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Azure.Core Client This issue points to a problem in the data-plane of the library. CoreDi 🚀 🔮 System.ClientModel Base Core library
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant