Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: make it easier to create custom Stream, TextReader, and TextWriter implementations #64958

Closed
madelson opened this issue Feb 8, 2022 · 6 comments
Assignees
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration
Milestone

Comments

@madelson
Copy link
Contributor

madelson commented Feb 8, 2022

Background and motivation

Deriving confidently from Stream, TextReader, and TextWriter today is fairly difficult and it is easy to shoot yourself in the foot performance-wise. There are many derivations of these classes throughout the framework and in supporting libraries; not all of them do the "right" thing all the time (e. g. dotnet/aspnetcore#38210).

For example, if we derive from Stream we are presented with the following abstract methods to override:

  • CanRead/CanWrite/CanSeek
  • Length/SetLength
  • Position
  • Flush
  • Read(byte[], int, int)
  • Write(byte[], int, int)
  • Seek(long, SeekOrigin)

First off, it seems strange that we need to implement Seek; why can't that have a default implementation based on Position and (for SeekOrigin.End) Length?

Implementing Read(byte[], int, int) and/or Write(byte[], int, int) is pretty straightforward, but we're immediately left with some performance "traps" that will make our stream perform less well than it could.

  • ReadByte/WriteByte allocate a new byte array on every call
  • Read(Span<byte>)/Write(ReadOnlySpan<byte>) which seem like the "right" thing to call these days actually copy to or from a rented buffer and call the byte[] versions. Same thing with the async versions that use Memory<byte>.
  • ReadAsync/WriteAsync just runs the synchronous method in another thread. It also goes through the old BeginRead/BeginWrite methods which are complicated; not sure if they have more overhead than a task-based async implementation would.
  • FlushAsync() also just calls the sync method in another thread.
  • Let's say we want to fix the async issue by overwriting ReadAsync and/or WriteAsync. If we override just the byte[], int, int versions then the Memory versions still copy and any old code that actually calls BeginRead/BeginWrite will still follow the sync-in-another-thread path.

TextWriter/TextReader have mostly all the same issues, but are even more complex because they have more methods to work with. The fact that these route all methods through the single-char methods Write(char) and Read() feels strange, but maybe there is a good reason for this.

I think I understand why things are the way they are (in my understanding it is to preserve compat for people who have overridden these types in the past), but it would be great if we could do better. If this were done in a way that the framework could make use of it, then that would potentially eliminate a lot of fairly redundant code across lots of classes and libraries.

API Proposal

My proposal would be to introduce 3 new classes StreamBase, TextReaderBase, and TextWriterBase to provide "modern" alternatives to the existing base classes. These classes would derive from the existing classes but route between their methods in a different way to simplify derivation:

namespace System.IO
{
    public abstract class StreamBase : Stream
    {
        protected abstract int ReadInternal(Span<byte> span); // name TBD; e.g. could be ReadCore
        protected abstract void WriteInternal(ReadOnlySpan<byte> span);

        // Overrides abstract Read(byte[], int, int), Write(byte[], int, int) to call ReadInternal/WriteInternal

       // Overrides abstract Seek(long, SeekOrigin) to leverage Position and Length

       // Overrides Length and SetLength(long) to throw NotSupportedException by default on the premise that
       // relatively few streams support these "in the wild" (is this accurrate?)

        // Overrides virtual methods from base class so that:
        // * All sync methods flow through ReadInternal/WriteInternal
        // * All non-Memory async methods flow through the Memory async methods
        // * The memory async methods call ReadInternal/WriteInternal/Flush synchronously and return a completed Task

        // Need a better name for this. The idea here is to allow framework implementations that were falling back to the
        // "call sync method from a background thread" pattern for compat. Only needed if (a) such implementations
        // exist and (b) there is interest in them leveraging StreamBase.
        protected virtual bool UseThreadPoolForAsyncOverSyncMethods => false;
    }

    public abstract class TextReaderBase : TextReader
    {
        // ignoring the details for now; can revisit if there is interest in the overall idea
    }

    public abstract class TextWriterBase : TextWriter
    {
        // ignoring the details for now; can revisit if there is interest in the overall idea
    }
}

API Usage

class MyStream : StreamBase
{
    public override bool CanRead => true;
    public override bool CanWrite => true;
    public override bool CanSeek => false;
    
    public override long Position { get => throw new NotImplementedException(); set => throw new NotImplementedException(); }

    public override void Flush() { /* sync flush impl */ }

    protected override int ReadInternal(Span<byte> span) { /* sync read impl */ }
    protected override void WriteInternal(ReadOnlySpan<byte> span) { /* sync write impl */ }

    // optional

    public override Task FlushAsync(CancellationToken cancellationToken) { /* async flush impl */ }
    public override ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken cancellationToken = default) { /* async read impl */ }
    public override ValueTask WriteAsync(ReadOnlyMemory<byte> buffer, CancellationToken cancellationToken = default) { /* async write impl */ }
}

Alternative Designs

An alternate path would be to re-route the calls between methods and provide more virtual overrides in the existing base classes. For compat, this would be opt-in behavior via a constructor parameter or virtual property. The upside is that we avoid introducing new types. Another upside is that framework classes could use the new behavior easily and leverage the toggle to swap back to the old behavior whenever they are derived from. The downside is that it may be difficult for consumers to understand how to leverage the new functionality.

Some other considerations for the general design:

  • We could avoid introducing the Internal methods and just use the existing virtual methods. The downside of this is that the abstract methods guide implementers towards doing the right thing which is nice.
  • A common Stream behavior is for CanRead/CanWrite to start returning false after disposal; we could provide support for this.
  • Right now I have async behavior as "optional", but in today's world one could make an argument that Stream authors should always consider this an make it required instead.

A problem that remains in the current design is that if you do want seek or timeout behavior you have to know about and implement a specific subset of the optional virtual methods. We could unify those into a single method like so:

class StreamBase
{
    // CanSeek, Position, Seek, Length, and SetLength all have sealed implementations which delegate to this.Seeker

    protected virtual ISeeker Seeker => throw new NotSupportedException();

    public interface ISeeker
    {
        bool CanSeek { get; }
        long Position { get; set; }
        long Length { get; set; }      
    }
}

Risks

Fragmentation might increase complexity/confusion instead of decreasing it, although docs could help with this.

@madelson madelson added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Feb 8, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.IO untriaged New issue has not been triaged by the area owner labels Feb 8, 2022
@ghost
Copy link

ghost commented Feb 8, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

Deriving confidently from Stream, TextReader, and TextWriter today is fairly difficult and it is easy to shoot yourself in the foot performance-wise. There are many derivations of these classes throughout the framework and in supporting libraries; not all of them do the "right" thing all the time (e. g. dotnet/aspnetcore#38210).

For example, if we derive from Stream we are presented with the following abstract methods to override:

  • CanRead/CanWrite/CanSeek
  • Length/SetLength
  • Position
  • Flush
  • Read(byte[], int, int)
  • Write(byte[], int, int)
  • Seek(long, SeekOrigin)

First off, it seems strange that we need to implement Seek; why can't that have a default implementation based on Position and (for SeekOrigin.End) Length?

Implementing Read(byte[], int, int) and/or Write(byte[], int, int) is pretty straightforward, but we're immediately left with some performance "traps" that will make our stream perform less well than it could.

  • ReadByte/WriteByte allocate a new byte array on every call
  • Read(Span<byte>)/Write(ReadOnlySpan<byte>) which seem like the "right" thing to call these days actually copy to or from a rented buffer and call the byte[] versions. Same thing with the async versions that use Memory<byte>.
  • ReadAsync/WriteAsync just runs the synchronous method in another thread. It also goes through the old BeginRead/BeginWrite methods which are complicated; not sure if they have more overhead than a task-based async implementation would.
  • FlushAsync() also just calls the sync method in another thread.
  • Let's say we want to fix the async issue by overwriting ReadAsync and/or WriteAsync. If we override just the byte[], int, int versions then the Memory versions still copy and any old code that actually calls BeginRead/BeginWrite will still follow the sync-in-another-thread path.

TextWriter/TextReader have mostly all the same issues, but are even more complex because they have more methods to work with. The fact that these route all methods through the single-char methods Write(char) and Read() feels strange, but maybe there is a good reason for this.

I understand why things are the way they are (in my understanding it is to preserve compat for people who have overridden these types in the past), but it would be great if we could do better. If this were done in a way that the framework could make use of it, then that would potentially eliminate a lot of fairly redundant code across lots of classes and libraries.

API Proposal

My proposal would be to introduce 3 new classes StreamBase, TextReaderBase, and TextWriterBase to provide "modern" alternatives to the existing base classes. These classes would derive from the existing classes but route between their methods in a different way to simplify derivation:

namespace System.IO
{
    public abstract class StreamBase : Stream
    {
        protected abstract int ReadInternal(Span<byte> span);
        protected abstract void WriteInternal(ReadOnlySpan<byte> span);

        // Overrides abstract Read(byte[], int, int), Write(byte[], int, int) to call ReadInternal/WriteInternal

       // Overrides abstract Seek(long, SeekOrigin) to leverage Position and Length

       // Overrides Length and SetLength(long) to throw NotSupportedException by default on the premise that
       // relatively few streams support these "in the wild" (is this accurrate?)

        // Overrides virtual methods from base class so that:
        // * All sync methods flow through ReadInternal/WriteInternal
        // * All non-Memory async methods flow through the Memory async methods
        // * The memory async methods call ReadInternal/WriteInternal/Flush synchronously and return a completed Task

        // Need a better name for this. The idea here is to allow framework implementations that were falling back to the
        // "call sync method from a background thread" pattern for compat. Only needed if (a) such implementations
        // exist and (b) there is interest in them leveraging StreamBase.
        protected virtual bool UseThreadPoolForAsyncOverSyncMethods => false;
    }

    public abstract class TextReaderBase : TextReader
    {
        // ignoring the details for now; can revisit if there is interest in the overall idea
    }

    public abstract class TextWriterBase : TextWriter
    {
        // ignoring the details for now; can revisit if there is interest in the overall idea
    }
}

API Usage

class MyStream : StreamBase
{
    public override bool CanRead => true;
    public override bool CanWrite => true;
    public override bool CanSeek => false;
    
    public override long Position { get => throw new NotImplementedException(); set => throw new NotImplementedException(); }

    public override void Flush() { /* sync flush impl */ }

    protected override int ReadInternal(Span<byte> span) { /* sync read impl */ }
    protected override void WriteInternal(ReadOnlySpan<byte> span) { /* sync write impl */ }

    // optional

    public override Task FlushAsync(CancellationToken cancellationToken) { /* async flush impl */ }
    public override ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken cancellationToken = default) { /* async read impl */ }
   public override ValueTask WriteAsync(ReadOnlyMemory<byte> buffer, CancellationToken cancellationToken = default) { /* async write impl */ }
}

Alternative Designs

An alternate path would be to re-route the calls between methods and provide more virtual overrides in the existing base classes. For compat, this would be opt-in behavior via a constructor parameter or virtual property. The upside is that we avoid introducing new types. Another upside is that framework classes could use the new behavior easily and leverage the toggle to swap back to the old behavior whenever they are derived from. The downside is that it may be difficult for consumers to understand how to leverage the new functionality.

Some other considerations for the general design:

  • We could avoid introducing the Internal methods and just use the existing virtual methods. The downside of this is that the abstract methods guide implementers towards doing the right thing which is nice.
  • A common Stream behavior is for CanRead/CanWrite to start returning false after disposal; we could provide support for this.

Risks

Fragmentation might increase complexity/confusion instead of decreasing it, although docs could help with this.

Author: madelson
Assignees: -
Labels:

api-suggestion, area-System.IO, untriaged

Milestone: -

@DaZombieKiller
Copy link
Contributor

Related: #58216

@jozkee jozkee removed the untriaged New issue has not been triaged by the area owner label Mar 10, 2022
@jozkee jozkee added this to the 7.0.0 milestone Mar 10, 2022
@jozkee jozkee added the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Mar 10, 2022
@jeffhandley jeffhandley modified the milestones: 7.0.0, Future Jul 9, 2022
@jozkee
Copy link
Member

jozkee commented Dec 6, 2022

today is fairly difficult and it is easy to shoot yourself in the foot performance-wise.

@madelson I wrote a proposal with the same goal as yours for Stream #79261, although they are not quite similar, I thought you may be interested.

@madelson
Copy link
Contributor Author

madelson commented Dec 6, 2022

although they are not quite similar

@jozkee they seem pretty similar to me. What do you see as the key differences? Could be interesting to hash those out here and see if we can arrive at one design proposal.

@jozkee
Copy link
Member

jozkee commented Dec 6, 2022

On the main proposal, one difference is the Template Method Pattern which says that we can have the existing methods doing typical validations for arguments.

In the alternative proposals, you are proposing that the old methods funnel into the newer methods but only if certain opt-in flag is specified, which I think is a good idea, but it is less enforceable.

On my issue, the alternative design is still about a new Stream class with new abstracts and the idea of having versioning-like semantics as part of the class name i.e: NewStream2 : NewStream : Stream.

Maybe we can factor-out the main proposals and keep alternative proposals separate.

@madelson
Copy link
Contributor Author

madelson commented Jan 7, 2023

Would it make sense to close this issue in favor of @jozkee's new proposal?

@madelson madelson closed this as completed Jan 7, 2023
@madelson madelson reopened this Jan 7, 2023
@jozkee jozkee closed this as not planned Won't fix, can't repro, duplicate, stale Jan 11, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Feb 10, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration
Projects
None yet
Development

No branches or pull requests

4 participants