Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Add SubStream type to get a slice of a larger stream #83340

Open
jozkee opened this issue Mar 13, 2023 · 4 comments
Open

[API Proposal]: Add SubStream type to get a slice of a larger stream #83340

jozkee opened this issue Mar 13, 2023 · 4 comments
Assignees
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO
Milestone

Comments

@jozkee
Copy link
Member

jozkee commented Mar 13, 2023

Background and motivation

It is generally useful to have a Stream wrapper for read/write a specific part of a larger Stream, and it would be beneficial to provide an implementation that can be easily used and improved.

There has been multiple request asking for such a type, see https://stackoverflow.com/a/30494628/4231460.
It also came up on the Twitter survey that @adamsitnik posted: #58216 (comment).

Additionally, we currently have similar internal implementations that read a portion of a stream.

API Proposal

namespace System.IO
{
    public sealed class SubStream : Stream
    {
        public SubStream(Stream baseStream, long length, bool canWrite) { }

        public override bool CanRead { get { } }
        public override bool CanSeek { get { } }
        public override bool CanWrite { get { } }
        public override long Length { get { } }
        public override long Position { get { } set { } }
        public override void Flush() { }
        public override int Read(byte[] buffer, int offset, int count) { }
        public override int Read(Span<byte> buffer) { }
        public override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) { }
        public override async ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken cancellationToken = default) { }
        public override int ReadByte() { }
        public override long Seek(long offset, SeekOrigin origin) { }
        public override void SetLength(long value) { }
        public override void Write(byte[] buffer, int offset, int count) { }
        public override void Write(ReadOnlySpan<byte> buffer) { }
        public override Task WriteAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) { }
        public override ValueTask WriteAsync(ReadOnlyMemory<byte> buffer, CancellationToken cancellationToken = default) { }
        public override void WriteByte(byte value) { }
    }
}

Notes

I considered having a bool leaveOpen flag but it seems to me that most of the time you create a substream, you are not done working with the base stream and hence, you will almost never use leaveOpen = false.

Having the canWrite parameter is useful because you may want to pass the SubStream as read-only where the baseStream supports writing. This is how the SubReadStream in IO.Compression currently works.

I sketched a simple implementation of this proposal and used it to replace internal SubReadStream on System.Formats.Tar and System.IO.Compression. While it could be feasible to completely replace the internal types with this one, they have implementation details that will need to be removed if we want to use this more-generic type. e.g:

  • SubReadStream from Tar has HasReachedEnd property and it throws EndOfStreamException when superStream advances to next entry. This is an implementation detail we don't want to expose in this API.
  • SubReadStream from Zip throws NotSupportedException when attempting to write/flush/seek the SubStream (the base stream does support it). Throwing on write can be addressed by the bool canWrite param in the ctor but I'm not sure if we want to expose bool canSeek and bool canFlush as well.

API Usage

Pass a subset of your current stream without buffereing.

var subStream = new SubStream(baseStream, length: 100, canWrite: true);
DoSomethingWith(subStream);
// If you want to start at an specific position you need to move it yourself
baseStream.Position += 100;
var subStream = new MySubStream(baseStream, length: 200, canWrite: true);
DoSomethingWith(subStream);

Alternative Design 1

Use a factory e.g:

public static class StreamFactory
{
    // alternative name: Slice
    public static Stream CreateSubStream(Stream baseStream, long length, bool canWrite) { }
}

Alternative Design 2

Having a length parameter for a DelegatingStream.ctor could be an alternative to having a fully-fledged SubStream implementation but that would overload (even more) the proposal in #43139.

Risks

I'm not sure how a Seek operation on a SubStream should work. The guideline in the Stream docs is "Seeking to any location beyond the length of the stream is supported". But it appears to me that allowing seeking past the specified lenght may result in a pit of failure, but let me know what you think.

@jozkee jozkee added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Mar 13, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Mar 13, 2023
@ghost
Copy link

ghost commented Mar 13, 2023

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

It is generally useful to have a Stream wrapper for read/write a specific part of a larger Stream, and it would be beneficial to provide an implementation that can be easily used and improved.

There has been multiple request asking for such a type, see https://stackoverflow.com/a/30494628/4231460.
It also came up on the Twitter survey that @adamsitnik posted: #58216 (comment).

Additionally, we currently have similar internal implementations that read a portion of a stream.

API Proposal

namespace System.IO
{
    public sealed class SubStream : Stream
    {
        public SubStream(Stream baseStream, long length, bool canWrite) { }

        public override bool CanRead { get { } }
        public override bool CanSeek { get { } }
        public override bool CanWrite { get { } }
        public override long Length { get { } }
        public override long Position { get { } set { } }
        public override void Flush() => _baseStream.Flush();
        public override int Read(byte[] buffer, int offset, int count) { }
        public override int Read(Span<byte> buffer) { }
        public override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) { }
        public override async ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken cancellationToken = default) { }
        public override int ReadByte() { }
        public override long Seek(long offset, SeekOrigin origin) { }
        public override void SetLength(long value) { }
        public override void Write(byte[] buffer, int offset, int count) { }
        public override void Write(ReadOnlySpan<byte> buffer) { }
        public override Task WriteAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) { }
        public override ValueTask WriteAsync(ReadOnlyMemory<byte> buffer, CancellationToken cancellationToken = default) { }
        public override void WriteByte(byte value) { }
    }
}

Notes

I considered having a bool leaveOpen flag but it seems to me that most of the time you create a substream, you are not done working with the base stream and hence, you will almost never use leaveOpen = false.

Having the canWrite parameter is useful because you may want to pass the SubStream as read-only where the baseStream supports writing. This is how the SubReadStream in IO.Compression currently works.

I sketched a simple implementation of this proposal and used it to replace internal SubReadStream on System.Formats.Tar and System.IO.Compression. While it could be feasible to completely replace the internal types with this one, they have implementation details that will need to be removed if we want to use this more-generic type. e.g:

  • SubReadStream from Tar has HasReachedEnd property and it throws EndOfStreamException when superStream advances to next entry. This is an implementation detail we don't want to expose in this API.
  • SubReadStream from Zip throws NotSupportedException when attempting to write/flush/seek the SubStream (the base stream does support it). Throwing on write can be addressed by the bool canWrite param in the ctor but I'm not sure if we want to expose bool canSeek and bool canFlush as well.

API Usage

Pass a subset of your current stream without buffereing.

var subStream = new SubStream(baseStream, length: 100, canWrite: true);
DoSomethingWith(subStream);

Alternative Designs

Having a length parameter for a DelegatingStream.ctor could be an alternative to having a fully-fledged SubStream implementation but that would overload (even more) the proposal in #43139.

Risks

N/A

Author: Jozkee
Assignees: -
Labels:

api-suggestion, area-System.IO

Milestone: -

@jozkee jozkee added this to the 8.0.0 milestone Mar 13, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Mar 13, 2023
@jozkee jozkee self-assigned this Mar 13, 2023
@danmoseley
Copy link
Member

Would it make sense for the constructor to take an offset rather than seeking manually?

Would the wrapper stream verify before each operation that the offset of the underlying stream didn't get moved underneath it? Would that throw?

@jozkee
Copy link
Member Author

jozkee commented Mar 13, 2023

Would it make sense for the constructor to take an offset rather than seeking manually?

If we do that, that .ctor would be useless for unseekable streams. Not saying that's a bad thing, though.

Would the wrapper stream verify before each operation that the offset of the underlying stream didn't get moved underneath it?

We could do it or we couldn't, even perhaps we could have a bool arg (ensurePosition) that specifies which behavior a user wants.
I looked at BufferedStream to see how it reacts to the base stream moving, and it does not validate it. Same with other streams. Although, the internal SubReadStream used in System.IO.Compression does adjust the position each time Read is called.

if (_superStream.Position != _positionInSuperStream)
_superStream.Seek(_positionInSuperStream, SeekOrigin.Begin);

@jozkee jozkee changed the title [API Proposal]: SubStream [API Proposal]: Add SubStream type to get a slice of a larger stream Mar 14, 2023
@jeffhandley
Copy link
Member

We're moving this out to Future and it will be considered again for .NET 9.

@jeffhandley jeffhandley modified the milestones: 8.0.0, Future May 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO
Projects
None yet
Development

No branches or pull requests

3 participants