Proposal: using ArrayPool<byte> in MemoryStream with new APIs #22428

stephentoub · 2017-06-22T18:16:33Z

AB#1244354
When not constructed with a specific byte[], MemoryStream allocates byte[]s every time it needs to grow. It would be tempting to just change the implementation to use ArrayPool<byte>.Shared.Rent to get that array, but this is problematic for a few reasons, mostly to do with existing code:

It's fairly common to not dispose MemoryStreams, as currently the Dispose is effectively a nop. But with wrapping buffers from ArrayPool<byte>, it's important to release the currently used buffer back to the pool when the stream is disposed. And it would be expensive to make MemoryStream finalizable to deal with this.
MemoryStream by default allows the buffers it creates to be exposed through its TryGetBuffer method. That means folks today could be taking and using these buffers, potentially even after the MemoryStream is disposed, which could lead to code using the array after the array is returned to the pool and potentially used by someone else.

One solution would just be to introduce a new Stream type, e.g.

namespace System.IO
{
    public sealed class ArrayPoolStream : Stream
    {
        public ArrayPoolStream() : this(ArrayPool<byte>.Shared) { }
        public ArrayPoolStream(ArrayPool<byte> pool);
        ~ArrayPoolStream();

        public ArraySegment<byte> DangerousGetArray();

        ... // override all the relevant members on Stream
    }
}

That has its own downsides, though. In particular, there's something nice about this just being a part of MemoryStream, being easily used in existing code that uses MemoryStreams, etc. Another option would be to just plumb this into MemoryStream, but in an opt-in manner, and accept some of the resulting limitations because they're opt-in:

Add a ctor: public MemoryStream(ArrayPool<byte> pool).
When that ctor is used, buffers come from the pool rather than being new'd up.
If you don't Dispose, you leak the buffer, but since this only happens when you use the new ctor, we're at least not impacting existing code, and it's little different than just using the pool directly and not returning a buffer.
We either make TryGetBuffer return false, or as with Dispose we accept that you had to opt-in to this by using the new ctor.

I'm leaning towards the second option: just add the new MemoryStream(ArrayPool<byte>) ctor, and allow TryGetBuffer to work; devs just need to know that when they create the stream with this ctor, they should dispose the stream, and they shouldn't use the array retrieved from TryGetBuffer after doing something else to the stream.

The text was updated successfully, but these errors were encountered:

svick · 2017-06-22T18:57:39Z

It's fairly common to not dispose MemoryStreams, as currently the Dispose is effectively a nop.

BTW, I have recently added a note to the documentation of MemoryStream that says that it doesn't have to be disposed. So changing that now would be breaking a promise the documentation makes. (And if it's not an appropriate promise, the documentation should be changed back.)

omariom · 2017-06-22T22:32:41Z

Would it be less dangerous to return ReadOnlySpan in another overload of TryGetBuffer ?

stephentoub · 2017-06-23T00:00:15Z

Would it be less dangerous to return ReadOnlySpan in another overload of TryGetBuffer ?

A little maybe, but many of the same problems apply, e.g.

stream.TryGetBuffer(out ArraySegment<byte> buffer);
stream.Write(...);
Use(buffer);

vs

stream.TryGetBuffer(out ReadOnlySpan<byte> buffer);
stream.Write(...);
Use(buffer);

In both cases, the stream may have replaced the array referenced by buffer, in which case buffer is now pointing to something old.

Plus, many of the cases for which TryGetBuffer really need access to the array.

madhub · 2018-07-21T08:21:48Z

We use MemoryStream all the time, it would be great if bytes array come from ArrayPool instead of allocating . Please plan this feature for your next milestone.

Timovzl · 2018-08-16T15:38:33Z

@svick If a particular class that implements IDisposable does not need to be disposed, I would always consider that an implementation detail, never to be part of the public documentation. Clients should honor the IDisposable interface, or ignore it at their (our) own peril. The implementation must be able to change, in my opinion.

markrendle · 2018-08-16T16:11:39Z

How about having a MemoryStreamPool that provides a MemoryStream backed by a pooled array, and having very clear Return() or Dispose() requirements? The MemoryStreamPool.Shared instance could use arrays from ArrayPool<byte>.Shared, and constructed instances could create their own pool.

svick · 2018-08-18T09:49:26Z

@Timovzl Are you saying you Dispose() all Tasks in your async code?

I think that in the cases where Dispose does not do anything, it's just unnecessary ceremony. But since it's hard to find out whether that's the case, and whether it will stay that way, I think documentation should make this clear.

wokket · 2018-08-20T23:26:12Z

Just adding another voice to this, given the widespread use of MemoryStream in our apps the ability to reduce (generally quite large) allocs would be great. I'm comfortable with an opt-in approach if required, although the 2.1 release showed how awesome it is when 'things just get faster'.

re: Disposing memory streams: There's so much tooling, helpers, VS Add-Ins, guidance etc that all falls to a pit of 'always dispose your disposables'. We treat any undisposed (especially I/O) class as a red flag in code reviews, and feel dirty when forced to hand such things off with no concrete confirmation of cleanup ( MVC's FileResult() is a great example there)

Timovzl · 2018-08-22T11:38:40Z

@Timovzl Are you saying you Dispose() all Tasks in your async code?

@svick I uh... always use ValueTask! :P Just kidding, you had me there.

Still, I maintain my position, exceptions notwithstanding. As this very issue demonstrates, allowing our class documentation to say, "hey, you may use this class without dispose, just for laziness", puts us in this awkward position when the implementation can no longer support that liberty.

Even if disposing is not necessary, why share this in the documentation at all? Why let people skip dispose on some Stream types? Just have them disposed as usual, even where it was not strictly necessary.

And then we can keep the exceptions minimal, such as for Task. The fewer exceptions, the less to remember. The fewer exceptions, the fewer questions raised by code reviewers, such as when they spot undisposed Streams. And our implementations are free to change.

imanushin · 2018-10-26T09:29:45Z

@markrendle , if you need MemoryStreamPool - you can just use
Microsoft.IO.RecyclableMemoryStream.

Looks like it has all that you want:

It uses chunks internally (which is nice to re-send huge files)
It re-uses byte arrays created

Timovzl · 2018-11-13T10:56:13Z

@imanushin RecyclableMemoryStream looks interesting. Note that its GetBuffer() method returns the actual (current) buffer. It matches the docs' current documentation and gives you the power to do things - to do things wrong, too. Of course, in part, this was already the case with MemoryStream, but now you can retain access to a buffer that will be returned to the pool and used by something else. I'm not too fond of this potential for misuse. Granted, the alternative isn't great...

tmds · 2019-11-21T09:16:44Z

It uses chunks internally (which is nice to re-send huge files)

Maybe something that could be considered here also.

geoffkizer · 2020-11-14T08:12:39Z

It uses chunks internally (which is nice to re-send huge files)

Maybe something that could be considered here also.

Using chunks internally seems inconsistent with supporting GetBuffer().

I do think using chunks internally is interesting to consider, but I'm not sure how to square this with the GetBuffer() API; seems like we might need a separate type for this.

saucecontrol · 2020-11-14T08:36:32Z

It gets a new larger contiguous buffer if necessary to present a single buffer with all content:

https://github.com/microsoft/Microsoft.IO.RecyclableMemoryStream/blob/master/src/RecyclableMemoryStream.cs#L54-L57

The only problem there is that the buffer might be from the pool, and now you have a second potential owner of said buffer. Would be safer to just treat GetBuffer() like ToArray() with a fresh non-pooled array.

Timovzl · 2020-11-19T09:44:08Z

Would be safer to just treat GetBuffer() like ToArray() with a fresh non-pooled array.

In that case, at the very least an array from the array pool could be returned. That leaves the user with the option of returning it, making the code mostly allocation-free. As for users unaware of this... renting an array from the array pool and not returning it has no downsides over just allocating a new array, right?

saucecontrol · 2020-11-19T09:57:47Z

Good point, as long as the MemoryStream doesn't return it when Disposed in that case. That's the tricky part of it... the buffer can't be owned by both the MemoryStream and the consumer because it's fine if neither returns it to the pool but bad if one returns it and the other continues using it.

Timovzl · 2020-11-19T10:32:05Z

Good point, as long as the MemoryStream doesn't return it when Disposed in that case [...] because it's fine if neither returns it to the pool but bad if one returns it and the other continues using it.

Agreed. Once you call GetBuffer(), you become the owner of the array. Either you return it neatly with ArrayPool<byte>.Return() (best reuse) or you let it be garbage collected (nothing goes wrong, at least).

This does require some attention to the single-block case too. Do we implement a special case to relinquish ownership of the single block? That seems hard to keep track of. We might need to switch to "large buffers" as soon as GetBuffer() is invoked.

Any clever ideas? It would be so nice if the single-block case could stay copy-free...

tmds · 2020-11-19T12:51:07Z

Would be safer to just treat GetBuffer() like ToArray() with a fresh non-pooled array.

GetBuffer allows you to also change the data that is in the MemoryStream, which would not be the case if a fresh non-pooled array was returned.

Using chunks internally seems inconsistent with supporting GetBuffer().

When backed by a chunks, GetBuffer could be disallowed, similar as when MemoryStream is constructed with publiclyVisible to false using:

public MemoryStream (byte[] buffer, int index, int count, bool writable, bool publiclyVisible);

Or, the same parameter could be added to the new constructor:

public MemoryStream(ArrayPool<byte> pool, bool publiclyVisible)

When publiclyVisible is set to false, the implementation can use chunks.

Or, when someone calls GetBuffer, the implementation could automagically switch to non-chunked mode.

stephentoub · 2020-11-19T15:21:07Z

The more I've thought about this, the more I think it needs to be done separately from MemoryStream:

It's very common to not Dispose a MemoryStream; not doing so with pooled buffers will "leak" / drain the pool.
MemoryStream.ToArray() is explicitly usable after MemoryStream.Dispose(); that's called out in the docs and a non-trivial amount of code relies on that behavior. That breaks badly if Dispose returns the buffer to the pool, and there's no way for the MemoryStream to know whether ToArray will be called. Using a different constructor could change the behavior of ToArray to disallow that, but that breaks the MemoryStream contract.
Existing misuses of MemoryStream, such as concurrent accesses, could result in spooky-action-at-a-distance bugs, with arrays being returned to the pool while still being used, resulting in new very hard to diagnose failure modes. When we've introduced ArrayPool usage into other existing streams, we've generally only done so when there was either existing synchronization we could guard usage behind or where the overhead of adding such synchronization wasn't prohibitive, but I fear it would be in MemoryStream.
Changing the MemoryStream behavior from growing an array to maintaining a linked list of arrays has observable performance impact, both positive for some cases (avoiding copying for append-only writes) and negative for others (e.g. random access via seeking, consuming more buffers of varying sizes from the pool). It's hard to predict whether the impact will be net positive or negative across the vast usage of MemoryStream.

geoffkizer · 2020-11-19T16:59:33Z

Changing the MemoryStream behavior from growing an array to maintaining a linked list of arrays has observable performance impact

Also the GetBuffer() issues mentioned above.

Timovzl · 2020-11-21T12:06:30Z

@tmds

GetBuffer allows you to also change the data that is in the MemoryStream, which would not be the case if a fresh non-pooled array was returned.

Actually, with the above discussion about this, the intent was that the returned array becomes the stream's own buffer (at least until you write to the stream), so changing the data certainly works. Hopefully the discussion makes more sense with that in mind.

When backed by a chunks, GetBuffer could be disallowed [...]

I imagine a less ideal GetBuffer (that sometimes needs to do copying) is preferable to one that sometimes simply does not work. The caller doesn't necessarily know when it's using small vs. large buffers. Having the method throw sometimes, in a way that is nontransparent to the caller, would render the method a lot less usable.

Or, when someone calls GetBuffer, the implementation could automagically switch to non-chunked mode.

That would be the easiest. It is a bit disappointing that the simple case of a single chunk would then involve copying, so I'm still holding out hopes that we can come up with something more clever. But it might be acceptable.

Timovzl · 2020-11-21T12:38:01Z

The more I've thought about this, the more I think it needs to be done separately from MemoryStream:

I think a separate, more suitable API would certainly be worth pursuing. I'm wondering, when we wish to supply a low-allocation implementation, in how many cases we must pass a MemoryStream. Can we practically always get away with passing any Stream? Or do we have example use cases where MemoryStream is a requirement?

If a MemoryStream tends to be the requirement, we should probably tackle both issues, but separately. If any Stream will do, then let's forget about MemoryStream and make something ideal.

I'm not fond of the idea of honoring API misuse. If you neglect to dispose an IDisposable object, invoke methods on a disposed object, or concurrently access an object that is not thread-safe... then there may be consequences - now, or when the implementation changes.

If a type unrelated to MemoryStream turns out to be the way to go anyway, then it's a plus that API misuse isn't punished. But should that consideration be a priority?

Admittedly, as for not disposing, the documentation says that's allowed, so that needs to be addressed. (Strange implementation detail to mention in the documentation, but I digress.) If we can make sure that not disposing merely degenerates to the equivalent of using non-pooled streams (analog to not returning arrays rented from an array pool), then this is a non-issue, right? Potential gains, with no potential losses.

stephentoub · 2020-11-21T12:57:45Z

If we can make sure that not disposing merely degenerates to the equivalent of using non-pooled streams (analog to not returning arrays rented from an array pool), then this is a non-issue, right?

Arrays from the pool are more "valuable" than others, as they're much more likely to have been around for longer, be in higher generations, etc. There's also been discussion about using pinned/aligned arrays in the pool. So taking arrays from the pool and not returning them is generally worse on a system than allocating new ones, degrading other unrelated components using the pool.

I'm not fond of the idea of honoring API misuse. If you neglect to dispose an IDisposable object, invoke methods on a disposed object, or concurrently access an object that is not thread-safe... then there may be consequences - now, or when the implementation changes.

Not disposing a MemoryStream is not misuse, nor is using ToArray after disposal.

Concurrent use is misuse, but at the same time, MemoryStream is used by millions of libraries and applications; there is guaranteed to be misuse. And we can't afford to introduce changes into such a type that could introduce a myriad of difficult to diagnose race conditions in entirely unrelated parts of the app, e.g. misuse of a MemoryStream over here causes sensitive data to be leaked to an http response over there. We need to be cognizant of such issues for new APIs, but it's a much bigger deal for existing ones, especially ones as core s MemoryStream.

Timovzl · 2020-11-21T15:05:05Z

Arrays from the pool are more "valuable" than others, as they're much more likely to have been around for longer, be in higher generations, etc. There's also been discussion about using pinned/aligned arrays in the pool. So taking arrays from the pool and not returning them is generally worse on a system than allocating new ones, degrading other unrelated components using the pool.

I did not realize! That does limit our options... and support your proposition to not inherit from MemoryStream.

I'm still very curious if we even have examples where inheriting from MemoryStream poses an advantage. I don't seem to recall any APIs that take one as a parameter.

stephentoub · 2020-11-24T02:53:54Z

I don't seem to recall any APIs that take one as a parameter.

To my knowledge there aren't any public APIs in the core libraries strongly-typed to accept a MemoryStream. No doubt there are cases outside of the core libraries.

houseofcat · 2021-07-23T23:48:42Z

@stephentoub Would this possibly fall under Up For Grabs? Or have you made some traction internally?

I think ArrayPoolStream would work nicely, but add a little creative breathing room without need for backwards compatibility. Could also operate as a MemoryStream drop-in replacement for a large amount of use cases (those not misusing the Stream) to begin with.

stephentoub · 2021-12-10T20:54:06Z

@houseofcat, sorry for my prolonged delay in responding; I missed the GitHub notification and only just saw the response.

It's up for grabs, but at this point that's about coming up with a design proposal rather than actually submitting a PR with the implementation.

jeffhandley · 2023-03-21T21:11:13Z

Weighing the value of this against other Stream-related work, we've concluded that we will not pursue bringing this feature into .NET 8. We do expect to bring this feature into dotnet/runtime in the future, but for the time-being we encourage usage of the RecyclableMemoryStream implementation.

If folks encounter blockers for using RecyclableMemoryStream, the .NET team would consider contributing to that library to unblock those scenarios.

msftgits transferred this issue from dotnet/corefx Jan 31, 2020

msftgits added this to the 5.0 milestone Jan 31, 2020

maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020

stephentoub removed the untriaged New issue has not been triaged by the area owner label Feb 25, 2020

stephentoub modified the milestones: 5.0, Future Feb 25, 2020

bjornen77 mentioned this issue Sep 22, 2021

System.IO.Stream Backlog #58216

Open

14 tasks

geoffkizer mentioned this issue Dec 15, 2021

Consider using pooled buffers for HttpClient response buffering #62845

Open

jeffhandley mentioned this issue Jan 9, 2022

Libraries continuous improvement work planned for .NET 7 #57209

Closed

19 tasks

jeffhandley added the Team:Libraries label Jan 9, 2022

jeffhandley modified the milestones: Future, 7.0.0 Jan 9, 2022

jeffhandley mentioned this issue Feb 1, 2022

System.IO work planned for .NET 7 #64596

Closed

50 tasks

jeffhandley modified the milestones: 7.0.0, Future Jul 9, 2022

stephentoub modified the milestones: Future, 8.0.0 Sep 21, 2022

jeffhandley added the partner-impact This issue impacts a partner who needs to be kept updated label Nov 14, 2022

stephentoub mentioned this issue Mar 7, 2023

[API Proposal]: ArrayPool based MemoryStream #83105

Closed

jeffhandley modified the milestones: 8.0.0, Future Mar 21, 2023

albyrock87 mentioned this issue Jul 26, 2024

[Windows] Handle non-seekable streams in PlatformImage.FromStream (take 2) dotnet/maui#23824

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: using ArrayPool<byte> in MemoryStream with new APIs #22428

Proposal: using ArrayPool<byte> in MemoryStream with new APIs #22428

stephentoub commented Jun 22, 2017 •

edited by StephenBonikowsky

Loading

svick commented Jun 22, 2017

omariom commented Jun 22, 2017

stephentoub commented Jun 23, 2017

madhub commented Jul 21, 2018

Timovzl commented Aug 16, 2018

markrendle commented Aug 16, 2018

svick commented Aug 18, 2018

wokket commented Aug 20, 2018

Timovzl commented Aug 22, 2018

imanushin commented Oct 26, 2018

Timovzl commented Nov 13, 2018

tmds commented Nov 21, 2019

geoffkizer commented Nov 14, 2020

saucecontrol commented Nov 14, 2020

Timovzl commented Nov 19, 2020

saucecontrol commented Nov 19, 2020

Timovzl commented Nov 19, 2020

tmds commented Nov 19, 2020

stephentoub commented Nov 19, 2020 •

edited

Loading

geoffkizer commented Nov 19, 2020

Timovzl commented Nov 21, 2020

Timovzl commented Nov 21, 2020

stephentoub commented Nov 21, 2020 •

edited

Loading

Timovzl commented Nov 21, 2020

stephentoub commented Nov 24, 2020

houseofcat commented Jul 23, 2021 •

edited

Loading

stephentoub commented Dec 10, 2021

jeffhandley commented Mar 21, 2023

Proposal: using ArrayPool<byte> in MemoryStream with new APIs #22428

Proposal: using ArrayPool<byte> in MemoryStream with new APIs #22428

Comments

stephentoub commented Jun 22, 2017 • edited by StephenBonikowsky Loading

svick commented Jun 22, 2017

omariom commented Jun 22, 2017

stephentoub commented Jun 23, 2017

madhub commented Jul 21, 2018

Timovzl commented Aug 16, 2018

markrendle commented Aug 16, 2018

svick commented Aug 18, 2018

wokket commented Aug 20, 2018

Timovzl commented Aug 22, 2018

imanushin commented Oct 26, 2018

Timovzl commented Nov 13, 2018

tmds commented Nov 21, 2019

geoffkizer commented Nov 14, 2020

saucecontrol commented Nov 14, 2020

Timovzl commented Nov 19, 2020

saucecontrol commented Nov 19, 2020

Timovzl commented Nov 19, 2020

tmds commented Nov 19, 2020

stephentoub commented Nov 19, 2020 • edited Loading

geoffkizer commented Nov 19, 2020

Timovzl commented Nov 21, 2020

Timovzl commented Nov 21, 2020

stephentoub commented Nov 21, 2020 • edited Loading

Timovzl commented Nov 21, 2020

stephentoub commented Nov 24, 2020

houseofcat commented Jul 23, 2021 • edited Loading

stephentoub commented Dec 10, 2021

jeffhandley commented Mar 21, 2023

stephentoub commented Jun 22, 2017 •

edited by StephenBonikowsky

Loading

stephentoub commented Nov 19, 2020 •

edited

Loading

stephentoub commented Nov 21, 2020 •

edited

Loading

houseofcat commented Jul 23, 2021 •

edited

Loading