Share System.IO.Pipelines.BufferSegments (make BufferSegmentStack static) across instances of Pipe #49259
Comments
How? There's a maximum number of segments that are pooled. Can you show how you're using the pipe and also how you're measuring the memory usage? |
Hmm, I missed the part about restricted growth. As for an example, it would be for use in re-usable buffers for handling socket receives. |
You reset the pipe during the use of the socket? Why? I'm confused on the usage pattern and savings here. What are you seeing and what are you trying to reduce? |
sorry, pseudocoding, I will update the example |
basically, incoming messages can be of varying sizes, so using a pipe to take advantage of its segmenting is useful. However, when re-using these pipes as sockets open/close, it would be nice if they could share internal buffer segments so that after running for a while, all of these re-used pipes don't each end up holding onto an internal stack of buffers. Especially if most messages only need 1 segment. |
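For concreteness, the usage pattern described above is essentially the standard PipeWriter fill loop, with one Pipe held for the lifetime of a socket. A minimal sketch follows; the wrapper class, method name, and 4096 size hint are illustrative assumptions, not the poster's code:

```csharp
using System;
using System.IO.Pipelines;
using System.Net.Sockets;
using System.Threading.Tasks;

internal static class SocketReceiver
{
    // Fill one pipe from one socket; larger messages simply chain more
    // segments into the ReadOnlySequence<byte> that the reader side sees.
    public static async Task FillPipeAsync(Socket socket, PipeWriter writer)
    {
        while (true)
        {
            // Hands out memory backed by a BufferSegment from the pipe's
            // internal segment stack (or a newly allocated segment).
            Memory<byte> memory = writer.GetMemory(4096);

            int bytesRead = await socket.ReceiveAsync(memory, SocketFlags.None);
            if (bytesRead == 0)
                break;

            writer.Advance(bytesRead);

            FlushResult result = await writer.FlushAsync();
            if (result.IsCompleted)
                break;
        }

        await writer.CompleteAsync();
    }
}
```

Each `GetMemory` call is what ends up consuming a BufferSegment from the pipe's internal stack, which is the allocation the rest of this thread is about.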
I still don't understand what you'd be saving. Do you have some back-of-the-napkin math? Do you have a memory profile? I'd like to understand how much you think you'd save if the segments were shared across pipes. Presumably, if each pipe rented an array from the array pool, it wouldn't change how much memory was allocated unless your workload was constantly dropping and adding connections. If you had lots of concurrent connections, there'd be no savings... |
So this would be the case of many connections constantly being dropped/added. I can't provide specifics unfortunately, but I was thinking that if there are (tens/hundreds of) thousands of connections coming and going, it might end up with 75% of the segments unused most of the time. This may be a case of me wanting to over-optimize, idk |
I'd like to replicate the scenario so I can measure and play with changes. The more data you can provide the better (and the better the likelihood of success). How many pipes are you pooling? I'm guessing you don't pool tens of thousands of pipes (that would be counterproductive, I think)? I've thought about doing this in the past but haven't seen the scenario that has convinced me we should optimize for this. I could be convinced with the right evidence. |
Unfortunately I cannot provide too much detail, but there are 10k+ connections at any one time, some short-lived, some long. The size of each socket receive would vary from one segment to multiple (but likely usually one segment). Maybe pooling all of those pipes is not a good idea? I am not sure; the goal is to avoid re-allocating them if they will likely be needed again (which is obviously dependent on the number of incoming sockets). |
Pooling pipes is fine if you're pooling per request. We pool pipes for HTTP/2 requests (we reuse the request object and reset the pipe on it). We don't pool pipes at the connection level because the benefits are less clear. You'd need a good idea of the socket churn, as that's the only time you get reuse. The pool becomes less effective if there are lots of long-lived connections, so you only benefit from not allocating if your pool size matches the churn you expect. |
In my case, as these are continuous sockets, not one-off (HTTP) requests, it makes more sense (imo) to hold the pipe for the duration of the socket and return it to the pool after the socket closes. I agree that holding onto a very large pool might not make sense, but for a ~10k-object pool the overhead seems pretty minor. |
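For reference, the per-connection pooling being discussed here looks roughly like the sketch below; `PipePool` is a hypothetical helper, not an existing API, and a real pool would bound its size to the connection churn mentioned above:

```csharp
using System.Collections.Concurrent;
using System.IO.Pipelines;

// Hypothetical pipe pool: rent a Pipe when a socket is accepted, reset and
// return it when the socket closes.
public sealed class PipePool
{
    private readonly ConcurrentBag<Pipe> _pipes = new();
    private readonly PipeOptions _options;

    public PipePool(PipeOptions options) => _options = options;

    public Pipe Rent() =>
        _pipes.TryTake(out var pipe) ? pipe : new Pipe(_options);

    public void Return(Pipe pipe)
    {
        // Pipe.Reset requires that both the reader and writer were completed.
        // The rented memory was already returned by this point; only the
        // segment-object stack stays with the Pipe between uses.
        pipe.Reset();
        _pipes.Add(pipe);
    }
}
```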
The Pipe today is around 368 bytes * 2 = 736 bytes per socket (assuming you're using a duplex pipe with a Pipe implementation for each side of the socket). BufferSegments are each about 80 bytes, and one segment maps to 4K of memory (~2% overhead). Memory is returned before the Pipe is reset, so the thing you're focused on in this issue is the BufferSegment itself (that 2% overhead).
The amount of space used by the BufferSegments at any one time should be about 2% of the data buffered in the Pipe, and that roughly comes down to how you're using the pipe higher up the stack to parse messages. This currently doesn't take the minimum segment size into account (it assumes 4K, and that's a gap), but by default I'd be surprised if you saved much here.
If you can, take a memory dump from your running application and spelunk around the object graph to see what kinds of numbers you see in your scenario. That'll give me some more data to work off of.
PS: I'm currently looking at shrinking the size of the Pipe because it's massive. |
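Back-of-the-napkin, applying these figures to the ~10k-connection scenario mentioned above (treating them all as approximations): 10,000 segments at ~80 bytes each is ~0.8 MB of BufferSegment objects against ~40 MB of 4K buffers they point at, while a pooled duplex Pipe pair is ~736 bytes × 10,000 ≈ 7 MB. The segment objects are the smallest of the three numbers, which is the point being made here.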
For me, it is more about the perf cost + memory churn/traffic of creating those segments. |
Have you had time to try this? |
I have not yet, I might have time today or tomorrow |
So I need to do some further testing, but quick numbers showed roughly 1.5x (on the high end) as many allocated buffers as in-use ones. So for a test where I had 1000 active connections, I had about 1200-1500 allocated buffers. For comparison, I am actually currently working with a custom solution that shares a pool of segments across all sockets and fetches/links new segments onto a per-socket sequence as needed. In my use case (unfortunately I cannot give more details at the moment), I had an overhead of about 5-10 buffers for 1000 connections. The number of times an extra buffer is needed per concurrent receive is very low, even though all of the clients will perform the same operation, requiring multiple segments. |
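A shared-segment design like the one described usually looks something like the sketch below; the names and the unbounded pool are illustrative assumptions, not the poster's implementation:

```csharp
using System;
using System.Buffers;
using System.Collections.Concurrent;

internal sealed class PooledSegment : ReadOnlySequenceSegment<byte>
{
    private byte[]? _array;

    // Attach a buffer rented from ArrayPool<byte>.Shared.
    public void SetBuffer(byte[] rentedArray, int length)
    {
        _array = rentedArray;
        Memory = rentedArray.AsMemory(0, length);
    }

    // Link the next segment so the chain can be wrapped in a ReadOnlySequence<byte>.
    public void Append(PooledSegment next)
    {
        Next = next;
        next.RunningIndex = RunningIndex + Memory.Length;
    }

    public void Reset()
    {
        if (_array is not null)
            ArrayPool<byte>.Shared.Return(_array);
        _array = null;
        Memory = default;
        Next = null;
        RunningIndex = 0;
    }
}

// Process-wide segment pool shared by every socket.
internal static class SegmentPool
{
    private static readonly ConcurrentBag<PooledSegment> s_segments = new();

    public static PooledSegment Rent() =>
        s_segments.TryTake(out var segment) ? segment : new PooledSegment();

    public static void Return(PooledSegment segment)
    {
        segment.Reset();
        s_segments.Add(segment);
    }
}
```

A message spanning several receives can then be exposed as `new ReadOnlySequence<byte>(first, 0, last, lastLength)` for parsing.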
Can you give some concrete numbers (MB sizes) or pictures from the dumps of the Pipe types? |
I'll look into using a per PipeOptions pool for segments |
I did a prototype of this and I'm not sure about it yet. The tricky parts about sharing segments across all pipes by default are the following:
I ran a couple of tests:
In my changes, I also currently fall back to the per-Pipe stack, which gets used if you exceed 256 segments in an individual pipe (or if the global pool is empty). I added this because of the scenario above where a single pipe tries to buffer lots of segments (this number could be configurable). Of course we don't need to be perfect here, but it's really hard to pick sizes for global pools like this because they depend so much on usage, and also because shrinking them is hard. We could also let you configure the pool size, but that's tricky too, especially with a static, so this needs more thought.
@cobalthex In your scenario, what's the maximum number of concurrent connections you expect, and how much do they have buffered by default? How big are your message sizes? |
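A rough sketch of the kind of rent/return policy described above (not the actual prototype): 256 is the per-pipe cap mentioned in the comment, while the global limit of 1024 is an invented placeholder.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Prefer a bounded, process-wide pool; fall back to a per-pipe stack when the
// global pool is exhausted or full, and cap the per-pipe stack so one pipe
// that buffers heavily can't hoard segments forever.
internal sealed class SegmentSource<T> where T : class, new()
{
    private const int GlobalLimit = 1024;   // illustrative
    private const int PerPipeLimit = 256;   // the cap mentioned above

    private static readonly ConcurrentBag<T> s_global = new();

    // One instance per pipe; accessed under the pipe's own synchronization.
    private readonly Stack<T> _perPipe = new();

    public T Rent()
    {
        if (s_global.TryTake(out var item))
            return item;
        return _perPipe.Count > 0 ? _perPipe.Pop() : new T();
    }

    public void Return(T item)
    {
        if (s_global.Count < GlobalLimit)
            s_global.Add(item);
        else if (_perPipe.Count < PerPipeLimit)
            _perPipe.Push(item);
        // else: drop it and let the GC reclaim the segment object
    }
}
```

The hard part, as noted above, is that a static pool like `s_global` has no obvious right size and no easy way to shrink.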
cc @AArnott |
I am expecting at least 10k ccu, with each only holding onto one buffer. Message sizes vary; currently buffers are set to 4K, though that may change in the future (likely to shrink). There is currently a maximum receive size of 1 MB, but that is likely to shrink dramatically (needs more profiling). |
For pooling: if it is shared, could it not behave similarly to how ArrayPool.Shared works? (I have not dug into that code.) Perhaps growing as much as necessary and trimming if memory pressure gets high (or no longer pooling). If the pipe reader/writer had separate pools, wouldn't that just mean that after writing to a pipe, it dumps those segments into the reader pool once you read from the pipe (maybe I'm misunderstanding)? Perhaps this could be like ArrayPool, where you can default to Shared or pass in your own via PipeOptions as Brennan suggested? This would allow you to tune the buffer sizes, max growth, and other behaviors. |
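To make the suggestion concrete, the imagined API shape might look like the sketch below. Note that nothing like `IBufferSegmentPool` or a `segmentPool` option exists in System.IO.Pipelines today; this is purely a hypothetical illustration of the ArrayPool-style design being proposed.

```csharp
using System.Buffers;
using System.IO.Pipelines;

// Hypothetical: a pool for the small segment objects themselves (not the 4K
// payload arrays, which already come from the MemoryPool/ArrayPool), defaulting
// to a shared, trimming instance but swappable per pipe.
public interface IBufferSegmentPool
{
    object RentSegment();
    void ReturnSegment(object segment);
}

// Imagined usage if PipeOptions grew a segmentPool parameter:
//
//   var pipe = new Pipe(new PipeOptions(
//       pool: MemoryPool<byte>.Shared,
//       segmentPool: SharedSegmentPool.Instance));   // hypothetical
```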
It could, but it's unclear how much you're saving. The array pool is pooling big buffers, and those usually matter. This pool is saving ~96 bytes of overhead per segment. I'm not sure they are comparable...
I think it could be done somewhere for sure but I'm not convinced that the benefit is there for the complexity introduced. |
One of the benefits of System.IO.Pipelines.Pipe is its ability to be re-used and otherwise conserve memory through its internal use of ArrayPool. Pipes create BufferSegments internally to store the memory segments they use, which are further stored in a ReadOnlySequence.
While the buffer segments are re-used through a BufferSegmentStack, that stack is a unique instance on every Pipe and cannot be shared.
This creates a potential issue when using a pool of Pipes: each pipe's stack can grow ("boundlessly"), and a long-lived pool may end up holding many unused or under-used segments.
I would propose having the buffer segments be responsible for owning/renting the contained array/memory, with the Pipe only responsible for renting segments.
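One possible reading of this proposal, sketched with hypothetical names (the real BufferSegment is an internal Pipe type; this only illustrates the suggested ownership split, where a segment rents and returns its own backing array):

```csharp
using System;
using System.Buffers;

// The segment owns the rental of its backing array, so a Pipe would only
// rent/return segment objects and never touch the arrays directly.
internal sealed class OwnedSegment : ReadOnlySequenceSegment<byte>
{
    private byte[]? _array;

    // Rent a backing array for this segment and expose it to the sequence.
    public Memory<byte> Acquire(int minimumSize)
    {
        byte[] rented = ArrayPool<byte>.Shared.Rent(minimumSize);
        _array = rented;
        Memory = rented;
        return rented;
    }

    // Return the backing array; the segment object itself can then go back
    // to whatever (shared) segment pool handed it out.
    public void Release()
    {
        if (_array is not null)
        {
            ArrayPool<byte>.Shared.Return(_array);
            _array = null;
            Memory = default;
            Next = null;
            RunningIndex = 0;
        }
    }
}
```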