Double the size of the previous segment #66930
Conversation
This change comes after long observations of how pipelines don't work well for copying large blocks of data, mainly due to the segment allocation strategy. Today we allocate each segment at the minimum segment size (bounded by the max pool size). This works well if data is being quickly consumed, because we can keep memory allocations to a minimum, but it doesn't work well when large chunks of data are being written. This change doubles the segment size based on the previous segment, up to 1MB (an arbitrary limit). The default behavior should work much the same as today, but performance of larger writes/reads should improve.
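A minimal sketch of the growth strategy described above. The constants match the PR; the function and loop are illustrative, not the actual `Pipe` allocation code:

```csharp
using System;

// Constants from the PR: 4K default minimum, arbitrary 1MB cap.
const int MinimumSegmentSize = 4096;
const int MaximumSegmentSize = 1024 * 1024;

// Each new segment doubles the previous segment's capacity, clamped to the cap.
int NextSegmentSize(int previousCapacity) =>
    Math.Min(MaximumSegmentSize, previousCapacity * 2);

// A sustained large write walks 4K, 8K, 16K, ... up to 1MB, then stays there.
int size = MinimumSegmentSize;
for (int i = 0; i < 10; i++)
{
    Console.WriteLine(size);
    size = NextSegmentSize(size);
}
```

Because growth restarts at the minimum whenever a segment is fully consumed, small steady-state writes keep today's 4K behavior while bulk copies quickly reach large segments.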
@@ -11,6 +11,9 @@ public class PipeOptions
 {
     private const int DefaultMinimumSegmentSize = 4096;

+    // Arbitrary 1MB max segment size
+    internal const int MaximumSegmentSize = 1024 * 1024;
Maybe this should be higher for large copy scenarios? Not sure.
We should think about how this interacts with logic like SocketConnection.MinAllocBufferSize. I could see us unnecessarily doing syscalls with a relatively small 2KB buffer for reads after this change, as we reach the end of a block. That threshold is a DoS mitigation given today's smaller segment sizes, but with larger segments we could afford to leave more bytes unfilled at the end of a segment (allocating a new one instead of issuing a small read) without wasting too much memory proportionally.
Right, we'll need to tweak how we think about this (and whether it matters). You'll end up throwing away 2K (which maybe is too aggressive anyway?)
I don't think throwing away up to 2K at the end of the segment is that bad, especially if the segments get larger; I think we could throw away more. We want to reduce syscalls, so reading into a small tail space would be counterproductive given large amounts of data. When we're copying data from one buffer to another in user mode, it makes far more sense to use all the available space.
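The trade-off being discussed can be sketched as a simple predicate: before issuing a read, check whether the unwritten tail of the current segment is worth filling. The name and the 2KB default here are assumptions for illustration (echoing the MinAllocBufferSize idea), not the actual runtime/Kestrel API:

```csharp
using System;

// Hypothetical helper: skip a small tail and allocate a fresh segment
// instead of issuing a small read syscall into the leftover space.
// The 2048-byte threshold is an assumed value, not the real constant.
bool ShouldAbandonTail(int bytesLeftInTail, int minReadBufferSize = 2048) =>
    bytesLeftInTail < minReadBufferSize;

// A 1KB tail is abandoned (small syscall avoided)...
Console.WriteLine(ShouldAbandonTail(1024));
// ...while a 3KB tail is still used for the next read.
Console.WriteLine(ShouldAbandonTail(3072));
```

With doubling segments the abandoned tail stays a small fraction of the segment, which is why a larger threshold becomes affordable.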
Are you suggesting anything or just saying this is interesting?
@@ -135,15 +135,16 @@ private void AllocateMemory(int sizeHint)
         _tailBytesBuffered = 0;
     }

-    BufferSegment newSegment = AllocateSegment(sizeHint);
+    int newSegmentSize = Math.Min(PipeOptions.MaximumSegmentSize, _tail.Capacity * 2);
This is simple to understand, but another possible technique could be to base the growth on the number of segments rather than the last segment (taking the speed of the consumer into account to shrink the next segment). I'm not sure it's a big concern though, because:
- If the last segment is ever fully consumed, it'll restart at the minimum segment size (default 4K).
- If the last segment isn't fully consumed because the reader can't keep up, the pause threshold can be adjusted, and working with bigger blocks of memory is better for the reader.
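One possible reading of the count-based alternative mentioned here: size the next segment from how many unconsumed segments the pipe currently holds, so a reader that keeps up (count near zero) keeps allocations at the minimum. This is purely illustrative and not code from this PR; the function name and shift-based formula are assumptions:

```csharp
using System;

const int MinimumSegmentSize = 4096;
const int MaximumSegmentSize = 1024 * 1024;

// Hypothetical count-based sizing: each outstanding (unconsumed) segment
// doubles the next allocation, capped at 1MB (4096 << 8 == 1MB).
int NextSegmentSizeByCount(int unconsumedSegments) =>
    Math.Min(MaximumSegmentSize,
             MinimumSegmentSize << Math.Min(unconsumedSegments, 8));

Console.WriteLine(NextSegmentSizeByCount(0)); // fast reader: stay at the minimum
Console.WriteLine(NextSegmentSizeByCount(5)); // writer far ahead: grow
```

Compared to doubling off the last segment's capacity, this ties growth to backpressure rather than to write history.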
@@ -206,7 +208,7 @@ private void AllocateWriteHeadSynchronized(int sizeHint)
     }
 }

-private BufferSegment AllocateSegment(int sizeHint)
+private BufferSegment AllocateSegment(int minimumSegmentSize, int sizeHint)
Nit: We should rename sizeHint to minSize or something in internal/private APIs, because it now violates the contract to provide spans/memory smaller than that, so it's not really a "hint". With the new variables, this is becoming somewhat confusing.
Yes, we should, but I dislike unrelated changes. I'll make that change as well.
@@ -197,7 +197,9 @@ private void AllocateWriteHeadSynchronized(int sizeHint)
         _writingHeadBytesBuffered = 0;
     }

-    BufferSegment newSegment = AllocateSegment(sizeHint);
+    // Double the minimum segment size on subsequent segments
+    int newSegmentSize = Math.Min(PipeOptions.MaximumSegmentSize, _writingHead.Capacity * 2);
What pool are you testing this with?
ArrayPool. I need to change the implementation of the PinnedBlockMemoryPool to make this work the way it's intended.
/azp run
You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using the /azp run [pipelines] command. You can specify multiple pipelines using a comma-separated list.
/azp run help
No pipelines are associated with this pull request.
/azp run runtime
Azure Pipelines successfully started running 1 pipeline(s).
@davidfowl, are you still working on this? |
Punt to future. |
PS: Running more performance experiments.
Makes #49259 more feasible, as there will be fewer segments per pipe because of the growth strategy.
Might improve #43480 in the default case.