-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Conversation
@@ -1160,7 +1162,7 @@ private String[] SplitInternal(ReadOnlySpan<char> separators, int count, StringS | |||
return new String[] { this }; | |||
} | |||
|
|||
int[] sepList = new int[Length]; | |||
Span<int> sepList = Length < StackallocStringLengthLimit ? stackalloc int[Length] : new int[Length]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to avoid the new allocation here every time even for larger Lengths.
We may want to introduce internal ValueListBuilder to help with it. It would be similar to https://github.com/dotnet/coreclr/blob/master/src/mscorlib/shared/System/Text/ValueStringBuilder.cs, but it would be much simpler without all the string specific handling (e.g. it should just have a single Append method that takes T).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should i create issue for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to fix this as part of #6136 - the use of ArrayPool is mentioned there as one of ideas. It should also help with cases where your current fix makes things a bit slower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, i will take a look
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas what API for it do you expect? We can`t return array to pool while returning result as ValueStringBuilder does.
See initial version https://gist.github.com/cod7alex/d1299959de8af5a18e9107c304ee1343
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the shape should be a bit different to make it work for this use case. E.g. there can be AsReadOnlySpan() method that returns the current list as ReadOnlySpan<T>
and there should be Dispose()
method that returns the array back to the bool if there is any.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if you think that it would be better to give this some other name than ValueListBuilder - feel free to change it.
int[] sepList = new int[Length]; | ||
int[] lengthList; | ||
Span<int> sepList = Length < StackallocStringLengthLimit ? stackalloc int[Length] : new int[Length]; | ||
Span<int> lengthList = singleSeparator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be cheaper to do Span<int> lengthList = new Span<int>()
here and put Length < StackallocStringLengthLimit ? stackalloc int[Length] : new int[Length];
inside the else
clause below? That saves a branch, but adds newing up a struct, so I don't know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most expensive part in the current code is variable sized-stackalloc. Fixed-size stackalloc is much cheaper,..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why that is? Both cases zero the memory right, or does the compiler determine it doesn't need to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both cases zero the memory right
No. We have zero-init for stackalloc disabled in CoreLib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Along the lines of @danmosemsft's comment, why not just keep the existing structure of the code, which would avoid having to check singleSeparator
twice? e.g.:
Span<int> sepList = Length < StackallocStringLengthLimit ? stackalloc int[Length] : new int[Length];
Span<int> lengthList;
int defaultLength;
int numReplaces;
if (singleSeparator)
{
lengthList = default;
defaultLength = separator.Length;
numReplaces = MakeSeparatorList(separator, sepList);
}
else
{
lengthList = Length < StackallocStringLengthLimit ? stackalloc int[Length] : new int[Length];
defaultLength = 0;
numReplaces = MakeSeparatorList(separators, sepList, lengthList);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@justinvp it does not compile with A result of a stackalloc expression of type 'Span<int>' canno t be used in this context because it may be exposed outside of the containing method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try making it Span<int> lengthList = stackalloc int[0];
, e.g.:
Span<int> sepList = Length < StackallocStringLengthLimit ? stackalloc int[Length] : new int[Length];
Span<int> lengthList = stackalloc int[0];
int defaultLength;
int numReplaces;
if (singleSeparator)
{
defaultLength = separator.Length;
numReplaces = MakeSeparatorList(separator, sepList);
}
else
{
lengthList = Length < StackallocStringLengthLimit ? stackalloc int[Length] : new int[Length];
defaultLength = 0;
numReplaces = MakeSeparatorList(separators, sepList, lengthList);
}
Reference: dotnet/corefx#25426 (comment)
Try making this Span temp = stackalloc int[0];
Otherwise the local is classified as returnable and later you cannot mix it with stackallocated ones
stackalloc int[0]; is basically a noop, but will have effect of marking the temp as not returnable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that works
Do we have data on the distribution of string lengths seen by split? If very short strings are common we might special case the 32 bytes or less case too, as current jit will optimize these further (#14623). If you can show the distribution peaks at a value not too much larger than 32 we could perhaps increase the threshold to extend the range of the jit optimization. I left it at 32 because larger values started causing local offset code bloat and I did not know of any cases that would benefit. |
@@ -1270,7 +1272,7 @@ private String[] SplitInternal(String separator, String[] separators, Int32 coun | |||
// the original string will be returned regardless of the count. | |||
// | |||
|
|||
private String[] SplitKeepEmptyEntries(Int32[] sepList, Int32[] lengthList, Int32 defaultLength, Int32 numReplaces, int count) | |||
private string[] SplitKeepEmptyEntries(Span<int> sepList, Span<int> lengthList, int defaultLength, int numReplaces, int count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change sepList
and lengthList
to be typed as ReadOnlySpan<int>
instead of Span<int>
since the spans are not modified in this method? The call sites wouldn't need to change as there's an implicit conversion from Span<T>
to ReadOnlySpan<T>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, thanks
@@ -1308,7 +1310,7 @@ private String[] SplitKeepEmptyEntries(Int32[] sepList, Int32[] lengthList, Int3 | |||
|
|||
|
|||
// This function will not keep the Empty String | |||
private String[] SplitOmitEmptyEntries(Int32[] sepList, Int32[] lengthList, Int32 defaultLength, Int32 numReplaces, int count) | |||
private string[] SplitOmitEmptyEntries(Span<int> sepList, Span<int> lengthList, int defaultLength, int numReplaces, int count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change sepList
and lengthList
to be typed as ReadOnlySpan<int>
instead of Span<int>
since the spans are not modified in this method?
else | ||
{ | ||
Grow(); | ||
Append(item); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to implement like this without recursive call:
if (pos >= _span.Length)
Grow();
_span[pos] = item;
_pos = pos + 1;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely, thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to implement like this without recursive call
Why? I could be misremembering, but I thought I'd looked at the asm for both and the fast path was better this way. Maybe not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code generators tend to convert recursive calls like these into loops.
If you have verified by looking at the disassembly that this gives the best code, it is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could be wrong and would need to look again.
@jkotas I made changes only for char parameter version now, am i moving in right direction?
Looked a bit at string versions and they seem to be not so easy to change due to logic around |
@@ -1162,23 +1162,31 @@ private String[] SplitInternal(ReadOnlySpan<char> separators, int count, StringS | |||
return new String[] { this }; | |||
} | |||
|
|||
Span<int> sepList = Length < StackallocStringLengthLimit ? stackalloc int[Length] : new int[Length]; | |||
int numReplaces = MakeSeparatorList(separators, sepList); | |||
Span<int> initialSpan = Length < StackallocStringLengthLimit ? stackalloc int[Length] : stackalloc int[StackallocStringLengthLimit]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can always allocate the full buffer here: stackalloc int[StackallocStringLengthLimit]
.
} | ||
} | ||
} | ||
break; | ||
} | ||
|
||
return foundCount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think it maybe a bit easier to understand it the AsReadOnlySpan
call is done by the caller.
@cod7alex Yes, I think you are on the right path. |
@billwert for this question |
b2ca69f
to
e496b0a
Compare
Latest version has two problematic benchmarks, see below (i have added some new tests and increased iteration count to better see the trend). Not sure what is the reason to be honest, for second case it may be
|
This is allocation heavy benchmark. It is not unusual to see fluctuations like these in allocation heavy benchmarks. The GC kicks in at random spots. These random spots change as you change the allocation pattern. It may result into slowing down innocent victims. To prove this theory, you can try running the individual tests in a different order, or changing between workstation and server GC. |
|
||
int foundCount = 0; | ||
int sepListCount = sepList.Length; | ||
int currentSepLength = separator.Length; | ||
|
||
fixed (char* pwzChars = &_firstChar) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unsafe code should not be necessary now that you have simplified the loop condition. for (int i = 0; i < s.Length; i++) { s[i]
is a pattern that the JIT bounds-check elimination is pretty good at recognizing.
|
||
namespace System | ||
{ | ||
public partial class String | ||
{ | ||
private const int StackallocStringLengthLimit = 512; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
512 feels a bit on the high side. Some of the methods allocate twice the size, and they will have 4kB+ stack frame.
It may be worth looking at the right size for this - check what is the smallest size that helps microbenchmark results.
|
||
namespace System | ||
{ | ||
public partial class String | ||
{ | ||
private const int StackallocStringLengthLimit = 512; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not have much to do with StringLength anymore. Rename it?
Case with 1000 separators is still slower with suggested experiments. PerfView CPU stacks:
after:
|
The problem is that the MakeSeparatorList is 2x slower for SplitCharLength1000AllCharsAreSeparators with this change. I do not think that the copy is the problem. The problem seems to be in the ValueListBuilder.Append method. This method is causing the hot loop in MakeSeparatorList to have about 2x more instructions. Try to tweak the implementation of this method. E.g. making it look like the similar method in ValueStringBuilder may help:
Also, changing |
@jkotas Sorry, my results were for some weird test version of code, missed that. The reason is truly Append, and changing it helps to get better results, but it is still slower than before (btw uint trick does not change anything). In my opinion, it is not possible to achieve same performance for this test case ( What i think is possible to do is to slightly tweak first version of PR so it gets stackalloc`ed Span or array from pool depending on string length instead of just allocation of array for big strings. |
How much slower? I think it is acceptable to take some regression for this corner case with this change. |
Actually, I don`t see the improvement while running clean benchmark with just that test, sorry about confusion once more. Moving to the same layout as list builder has same results (~8% slower) or there are very small diffs like 1ms. |
@@ -0,0 +1,61 @@ | |||
using System.Diagnostics; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add the standard license header
I think it is fine to take the 8% regression for the case of long strings that are all separators here. Could you please address the other feedback? It looks fine to me otherwise. |
using System.Diagnostics; | ||
using System.Runtime.CompilerServices; | ||
|
||
namespace System.Buffers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: System.Collections.Generic may be more appropriate namespace for this.
@@ -614,6 +614,7 @@ | |||
<Compile Include="$(BclSourcesRoot)\mscorlib.Friends.cs" /> | |||
</ItemGroup> | |||
<ItemGroup> | |||
<Compile Include="shared\System\Buffers\ValueListBuilder.cs" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add this to src\mscorlib\shared\System.Private.CoreLib.Shared.projitems instead. (More details: https://github.com/dotnet/coreclr/blob/master/src/mscorlib/shared/README.md)
You maybe again just seeing a noise from the GC. Would you mind pushing your latest changes to github so we can take a look? |
Yes, sure |
@dotnet-bot test this please |
@dotnet-bot test Ubuntu x64 Checked corefx_baseline please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There is about ~10% regression in the corner case of longer string of separators. I think it is a good trade-off given the other improvements.
@dotnet-bot test Ubuntu x64 Checked corefx_baseline please |
Thanks @jkotas and guys for reviews and help. Do i need to do anything about test failures? They look unrelated to me. |
@dotnet-bot test Ubuntu x64 Checked corefx_baseline please |
@cod7alex The failures are infrastructure issues. Nothing required from you. |
@dotnet-bot test this please |
@dotnet-bot test Ubuntu x64 Checked corefx_baseline please |
1 similar comment
@dotnet-bot test Ubuntu x64 Checked corefx_baseline please |
* Use stackalloc in string.Split * Added initial usage of ValueListBuilder * Added usage of list builder to string separator Split overloads Signed-off-by: dotnet-bot-corefx-mirror <[email protected]>
* Use stackalloc in string.Split * Added initial usage of ValueListBuilder * Added usage of list builder to string separator Split overloads Signed-off-by: dotnet-bot-corefx-mirror <[email protected]>
* Use stackalloc in string.Split * Added initial usage of ValueListBuilder * Added usage of list builder to string separator Split overloads Signed-off-by: dotnet-bot <[email protected]>
* Use stackalloc in string.Split * Added initial usage of ValueListBuilder * Added usage of list builder to string separator Split overloads Signed-off-by: dotnet-bot <[email protected]>
Nice job! ❤️ |
I know I'm a little late to the party, but my original suggestion when reporting this issue was to break down Split into two main paths:
The latter scenario is where int arrays seem to be necessary. Adding int array allocations into the single delimiter path, ~80-95%, scenario seems to only add unnecessary allocation to the most common scenario, consequently slowing it. I've ran into many scenarios doing .ETL type operations, arguably better done elsewhere, that have required parsing entire files of delimited data where the char length can easily be >1k and you're running through MB of data. The int arrays remaining in this single-delimiter path is going to be expensive no matter what. Thoughts on branching up front based on array length of 1 with an alternate int-array-free path? Thanks, |
@jefffischer-episerver my suggestion is to open a new issue. It's easy to miss comments on closed issues. |
FWIW, there are no int arrays on the single-delimiter path after this change. The single-delimiter path will use stack allocated |
Thanks, @jkotas, unless I'm mistaken I don't believe that's the case. It should be unnecessary to pass through the string twice in the single-delimiter path. The call to MakeSeparator should be eliminated from that path and both SplitOmitEmptyEntries and SplitKeepEmptyEntries should be overloaded to accept a single string delimiter, instead. These overloads should do the unsafe string parsing, rather than MakeSeparator. Currently, there is about a 10x (the string size) Gen 0 memory allocation, we should look to bring that down to ~1x + some extra allocations (the string size) in the single-delimiter path. Maybe Span speeds things up, but it's not doing anything to Gen 0 mem allocations. |
As @danmosemsft suggested, please open a new issue on this. There should not be any 10x (the string size) Gen 0 memory allocation the way it is implemented right now. |
Apologies, I see that it's dependent on the density of delimiters. Catching up on .NET core changes.
Given a high density of delimiters per line and an int (4 bytes) allocated for each delimited char, the numbers aren't 10x, but higher than they truly need to be. Thanks for the recommendation on opening another issue. |
* Use stackalloc in string.Split * Added initial usage of ValueListBuilder * Added usage of list builder to string separator Split overloads Commit migrated from dotnet/coreclr@d062934
Adds usage of Span and stackalloc for strings that are not large. Allows to avoid allocations of int arrays.
Benchmarks:
Benchmark code:
https://gist.github.com/cod7alex/f123c3e662d21c3327399aa71d338485
Closes https://github.com/dotnet/coreclr/issues/6136
@danmosemsft @stephentoub