possible perf optimizations for System.Linq.Set & System.Linq.Parallel.Set #47173
Tagging subscribers to this area: @eiriktsarpalis
Hi @FyiurAmron, would you be interested in providing a prototype to evaluate potential improvements?
@eiriktsarpalis what benchmark/microbenchmark framework would you suggest to measure the results? Is https://github.com/dotnet/BenchmarkDotNet OK, or do you have any other internal/external tooling that would be of use here?
We use BenchmarkDotNet as well, cf. https://github.com/dotnet/performance
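For reference, a minimal BenchmarkDotNet sketch for this kind of measurement might look like the following. This is illustrative only, not the actual LinqSetPerf code; the dataset size and seed are assumptions.

```csharp
using System;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Sketch: measure Distinct over a random int[] source, tracking allocations too.
[MemoryDiagnoser]
public class DistinctBenchmarks
{
    private int[] _data = Array.Empty<int>();

    [Params(200_000)] // size assumed from the discussion below
    public int Size;

    [GlobalSetup]
    public void Setup()
    {
        var rng = new Random(42); // fixed seed for reproducibility
        _data = Enumerable.Range(0, Size).Select(_ => rng.Next()).ToArray();
    }

    [Benchmark]
    public int Distinct() => _data.Distinct().Count();
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<DistinctBenchmarks>();
}
```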
@eiriktsarpalis I've created a quick project @ https://github.com/FyiurAmron/LinqSetPerf to show the range of possible perf improvements (currently only the 4th item, taking the source size into account); I've refactored the code to be as isolated as possible, and modified basically just two places. This of course only deals with Distinct. Also, FWIW, 7 as an initial size is an extremely low initial alloc anyway. For a general-case HashMap it may be sensible, since the first reallocs are quite fast and the expected growth is small. Here, not so much. For a random source:

[benchmark results omitted]
I'd say that the expected performance gains for those cases are high to very high, both in CPU & mem. Now, for the not-so-distinct arrays (forced 4x repetition of elements):

[benchmark results omitted]
A slight CPU speed increase, with non-obvious mem differences, depending on the exact dataset. Still, I'd say that a set where only 25% of the 200 000 elements are distinct is somewhat of a ballpark "edge" for non-degenerate data. Finally, a completely degenerate case (0-filled array):

[benchmark results omitted]
As expected, there's no statistically significant difference in speed here, but the mem difference is visible, and can get important if the size of the collection is high. FWIW, I expect the first 3 items in the original perf improvement list to give only a small boost in speed, but without any real tradeoffs.
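A sketch of how the three dataset shapes discussed above could be generated (sizes and seed are illustrative; the actual LinqSetPerf generation may differ):

```csharp
using System;
using System.Linq;

static class Datasets
{
    // Mostly-distinct data: random values over the full int range.
    public static int[] Random(int size, int seed = 42)
    {
        var rng = new Random(seed);
        return Enumerable.Range(0, size).Select(_ => rng.Next()).ToArray();
    }

    // ~25% distinct: values drawn from [0, size/4), so each value appears ~4 times.
    public static int[] FourReps(int size, int seed = 42)
    {
        var rng = new Random(seed);
        return Enumerable.Range(0, size).Select(_ => rng.Next(size / 4)).ToArray();
    }

    // Completely degenerate case: a zero-filled array, i.e. a single distinct value.
    public static int[] Zeros(int size) => new int[size];
}
```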
Also, for reference, this is the speed difference with just using the actual random data, 4 reps of data in the set:

[benchmark results omitted]
Closed by #49591
As in title - I think it'd be worthwhile to at least investigate possible performance optimizations for System.Linq.Set (and possibly System.Linq.Parallel.Set as well).

Description

As mentioned by @danmosemsft in #37180 (comment), there's a possible performance improvement by aligning it with improvements already present in HashSet/Dictionary.

Data

Things/differences that do come to mind here as possible optimizations (a sketch of the initial-size idea follows the list):

- the comparer is never null (a default comparer is always stored), which prevents using the potentially faster value.GetHashCode() instead of InternalGetHashCode(value), which further delegates to _comparer.GetHashCode(value) (which, in this case, basically just calls value.GetHashCode()); also, please note that EqualityComparer<T>.Default.Equals doesn't devirtualize in a shared generic (#10050), which can currently further hurt performance.
- the bucket index (hashCode % _buckets.Length) is calculated twice (with no caching) in Find/Add, even if no bucket resize did happen, and doesn't have any provision for the faster approach used by HashSet (https://github.com/stephentoub/runtime/blob/master/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/HashSet.cs#L305).
- resizing doesn't use prime-sized buckets (HashHelpers.ExpandPrime(_count) etc.), so it may have a noticeably higher collision rate, impacting the performance.
- the initial size of 7 results in new & Array.Copy reallocs for iterating any collection with non-trivial size. The new size of _count * 2 + 1 is a smaller number than the one you get from using a prime series (which will usually go slightly above this number), which further increases the amount of allocs. While it's reasonable to assume that Distinct will be iterated less than the size of the original collection, using a more reasonable initial threshold than simply 7 (e.g. Math.Clamp(NextPrime(initialCollectionSize / 4), 7, 131), or any sensible ballpark figure along this line) would decrease the amount of early allocs significantly without allocating too much bucket memory in advance.
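To make the last point concrete, here is a minimal sketch of such an initial-size heuristic. NextPrime is a hypothetical helper (the list above only names it as a ballpark idea), shown here with naive trial division; the bounds 7 and 131 are the ballpark figures from the list.

```csharp
using System;

static class SetCapacityHeuristic
{
    // Hypothetical helper: smallest prime >= n (naive trial division, illustrative only).
    private static int NextPrime(int n)
    {
        if (n < 2) return 2;
        for (int candidate = n; ; candidate++)
        {
            bool isPrime = true;
            for (int d = 2; d * d <= candidate; d++)
            {
                if (candidate % d == 0) { isPrime = false; break; }
            }
            if (isPrime) return candidate;
        }
    }

    // Ballpark initial bucket count: scale with the source size, but stay within [7, 131]
    // so small sources keep the current behavior and large ones don't over-allocate.
    public static int InitialSize(int initialCollectionSize) =>
        Math.Clamp(NextPrime(initialCollectionSize / 4), 7, 131);
}
```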
Analysis

Although the class is only used internally and not meant for general consumption, it's still possibly worthwhile to provide speed optimizations for it: it seems it was intended to be a heavily optimized class for quick behind-the-scenes data lifting, and it may create minor performance bottlenecks in LINQ contexts (e.g. when using https://docs.microsoft.com/en-us/dotnet/api/system.linq.enumerable.distinct?view=net-5.0), especially when used in worst-case scenarios (a large number of consecutive longer iteration sequences, resulting in lots of reallocs/hash calculations).
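For illustration only, a hedged example of the worst-case shape described above: a large, fully distinct source makes the internal Set used by Distinct grow from its initial size of 7 through its whole realloc sequence.

```csharp
using System;
using System.Linq;

// Every element is distinct, so Distinct's internal Set must store all 200,000
// entries, triggering the full chain of _count * 2 + 1 resizes along the way.
int[] source = Enumerable.Range(0, 200_000).ToArray();
int distinctCount = source.Distinct().Count();
Console.WriteLine(distinctCount); // 200000
```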