Feature/2005 large composite disposable perf #2092
Merged
Addresses #2005
This change makes the `CompositeDisposable` far more efficient when it is dealing with large numbers (thousands) of `IDisposable` objects. This matters in "fan out/in" scenarios where items are split into large numbers of groups with `GroupBy`, and then later recombined (usually with some per-group processing having been applied) with either `SelectMany` or `Merge`, because these operators use `CompositeDisposable` to keep track of their subscriptions to upstream observables (the ones that emerge from `GroupBy` in the scenario being described here).

Fan out/in is an important technique in Rx.NET, as described at https://introtorx.com/chapters/transformation-of-sequences#the-significance-of-selectmany and https://introtorx.com/chapters/partitioning so we want this to work well even with large numbers of groups.
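For a concrete (and purely illustrative) example of the kind of pipeline this affects, the following sketch fans a source out into roughly 100,000 groups and merges the per-group results back together. The group count and the per-group processing are placeholders chosen only for illustration:

```csharp
using System;
using System.Reactive.Linq;

class FanOutFanInExample
{
    static void Main()
    {
        // Fan out: split one source into a large number of groups.
        IObservable<int> source = Observable.Range(0, 1_000_000);

        IObservable<long> recombined = source
            .GroupBy(x => x % 100_000)          // fan out into ~100,000 groups
            .Select(group => group.LongCount()) // some per-group processing
            .Merge();                           // fan back in; Merge tracks one subscription per group

        recombined.Subscribe(
            _ => { },
            () => Console.WriteLine("All groups completed"));
    }
}
```

When the source completes, `Merge` ends up removing one subscription per group from the `CompositeDisposable` it uses internally, which is where the cost described below shows up.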
Before, `CompositeDisposable` used a `List<IDisposable>` internally, which meant that its `Remove` method had to perform a linear scan of the list to find the item to remove. This resulted in $O(N^2)$ performance (where $N$ is the number of groups being merged) when the source finally calls `OnCompleted`. If you had over 100,000 groups, this could end up taking seconds to complete!
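To see why this is quadratic, here's a standalone illustration (not the actual Rx code) of removing N items one at a time from a `List<IDisposable>`, where each removal has to scan the list:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Reactive.Disposables;

class QuadraticRemovalIllustration
{
    static void Main()
    {
        const int n = 100_000;

        var items = new List<IDisposable>();
        for (int i = 0; i < n; i++)
        {
            items.Add(Disposable.Create(() => { }));   // n distinct disposables
        }

        // Remove the items in reverse order of insertion. Each Remove scans from
        // the front of the list to find its target, so it inspects O(count) entries,
        // and removing all n items costs O(n^2) comparisons in total.
        var sw = Stopwatch.StartNew();
        for (int i = n - 1; i >= 0; i--)
        {
            items.Remove(items[i]);
        }

        Console.WriteLine($"Removed {n:N0} items in {sw.Elapsed}");
    }
}
```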
With this change, the `CompositeDisposable` continues to use the same strategy as before for small and moderately sized collections, because there's no observable performance problem with that, and it is likely to be more efficient for small collections than a strategy optimized for larger ones. An awful lot of `CompositeDisposable` instances end up with only a handful of items, so it's important that it continues to perform well in that scenario. But with large numbers of disposables, it switches to a different strategy in which it uses a dictionary to track the disposables.
We use a dictionary instead of the more obvious `HashSet<IDisposable>` because `CompositeDisposable` has always allowed you to add the same disposable instance multiple times. Although it would not be a good idea to depend on this, we've never prevented it, and since this is a public type, some Rx users might depend on it. So the dictionary has an `int` value keeping track of how many times each particular disposable has been added. We expect this to be `1` in most cases in practice.
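In outline, the new strategy looks something like the following sketch. The names and threshold are made up, and it ignores thread safety, disposal state, and everything else the real `CompositeDisposable` has to handle; it's only meant to show the shape of the data structure change:

```csharp
using System;
using System.Collections.Generic;

// Simplified sketch of the two-strategy idea: a plain list for small collections,
// and a count-keeping dictionary once the collection grows large.
// Hypothetical names and threshold; not the real CompositeDisposable.
public sealed class CompositeDisposableSketch : IDisposable
{
    private const int DictionaryThreshold = 1024;   // made-up switch-over point

    private readonly List<IDisposable> _list = new();
    private Dictionary<IDisposable, int>? _counts;  // null until the collection grows large

    public void Add(IDisposable item)
    {
        if (_counts is null)
        {
            _list.Add(item);
            if (_list.Count >= DictionaryThreshold)
            {
                // Switch strategies: build the dictionary from the existing list.
                _counts = new Dictionary<IDisposable, int>();
                foreach (var d in _list)
                {
                    _counts[d] = _counts.TryGetValue(d, out var c) ? c + 1 : 1;
                }
                _list.Clear();
            }
        }
        else
        {
            // The value counts how many times this same instance has been added,
            // because duplicates have always been permitted.
            _counts[item] = _counts.TryGetValue(item, out var c) ? c + 1 : 1;
        }
    }

    public bool Remove(IDisposable item)
    {
        if (_counts is null)
        {
            if (!_list.Remove(item)) return false;   // linear scan; fine for small lists
        }
        else
        {
            if (!_counts.TryGetValue(item, out var c)) return false;
            if (c == 1) _counts.Remove(item); else _counts[item] = c - 1;
        }

        item.Dispose();   // the real Remove also disposes the item it removes
        return true;
    }

    public void Dispose()
    {
        foreach (var d in _list) d.Dispose();
        if (_counts is not null)
        {
            foreach (var d in _counts.Keys) d.Dispose();
        }
        _list.Clear();
        _counts = null;
    }
}
```

The real implementation also has to coordinate all of this with disposal and with concurrent callers; the sketch only illustrates the list-to-dictionary switch and the duplicate counting.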
As the discussion at #2005 (comment) shows, we've added benchmarks that verify that this does address the performance problem described. We've also checked that this doesn't cause any significant change in performance for any other benchmarks. (That's important because `CompositeDisposable` is widely used within Rx.NET, so changes of this kind have the potential to have a broad negative impact. We wanted to make sure we hadn't made normal operation much slower just to improve this one particular and relatively unusual scenario.)
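A benchmark for this scenario might look roughly like the following BenchmarkDotNet sketch (hypothetical; not necessarily the benchmark referred to above):

```csharp
using System;
using System.Reactive.Disposables;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class CompositeDisposableFanInBenchmark
{
    private IDisposable[] _items = Array.Empty<IDisposable>();

    // Hypothetical sizes; the interesting behaviour shows up at the larger counts.
    [Params(100, 10_000, 100_000)]
    public int N;

    [GlobalSetup]
    public void Setup()
    {
        _items = new IDisposable[N];
        for (int i = 0; i < N; i++)
        {
            _items[i] = Disposable.Create(() => { });
        }
    }

    [Benchmark]
    public void AddThenRemoveAll()
    {
        // Mimics what Merge/SelectMany do: add one subscription per group,
        // then remove each one as the corresponding group completes.
        var composite = new CompositeDisposable();
        foreach (var d in _items)
        {
            composite.Add(d);
        }

        foreach (var d in _items)
        {
            composite.Remove(d);
        }
    }
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<CompositeDisposableFanInBenchmark>();
}
```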