Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Add IList<T> extension methods providing in-place update operations #76375

Open
eiriktsarpalis opened this issue Sep 29, 2022 · 38 comments
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Collections
Milestone

Comments

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Sep 29, 2022

Background and motivation

The List<T> type provides a number of methods providing bulk in-place updates, such as RemoveAll, RemoveRange, Reverse and Sort. The same cannot be said about IList<T> where common in-place operations are cumbersome to achieve. Consider for instance the following System.Text.Json example used in our own docs repo:

https://github.com/dotnet/docs/blob/6c6942bcb4594dc063db293e5fff33c6bea0b331/docs/standard/serialization/system-text-json/snippets/custom-contracts/IgnoreType.cs#L26-L38

A possible alternative to the above would be to roll your own extension method, but it would be even better if we shipped extension methods like that out of the box.

API Proposal

namespace System.Collections.Generic;

public partial static class CollectionExtensions
{
    public static void AddRange<T>(this IList<T> list, IEnumerable<T> collection);
    public static void InsertRange<T>(this IList<T> list, int index, IEnumerable<T> collection);
    public static void RemoveAll<T>(this IList<T> list, Predicate<T> predicate);
    public static void RemoveRange<T>(this IList<T> list, int index, int count);
    public static void Reverse<T>(this IList<T> list);

    public static void Sort<T>(this IList<T> list);
    public static void Sort<T>(this IList<T> list, Comparison<T> comparison);
    public static void Sort<T>(this IList<T> list, IComparer<T> comparer);
    public static void Sort<T>(this IList<T> list, int index, int count, IComparer<T> comparer);
}

API Usage

Removing properties of a given type in System.Text.Json can now be expressed as follows:

    public void ModifyTypeInfo(JsonTypeInfo ti)
    {
        if (ti.Kind != JsonTypeInfoKind.Object)
            return;

        ti.Properties.RemoveAll(prop => _ignoredTypes.Contains(prop.PropertyType));
    }
}

Alternative Designs

  • Should we consider exposing as DIMs?
  • Should the Sort implementation be stable? Or match List/Array.Sort semantics?
  • Should we include BinarySearch extension methods?

Risks

No response

Related to dotnet/core#2199

@eiriktsarpalis eiriktsarpalis added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Sep 29, 2022
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Sep 29, 2022
@ghost
Copy link

ghost commented Sep 29, 2022

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

The List<T> type provides a number of methods providing bulk in-place updates, such as RemoveAll, RemoveRange, Reverse and Sort. The same cannot be said about IList<T> where common in-place operations are cumbersome to achieve. Consider for instance the following System.Text.Json example used in our own docs repo:

https://github.com/dotnet/docs/blob/6c6942bcb4594dc063db293e5fff33c6bea0b331/docs/standard/serialization/system-text-json/snippets/custom-contracts/IgnoreType.cs#L26-L38

A possible alternative to the above would be to roll your own extension method, but it would be even better if we shipped extension methods like that out of the box.

API Proposal

namespace System.Collections.Generic;

public partial static class CollectionExtensions
{
    public static void RemoveAll<T>(this IList<T> list, Predicate<T> predicate);
    public static void RemoveRange<T>(this IList<T> list, int index, int count);
    public static void Reverse<T>(this IList<T> list);

    // List.Sort is not stable -- should this extension be stable?
    public static void Sort<T>(this IList<T> list);
    public static void Sort<T>(this IList<T> list, Comparison<T> comparison);
    public static void Sort<T>(this IList<T> list, IComparer<T> comparer);
    public stsatic void Sort<T>(this IList<T> list, int index, int count, IComparer<T> comparer);

    // TOCONSIDER BinarySearch extension methods?
}

API Usage

Removing properties of a given type in System.Text.Json can now be expressed as follows:

    public void ModifyTypeInfo(JsonTypeInfo ti)
    {
        if (ti.Kind != JsonTypeInfoKind.Object)
            return;

        ti.Properties.RemoveAll(prop => _ignoredTypes.Contains(prop.PropertyType));
    }
}

Alternative Designs

  • Should we consider exposing as DIMs?
  • Should the Sort implementation be stable? Or match List/Array.Sort semantics?
  • Should we include BinarySearch extension methods?

Risks

No response

Author: eiriktsarpalis
Assignees: -
Labels:

api-suggestion, area-System.Collections

Milestone: -

@stephentoub
Copy link
Member

Should the Sort implementation be stable? Or match List/Array.Sort semantics?

If we add such extension methods, they should have the same semantics as List<T>. Otherwise, changing code from:

IList<T> list = new List<T>() { ... };
list.Sort(...);

to:

List<T> list = new List<T>() { ... };
list.Sort(...);

is a very subtle and unexpected source of bugs.

If we update List<T>/Array.Sort to be stable, then such extensions should follow suit as well.

@eiriktsarpalis
Copy link
Member Author

If we update List/Array.Sort to be stable

Assuming it doesn't regress perf we could do that, to my knowledge there is no stable in-place sorting implementation available.

@eiriktsarpalis
Copy link
Member Author

And if not, we might consider using a different name, e.g. StableSort.

@stephentoub
Copy link
Member

Assuming it doesn't regress perf we could do that, to my knowledge there is no stable in-place sorting implementation available.

There's been a long discussion of this in #68436.

@eiriktsarpalis eiriktsarpalis added this to the 8.0.0 milestone Sep 29, 2022
@eiriktsarpalis eiriktsarpalis self-assigned this Sep 29, 2022
@eiriktsarpalis eiriktsarpalis added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation untriaged New issue has not been triaged by the area owner labels Sep 29, 2022
@jkotas
Copy link
Member

jkotas commented Oct 4, 2022

How are the Sort methods expected to be implemented? How efficient is the implementation going to be and is it going to require yet another copy of the whole Sort algorithm?

In-place sorting of IList will require a virtual method call for each element get and set. It expect that it is often going to be faster to convert the IList to array first and then sort the array.

@eiriktsarpalis
Copy link
Member Author

eiriktsarpalis commented Oct 4, 2022

In-place sorting of IList will require a virtual method call for each element get and set. It expect that it is often going to be faster to convert the IList to array first and then sort the array.

Possibly. We might consider using an implementation that sorts using a temporary buffer, i.e.

public static void Sort<T>(this IList<T> list, IComparer<T>? comparer = null)
{
    T[] array = new T[list.Count];
    list.CopyTo(array, 0);

    Array.Sort(array, comparer);

    list.Clear();
    for (int i = 0; i < array.Length; i++)
        list[i] = array[i];
}

This might reduce the number of virtual calls, but does more allocations and copying. Benchmarking that vs. a bespoke IList<T> sorting algorithm would help determine a possible approach.

@Clockwork-Muse
Copy link
Contributor

... for that matter, checking if it is an actual List<T> would probably be reasonable, given how commonly it's used.

@eiriktsarpalis
Copy link
Member Author

Checking for List/Array would definitely be added to the implementation if we do follow that approach.

@terrajobst
Copy link
Member

terrajobst commented Oct 4, 2022

Video

  • It seems the proposal is mostly about convenience, not performance. As such we don't think the cons outweigh the pros
  • Cons
    • The proposal is equivalent to a DIM in that they there is a default implementation that works for any case. The performance of these methods, whether they are DIM'ed or extension methods, would be hard to reason about it.
    • Furthermore, without DIMs the only fast implementation could be in the same assembly (or lower) as CollectionExtension (which lives in corlib).
  • It seems consensus at this point is mixed. DIMs would be ideal, but some folks feel extension methods would be "good enough", give that these interfaces making perf a secondary concern.
  • For .NET 8, we should investigate the usage of DIMs in core types to see whether it's viable.
namespace System.Collections.Generic;

public partial static class CollectionExtensions
{
    public static void InsertRange<T>(this IList<T> list, int index, IEnumerable<T> collection);
    public static void RemoveAll<T>(this IList<T> list, Predicate<T> predicate);
    public static void RemoveRange<T>(this IList<T> list, int index, int count);
    public static void Reverse<T>(this IList<T> list);

    public static void Sort<T>(this IList<T> list);
    public static void Sort<T>(this IList<T> list, Comparison<T> comparison);
    public static void Sort<T>(this IList<T> list, IComparer<T> comparer);
    public static void Sort<T>(this IList<T> list, int index, int count, IComparer<T> comparer);
}

@terrajobst terrajobst added api-needs-work API needs work before it is approved, it is NOT ready for implementation and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Oct 4, 2022
@Joe4evr
Copy link
Contributor

Joe4evr commented Oct 5, 2022

One thing that came to mind as I was watching the API review: Introducing Reverse might be source-breaking in certain existing cases as either it could tie overload resolution with LINQ's Reverse, or if it was introduced as a DIM on IList<T> it could win resolution and break LINQ method chains.

At least, I think that could happen, please correct me if I'm wrong.

@stephentoub
Copy link
Member

Introducing Reverse might be source-breaking in certain existing cases

Yes, although that's the case pretty much any time we add any instance method or extension method. For instance methods, if someone else had named an extension method the same thing, the newly-added instance method would start winning. And for extension methods, if someone else also has an extension method with the same shape, it would introduce ambiguities that break compilation.

@krwq
Copy link
Member

krwq commented Oct 6, 2022

Possibly they could live in the different namespace to avoid this problem (i.e. System.Collections.Generic.Extensions)
as a side note what does DIM stand for? (dimension doesn't seem to make sense in the context)

@stephentoub
Copy link
Member

what does DIM stand for

Default Interface Methods... basically adding virtual methods to an interface.

@eiriktsarpalis eiriktsarpalis modified the milestones: 8.0.0, Future Oct 6, 2022
@eiriktsarpalis eiriktsarpalis removed their assignment Oct 6, 2022
@GSPP
Copy link

GSPP commented Oct 13, 2022

Collection capabilities could be added with interfaces such as:

public interface ISortable //No base type
{
    void Sort(...);
}

And CollectionExtensions.Sort<t>(IList<T> list, ...) would then try to cast and delegate to an efficient implementation. Note that this interface does not derive from IList<T>. This design does not require crafting elaborate type hierarchies.

Conceptual sample implementation:

public static void Sort<T>(this IList<T> list, ...)
{
    if (list is ISortable<T> sortable) sortable.Sort(...);
    else SortIList(list); //Generalized algorithm for any IList<T>
}

This idea could be applied to other algorithms such as binary search as well. Others that come to mind: ToList, ToArray, the mutation operations for ranges from the opening post, copying to and from spans, linear search, ... All of these can then break through the abstraction and operate on the internal representation more efficiently.

This has some overheads but it results in reasonable behavior and performance in most cases. It is hard for users to really go wrong with this. Invocations on small data sets will feel some overhead, but performance at scale will be near optimal both asymptotically as well as practically.

This should also play nice with overload resolution. The compiler will automatically prefer ways of calling that have less indirection in them. It also should be a highly compatible design. It's easy to upgrade types and add new algorithms.

@KrzysztofCwalina
Copy link
Member

KrzysztofCwalina commented Oct 19, 2023

IList<T> extension AddRange is very important for the Azure SDK. We expose 90%+ of model collection properties as IList<T> and very often our users complain they have no convenient and efficient* way to add ranges.

* I would love if the implementation attempted to cast to List<T> and used List<T>.AddRange in the fast path branch.

@stephentoub
Copy link
Member

@CyrusNajmabadi, @cston, if we add an AddRange extension method on IList<T> or a DIM IList<T>.AddRange method, what implications will that have on collection expressions?

@cston
Copy link
Member

cston commented Oct 19, 2023

what implications will that have on collection expressions?

When constructing a collection for a collection expression targeting IList<T> or a type that implements IList<T>, the compiler may use an AddRange extension method or instance method.

Currently, the compiler is not using AddRange methods when constructing collections from collection expressions, but we expect to use AddRange in the future, particularly when spreads are involved:

class MyList<T> : IList<T> { ... }

IEnumerable<int> x = [1, 2, 3];
MyList<int> y = [..x]; // y = new MyList(); ((IList<int>)y).AddRange(x);

@terrajobst
Copy link
Member

IList<T> extension AddRange is very important for the Azure SDK. We expose 90%+ of model collection properties as IList<T>

What is the type of the underlying instance? According to our guidelines, we don't expect libraries to expose their data as IList<T> but as some kind of class that can expose additional APIs.

@terrajobst
Copy link
Member

@stephentoub @cston why does it matter if IList<T> would have an extension method? I'd assume the compiler would have to decide which type to instantiate (which I'd assume is List<T>) and what would matter for optimizations is that which APIs the concrete type would expose, rather than the target type, no?

@stephentoub
Copy link
Member

@terrajobst, it's about populating the destination type. If you have:

class Foo : IList<T>
{
    public Foo() { }
    public void Add(T item) { ... }
    ... // IList implementation
}

and someone writes

IEnumerable<T> data = ...;
Foo foo = [..data];

does the C# compiler generate:

Foo foo = new Foo();
foreach (T item in data) foo.Add(item);

or:

Foo foo = new Foo();
CollectionExtensions.AddRange(foo, data);

or...

@terrajobst
Copy link
Member

Right, so what matters is whether Foo (an instantiable type) has AddRange, not some interface abstraction (such as IList<T>) does, right?

@stephentoub
Copy link
Member

Right, so what matters is whether Foo (an instantiable type) has AddRange, not some interface abstraction (such as IList) does, right?

I don't understand. Here Foo implements IList<T> and does not expose an AddRange. My question to Chuck was exactly whether the compiler will use an extension AddRange on such a type in that case.

But there's also the case where the target type is an interface, e.g. this is perfectly legal:

IList<T> list = [..data];

and the question there applies as well: would the compiler use an extension AddRange.

Chuck's answer was that the planned answer to both questions is "yes".

@terrajobst
Copy link
Member

Ah, got it. That makes sense.

I'd hope that the compiler would prefer an AddRange() method on the concrete type over an AddRange() extension method though, just like it generally binds when both exist.

@KrzysztofCwalina
Copy link
Member

KrzysztofCwalina commented Oct 19, 2023

What is the type of the underlying instance? According to our guidelines, we don't expect libraries to expose their data as IList but as some kind of class that can expose additional APIs.

In most cases it's List<T>, but sometimes a custom internal types. BTW, the reason we do it is that we want to be able to switch implementation from List<T> to a collection that can observe changes made to it. And this is the reason we don't follow the guideline, which I have to say I was not even aware of, nor I understand why we would have it.

@terrajobst
Copy link
Member

terrajobst commented Oct 19, 2023

See these, specifically

✔️ DO use Collection or a subclass of Collection for properties or return values representing read/write collections.

@KrzysztofCwalina
Copy link
Member

Ah, yeah. That one. The biggest regret of my life :-)

@terrajobst
Copy link
Member

Seems to cut both ways though. Often times where we shipped a collection property as IList<Something> there were issues along the lines of "man I wish I could add a method or property to this thing". Another problem we saw is that people (sometimes inadvertently) downcast the interface to the concrete type which made it hard for us to change the concrete type later, but this can be mitigated by using an internal type that can't be downcast.

@CyrusNajmabadi
Copy link
Member

Pulling in @jnm2 @cston @RikkiGibson and @captainsafia for the collection-expr side of this.

I have thoughts (and my initial thinking is that this is actually a good thing for collection-exprs). But i'm heads down on fire, so i'll write things out later :)

@CyrusNajmabadi
Copy link
Member

Ok. So the topic of AddRange has come up for when C#/roslyn sees something like this:

// Where SomeDestCollection does *not* have the CollectionBuilderAttribute on it.
SomeDestCollection x = [x, y, z, .. w, .. v];

The default emit for this will effectively be:

SomeDestCollection __result = new();
__result.EnsureCapacity(3 + __w.Length + __v.Length); // if EnsureCapacity exists and `w/v` are statically known to expose lengths.
__result.Add(__x);
__result.Add(__y);
__result.Add(__z);
foreach (var __t in __w)
  __result.Add(__t);
foreach (var __t in __v)
  __result.Add(__t);

However, if we do find available AddRange methods we are permitted to use them, though the exact circumstances by which the compiler will do so are still up in the air and need thinking on when they would be appropriate. But, if they were available, the compiler could potentially generate something like teh following:

SomeDestCollection __result = new();
__result.EnsureCapacity(3 + __w.Length + __v.Length); // if EnsureCapacity exists and `w/v` are statically known to expose lengths.
__result.AddRange(new InlineArray3<> { __x, __y, __z }.AsSpan()); // If there's an AddRange that takes spans.
__result.AddRange(__w); // if there are AddRange's that support the instances being spread.
__result.AddRange(__v);

However, we do have some general concerns about this. Consider a real world example:

HashSet<int> hashSet = ...;
List<int> list = [1, 2, 3, .. hashSet];

With this proposal, we would now see an AddRange on List<T> that could take that hashSet. However, would that actually be a good thing to call? Instead of a no-alloc approach to adding all the elements (by enumerating hashSet in the caller), this would call into the IEnumerable codepath, which woudl allocate the IEnumerator and would be doing virtual calls on the iteration.

As such, while we're still only in the preliminary thinking stage, we're leaning toward only using these helpers when the exact types match (so we're not losing information, or the ability to enumerate without allocation), or perhaps when we know the input types, and we're calling some well known BCL AddRange method.

For example, if we knew the above AddRange was fine when you passed in a List<T> (Because it type checked for that type under the covers) and we knew we were passing a List<T>, then we might consider it ok. The LDM has 100% approved that teh compiler can take what it knows about the BCL and optimize heavily given the latest shipping BCL impl. We will, however, likely be more conservative with user patterns since we don't know if calling them might actually cause a negative impact on final performance.

@jnm2
Copy link
Contributor

jnm2 commented Oct 20, 2023

I'd be hesitant to let the compiler call an AddRange<T>(IList<T>) extension method since, in all cases I've thought of, it's pure overhead compared to what the compiler could be doing instead. That overhead includes:

  • The boxing of any of the arguments to AddRange
  • Any type checks that the extension method might do, like is List<T>
  • Interface IList<T>.Add calls instead of concrete Add calls on the collection type being constructed

However, it is less IL emitted at the collection expression site, and I'm not sure if that's interesting as a code size benefit.

@Neme12
Copy link

Neme12 commented Apr 1, 2024

Instead of extension methods, maybe these could be sealed methods directly on the interfaces? That would add the opportunity to later make them defaulted/virtual. Or would that be a breaking change?

For .NET 8, we should investigate the usage of DIMs in core types to see whether it's viable.

They could be non-DIM on the interface, if we're afraid of DIMs on core types.

It would also mean they don't show up on concrete implementations, thus alleviating the compiler concern.

@Neme12
Copy link

Neme12 commented Apr 1, 2024

It seems the proposal is mostly about convenience, not performance.

I usually define these extension methods so that they could be specialized for common types like List<T>, thus potentially having better perf than inserting manually one by one, depending on the implementation.

@Neme12
Copy link

Neme12 commented Apr 1, 2024

Often times where we shipped a collection property as IList<Something> there were issues along the lines of "man I wish I could add a method or property to this thing".

I mean that's what DIMs are for 😄

I think it's really time for the DIMs. They are already used all over the place for core primitive types like Int32 due to generic math. They already broke C++/CLI and they already fixed it. This can't be more breaking or more dangerous than adding interfaces with static abstracts, DIMs and operators to core types like Int32. The band-aid was already ripped off.

@Neme12
Copy link

Neme12 commented Apr 1, 2024

I wouldn't add all of these overloads as DIMs though:

    public static void Sort<T>(this IList<T> list);
    public static void Sort<T>(this IList<T> list, Comparison<T> comparison);
    public static void Sort<T>(this IList<T> list, IComparer<T> comparer);
    public static void Sort<T>(this IList<T> list, int index, int count, IComparer<T> comparer);

I think only one of them should be a DIM and virtual, probably the last one. The rest should be non-virtual (sealed) and delegate to that one.

@Neme12
Copy link

Neme12 commented Apr 1, 2024

I don't agree that we shouldn't add the default implementation because it would be slow. It would be no more slower than what people can already do today on top of the interface manually (and what they do do), or what they could possibly ever do on top of the interface. If we had a second interface like IBulkOperationsList<T>, then anyone who wanted to e.g. InsertRange would have to check if the list is an IBulkOperationsList<T> and if it is, use InsertRange, otherwise do it manually on top of Insert. At that point, you might as well have a single implementation on top of Insert without the check for the code to be simpler. This would defeat the entire purpose of the API - it should do this check for you and forward to the best possible implementation.

EDIT:

At that point, you might as well have a single implementation on top of Insert without the check for the code to be simpler.

Or you'd add your own extension method that does the appropriate thing. At which point, we might as well have the DIM, since this would be equivalent to that.

Actually, even the default DIM would be much better than people doing their own custom logic or their own extension methods like today, because they will probably implement it naively by repeatedly doing Insert or Remove from the middle when it's possible to implement it in a better way by copying and removing from the end.

@kronic
Copy link
Contributor

kronic commented Jun 12, 2024

Maybe it's worth adding more methods

public void Move(int oldIndex, int newIndex);
public void MoveRange(int fromIndex, int toIndex, int count);

@julealgon
Copy link

See these, specifically

✔️ DO use Collection or a subclass of Collection for properties or return values representing read/write collections.

@terrajobst is there more context on why this specific guideline was created? I've seen it before but I'll be honest in saying I never follow it, instead preferring to return either ICollection or IList or some other interface like the read-only or immutable variants.

I'm aware there is some overhead with the fact it's an interface (with the virtual calls and such), but does the advice still hold as strongly as it did way back when it was defined at the "beginnings" of .NET, considering all the advancements made to performance? Were there more concerns other than performance itself?


Collection capabilities could be added with interfaces such as:

public interface ISortable //No base type
{
    void Sort(...);
}

I noticed nobody commented on this proposal from @GSPP . It was one of the first things that came to my mind when reading this proposal too. Is this not a good approach for some reason?

Maybe there would need to be a distinction between "sortable in-place" vs the standard concept of being sortable which is achievable just by implementing IComparable though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Collections
Projects
None yet
Development

No branches or pull requests