Unique / Duplicate operators #125

Closed
JamesFaix opened this issue Nov 4, 2015 · 1 comment · May be fixed by #169

Comments

@JamesFaix

There are several variations on the theme of duplicate record detection that I find myself needing quite often when validating Excel files.

The first method, UniqueKeys, returns every key (as produced by a given key selector delegate) that occurs exactly once in a collection.

    public static IEnumerable<TKey> UniqueKeys<TSource, TKey>(this IEnumerable<TSource> me,
        Func<TSource, TKey> keySelector, IEqualityComparer<TKey> keyComparer) {
        // check for null arguments

        return me.GroupBy(keySelector, keyComparer)
            .Where(g => g.Count() == 1)
            .Select(g => g.Key);
    }

Likewise, a DuplicateKeys method would be identical except that "Where(g => g.Count() == 1)" is replaced with "Where(g => g.Count() > 1)".
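
For concreteness, here is a minimal sketch of what that DuplicateKeys variant might look like when written out in full. The class name DuplicateKeyExtensions and the parameter name source are assumptions made for illustration; the null checks fill in the "check for null arguments" placeholder in one plausible way and are not taken from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container class; the name is an assumption for this sketch.
public static class DuplicateKeyExtensions
{
    // Yields each key that occurs more than once in the source sequence.
    // Passing a null keyComparer makes GroupBy fall back to the default comparer.
    public static IEnumerable<TKey> DuplicateKeys<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> keyComparer)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));

        // Same pipeline as UniqueKeys, with the predicate flipped to "> 1".
        return source.GroupBy(keySelector, keyComparer)
                     .Where(g => g.Count() > 1)
                     .Select(g => g.Key);
    }
}
```

Since GroupBy reports keys in order of first occurrence, calling this on { 1, 2, 2, 3, 3, 3 } with an identity key selector should yield 2 and then 3.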

Building on these are the UniqueElements and DuplicateElements methods:

    public static IEnumerable<TSource> DuplicateElements<TSource, TKey>(this IEnumerable<TSource> me,
        Func<TSource, TKey> keySelector, IEqualityComparer<TKey> keyComparer) {
        // check for null args
        return me.ExceptByAll(me.UniqueKeys(keySelector, keyComparer), keySelector, keyComparer);
    }

The DuplicateKeys method can be used, for example, to find the distinct list of ID numbers that are used more than once, while DuplicateElements can be used to return every row whose ID number is used more than once.

The "Elements" methods depend on ExceptAll or ExceptByAll being implemented (see #124).
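
Until such an operator exists, the same rows can be obtained with standard LINQ alone. The following is only a sketch under that assumption, not the proposed implementation: the name DuplicateElementsByGrouping and its containing class are hypothetical, and the result comes back grouped by key (in order of each key's first occurrence) rather than in the original source order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container class; the name is an assumption for this sketch.
public static class DuplicateElementExtensions
{
    // Returns every element whose key is shared with at least one other element.
    // Uses GroupBy + SelectMany instead of the not-yet-available ExceptByAll.
    public static IEnumerable<TSource> DuplicateElementsByGrouping<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> keyComparer)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));

        return source.GroupBy(keySelector, keyComparer)
                     .Where(g => g.Count() > 1)   // keep only keys used more than once
                     .SelectMany(g => g);         // flatten the groups back into rows
    }
}
```

A side benefit of this shape is that the source is enumerated once, whereas the ExceptByAll version above enumerates it twice (once for UniqueKeys, once for the filter).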

@atifaziz
Member

This has been covered by CountBy since 2.0, e.g.:

var xs = new[] { 7, 4, 6, 5, 1, 2, 3, 8, 6, 4, 5, 1, 2 };

var duplicates = // 4, 6, 5, 1, 2
    from e in xs.CountBy(x => x)
    where e.Value > 1
    select e.Key;
    
var unique = // 7, 3, 8
    from e in xs.CountBy(x => x)
    where e.Value == 1
    select e.Key;

atifaziz added a commit that referenced this issue Nov 19, 2023
This is a squashed merge of PR #1037 that partially addresses #125.

---------

Co-authored-by: Atif Aziz <[email protected]>