Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented CountBy (Issue 104) #207

Closed
wants to merge 15 commits into from
109 changes: 109 additions & 0 deletions MoreLinq.Test/CountByTest.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,109 @@
#region License and Terms
// MoreLINQ - Extensions to LINQ to Objects
// Copyright (c) 2016 Leandro F. Vieira (leandromoh). All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#endregion

using NUnit.Framework;
using System;
using System.Collections.Generic;
using System.Linq;
using LinqEnumerable = System.Linq.Enumerable;

namespace MoreLinq.Test
{
[TestFixture]
public class CountByTest
{
[Test]
[ExpectedException(typeof(ArgumentNullException))]
public void CountByWithNullSequence()
{
IEnumerable<int> sequence = null;
sequence.CountBy(x => x % 2 == 0);
}

[Test]
[ExpectedException(typeof(ArgumentNullException))]
public void CountByWithNullProjection()
{
Func<int, bool> projection = null;
Enumerable.Range(1, 10).CountBy(projection);
}

[Test]
public void CountBySimpleTest()
{
var result = new[] { 1, 2, 3, 4, 5, 6, 1, 2, 3, 1, 1, 2 }.CountBy(c => c);

var expectations = new List<KeyValuePair<int, int>>()
{
{ 1, 4 },
{ 2, 3 },
{ 3, 2 },
{ 4, 1 },
{ 5, 1 },
{ 6, 1 },
};

result.AssertSequenceEqual(expectations);
}

[Test]
public void CountByEvenOddTest()
{
var result = Enumerable.Range(1, 100).CountBy(c => c % 2);

var expectations = new List<KeyValuePair<int, int>>()
{
{ 1, 50 },
{ 0, 50 },
};

result.AssertSequenceEqual(expectations);
}

[Test]
public void CountByWithEqualityComparer()
{
var result = new[] { "a", "B", "c", "A", "b", "A" }.CountBy(c => c, StringComparer.OrdinalIgnoreCase);

var expectations = new List<KeyValuePair<string, int>>()
{
{ "a", 3 },
{ "B", 2 },
{ "c", 1 },
};

result.AssertSequenceEqual(expectations);
}

[Test]
public void CountByHasKeysOrderedLikeGroupBy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I change the last line of CountByImpl:

             return keys.Select(k => new KeyValuePair<TKey, int>(k, dic[k]));

to simply return dic instead:

             return dic;

then this test (CountByHasKeysOrderedLikeGroupBy) doesn't fail.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I change the last line of CountByImpl to simply return dic instead then this test (CountByHasKeysOrderedLikeGroupBy) doesn't fail.

In pratice, yes, it doesnt fail (because, currently, dictionary brings the results in order that keys were inserted), but as you once said yourself

dictionaries do not guarantee enumeration in any particular or logical order

So order in enumeration of dictionary can change anyday with a new implementation. To ensure CountBy always return in the order that keys were found the current last line is necessary.

Copy link
Collaborator Author

@leandromoh leandromoh Nov 2, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a test missing to check that CountBy is lazy.

What CountByIsLazy test should exactly do? If I understood well, the implementation isn't lazy since I must iterate over all the IEnumerable before return any KeyValuePair.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CountBy isn't lazy but it can and should be (like GroupBy). The iterator doesn't need to run until someone iterates the results! You can make it lazy very easily by using yield to return the results. So instead of the following as the last line of CountByImpl:

            return keys.Select(k => new KeyValuePair<TKey, int>(k, dic[k]));

Do instead:

            foreach (var key in keys)
                yield return new KeyValuePair<TKey, int>(key, dic[key]);

Now the compiler will re-write the code to run during iteration, rendering it lazy!

Copy link
Collaborator Author

@leandromoh leandromoh Nov 2, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But It isn't the same thing since Select is lazy (do exactly the foreach yield) ?

Copy link
Collaborator Author

@leandromoh leandromoh Nov 2, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how could I write a test that check if the return of CountBy is lazy (for CountByIsLazy) ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how could I write a test that check if the return of CountBy is lazy (for CountByIsLazy) ?

See some of the existing tests that test for laziness for inspiration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But It isn't the same thing since Select is lazy (do exactly the foreach yield) ?

Yes, Select is lazy except you compute the result imperatively before you send them back via Select. That computation is happening when CountBy is called when it should happen when the enumerable returned by your method is enumerated by the caller (and which could be some time after).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for the explanation. CountByIsLazy was implemented.

{
var randomSequence = MoreEnumerable.Random(0, 100).Take(100).ToArray();

var countByKeys = randomSequence.CountBy(x => x).Select(x => x.Key);
var groupByKeys = randomSequence.GroupBy(x => x).Select(x => x.Key);

countByKeys.AssertSequenceEqual(groupByKeys);
}

[Test]
public void CountByIsLazy()
{
new BreakingSequence<string>().CountBy(x => x.Length);
}
}
}
4 changes: 4 additions & 0 deletions MoreLinq.Test/TestExtensions.cs
Original file line number Diff line number Diff line change
Expand Up @@ -73,5 +73,9 @@ internal static IEnumerable<string> GenerateSplits(this string str, params char[
yield return split;
}

internal static void Add<TKey, TValue>(this IList<KeyValuePair<TKey, TValue>> list, TKey key, TValue value)
{
list.Add(new KeyValuePair<TKey, TValue>(key, value));
}
}
}
120 changes: 120 additions & 0 deletions MoreLinq/CountBy.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,120 @@
#region License and Terms
// MoreLINQ - Extensions to LINQ to Objects
// Copyright (c) 2016 Leandro F. Vieira (leandromoh). All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#endregion

namespace MoreLinq
{
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;

static partial class MoreEnumerable
{
/// <summary>
/// Applies a key-generating function to each element of a sequence and returns a sequence of
/// unique keys and their number of occurrences in the original sequence.
/// </summary>
/// <typeparam name="TSource">Type of the elements of the source sequence.</typeparam>
/// <typeparam name="TKey">Type of the projected element.</typeparam>
/// <param name="source">Source sequence.</param>
/// <param name="keySelector">Function that transforms each item of source sequence into a key to be compared against the others.</param>
/// <returns>A sequence of unique keys and their number of occurrences in the original sequence.</returns>
public static IEnumerable<KeyValuePair<TKey, int>> CountBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
return source.CountBy(keySelector, null);
}

/// <summary>
/// Applies a key-generating function to each element of a sequence and returns a sequence of
/// unique keys and their number of occurrences in the original sequence.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not mention how this overload is any different that the other & therefore could look odd on a page of summaries. It should mention the additional argument. Considering adding:

An additional argument specifies a comparer to use for testing equivalence of keys.

/// An additional argument specifies a comparer to use for testing equivalence of keys.
/// </summary>
/// <typeparam name="TSource">Type of the elements of the source sequence.</typeparam>
/// <typeparam name="TKey">Type of the projected element.</typeparam>
/// <param name="source">Source sequence.</param>
/// <param name="keySelector">Function that transforms each item of source sequence into a key to be compared against the others.</param>
/// <param name="comparer">The equality comparer to use to determine whether or not keys are equal.
/// If null, the default equality comparer for <typeparamref name="TSource"/> is used.</param>
/// <returns>A sequence of unique keys and their number of occurrences in the original sequence.</returns>
public static IEnumerable<KeyValuePair<TKey, int>> CountBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey> comparer)
{
if (source == null) throw new ArgumentNullException("source");
if (keySelector == null) throw new ArgumentNullException("keySelector");

return CountByImpl(source, keySelector, comparer);
}

static IEnumerable<KeyValuePair<TKey, int>> CountByImpl<TSource, TKey>(
IEnumerable<TSource> source, Func<TSource, TKey> keySelector,
IEqualityComparer<TKey> comparer)
{
List<TKey> keys;
List<int> counts;

// Avoid the temptation to inline the CountByLoop method, which
// exists solely to separate the scope & lifetimes of the locals
// needed for the actual looping of the source & production of the
// results (that happens once at the start of iteration) from
// those needed to simply yield the results. It is harder to reason
// about the lifetimes (if the code is inlined) with respect to how
// the compiler will rewrite the iterator code as a state machine.
// For background, see:
// http://blog.stephencleary.com/2010/02/q-should-i-set-variables-to-null-to.html

CountByLoop(source, keySelector, comparer ?? EqualityComparer<TKey>.Default,
out keys, out counts);

for (var i = 0; i < keys.Count; i++)
yield return new KeyValuePair<TKey, int>(keys[i], counts[i]);
}

static void CountByLoop<TSource, TKey>(IEnumerable<TSource> source,
Func<TSource, TKey> keySelector, IEqualityComparer<TKey> comparer,
out List<TKey> keys, out List<int> counts)
{
var dic = new Dictionary<TKey, int>(comparer);
keys = new List<TKey>();
counts = new List<int>();
var havePrevKey = false;
var prevKey = default(TKey);
var index = 0;

foreach (var item in source)
{
var key = keySelector(item);

if (// key same as the previous? then re-use the index
(havePrevKey && comparer.GetHashCode(prevKey) == comparer.GetHashCode(key)
&& comparer.Equals(prevKey, key))
// otherwise try & find index of the key
|| dic.TryGetValue(key, out index))
{
counts[index]++;
}
else
{
dic[key] = keys.Count;
keys.Add(key);
counts.Add(1);
}

prevKey = key;
havePrevKey = true;
}
}
}
}