Await: asynchronous evaluation of sequence elements #205

Merged: 62 commits, Apr 9, 2018


atifaziz
Member

@atifaziz atifaziz commented Oct 27, 2016

This PR is just an idea, an RFC to solicit feedback. I'm not even sure this belongs in MoreLINQ as it may expand the scope of the library to cover async scenarios.

var beatles = new[]
{
    "John Lennon",
    "Paul McCartney",
    "George Harrison",
    "Ringo Starr",
};

var http = new HttpClient();
var results = await beatles.SelectAsync(
    async e => await http.SendAsync(new HttpRequestMessage(HttpMethod.Get, "https://en.wikipedia.org/wiki/" + e.Replace(' ', '_'))),
    (e, rsp) => new
    {
        Topic = e,
        rsp.StatusCode,
        rsp.Content.Headers.ContentLength,
        rsp.Content.Headers.ContentType.MediaType,
    });

foreach (var e in results)
    Console.WriteLine(e.ToString());

Output is:

{ Topic = John Lennon, StatusCode = OK, ContentLength = 516857, MediaType = text/html }
{ Topic = Paul McCartney, StatusCode = OK, ContentLength = 754121, MediaType = text/html }
{ Topic = George Harrison, StatusCode = OK, ContentLength = 591170, MediaType = text/html }
{ Topic = Ringo Starr, StatusCode = OK, ContentLength = 476451, MediaType = text/html }

@fsateler
Member

fsateler commented Oct 27, 2016

Interesting. Some thoughts, in no particular order

  • This method consumes (and executes the tasks of) the entire source sequence. More general scenarios will likely require a maximum number of tasks to be executed at the same time.
  • If it is named SelectAsync, it should probably mimic Select and return just the TAsync instead of a KeyValuePair.
  • The method is not streaming, as it will wait for all tasks to finish before yielding the first result. This is another difference from plain Select.

While this is very useful, I'm unsure this is a good fit for MoreLINQ and it should instead be put in a separate library. Processing of async sequences is likely to require a lot more functions and functionality, and I don't know if this will end up polluting the MoreLINQ library with either more dependencies, or reimplementation of functionality available elsewhere[1]. Perhaps putting this into a separate dll (MoreAsyncLINQ.dll ?), within this same repository makes sense.

[1] For example, I made my own implementation of SelectAsync by building on top of Rx:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Reactive.Linq; // Rx: Observable.FromAsync, Merge, ToEnumerable
using System.Threading.Tasks;

public static IEnumerable<TResult> SelectAsync<TSource, TResult>(this IEnumerable<TSource> src, Func<TSource, Task<TResult>> asyncSelector, int maxConcurrency) {
    // Merge iterates eagerly over the source sequence
    // However, Observable.FromAsync will only execute the function when it is subscribed to
    // Therefore, we *must* call the async selector inside the function, otherwise the maxConcurrency is useless
    return src.Select(s => Observable.FromAsync(() => asyncSelector(s))).Merge(maxConcurrency).ToEnumerable();
}

@atifaziz
Member Author

atifaziz commented Oct 27, 2016

@fsateler Thanks for all that feedback! The initial implementation was a very poor crack at the problem. The latter commits address all of your points & SelectAsync now pretty much looks (signature-wise) and behaves (streams) like Select.

The updated usage now looks like this:

var beatles = new[]
{
    "John Lennon",
    "Paul McCartney",
    "George Harrison",
    "Ringo Starr",
};

var http = new HttpClient();
var results =
    from e in beatles.SelectAsync(async e => new
    { 
        Topic = e,
        Response = await http.SendAsync(new HttpRequestMessage(HttpMethod.Get, "https://en.wikipedia.org/wiki/" + e.Replace(' ', '_')))
    })
    select new
    {
        e.Topic,
        e.Response.StatusCode,
        e.Response.Content.Headers.ContentLength,
        e.Response.Content.Headers.ContentType.MediaType,
    };

foreach (var e in results)
    Console.WriteLine(e.ToString());

Processing of async sequences is likely to require a lot more functions and functionality,

Not sure. Take a look at the code now.

…will end up polluting the MoreLINQ library with either more dependencies, or reimplementation of functionality available elsewhere.

That's what I'm afraid of too thus why I consider this PR as an RFC (should have a label for that?) for now.

Perhaps putting this into a separate dll (MoreAsyncLINQ.dll ?), within this same repository makes sense.

I don't want to create & maintain another whole library for just one method in the async category. Perhaps a MoreLinq.Sandbox would be better for all ideas that are cooking and unsupported?

I made my own implementation of SelectAsync by building on top of Rx

Thanks for sharing that and I had done something very similar in the past in projects where I already had a dependency on Rx. However, for projects that don't, I felt it may be possible to implement it simply using existing infrastructure from the framework.
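A framework-only version of the idea can be sketched roughly as follows, assuming nothing beyond the BCL (no Rx). It is not the implementation in this PR: the name SelectAsyncThrottled is hypothetical, and maxConcurrency simply mirrors the parameter of the Rx version above. It keeps at most maxConcurrency projections in flight and streams each result as soon as its task completes, so results can come out of source order.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ThrottledSelectSketch
{
    // Hypothetical helper (not MoreLINQ API): bounded concurrency, streaming, unordered.
    public static IEnumerable<TResult> SelectAsyncThrottled<T, TResult>(
        this IEnumerable<T> source,
        Func<T, Task<TResult>> selector,
        int maxConcurrency)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (selector == null) throw new ArgumentNullException(nameof(selector));
        if (maxConcurrency <= 0) throw new ArgumentOutOfRangeException(nameof(maxConcurrency));
        return Iterator();

        IEnumerable<TResult> Iterator()
        {
            // Continuations push each task here once it has completed.
            var completed = new BlockingCollection<Task<TResult>>();
            var pending = 0;        // tasks started but not yet yielded
            var exhausted = false;  // true once the source has no more elements
            using (var e = source.GetEnumerator())
            {
                while (true)
                {
                    // Top up the number of in-flight tasks from the source.
                    while (!exhausted && pending < maxConcurrency)
                    {
                        if (!e.MoveNext()) { exhausted = true; break; }
                        var task = selector(e.Current);
                        pending++;
                        task.ContinueWith(t => completed.Add(t),
                                          TaskContinuationOptions.ExecuteSynchronously);
                    }
                    if (pending == 0)
                        yield break;
                    // Block until any in-flight task has finished.
                    var done = completed.Take();
                    pending--;
                    // GetResult rethrows the task's original exception, if any.
                    yield return done.GetAwaiter().GetResult();
                }
            }
        }
    }
}
```

Because the iterator blocks in BlockingCollection.Take, consuming it on a thread with a single-threaded SynchronizationContext (a UI thread, for example) while the selector awaits without ConfigureAwait(false) can deadlock.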

@fsateler
Member

I consider this PR as an RFC (should have a label for that?)

There is already a discussion label. Maybe this tag can be used for this too?

I don't want to create & maintain another whole library for just one method in the async category. Perhaps a MoreLinq.Sandbox would be better for all ideas that are cooking and unsupported?

This sounds like a good idea, but orthogonal to what I proposed.

I made my own implementation of SelectAsync by building on top of Rx

Thanks for sharing that and I had done something very similar in the past in projects where I already had a dependency on Rx. However, for projects that don't, I felt it may be possible to implement it simply using existing infrastructure from the framework.

It is certainly possible. But then we get back to the most important issue:

…will end up polluting the MoreLINQ library with either more dependencies, or reimplementation of functionality available elsewhere.

That's what I'm afraid of too.

Processing of async sequences is likely to require a lot more functions and functionality,

Not sure. Take a look at the code now.

What I mean is that mixing async code and sequences is usually more complex than just having a SelectAsync. Why not implement WhereAsync, SelectManyAsync, JoinAsync, etc too? Another example: the code now no longer preserves order of the source sequence. Should there be a new overload that does?
If there is potential for adding more async-related methods, I suspect the amount of reimplementation of stuff already present in other libraries will increase.

var queue = new BlockingCollection<object>();
using (var _ = source.GetEnumerator())
{
var item = _;
Member


Why this indirection?

Member Author


To stop ReSharper from complaining about potentially accessing a disposed closure. I found this was good enough for now instead of polluting the code with suppression comments.

if (maxConcurrency <= 0) throw new ArgumentOutOfRangeException(nameof(maxConcurrency));
if (selector == null) throw new ArgumentNullException("selector");

var queue = new BlockingCollection<object>();
Member


Why not use Tuple<ExceptionDispatchInfo, Task<TResult>> instead of object? Then use an approach like the one in Go or JavaScript: the first item is null if no error occurred, and the second item is null if an error occurred.

Member Author


That's exactly how it started out! Don't believe me? See my Throttle implementation that was the basis for SelectAsync. In the end, I felt that Tuple represents a product and not a union/either/choice. With a tuple, you can get invalid combinations, and while both approaches need if-ing on the various cases, using a super type like object as a union of the run-time types just seemed better as a shortcut. Ideally I'd have Notification<T> handy because that's exactly what's needed here, but I didn't want to go out and add new types as I'm in two minds about adding this to MoreLINQ.
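To make the product-versus-union point concrete, a union-style alternative to both object and a Tuple can be sketched as a small closed class hierarchy. Outcome<T>, Success and Failure are hypothetical names in the spirit of the Notification<T> mentioned above; they are not types from MoreLINQ or Rx. Unlike a Tuple<ExceptionDispatchInfo, Task<TResult>>, an outcome can only be a value or an error, never both or neither.

```csharp
using System;
using System.Runtime.ExceptionServices;

// Hypothetical union type: exactly one case applies per instance.
public abstract class Outcome<T>
{
    // Private constructor: only the nested cases below can derive from Outcome<T>.
    private Outcome() { }

    public abstract T GetValueOrThrow();

    public sealed class Success : Outcome<T>
    {
        public T Value { get; }
        public Success(T value) { Value = value; }
        public override T GetValueOrThrow() => Value;
    }

    public sealed class Failure : Outcome<T>
    {
        public ExceptionDispatchInfo Error { get; }

        public Failure(ExceptionDispatchInfo error) =>
            Error = error ?? throw new ArgumentNullException(nameof(error));

        public override T GetValueOrThrow()
        {
            Error.Throw();     // rethrows, preserving the original stack trace
            return default(T); // not reached; keeps the compiler satisfied
        }
    }
}
```

A consumer would still branch on the case, but the type system rules out the invalid combinations a tuple permits.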

@fsateler
Member

What I mean is that mixing async code and sequences is usually more complex than just having a SelectAsync. Why not implement WhereAsync, SelectManyAsync, JoinAsync, etc too? Another example: the code now no longer preserves order of the source sequence. Should there be a new overload that does?

For avoidance of doubt, I'm not suggesting you should provide these methods for SelectAsync to be useful. The questions are mostly thinking out loud trying to discover if the async-related methods will grow to a sizable number.

@atifaziz
Member Author

atifaziz commented Oct 27, 2016

Another example: the code now no longer preserves order of the source sequence. Should there be a new overload that does?

Select in LINQ to SQL and PLINQ does not preserve order either:

In PLINQ, the goal is to maximize performance while maintaining correctness. A query should run as fast as possible but still produce the correct results. In some cases, correctness requires the order of the source sequence to be preserved; however, ordering can be computationally expensive. Therefore, by default, PLINQ does not preserve the order of the source sequence. In this regard, PLINQ resembles LINQ to SQL, but is unlike LINQ to Objects, which does preserve ordering.

I think it's understood that there needs to be a balancing act here. If streaming is the greater benefit then order can't be guaranteed, and if order is important then you need to follow up with an OrderBy.
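The "follow up with an OrderBy" approach can be sketched with PLINQ: tag each element with its source index, let the parallel projection run unordered, then sort on the tag. The helper name SelectThenRestoreOrder is hypothetical and this only illustrates the idea; it is not part of MoreLINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderRestoringSketch
{
    // Hypothetical helper: unordered parallel projection, then an OrderBy on the
    // original index to get source order back.
    public static List<TResult> SelectThenRestoreOrder<T, TResult>(
        IEnumerable<T> source, Func<T, TResult> selector) =>
        source.Select((item, index) => new { Item = item, Index = index })
              .AsParallel()                  // PLINQ does not preserve order by default
              .Select(x => new { Result = selector(x.Item), x.Index })
              .AsSequential()                // back to LINQ to Objects
              .OrderBy(x => x.Index)         // the follow-up OrderBy
              .Select(x => x.Result)
              .ToList();
}
```

PLINQ can also be asked to preserve order directly with AsOrdered(), at some cost in throughput; the index-tagging form above is the more general pattern when the reordering source is not PLINQ.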

I have the same concerns as you about the proliferation of other async-enabled operators, which is why I continue to be in two minds about adding this to MoreLINQ. I reckon though that SelectAsync will be the most popular (the primary motivation being use of async lambdas in existing LINQ queries) & the rest may just fall under YAGNI. If anyone is doing anything more sophisticated then they'll probably be using Rx in the first place. Instead of guessing & worrying, I think the right answer to put all these fears to rest may be to have another library for hosting experiments, be that MoreLinq.Sandbox or MoreLinq.Labs. And it doesn't have to have tests or rigorous review.

@atifaziz
Member Author

Some open points:

  • What happens to tasks that are in-flight when the iterator is closed or an error is thrown by one of the asynchronous operations?
  • How would one support cancellation for the entire enumeration? The cancellation token would have to be re-created on each enumeration for the query to have repeatable semantics. Would cancellation ever be needed?
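One way to get a fresh token per enumeration is to create the CancellationTokenSource inside an iterator body, since that body runs anew for every GetEnumerator() call, and to cancel it in a finally block, which runs when the enumerator is disposed early (a foreach break, Take) or when an evaluation throws. The sketch below assumes a hypothetical EvaluateWithLookahead helper that starts the next element's evaluation before yielding the current one, so there is something in flight to cancel; it is not the code in this PR.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class PerEnumerationCancellationSketch
{
    // Hypothetical helper: one-element lookahead with per-enumeration cancellation.
    public static IEnumerable<TResult> EvaluateWithLookahead<T, TResult>(
        this IEnumerable<T> source,
        Func<T, CancellationToken, Task<TResult>> evaluator)
    {
        // Created per enumeration because the iterator body re-runs each time.
        var cts = new CancellationTokenSource();
        try
        {
            using (var e = source.GetEnumerator())
            {
                var next = e.MoveNext() ? evaluator(e.Current, cts.Token) : null;
                while (next != null)
                {
                    var current = next;
                    // Start the following evaluation before yielding this one.
                    next = e.MoveNext() ? evaluator(e.Current, cts.Token) : null;
                    yield return current.GetAwaiter().GetResult();
                }
            }
        }
        finally
        {
            // Runs on completion, on error, and on early disposal of the enumerator,
            // signalling any evaluation still in flight.
            cts.Cancel();
            cts.Dispose();
        }
    }
}
```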

@atifaziz atifaziz changed the title SelectAsync to asynchronously pairs sequence elements with their projections SelectAsync to asynchronously project elements of a sequence Oct 27, 2016
@atifaziz
Member Author

The IDisposable.Dispose docs clearly state that subsequent calls should be ignored:

If an object's Dispose method is called more than once, the object must ignore all calls after the first one.

So it is safe (or not incorrect) to assume so.
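For reference, a Dispose that honours that contract can be as small as the sketch below. OnceDisposable is a hypothetical wrapper around a cleanup action; Interlocked.Exchange makes sure only the first call runs the action, even if calls race.

```csharp
using System;
using System.Threading;

// Hypothetical example type: cleanup runs once; later Dispose calls are ignored.
public sealed class OnceDisposable : IDisposable
{
    private Action _onDispose;

    public OnceDisposable(Action onDispose) =>
        _onDispose = onDispose ?? throw new ArgumentNullException(nameof(onDispose));

    public void Dispose()
    {
        // Atomically take the action; only the first caller sees a non-null value.
        var action = Interlocked.Exchange(ref _onDispose, null);
        action?.Invoke();
    }
}
```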

@atifaziz
Member Author

atifaziz commented Oct 28, 2016

What happens to tasks that are in-flight when the iterator is closed or an error is thrown by one of the asynchronous operations?

There are now new overloads where the projection function receives an additional argument that's a CancellationToken. The cancellation token is signaled when either the iteration is stopped early (e.g. SelectAsync is combined with Take or TakeWhile) or the projection of an element throws an error, thus allowing in-flight tasks to cancel.

Below is an example using an overload that sends a CancellationToken to the projection function.

var beatles = new[]
{
    "John Lennon",
    "Paul McCartney",
    "George Harrison",
    "Ringo Starr",
};

var http = new HttpClient();
var results =
    from e in beatles.SelectAsync(async (e, ct) =>
    {
        try
        {
            var url = "https://en.wikipedia.org/wiki/" + e.Replace(' ', '_');
            return new
            { 
                Topic = e,
                Response = await http.SendAsync(new HttpRequestMessage(HttpMethod.Get, url), ct),
            };
        }
        catch (OperationCanceledException)
        {
            Console.WriteLine("ABORT! " + e);
            throw;
        }
    })
    select new
    {
        e.Topic,
        e.Response.StatusCode,
        e.Response.Content.Headers.ContentLength,
        e.Response.Content.Headers.ContentType.MediaType,
    };

foreach (var e in results.Take(2))
    Console.WriteLine(e.ToString());

Because we only care about 2 results (note the Take(2) in the foreach loop at the end), one possible output is:

{ Topic = John Lennon, StatusCode = OK, ContentLength = 518001, MediaType = text/html }
{ Topic = George Harrison, StatusCode = OK, ContentLength = 591180, MediaType = text/html }
ABORT! Ringo Starr
ABORT! Paul McCartney

Since SelectAsync does not preserve the source order, the output above may vary from run to run.

# Conflicts:
#	MoreLinq.Test/project.json
#	MoreLinq/project.json
System.Collections.Concurrent is available in .NET Standard 1.1 and
above, and we don't want to add a whole new target for that just now
since it is covered by the .NET Standard 2.0 target.
@atifaziz
Member Author

atifaziz commented Mar 6, 2018

but I haven't been able to come up with better alternatives. CollectTasks, Wait, Await don't sound any better to me.

SelectTask or SelectAwaitable also come to mind but I fear they won't bring much more clarity. Is Fetch so bad? Do we need to really make anything about tasks or awaiting obvious in the name?

BTW, since I never explicitly said this before: I like the overall shape, and I think this can be a useful addition to MoreLINQ.

Cool, and I'd like to work towards moving it out of the experimental space. If we can settle on a name and release it as experimental, then there'll be time to work out unit tests and iron out any kinks from battle-testing.

@atifaziz
Member Author

atifaziz commented Mar 6, 2018

CollectTasks, Wait, Await don't sound any better to me.

I realised that it is a bit useless to have a projection function in the first overload. The method can be reduced to simply being an extension of a sequence of tasks (IEnumerable<Task<T>>). See what that looks like in 3819d3d. What's also interesting is that it then made sense to simply call the method Await, which was one of your suggestions! This is either a good thing or it means that the first overload should be removed because no one in their right mind will use it; it'll waste work if the sequence is not fully consumed. The second overload makes you think about cancellation.

atifaziz added 3 commits March 7, 2018 07:25
This reverts commit d2857e8 that was partially complete.
This is what commit d2857e8 should have been.
@atifaziz
Member Author

atifaziz commented Mar 7, 2018

With 09497dc, I'm proposing to rename SelectAsync to Await entirely. With some careful re-wording, I think it brings about more clarity. That is, SelectAsync was possibly the wrong name all along because the function in the following overload felt like a projection when it's really an evaluation.

public static IAwaitQuery<TResult> Await<T, TResult>(
    this IEnumerable<T> source,
    Func<T, CancellationToken, Task<TResult>> evaluator)

Its purpose is not so much to project (T → TResult) as it is to simply supply or inject the CancellationToken into the evaluation in order to abort it if the iteration is terminated prematurely.

@fsateler What do you think? Am I twisting things to force retrofitting a definition that favours Await as a name or does it also make sense to you, and clarify/scope things better?

@atifaziz
Member Author

atifaziz commented Apr 4, 2018

@fsateler Did I miss anything from the changes you requested in your last review? Wondering what's keeping us from shipping this as an experiment?

Member

@fsateler fsateler left a comment


I like the new names better.

I think this is ready to go into the Experimental namespace.

@atifaziz atifaziz added this to the 3.0.0 milestone Apr 5, 2018
@atifaziz atifaziz changed the title SelectAsync to asynchronously project elements of a sequence Await: asynchronous evaluation of sequence elements Apr 5, 2018
@atifaziz atifaziz modified the milestones: 3.0.0, 3.0.0 βeta 1 Apr 5, 2018
@atifaziz atifaziz merged commit 3e53e03 into master Apr 9, 2018
@atifaziz atifaziz deleted the SelectAsync branch April 9, 2018 09:17
@MrSmoke MrSmoke mentioned this pull request Dec 7, 2021
2 participants