Clean Near-Unmaintainable Parallel Logic #53593

Alex-ABPerson · 2021-06-02T10:53:19Z

The problem

The code within Parallel has remained extremely similar since its introduction all the way back in Framework. And over the years it has become increasingly less maintainable, to the point that someone who worked on it may have even left a comment on their difficulties working with it (see below).

There are only four methods that hold practically the entire behaviour of Parallel:

Invoke
ForWorker<TLocal>
ForWorker64<TLocal>
ForEachWorker<TSource, TLocal>

All of the features, thread locals, parallel loop state changes, passing indices to methods etc. are all performed by these four methods for every mode, and simply enabled or disabled by making arguments on them as null.

This is a perfectly fine idea if executed properly, but as it is the methods are quite a mess. Almost all of it has been packed into a single, huge, method, for all four modes. There's no structuring, none of it has been split up and organised into smaller sub-methods, there's loads of seemingly duplicated logic across all four modes, it's all just become one big slew of logic from start to finish.

It's possible the issue is bad enough that a developer has even left a comment about it:

runtime/src/libraries/System.Threading.Tasks.Parallel/src/System/Threading/Tasks/Parallel.cs

Line 3245 in ff2507c

// I'm not going to try to maintain this

This comment may be talking from the code's perspective saying that "this task won't maintain this value" or something along those lines, or it may literally be talking from the developer's perspective saying that they're just going to do this because they're not really sure of the exact implications, which given the state of the method isn't an unreasonable possibility. It's not entirely clear which it is without untangling the logic, which I won't do if you decide it's not worthwhile to clean these up.

Why it matters

Overall, this does make it very hard to maintain the Parallel methods and it's only going to get worse. When it comes around to looking into issues like #50566 as the code currently is it may be a painful process if it's needed to look through or even debug these methods and attempt to apply a fix.

In addition, the way the methods are packed in currently makes it harder to spot any sort of redundant logic, hindering possible performance opportunities too.

If it is decided that it is a worthwhile process to try to clean it up, try to remove any repeated code, refactor it into smaller sub-methods etc. then I'd happily take it up myself.

The text was updated successfully, but these errors were encountered:

dotnet-issue-labeler · 2021-06-02T10:53:24Z

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

stephentoub · 2021-06-02T12:20:36Z

over the years it has become increasingly less maintainable

It's barely changed, nor needed to change, since it was ported to .NET Core years ago. To my knowledge we've also had no problems fixing (rare) bugs when they have been discovered. On what are you basing your assertions?

It's possible the issue is bad enough that a developer has even left a comment about it:

This is not about the maintainability of the code.

If it is decided that it is a worthwhile process to try to clean it up, try to remove any repeated code, refactor it into smaller sub-methods

If you'd like to try to improve the code, you're certainly welcome to, and we'd happily accept a PR that made things better. I fear, though, that any such refactoring is more likely to introduce bugs than to help avoid them in the future, but I'm happy to be proven wrong if you'd like to try it. We also need to avoid any performance regressions in such an effort, so as part of such an effort we'd ask for detailed performance tests and results.

Alex-ABPerson · 2021-06-02T12:35:05Z

It's barely changed, nor needed to change, since it was ported to .NET Core years ago.

Apologies, that is actually what I was primarily looking at, the move from .NET Framework to .NET Core.

This is not about the maintainability of the code.

Fair enough

If you'd like to try to improve the code, you're certainly welcome to... We also need to avoid any performance regressions in such an effort, so as part of such an effort we'd ask for detailed performance tests and results.

Well I'll see what I can do then

ghost · 2021-06-02T14:56:16Z

Tagging subscribers to this area: @carlossanlop
See info in area-owners.md if you want to be subscribed.

Issue Details

The problem

The code within Parallel has remained extremely similar since its introduction all the way back in Framework. And over the years it has become increasingly less maintainable, to the point that someone who worked on it may have even left a comment on their difficulties working with it (see below).

There are only four methods that hold practically the entire behaviour of Parallel:

Invoke
ForWorker<TLocal>
ForWorker64<TLocal>
ForEachWorker<TSource, TLocal>

All of the features, thread locals, parallel loop state changes, passing indices to methods etc. are all performed by these four methods for every mode, and simply enabled or disabled by making arguments on them as null.

This is a perfectly fine idea if executed properly, but as it is the methods are quite a mess. Almost all of it has been packed into a single, huge, method, for all four modes. There's no structuring, none of it has been split up and organised into smaller sub-methods, there's loads of seemingly duplicated logic across all four modes, it's all just become one big slew of logic from start to finish.

It's possible the issue is bad enough that a developer has even left a comment about it:

runtime/src/libraries/System.Threading.Tasks.Parallel/src/System/Threading/Tasks/Parallel.cs

Line 3245 in ff2507c

// I'm not going to try to maintain this

This comment may be talking from the code's perspective saying that "this task won't maintain this value" or something along those lines, or it may literally be talking from the developer's perspective saying that they're just going to do this because they're not really sure of the exact implications, which given the state of the method isn't an unreasonable possibility. It's not entirely clear which it is without untangling the logic, which I won't do if you decide it's not worthwhile to clean these up.

Why it matters

Overall, this does make it very hard to maintain the Parallel methods and it's only going to get worse. When it comes around to looking into issues like #50566 as the code currently is it may be a painful process if it's needed to look through or even debug these methods and attempt to apply a fix.

In addition, the way the methods are packed in currently makes it harder to spot any sort of redundant logic, hindering possible performance opportunities too.

If it is decided that it is a worthwhile process to try to clean it up, try to remove any repeated code, refactor it into smaller sub-methods etc. then I'd happily take it up myself.

Author:	Alex-ABPerson
Assignees:	-
Labels:	`area-System.Threading.Tasks`, `untriaged`
Milestone:	-

frankhaugen · 2021-09-21T17:33:06Z

@Alex-ABPerson are you working on this? If not, I might want to look into this

Alex-ABPerson · 2021-09-21T17:34:59Z

@frankhaugen Hey, I was planning on doing it but to be honest, I just hadn't found the time. So, if you'd like to feel free!

frankhaugen · 2021-09-26T14:10:09Z

@Alex-ABPerson I'm a big "short and sweet" -fanatic, and so I'm itching to split into more files. Would you perceived that as an improvement or a worsening?

stephentoub · 2021-09-26T14:19:55Z

Please don't split it into different files. Thanks :-)

frankhaugen · 2021-09-26T17:48:37Z

@stephentoub not even to emulate how you implemented Parallel.ForEachAsync? I think this should be at least consistent. There might be some really good reason why you did the ForEachAsync in it's own file, so I'd like to know the rationale here, if you have the capacity to give an answer, (I looked at PR, issue and in-code comments to see if there was some hint's for why the split, and not just side-by-side it in Parallel.cs).

There might be an Parallel.ForAsync() soon, (if #59019 is approved), so this is a thing to have a "rule" for when it is a new file and when it's heaped together in Parallel.cs, (there's probably some obscure docs where this policy is stated but I can't find anything related in the coding guidelines)

stephentoub · 2021-09-26T17:50:35Z

If for no other reason than moving code between files makes it really time-consuming to review. I don't personally believe that this code is worth the ROI.

frankhaugen · 2021-09-26T20:01:00Z

@stephentoub I do get @Alex-ABPerson 's point about maintainability and cleanliness: e.g. ForWorker and ForWorker64 has 240 lines each and 200 of them are identical.

I've spent a couple of hours playing around with cleaning it up a bit, but to clean it, it's either merging Worker32 and Worker64 into a single method, which is very dirty and will make bugfixing harder, or split it into parts eliminating all the duplications and nesting with methods, ("Uncle Bob" would approve of such a refactoring); this will make the code harder to read and mentally follow, reducing and not improving maintainability.

To be blunt this issue should be discouraged from being worked on. If this is to be cleaned, it would have to be a full refactoring of the System.Threading.Tasks.Parallel -project to have some value, but then it's just introducing more chances for bugs and unexpected behaviors. So I must agree with Stephen and say that this isn't worth it.

stephentoub · 2021-09-27T03:49:23Z

Thanks, @frankhaugen.

To be blunt this issue should be discouraged from being worked on. If this is to be cleaned, it would have to be a full refactoring of the System.Threading.Tasks.Parallel -project to have some value, but then it's just introducing more chances for bugs and unexpected behaviors. So I must agree with Stephen and say that this isn't worth it.

I agree 😉 and hinted at that in #53593 (comment). But I wanted to give folks the opportunity to prove my expectations wrong.

ForWorker and ForWorker64

Would the new support for static interface methods and the new interfaces implemented by Int32/Int64 help to consolidate that at all?

frankhaugen · 2021-09-28T20:10:39Z

ForWorker and ForWorker64

Would the new support for static interface methods and the new interfaces implemented by Int32/Int64 help to consolidate that at all?

Sure, if there was just one method and you can give it an integer of either variant, most of the code is just int vs. long, and where it's not, I can't see it being hard to work around it.

It would be an excellent usage example of how the interfaces give value when used to handle an either/or with int32 and int64

But I've not seen any in depth docs on the new interfaces except for what was discussed on one of the design reviews streams, so I might be spewing complete gibberish

Alex-ABPerson · 2021-10-12T11:50:16Z

That's what I was thinking initially, there is a lot of duplication across the methods ForWorker32 and ForWorker64, and a lot of the differences present are just 32-bit sizes vs 64-bit sizes.

Looking through the code, I can't see why ForWorker32 and ForWorker64 couldn't be merged into one generic ForWorker. In fact, it could be hypothetically be done right now. We could just make two structs implementing an interface (one for 32-bit, one for 64-bit), then use the structs as the generic, and the interface as a constraint, and let the JIT elide anything needed re-compile for each generic, but if we do it with the new interfaces they would make it even better and hopefully quite elegant.

I haven't gotten far enough to fully plan it out, but in theory it could work really well, yes.

tjwald · 2022-08-14T00:15:40Z

Hi! I'm new to working on Open source and would love to give it a go.
I think that I could try to clean up this file as a nice start. What do you think?

danmoseley · 2022-08-14T02:37:04Z

@tjwald you are welcome to try subject to the caveats and concerns above that it may not be worth it, and any change would need to be accompanied by thorough perf measurements. Perhaps there is some other issue in this repo that might be interesting instead? Many are marked "help wanted"

tjwald · 2022-08-15T04:40:09Z

I think I want this one for now :)
What should I read in order to start working on this?

tjwald · 2022-08-18T19:07:02Z

Hey I am working on it and will write here updates as I go. Can you assign this to me?

frankhaugen · 2022-09-27T16:16:38Z

@tjwald how is the work progressing? Have you discovered any big hurdles?

tjwald · 2022-09-27T19:17:27Z

I focused on merging the implementation of ForWorker32 and ForWorker64 using the new generic math feature.
I have a working draft; however, it required if statements in key locations, especially around the usage of the Interlocked APIs (which currently do not have support with generic math).
I am now working on 2 things:

Trying to run the performance suites on both versions to see what's the current situation.
Trying to get a big picture of the changes that I have done to understand if there is a better approach.

I currently have some environment issues, but I will fix them (Ran out of disk space 😄)

Alex-ABPerson · 2022-10-04T12:54:46Z

The if statements should be no problem. So long as they can be elided by the JIT (i.e. they're using the typeof(T) == typeof(X)) pattern, they shouldn't cause any perf regression

tjwald · 2022-10-08T18:14:22Z

Hi! so a few things:

do you want to see my current changes? I have the For32 and For64 merged together.
I did a basic empty body perf test to see that it is not a regression and so far, so good, but I should write more perf tests. What tests should I write? Should I open a PR on those tests and submit them to the perf repo first?
what should the branch name be?

The main deduplication I achieved was through using the Generic math for merging the 32 and 64 version together.
However, I have not tackled the issue of all of the different types of bodies being passed around (all the nulled variables etc.).
In addition, I haven't started looking at invoke or foreach.
So, what should I do next?

Perf tests
Invoke
Foreach
Body logic separation
Something else

frankhaugen · 2022-10-10T08:48:23Z

@tjwald

... So, what should I do next?

I think you should show your changes, as that's kind of the point of open-source development on GitHub, (It's git + social media), so people who will "veto" your approach can do it as early as possible.

Don't waste your time doing development using a structure, style or pattern that might get "noped", so push what you have, make a draft PR and attach it to this isssue and ask for preliminary feedback.

On the matter of branchnaming, based on branchnames in the repo, it is pretty random in format so just name it descriptivly and anyway you will be forking "runtime" to your profile so your changes live there anyways, and a branch should stop existing after merging if it's a feature-branch anyway.

Also, if you have banchmarks, they should be their own PR if you have some that are of quality and are testing the "old"/existing implementation, just because they might be approved or rejected on very different grounds than your other code

!These are just general advice as I'm not a contributor to this repo or in any way involved beyond being an interested community-member that would love to work on this issue, but I don't have the capacity to do it right 😃

tjwald · 2022-10-12T18:47:52Z

Hi @frankhaugen, @Alex-ABPerson!
I opened my initial PR. Linked above 😄
There are some hideous things in there, however I would love to get feedback and guidance how to improve on my ideas.

tjwald · 2022-10-14T00:48:54Z

I have opened a secondary PR with a possible continuation for the refactoring

tjwald · 2022-10-23T00:36:15Z

Hi everyone!
I have been refactoring this code on the secondary branch, and I have come to a very nice situation - The code for 3/4 of the main methods have been merged (almost!)
First, in the 1st PR I have merged the 32 and 64 version as previously mentioned.
Secondly, I introduced 2 new structures into the code: worker bodies and workers.
Worker bodies are wrappers of the user provided functions with state management and enable code reuse between For and Foreach methods.
Workers are the replicable core of the implementations of the loops in each of the methods: For and Foreach. (Note that foreach in fact has 2 implementations and this is reflected now in the Worker inheritance).
To facilitate the replication of the workers, I introduced a WorkerBodyFactory that is called for each replication of the worker.

Where I could, I joined methods as to avoid duplication, and created helper methods to show the commonalities of the separate core methods - which has pointed out that foreach and for have VERY similar bodies.
I am going to try to merge them into a single implementation.

tjwald · 2022-11-01T16:06:43Z

Hi all :)
If you will look at this PR - you will see that all but Invoke are using the same implementation.

dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Jun 2, 2021

jeffschwMSFT added the area-System.Threading.Tasks label Jun 2, 2021

stephentoub added this to the Future milestone Jul 12, 2021

jeffhandley added help wanted [up-for-grabs] Good issue for external contributors and removed untriaged New issue has not been triaged by the area owner labels Jul 13, 2021

buyaa-n assigned tjwald Sep 20, 2022

tjwald mentioned this issue Oct 12, 2022

WIP: Clean near unmaintainable parallel logic 53593 #76949

Closed

4 tasks

tjwald mentioned this issue Oct 14, 2022

Cleanup unmaintainable parallel logic with workers tjwald/csharp-runtime#1

Open

tjwald mentioned this issue Nov 7, 2022

Cleanup unmaintainable parallel logic with workers #77993

Closed

stephentoub mentioned this issue Apr 14, 2023

Deduplicate Parallel.For code with generic math #84853

Merged

ghost added the in-pr There is an active PR which will close this issue when it is merged label Apr 14, 2023

stephentoub closed this as completed in #84853 Apr 15, 2023

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Apr 15, 2023

ghost locked as resolved and limited conversation to collaborators May 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clean Near-Unmaintainable Parallel Logic #53593

Clean Near-Unmaintainable Parallel Logic #53593

Alex-ABPerson commented Jun 2, 2021 •

edited

Loading

dotnet-issue-labeler bot commented Jun 2, 2021

stephentoub commented Jun 2, 2021

Alex-ABPerson commented Jun 2, 2021 •

edited

Loading

ghost commented Jun 2, 2021

The problem

Why it matters

frankhaugen commented Sep 21, 2021

Alex-ABPerson commented Sep 21, 2021

frankhaugen commented Sep 26, 2021

stephentoub commented Sep 26, 2021

frankhaugen commented Sep 26, 2021

stephentoub commented Sep 26, 2021 •

edited

Loading

frankhaugen commented Sep 26, 2021

stephentoub commented Sep 27, 2021

frankhaugen commented Sep 28, 2021

Alex-ABPerson commented Oct 12, 2021 •

edited

Loading

tjwald commented Aug 14, 2022

danmoseley commented Aug 14, 2022

tjwald commented Aug 15, 2022

tjwald commented Aug 18, 2022

frankhaugen commented Sep 27, 2022

tjwald commented Sep 27, 2022

Alex-ABPerson commented Oct 4, 2022 •

edited

Loading

tjwald commented Oct 8, 2022 •

edited

Loading

frankhaugen commented Oct 10, 2022

tjwald commented Oct 12, 2022 •

edited

Loading

tjwald commented Oct 14, 2022

tjwald commented Oct 23, 2022

tjwald commented Nov 1, 2022

Clean Near-Unmaintainable Parallel Logic #53593

Clean Near-Unmaintainable Parallel Logic #53593

Comments

Alex-ABPerson commented Jun 2, 2021 • edited Loading

The problem

Why it matters

dotnet-issue-labeler bot commented Jun 2, 2021

stephentoub commented Jun 2, 2021

Alex-ABPerson commented Jun 2, 2021 • edited Loading

ghost commented Jun 2, 2021

The problem

Why it matters

frankhaugen commented Sep 21, 2021

Alex-ABPerson commented Sep 21, 2021

frankhaugen commented Sep 26, 2021

stephentoub commented Sep 26, 2021

frankhaugen commented Sep 26, 2021

stephentoub commented Sep 26, 2021 • edited Loading

frankhaugen commented Sep 26, 2021

stephentoub commented Sep 27, 2021

frankhaugen commented Sep 28, 2021

Alex-ABPerson commented Oct 12, 2021 • edited Loading

tjwald commented Aug 14, 2022

danmoseley commented Aug 14, 2022

tjwald commented Aug 15, 2022

tjwald commented Aug 18, 2022

frankhaugen commented Sep 27, 2022

tjwald commented Sep 27, 2022

Alex-ABPerson commented Oct 4, 2022 • edited Loading

tjwald commented Oct 8, 2022 • edited Loading

frankhaugen commented Oct 10, 2022

tjwald commented Oct 12, 2022 • edited Loading

tjwald commented Oct 14, 2022

tjwald commented Oct 23, 2022

tjwald commented Nov 1, 2022

Alex-ABPerson commented Jun 2, 2021 •

edited

Loading

Alex-ABPerson commented Jun 2, 2021 •

edited

Loading

stephentoub commented Sep 26, 2021 •

edited

Loading

Alex-ABPerson commented Oct 12, 2021 •

edited

Loading

Alex-ABPerson commented Oct 4, 2022 •

edited

Loading

tjwald commented Oct 8, 2022 •

edited

Loading

tjwald commented Oct 12, 2022 •

edited

Loading