Clean Near-Unmaintainable Parallel Logic #53593
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
It's barely changed, nor needed to change, since it was ported to .NET Core years ago. To my knowledge we've also had no problems fixing (rare) bugs when they have been discovered. On what are you basing your assertions?
This is not about the maintainability of the code.
If you'd like to try to improve the code, you're certainly welcome to, and we'd happily accept a PR that made things better. I fear, though, that any such refactoring is more likely to introduce bugs than to help avoid them in the future, but I'm happy to be proven wrong if you'd like to try it. We also need to avoid any performance regressions in such an effort, so as part of such an effort we'd ask for detailed performance tests and results. |
Apologies, that is actually what I was primarily looking at, the move from .NET Framework to .NET Core.
Fair enough
Well I'll see what I can do then |
Tagging subscribers to this area: @carlossanlop
|
@Alex-ABPerson are you working on this? If not, I might want to look into this |
@frankhaugen Hey, I was planning on doing it but to be honest, I just hadn't found the time. So, if you'd like to, feel free! |
@Alex-ABPerson I'm a big "short and sweet" fanatic, and so I'm itching to split it into more files. Would you perceive that as an improvement or a worsening? |
Please don't split it into different files. Thanks :-) |
@stephentoub not even to emulate how you implemented Parallel.ForEachAsync? I think this should at least be consistent. There might be some really good reason why you did ForEachAsync in its own file, so I'd like to know the rationale here, if you have the capacity to give an answer. (I looked at the PR, the issue and the in-code comments to see if there were any hints as to why it was split out rather than placed side by side in Parallel.cs.) There might be a Parallel.ForAsync() soon (if #59019 is approved), so it would be good to have a "rule" for when something gets a new file and when it's heaped together in Parallel.cs. (There are probably some obscure docs where this policy is stated, but I can't find anything related in the coding guidelines.) |
If for no other reason than moving code between files makes it really time-consuming to review. I don't personally believe that this code is worth the ROI. |
@stephentoub I do get @Alex-ABPerson's point about maintainability and cleanliness: e.g. ForWorker and ForWorker64 have 240 lines each and 200 of them are identical. I've spent a couple of hours playing around with cleaning it up a bit, but there are only two ways to clean it. One is merging Worker32 and Worker64 into a single method, which is very dirty and will make bugfixing harder. The other is splitting it into parts, eliminating all the duplication and nesting with methods ("Uncle Bob" would approve of such a refactoring); this will make the code harder to read and mentally follow, reducing rather than improving maintainability. To be blunt, working on this issue should be discouraged. If this is to be cleaned, it would have to be a full refactoring of the |
Thanks, @frankhaugen.
I agree 😉 and hinted at that in #53593 (comment). But I wanted to give folks the opportunity to prove my expectations wrong.
Would the new support for static interface methods and the new interfaces implemented by Int32/Int64 help to consolidate that at all? |
Sure, if there was just one method and you could give it an integer of either variant. Most of the code is just int vs. long, and where it's not, I can't see it being hard to work around. It would be an excellent usage example of how the interfaces give value when used to handle an either/or with int32 and int64. But I've not seen any in-depth docs on the new interfaces except for what was discussed on one of the design review streams, so I might be spewing complete gibberish |
That's what I was thinking initially, there is a lot of duplication across the methods. Looking through the code, I can't see why I haven't gotten far enough to fully plan it out, but in theory it could work really well, yes. |
Hi! I'm new to working on Open source and would love to give it a go. |
@tjwald you are welcome to try subject to the caveats and concerns above that it may not be worth it, and any change would need to be accompanied by thorough perf measurements. Perhaps there is some other issue in this repo that might be interesting instead? Many are marked "help wanted" |
I think I want this one for now :) |
Hey, I am working on it and will post updates here as I go. Can you assign this to me? |
@tjwald how is the work progressing? Have you discovered any big hurdles? |
I focused on merging the implementation of ForWorker32 and ForWorker64 using the new generic math feature.
I currently have some environment issues, but I will fix them (Ran out of disk space 😄) |
The if statements should be no problem. So long as they can be elided by the JIT (i.e. they're using the |
Hi! so a few things:
The main deduplication I achieved was through using the Generic math for merging the 32 and 64 version together.
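For readers unfamiliar with the generic math feature mentioned here, the following is only a rough sketch of the idea, not the code from the PR. It assumes .NET 7 or later, and `MergedForSketch` and its `For` method are hypothetical names invented for illustration. A single loop body is written once against `IBinaryInteger<TInt>`, which both `int` and `long` implement, instead of being duplicated for each index type.

```csharp
using System;
using System.Numerics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, NOT the actual Parallel.cs code: one generic method
// serving both int and long index ranges via generic math (.NET 7+).
static class MergedForSketch
{
    // int and long both implement IBinaryInteger<T>, so a single body covers both.
    public static void For<TInt>(TInt fromInclusive, TInt toExclusive, Action<TInt> body)
        where TInt : IBinaryInteger<TInt>
    {
        if (toExclusive <= fromInclusive) return; // empty range, nothing to do

        // Split the range into two halves and run them concurrently.
        // The real implementation partitions work very differently; this is a simplification.
        TInt mid = fromInclusive + (toExclusive - fromInclusive) / (TInt.One + TInt.One);
        Parallel.Invoke(
            () => { for (TInt i = fromInclusive; i < mid; i++) body(i); },
            () => { for (TInt i = mid; i < toExclusive; i++) body(i); });
    }
}

static class Program
{
    static void Main()
    {
        long sum32 = 0, sum64 = 0;
        // TInt is inferred as int for the first call and as long for the second.
        MergedForSketch.For(0, 100, i => Interlocked.Add(ref sum32, i));
        MergedForSketch.For(0L, 100L, i => Interlocked.Add(ref sum64, i));
        Console.WriteLine($"{sum32} {sum64}");
    }
}
```

Whether a generic version like this matches the performance of the hand-duplicated methods is exactly the kind of question the maintainers asked to have measured, so a sketch like this says nothing about perf.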
|
I think you should show your changes, as that's kind of the point of open-source development on GitHub (it's git + social media), so people who will "veto" your approach can do it as early as possible. Don't waste your time doing development using a structure, style or pattern that might get "noped"; push what you have, make a draft PR, attach it to this issue and ask for preliminary feedback. On the matter of branch naming: based on the branch names in the repo, the format is pretty random, so just name it descriptively. You will be forking "runtime" to your profile anyway, so your changes live there, and a feature branch should stop existing after merging. Also, if you have benchmarks of good quality that test the "old"/existing implementation, they should be their own PR, because they might be approved or rejected on very different grounds than your other code! These are just general advice, as I'm not a contributor to this repo or in any way involved beyond being an interested community member who would love to work on this issue but doesn't have the capacity to do it right 😃 |
Hi @frankhaugen, @Alex-ABPerson! |
I have opened a secondary PR with a possible continuation for the refactoring |
Hi everyone! Where I could, I joined methods so as to avoid duplication, and created helper methods to show the commonalities of the separate core methods, which has pointed out that foreach and for have VERY similar bodies. |
Hi all :) |
The problem
The code within Parallel has remained extremely similar since its introduction all the way back in Framework. And over the years it has become increasingly less maintainable, to the point that someone who worked on it may have even left a comment on their difficulties working with it (see below).
There are only four methods that hold practically the entire behaviour of Parallel:
Invoke
ForWorker<TLocal>
ForWorker64<TLocal>
ForEachWorker<TSource, TLocal>
All of the features (thread locals, parallel loop state changes, passing indices to methods, etc.) are performed by these four methods for every mode, and are simply enabled or disabled by passing null for the corresponding arguments.
This is a perfectly fine idea if executed properly, but as it is the methods are quite a mess. Almost all of it has been packed into a single, huge, method, for all four modes. There's no structuring, none of it has been split up and organised into smaller sub-methods, there's loads of seemingly duplicated logic across all four modes, it's all just become one big slew of logic from start to finish.
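To make the "one worker, features toggled by null arguments" shape concrete, here is a minimal sketch. It is not taken from Parallel.cs: `WorkerSketch.Run` and all of its parameter names are hypothetical, and the loop is sequential purely to keep the example short.

```csharp
#nullable enable
using System;

// Hypothetical sketch of a single worker whose features are switched on
// by passing non-null delegates. None of these names come from Parallel.cs.
static class WorkerSketch
{
    public static void Run(int from, int to,
        Action<int>? body,                    // plain mode
        Func<int, long, long>? bodyWithLocal, // mode with a per-run local value
        Func<long>? localInit,                // optional initializer for the local
        Action<long>? localFinally)           // optional finalizer for the local
    {
        long local = localInit != null ? localInit() : 0;
        for (int i = from; i < to; i++)
        {
            // The worker checks which feature the caller asked for,
            // rather than having one method per mode.
            if (body != null) body(i);
            else if (bodyWithLocal != null) local = bodyWithLocal(i, local);
        }
        localFinally?.Invoke(local);
    }
}

static class Program
{
    static void Main()
    {
        int calls = 0;
        WorkerSketch.Run(0, 3, i => calls++, null, null, null);

        long total = 0;
        WorkerSketch.Run(0, 4, null, (i, acc) => acc + i, () => 0L, acc => total = acc);

        Console.WriteLine($"{calls} {total}"); // prints "3 6"
    }
}
```

Even in this toy form, every added mode introduces another branch inside the loop, which hints at how a worker supporting many modes can grow into one very long method.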
It's possible the issue is bad enough that a developer has even left a comment about it:
runtime/src/libraries/System.Threading.Tasks.Parallel/src/System/Threading/Tasks/Parallel.cs
Line 3245 in ff2507c
This comment may be talking from the code's perspective saying that "this task won't maintain this value" or something along those lines, or it may literally be talking from the developer's perspective saying that they're just going to do this because they're not really sure of the exact implications, which given the state of the method isn't an unreasonable possibility. It's not entirely clear which it is without untangling the logic, which I won't do if you decide it's not worthwhile to clean these up.
Why it matters
Overall, this does make it very hard to maintain the Parallel methods and it's only going to get worse. When it comes to looking into issues like #50566, with the code as it currently is, it may be a painful process to look through or even debug these methods and attempt to apply a fix.
In addition, the way the methods are packed in currently makes it harder to spot any sort of redundant logic, hindering possible performance opportunities too.
If it is decided that it is a worthwhile process to try to clean it up, try to remove any repeated code, refactor it into smaller sub-methods etc. then I'd happily take it up myself.