[RFC] Adding API for parallel block to task_arena to warm-up/retain/release worker threads #1522
The wording "should enter..." might create/add to confusion. As already mentioned, from the usage standpoint "one time fast leave" is not a state to enter, but rather a command to execute when the parallel phase has ended and no other one is active.
So I would change to something like "Indicates that worker threads should avoid busy-waiting once there is no more work in the arena".
Replaced with the suggested one.
We need to find better names for the enum class and its values.
I am specifically concerned about the use of "delayed" in case the actual behavior might be platform specific, not always delayed. But also workers_leave is not a very good name.
I've been thinking... Combined with your previous comment about automatic, perhaps we could have 3 modes instead:
If we assume that we have these 3 modes now, fast and delayed modes would enforce behavior regardless of the platform. That would give the user more control while preserving usability (for example, automatic would be translated to the fast option for hybrid systems).
What do you think?
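As a rough illustration of the three-mode idea, the sketch below resolves a requested policy against the platform. All names here (leave_policy, leave_behavior, resolve, is_hybrid_platform) are hypothetical and are not taken from the RFC; the only rule drawn from the comment above is that automatic maps to fast on hybrid systems, while mapping it to delayed elsewhere is an assumption made for the example.

```cpp
// Hypothetical names throughout; this is not the API proposed in the RFC.
enum class leave_policy { automatic, fast, delayed };

// What workers would do once the arena runs out of work.
enum class leave_behavior { leave_immediately, retain_workers };

// fast and delayed are enforced regardless of the platform.
// automatic defers to a heuristic: fast on hybrid systems (from the
// discussion), delayed otherwise (an assumption for this sketch).
constexpr leave_behavior resolve(leave_policy requested, bool is_hybrid_platform) {
    return requested == leave_policy::fast    ? leave_behavior::leave_immediately
         : requested == leave_policy::delayed ? leave_behavior::retain_workers
         : is_hybrid_platform                 ? leave_behavior::leave_immediately
                                              : leave_behavior::retain_workers;
}

// Explicit modes ignore the platform; automatic does not.
static_assert(resolve(leave_policy::delayed, true) == leave_behavior::retain_workers, "");
static_assert(resolve(leave_policy::automatic, true) == leave_behavior::leave_immediately, "");
```

Under this reading, the open question in the thread is what retain_workers would actually guarantee when it is requested explicitly rather than chosen by the heuristic.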
Yes, that's an option we can consider. We'd need to think through what the "enforced" delayed mode will guarantee.
For the "automatic" mode, we say that after work completion threads might be retained in the arena for an unspecified time chosen by internal heuristics. How would the definition of the "enforced" delayed leave mode differ, and what additional guarantees would it provide to make it worth choosing?
Seems like if we have a dedicated automatic policy, that means that the delayed policy should guarantee at least some level of thread retention relative to the fast policy.
I think the whole structure is just a hint to the scheduler with no real guarantees provided; therefore, from the description we get:
From the implementation standpoint it makes a lot of sense, since we will have clear invariants for the arena, e.g., the default arena on a hybrid platform will have the "Fast" leave state.
So it definitely improves the implementation logic while bringing some potential value to users ("Delayed" will behave as thread retention if the user explicitly specified it).
@akukanov Regarding the enumeration class name: do you find leave_policy a better name than workers_leave? It seems more natural to me when using it during arena construction:
@pavelkumbrasev please describe the semantics of the delayed hint in a way that is meaningful for users.
For example, the description I used above as well as what you mentioned are good for automatic but rather bad for delayed, because all aspects of the decision are left to the implementation. Even if changed to "will be retained for unspecified time", it would still be rather weak because the time can be any, including arbitrarily close to 0. That is, it's not really different from automatic, and there is no clear reason to prefer it.
Sure. Perhaps we are not on the same page. I would like it to be:
Automatic is basically another heuristic for choosing "fast" or "leave" based on the underlying HW. Perhaps automatic is not the best name.
With this definition, I see no difference for users between "automatic" and "delayed", because in both cases the decision of whether to delay or not to delay, and for how long, is left to the implementation. If that is the intended behavior, let's not complicate the API with a redundant enum value.
Okay, let's have fast and automatic policies then, since we're not sure right now whether we can provide any meaningful guarantees for the delayed policy to the user.
Therefore, my suggestion is to address both of these by changing the API (here and in other places) to something like the following:
Then add, somewhere, an explanation of how this affects/changes the behavior of the current parallel block and how it composes with the arena's setting and other parallel blocks within it. For example, it may be like:
There is no composability problem really, as all but the last end-of-block calls are simply ignored, and only the last one has the one-time impact on the leave policy. Also it does not affect the arena settings, according to the design.
Of course if the calls come from different threads, in general it is impossible to predict which one will be the last. However, even if the code is designed to create parallel blocks in the same arena by multiple threads, all these blocks might have the same leave policy so that it does not matter which one is the last to end.
Using the same enum for the end of block as for the construction of the arena seems more confusing than helpful to me, as it may be perceived as changing the arena state permanently.
My guess is that it is the last one that decreases the ref counter to zero. I don't see any issue with this. Later blocks use the arena's policy if not specified explicitly.
I indicated the difference in the parameter naming this_block_leave, but if it is not enough, we can also indicate that more explicitly with an additional type: arena_workers_leave and phase_workers_leave. Nevertheless, my opinion is that it would not be a problem if documentation/specification includes explanation of this.
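The "last end-of-block call wins" behavior discussed in the two comments above can be modeled with a reference counter. The sketch below is a single-threaded illustration only: phase_tracker and its members are hypothetical names, and clearing a pending request when a new phase starts is an assumption, not something stated in the thread. A real scheduler would need atomic operations here.

```cpp
#include <cassert>

// Hypothetical, single-threaded model of one arena's parallel-phase
// bookkeeping. Not the oneTBB implementation.
class phase_tracker {
    int active_phases_ = 0;              // reference counter of open phases
    bool fast_leave_pending_ = false;    // one-time request from the last end call

public:
    void start_phase() {
        ++active_phases_;
        fast_leave_pending_ = false;     // assumption: a new phase cancels an old request
    }

    // Only the call that drops the counter to zero has any effect;
    // the with_fast_leave argument of all earlier calls is ignored.
    void end_phase(bool with_fast_leave) {
        assert(active_phases_ > 0 && "end_phase without a matching start_phase");
        if (--active_phases_ == 0) {
            fast_leave_pending_ = with_fast_leave;
        }
    }

    bool phase_active() const { return active_phases_ > 0; }

    // Returns the pending one-time request and clears it, so the arena's
    // own setting applies again afterwards.
    bool consume_fast_leave() {
        const bool pending = fast_leave_pending_;
        fast_leave_pending_ = false;
        return pending;
    }
};
```

With two overlapping phases, a fast-leave request passed to the first end_phase call is dropped in this model, which matches the point that it is generally impossible to predict which caller's request takes effect when the calls come from different threads.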
One more question here: what if the default value were the opposite of the one specified for the arena setting? That is, if the arena is constructed with the "fast leave" policy, then each parallel block/phase would have "delayed leave". I understand that it might be perceived as even more confusing, but I don't quite understand the idea of letting the user specify a parallel phase that ends with the same workers leave policy as the arena's. What would the user be trying to say by this? Why use the "parallel block" API in this case at all? It might be even more confusing.
Since we only have two policies, perhaps, it would be better to introduce something like:
If later demand appears, other parameters could be added to these functions.
It seems to me that this discussion is not just about API semantics but really about a different architecture, where each parallel block/stage might have its own customizable retention policy. It differs significantly from what is proposed, so I think it needs deeper elaboration, perhaps with new state change diagrams etc.
More on this:
The primary point of a parallel phase is not to set a certain leave policy when the phase ends (for which it would be sufficient to have a single "switch the state" method). The parallel phase allows using a distinct retention policy during the phase, for example to prolong the default busy-wait duration or to utilize different heuristics. I.e., it does not switch between "fast" and "delayed" but introduces a third possible state of thread retention.
Once all initiated parallel phases end, the retention policy returns, according to the proposed design, to the state set at arena construction. However the use case for threads to leave as soon as possible still remains. For that reason, the extra argument at the end of the block is useful to indicate this "one time fast leave" request.
Hope that helps.
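One way to read the explanation above is as a small decision function over three retention states. The sketch below is only that reading: the enum and function names are hypothetical, and it says nothing about how long workers are actually retained in any state.

```cpp
// Hypothetical names; a reading of the design described above, not its code.
enum class arena_leave { fast, delayed };   // chosen at arena construction

enum class retention {
    fast_leave,       // workers leave as soon as there is no work
    arena_default,    // the retention heuristics set at arena construction
    phase_extended    // the distinct policy used while a parallel phase is active
};

constexpr retention effective_retention(arena_leave at_construction,
                                        int active_phases,
                                        bool one_time_fast_leave) {
    return active_phases > 0                    ? retention::phase_extended
         : one_time_fast_leave                  ? retention::fast_leave
         : at_construction == arena_leave::fast ? retention::fast_leave
                                                : retention::arena_default;
}

// While any phase is active, the third state applies regardless of the arena setting.
static_assert(effective_retention(arena_leave::fast, 1, false) == retention::phase_extended, "");
// After the last phase ends, the arena setting applies again...
static_assert(effective_retention(arena_leave::delayed, 0, false) == retention::arena_default, "");
// ...unless the end of the phase carried the one-time fast leave request.
static_assert(effective_retention(arena_leave::delayed, 0, true) == retention::fast_leave, "");
```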
With the arena's workers_leave behavior and scoped_parallel_block both specified in the constructors, this change in behavior set at the end of a parallel block looks inconsistent.
Would it be better to have this setting be specified at the start of a parallel block rather than at its end?
I think it would make the API harder to use. The user would need to somehow link this parameter from the start of the block to the end of the block.
See my comment above about making the approach a bit more generic. Essentially, I think we can write something like "implementation-defined" for concurrent calls to this API. However, it seems to me that the behavior should be somewhat relaxed: if at least one "delayed leave" request happens concurrently with any number of "fast leave" requests, then the "delayed leave" policy prevails.
Also, having the request stated up front lets the scheduler know the runtime situation earlier, and hence make better decisions about the workers' behavior.
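The relaxed rule suggested in this comment, where a single "delayed leave" request outweighs any number of concurrent "fast leave" requests, could be sketched as below. The names leave_request and prevailing_request are hypothetical, and the concurrent requests are modeled as a plain list rather than with real synchronization.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical names; illustrates only the "delayed prevails" rule.
enum class leave_request { fast, delayed };

// Returns the request that wins among those made concurrently.
// The list must not be empty.
inline leave_request prevailing_request(const std::vector<leave_request>& requests) {
    assert(!requests.empty() && "at least one request is required");
    const bool any_delayed =
        std::any_of(requests.begin(), requests.end(),
                    [](leave_request r) { return r == leave_request::delayed; });
    return any_delayed ? leave_request::delayed : leave_request::fast;
}
```

Unlike the "last call wins" approach, the result here does not depend on the order in which the requests arrive, which is what makes the behavior predictable when several threads open phases in the same arena.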