-
Notifications
You must be signed in to change notification settings - Fork 4.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Improvement-15910][UI] Supposed to provide a default value for the custom parallelism when using the mode of parallel execution. #15912
Conversation
max='100' | ||
min='1' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a good idea to restrict the parallelism should between 1
and 100
, this shouldn't be restricted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that an upper limit should be set. If there is no upper limit, all complement tasks will be triggered, which will put a lot of pressure on the system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have encountered a real scenario where the user needs to supplement three years of data. After selecting the parallel mode, 1500+ workflow instances are generated at one time, which directly causes the dolphinscheduler worker service to go down. I suggest that there is an upper limit to limit it. We can discuss this upper limit. How much is appropriate to set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Master/Worker has its own protection strategy, I don't think we should rely on a workflow config to protect the system. And I don't think we can get the correct upper limit at this discussion, different user have different cluster size and usage Scenarios, if we don't know how to set the max value, why we need to set. In additional, if we want to avoid the system crash in a large parallel, the correct way is to limit the rate at server side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If an appropriate value cannot be negotiated, it should be returned by api-server
and allow the administrator to configure it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can try to set task-execute-threads-full-policy
to be REJECT, then the worker will only receive 100 tasks.
task-execute-threads-full-policy: REJECT |
BTW, I agree with @Gallardot if we want to set a maximal restriction, it's better to make this value to be configurable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can try to set
task-execute-threads-full-policy
to be REJECT, then the worker will only receive 100 tasks.
task-execute-threads-full-policy: REJECT BTW, I agree with @Gallardot if we want to set a maximal restriction, it's better to make this value to be configurable.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because Master/Worker has its own protection strategy, giving user prompts is a solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The warning message is too strange, why it's better not to be over 10?
@@ -66,7 +66,7 @@ export const useForm = () => { | |||
tenantCode: 'default', | |||
environmentCode: null, | |||
startParams: null, | |||
expectedParallelismNumber: '', | |||
expectedParallelismNumber: '2', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the default value is 2 rather than 3? If you want to set a default value, I would prefer 1 or a default empty value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that if parallel mode is selected, it should be greater than 1, otherwise just select serial mode. Having default values will make the user experience more friendly and allow users to do as few operations as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If my habit is to use maximum parallelism every time, then this change will not be friendly to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If my habit is to use maximum parallelism every time, then this change will not be friendly to me.
You could also use the maximum parallelism 100. But limit-free will be risky to the system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this will bring risky, there will exist a lot of risky in the system. We need to limit the scheduler number, limit the task number in one workflow, limit the size (parallelism in each backfill * backfill request number)...... The main thing is that why this will bring risky to the system, if this will bring risky to system we should fix this in a correct way, rather than simple limit the input form, can this restriction fix this problem? From my point, this restriction just make it more inconvinent for me to use, but it does not solve the problem caused by the high stress.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not using the maximum parallelism has another meaning, ensuring the execution of other normal scheduled tasks. If the user directly uses the maximum degree of parallelism, the normally scheduled tasks will not be executed normally. I understand that the user needs to know whether his complement tasks during this period will affect other tasks before selecting the maximum degree of parallelism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, this make sense to me, the maximum paramllelism does really will affect the current running scheduler workflow. We should direct users to try not to use such large parallel reads unless there is some special reason.
Please don't add |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## dev #15912 +/- ##
============================================
- Coverage 39.72% 39.66% -0.07%
+ Complexity 5038 5013 -25
============================================
Files 1349 1349
Lines 45615 45615
Branches 4890 4891 +1
============================================
- Hits 18122 18091 -31
- Misses 25578 25609 +31
Partials 1915 1915 ☔ View full report in Codecov by Sentry. |
Please retry analysis of this Pull-Request directly on SonarCloud |
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Purpose of the pull request
This PR will close #15910 .
Brief change log
Verify this pull request
This change added tests and can be verified as follows:
The test result is as follows: