Run promotion jobs in parallel #4747
Conversation
Signed-off-by: Sayali Gaikawad <[email protected]>
Thanks for the changes @gaiksaya. LGTM, a few minor questions/comments.
The release promotion job today takes 1-2 hours to run as all jobs run parallel.
Did we mean to say all jobs run serially (as of today)?
pipeline {
    options {
        timeout(time: 4, unit: 'HOURS')
Does a 4-hour timeout still remain relevant when jobs running in parallel should finish much faster?
Can be reduced to 2 now! Just to be safe, though: sometimes bringing up new agents takes time due to availability and other jobs running in parallel, so giving it a buffer.
Sure, is there a way to make it configurable (dynamic) for better tuning too?
I am thinking about how we can get to a sweet spot of knowing about failures early enough without triggering false positives. This can be followed up separately in its own issue if you think it's worth pursuing.
The timeout is the maximum time allowed in case the job hangs for some reason. Apart from that it has no use; it was mainly added due to infrastructure constraints.
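If tuning the timeout ever becomes worthwhile, one way to make it configurable is a job parameter that drives a timeout step rather than a fixed value in the options block. A minimal sketch, assuming a hypothetical PROMOTION_TIMEOUT_HOURS parameter that is not part of this PR:

pipeline {
    agent none
    parameters {
        // Hypothetical parameter: abort the promotion run after this many hours.
        string(name: 'PROMOTION_TIMEOUT_HOURS', defaultValue: '2',
               description: 'Maximum hours before the promotion run is aborted')
    }
    stages {
        stage('Promote') {
            steps {
                // The timeout step accepts a runtime value, so the limit can be
                // raised on reruns when agents are slow to provision.
                timeout(time: params.PROMOTION_TIMEOUT_HOURS.toInteger(), unit: 'HOURS') {
                    echo 'promotion stages would run here'
                }
            }
        }
    }
}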
}

@Test
void shouldExecuteWithoutErrors() {
Is there a way to also check that the jobs were actually executed in parallel? The assertions only state that they got executed.
Not really! The end state of the job can only be mocked from our end or observed by actually running it on Jenkins.
Hmm, I wish there were a mechanism such as a latch or a clock counter.
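One partial check that might be possible with the existing harness is asserting that the parallel step itself appears in the recorded call stack, the same way the string() calls are asserted below. This only proves the step was declared and invoked, not that the branches actually overlapped in time, and the exact call-stack rendering is an assumption:

@Test
void shouldInvokeParallelStep() {
    // Run the pipeline the same way shouldExecuteWithoutErrors() does, then check
    // for the parallel step. The asserted string is hypothetical and depends on
    // how the test harness renders the call stack.
    assertCallStack().contains('release-promotion-parallel.parallel')
}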
Signed-off-by: Sayali Gaikawad <[email protected]>
stage('OpenSearch Yum promotion') {
    agent {
        docker {
            label AGENT_LINUX_X64
            image 'docker/library/alpine:3'
            registryUrl 'https://public.ecr.aws/'
            alwaysPull true
        }
    }
    steps {
        echo 'Triggering distribution-promote-repos for OpenSearch Yum'
        build job: 'distribution-promote-repos', wait: true, parameters: [string(name: 'DISTRIBUTION_JOB_NAME', value: 'distribution-build-opensearch'),
            string(name: 'DISTRIBUTION_REPO_TYPE', value: 'yum'),
            string(name: 'DISTRIBUTION_BUILD_NUMBER', value: params.OPENSEARCH_RC_BUILD_NUMBER),
            string(name: 'INPUT_MANIFEST', value: "${params.RELEASE_VERSION}/opensearch-${params.RELEASE_VERSION}.yml"),
        ]
        echo 'Promotion successful for OpenSearch yum!'
    }
}
Since we are repeating the steps for each stage with some variation, I am wondering if there is a better way to do this with some code restructuring, such as defining the stages and steps first and then iterating over them.
I understand if that's not possible in scripts like this.
I believe it is possible by putting it into Groovy scripts, but we want to control each job's execution and parameters at a lower level here. Putting it all in one single script or library is prone to errors and harder to debug. There is definitely scope for improvement once we are sure of the process.
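For reference, the kind of restructuring being suggested would look roughly like this in scripted pipeline, building the branch map first and then handing it to the parallel step. This is a sketch only; the distribution list, labels, and parameters here are illustrative, not the PR's actual configuration:

// Sketch: describe the promotions as data, then generate the parallel branches.
def promotions = [
    'OpenSearch Yum': [job: 'distribution-build-opensearch', repoType: 'yum'],
    'OpenSearch Apt': [job: 'distribution-build-opensearch', repoType: 'apt'],
]

def branches = [:]
promotions.each { stageName, cfg ->
    branches[stageName] = {
        node(AGENT_LINUX_X64) {
            build job: 'distribution-promote-repos', wait: true, parameters: [
                string(name: 'DISTRIBUTION_JOB_NAME', value: cfg.job),
                string(name: 'DISTRIBUTION_REPO_TYPE', value: cfg.repoType),
            ]
        }
    }
}
parallel branches

The declarative form used in this PR trades that compactness for explicit per-stage control of agents and parameters, which is the trade-off described above.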
nice, maybe worth exploring this separately?
Yes! That is correct. It was kind of a POC which worked (thanks to @prudhvigodithi).
assertCallStack().contains("release-promotion-parallel.string({name=DISTRIBUTION_NAME, value=tar})")
assertCallStack().contains("release-promotion-parallel.string({name=DISTRIBUTION_ARCHITECTURE, value=x64})")

// OpenSearch Linux tar x64
How do we know we have covered all the required steps in a workflow like this? Is there a separate workflow model or state machine which can be the source of truth?
The wait: true parameter is responsible for returning the status of the triggered job. It propagates the state back.
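As a small illustration of that behavior (an illustrative snippet, not taken from the PR): with wait: true the build step blocks until the downstream job finishes and returns its result, and with the default propagate: true a downstream failure also fails the calling stage, which is how the state flows back into the parallel branch:

// Illustrative only; parameters trimmed for brevity.
def run = build job: 'distribution-promote-repos', wait: true, parameters: [
    string(name: 'DISTRIBUTION_REPO_TYPE', value: 'yum'),
]
echo "Downstream result: ${run.getResult()}"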
        echo 'Promotion successful for OpenSearch Dashboards Linux tar arm64!'
    }
}
stage('OpenSearch Dashboards Linux tar x64') {
Hey @gaiksaya, I think the idea is to run both OS and OSD tar x64 at the end.
I see! Is there any reason to wait for Dashboards till the end too? OpenSearch, I know, is held until the end for Maven publishing reasons here.
Hey @gaiksaya for
Thanks for this change @gaiksaya, this should really speed up the release promotion. You can directly modify the existing release-promotion.jenkinsfile file, right?
We only need the OpenSearch x64 tar to be last, as its native plugins would override at the end. Everything else needs to happen before that run.
I am good with this parallel switch.
Thanks.
I believe the current setup will do the same. I am worried that a post stage may cause issues in the run; I am trying to keep post stages for side activities rather than the main workflow run. In a serial run, the x64 job won't trigger unless all parallel stages succeed.
We can. We just wanted to keep that as a backup in case this workflow causes some issues. Once we know it works we can replace the original one and deprecate this. WDYT?
Description
The release promotion job today takes 1-2 hours to run as all jobs run serially. This PR converts those jobs to run in parallel, reducing the time by about 75%. The OpenSearch tarball promotion needs to be the last job that executes, as those artifacts are promoted to Maven Central. Hence the OpenSearch x64 trigger is the last job and runs serially after all other jobs are completed.
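Structurally, the change amounts to grouping the independent promotions under a parallel block and keeping a single serial stage at the end. A simplified sketch with illustrative stage and job names, not the full Jenkinsfile:

pipeline {
    agent none
    stages {
        stage('Parallel promotions') {
            parallel {
                stage('OpenSearch Yum promotion') {
                    steps { build job: 'distribution-promote-repos', wait: true }
                }
                stage('OpenSearch Dashboards Linux tar arm64') {
                    steps { build job: 'distribution-promote-artifacts', wait: true }
                }
                // ...remaining distributions and architectures as sibling stages...
            }
        }
        // Runs only after every parallel branch succeeds, because the OpenSearch
        // x64 tarball promotion also publishes artifacts to Maven Central.
        stage('OpenSearch Linux tar x64') {
            steps { build job: 'distribution-promote-artifacts', wait: true }
        }
    }
}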
Issues Resolved
closes #4748
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.