Run promotion jobs in parallel #4747
Conversation
Signed-off-by: Sayali Gaikawad <[email protected]>
Thanks for the changes @gaiksaya. LGTM, a few minor questions/comments.
The release promotion job today takes 1-2 hours to run as all jobs run parallel.
Did we mean to say all jobs run serially (as of today)?
pipeline {
    options {
        timeout(time: 4, unit: 'HOURS')
Does a 4-hour timeout still remain relevant when jobs running in parallel should finish much faster?
Can be reduced to 2 now! Just to be safe, though: sometimes bringing up new agents takes time due to availability and other jobs running in parallel, so giving it a buffer.
Sure, is there a way to make it configurable (dynamic) for better tuning too?
I am thinking about how we can get to a sweet spot of knowing about failures early enough without triggering false positives. This can be followed up separately in its own issue if you think it's worth pursuing.
The timeout is the maximum time allowed in case the job hangs for some reason. Apart from that it has no use; it was mainly added due to infrastructure constraints.
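If tuning the timeout ever becomes worthwhile, one way to make it configurable is a job parameter that drives a timeout step rather than a fixed value in the options block. A minimal sketch, assuming a hypothetical PROMOTION_TIMEOUT_HOURS parameter that is not part of this PR:

pipeline {
    agent none
    parameters {
        // Hypothetical parameter: abort the promotion run after this many hours.
        string(name: 'PROMOTION_TIMEOUT_HOURS', defaultValue: '2',
               description: 'Maximum hours before the promotion run is aborted')
    }
    stages {
        stage('Promote') {
            steps {
                // The timeout step accepts a runtime value, so the limit can be
                // raised on reruns when agents are slow to provision.
                timeout(time: params.PROMOTION_TIMEOUT_HOURS.toInteger(), unit: 'HOURS') {
                    echo 'promotion stages would run here'
                }
            }
        }
    }
}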
}

@Test
void shouldExecuteWithoutErrors() {
Is there a way to also check that the jobs were actually executed in parallel? The assertions only state that they got executed.
Not really! The end state of the job can only be mocked from our end or observed by actually running it on Jenkins.
Hmm, I wish there were a mechanism such as a latch or a clock counter.
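One partial check that might be possible with the existing harness is asserting that the parallel step itself appears in the recorded call stack, the same way the string() calls are asserted below. This only proves the step was declared and invoked, not that the branches actually overlapped in time, and the exact call-stack rendering is an assumption:

@Test
void shouldInvokeParallelStep() {
    // Run the pipeline the same way shouldExecuteWithoutErrors() does, then check
    // for the parallel step. The asserted string is hypothetical and depends on
    // how the test harness renders the call stack.
    assertCallStack().contains('release-promotion-parallel.parallel')
}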
Signed-off-by: Sayali Gaikawad <[email protected]>
stage('OpenSearch Yum promotion') {
    agent {
        docker {
            label AGENT_LINUX_X64
            image 'docker/library/alpine:3'
            registryUrl 'https://public.ecr.aws/'
            alwaysPull true
        }
    }
    steps {
        echo 'Triggering distribution-promote-repos for OpenSearch Yum'
        build job: 'distribution-promote-repos', wait: true, parameters: [string(name: 'DISTRIBUTION_JOB_NAME', value: 'distribution-build-opensearch'),
            string(name: 'DISTRIBUTION_REPO_TYPE', value: 'yum'),
            string(name: 'DISTRIBUTION_BUILD_NUMBER', value: params.OPENSEARCH_RC_BUILD_NUMBER),
            string(name: 'INPUT_MANIFEST', value: "${params.RELEASE_VERSION}/opensearch-${params.RELEASE_VERSION}.yml"),
        ]
        echo 'Promotion successful for OpenSearch yum!'
    }
}
Since we are repeating the steps for each stage with some variation, I am wondering if there is a better way to do this with some code restructuring, such as defining the stages and steps first and then iterating over them.
I understand if that's not possible in scripts like this.
I believe it is possible by putting it into Groovy scripts, but we want to control each job's execution and parameters at a lower level here. Putting it all in one single script or library is prone to errors and harder to debug. There is definitely scope for improvement once we are sure of the process.
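For reference, the kind of restructuring being suggested would look roughly like this in scripted pipeline, building the branch map first and then handing it to the parallel step. This is a sketch only; the distribution list, labels, and parameters here are illustrative, not the PR's actual configuration:

// Sketch: describe the promotions as data, then generate the parallel branches.
def promotions = [
    'OpenSearch Yum': [job: 'distribution-build-opensearch', repoType: 'yum'],
    'OpenSearch Apt': [job: 'distribution-build-opensearch', repoType: 'apt'],
]

def branches = [:]
promotions.each { stageName, cfg ->
    branches[stageName] = {
        node(AGENT_LINUX_X64) {
            build job: 'distribution-promote-repos', wait: true, parameters: [
                string(name: 'DISTRIBUTION_JOB_NAME', value: cfg.job),
                string(name: 'DISTRIBUTION_REPO_TYPE', value: cfg.repoType),
            ]
        }
    }
}
parallel branches

The declarative form used in this PR trades that compactness for explicit per-stage control of agents and parameters, which is the trade-off described above.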
nice, maybe worth exploring this separately?
Yes! That is correct. It was kind of a POC which worked (thanks to @prudhvigodithi).
assertCallStack().contains("release-promotion-parallel.string({name=DISTRIBUTION_NAME, value=tar})")
assertCallStack().contains("release-promotion-parallel.string({name=DISTRIBUTION_ARCHITECTURE, value=x64})")

// OpenSearch Linux tar x64
How do we know we have covered all the required steps in a workflow like this? Is there a separate workflow model or state machine which can be the source of truth?
The wait: true parameter is responsible for returning the status of the triggered job. It propagates the state back.
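As a small illustration of that behavior (an illustrative snippet, not taken from the PR): with wait: true the build step blocks until the downstream job finishes and returns its result, and with the default propagate: true a downstream failure also fails the calling stage, which is how the state flows back into the parallel branch:

// Illustrative only; parameters trimmed for brevity.
def run = build job: 'distribution-promote-repos', wait: true, parameters: [
    string(name: 'DISTRIBUTION_REPO_TYPE', value: 'yum'),
]
echo "Downstream result: ${run.getResult()}"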
        echo 'Promotion successful for OpenSearch Dashboards Linux tar arm64!'
    }
}
stage('OpenSearch Dashboards Linux tar x64') {
Hey @gaiksaya, I think the idea is to run both OS and OSD tar x64 at the end.
I see! Is there any reason to wait for Dashboards till the end too? OpenSearch, I know, is held until the end for Maven publishing reasons here.
Hey @gaiksaya for
Thanks for this change @gaiksaya, this should really speed up the release promotion. You can directly modify the existing release-promotion.jenkinsfile file, right?
We only need the OpenSearch x64 tar to be last, as its native plugins would override at the end. Everything else needs to happen before that run.
I am good with this parallel switch.
Thanks.
I believe the current setup will do the same. I am worried that a post stage may cause issues in the run; I am trying to keep post stages for side activities rather than the main workflow run. In a serial run, the x64 job won't trigger unless all parallel stages succeed.
We can. We just wanted to keep that as a backup in case this workflow causes some issues. Once we know it works we can replace the original one and deprecate this. WDYT?
Description
The release promotion job today takes 1-2 hours to run as all jobs run serially. This PR converts those jobs to run in parallel, reducing the time by about 75%. The OpenSearch tarball promotion needs to be the last job that executes, as those artifacts are promoted to Maven Central. Hence the OpenSearch x64 trigger is the last job and runs serially after all other jobs are completed.
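Structurally, the change amounts to grouping the independent promotions under a parallel block and keeping a single serial stage at the end. A simplified sketch with illustrative stage and job names, not the full Jenkinsfile:

pipeline {
    agent none
    stages {
        stage('Parallel promotions') {
            parallel {
                stage('OpenSearch Yum promotion') {
                    steps { build job: 'distribution-promote-repos', wait: true }
                }
                stage('OpenSearch Dashboards Linux tar arm64') {
                    steps { build job: 'distribution-promote-artifacts', wait: true }
                }
                // ...remaining distributions and architectures as sibling stages...
            }
        }
        // Runs only after every parallel branch succeeds, because the OpenSearch
        // x64 tarball promotion also publishes artifacts to Maven Central.
        stage('OpenSearch Linux tar x64') {
            steps { build job: 'distribution-promote-artifacts', wait: true }
        }
    }
}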
Issues Resolved
closes #4748
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.