Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add integ tests to OpenSearch distribution build jenkins job #1517

Merged
merged 14 commits into from
Feb 2, 2022

Conversation

ohltyler
Copy link
Member

@ohltyler ohltyler commented Jan 19, 2022

Description

Adds a call to an integ test groovy script which triggers the integ test jenkins job, inside of the distribution build jenkins job.

More details:

  • adds a TEST_MANIFEST parameter to the distribution-build job, needed for propagating to integ-test job
  • tests are triggered within the async build steps, so both x64 and arm64 integ tests can run in parallel
  • renames manifest arg used in many of the dependent groovy fns to inputManifest or buildManifest to make code more readable. Makes the fn calls explicitly set the input/build manifest args rather than relying on implicit env vars (e.g., relying on an existing INPUT_MANIFEST var to be set)

PR is currently in draft. Things to do before it is ready for merging:

  • add back the notification steps in distribution-build job
  • change references to test jobs (Playground/ohltyler-*) to production jobs
  • wrap integ test fn in a try/catch to prevent the distribution-build job to fail if the tests fail
  • propagate the test results in one of two ways:
    1. link the integ-test notification text with an associated distribution-build job (right now it only includes integ-test job info)
    2. propagate integ-test results inside of distribution-build job (not sure on the feasibility of that)

Issues Resolved

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@ohltyler ohltyler requested a review from a team as a code owner January 19, 2022 22:57
@codecov-commenter
Copy link

codecov-commenter commented Jan 19, 2022

Codecov Report

Merging #1517 (debd67b) into main (259f9ac) will increase coverage by 5.65%.
The diff coverage is n/a.

Impacted file tree graph

@@              Coverage Diff              @@
##               main     #1517      +/-   ##
=============================================
+ Coverage     94.34%   100.00%   +5.65%     
=============================================
  Files           148         4     -144     
  Lines          3168        63    -3105     
  Branches          9        14       +5     
=============================================
- Hits           2989        63    -2926     
+ Misses          172         0     -172     
+ Partials          7         0       -7     
Impacted Files Coverage Δ
src/jenkins/BuildManifest.groovy
src/jenkins/InputManifest.groovy
...sts/jenkins/jobs/ArchiveAssembleUpload_Jenkinsfile
tests/jenkins/jobs/AssembleUpload_Jenkinsfile
tests/jenkins/jobs/BuildArchive_Jenkinsfile
tests/jenkins/jobs/BuildDockerImage_Jenkinsfile
tests/jenkins/jobs/BuildManifest_Jenkinsfile
...ts/jenkins/jobs/BuildUploadManifestSHA_Jenkinsfile
tests/jenkins/jobs/GetManifestSHA_Jenkinsfile
tests/jenkins/jobs/InputManifest_Jenkinsfile
... and 142 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 259f9ac...debd67b. Read the comment docs.

echo "buildManifestUrl (x64): ${buildManifestUrl}"

String testManifest = "manifests/${TEST_MANIFEST}"
if (fileExists(testManifest)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dblock @peternied any other suggestion for how to handle this case? For example, plugins typically aren't on the next major version (e.g. 2.0.0), and there is no test manifest located in manifests/2.0.0, and I'm assuming it's for this reason.

This solution just skips running the tests and adds some output explaining that no tests were ran since no test manifest was found.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about not bothering to check? If the integration tests workflow can fail because the config doesn't exist seems appropriate to me.

We avoid having problems with build manifest not existing by having them automatically generated on new version detection GitHub action, Manifest workflow. we should update that logic to add a test manifest automatically. Alternatively this might be an appropriate time to merge the input manifest and test manifests into one

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so the reason there's no test manifest for 2.0.0 is because of the manual burden. And there really should be a test manifest for every input manifest. In that case I agree it makes sense to let the integ test workflow fail if no test manifest is passed.

Regarding automated test manifest generation, I'll look into it as a future PR.

@ohltyler ohltyler marked this pull request as draft January 19, 2022 23:02
echo "buildManifestUrl (x64): ${buildManifestUrl}"

String testManifest = "manifests/${TEST_MANIFEST}"
if (fileExists(testManifest)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about not bothering to check? If the integration tests workflow can fail because the config doesn't exist seems appropriate to me.

We avoid having problems with build manifest not existing by having them automatically generated on new version detection GitHub action, Manifest workflow. we should update that logic to add a test manifest automatically. Alternatively this might be an appropriate time to merge the input manifest and test manifests into one

def inputManifest = lib.jenkins.InputManifest.new(readYaml(file: args.manifest))
def buildManifest = lib.jenkins.BuildManifest.new(readYaml(file: args.manifest))
String artifactUrl = inputManifest.getPublicDistUrl("${PUBLIC_ARTIFACT_URL}", "${JOB_NAME}", "${BUILD_NUMBER}", buildManifest.build.platform, buildManifest.build.architecture)
echo "Setting env.\"ARTIFACT_URL_${buildManifest.build.platform}_${buildManifest.build.architecture}\"=${artifactUrl}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be deleted, it creates the part of the build notification with instructions on where the build is located.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding this isn't used by the build notifications - instead, messages are added in uploadArtifacts which is called in the build steps via buildAssembleUpload. Then, the message with the links to the manifests are added in the notification's extra field here.

But, these env vars are still used in buildDockerImage. These can be removed though, and we could set them explicitly in the pipeline itself (we already derive the value here). Then they can be passed explicitly to buildDockerImage() instead of relying on an env var being set. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for checking in on that.

I would prefer moving to parameters to remove these 'hidden' ENV variables

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agreed, I'll go ahead with that approach

@@ -0,0 +1,14 @@
void call(Map args = [:]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this file as indirection is very useful the amount of code to call this function is nearly the same as this file. Note; this would be more useful if it was automagically handling opensearch dashboards

FYI @tianleh

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this script isn't doing so much now, and is really only abstracting out the job name. I'm ok to remove and add directly in distribution-build workflow.

Regarding dashboards, ideally the integ-test job won't need new parameters - runIntegTestScript (which is called in the integ-test job) is where the logic can be added to handle dashboards. It may need some additional args to handle multiple build manifests, for example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the discussion about the support for OpenSearch Dashboards. It will need the artifact url for Dashboards from its own distribution build job (similar to the one in this PR) and one additional url for OpenSearch (which will be the latest url). Maybe there will be some common logic for the first part. However, to avoid using if/else as branching logic, I could image that it is very possible to be two different files. I am ok to perform some refactoring once I start working on the Dashboards part (after your work is merged).

def inputManifest = lib.jenkins.InputManifest.new(readYaml(file: args.manifest))
def buildManifest = lib.jenkins.BuildManifest.new(readYaml(file: args.manifest))
String artifactUrl = inputManifest.getPublicDistUrl("${PUBLIC_ARTIFACT_URL}", "${JOB_NAME}", "${BUILD_NUMBER}", buildManifest.build.platform, buildManifest.build.architecture)
echo "Setting env.\"ARTIFACT_URL_${buildManifest.build.platform}_${buildManifest.build.architecture}\"=${artifactUrl}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for checking in on that.

I would prefer moving to parameters to remove these 'hidden' ENV variables

@ohltyler
Copy link
Member Author

Still investigating the instability in integ tests. Some are related to plugin-level flaky tests, others are related to the Jenkins envs. Tracking in meta issue here: #1531

I feel we shouldn't merge these changes until the integ tests become stable enough where there's not constant distribution-build failures because of this. The other option is try/catch the integ test failures for now to prevent the builds from being blocked, and remove at a later time when the tests are more stable.

@peternied
Copy link
Member

Still investigating the instability in integ tests. Some are related to plugin-level flaky tests, others are related to the Jenkins envs. Tracking in meta issue here: #1531

I feel we shouldn't merge these changes until the integ tests become stable enough where there's not constant distribution-build failures because of this. The other option is try/catch the integ test failures for now to prevent the builds from being blocked, and remove at a later time when the tests are more stable.

I would prefer we run these tests more and start to flush out all the different random failures.

Can merge a version of this PR that does not stop the distribution jobs success notification and we get a second notification for the integ test pass/failure?

@ohltyler
Copy link
Member Author

Still investigating the instability in integ tests. Some are related to plugin-level flaky tests, others are related to the Jenkins envs. Tracking in meta issue here: #1531
I feel we shouldn't merge these changes until the integ tests become stable enough where there's not constant distribution-build failures because of this. The other option is try/catch the integ test failures for now to prevent the builds from being blocked, and remove at a later time when the tests are more stable.

I would prefer we run these tests more and start to flush out all the different random failures.

Can merge a version of this PR that does not stop the distribution jobs success notification and we get a second notification for the integ test pass/failure?

Agreed, that's why I've opened that other issue to try to throw in all of the different errors I've seen.

I think it makes sense to merge a non-blocking version for now and continue fixing the integ test failures.

Regarding notifications, the integ-test job itself already publishes those results in its own job. I think we could either:

  1. Link the integ-test notification text with an associated distribution-build job (right now it only includes integ-test job info)
  2. Propagate integ-test results inside of distribution-build job, but not sure on the feasibility of that just yet

@peternied
Copy link
Member

I'm holding off on reviewing until this switching over to ready for review whenever you are ready @ohltyler!

@ohltyler ohltyler force-pushed the add-to-distribution-build branch from 2f52a24 to 1f297e7 Compare January 29, 2022 00:36
@ohltyler ohltyler marked this pull request as ready for review January 29, 2022 00:42
Copy link
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with the remaining renames

@ohltyler ohltyler requested a review from peternied January 31, 2022 16:45
@ohltyler ohltyler dismissed peternied’s stale review January 31, 2022 16:46

Changes have been addressed

@ohltyler ohltyler force-pushed the add-to-distribution-build branch from cbbb989 to bd91902 Compare January 31, 2022 17:41
Copy link
Member

@peternied peternied left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super close, just a couple changes and lets get this merged

vars/buildDockerImage.groovy Show resolved Hide resolved
vars/runIntegTestScript.groovy Outdated Show resolved Hide resolved
@ohltyler ohltyler requested a review from peternied February 2, 2022 18:36
@ohltyler ohltyler dismissed peternied’s stale review February 2, 2022 19:08

Addressed in latest commit

@peternied peternied merged commit 6c54533 into opensearch-project:main Feb 2, 2022
peterzhuamazon pushed a commit to peterzhuamazon/opensearch-build that referenced this pull request Feb 16, 2022
…rch-project#1517)

* Hook up integ tests to distribution-build

Signed-off-by: Tyler Ohlsen <[email protected]>

* yamlfix test manifest file

Signed-off-by: Tyler Ohlsen <[email protected]>

* Remove unnecessary BuildManifest fns

Signed-off-by: Tyler Ohlsen <[email protected]>

* Remove runIntegTests(); refactor buildDockerImage vars as params

Signed-off-by: Tyler Ohlsen <[email protected]>

* Comment out notification block on dashboards dist build job

Signed-off-by: Tyler Ohlsen <[email protected]>

* Test out stabler integ test job

Signed-off-by: Tyler Ohlsen <[email protected]>

* Add test pipeline for pulling nested build info

Signed-off-by: Tyler Ohlsen <[email protected]>

* Remove test pipeline

Signed-off-by: Tyler Ohlsen <[email protected]>

* Add back notifications in OSD dist build

Signed-off-by: Tyler Ohlsen <[email protected]>

* Remove few extra test output lines

Signed-off-by: Tyler Ohlsen <[email protected]>

* Clean up buildDockerImage tests

Signed-off-by: Tyler Ohlsen <[email protected]>

* Update params in check-for-build cron; add icon in notifications

Signed-off-by: Tyler Ohlsen <[email protected]>

* Simplify notification

Signed-off-by: Tyler Ohlsen <[email protected]>

* Change references back to prod jenkins jobs

Signed-off-by: Tyler Ohlsen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants