Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.github: Fix caching of integration test runs #5041

Merged
merged 3 commits into from
Sep 7, 2023

Conversation

tamirms
Copy link
Contributor

@tamirms tamirms commented Sep 7, 2023

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

Failed integration tests failed on PRs would always succeed on subsequent retries because GithubCI would find an entry in the our test cache and assume that the tests had previously passed. This bug is caused by the fact that the cache key was the same for different integration tests.

The integration test matrix includes the following parameters:

  • os
  • go version
  • postgres version
  • ingestion-backend
  • protocol-version

Not all of these parameters were included in the cache key. In particular, the ingestion-backend parameter was not part of the cache key. If the integration tests succeeded with legacy ingestion via stellar core's postgres db but failed with captive core ingestion, the successful run would be stored in the cache using the same cache key as the failed run.

To fix this issue I have included the PREFIX environment variable in the cache key. The PREFIX environment variable includes all of the test matrix parameters.

Known limitations

Note we still might have this issue with flaky tests. After the flaky test succeeds for the first time, the test run will be included in the cache. Then all subsequent attempts at retrying the test will restore from the cache instead of executing the tests.

@tamirms tamirms marked this pull request as ready for review September 7, 2023 14:47
@tamirms
Copy link
Contributor Author

tamirms commented Sep 7, 2023

Note that the current test failures are due to #5035 which will only be fixed in stellar-core

@tamirms tamirms requested a review from a team September 7, 2023 14:48
Copy link
Contributor

@tsachiherman tsachiherman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this is great. One more suggestion:

  • in horizon_binary_test_hash, add lookup-only to avoid restoring the actual data ( we don't really need it ).
  • in verify-range, use the steps.horizon_binary_tests_hash.outputs.cache-hit != 'true' condition.

@tamirms
Copy link
Contributor Author

tamirms commented Sep 7, 2023

@tsachiherman I set lookup-only to true in horizon_binary_tests_hash. However, I don't think I can use steps.horizon_binary_tests_hash.outputs.cache-hit != 'true' in verify-range because verify-range is a separate from the integration tests job and therefore I don't think verify-range can reference the horizon_binary_tests_hash step.

@tsachiherman
Copy link
Contributor

@tsachiherman I set lookup-only to true in horizon_binary_tests_hash. However, I don't think I can use steps.horizon_binary_tests_hash.outputs.cache-hit != 'true' in verify-range because verify-range is a separate from the integration tests job and therefore I don't think verify-range can reference the horizon_binary_tests_hash step.

yes, I missed that. We might want to have a similar mechanism there as well - but it's out of scope for this PR.

@tamirms tamirms merged commit cac7701 into stellar:master Sep 7, 2023
@tamirms tamirms deleted the fix-integration-test-caching branch September 7, 2023 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants