.github: Fix caching of integration test runs #5041

tamirms · 2023-09-07T08:37:08Z

PR Checklist

PR Structure

This PR has reasonably narrow scope (if not, break it down into smaller PRs).
This PR avoids mixing refactoring changes with feature changes (split into two PRs
otherwise).
This PR's title starts with name of package that is most changed in the PR, ex.
services/friendbot, or all or doc if the changes are broad or impact many
packages.

Thoroughness

This PR adds tests for the most critical parts of the new functionality or fixes.
I've updated any docs (developer docs, .md
files, etc... affected by this change). Take a look in the docs folder for a given service,
like this one.

Release planning

I've updated the relevant CHANGELOG (here for Horizon) if
needed with deprecations, added features, breaking changes, and DB schema changes.
I've decided if this PR requires a new major/minor version according to
semver, or if it's mainly a patch change. The PR is targeted at the next
release branch if it's not a patch change.

What

Failed integration tests failed on PRs would always succeed on subsequent retries because GithubCI would find an entry in the our test cache and assume that the tests had previously passed. This bug is caused by the fact that the cache key was the same for different integration tests.

The integration test matrix includes the following parameters:

os
go version
postgres version
ingestion-backend
protocol-version

Not all of these parameters were included in the cache key. In particular, the ingestion-backend parameter was not part of the cache key. If the integration tests succeeded with legacy ingestion via stellar core's postgres db but failed with captive core ingestion, the successful run would be stored in the cache using the same cache key as the failed run.

To fix this issue I have included the PREFIX environment variable in the cache key. The PREFIX environment variable includes all of the test matrix parameters.

Known limitations

Note we still might have this issue with flaky tests. After the flaky test succeeds for the first time, the test run will be included in the cache. Then all subsequent attempts at retrying the test will restore from the cache instead of executing the tests.

tamirms · 2023-09-07T14:48:02Z

Note that the current test failures are due to #5035 which will only be fixed in stellar-core

tsachiherman

I think that this is great. One more suggestion:

in horizon_binary_test_hash, add lookup-only to avoid restoring the actual data ( we don't really need it ).
in verify-range, use the steps.horizon_binary_tests_hash.outputs.cache-hit != 'true' condition.

tamirms · 2023-09-07T15:22:18Z

@tsachiherman I set lookup-only to true in horizon_binary_tests_hash. However, I don't think I can use steps.horizon_binary_tests_hash.outputs.cache-hit != 'true' in verify-range because verify-range is a separate from the integration tests job and therefore I don't think verify-range can reference the horizon_binary_tests_hash step.

tsachiherman · 2023-09-07T15:25:41Z

@tsachiherman I set lookup-only to true in horizon_binary_tests_hash. However, I don't think I can use steps.horizon_binary_tests_hash.outputs.cache-hit != 'true' in verify-range because verify-range is a separate from the integration tests job and therefore I don't think verify-range can reference the horizon_binary_tests_hash step.

yes, I missed that. We might want to have a similar mechanism there as well - but it's out of scope for this PR.

tamirms added 2 commits September 7, 2023 09:36

Fix caching of integration test runs

1c92914

include prefix in hash

4f46cec

tamirms marked this pull request as ready for review September 7, 2023 14:47

tamirms requested a review from a team September 7, 2023 14:48

tsachiherman approved these changes Sep 7, 2023

View reviewed changes

set lookup-only

0d01fd1

tamirms merged commit cac7701 into stellar:master Sep 7, 2023

tamirms deleted the fix-integration-test-caching branch September 7, 2023 16:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.github: Fix caching of integration test runs #5041

.github: Fix caching of integration test runs #5041

tamirms commented Sep 7, 2023 •

edited

Loading

tamirms commented Sep 7, 2023

tsachiherman left a comment •

edited

Loading

tamirms commented Sep 7, 2023

tsachiherman commented Sep 7, 2023

.github: Fix caching of integration test runs #5041

.github: Fix caching of integration test runs #5041

Conversation

tamirms commented Sep 7, 2023 • edited Loading

PR Structure

Thoroughness

Release planning

What

Known limitations

tamirms commented Sep 7, 2023

tsachiherman left a comment • edited Loading

Choose a reason for hiding this comment

tamirms commented Sep 7, 2023

tsachiherman commented Sep 7, 2023

tamirms commented Sep 7, 2023 •

edited

Loading

tsachiherman left a comment •

edited

Loading