Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v23.2.x] c/archival_stm: do not reset _last_replicate on timeout #15816

Merged

Conversation

nvartolomei
Copy link
Contributor

@nvartolomei nvartolomei commented Dec 20, 2023

Backport of:

Fixes #15783

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

  • none

@nvartolomei
Copy link
Contributor Author

/cdt

@nvartolomei
Copy link
Contributor Author

/cdt
tests/rptest/tests/archival_test.py

@nvartolomei
Copy link
Contributor Author

/cdt

@nvartolomei
Copy link
Contributor Author

/cdt
rp_version=build
dt-repeat=3
tests/rptest/tests/e2e_shadow_indexing_test.py

@nvartolomei
Copy link
Contributor Author

/dt

@nvartolomei
Copy link
Contributor Author

@dotnwat last one before Christmas. 🙏

@nvartolomei nvartolomei added this to the v23.2.22 milestone Dec 22, 2023
For same term syncing we store the replicate commands future so that it
can be awaited in the next sync call. However, the
`std::exchange(_last_replicate, std::nullopt)` before the wait is
problematic as it "forgets" the last replicate. If sync times out and we
retry it then it behaves as if the last replicate command succeeded.
This is not necessarily true as the newly added test assertion shows.

Use a shared future for _last_replicate so that it can be awaited
multiple times and reset it only after it is resolved. This guarantees
that sync will only return after last replicate command actually
resolved.

`ignore_ready_future` call is removed as the shared future does not
issue a warning if it wasn't consumed (basically a revert of
e221a0a).

(cherry picked from commit 299e12d)
Some operations try to sync with relatively low timeouts. Under stress,
they are expected to time out occasionally. This is a warning at most.

Downgrade the log level to avoid spooking users and ducktape tests.
This is already present in dev. Tests that rely on blocking s3 traffic
via iptables rely on this.
@nvartolomei nvartolomei merged commit e909bf6 into redpanda-data:v23.2.x Jan 3, 2024
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants