Wrong raft messages may cause etcd panic #17081
Comments
Sorry, when will the fix be backported to 3.5?
Please feel free to raise two PRs, one for 3.5 and the other for 3.4. Thanks!
Hey @ahrtr, I can attempt the backport! :) /assign
This was referenced Apr 17, 2024
@ahrtr, I think we can close this issue?
Yes, with the fix backported to all branches and the changelog updated, we can close it.
Bug report criteria
What happened?
Multiple etcd servers repeatedly panic with a message "tocommit(4432450) is out of range [lastIndex(4432444)]. Was the raft log corrupted, truncated, or lost?"
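For context, the message comes from raft's invariant check when a follower is asked to commit an index beyond the last entry it holds. In etcd's raft, the leader caps each heartbeat's commit index at that recipient's known match index, so a heartbeat that reaches the wrong (lagging) member can carry a commit index the receiver does not have. The sketch below is a simplified model under those assumptions: `node`, `message`, `handleHeartbeat` and the `checkTo` guard are hypothetical names, not etcd's actual types or fix, and errors are returned where raft would panic.

```go
package main

import "fmt"

// message is a hypothetical, trimmed-down stand-in for a raft message,
// keeping only the fields relevant to this failure.
type message struct {
	From, To, Commit uint64
}

// node is a hypothetical follower that tracks only the two indexes
// involved in the panic.
type node struct {
	id        uint64
	committed uint64
	lastIndex uint64
}

// commitTo models raft's invariant: a follower must never mark an index as
// committed beyond the last entry it holds. raft panics at this point; the
// sketch returns an error so the behaviour can be observed.
func (n *node) commitTo(tocommit uint64) error {
	if n.committed < tocommit {
		if n.lastIndex < tocommit {
			return fmt.Errorf("tocommit(%d) is out of range [lastIndex(%d)]. Was the raft log corrupted, truncated, or lost?", tocommit, n.lastIndex)
		}
		n.committed = tocommit
	}
	return nil
}

// handleHeartbeat applies a heartbeat's commit index. When checkTo is true,
// a message addressed to a different member is dropped first (a hypothetical
// guard for illustration only).
func (n *node) handleHeartbeat(m message, checkTo bool) error {
	if checkTo && m.To != n.id {
		return nil // not for us: ignore instead of applying its commit index
	}
	return n.commitTo(m.Commit)
}

func main() {
	// Node 2 lags behind; the leader's heartbeat for up-to-date node 3
	// carries commit index 4432450, which node 2 has not received yet.
	n2 := &node{id: 2, committed: 4432440, lastIndex: 4432444}
	misrouted := message{From: 1, To: 3, Commit: 4432450}

	fmt.Println(n2.handleHeartbeat(misrouted, false)) // reproduces the out-of-range error
	fmt.Println(n2.handleHeartbeat(misrouted, true))  // dropped by the recipient check
}
```

Without the recipient check, the misdelivered heartbeat produces exactly the reported error text. With it, the message is ignored and the follower's state is unchanged.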
The first panic occurred one month after etcd started running. At that time, there were some network issues that caused TCP reconnections.
What did you expect to happen?
No panic.
How can we reproduce it (as minimally and precisely as possible)?
It's hard to reproduce the panic in the real world, but we have reproduced it in a unit test.
Anything else we need to know?
After investigation, we suspect that a malfunctioning switch or a similar network issue may be dispatching incorrect Raft messages, leading to the panic. Currently, etcd does not check the `To` field in incoming messages.
Etcd version (please run commands below)
We (tikv/pd) use embedded etcd v3.4.21 (85b640cee793).
Etcd configuration (command line flags or environment variables)
Default settings.
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
Relevant log output