[3.4] panic: runtime error: invalid memory address or nil pointer dereference #14256
Comments
@ramses, any update on this issue? |
I suggest diffing the code between release-3.4 and release-3.5 (or main). cc @SimFG to take a look as well. If there is no progress by early next week, I may take a deep dive myself. |
@ahrtr OK, I will look into it. |
I suggest resolving this issue first. @SimFG @ramses Comment on #14143,
|
I agree on both counts, @ahrtr. I'll keep digging to determine which one it is.
Noted, @SimFG. By the way, nice to meet you (in the GitHub-verse) 🙂 |
@ahrtr I didn't find a solution to this problem. At first I thought it was because |
Thanks @SimFG for the feedback. Let me take care of this issue. |
This issue can happen on both The root cause is that etcd stops immediately before (*readView) Range is called, so the tx is nil; the range operation that follows then panics. I managed to reproduce this issue intentionally by simulating that situation. The code change (on 3.4) is below,
Then the issue can always be reproduced when running test
The chance of running into this issue in a production environment is really low. It's even harder to reproduce in the pipeline after merging #14290. Since the etcdserver should already have been stopped by the time this issue occurs, the last transaction should already have been committed, so it should be safe. I therefore don't think it's a blocker for 3.4.20 any more. I will think about a formal fix or refactoring in the future. |
I think I've hit the same issue when starting etcd v3.4.20 via a static pod in a k8s cluster
|
One quick question: did you see this issue just once, or can you easily reproduce it? @JohnJAS |
hi @ahrtr |
@ahrtr I've reproduced this issue on another cluster. I manually upgraded the etcd pods on each node, one by one, from v3.4.16 to v3.4.20. The etcd pods then crashed and exited as follows. Error logs:
One more thing I have observed. When I try to use So I think this issue is not related to the environment, and it is easy to reproduce. etcd v3.4.16 works well, but that version still has some CVEs, which is why we want to upgrade etcd. Tomorrow I will test etcd v3.4.19, which includes the fixes for those CVEs, and will post an update afterwards. |
We hit the same problem with etcd version 3.4.20 and it is very easy to reproduce in our environment. |
I can confirm that etcd v3.4.19 doesn't have this issue, and we are happy to see that the two CVEs are fixed in this version. It looks like some fix in v3.4.20 introduced the current issue. However, the error message in my case is not exactly the same as yours, though they are all about the Hope my research can help to narrow down the root cause. |
runtime error: index out of range [0] with length 0 |
I saw this error multiple times in the 3.4 pipeline,
Refer to https://github.com/etcd-io/etcd/runs/7460637358?check_suite_focus=true
Please see my comment: #14256 (comment)