CI Failure (NodeCrash "Attempted to upgrade from incompatible logical version 10 to logical version 12!") in FeaturesUpgradeAssertionTest.test_upgrade_assertion
#11275
andijcr added the kind/bug (Something isn't working), ci-failure, and sev/high (loss of availability, pathological performance degradation, recoverable corruption) labels on Jun 7, 2023.
This is a class of "failed to stop in 30 seconds" bugs; the assertion that is hit is explicitly tested for, and the …
rockwotj removed the sev/high (loss of availability, pathological performance degradation, recoverable corruption) label on Jun 12, 2023.
rockwotj added a commit to rockwotj/redpanda that referenced this issue on Jun 12, 2023:

In redpanda-data#11275 we get a false positive when detecting a crash because of an expected log line. Let's ignore those lines when looking for crashes. Note that the test is still failing, but for a different reason than the one this function checks for.

Signed-off-by: Tyler Rockwood <[email protected]>
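The idea in this commit, skipping log lines a test provokes on purpose when scanning for crashes, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the redpanda test harness code: `find_crash_lines`, `CRASH_PATTERNS`, and `EXPECTED_PATTERNS` are hypothetical names, the crash signatures are placeholders, and the sample log line format is invented.

```python
import re

# Placeholder crash signatures (hypothetical, not redpanda's real list).
CRASH_PATTERNS = [
    re.compile(r"Segmentation fault"),
    re.compile(r"Assert failure"),
]

# Lines a test triggers deliberately, e.g. the upgrade assertion in this issue.
EXPECTED_PATTERNS = [
    re.compile(r"Attempted to upgrade from incompatible logical version"),
]


def find_crash_lines(log_lines, expected=EXPECTED_PATTERNS):
    """Return lines that match a crash signature and no expected pattern."""
    crashes = []
    for line in log_lines:
        if any(p.search(line) for p in expected):
            continue  # provoked on purpose by the test, not a real crash
        if any(p.search(line) for p in CRASH_PATTERNS):
            crashes.append(line)
    return crashes


# Invented sample log; the second line mimics the assertion from this issue.
log = [
    "INFO  starting up",
    "ERROR Assert failure: Attempted to upgrade from incompatible "
    "logical version 10 to logical version 12!",
]
print(find_crash_lines(log))                     # []
print(len(find_crash_lines(log, expected=[])))   # 1
```

Without the allow-list the expected assertion is reported as a crash, which is the false positive the commit describes; with it, only unexpected signatures remain.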
rockwotj added a commit to rockwotj/redpanda that referenced this issue on Jun 13, 2023:

It seems like `redpanda_pid` can pick up false PIDs. When debugging redpanda-data#11275, it seems that the PID of an apport process is being picked up. Here's a process dump that includes the line. There are no running redpanda processes, but checking for redpanda processes times out, which leads me to believe that this process is being picked up. In that case, `stop_node` can fail if the node has already failed in CDT, because this process is picked up.

```
[DEBUG - 2023-06-12 02:54:07,216 - redpanda - _log_node_process_state - lineno:2215]: root 136962 99.6 6.0 1587948 1452804 ? R 02:53 0:49 /usr/bin/python3 /usr/share/apport/apport -p136954 -s5 -c18446744073709551615 -d1 -P136954 -u0 -g0 -- !opt!redpanda_installs!head!libexec!redpanda
```

Fixes: redpanda-data#11275
Signed-off-by: Tyler Rockwood <[email protected]>
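Why a stray PID surfaces as a "failed to stop" timeout can be seen in a minimal polling loop. The sketch below is hypothetical (`wait_for_stop` and its parameters are not the ducktape or redpanda API): it keeps asking for matching PIDs until none remain or a deadline passes, so a lookup that keeps matching an unrelated process, such as the apport helper in the dump above, can never succeed.

```python
import time


def wait_for_stop(get_pids, timeout_s=30.0, poll_s=0.5,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll get_pids() until it reports no processes.

    Raises TimeoutError if matching PIDs are still reported after
    timeout_s seconds, e.g. because the lookup keeps matching an
    unrelated process.
    """
    deadline = clock() + timeout_s
    while True:
        pids = get_pids()
        if not pids:
            return
        if clock() >= deadline:
            raise TimeoutError(
                f"failed to stop in {timeout_s} seconds; "
                f"still seeing PIDs {pids}")
        sleep(poll_s)


# Usage with a fake clock so the example runs instantly.
now = [0.0]


def fake_clock():
    return now[0]


def fake_sleep(seconds):
    now[0] += seconds


try:
    # A lookup that always "finds" the apport PID never lets the loop finish.
    wait_for_stop(lambda: [136962], clock=fake_clock, sleep=fake_sleep)
except TimeoutError as exc:
    print(exc)
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting; the 30-second default simply mirrors the "failed to stop in 30 seconds" symptom discussed in this issue.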
rockwotj added a commit to rockwotj/redpanda that referenced this issue on Jun 13, 2023:

It seems like `redpanda_pid` can pick up false PIDs with the current grep-based filtering. Use `pgrep` so that a command with `redpanda` in its command-line arguments, which seems likely in our environments, is not picked up incorrectly. When debugging redpanda-data#11275, it seems that the PID of an apport process is being picked up. Here's a process dump that includes the line. There are no running redpanda processes, but checking for redpanda processes times out, which leads me to believe that this process is being picked up. In that case, `stop_node` can fail if the node has already failed in CDT, because this process is picked up.

```
[DEBUG - 2023-06-12 02:54:07,216 - redpanda - _log_node_process_state - lineno:2215]: root 136962 99.6 6.0 1587948 1452804 ? R 02:53 0:49 /usr/bin/python3 /usr/share/apport/apport -p136954 -s5 -c18446744073709551615 -d1 -P136954 -u0 -g0 -- !opt!redpanda_installs!head!libexec!redpanda
```

Fixes: redpanda-data#11275
Signed-off-by: Tyler Rockwood <[email protected]>
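The difference between the two filtering approaches can be illustrated with plain data. In this sketch the helpers `pids_grep_style` and `pids_pgrep_style` are hypothetical, the sample command line comes from the process dump above, and the short process name `apport` is an assumption. A grep-style substring search over the full command line matches the apport helper because its arguments mention `redpanda`; matching only the process name, which is what `pgrep` does unless `-f` is given, does not. The exact comparison here corresponds to `pgrep -x`; plain `pgrep` treats its pattern as a regular expression.

```python
from typing import NamedTuple


class Proc(NamedTuple):
    pid: int
    name: str     # short process name: what pgrep matches by default
    cmdline: str  # full command line: what a `ps | grep` pipeline sees


def pids_grep_style(procs, needle):
    """Substring match over the full command line (false-positive prone)."""
    return [p.pid for p in procs if needle in p.cmdline]


def pids_pgrep_style(procs, name):
    """Exact match on the process name only, similar to `pgrep -x NAME`."""
    return [p.pid for p in procs if p.name == name]


# Sample from the process dump; the name "apport" is assumed for illustration.
APPORT = Proc(
    pid=136962,
    name="apport",
    cmdline=("/usr/bin/python3 /usr/share/apport/apport -p136954 -s5 "
             "-c18446744073709551615 -d1 -P136954 -u0 -g0 -- "
             "!opt!redpanda_installs!head!libexec!redpanda"),
)

print(pids_grep_style([APPORT], "redpanda"))   # [136962]
print(pids_pgrep_style([APPORT], "redpanda"))  # []
```

Matching on the process name trades some flexibility (a renamed binary would be missed) for not tripping over helpers like apport whose arguments merely mention the redpanda install path.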
This was referenced Jun 14, 2023
vbotbuildovich pushed three commits to vbotbuildovich/redpanda that referenced this issue on Jun 14, 2023. Each is a backport (cherry picked from commit 4a2985f) carrying the same message as the Jun 13 `pgrep` commit above, including "Fixes: redpanda-data#11275".
arm
https://buildkite.com/redpanda/vtools/builds/7957#01889253-b38f-4abd-9b03-68e462adb10f