roachtest: tpcc/headroom/n4cpu16 failed #37163
Comments
Looks like towards the end of the test the QPS for some types of transactions dropped to 0. And then we also failed to download the debug.zip.
No, I haven't, but I wouldn't be surprised if this is related to what we're seeing in #37199.
This comment has been minimized.
The failures over the past 3 days are because of #37590.
This comment has been minimized.
Previous three issues addressed by #37701.
This comment has been minimized.
Latest failure addressed by #37726.
SHA: https://github.com/cockroachdb/cockroach/commits/699f675c73f8420802f92e46f65e6dce52abc12f
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1306272&tab=buildLog

SHA: https://github.com/cockroachdb/cockroach/commits/db98d5fb943e0a45b3878bdf042838408e9aee40
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1308285&tab=buildLog

SHA: https://github.com/cockroachdb/cockroach/commits/db98d5fb943e0a45b3878bdf042838408e9aee40
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1308281&tab=buildLog
This comment has been minimized.
Minimized comments above addressed by #38022.

SHA: https://github.com/cockroachdb/cockroach/commits/5a88de2233e1405c0553f2d5380fd24218fac3d2
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1324173&tab=buildLog
This was missed during cockroachdb#37726. Closes cockroachdb#37488. Touches cockroachdb#37163. Release note: None
SHA: https://github.com/cockroachdb/cockroach/commits/cbd571c7bf2ffad3514334a410fa7a728b1f5bf0
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1330352&tab=buildLog

SHA: https://github.com/cockroachdb/cockroach/commits/cbd571c7bf2ffad3514334a410fa7a728b1f5bf0
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1335643&tab=buildLog
The last two failures (both on release-19.1) show the
and then nothing for 6 minutes and then
And then tons of ctx canceled and network errors. The other nodes have network errors starting around 06:03.
From the 6:01 goroutine dump at n1
There's also a heap profile (on n1) at 6:03 that shows a lot of memory tied up in
This line
is actually pretty interesting because it's supposed to only ever print times that are close to ~10s:
cockroach/pkg/storage/replica_write.go Lines 204 to 206 in 7e2ceae
cockroach/pkg/storage/replica_write.go Lines 176 to 179 in 7e2ceae
The fact that it took more than 30x that does suggest that something horrible happened to the machine. The CPU seems to be working pretty hard in the minutes leading up to the failure
"1513.2% utime" basically means the 16 cpus are maxing out. But still, I think this is already part of the downfall here, and I can't imagine overloading a machine to the point where lots of goroutines don't get scheduled for 300s. There's nothing in dmesg, but I found this in sysctl:
I tried googling for that message but with zero success. The kernel here is
^- anyway, this udev message doesn't seem like something we could trigger with a bug in crdb. Maybe the vm was "live" migrated?
SHA: https://github.com/cockroachdb/cockroach/commits/fb50abbb49dc22d404100c925512364f330fb89b
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1344398&tab=buildLog

SHA: https://github.com/cockroachdb/cockroach/commits/f1c9693da739fa5fc2c94d4d978fadd6710d17da
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1371441&tab=buildLog

SHA: https://github.com/cockroachdb/cockroach/commits/1ca35fc4a0e2665e7f6efd945e65a0db97984fa7
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1396096&tab=buildLog

SHA: https://github.com/cockroachdb/cockroach/commits/1ad0ecc8cbddf82c9fedb5a5c5e533e72a657ff7
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1399000&tab=buildLog

SHA: https://github.com/cockroachdb/cockroach/commits/7111a67b2ea3a19c2f312f8d214b8823f431cac0
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1400942&tab=buildLog

SHA: https://github.com/cockroachdb/cockroach/commits/86eab2ff0a1a4c2d9b5f7e7a45deda74c98c6c37
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1402541&tab=buildLog

SHA: https://github.com/cockroachdb/cockroach/commits/26edea51118a0e16b61748c08068bfa6f76543ca
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1404886&tab=buildLog

These recent failures have been identified as #39103.
SHA: https://github.com/cockroachdb/cockroach/commits/a53852a8f6c02ca5573a22abc03c790326ef69ba
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1262103&tab=buildLog