roachtest: perturbation/metamorphic/backfill failed #131713
An unacceptable change in latency from the backfill caused the failure. A few odd things stuck out, such as the disk read bandwidth and IO overload at the start, prior to the workload. I don't think these caused the failure, but they seem unexpected to me: the fill absolutely decimates L0. It'd probably be a good test in and of itself for RAC v2 with a send queue. I didn't look much further into the actual failure; the metamorphic vars might be playing a part here. We should consider disabling the pass/fail criteria for the metamorphic variants until after we enable RAC v2 fully on master, post branch cut in a month or so. cc @andrewbaptist |
The IO overload during the fill is expected without RAC v2, since these are non-elastic writes that are sent to a non-leaseholder. The test is set up to wait until the overload dissipates before running the rest of it. In terms of this failure, I had left this as a "non-Infinity" passing criterion, as we had thought that RAC v1 should have addressed this type of issue. In terms of disabling for metamorphic, I'm adding notes so we can easily recreate it later:
I'll also submit a PR to bump the passing criteria to 40 rather than 20. I want to keep failures for metamorphic tests in place; otherwise it will be hard to notice when they are failing. However, we can leave them as P3 until we decide to tackle them. |
roachtest.perturbation/metamorphic/backfill failed with artifacts on master @ 0c0af9540ed3f9d63eba523bc870eeb6c7eebe90:
Parameters:
|
This failure is a process crash due to running out of memory. The target node, n30, was killed by the kernel at 12:23:57, according to the logs and the CRDB runtime stats:
I241003 12:19:15.013352 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 90 runtime stats: 522 MiB RSS, 1546 goroutines (stacks: 26 MiB), 367 MiB/422 MiB Go alloc/total (heap fragmentation: 17 MiB, heap reserved: 168 KiB, heap released: 1.8 MiB), 10 MiB/18 MiB CGO alloc/total (0.0 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (7x), 3.5 MiB/2.3 MiB (r/w)net
I241003 12:19:25.013029 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 100 runtime stats: 626 MiB RSS, 1652 goroutines (stacks: 24 MiB), 119 MiB/527 MiB Go alloc/total (heap fragmentation: 90 MiB, heap reserved: 279 MiB, heap released: 3.8 MiB), 14 MiB/22 MiB CGO alloc/total (265.5 CGO/sec), 18.8/5.5 %(u/s)time, 0.0 %gc (8x), 1.2 MiB/1.5 MiB (r/w)net
I241003 12:19:35.012879 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 103 runtime stats: 628 MiB RSS, 1643 goroutines (stacks: 25 MiB), 174 MiB/527 MiB Go alloc/total (heap fragmentation: 60 MiB, heap reserved: 254 MiB, heap released: 3.8 MiB), 14 MiB/22 MiB CGO alloc/total (4.2 CGO/sec), 9.5/4.3 %(u/s)time, 0.0 %gc (8x), 693 KiB/1.0 MiB (r/w)net
I241003 12:19:45.013201 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 105 runtime stats: 2.4 GiB RSS, 1659 goroutines (stacks: 23 MiB), 1.5 GiB/1.8 GiB Go alloc/total (heap fragmentation: 66 MiB, heap reserved: 176 MiB, heap released: 40 MiB), 528 MiB/561 MiB CGO alloc/total (674.7 CGO/sec), 129.4/133.5 %(u/s)time, 0.1 %gc (16x), 973 MiB/4.1 MiB (r/w)net
I241003 12:19:55.014263 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 107 runtime stats: 2.3 GiB RSS, 1655 goroutines (stacks: 23 MiB), 1.3 GiB/1.9 GiB Go alloc/total (heap fragmentation: 58 MiB, heap reserved: 437 MiB, heap released: 115 MiB), 330 MiB/373 MiB CGO alloc/total (1800.5 CGO/sec), 202.7/180.6 %(u/s)time, 0.1 %gc (24x), 1.0 GiB/4.8 MiB (r/w)net
I241003 12:20:05.015152 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 110 runtime stats: 2.5 GiB RSS, 1665 goroutines (stacks: 20 MiB), 671 MiB/2.0 GiB Go alloc/total (heap fragmentation: 70 MiB, heap reserved: 1.2 GiB, heap released: 3.7 MiB), 394 MiB/434 MiB CGO alloc/total (1814.2 CGO/sec), 200.6/169.5 %(u/s)time, 0.0 %gc (31x), 954 MiB/5.9 MiB (r/w)net
I241003 12:20:15.016002 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 114 runtime stats: 2.5 GiB RSS, 1671 goroutines (stacks: 24 MiB), 1.6 GiB/2.0 GiB Go alloc/total (heap fragmentation: 57 MiB, heap reserved: 243 MiB, heap released: 50 MiB), 392 MiB/434 MiB CGO alloc/total (1800.8 CGO/sec), 196.1/170.3 %(u/s)time, 0.0 %gc (36x), 869 MiB/4.9 MiB (r/w)net
I241003 12:20:25.014170 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 118 runtime stats: 2.5 GiB RSS, 1667 goroutines (stacks: 24 MiB), 1.3 GiB/1.9 GiB Go alloc/total (heap fragmentation: 63 MiB, heap reserved: 554 MiB, heap released: 224 MiB), 394 MiB/435 MiB CGO alloc/total (1741.8 CGO/sec), 189.1/172.8 %(u/s)time, 0.0 %gc (42x), 927 MiB/5.0 MiB (r/w)net
I241003 12:20:35.014458 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 120 runtime stats: 2.7 GiB RSS, 1673 goroutines (stacks: 17 MiB), 474 MiB/2.1 GiB Go alloc/total (heap fragmentation: 114 MiB, heap reserved: 1.5 GiB, heap released: 256 MiB), 398 MiB/440 MiB CGO alloc/total (1636.5 CGO/sec), 178.6/166.9 %(u/s)time, 0.1 %gc (48x), 886 MiB/4.5 MiB (r/w)net
I241003 12:20:45.016764 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 122 runtime stats: 2.5 GiB RSS, 1674 goroutines (stacks: 22 MiB), 1.6 GiB/1.9 GiB Go alloc/total (heap fragmentation: 70 MiB, heap reserved: 265 MiB, heap released: 442 MiB), 464 MiB/509 MiB CGO alloc/total (1653.8 CGO/sec), 179.2/171.0 %(u/s)time, 0.1 %gc (54x), 976 MiB/4.9 MiB (r/w)net
I241003 12:20:55.025204 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 130 runtime stats: 3.2 GiB RSS, 1717 goroutines (stacks: 25 MiB), 2.0 GiB/2.5 GiB Go alloc/total (heap fragmentation: 62 MiB, heap reserved: 346 MiB, heap released: 72 MiB), 645 MiB/704 MiB CGO alloc/total (2426.1 CGO/sec), 162.0/164.5 %(u/s)time, 0.0 %gc (59x), 809 MiB/4.4 MiB (r/w)net
I241003 12:21:05.019299 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 133 runtime stats: 4.0 GiB RSS, 1716 goroutines (stacks: 25 MiB), 1.8 GiB/2.5 GiB Go alloc/total (heap fragmentation: 60 MiB, heap reserved: 531 MiB, heap released: 70 MiB), 1.3 GiB/1.4 GiB CGO alloc/total (6643.1 CGO/sec), 185.2/177.5 %(u/s)time, 0.0 %gc (65x), 817 MiB/6.8 MiB (r/w)net
I241003 12:21:15.022883 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 135 runtime stats: 4.1 GiB RSS, 1711 goroutines (stacks: 24 MiB), 1.1 GiB/2.1 GiB Go alloc/total (heap fragmentation: 71 MiB, heap reserved: 863 MiB, heap released: 638 MiB), 1.6 GiB/1.9 GiB CGO alloc/total (6204.3 CGO/sec), 184.8/176.3 %(u/s)time, 0.0 %gc (71x), 802 MiB/15 MiB (r/w)net
I241003 12:21:25.024510 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 138 runtime stats: 4.0 GiB RSS, 1714 goroutines (stacks: 25 MiB), 1.4 GiB/1.9 GiB Go alloc/total (heap fragmentation: 63 MiB, heap reserved: 402 MiB, heap released: 820 MiB), 1.7 GiB/2.1 GiB CGO alloc/total (6264.9 CGO/sec), 146.4/151.8 %(u/s)time, 0.0 %gc (77x), 716 MiB/8.1 MiB (r/w)net
I241003 12:21:35.028061 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 143 runtime stats: 5.4 GiB RSS, 1749 goroutines (stacks: 26 MiB), 2.6 GiB/2.7 GiB Go alloc/total (heap fragmentation: 58 MiB, heap reserved: 28 MiB, heap released: 1.0 MiB), 1.9 GiB/2.7 GiB CGO alloc/total (8533.7 CGO/sec), 179.0/183.2 %(u/s)time, 0.0 %gc (83x), 1.1 GiB/5.0 MiB (r/w)net
I241003 12:21:45.030811 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 145 runtime stats: 5.8 GiB RSS, 1768 goroutines (stacks: 26 MiB), 1.5 GiB/3.0 GiB Go alloc/total (heap fragmentation: 79 MiB, heap reserved: 1.4 GiB, heap released: 34 MiB), 2.0 GiB/2.8 GiB CGO alloc/total (12846.7 CGO/sec), 201.2/200.8 %(u/s)time, 0.0 %gc (89x), 1.2 GiB/4.1 MiB (r/w)net
I241003 12:21:55.039574 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 147 runtime stats: 5.0 GiB RSS, 1766 goroutines (stacks: 21 MiB), 884 MiB/2.6 GiB Go alloc/total (heap fragmentation: 78 MiB, heap reserved: 1.7 GiB, heap released: 605 MiB), 2.1 GiB/2.3 GiB CGO alloc/total (12203.3 CGO/sec), 192.7/198.2 %(u/s)time, 0.1 %gc (96x), 1.3 GiB/4.0 MiB (r/w)net
I241003 12:22:05.036282 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 150 runtime stats: 4.8 GiB RSS, 1768 goroutines (stacks: 27 MiB), 1.7 GiB/2.5 GiB Go alloc/total (heap fragmentation: 67 MiB, heap reserved: 781 MiB, heap released: 708 MiB), 2.1 GiB/2.2 GiB CGO alloc/total (13158.7 CGO/sec), 186.9/197.2 %(u/s)time, 0.1 %gc (101x), 1.0 GiB/3.9 MiB (r/w)net
I241003 12:22:15.044622 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 152 runtime stats: 5.2 GiB RSS, 1765 goroutines (stacks: 19 MiB), 590 MiB/3.0 GiB Go alloc/total (heap fragmentation: 112 MiB, heap reserved: 2.2 GiB, heap released: 279 MiB), 2.0 GiB/2.1 GiB CGO alloc/total (17985.9 CGO/sec), 188.2/205.8 %(u/s)time, 0.1 %gc (107x), 1.1 GiB/3.7 MiB (r/w)net
I241003 12:22:25.051695 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 154 runtime stats: 5.1 GiB RSS, 1766 goroutines (stacks: 27 MiB), 2.9 GiB/3.0 GiB Go alloc/total (heap fragmentation: 53 MiB, heap reserved: 25 MiB, heap released: 204 MiB), 1.9 GiB/2.0 GiB CGO alloc/total (31842.0 CGO/sec), 205.4/219.8 %(u/s)time, 0.0 %gc (111x), 1.1 GiB/4.3 MiB (r/w)net
I241003 12:22:35.058734 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 159 runtime stats: 5.3 GiB RSS, 1771 goroutines (stacks: 24 MiB), 1.1 GiB/3.2 GiB Go alloc/total (heap fragmentation: 79 MiB, heap reserved: 2.0 GiB, heap released: 349 MiB), 2.0 GiB/2.1 GiB CGO alloc/total (17486.3 CGO/sec), 218.0/219.0 %(u/s)time, 0.0 %gc (117x), 1.2 GiB/6.7 MiB (r/w)net
I241003 12:22:45.076458 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 161 runtime stats: 5.1 GiB RSS, 1774 goroutines (stacks: 21 MiB), 773 MiB/2.8 GiB Go alloc/total (heap fragmentation: 88 MiB, heap reserved: 1.9 GiB, heap released: 724 MiB), 2.1 GiB/2.2 GiB CGO alloc/total (13290.0 CGO/sec), 188.3/200.7 %(u/s)time, 0.0 %gc (123x), 1.0 GiB/6.6 MiB (r/w)net
I241003 12:22:55.074632 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 163 runtime stats: 5.3 GiB RSS, 1774 goroutines (stacks: 27 MiB), 1.6 GiB/3.1 GiB Go alloc/total (heap fragmentation: 78 MiB, heap reserved: 1.4 GiB, heap released: 399 MiB), 2.0 GiB/2.2 GiB CGO alloc/total (36311.6 CGO/sec), 201.7/205.5 %(u/s)time, 0.0 %gc (127x), 942 MiB/6.2 MiB (r/w)net
I241003 12:23:05.046807 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 169 runtime stats: 5.5 GiB RSS, 1770 goroutines (stacks: 27 MiB), 2.0 GiB/3.2 GiB Go alloc/total (heap fragmentation: 68 MiB, heap reserved: 1.1 GiB, heap released: 327 MiB), 2.1 GiB/2.2 GiB CGO alloc/total (35743.1 CGO/sec), 193.6/193.9 %(u/s)time, 0.0 %gc (131x), 847 MiB/5.5 MiB (r/w)net
I241003 12:23:15.098309 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 173 runtime stats: 5.1 GiB RSS, 1767 goroutines (stacks: 27 MiB), 1.9 GiB/2.9 GiB Go alloc/total (heap fragmentation: 68 MiB, heap reserved: 844 MiB, heap released: 702 MiB), 2.1 GiB/2.2 GiB CGO alloc/total (32728.1 CGO/sec), 187.0/192.3 %(u/s)time, 0.0 %gc (135x), 899 MiB/5.8 MiB (r/w)net
I241003 12:23:25.082783 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 175 runtime stats: 5.3 GiB RSS, 1780 goroutines (stacks: 28 MiB), 2.3 GiB/2.8 GiB Go alloc/total (heap fragmentation: 65 MiB, heap reserved: 481 MiB, heap released: 720 MiB), 2.3 GiB/2.4 GiB CGO alloc/total (20698.2 CGO/sec), 177.5/189.3 %(u/s)time, 0.0 %gc (139x), 838 MiB/4.5 MiB (r/w)net
I241003 12:23:35.109772 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 177 runtime stats: 5.5 GiB RSS, 1780 goroutines (stacks: 28 MiB), 2.6 GiB/2.9 GiB Go alloc/total (heap fragmentation: 64 MiB, heap reserved: 165 MiB, heap released: 642 MiB), 2.4 GiB/2.5 GiB CGO alloc/total (64439.7 CGO/sec), 241.2/240.0 %(u/s)time, 0.0 %gc (144x), 1.0 GiB/6.6 MiB (r/w)net
I241003 12:23:45.093370 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 181 runtime stats: 6.7 GiB RSS, 1810 goroutines (stacks: 28 MiB), 2.9 GiB/4.0 GiB Go alloc/total (heap fragmentation: 56 MiB, heap reserved: 933 MiB, heap released: 4.0 MiB), 2.7 GiB/2.9 GiB CGO alloc/total (25762.5 CGO/sec), 201.3/213.6 %(u/s)time, 0.0 %gc (149x), 1.1 GiB/4.3 MiB (r/w)net
I241003 12:23:55.203032 318 2@server/status/runtime_log.go:43 ⋮ [T1,Vsystem,n30] 185 runtime stats: 6.8 GiB RSS, 1810 goroutines (stacks: 28 MiB), 2.4 GiB/3.6 GiB Go alloc/total (heap fragmentation: 64 MiB, heap reserved: 1.1 GiB, heap released: 411 MiB), 3.1 GiB/3.3 GiB CGO alloc/total (74758.0 CGO/sec), 241.8/264.3 %(u/s)time, 0.0 %gc (153x), 1.1 GiB/4.0 MiB (r/w)net
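To make the trend in these samples easier to see, here is a minimal sketch of a parser for the `runtime stats` lines. The regular expression is an assumption based only on the line format shown above, and `parse_runtime_stats` and `to_mib` are hypothetical helper names, not CockroachDB tooling.

```python
import re

# Conversion factors to MiB for the units that appear in the log lines above.
_UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}

# Assumed layout, taken from the sample lines in this issue:
#   "... HH:MM:SS.micros ... runtime stats: <rss> RSS, ...,
#    <alloc>/<total> Go alloc/total (...), <alloc>/<total> CGO alloc/total ..."
_STATS_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2})\.\d+ .*?runtime stats: "
    r"(?P<rss>[\d.]+ [KMG]iB) RSS, .*?"
    r"(?P<go_alloc>[\d.]+ [KMG]iB)/(?P<go_total>[\d.]+ [KMG]iB) Go alloc/total.*?"
    r"(?P<cgo_alloc>[\d.]+ [KMG]iB)/(?P<cgo_total>[\d.]+ [KMG]iB) CGO alloc/total"
)


def to_mib(text):
    """Convert a string such as '6.8 GiB' to a float number of MiB."""
    value, unit = text.split()
    return float(value) * _UNIT_TO_MIB[unit]


def parse_runtime_stats(line):
    """Return the timestamp and memory fields (in MiB), or None if no match."""
    m = _STATS_RE.search(line)
    if m is None:
        return None
    stats = {"time": m.group("time")}
    for key in ("rss", "go_alloc", "go_total", "cgo_alloc", "cgo_total"):
        stats[key] = to_mib(m.group(key))
    return stats
```

Applied to the first and last samples above, this gives an RSS of 522 MiB at 12:19:15 and 6.8 GiB at 12:23:55, with CGO alloc growing from 10 MiB to 3.1 GiB over the same window. In the last sample, the Go total (3.6 GiB) plus the CGO total (3.3 GiB) roughly accounts for the reported RSS.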
dmesg log
[Thu Oct 3 12:23:57 2024] cockroach invoked oom-killer: gfp_mask=0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_ZERO), order=0, oom_score_adj=0
[Thu Oct 3 12:23:57 2024] CPU: 6 PID: 16433 Comm: cockroach Not tainted 6.5.0-1016-gcp #16~22.04.1-Ubuntu
[Thu Oct 3 12:23:57 2024] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
[Thu Oct 3 12:23:57 2024] Call Trace:
[Thu Oct 3 12:23:57 2024]
[Thu Oct 3 12:23:57 2024] dump_stack_lvl+0x48/0x70
[Thu Oct 3 12:23:57 2024] dump_stack+0x10/0x20
[Thu Oct 3 12:23:57 2024] dump_header+0x50/0x270
[Thu Oct 3 12:23:57 2024] oom_kill_process+0x10d/0x1c0
[Thu Oct 3 12:23:57 2024] out_of_memory+0x103/0x340
[Thu Oct 3 12:23:57 2024] __alloc_pages_may_oom+0x112/0x1e0
[Thu Oct 3 12:23:57 2024] __alloc_pages_slowpath.constprop.0+0x462/0x9d0
[Thu Oct 3 12:23:57 2024] __alloc_pages+0x304/0x330
[Thu Oct 3 12:23:57 2024] __folio_alloc+0x1d/0x60
[Thu Oct 3 12:23:57 2024] ? policy_node+0x69/0x80
[Thu Oct 3 12:23:57 2024] vma_alloc_folio+0x9f/0x3d0
[Thu Oct 3 12:23:57 2024] ? task_tick_fair+0x87/0x690
[Thu Oct 3 12:23:57 2024] do_anonymous_page+0x76/0x350
[Thu Oct 3 12:23:57 2024] handle_pte_fault+0x16e/0x170
[Thu Oct 3 12:23:57 2024] __handle_mm_fault+0x666/0x730
[Thu Oct 3 12:23:57 2024] handle_mm_fault+0x14e/0x360
[Thu Oct 3 12:23:57 2024] do_user_addr_fault+0x14b/0x670
[Thu Oct 3 12:23:57 2024] exc_page_fault+0x83/0x190
[Thu Oct 3 12:23:57 2024] asm_exc_page_fault+0x27/0x30
[Thu Oct 3 12:23:57 2024] RIP: 0033:0x4d9a6b
[Thu Oct 3 12:23:57 2024] Code: 1f f8 c3 f3 44 0f 7f 3f f3 44 0f 7f 7c 1f f0 c3 f3 44 0f 7f 3f f3 44 0f 7f 7f 10 f3 44 0f 7f 7c 1f e0 f3 44 0f 7f 7c 1f f0 c3 44 0f 7f 3f f3 44 0f 7f 7f 10 f3 44 0f 7f 7f 20 f3 44 0f 7f 7f
[Thu Oct 3 12:23:57 2024] RSP: 002b:000000c01535b630 EFLAGS: 00010287
[Thu Oct 3 12:23:57 2024] RAX: 0000000000000000 RBX: 0000000000000076 RCX: 0000000000000000
[Thu Oct 3 12:23:57 2024] RDX: 000000c0f5002f8a RSI: 0000000000002800 RDI: 000000c0f5002f8a
[Thu Oct 3 12:23:57 2024] RBP: 000000c01535b698 R08: 0000000000000000 R09: 0000000000000001
[Thu Oct 3 12:23:57 2024] R10: 00007873c768d910 R11: 0000000000000000 R12: 000000c0f5000800
[Thu Oct 3 12:23:57 2024] R13: 0000000000000000 R14: 000000c014240c40 R15: 3fffffffffffffff
[Thu Oct 3 12:23:57 2024]
[Thu Oct 3 12:23:57 2024] Mem-Info:
[Thu Oct 3 12:23:57 2024] active_anon:257360 inactive_anon:1598281 isolated_anon:0
[Thu Oct 3 12:23:57 2024] Node 0 active_anon:1029440kB inactive_anon:6393124kB active_file:8424kB inactive_file:32592kB unevictable:27680kB isolated(anon):0kB isolated(file):0kB mapped:30676kB dirty:8016kB writeback:0kB shmem:1324kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 2048kB writeback_tmp:0kB kernel_stack:4304kB pagetables:22404kB sec_pagetables:0kB all_unreclaimable? no
[Thu Oct 3 12:23:57 2024] Node 0 DMA free:14336kB boost:0kB min:124kB low:152kB high:180kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15920kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Thu Oct 3 12:23:57 2024] lowmem_reserve[]: 0 2988 7905 7905 7905
[Thu Oct 3 12:23:57 2024] Node 0 DMA32 free:105436kB boost:57364kB min:82864kB low:89236kB high:95608kB reserved_highatomic:30720KB active_anon:312396kB inactive_anon:2528988kB active_file:588kB inactive_file:9588kB unevictable:0kB writepending:3236kB present:3126072kB managed:3060504kB mlocked:0kB bounce:0kB free_pcp:192kB local_pcp:0kB free_cma:0kB
[Thu Oct 3 12:23:57 2024] lowmem_reserve[]: 0 0 4916 4916 4916
[Thu Oct 3 12:23:57 2024] Node 0 Normal free:96324kB boost:94392kB min:136344kB low:146832kB high:157320kB reserved_highatomic:6144KB active_anon:715196kB inactive_anon:3864952kB active_file:6060kB inactive_file:21940kB unevictable:27680kB writepending:4936kB present:5242880kB managed:5042836kB mlocked:27680kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Thu Oct 3 12:23:57 2024] lowmem_reserve[]: 0 0 0 0 0
[Thu Oct 3 12:23:57 2024] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 3*4096kB (M) = 14336kB
[Thu Oct 3 12:23:57 2024] Node 0 DMA32: 471*4kB (UMEH) 269*8kB (UMEH) 6202*16kB (UMH) 70*32kB (UMH) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 105508kB
[Thu Oct 3 12:23:57 2024] Node 0 Normal: 1020*4kB (UME) 2132*8kB (UME) 4599*16kB (UMEH) 25*32kB (UMH) 6*64kB (MH) 1*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 96032kB
[Thu Oct 3 12:23:57 2024] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Thu Oct 3 12:23:57 2024] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Thu Oct 3 12:23:57 2024] 12916 total pagecache pages
[Thu Oct 3 12:23:57 2024] 0 pages in swap cache
[Thu Oct 3 12:23:57 2024] Free swap = 0kB
[Thu Oct 3 12:23:57 2024] Total swap = 0kB
[Thu Oct 3 12:23:57 2024] 2096218 pages RAM
[Thu Oct 3 12:23:57 2024] 0 pages HighMem/MovableOnly
[Thu Oct 3 12:23:57 2024] 66543 pages reserved
[Thu Oct 3 12:23:57 2024] 0 pages hwpoisoned
[Thu Oct 3 12:23:57 2024] Tasks state (memory values in pages):
[Thu Oct 3 12:23:57 2024] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[Thu Oct 3 12:23:57 2024] [ 166] 0 166 12297 1312 106496 0 -250 systemd-journal
[Thu Oct 3 12:23:57 2024] [ 207] 0 207 72337 6848 110592 0 -1000 multipathd
[Thu Oct 3 12:23:57 2024] [ 217] 0 217 2904 1147 65536 0 -1000 systemd-udevd
[Thu Oct 3 12:23:57 2024] [ 442] 100 442 4065 992 73728 0 0 systemd-network
[Thu Oct 3 12:23:57 2024] [ 445] 101 445 6385 1856 90112 0 0 systemd-resolve
[Thu Oct 3 12:23:57 2024] [ 560] 102 560 2151 960 57344 0 -900 dbus-daemon
[Thu Oct 3 12:23:57 2024] [ 577] 0 577 402671 2466 249856 0 0 google_osconfig
[Thu Oct 3 12:23:57 2024] [ 591] 0 591 8271 3104 102400 0 0 networkd-dispat
[Thu Oct 3 12:23:57 2024] [ 604] 104 604 55601 1376 81920 0 0 rsyslogd
[Thu Oct 3 12:23:57 2024] [ 618] 0 618 440722 3596 299008 0 -900 snapd
[Thu Oct 3 12:23:57 2024] [ 757] 0 757 569403 3034 266240 0 -999 google_guest_ag
[Thu Oct 3 12:23:57 2024] [ 767] 0 767 1555 544 53248 0 0 agetty
[Thu Oct 3 12:23:57 2024] [ 776] 0 776 1544 544 49152 0 0 agetty
[Thu Oct 3 12:23:57 2024] [ 784] 0 784 58861 1193 94208 0 0 polkitd
[Thu Oct 3 12:23:57 2024] [ 1016] 0 1016 4114 1248 77824 0 0 systemd-logind
[Thu Oct 3 12:23:57 2024] [ 9925] 0 9925 74021 1664 163840 0 0 packagekitd
[Thu Oct 3 12:23:57 2024] [ 13930] 113 13930 4730 807 57344 0 0 chronyd
[Thu Oct 3 12:23:57 2024] [ 13931] 113 13931 2648 433 57344 0 0 chronyd
[Thu Oct 3 12:23:57 2024] [ 14799] 0 14799 3859 1280 65536 0 -1000 sshd
[Thu Oct 3 12:23:57 2024] [ 15997] 0 15997 177348 4573 196608 0 0 side-eye-agent
[Thu Oct 3 12:23:57 2024] [ 16348] 1000 16348 1941 544 57344 0 0 bash
[Thu Oct 3 12:23:57 2024] [ 16353] 1000 16353 3282142 1839794 20742144 0 0 cockroach
[Thu Oct 3 12:23:57 2024] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=cockroach-system.service,mems_allowed=0,global_oom,task_memcg=/system.slice/cockroach-system.service,task=cockroach,pid=16353,uid=1000
[Thu Oct 3 12:23:57 2024] Out of memory: Killed process 16353 (cockroach) total-vm:13128568kB, anon-rss:7335368kB, file-rss:23680kB, shmem-rss:0kB, UID:1000 pgtables:20256kB oom_score_adj:0
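The dmesg output reports memory in pages, so a quick conversion helps relate it to the runtime stats. The sketch below assumes 4 KiB pages (the x86-64 default, consistent with the smallest buddy-allocator bucket listed above); `pages_to_gib` is a hypothetical helper, and the page counts are copied from the log.

```python
PAGE_KIB = 4  # assumption: 4 KiB pages, not stated explicitly in the log


def pages_to_gib(pages, page_kib=PAGE_KIB):
    """Convert a page count to GiB."""
    return pages * page_kib / (1024 * 1024)


total_ram_pages = 2096218      # "2096218 pages RAM"
reserved_pages = 66543         # "66543 pages reserved"
cockroach_rss_pages = 1839794  # rss column for pid 16353 (cockroach)

total_gib = pages_to_gib(total_ram_pages)
cockroach_gib = pages_to_gib(cockroach_rss_pages)
# Share of the non-reserved pages held by the cockroach process.
share = cockroach_rss_pages / (total_ram_pages - reserved_pages)

print(f"total RAM:     {total_gib:.2f} GiB")
print(f"cockroach RSS: {cockroach_gib:.2f} GiB")
print(f"share of non-reserved pages: {share:.1%}")
```

Under that assumption, the node had about 8.0 GiB of RAM and the cockroach process held about 7.0 GiB of it, roughly 91% of the non-reserved pages, with no swap configured (`Total swap = 0kB`). The page-based RSS also agrees with the `anon-rss` plus `file-rss` figures on the kill line to within a few hundred kB.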
|
Adding storage and bumping to a P-1 since this is a process crash during a normal operation. Attaching memory profiles and stats. |
Reproduction steps using roachperf (this doesn't quite reproduce the failure, but it does show a large memory spike):
|
roachtest.perturbation/metamorphic/backfill failed with artifacts on master @ f842c3b4b5adc040d411bd17d7d10005273fc1b6:
Parameters:
|
roachtest.perturbation/metamorphic/backfill failed with artifacts on master @ dcce4cafa234525fc859d32745c11ed87890dc7b:
Parameters:
|
@andrewbaptist what is the expectation regarding #131713 (comment)? AC/Storage doesn't in general investigate OOMs (AC has no awareness of memory) unless there is a clear sign that cgo memory usage was too high. The profile only shows 1GB of memory. How much memory were these nodes provisioned with? |
I'll have to run it again to get a memory profile, but I believe the memory is all in Raft / Storage and in the encoding/decoding of Batch and Raft protobufs. This was a simple KV workload with relatively low concurrency, so it shouldn't have used much memory beyond that. This is a metamorphic test, so it runs with a different number of CPUs (and therefore a different amount of memory) on each run. In the reproduction steps above it was only 4 vCPUs. I'm not sure who should own this, but it was concerning that we had an OOM with this test. |
roachtest.perturbation/metamorphic/backfill failed with artifacts on master @ fd4b1464dbd6e385c6e51af26fe294fd2023a259:
Parameters:
|
roachtest.perturbation/metamorphic/backfill failed with artifacts on master @ 645eb8c99796b3b88f5631aa0fc92a011010ce64:
Parameters:
|
roachtest.perturbation/metamorphic/backfill failed with artifacts on master @ 5be5b0b52ff79b98689b2282a8b25cf9eb50ec40:
Parameters:
|
roachtest.perturbation/metamorphic/backfill failed with artifacts on master @ 42f40f59cae3c0fd8842e194d6991c951ab4382f:
Parameters:
|
roachtest.perturbation/metamorphic/backfill failed with artifacts on master @ 833dadd212fa4b12b1442ae8e00e85ee80a8cdce:
Parameters:
|
133115: roachtest: change to use standard memory configuration r=arulajmani a=andrewbaptist
Previously the perturbation/* roachtests were configured with low memory configurations. This resulted in OOMs for backfill tests. This change makes the memory configuration a metamorphic parameter but excludes low memory configurations. The perturbation/full tests are run with standard memory.
Informs: #133114
Fixes: #133086
Fixes: #131713
Epic: none
Release note: None
Co-authored-by: Andrew Baptist <[email protected]>
Based on the specified backports for linked PR #133115, I applied the following new label(s) to this issue: branch-release-24.3. Please adjust the labels as needed to match the branches actually affected by this issue, including adding any known older branches. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
roachtest.perturbation/metamorphic/backfill failed with artifacts on master @ 74333311616b937fea6a995462215a1cb5962686:
Parameters:
ROACHTEST_arch=amd64
ROACHTEST_cloud=gce
ROACHTEST_coverageBuild=false
ROACHTEST_cpu=32
ROACHTEST_encrypted=false
ROACHTEST_fs=ext4
ROACHTEST_localSSD=true
ROACHTEST_runtimeAssertionsBuild=false
ROACHTEST_ssd=2
Help
See: roachtest README
See: How To Investigate (internal)
See: Grafana
This test on roachdash | Improve this report!
Jira issue: CRDB-42656