calico netatop exception cause node restart "exception RIP: kmem_cache_alloc" #7974

Closed
ming12713 opened this issue Aug 30, 2023 · 3 comments

Comments

@ming12713

Hello, I have a few bare-metal servers that frequently restart unexpectedly. Analysis with kdump indicates that the crash is triggered by Calico.
Screenshot from 2023-08-30 11-07-29

dmesg error

[ 2376.974624] IPv6: ADDRCONF(NETDEV_CHANGE): cali9bef92c240d: link becomes ready
[ 2379.510962] general protection fault, probably for non-canonical address 0x8b30ba20faa16f31: 0000 [#1] SMP NOPTI
[ 2379.511009] CPU: 17 PID: 339161 Comm: calico Kdump: loaded Tainted: G           OE     5.15.0-79-generic #86-Ubuntu
[ 2379.511041] Hardware name: Dell Inc. PowerEdge R650/0Y2G81, BIOS 1.8.2 09/14/2022
[ 2379.511062] RIP: 0010:kmem_cache_alloc+0xfd/0x2f0
[ 2379.511083] Code: 8b 50 08 49 8b 00 49 83 78 10 00 48 89 45 c8 0f 84 96 01 00 00 48 85 c0 0f 84 8d 01 00 00 41 8b 4c 24 28 49 8b 3c 24 48 01 c1 <48> 8b 19 48 89 ce 49 33 9c 24 b8 00 00 00 48 8d 4a 01 48 0f ce 48
[ 2379.511136] RSP: 0018:ff5e9a946e83b820 EFLAGS: 00010092
[ 2379.511154] RAX: 8b30ba20faa16ef9 RBX: 0000000000000078 RCX: 8b30ba20faa16f31
[ 2379.511176] RDX: 0000000000000170 RSI: 0000000000000a20 RDI: 007bb5963f23d8e0
[ 2379.511197] RBP: ff5e9a946e83b860 R08: ff909a943e43d8e0 R09: ff14e4cec6f00000
[ 2379.511217] R10: 0000000000052cd9 R11: 0000000000000006 R12: ff14e4ee4ddd2d00
[ 2379.511238] R13: 0000000000000000 R14: 0000000000000a20 R15: 0000000000000a20
[ 2379.511258] FS:  000000c000780490(0000) GS:ff14e4fdff200000(0000) knlGS:0000000000000000
[ 2379.511282] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2379.511299] CR2: 000000c001007740 CR3: 00000024457ae003 CR4: 0000000000771ee0
[ 2379.511320] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2379.511341] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2379.511361] PKRU: 55555554
[ 2379.511371] Call Trace:
[ 2379.511381]  <TASK>
[ 2379.511391]  ? get_taskinfo+0xac/0x1b0 [netatop]
[ 2379.511411]  get_taskinfo+0xac/0x1b0 [netatop]
[ 2379.511427]  sock2task+0x1ac/0x480 [netatop]
[ 2379.511443]  analyze_tcpv4_packet+0x1bd/0x210 [netatop]
[ 2379.511465]  ipv4_hookout+0x86/0xf0 [netatop]
[ 2379.511482]  nf_hook_slow+0x41/0xc0
[ 2379.511501]  __ip_local_out+0xd6/0x150
[ 2379.511517]  ? ip_output+0x100/0x100
[ 2379.511533]  ip_local_out+0x1d/0x70
[ 2379.511548]  __ip_queue_xmit+0x184/0x440
[ 2379.511565]  ip_queue_xmit+0x15/0x20
[ 2379.511579]  __tcp_transmit_skb+0x910/0x9c0
[ 2379.511599]  tcp_write_xmit+0x3e9/0xb40
[ 2379.511615]  ? __check_object_size.part.0+0x4a/0x150
[ 2379.511638]  __tcp_push_pending_frames+0x37/0x100 
[ 2379.511657]  tcp_push+0xd9/0x110
[ 2379.511670]  tcp_sendmsg_locked+0x89a/0xc90
[ 2379.511687]  tcp_sendmsg+0x2d/0x50
[ 2379.512488]  inet_sendmsg+0x43/0x80
[ 2379.513269]  sock_sendmsg+0x62/0x70
[ 2379.514042]  sock_write_iter+0x93/0xf0
[ 2379.514786]  new_sync_write+0x18d/0x1a0
[ 2379.515505]  vfs_write+0x1d5/0x270
[ 2379.516196]  ksys_write+0xb5/0xf0
[ 2379.516860]  __x64_sys_write+0x19/0x20
[ 2379.517502]  do_syscall_64+0x59/0xc0
[ 2379.518119]  ? syscall_exit_to_user_mode+0x27/0x50
[ 2379.518720]  ? do_syscall_64+0x69/0xc0
[ 2379.519300]  ? do_syscall_64+0x69/0xc0
[ 2379.519852]  ? irqentry_exit+0x1d/0x30
[ 2379.520384]  ? sysvec_reschedule_ipi+0x78/0xe0
[ 2379.520909]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 2379.521423] RIP: 0033:0x403ace
[ 2379.521912] Code: 48 89 6c 24 38 48 8d 6c 24 38 e8 0d 00 00 00 48 8b 6c 24 38 48 83 c4 40 c3 cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48
[ 2379.522950] RSP: 002b:000000c000ce9640 EFLAGS: 00000206 ORIG_RAX: 0000000000000001
[ 2379.523484] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 0000000000403ace
[ 2379.524017] RDX: 000000000000008f RSI: 000000c0002f4000 RDI: 0000000000000007
[ 2379.524546] RBP: 000000c000ce9680 R08: 0000000000000000 R09: 0000000000000000
[ 2379.525073] R10: 0000000000000000 R11: 0000000000000206 R12: 000000c000ce97c0
[ 2379.525619] R13: 0000000000000000 R14: 000000c000517ba0 R15: 000000c000780400
[ 2379.526129]  </TASK>
[ 2379.526622] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vxlan xt_set ipt_rpfilter ip_set_hash_ip ip_set_hash_net ip_set xfrm_user xfrm_algo wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel xt_multiport veth nf_conntrack_netlink xt_recent xt_nat xt_statistic xt_addrtype ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_MASQUERADE nft_chain_nat nf_nat xt_mark xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_comment nft_compat nft_counter nf_tables nfnetlink netatop(OE) sunrpc binfmt_misc nls_iso8859_1 xfs intel_rapl_msr intel_rapl_common i10nm_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm rapl dell_wmi ledtrig_audio sparse_keymap intel_cstate video dell_smbios dcdbas dell_wmi_descriptor wmi_bmof isst_if_mbox_pci isst_if_mmio mei_me isst_if_common mei intel_pch_thermal acpi_ipmi
[ 2379.526686]  ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter mac_hid sch_fq_codel dm_multipath scsi_dh_rdac br_netfilter scsi_dh_emc scsi_dh_alua bridge stp llc overlay msr ramoops reed_solomon pstore_blk pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core mgag200 i2c_algo_bit drm_kms_helper crct10dif_pclmul syscopyarea crc32_pclmul sysfillrect sysimgblt ghash_clmulni_intel fb_sys_fops mlx5_core cec aesni_intel mlxfw psample crypto_simd i2c_i801 xhci_pci rc_core tls cryptd ahci drm megaraid_sas tg3 pci_hyperv_intf i2c_smbus libahci xhci_pci_renesas intel_pmt wmi
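
As a triage aid for traces like this one, the call-trace frames that carry a module tag (the `[name]` suffix) can be pulled out with a short script. This is only a minimal sketch and not part of Calico or netatop: the helper name `module_frames` and the regular expression are illustrative assumptions based on the dmesg line format shown in this report, and frames prefixed with `? ` are skipped because the kernel marks them as unreliable.

```python
import re

# Matches one call-trace line, e.g.
# "[ 2379.511427]  sock2task+0x1ac/0x480 [netatop]".
# The pattern is an assumption derived from the log format in this report.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<uncertain>\? )?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def module_frames(dmesg_text):
    """Return (symbol, module) for reliable frames that belong to a loadable module."""
    frames = []
    for line in dmesg_text.splitlines():
        m = FRAME_RE.match(line)
        # Keep only frames with a module tag that are not marked "? ".
        if m and m.group("module") and not m.group("uncertain"):
            frames.append((m.group("symbol"), m.group("module")))
    return frames


# A few lines copied from the call trace in this report.
SAMPLE = """\
[ 2379.511391]  ? get_taskinfo+0xac/0x1b0 [netatop]
[ 2379.511411]  get_taskinfo+0xac/0x1b0 [netatop]
[ 2379.511427]  sock2task+0x1ac/0x480 [netatop]
[ 2379.511443]  analyze_tcpv4_packet+0x1bd/0x210 [netatop]
[ 2379.511465]  ipv4_hookout+0x86/0xf0 [netatop]
[ 2379.511482]  nf_hook_slow+0x41/0xc0
"""

for symbol, module in module_frames(SAMPLE):
    print(f"{module}: {symbol}")
```

On the lines sampled here, every module-tagged frame between `nf_hook_slow` and the faulting `kmem_cache_alloc` belongs to `netatop`, which the module list marks as `netatop(OE)` (out-of-tree, unsigned). The `calico` name in the `Comm:` field only identifies the process whose `write()` call was in flight when the fault occurred.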


Your Environment

  • Calico version: v3.26.1
  • Kubernetes version: v1.26.6
  • Operating System and version: Ubuntu 22.04
  • Link to your project (optional):
@ming12713
Author

Update: coredump errors from the calico-ipam hang.
Screenshot from 2023-08-30 13-42-45

@lwr20
Member

lwr20 commented Aug 30, 2023

Isn't this a kernel bug? It should never be possible for a user-space program to cause the kernel to crash.

Can you raise with your kernel/distro provider please?

@ming12713
Author

> Isn't this a kernel bug? It should never be possible for a user-space program to cause the kernel to crash.
>
> Can you raise with your kernel/distro provider please?

OK, thanks.
