ARC write error makes pool unusable #15466

Open
heppu opened this issue Oct 30, 2023 · 3 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)


heppu commented Oct 30, 2023

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Voidlinux |
| Distribution Version | latest |
| Kernel Version | 6.5.8_1 |
| Architecture | amd64 |
| OpenZFS Version | 2.2.0 |

Describe the problem you're observing

The whole pool becomes unusable and can only be imported in read-only mode with ZIL replay disabled.
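For anyone who needs to get at their data in the same state, here is a minimal sketch of what such an import could look like. The issue does not state the exact commands used, so this is an assumption: it relies on the OpenZFS `zil_replay_disable` module parameter (Linux sysfs path) and on `zpool import -o readonly=on`. The pool name `tank` and the helper `readonly_import_plan` are hypothetical. The script only prints the commands; it does not run them.

```python
import shlex

# OpenZFS module parameter that disables ZIL replay (Linux sysfs location).
ZIL_REPLAY_PARAM = "/sys/module/zfs/parameters/zil_replay_disable"


def readonly_import_plan(pool):
    """Return the shell commands for a read-only import with ZIL replay disabled.

    Hypothetical helper for illustration only; nothing is executed here.
    """
    quoted = shlex.quote(pool)  # guard against odd characters in the pool name
    return [
        f"echo 1 > {ZIL_REPLAY_PARAM}",
        f"zpool import -o readonly=on {quoted}",
    ]


if __name__ == "__main__":
    # "tank" is a placeholder pool name, not the reporter's pool.
    for command in readonly_import_plan("tank"):
        print(command)
```

Both commands need root. Printing them first leaves room to review each step before touching a damaged pool.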

Describe how to reproduce the problem

This has happened to me multiple times on two different machines. On one machine it has now happened twice while storing data on an SMB share from a Windows machine. Today it happened again, and I was able to capture the dmesg output before the machine became unresponsive.

Include any warning/errors/backtraces from the system logs

dmesg while the issue occurred:

[391177.318526] smbd(39990): Attempt to set a LOCK_MAND lock via flock(2). This support has been removed and the request ignored.
[401363.194859] BUG: kernel NULL pointer dereference, address: 0000000000000000
[401363.195221] #PF: supervisor read access in kernel mode
[401363.195480] #PF: error_code(0x0000) - not-present page
[401363.195737] PGD 0 P4D 0 
[401363.195996] Oops: 0000 [#1] PREEMPT SMP NOPTI
[401363.196255] CPU: 64 PID: 92355 Comm: smbd Tainted: P           OE      6.5.8_1 #1
[401363.196522] Hardware name: ASUS System Product Name/Pro WS WRX80E-SAGE SE WIFI, BIOS 1106 02/10/2023
[401363.196795] RIP: 0010:arc_write+0x6c/0x490 [zfs]
[401363.197338] Code: 7a 40 48 89 b5 50 ff ff ff 41 8b 72 30 4d 8b 5a 20 48 89 95 60 ff ff ff 4d 8b 42 28 41 8b 12 48 89 8d 58 ff ff ff 45 8b 72 38 <49> 8b 1c 24 89 b5 4c ff ff ff 48 89 bd 40 ff ff ff 65 48 8b 0c 25
[401363.197913] RSP: 0018:ffffae54a3d33a18 EFLAGS: 00010286
[401363.198222] RAX: ffffae54a3d33b80 RBX: 00000000000147bc RCX: ffffae57047af3f8
[401363.198521] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffae54a3d33b60
[401363.198820] RBP: ffffae54a3d33ae8 R08: ffff9b397e533760 R09: 0000000000000000
[401363.199122] R10: ffffae54a3d33af8 R11: ffffffffc1137c40 R12: 0000000000000000
[401363.199426] R13: 0000000000000000 R14: 0000000000000080 R15: ffff9b3dd424ba30
[401363.199733] FS:  00007f86b10146c0(0000) GS:ffff9b9573e00000(0000) knlGS:0000000000000000
[401363.200061] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[401363.200372] CR2: 0000000000000000 CR3: 0000000436b78000 CR4: 0000000000350ee0
[401363.200689] Call Trace:
[401363.201000]  <TASK>
[401363.201309]  ? __die+0x23/0x70
[401363.201623]  ? page_fault_oops+0x159/0x460
[401363.201939]  ? srso_return_thunk+0x5/0x10
[401363.202255]  ? _raw_spin_lock+0x17/0x40
[401363.202570]  ? zio_add_child_first+0x112/0x130 [zfs]
[401363.203160]  ? srso_return_thunk+0x5/0x10
[401363.203472]  ? do_user_addr_fault+0x69/0x640
[401363.203784]  ? exc_page_fault+0x77/0x170
[401363.204123]  ? asm_exc_page_fault+0x26/0x30
[401363.204474]  ? __pfx_dmu_sync_done+0x10/0x10 [zfs]
[401363.205081]  ? arc_write+0x6c/0x490 [zfs]
[401363.205655]  ? __pfx_dmu_sync_ready+0x10/0x10 [zfs]
[401363.206231]  ? srso_return_thunk+0x5/0x10
[401363.206529]  ? __kmem_cache_alloc_node+0x16d/0x2d0
[401363.206825]  ? spl_kmem_alloc+0xf1/0x130 [spl]
[401363.207135]  dmu_sync+0x3ce/0x520 [zfs]
[401363.207693]  ? __pfx_dmu_sync_ready+0x10/0x10 [zfs]
[401363.208245]  ? __pfx_dmu_sync_done+0x10/0x10 [zfs]
[401363.208794]  ? __pfx_zfs_get_done+0x10/0x10 [zfs]
[401363.209440]  zfs_get_data+0x330/0x400 [zfs]
[401363.209989]  zil_lwb_write_issue+0xa79/0xd20 [zfs]
[401363.210553]  zil_commit_impl+0x21f/0x1330 [zfs]
[401363.211111]  zfs_fsync+0x95/0x130 [zfs]
[401363.211655]  zpl_fsync+0x107/0x190 [zfs]
[401363.212190]  ? srso_return_thunk+0x5/0x10
[401363.212474]  __x64_sys_fsync+0x3b/0x70
[401363.212756]  do_syscall_64+0x5f/0x90
[401363.213032]  ? srso_return_thunk+0x5/0x10
[401363.213306]  ? syscall_exit_to_user_mode+0x2b/0x40
[401363.213582]  ? srso_return_thunk+0x5/0x10
[401363.213854]  ? do_syscall_64+0x6b/0x90
[401363.214125]  ? srso_return_thunk+0x5/0x10
[401363.214396]  ? do_syscall_64+0x6b/0x90
[401363.214662]  ? srso_return_thunk+0x5/0x10
[401363.214926]  ? do_syscall_64+0x6b/0x90
[401363.215181]  ? srso_return_thunk+0x5/0x10
[401363.215427]  ? do_syscall_64+0x6b/0x90
[401363.215669]  ? do_syscall_64+0x6b/0x90
[401363.215906]  ? do_syscall_64+0x6b/0x90
[401363.216141]  ? do_syscall_64+0x6b/0x90
[401363.216371]  ? do_syscall_64+0x6b/0x90
[401363.216600]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[401363.216831] RIP: 0033:0x7f86b5d51c2a
[401363.217056] Code: 48 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c 24 0c e8 33 65 f8 ff 8b 7c 24 0c 89 c2 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 36 89 d7 89 44 24 0c e8 93 65 f8 ff 8b 44 24
[401363.217538] RSP: 002b:00007f86b1013b90 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[401363.217782] RAX: ffffffffffffffda RBX: 000055bf8ea042f0 RCX: 00007f86b5d51c2a
[401363.218027] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000004f6
[401363.218274] RBP: 00007f86b1013c00 R08: 0000000000000000 R09: 000055bf8e7dd340
[401363.218524] R10: 0000000000000000 R11: 0000000000000293 R12: 000055bf8e7dd318
[401363.218774] R13: 00007f86b5a4cf00 R14: 000055bf8e909670 R15: 000055bf8e7dd2e0
[401363.219033]  </TASK>
[401363.219275] Modules linked in: rpcsec_gss_krb5 xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables x_tables br_netfilter bridge overlay nfsd nfs auth_rpcgss nfs_acl 8021q garp lockd netfs mrp stp llc grace fscache ipmi_ssif intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd eeepc_wmi asus_wmi kvm battery iwlmvm ledtrig_audio irqbypass sparse_keymap platform_profile rapl video wmi_bmof acpi_cpufreq pcspkr mac80211 libarc4 snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_usb_audio snd_hda_codec iwlwifi snd_usbmidi_lib btusb snd_hda_core snd_rawmidi btrtl btbcm mlx4_ib mc snd_hwdep ch341 btintel cfg80211 snd_pcm ib_uverbs input_leds btmtk usbserial joydev acpi_ipmi ipmi_si k10temp i2c_piix4 ipmi_devintf tpm_crb evdev mac_hid ipmi_msghandler i2c_designware_platform tpm_tis tpm_tis_core i2c_designware_core tiny_power_button tpm rpcrdma rdma_cm iw_cm configfs ib_ipoib
[401363.219441]  ib_cm ib_umad ib_core snd_seq snd_seq_device snd_timer snd soundcore vhost_vsock vmw_vsock_virtio_transport_common vsock vhost_net vhost vhost_iotlb tap hci_vhci bluetooth ecdh_generic rfkill ecc crc16 vfio_iommu_type1 vfio iommufd uhid dm_mod uinput userio ppp_generic slhc tun loop nvram btrfs blake2b_generic xor raid6_pq libcrc32c crc32c_generic cuse fuse hid_multitouch hid_generic crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel usbhid sha512_ssse3 hid aesni_intel ast sd_mod crypto_simd cryptd i2c_algo_bit mpt3sas xhci_pci ahci drm_shmem_helper xhci_pci_renesas libahci drm_kms_helper zfs(POE) xhci_hcd mxm_wmi raid_class libata mlx4_core ixgbe drm ccp scsi_transport_sas spl(OE) usbcore rng_core xfrm_algo agpgart scsi_mod dca mdio sp5100_tco usb_common scsi_common wmi button sunrpc
[401363.224153] CR2: 0000000000000000
[401363.224460] ---[ end trace 0000000000000000 ]---
[401363.238814] pstore: backend (erst) writing error (-28)
[401363.239135] RIP: 0010:arc_write+0x6c/0x490 [zfs]
[401363.239715] Code: 7a 40 48 89 b5 50 ff ff ff 41 8b 72 30 4d 8b 5a 20 48 89 95 60 ff ff ff 4d 8b 42 28 41 8b 12 48 89 8d 58 ff ff ff 45 8b 72 38 <49> 8b 1c 24 89 b5 4c ff ff ff 48 89 bd 40 ff ff ff 65 48 8b 0c 25
[401363.240377] RSP: 0018:ffffae54a3d33a18 EFLAGS: 00010286
[401363.240713] RAX: ffffae54a3d33b80 RBX: 00000000000147bc RCX: ffffae57047af3f8
[401363.241052] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffae54a3d33b60
[401363.241393] RBP: ffffae54a3d33ae8 R08: ffff9b397e533760 R09: 0000000000000000
[401363.241735] R10: ffffae54a3d33af8 R11: ffffffffc1137c40 R12: 0000000000000000
[401363.242078] R13: 0000000000000000 R14: 0000000000000080 R15: ffff9b3dd424ba30
[401363.242424] FS:  00007f86b10146c0(0000) GS:ffff9b9573e00000(0000) knlGS:0000000000000000
[401363.242774] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[401363.243123] CR2: 0000000000000000 CR3: 0000000436b78000 CR4: 0000000000350ee0
[401363.243477] note: smbd[92355] exited with irqs disabled
[401365.603326] BUG: kernel NULL pointer dereference, address: 0000000000000000
[401365.604476] #PF: supervisor read access in kernel mode
[401365.604955] #PF: error_code(0x0000) - not-present page
[401365.605387] PGD 0 P4D 0 
[401365.605819] Oops: 0000 [#2] PREEMPT SMP NOPTI
[401365.606242] CPU: 15 PID: 4701 Comm: dp_sync_taskq Tainted: P      D    OE      6.5.8_1 #1
[401365.606669] Hardware name: ASUS System Product Name/Pro WS WRX80E-SAGE SE WIFI, BIOS 1106 02/10/2023
[401365.607078] RIP: 0010:arc_write+0x6c/0x490 [zfs]
[401365.607774] Code: 7a 40 48 89 b5 50 ff ff ff 41 8b 72 30 4d 8b 5a 20 48 89 95 60 ff ff ff 4d 8b 42 28 41 8b 12 48 89 8d 58 ff ff ff 45 8b 72 38 <49> 8b 1c 24 89 b5 4c ff ff ff 48 89 bd 40 ff ff ff 65 48 8b 0c 25
[401365.608568] RSP: 0018:ffffae54b6caf950 EFLAGS: 00010286
[401365.608972] RAX: ffffae54b6cafad0 RBX: ffff9b3545a21c00 RCX: ffff9b3545a21c50
[401365.609371] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffffae54b6cafab0
[401365.609775] RBP: ffffae54b6cafa20 R08: ffff9b48f2698620 R09: 0000000000000000
[401365.610187] R10: ffffae54b6cafa30 R11: ffffffffc112c3d0 R12: 0000000000000000
[401365.610586] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[401365.610984] FS:  0000000000000000(0000) GS:ffff9b95731c0000(0000) knlGS:0000000000000000
[401365.611384] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[401365.611781] CR2: 0000000000000000 CR3: 00000041bc54a000 CR4: 0000000000350ee0
[401365.612173] Call Trace:
[401365.612566]  <TASK>
[401365.612921]  ? __die+0x23/0x70
[401365.613315]  ? page_fault_oops+0x159/0x460
[401365.613651]  ? srso_return_thunk+0x5/0x10
[401365.613978]  ? do_user_addr_fault+0x69/0x640
[401365.614305]  ? exc_page_fault+0x77/0x170
[401365.614627]  ? asm_exc_page_fault+0x26/0x30
[401365.614949]  ? __pfx_dbuf_write_done+0x10/0x10 [zfs]
[401365.615543]  ? arc_write+0x6c/0x490 [zfs]
[401365.616119]  ? srso_return_thunk+0x5/0x10
[401365.616430]  ? taskq_init_ent+0x3c/0x80 [spl]
[401365.616752]  ? srso_return_thunk+0x5/0x10
[401365.617056]  ? __pfx_dbuf_write_ready+0x10/0x10 [zfs]
[401365.617634]  ? srso_return_thunk+0x5/0x10
[401365.617940]  ? __slab_free+0xb4/0x2d0
[401365.618245]  ? srso_return_thunk+0x5/0x10
[401365.618548]  ? preempt_count_add+0x6e/0xa0
[401365.618849]  dbuf_write+0x397/0x580 [zfs]
[401365.619410]  ? __pfx_dbuf_write_ready+0x10/0x10 [zfs]
[401365.619969]  ? __pfx_dbuf_write_done+0x10/0x10 [zfs]
[401365.620527]  ? srso_return_thunk+0x5/0x10
[401365.620819]  ? srso_return_thunk+0x5/0x10
[401365.621108]  ? dbuf_hold_impl+0x109/0x750 [zfs]
[401365.621666]  dbuf_sync_leaf+0x139/0x6a0 [zfs]
[401365.622217]  ? srso_return_thunk+0x5/0x10
[401365.622501]  ? zio_nowait+0xb8/0x1a0 [zfs]
[401365.623054]  dbuf_sync_list+0xc3/0x120 [zfs]
[401365.623607]  dbuf_sync_indirect+0xe0/0x170 [zfs]
[401365.624151]  dbuf_sync_list+0x51/0x120 [zfs]
[401365.624690]  dnode_sync+0x513/0xb70 [zfs]
[401365.625250]  ? srso_return_thunk+0x5/0x10
[401365.625520]  ? preempt_count_add+0x6e/0xa0
[401365.625793]  ? preempt_count_add+0x6e/0xa0
[401365.626059]  ? srso_return_thunk+0x5/0x10
[401365.626326]  sync_dnodes_task+0x75/0xb0 [zfs]
[401365.626865]  taskq_thread+0x2c1/0x4e0 [spl]
[401365.627138]  ? __pfx_default_wake_function+0x10/0x10
[401365.627393]  ? __pfx_taskq_thread+0x10/0x10 [spl]
[401365.627648]  kthread+0xf7/0x130
[401365.627886]  ? __pfx_kthread+0x10/0x10
[401365.628122]  ret_from_fork+0x34/0x50
[401365.628355]  ? __pfx_kthread+0x10/0x10
[401365.628587]  ret_from_fork_asm+0x1b/0x30
[401365.628828]  </TASK>
[401365.629049] Modules linked in: rpcsec_gss_krb5 xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables x_tables br_netfilter bridge overlay nfsd nfs auth_rpcgss nfs_acl 8021q garp lockd netfs mrp stp llc grace fscache ipmi_ssif intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd eeepc_wmi asus_wmi kvm battery iwlmvm ledtrig_audio irqbypass sparse_keymap platform_profile rapl video wmi_bmof acpi_cpufreq pcspkr mac80211 libarc4 snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_usb_audio snd_hda_codec iwlwifi snd_usbmidi_lib btusb snd_hda_core snd_rawmidi btrtl btbcm mlx4_ib mc snd_hwdep ch341 btintel cfg80211 snd_pcm ib_uverbs input_leds btmtk usbserial joydev acpi_ipmi ipmi_si k10temp i2c_piix4 ipmi_devintf tpm_crb evdev mac_hid ipmi_msghandler i2c_designware_platform tpm_tis tpm_tis_core i2c_designware_core tiny_power_button tpm rpcrdma rdma_cm iw_cm configfs ib_ipoib
[401365.629216]  ib_cm ib_umad ib_core snd_seq snd_seq_device snd_timer snd soundcore vhost_vsock vmw_vsock_virtio_transport_common vsock vhost_net vhost vhost_iotlb tap hci_vhci bluetooth ecdh_generic rfkill ecc crc16 vfio_iommu_type1 vfio iommufd uhid dm_mod uinput userio ppp_generic slhc tun loop nvram btrfs blake2b_generic xor raid6_pq libcrc32c crc32c_generic cuse fuse hid_multitouch hid_generic crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel usbhid sha512_ssse3 hid aesni_intel ast sd_mod crypto_simd cryptd i2c_algo_bit mpt3sas xhci_pci ahci drm_shmem_helper xhci_pci_renesas libahci drm_kms_helper zfs(POE) xhci_hcd mxm_wmi raid_class libata mlx4_core ixgbe drm ccp scsi_transport_sas spl(OE) usbcore rng_core xfrm_algo agpgart scsi_mod dca mdio sp5100_tco usb_common scsi_common wmi button sunrpc
[401365.633518] CR2: 0000000000000000
[401365.633800] ---[ end trace 0000000000000000 ]---
[401365.633801] BUG: kernel NULL pointer dereference, address: 0000000000000000
[401365.634617] #PF: supervisor read access in kernel mode
[401365.634995] #PF: error_code(0x0000) - not-present page
[401365.635368] PGD 0 P4D 0 
[401365.635722] Oops: 0000 [#3] PREEMPT SMP NOPTI
[401365.636076] CPU: 101 PID: 4727 Comm: dp_sync_taskq Tainted: P      D    OE      6.5.8_1 #1
[401365.636437] Hardware name: ASUS System Product Name/Pro WS WRX80E-SAGE SE WIFI, BIOS 1106 02/10/2023
[401365.636805] RIP: 0010:arc_write+0x6c/0x490 [zfs]
[401365.637479] Code: 7a 40 48 89 b5 50 ff ff ff 41 8b 72 30 4d 8b 5a 20 48 89 95 60 ff ff ff 4d 8b 42 28 41 8b 12 48 89 8d 58 ff ff ff 45 8b 72 38 <49> 8b 1c 24 89 b5 4c ff ff ff 48 89 bd 40 ff ff ff 65 48 8b 0c 25
[401365.638254] RSP: 0018:ffffae54b6d7f950 EFLAGS: 00010286
[401365.638639] RAX: ffffae54b6d7fad0 RBX: ffff9b2bc1844600 RCX: ffff9b2bc1844650
[401365.639030] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffffae54b6d7fab0
[401365.639729] RBP: ffffae54b6d7fa20 R08: ffff9b38f43b7720 R09: 0000000000000000
[401365.640124] R10: ffffae54b6d7fa30 R11: ffffffffc112c3d0 R12: 0000000000000000
[401365.640521] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[401365.640917] FS:  0000000000000000(0000) GS:ffff9b9574740000(0000) knlGS:0000000000000000
[401365.641317] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[401365.641716] CR2: 0000000000000000 CR3: 00000001eebbe000 CR4: 0000000000350ee0
[401365.642123] Call Trace:
[401365.642523]  <TASK>
[401365.642919]  ? __die+0x23/0x70
[401365.643321]  ? page_fault_oops+0x159/0x460
[401365.643725]  ? srso_return_thunk+0x5/0x10
[401365.644427]  ? do_user_addr_fault+0x69/0x640
[401365.644830]  ? exc_page_fault+0x77/0x170
[401365.645231]  ? asm_exc_page_fault+0x26/0x30
[401365.645636]  ? __pfx_dbuf_write_done+0x10/0x10 [zfs]
[401365.646354]  ? arc_write+0x6c/0x490 [zfs]
[401365.647054]  ? srso_return_thunk+0x5/0x10
[401365.647441]  ? taskq_init_ent+0x3c/0x80 [spl]
[401365.647838]  ? srso_return_thunk+0x5/0x10
[401365.648212]  ? __pfx_dbuf_write_ready+0x10/0x10 [zfs]
[401365.648668] RIP: 0010:arc_write+0x6c/0x490 [zfs]
[401365.649201]  ? srso_return_thunk+0x5/0x10
[401365.649713] Code: 7a 40 48 89 b5 50 ff ff ff 41 8b 72 30 4d 8b 5a 20 48 89 95 60 ff ff ff 4d 8b 42 28 41 8b 12 48 89 8d 58 ff ff ff 45 8b 72 38 <49> 8b 1c 24 89 b5 4c ff ff ff 48 89 bd 40 ff ff ff 65 48 8b 0c 25
[401365.650250]  ? __slab_free+0xb4/0x2d0
[401365.651182] RSP: 0018:ffffae54a3d33a18 EFLAGS: 00010286
[401365.651753]  ? srso_return_thunk+0x5/0x10
[401365.652278] RAX: ffffae54a3d33b80 RBX: 00000000000147bc RCX: ffffae57047af3f8
[401365.652831]  ? preempt_count_add+0x6e/0xa0
[401365.653357] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffae54a3d33b60
[401365.653910]  dbuf_write+0x397/0x580 [zfs]
[401365.654417] RBP: ffffae54a3d33ae8 R08: ffff9b397e533760 R09: 0000000000000000
[401365.654963]  ? __pfx_dbuf_write_ready+0x10/0x10 [zfs]
[401365.655469] R10: ffffae54a3d33af8 R11: ffffffffc1137c40 R12: 0000000000000000
[401365.656010]  ? __pfx_dbuf_write_done+0x10/0x10 [zfs]
[401365.656511] R13: 0000000000000000 R14: 0000000000000080 R15: ffff9b3dd424ba30
[401365.657048]  ? srso_return_thunk+0x5/0x10
[401365.657546] FS:  0000000000000000(0000) GS:ffff9b95731c0000(0000) knlGS:0000000000000000
[401365.658087]  ? srso_return_thunk+0x5/0x10
[401365.658601] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[401365.659146]  ? dbuf_hold_impl+0x109/0x750 [zfs]
[401365.659648] CR2: 0000000000000000 CR3: 00000041bc54a000 CR4: 0000000000350ee0
[401365.660192]  dbuf_sync_leaf+0x139/0x6a0 [zfs]
[401365.660680] note: dp_sync_taskq[4701] exited with irqs disabled
[401365.661205]  ? srso_return_thunk+0x5/0x10
[401365.662181]  ? zio_nowait+0xb8/0x1a0 [zfs]
[401365.662500]  dbuf_sync_list+0xc3/0x120 [zfs]
[401365.662817]  dbuf_sync_indirect+0xe0/0x170 [zfs]
[401365.664205]  dbuf_sync_list+0x51/0x120 [zfs]
[401365.664513]  dnode_sync+0x513/0xb70 [zfs]
[401365.665590]  ? srso_return_thunk+0x5/0x10
[401365.665963]  ? preempt_count_add+0x6e/0xa0
[401365.665969]  ? preempt_count_add+0x6e/0xa0
[401365.665973]  ? srso_return_thunk+0x5/0x10
[401365.665981]  sync_dnodes_task+0x75/0xb0 [zfs]
[401365.667698]  taskq_thread+0x2c1/0x4e0 [spl]
[401365.667726]  ? __pfx_default_wake_function+0x10/0x10
[401365.667741]  ? __pfx_taskq_thread+0x10/0x10 [spl]
[401365.667761]  kthread+0xf7/0x130
[401365.667768]  ? __pfx_kthread+0x10/0x10
[401365.667774]  ret_from_fork+0x34/0x50
[401365.667779]  ? __pfx_kthread+0x10/0x10
[401365.667785]  ret_from_fork_asm+0x1b/0x30
[401365.667800]  </TASK>
[401365.667801] Modules linked in: rpcsec_gss_krb5 xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables x_tables br_netfilter bridge overlay nfsd nfs auth_rpcgss nfs_acl 8021q garp lockd netfs mrp stp llc grace fscache ipmi_ssif intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd eeepc_wmi asus_wmi kvm battery iwlmvm ledtrig_audio irqbypass sparse_keymap platform_profile rapl video wmi_bmof acpi_cpufreq pcspkr mac80211 libarc4 snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_usb_audio snd_hda_codec iwlwifi snd_usbmidi_lib btusb snd_hda_core snd_rawmidi btrtl btbcm mlx4_ib mc snd_hwdep ch341 btintel cfg80211 snd_pcm ib_uverbs input_leds btmtk usbserial joydev acpi_ipmi ipmi_si k10temp i2c_piix4 ipmi_devintf tpm_crb evdev mac_hid ipmi_msghandler i2c_designware_platform tpm_tis tpm_tis_core i2c_designware_core tiny_power_button tpm rpcrdma rdma_cm iw_cm configfs ib_ipoib
[401365.667957]  ib_cm ib_umad ib_core snd_seq snd_seq_device snd_timer snd soundcore vhost_vsock vmw_vsock_virtio_transport_common vsock vhost_net vhost vhost_iotlb tap hci_vhci bluetooth ecdh_generic rfkill ecc crc16 vfio_iommu_type1 vfio iommufd uhid dm_mod uinput userio ppp_generic slhc tun loop nvram btrfs blake2b_generic xor raid6_pq libcrc32c crc32c_generic cuse fuse hid_multitouch hid_generic crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel usbhid sha512_ssse3 hid aesni_intel ast sd_mod crypto_simd cryptd i2c_algo_bit mpt3sas xhci_pci ahci drm_shmem_helper xhci_pci_renesas libahci drm_kms_helper zfs(POE) xhci_hcd mxm_wmi raid_class libata mlx4_core ixgbe drm ccp scsi_transport_sas spl(OE) usbcore rng_core xfrm_algo agpgart scsi_mod dca mdio sp5100_tco usb_common scsi_common wmi button sunrpc
[401365.668092] CR2: 0000000000000000
[401365.668096] ---[ end trace 0000000000000000 ]---
[401365.680096] RIP: 0010:arc_write+0x6c/0x490 [zfs]
[401365.680366] Code: 7a 40 48 89 b5 50 ff ff ff 41 8b 72 30 4d 8b 5a 20 48 89 95 60 ff ff ff 4d 8b 42 28 41 8b 12 48 89 8d 58 ff ff ff 45 8b 72 38 <49> 8b 1c 24 89 b5 4c ff ff ff 48 89 bd 40 ff ff ff 65 48 8b 0c 25
[401365.680369] RSP: 0018:ffffae54a3d33a18 EFLAGS: 00010286
[401365.680373] RAX: ffffae54a3d33b80 RBX: 00000000000147bc RCX: ffffae57047af3f8
[401365.680376] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffae54a3d33b60
[401365.680378] RBP: ffffae54a3d33ae8 R08: ffff9b397e533760 R09: 0000000000000000
[401365.680380] R10: ffffae54a3d33af8 R11: ffffffffc1137c40 R12: 0000000000000000
[401365.680382] R13: 0000000000000000 R14: 0000000000000080 R15: ffff9b3dd424ba30
[401365.680385] FS:  0000000000000000(0000) GS:ffff9b9574740000(0000) knlGS:0000000000000000
[401365.680388] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[401365.680390] CR2: 0000000000000000 CR3: 00000001eebbe000 CR4: 0000000000350ee0
[401365.680393] note: dp_sync_taskq[4727] exited with irqs disabled
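The traces above are long and partly interleaved, so a short summary can help when comparing them with other reports. Below is a minimal Python sketch, assuming the log is available as plain text in the format shown above. The function name `summarize_oopses` and the `SAMPLE` excerpt (a shortened copy of oops #1) are illustrative and not part of OpenZFS or any kernel tool. It records the faulting kernel function from the `RIP:` line and keeps only call-trace frames without the `?` prefix, since the kernel marks `?` frames as unreliable.

```python
import re

# Strip the "[ 1234.567890] " dmesg timestamp prefix.
TS_RE = re.compile(r"^\[\s*\d+\.\d+\]\s?(.*)$")
# "Oops: 0000 [#1] PREEMPT SMP NOPTI" -> oops sequence number.
OOPS_RE = re.compile(r"Oops: \w+ \[#(\d+)\]")
# "... Comm: smbd Tainted: ..." -> name of the task that faulted.
COMM_RE = re.compile(r"Comm: (\S+)")
# Kernel-mode RIP only (segment 0010), e.g. "RIP: 0010:arc_write+0x6c/0x490 [zfs]".
RIP_RE = re.compile(r"RIP: 0010:([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")
# A frame such as "dmu_sync+0x3ce/0x520 [zfs]"; "? ..." frames do not match.
FRAME_RE = re.compile(r"^([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_oopses(log):
    """Return one dict per oops: number, task name, faulting function, frames."""
    oopses = []
    current = None
    in_trace = False
    for raw in log.splitlines():
        match = TS_RE.match(raw)
        msg = (match.group(1) if match else raw).strip()

        oops = OOPS_RE.search(msg)
        if oops:
            current = {"oops": int(oops.group(1)), "comm": None,
                       "rip": None, "frames": []}
            oopses.append(current)
            in_trace = False
            continue
        if current is None:
            continue

        if current["comm"] is None:
            comm = COMM_RE.search(msg)
            if comm:
                current["comm"] = comm.group(1)
        if current["rip"] is None:
            rip = RIP_RE.search(msg)
            if rip:
                current["rip"] = rip.group(1)

        if msg == "<TASK>":
            in_trace = True
        elif msg == "</TASK>":
            in_trace = False
        elif in_trace:
            frame = FRAME_RE.match(msg)
            if frame:
                current["frames"].append(frame.group(1))
    return oopses


# Shortened excerpt of oops #1 from the log above.
SAMPLE = """
[401363.195996] Oops: 0000 [#1] PREEMPT SMP NOPTI
[401363.196255] CPU: 64 PID: 92355 Comm: smbd Tainted: P           OE      6.5.8_1 #1
[401363.196795] RIP: 0010:arc_write+0x6c/0x490 [zfs]
[401363.201000]  <TASK>
[401363.201309]  ? __die+0x23/0x70
[401363.205081]  ? arc_write+0x6c/0x490 [zfs]
[401363.207135]  dmu_sync+0x3ce/0x520 [zfs]
[401363.209440]  zfs_get_data+0x330/0x400 [zfs]
[401363.210553]  zil_commit_impl+0x21f/0x1330 [zfs]
[401363.211111]  zfs_fsync+0x95/0x130 [zfs]
[401363.219033]  </TASK>
"""

if __name__ == "__main__":
    for entry in summarize_oopses(SAMPLE):
        print(entry["oops"], entry["comm"], entry["rip"], " <- ".join(entry["frames"]))
```

Because the second and third oopses are interleaved in the log, the sketch attributes every frame to the most recent `Oops:` header; that matches the output above but may not hold for other logs.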

dmesg when trying to import the pool after reboot:

[  498.013646] BUG: unable to handle page fault for address: ffffb0f4a072006c
[  498.013671] #PF: supervisor read access in kernel mode
[  498.013683] #PF: error_code(0x0000) - not-present page
[  498.013693] PGD 100000067 P4D 100000067 PUD 1a4b86067 PMD 1b0d98067 PTE 0
[  498.013711] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  498.013721] CPU: 32 PID: 4873 Comm: dmu_objset_find Tainted: P           OE      6.5.8_1 #1
[  498.013737] Hardware name: ASUS System Product Name/Pro WS WRX80E-SAGE SE WIFI, BIOS 1106 02/10/2023
[  498.013753] RIP: 0010:zil_claim_log_record+0x47/0xd0 [zfs]
[  498.013913] Code: 83 f8 09 74 6c 48 83 f8 18 75 55 48 85 d2 74 50 48 83 7e 40 00 48 8b 6f 38 74 45 45 31 e4 31 d2 48 89 d0 48 c1 e0 07 48 01 d8 <f6> 40 7c 80 75 22 48 83 78 48 00 75 07 48 83 78 50 00 74 14 48 c1
[  498.013941] RSP: 0018:ffffb0f4ebe3b9d0 EFLAGS: 00010286
[  498.013953] RAX: ffffb0f4a071fff0 RBX: ffffb0f4a06e26f0 RCX: 0000000000000000
[  498.013967] RDX: 00000000000007b2 RSI: 0000000000000100 RDI: 00000000ffffffff
[  498.013980] RBP: ffff8ad51ef6c000 R08: 0000000000000001 R09: ffff8ad56c452088
[  498.013993] R10: ffff8ad4e8a71c08 R11: ffff8ad691ef9c50 R12: 00000000000007b2
[  498.014005] R13: ffff8ad68f6cb380 R14: ffffffffffffffff R15: 0000000000003abc
[  498.014018] FS:  0000000000000000(0000) GS:ffff8b53fd600000(0000) knlGS:0000000000000000
[  498.014033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  498.014044] CR2: ffffb0f4a072006c CR3: 0000007f89a20000 CR4: 0000000000350ee0
[  498.014057] Call Trace:
[  498.014267]  <TASK>
[  498.014431]  ? __die+0x23/0x70
[  498.014596]  ? page_fault_oops+0x159/0x460
[  498.014757]  ? srso_return_thunk+0x5/0x10
[  498.014911]  ? fixup_exception+0x26/0x310
[  498.015063]  ? srso_return_thunk+0x5/0x10
[  498.015217]  ? exc_page_fault+0xd1/0x170
[  498.015369]  ? asm_exc_page_fault+0x26/0x30
[  498.015523]  ? zil_claim_log_record+0x47/0xd0 [zfs]
[  498.015818]  zil_parse+0x54a/0x980 [zfs]
[  498.016105]  ? __pfx_zil_claim_log_record+0x10/0x10 [zfs]
[  498.016393]  ? __pfx_zil_claim_log_block+0x10/0x10 [zfs]
[  498.016694]  ? _raw_spin_lock+0x17/0x40
[  498.016848]  ? srso_return_thunk+0x5/0x10
[  498.016998]  ? zrl_exit+0x4c/0x60 [zfs]
[  498.017283]  ? srso_return_thunk+0x5/0x10
[  498.017432]  ? srso_return_thunk+0x5/0x10
[  498.017579]  ? dmu_objset_open_impl+0x5cf/0x9c0 [zfs]
[  498.017885]  ? srso_return_thunk+0x5/0x10
[  498.018036]  ? preempt_count_add+0x6e/0xa0
[  498.018186]  ? srso_return_thunk+0x5/0x10
[  498.018333]  ? _raw_spin_lock+0x17/0x40
[  498.018480]  ? srso_return_thunk+0x5/0x10
[  498.018627]  ? srso_return_thunk+0x5/0x10
[  498.018772]  ? preempt_count_add+0x6e/0xa0
[  498.018918]  ? srso_return_thunk+0x5/0x10
[  498.019064]  ? srso_return_thunk+0x5/0x10
[  498.019210]  ? srso_return_thunk+0x5/0x10
[  498.019354]  ? dmu_objset_from_ds+0x85/0x160 [zfs]
[  498.019644]  ? srso_return_thunk+0x5/0x10
[  498.019787]  ? dmu_objset_own_obj+0xa2/0xd0 [zfs]
[  498.020076]  zil_claim+0x115/0x290 [zfs]
[  498.020360]  dmu_objset_find_dp_impl+0x145/0x3e0 [zfs]
[  498.020651]  dmu_objset_find_dp_cb+0x29/0x40 [zfs]
[  498.020934]  taskq_thread+0x2c1/0x4e0 [spl]
[  498.021082]  ? __pfx_default_wake_function+0x10/0x10
[  498.021223]  ? __pfx_taskq_thread+0x10/0x10 [spl]
[  498.021360]  kthread+0xf7/0x130
[  498.021486]  ? __pfx_kthread+0x10/0x10
[  498.021612]  ret_from_fork+0x34/0x50
[  498.021736]  ? __pfx_kthread+0x10/0x10
[  498.021859]  ret_from_fork_asm+0x1b/0x30
[  498.021985]  </TASK>
[  498.022102] Modules linked in: 8021q garp mrp stp llc iwlmvm intel_rapl_msr intel_rapl_common ipmi_ssif amd64_edac edac_mce_amd snd_usb_audio mac80211 kvm_amd snd_usbmidi_lib snd_rawmidi ch341 mc libarc4 usbserial joydev input_leds kvm raid1 eeepc_wmi asus_wmi battery ledtrig_audio irqbypass md_mod sparse_keymap platform_profile iwlwifi snd_hda_intel video rapl snd_intel_dspcfg wmi_bmof btusb snd_intel_sdw_acpi acpi_cpufreq pcspkr snd_hda_codec btrtl cfg80211 btbcm snd_hda_core btintel acpi_ipmi btmtk evdev snd_hwdep ipmi_si mac_hid snd_pcm i2c_designware_platform tpm_crb i2c_designware_core tpm_tis tpm_tis_core ipmi_devintf k10temp i2c_piix4 tpm ipmi_msghandler tiny_power_button sg snd_seq snd_seq_device snd_timer snd soundcore vhost_vsock vmw_vsock_virtio_transport_common vsock vhost_net vhost vhost_iotlb tap hci_vhci bluetooth ecdh_generic rfkill ecc vfio_iommu_type1 vfio iommufd uhid uinput userio ppp_generic slhc tun nvram cuse fuse ext4 crc16 mbcache jbd2 dm_snapshot dm_bufio isofs cdrom squashfs hid_multitouch
[  498.022205]  uas usb_storage hid_generic usbhid hid crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel ast sd_mod sha512_ssse3 i2c_algo_bit aesni_intel nvme_tcp drm_shmem_helper xhci_pci crypto_simd cryptd zfs(POE) nvme_fabrics xhci_pci_renesas drm_kms_helper ahci xhci_hcd libahci mxm_wmi ccp drm mlx4_core spl(OE) mpt3sas ixgbe usbcore libata rng_core agpgart xfrm_algo raid_class dca usb_common sp5100_tco scsi_transport_sas wmi button sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls cxgb3i cxgb3 mdio libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_mod scsi_common loop btrfs blake2b_generic xor raid6_pq libcrc32c crc32c_generic crc32c_intel
[  498.024485] CR2: ffffb0f4a072006c
[  498.024635] ---[ end trace 0000000000000000 ]---
[  498.029725] pstore: backend (erst) writing error (-28)
[  498.029887] RIP: 0010:zil_claim_log_record+0x47/0xd0 [zfs]
[  498.030186] Code: 83 f8 09 74 6c 48 83 f8 18 75 55 48 85 d2 74 50 48 83 7e 40 00 48 8b 6f 38 74 45 45 31 e4 31 d2 48 89 d0 48 c1 e0 07 48 01 d8 <f6> 40 7c 80 75 22 48 83 78 48 00 75 07 48 83 78 50 00 74 14 48 c1
[  498.030514] RSP: 0018:ffffb0f4ebe3b9d0 EFLAGS: 00010286
[  498.030680] RAX: ffffb0f4a071fff0 RBX: ffffb0f4a06e26f0 RCX: 0000000000000000
[  498.030848] RDX: 00000000000007b2 RSI: 0000000000000100 RDI: 00000000ffffffff
[  498.031018] RBP: ffff8ad51ef6c000 R08: 0000000000000001 R09: ffff8ad56c452088
[  498.031188] R10: ffff8ad4e8a71c08 R11: ffff8ad691ef9c50 R12: 00000000000007b2
[  498.031360] R13: ffff8ad68f6cb380 R14: ffffffffffffffff R15: 0000000000003abc
[  498.031533] FS:  0000000000000000(0000) GS:ffff8b53fd600000(0000) knlGS:0000000000000000
[  498.031709] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  498.031884] CR2: ffffb0f4a072006c CR3: 0000007f89a20000 CR4: 0000000000350ee0
[  498.032062] note: dmu_objset_find[4873] exited with irqs disabled
@heppu heppu added the Type: Defect Incorrect behavior (e.g. crash, hang) label Oct 30, 2023

heppu commented Oct 30, 2023

A similar issue with the same zil_claim_log_record in the stack trace, which could be related: #15275


Kitt3120 commented Nov 7, 2023

Same problem here. There have been so many reports over the last couple of days here on the issues page. For me, upgrading to a kernel > 6.5 has helped so far, in that at least my system no longer freezes completely. However, I still have I/O-heavy programs running that eventually freeze after a couple of hours, like the MariaDB of my Nextcloud. Those processes then can't be killed (and thus can't be restarted), and doing a shutdown/reboot results in the system locking up, so I have to cut the power.

Good to know that it's ZFS, though. Because the kernel dumps in my journal were referencing processes all over the place, I thought it must be a hardware fault. I have now temporarily shut down all my I/O-heavy programs to prevent potential pool corruption.

@mtippmann

Looks like it's related to #15485. I've hit the pool corruption with a similar stack trace (#15485 (comment)). It still happens even with the committed fix for #15275.
