Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gpio:[ 261.010291] Unable to handle kernel NULL pointer dereference at virtual address 00000070 #186

Closed
shrd20190131 opened this issue Jan 14, 2020 · 1 comment

Comments

@shrd20190131
Copy link

This pin controls our device, when I pull this pin high, the bmc system sometimes will hang

root@jbof:/sys/class/gpio/gpio425# echo 1 > value
[ 261.010291] Unable to handle kernel NULL pointer dereference at virtual address 00000070
[ 261.018412] pgd = 05e00a87
[ 261.021122] [00000070] *pgd=00000000
[ 261.024717] Internal error: Oops: 17 [#1] ARM
[ 261.029087] CPU: 0 PID: 0 Comm: swapper Not tainted 5.2.0-c01e98d-dirty-0a2bbcd #1
[ 261.036652] Hardware name: Generic DT based system
[ 261.041459] PC is at aspeed_i2c_master_irq+0x138/0x4b0
[ 261.046602] LR is at aspeed_i2c_bus_irq+0x50/0xe8
[ 261.051301] pc : [<804b0060>] lr : [<804b0428>] psr: 80000193
[ 261.057557] sp : 80a01d60 ip : 00000001 fp : 80a01d84
[ 261.062773] r10: 80a01de8 r9 : 00000000 r8 : 00000002
[ 261.067991] r7 : 00000025 r6 : 00000001 r5 : 00000001 r4 : 9d633a20
[ 261.074508] r3 : 00000000 r2 : 00000000 r1 : 9f8321c0 r0 : 00000070
[ 261.081028] Flags: Nzcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
[ 261.088237] Control: 00c5387d Table: 9b85c008 DAC: 00000051
[ 261.093981] Process swapper (pid: 0, stack limit = 0x180199c9)
[ 261.099807] Stack: (0x80a01d60 to 0x80a02000)
[ 261.104168] 1d60: 00000001 9d633a20 9d5267a0 00000025 80a03008 00000000 80a01da4 80a01d88
[ 261.112338] 1d80: 804b0428 804aff34 9d635400 9d4020e0 9d5267a0 00000025 80a01de4 80a01da8
[ 261.120510] 1da0: 8014bfb8 804b03e4 80a01de4 8015d7f0 80a01dfc 9d5267a0 8015d7f0 80a03008
[ 261.128683] 1dc0: 9d4020e0 9d5267a0 80a2e724 80a03008 80a00000 80938a38 80a01e0c 80a01de8
[ 261.136855] 1de0: 8014c16c 8014bf7c 00000000 e62d968e 80a00000 9d5267a0 9d4020e0 9d41f910
[ 261.145028] 1e00: 80a01e24 80a01e10 8014c1fc 8014c140 9d5267a0 9d4020e0 80a01e3c 80a01e28
[ 261.153201] 1e20: 8014fe50 8014c1d0 00000006 9d4020e0 80a01e4c 80a01e40 8014b748 8014fdcc
[ 261.161373] 1e40: 80a01e7c 80a01e50 8038a6fc 8014b724 00000040 e62d968e 9d413600 00000010
[ 261.169546] 1e60: 80a4a038 00000001 00000000 9d413600 80a01e8c 80a01e80 8014b748 8038a690
[ 261.177720] 1e80: 80a01eb4 80a01e90 8014b7b4 8014b724 9d402e40 80a01ed0 ffffffff 80a01f04
[ 261.185891] 1ea0: 807d8c30 80a00000 80a01ecc 80a01eb8 801021e4 8014b768 80103b74 60000013
[ 261.194064] 1ec0: 80a01f2c 80a01ed0 80101a6c 80102188 00000000 00000000 d0bbb400 00000000
[ 261.202236] 1ee0: ffffe000 00000000 80a0305c 80a4cf00 807d8c30 00000001 80938a38 80a01f2c
[ 261.210410] 1f00: 80a01f30 80a01f20 80103b70 80103b74 60000013 ffffffff 00000051 00000000
[ 261.218581] 1f20: 80a01f3c 80a01f30 80690c90 80103b48 80a01f64 80a01f40 8013f330 80690c64
[ 261.226756] 1f40: 80a0ce80 00000001 ffffffff 80a613a0 80a03008 80a613a0 80a01f74 80a01f68
[ 261.234927] 1f60: 8013f61c 8013f298 80a01f8c 80a01f78 8068a6c8 8013f60c 80a613ec 00000001
[ 261.243100] 1f80: 80a01f9c 80a01f90 80900d08 8068a658 80a01ff4 80a01fa0 8090119c 80900cfc
[ 261.251272] 1fa0: ffffffff ffffffff 00000000 8090061c 00000000 80a03000 00000000 80938a38
[ 261.259445] 1fc0: e628948e 00000000 00000000 80900330 00000051 00c0387d 000022b8 9ea1c000
[ 261.267617] 1fe0: 410fb767 00c5387d 00000000 80a01ff8 00000000 80900d90 00000000 00000000
[ 261.275780] Backtrace:
[ 261.278245] [<804aff28>] (aspeed_i2c_master_irq) from [<804b0428>] (aspeed_i2c_bus_irq+0x50/0xe8)
[ 261.287123] r9:00000000 r8:80a03008 r7:00000025 r6:9d5267a0 r5:9d633a20 r4:00000001
[ 261.294887] [<804b03d8>] (aspeed_i2c_bus_irq) from [<8014bfb8>] (__handle_irq_event_percpu+0x48/0x1c4)
[ 261.304192] r7:00000025 r6:9d5267a0 r5:9d4020e0 r4:9d635400
[ 261.309861] [<8014bf70>] (__handle_irq_event_percpu) from [<8014c16c>] (handle_irq_event_percpu+0x38/0x90)
[ 261.319506] r10:80938a38 r9:80a00000 r8:80a03008 r7:80a2e724 r6:9d5267a0 r5:9d4020e0
[ 261.327322] r4:80a03008
[ 261.329869] [<8014c134>] (handle_irq_event_percpu) from [<8014c1fc>] (handle_irq_event+0x38/0x4c)
[ 261.338731] r6:9d41f910 r5:9d4020e0 r4:9d5267a0
[ 261.343356] [<8014c1c4>] (handle_irq_event) from [<8014fe50>] (handle_simple_irq+0x90/0xa4)
[ 261.351696] r5:9d4020e0 r4:9d5267a0
[ 261.355277] [<8014fdc0>] (handle_simple_irq) from [<8014b748>] (generic_handle_irq+0x30/0x44)
[ 261.363794] r5:9d4020e0 r4:00000006
[ 261.367379] [<8014b718>] (generic_handle_irq) from [<8038a6fc>] (aspeed_i2c_ic_irq_handler+0x78/0x104)
[ 261.376680] [<8038a684>] (aspeed_i2c_ic_irq_handler) from [<8014b748>] (generic_handle_irq+0x30/0x44)
[ 261.385889] r8:9d413600 r7:00000000 r6:00000001 r5:80a4a038 r4:00000010
[ 261.392595] [<8014b718>] (generic_handle_irq) from [<8014b7b4>] (__handle_domain_irq+0x58/0xb8)
[ 261.401300] [<8014b75c>] (__handle_domain_irq) from [<801021e4>] (avic_handle_irq+0x68/0x70)
[ 261.409735] r9:80a00000 r8:807d8c30 r7:80a01f04 r6:ffffffff r5:80a01ed0 r4:9d402e40
[ 261.417475] [<8010217c>] (avic_handle_irq) from [<80101a6c>] (__irq_svc+0x6c/0x90)
[ 261.425032] Exception stack(0x80a01ed0 to 0x80a01f18)
[ 261.430079] 1ec0: 00000000 00000000 d0bbb400 00000000
[ 261.438256] 1ee0: ffffe000 00000000 80a0305c 80a4cf00 807d8c30 00000001 80938a38 80a01f2c
[ 261.446427] 1f00: 80a01f30 80a01f20 80103b70 80103b74 60000013 ffffffff
[ 261.453035] r5:60000013 r4:80103b74
[ 261.456624] [<80103b3c>] (arch_cpu_idle) from [<80690c90>] (default_idle_call+0x38/0x3c)
[ 261.464733] [<80690c58>] (default_idle_call) from [<8013f330>] (do_idle+0xa4/0xec)
[ 261.472306] [<8013f28c>] (do_idle) from [<8013f61c>] (cpu_startup_entry+0x1c/0x20)
[ 261.479874] r9:80a613a0 r8:80a03008 r7:80a613a0 r6:ffffffff r5:00000001 r4:80a0ce80
[ 261.487620] [<8013f600>] (cpu_startup_entry) from [<8068a6c8>] (rest_init+0x7c/0x9c)
[ 261.495369] [<8068a64c>] (rest_init) from [<80900d08>] (arch_call_rest_init+0x18/0x1c)
[ 261.503276] r5:00000001 r4:80a613ec
[ 261.506856] [<80900cf0>] (arch_call_rest_init) from [<8090119c>] (start_kernel+0x418/0x4cc)
[ 261.515206] [<80900d84>] (start_kernel) from [<00000000>] (0x0)
[ 261.521130] Code: e59431bc e5941194 e283c001 e584c1bc (e7d03003)
[ 261.527237] ---[ end trace 0ab992cf2f4efb95 ]---
[ 261.531854] Kernel panic - not syncing: Fatal exception in interrupt
[ 261.538213] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Below is the system version

root@jbof:~# cat /etc/os-release
ID="openbmc-phosphor"
VERSION="2.7.0-0"
VERSION_ID="2.7.0-0-1.0-20191226"
BUILD_ID="2.7.0"

@shenki
Copy link
Member

shenki commented Jan 16, 2020

Please re-test using the dev-5.4 branch of the openbmc kernel.

@shenki shenki closed this as completed Jan 16, 2020
shenki pushed a commit that referenced this issue Oct 8, 2020
[ Upstream commit 1264c1e ]

This patch causes a regression betwen Kernel 5.7 and 5.8 at wlcore:
with it applied, WiFi stops working, and the Kernel starts printing
this message every second:

   wlcore: PHY firmware version: Rev 8.2.0.0.242
   wlcore: firmware booted (Rev 8.9.0.0.79)
   wlcore: ERROR command execute failure 14
   ------------[ cut here ]------------
   WARNING: CPU: 0 PID: 133 at drivers/net/wireless/ti/wlcore/main.c:795 wl12xx_queue_recovery_work.part.0+0x6c/0x74 [wlcore]
   Modules linked in: wl18xx wlcore mac80211 libarc4 cfg80211 rfkill snd_soc_hdmi_codec crct10dif_ce wlcore_sdio adv7511 cec kirin9xx_drm(C) kirin9xx_dw_drm_dsi(C) drm_kms_helper drm ip_tables x_tables ipv6 nf_defrag_ipv6
   CPU: 0 PID: 133 Comm: kworker/0:1 Tainted: G        WC        5.8.0+ #186
   Hardware name: HiKey970 (DT)
   Workqueue: events_freezable ieee80211_restart_work [mac80211]
   pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
   pc : wl12xx_queue_recovery_work.part.0+0x6c/0x74 [wlcore]
   lr : wl12xx_queue_recovery_work+0x24/0x30 [wlcore]
   sp : ffff8000126c3a60
   x29: ffff8000126c3a60 x28: 00000000000025de
   x27: 0000000000000010 x26: 0000000000000005
   x25: ffff0001a5d49e80 x24: ffff8000092cf580
   x23: ffff0001b7c12623 x22: ffff0001b6fcf2e8
   x21: ffff0001b7e46200 x20: 00000000fffffffb
   x19: ffff0001a78e6400 x18: 0000000000000030
   x17: 0000000000000001 x16: 0000000000000001
   x15: ffff0001b7e46670 x14: ffffffffffffffff
   x13: ffff8000926c37d7 x12: ffff8000126c37e0
   x11: ffff800011e01000 x10: ffff8000120526d0
   x9 : 0000000000000000 x8 : 3431206572756c69
   x7 : 6166206574756365 x6 : 0000000000000c2c
   x5 : 0000000000000000 x4 : ffff0001bf1361e8
   x3 : ffff0001bf1790b0 x2 : 0000000000000000
   x1 : ffff0001a5d49e80 x0 : 0000000000000001
   Call trace:
    wl12xx_queue_recovery_work.part.0+0x6c/0x74 [wlcore]
    wl12xx_queue_recovery_work+0x24/0x30 [wlcore]
    wl1271_cmd_set_sta_key+0x258/0x25c [wlcore]
    wl1271_set_key+0x7c/0x2dc [wlcore]
    wlcore_set_key+0xe4/0x360 [wlcore]
    wl18xx_set_key+0x48/0x1d0 [wl18xx]
    wlcore_op_set_key+0xa4/0x180 [wlcore]
    ieee80211_key_enable_hw_accel+0xb0/0x2d0 [mac80211]
    ieee80211_reenable_keys+0x70/0x110 [mac80211]
    ieee80211_reconfig+0xa00/0xca0 [mac80211]
    ieee80211_restart_work+0xc4/0xfc [mac80211]
    process_one_work+0x1cc/0x350
    worker_thread+0x13c/0x470
    kthread+0x154/0x160
    ret_from_fork+0x10/0x30
   ---[ end trace b1f722abf9af5919 ]---
   wlcore: WARNING could not set keys
   wlcore: ERROR Could not add or replace key
   wlan0: failed to set key (4, ff:ff:ff:ff:ff:ff) to hardware (-5)
   wlcore: Hardware recovery in progress. FW ver: Rev 8.9.0.0.79
   wlcore: pc: 0x0, hint_sts: 0x00000040 count: 39
   wlcore: down
   wlcore: down
   ieee80211 phy0: Hardware restart was requested
   mmc_host mmc0: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
   mmc_host mmc0: Bus speed (slot 0) = 25000000Hz (slot req 25000000Hz, actual 25000000HZ div = 0)
   wlcore: PHY firmware version: Rev 8.2.0.0.242
   wlcore: firmware booted (Rev 8.9.0.0.79)
   wlcore: ERROR command execute failure 14
   ------------[ cut here ]------------

Tested on Hikey 970.

This reverts commit 2b7aadd.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/f0a2cb7ea606f1a284d4c23cbf983da2954ce9b6.1598420968.git.mchehab+huawei@kernel.org
Signed-off-by: Sasha Levin <[email protected]>
shenki pushed a commit that referenced this issue Mar 22, 2022
[ Upstream commit a0f0cf8 ]

When using the flushoncommit mount option, during almost every transaction
commit we trigger a warning from __writeback_inodes_sb_nr():

  $ cat fs/fs-writeback.c:
  (...)
  static void __writeback_inodes_sb_nr(struct super_block *sb, ...
  {
        (...)
        WARN_ON(!rwsem_is_locked(&sb->s_umount));
        (...)
  }
  (...)

The trace produced in dmesg looks like the following:

  [947.473890] WARNING: CPU: 5 PID: 930 at fs/fs-writeback.c:2610 __writeback_inodes_sb_nr+0x7e/0xb3
  [947.481623] Modules linked in: nfsd nls_cp437 cifs asn1_decoder cifs_arc4 fscache cifs_md4 ipmi_ssif
  [947.489571] CPU: 5 PID: 930 Comm: btrfs-transacti Not tainted 95.16.3-srb-asrock-00001-g36437ad63879 #186
  [947.497969] RIP: 0010:__writeback_inodes_sb_nr+0x7e/0xb3
  [947.502097] Code: 24 10 4c 89 44 24 18 c6 (...)
  [947.519760] RSP: 0018:ffffc90000777e10 EFLAGS: 00010246
  [947.523818] RAX: 0000000000000000 RBX: 0000000000963300 RCX: 0000000000000000
  [947.529765] RDX: 0000000000000000 RSI: 000000000000fa51 RDI: ffffc90000777e50
  [947.535740] RBP: ffff888101628a90 R08: ffff888100955800 R09: ffff888100956000
  [947.541701] R10: 0000000000000002 R11: 0000000000000001 R12: ffff888100963488
  [947.547645] R13: ffff888100963000 R14: ffff888112fb7200 R15: ffff888100963460
  [947.553621] FS:  0000000000000000(0000) GS:ffff88841fd40000(0000) knlGS:0000000000000000
  [947.560537] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [947.565122] CR2: 0000000008be50c4 CR3: 000000000220c000 CR4: 00000000001006e0
  [947.571072] Call Trace:
  [947.572354]  <TASK>
  [947.573266]  btrfs_commit_transaction+0x1f1/0x998
  [947.576785]  ? start_transaction+0x3ab/0x44e
  [947.579867]  ? schedule_timeout+0x8a/0xdd
  [947.582716]  transaction_kthread+0xe9/0x156
  [947.585721]  ? btrfs_cleanup_transaction.isra.0+0x407/0x407
  [947.590104]  kthread+0x131/0x139
  [947.592168]  ? set_kthread_struct+0x32/0x32
  [947.595174]  ret_from_fork+0x22/0x30
  [947.597561]  </TASK>
  [947.598553] ---[ end trace 644721052755541c ]---

This is because we started using writeback_inodes_sb() to flush delalloc
when committing a transaction (when using -o flushoncommit), in order to
avoid deadlocks with filesystem freeze operations. This change was made
by commit ce8ea7c ("btrfs: don't call btrfs_start_delalloc_roots
in flushoncommit"). After that change we started producing that warning,
and every now and then a user reports this since the warning happens too
often, it spams dmesg/syslog, and a user is unsure if this reflects any
problem that might compromise the filesystem's reliability.

We can not just lock the sb->s_umount semaphore before calling
writeback_inodes_sb(), because that would at least deadlock with
filesystem freezing, since at fs/super.c:freeze_super() sync_filesystem()
is called while we are holding that semaphore in write mode, and that can
trigger a transaction commit, resulting in a deadlock. It would also
trigger the same type of deadlock in the unmount path. Possibly, it could
also introduce some other locking dependencies that lockdep would report.

To fix this call try_to_writeback_inodes_sb() instead of
writeback_inodes_sb(), because that will try to read lock sb->s_umount
and then will only call writeback_inodes_sb() if it was able to lock it.
This is fine because the cases where it can't read lock sb->s_umount
are during a filesystem unmount or during a filesystem freeze - in those
cases sb->s_umount is write locked and sync_filesystem() is called, which
calls writeback_inodes_sb(). In other words, in all cases where we can't
take a read lock on sb->s_umount, writeback is already being triggered
elsewhere.

An alternative would be to call btrfs_start_delalloc_roots() with a
number of pages different from LONG_MAX, for example matching the number
of delalloc bytes we currently have, in which case we would end up
starting all delalloc with filemap_fdatawrite_wbc() and not with an
async flush via filemap_flush() - that is only possible after the rather
recent commit e076ab2 ("btrfs: shrink delalloc pages instead of
full inodes"). However that creates a whole new can of worms due to new
lock dependencies, which lockdep complains, like for example:

[ 8948.247280] ======================================================
[ 8948.247823] WARNING: possible circular locking dependency detected
[ 8948.248353] 5.17.0-rc1-btrfs-next-111 #1 Not tainted
[ 8948.248786] ------------------------------------------------------
[ 8948.249320] kworker/u16:18/933570 is trying to acquire lock:
[ 8948.249812] ffff9b3de1591690 (sb_internal#2){.+.+}-{0:0}, at: find_free_extent+0x141e/0x1590 [btrfs]
[ 8948.250638]
               but task is already holding lock:
[ 8948.251140] ffff9b3e09c717d8 (&root->delalloc_mutex){+.+.}-{3:3}, at: start_delalloc_inodes+0x78/0x400 [btrfs]
[ 8948.252018]
               which lock already depends on the new lock.

[ 8948.252710]
               the existing dependency chain (in reverse order) is:
[ 8948.253343]
               -> #2 (&root->delalloc_mutex){+.+.}-{3:3}:
[ 8948.253950]        __mutex_lock+0x90/0x900
[ 8948.254354]        start_delalloc_inodes+0x78/0x400 [btrfs]
[ 8948.254859]        btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs]
[ 8948.255408]        btrfs_commit_transaction+0x32f/0xc00 [btrfs]
[ 8948.255942]        btrfs_mksubvol+0x380/0x570 [btrfs]
[ 8948.256406]        btrfs_mksnapshot+0x81/0xb0 [btrfs]
[ 8948.256870]        __btrfs_ioctl_snap_create+0x17f/0x190 [btrfs]
[ 8948.257413]        btrfs_ioctl_snap_create_v2+0xbb/0x140 [btrfs]
[ 8948.257961]        btrfs_ioctl+0x1196/0x3630 [btrfs]
[ 8948.258418]        __x64_sys_ioctl+0x83/0xb0
[ 8948.258793]        do_syscall_64+0x3b/0xc0
[ 8948.259146]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 8948.259709]
               -> #1 (&fs_info->delalloc_root_mutex){+.+.}-{3:3}:
[ 8948.260330]        __mutex_lock+0x90/0x900
[ 8948.260692]        btrfs_start_delalloc_roots+0x97/0x2a0 [btrfs]
[ 8948.261234]        btrfs_commit_transaction+0x32f/0xc00 [btrfs]
[ 8948.261766]        btrfs_set_free_space_cache_v1_active+0x38/0x60 [btrfs]
[ 8948.262379]        btrfs_start_pre_rw_mount+0x119/0x180 [btrfs]
[ 8948.262909]        open_ctree+0x1511/0x171e [btrfs]
[ 8948.263359]        btrfs_mount_root.cold+0x12/0xde [btrfs]
[ 8948.263863]        legacy_get_tree+0x30/0x50
[ 8948.264242]        vfs_get_tree+0x28/0xc0
[ 8948.264594]        vfs_kern_mount.part.0+0x71/0xb0
[ 8948.265017]        btrfs_mount+0x11d/0x3a0 [btrfs]
[ 8948.265462]        legacy_get_tree+0x30/0x50
[ 8948.265851]        vfs_get_tree+0x28/0xc0
[ 8948.266203]        path_mount+0x2d4/0xbe0
[ 8948.266554]        __x64_sys_mount+0x103/0x140
[ 8948.266940]        do_syscall_64+0x3b/0xc0
[ 8948.267300]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 8948.267790]
               -> #0 (sb_internal#2){.+.+}-{0:0}:
[ 8948.268322]        __lock_acquire+0x12e8/0x2260
[ 8948.268733]        lock_acquire+0xd7/0x310
[ 8948.269092]        start_transaction+0x44c/0x6e0 [btrfs]
[ 8948.269591]        find_free_extent+0x141e/0x1590 [btrfs]
[ 8948.270087]        btrfs_reserve_extent+0x14b/0x280 [btrfs]
[ 8948.270588]        cow_file_range+0x17e/0x490 [btrfs]
[ 8948.271051]        btrfs_run_delalloc_range+0x345/0x7a0 [btrfs]
[ 8948.271586]        writepage_delalloc+0xb5/0x170 [btrfs]
[ 8948.272071]        __extent_writepage+0x156/0x3c0 [btrfs]
[ 8948.272579]        extent_write_cache_pages+0x263/0x460 [btrfs]
[ 8948.273113]        extent_writepages+0x76/0x130 [btrfs]
[ 8948.273573]        do_writepages+0xd2/0x1c0
[ 8948.273942]        filemap_fdatawrite_wbc+0x68/0x90
[ 8948.274371]        start_delalloc_inodes+0x17f/0x400 [btrfs]
[ 8948.274876]        btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs]
[ 8948.275417]        flush_space+0x1f2/0x630 [btrfs]
[ 8948.275863]        btrfs_async_reclaim_data_space+0x108/0x1b0 [btrfs]
[ 8948.276438]        process_one_work+0x252/0x5a0
[ 8948.276829]        worker_thread+0x55/0x3b0
[ 8948.277189]        kthread+0xf2/0x120
[ 8948.277506]        ret_from_fork+0x22/0x30
[ 8948.277868]
               other info that might help us debug this:

[ 8948.278548] Chain exists of:
                 sb_internal#2 --> &fs_info->delalloc_root_mutex --> &root->delalloc_mutex

[ 8948.279601]  Possible unsafe locking scenario:

[ 8948.280102]        CPU0                    CPU1
[ 8948.280508]        ----                    ----
[ 8948.280915]   lock(&root->delalloc_mutex);
[ 8948.281271]                                lock(&fs_info->delalloc_root_mutex);
[ 8948.281915]                                lock(&root->delalloc_mutex);
[ 8948.282487]   lock(sb_internal#2);
[ 8948.282800]
                *** DEADLOCK ***

[ 8948.283333] 4 locks held by kworker/u16:18/933570:
[ 8948.283750]  #0: ffff9b3dc00a9d48 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1d2/0x5a0
[ 8948.284609]  #1: ffffa90349dafe70 ((work_completion)(&fs_info->async_data_reclaim_work)){+.+.}-{0:0}, at: process_one_work+0x1d2/0x5a0
[ 8948.285637]  #2: ffff9b3e14db5040 (&fs_info->delalloc_root_mutex){+.+.}-{3:3}, at: btrfs_start_delalloc_roots+0x97/0x2a0 [btrfs]
[ 8948.286674]  #3: ffff9b3e09c717d8 (&root->delalloc_mutex){+.+.}-{3:3}, at: start_delalloc_inodes+0x78/0x400 [btrfs]
[ 8948.287596]
              stack backtrace:
[ 8948.287975] CPU: 3 PID: 933570 Comm: kworker/u16:18 Not tainted 5.17.0-rc1-btrfs-next-111 #1
[ 8948.288677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 8948.289649] Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
[ 8948.290298] Call Trace:
[ 8948.290517]  <TASK>
[ 8948.290700]  dump_stack_lvl+0x59/0x73
[ 8948.291026]  check_noncircular+0xf3/0x110
[ 8948.291375]  ? start_transaction+0x228/0x6e0 [btrfs]
[ 8948.291826]  __lock_acquire+0x12e8/0x2260
[ 8948.292241]  lock_acquire+0xd7/0x310
[ 8948.292714]  ? find_free_extent+0x141e/0x1590 [btrfs]
[ 8948.293241]  ? lock_is_held_type+0xea/0x140
[ 8948.293601]  start_transaction+0x44c/0x6e0 [btrfs]
[ 8948.294055]  ? find_free_extent+0x141e/0x1590 [btrfs]
[ 8948.294518]  find_free_extent+0x141e/0x1590 [btrfs]
[ 8948.294957]  ? _raw_spin_unlock+0x29/0x40
[ 8948.295312]  ? btrfs_get_alloc_profile+0x124/0x290 [btrfs]
[ 8948.295813]  btrfs_reserve_extent+0x14b/0x280 [btrfs]
[ 8948.296270]  cow_file_range+0x17e/0x490 [btrfs]
[ 8948.296691]  btrfs_run_delalloc_range+0x345/0x7a0 [btrfs]
[ 8948.297175]  ? find_lock_delalloc_range+0x247/0x270 [btrfs]
[ 8948.297678]  writepage_delalloc+0xb5/0x170 [btrfs]
[ 8948.298123]  __extent_writepage+0x156/0x3c0 [btrfs]
[ 8948.298570]  extent_write_cache_pages+0x263/0x460 [btrfs]
[ 8948.299061]  extent_writepages+0x76/0x130 [btrfs]
[ 8948.299495]  do_writepages+0xd2/0x1c0
[ 8948.299817]  ? sched_clock_cpu+0xd/0x110
[ 8948.300160]  ? lock_release+0x155/0x4a0
[ 8948.300494]  filemap_fdatawrite_wbc+0x68/0x90
[ 8948.300874]  ? do_raw_spin_unlock+0x4b/0xa0
[ 8948.301243]  start_delalloc_inodes+0x17f/0x400 [btrfs]
[ 8948.301706]  ? lock_release+0x155/0x4a0
[ 8948.302055]  btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs]
[ 8948.302564]  flush_space+0x1f2/0x630 [btrfs]
[ 8948.302970]  btrfs_async_reclaim_data_space+0x108/0x1b0 [btrfs]
[ 8948.303510]  process_one_work+0x252/0x5a0
[ 8948.303860]  ? process_one_work+0x5a0/0x5a0
[ 8948.304221]  worker_thread+0x55/0x3b0
[ 8948.304543]  ? process_one_work+0x5a0/0x5a0
[ 8948.304904]  kthread+0xf2/0x120
[ 8948.305184]  ? kthread_complete_and_exit+0x20/0x20
[ 8948.305598]  ret_from_fork+0x22/0x30
[ 8948.305921]  </TASK>

It all comes from the fact that btrfs_start_delalloc_roots() takes the
delalloc_root_mutex, in the transaction commit path we are holding a
read lock on one of the superblock's freeze semaphores (via
sb_start_intwrite()), the async reclaim task can also do a call to
btrfs_start_delalloc_roots(), which ends up triggering writeback with
calls to filemap_fdatawrite_wbc(), resulting in extent allocation which
in turn can call btrfs_start_transaction(), which will result in taking
the freeze semaphore via sb_start_intwrite(), forming a nasty dependency
on all those locks which can be taken in different orders by different
code paths.

So just adopt the simple approach of calling try_to_writeback_inodes_sb()
at btrfs_start_delalloc_flush().

Link: https://lore.kernel.org/linux-btrfs/[email protected]/
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
Reviewed-by: Omar Sandoval <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
[ add more link reports ]
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
dkodihal pushed a commit to NVIDIA/linux that referenced this issue Oct 15, 2024
commit bb32500 upstream.

The following can crash the kernel:

 # cd /sys/kernel/tracing
 # echo 'p:sched schedule' > kprobe_events
 # exec 5>>events/kprobes/sched/enable
 # > kprobe_events
 # exec 5>&-

The above commands:

 1. Change directory to the tracefs directory
 2. Create a kprobe event (doesn't matter what one)
 3. Open bash file descriptor 5 on the enable file of the kprobe event
 4. Delete the kprobe event (removes the files too)
 5. Close the bash file descriptor 5

The above causes a crash!

 BUG: kernel NULL pointer dereference, address: 0000000000000028
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [openbmc#1] PREEMPT SMP PTI
 CPU: 6 PID: 877 Comm: bash Not tainted 6.5.0-rc4-test-00008-g2c6b6b1029d4-dirty openbmc#186
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
 RIP: 0010:tracing_release_file_tr+0xc/0x50

What happens here is that the kprobe event creates a trace_event_file
"file" descriptor that represents the file in tracefs to the event. It
maintains state of the event (is it enabled for the given instance?).
Opening the "enable" file gets a reference to the event "file" descriptor
via the open file descriptor. When the kprobe event is deleted, the file is
also deleted from the tracefs system which also frees the event "file"
descriptor.

But as the tracefs file is still opened by user space, it will not be
totally removed until the final dput() is called on it. But this is not
true with the event "file" descriptor that is already freed. If the user
does a write to or simply closes the file descriptor it will reference the
event "file" descriptor that was just freed, causing a use-after-free bug.

To solve this, add a ref count to the event "file" descriptor as well as a
new flag called "FREED". The "file" will not be freed until the last
reference is released. But the FREE flag will be set when the event is
removed to prevent any more modifications to that event from happening,
even if there's still a reference to the event "file" descriptor.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]

Cc: [email protected]
Cc: Mark Rutland <[email protected]>
Fixes: f5ca233 ("tracing: Increase trace array ref count on enable and filter files")
Reported-by: Beau Belgrave <[email protected]>
Tested-by: Beau Belgrave <[email protected]>
Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants