Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GPIO Kconfig bug & add BT host driver #1

Merged
merged 2 commits into from
Sep 29, 2015

Conversation

apopple
Copy link

@apopple apopple commented Sep 29, 2015

No description provided.

The Aspeed GPIO menu entry was added under "PCI GPIO expanders" which
depends on PCI which may not be enabled for Aspeed. Move the entry to
the "Memory mapped GPIO drivers" which describes the Aspeed GPIO
driver more correctly.

Signed-off-by: Alistair Popple <[email protected]>
This patch adds a simple device driver to expose the iBT interface on
Aspeed chips as a character device (/dev/bt).

Signed-off-by: Alistair Popple <[email protected]>
@apopple apopple merged this pull request into openbmc:dev Sep 29, 2015
shenki pushed a commit that referenced this pull request Nov 13, 2015
…ap_device_late_init

Kernel fails to boot 50% of times (form build to build) with
RT-patchset applied due to the following race - on late boot
stages deferred_probe_work_func->omap_hsmmc_probe races with omap_device_late_ini.

The same issue has been reported now on linux-next (4.3) by Keerthy [1]

late_initcall
 - deferred_probe_initcal() tries to re-probe all pending driver's probe.

- later on, some driver is probing in this case It's cpsw.c
  (but could be any other drivers)
  cpsw_init
  - platform_driver_register
    - really_probe
       - driver_bound
         - driver_deferred_probe_trigger
  and boot proceed.
  So, at this moment we have deferred_probe_work_func scheduled.

late_initcall_sync
  - omap_device_late_init
    - omap_device_idle

CPU1					CPU2
  - deferred_probe_work_func
    - really_probe
      - omap_hsmmc_probe
	- pm_runtime_get_sync
					late_initcall_sync
					- omap_device_late_init
						if (od->_driver_status != BUS_NOTIFY_BOUND_DRIVER) {
							if (od->_state == OMAP_DEVICE_STATE_ENABLED) {
								- omap_device_idle [ops - IP is disabled]
	- [fail]
	- pm_runtime_put_sync
          - omap_hsmmc_runtime_suspend [ooops!]

== log ==
 omap_hsmmc 480b4000.mmc: unable to get vmmc regulator -517
 davinci_mdio 48485000.mdio: davinci mdio revision 1.6
 davinci_mdio 48485000.mdio: detected phy mask fffffff3
 libphy: 48485000.mdio: probed
 davinci_mdio 48485000.mdio: phy[2]: device 48485000.mdio:02, driver unknown
 davinci_mdio 48485000.mdio: phy[3]: device 48485000.mdio:03, driver unknown
 omap_hsmmc 480b4000.mmc: unable to get vmmc regulator -517
 cpsw 48484000.ethernet: Detected MACID = b4:99:4c:c7:d2:48
 cpsw 48484000.ethernet: cpsw: Detected MACID = b4:99:4c:c7:d2:49
 hctosys: unable to open rtc device (rtc0)
 omap_hsmmc 480b4000.mmc: omap_device_late_idle: enabled but no driver.  Idling
 ldousb: disabling
 Unhandled fault: imprecise external abort (0x1406) at 0x00000000
 [00000000] *pgd=00000000
 Internal error: : 1406 [#1] PREEMPT SMP ARM
 Modules linked in:
 CPU: 1 PID: 58 Comm: kworker/u4:1 Not tainted 4.1.2-rt1-00467-g6da3c0a-dirty #5
 Hardware name: Generic DRA74X (Flattened Device Tree)
 Workqueue: deferwq deferred_probe_work_func
 task: ee6ddb00 ti: edd3c000 task.ti: edd3c000
 PC is at omap_hsmmc_runtime_suspend+0x1c/0x12c
 LR is at _od_runtime_suspend+0xc/0x24
 pc : [<c0471998>]    lr : [<c0029590>]    psr: a0000013
 sp : edd3dda0  ip : ee6ddb00  fp : c07be540
 r10: 00000000  r9 : c07be540  r8 : 00000008
 r7 : 00000000  r6 : ee646c10  r5 : ee646c10  r4 : edd79380
 r3 : fa0b4100  r2 : 00000000  r1 : 00000000  r0 : ee646c10
 Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
 Control: 10c5387d  Table: 8000406a  DAC: 00000015
 Process kworker/u4:1 (pid: 58, stack limit = 0xedd3c218)
 Stack: (0xedd3dda0 to 0xedd3e000)
 dda0: ee646c70 ee646c10 c0029584 00000000 00000008 c0029590 ee646c70 ee646c10
 ddc0: c0029584 c03adfb8 ee646c10 00000004 0000000c c03adff0 ee646c10 00000004
 dde0: 0000000c c03ae4ec 00000000 edd3c000 ee646c10 00000004 ee646c70 00000004
 de00: fa0b4000 c03aec20 ee6ddb00 ee646c10 00000004 ee646c70 ee646c10 fffffdfb
 de20: edd79380 00000000 fa0b4000 c03aee90 fffffdfb edd79000 ee646c00 c0474290
 de40: 00000000 edda24c0 edd79380 edc81f00 00000000 00000200 00000001 c06dd488
 de60: edda3960 ee646c10 ee646c10 c0824cc4 fffffdfb c0880c94 00000002 edc92600
 de80: c0836378 c03a7f84 ee646c10 c0824cc4 00000000 c0880c80 c0880c94 c03a6568
 dea0: 00000000 ee646c10 c03a66ac ee4f8000 00000000 00000001 edc92600 c03a4b40
 dec0: ee404c94 edc83c4c ee646c10 ee646c10 ee646c44 c03a63c4 ee646c10 ee646c10
 dee0: c0814448 c03a5aa8 ee646c10 c0814220 edd3c000 c03a5ec0 c0814250 ee6be400
 df00: edd3c000 c004e5bc ee6ddb01 00000078 ee6ddb00 ee4f8000 ee6be418 edd3c000
 df20: ee4f8028 00000088 c0836045 ee4f8000 ee6be400 c004e928 ee4f8028 00000000
 df40: c004e8ec 00000000 ee6bf1c0 ee6be400 c004e8ec 00000000 00000000 00000000
 df60: 00000000 c0053450 2e56fa97 00000000 afdffbd7 ee6be400 00000000 00000000
 df80: edd3df80 edd3df80 00000000 00000000 edd3df90 edd3df90 edd3dfac ee6bf1c0
 dfa0: c0053384 00000000 00000000 c000f668 00000000 00000000 00000000 00000000
 dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 f1fc9d7e febfbdff
 [<c0471998>] (omap_hsmmc_runtime_suspend) from [<c0029590>] (_od_runtime_suspend+0xc/0x24)
 [<c0029590>] (_od_runtime_suspend) from [<c03adfb8>] (__rpm_callback+0x24/0x3c)
 [<c03adfb8>] (__rpm_callback) from [<c03adff0>] (rpm_callback+0x20/0x80)
 [<c03adff0>] (rpm_callback) from [<c03ae4ec>] (rpm_suspend+0xe4/0x618)
 [<c03ae4ec>] (rpm_suspend) from [<c03aee90>] (__pm_runtime_idle+0x60/0x80)
 [<c03aee90>] (__pm_runtime_idle) from [<c0474290>] (omap_hsmmc_probe+0x6bc/0xa7c)
 [<c0474290>] (omap_hsmmc_probe) from [<c03a7f84>] (platform_drv_probe+0x44/0xa4)
 [<c03a7f84>] (platform_drv_probe) from [<c03a6568>] (driver_probe_device+0x170/0x2b4)
 [<c03a6568>] (driver_probe_device) from [<c03a4b40>] (bus_for_each_drv+0x64/0x98)
 [<c03a4b40>] (bus_for_each_drv) from [<c03a63c4>] (device_attach+0x70/0x88)
 [<c03a63c4>] (device_attach) from [<c03a5aa8>] (bus_probe_device+0x84/0xac)
 [<c03a5aa8>] (bus_probe_device) from [<c03a5ec0>] (deferred_probe_work_func+0x58/0x88)
 [<c03a5ec0>] (deferred_probe_work_func) from [<c004e5bc>] (process_one_work+0x134/0x464)
 [<c004e5bc>] (process_one_work) from [<c004e928>] (worker_thread+0x3c/0x4fc)
 [<c004e928>] (worker_thread) from [<c0053450>] (kthread+0xcc/0xe4)
 [<c0053450>] (kthread) from [<c000f668>] (ret_from_fork+0x14/0x2c)
 Code: e594302c e593202c e584205c e594302c (e5932128)
 ---[ end trace 0000000000000002 ]---

The issue happens because omap_device_late_init() do not take into
account that some drivers are present, but their probes were not
finished successfully and where deferred instead. This is the valid
case, and omap_device_late_init() should not idle such devices.

To fix this issue, the value of omap_device->_driver_status field
should be checked not only for BUS_NOTIFY_BOUND_DRIVER (driver is
present and has been bound to device successfully), but also checked
for BUS_NOTIFY_BIND_DRIVER (driver about to be bound) - which means
driver is present and there was try to bind it to device.

[1] http://www.spinics.net/lists/arm-kernel/msg441880.html
Cc: Tero Kristo <[email protected]>
Cc: Keerthy <[email protected]>
Tested-by: Keerthy <[email protected]>
Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Tony Lindgren <[email protected]>
shenki pushed a commit that referenced this pull request Nov 13, 2015
On old ARM chips, unaligned accesses to memory are not trapped and
fixed.  On module load, symbols are relocated, and the relocation of
__bug_table symbols is done on a u32 basis. Yet the section is not
aligned to a multiple of 4 address, but to a multiple of 2.

This triggers an Oops on pxa architecture, where address 0xbf0021ea
is the first relocation in the __bug_table section :
  apply_relocate(): pxa3xx_nand: section 13 reloc 0 sym ''
  Unable to handle kernel paging request at virtual address bf0021ea
  pgd = e1cd0000
  [bf0021ea] *pgd=c1cce851, *pte=c1cde04f, *ppte=c1cde01f
  Internal error: Oops: 23 [#1] ARM
  Modules linked in:
  CPU: 0 PID: 606 Comm: insmod Not tainted 4.2.0-rc8-next-20150828-cm-x300+ torvalds#887
  Hardware name: CM-X300 module
  task: e1c68700 ti: e1c3e000 task.ti: e1c3e000
  PC is at apply_relocate+0x2f4/0x3d4
  LR is at 0xbf0021ea
  pc : [<c000e7c8>]    lr : [<bf0021ea>]    psr: 80000013
  sp : e1c3fe30  ip : 60000013  fp : e49e8c60
  r10: e49e8fa8  r9 : 00000000  r8 : e49e7c58
  r7 : e49e8c38  r6 : e49e8a58  r5 : e49e8920  r4 : e49e8918
  r3 : bf0021ea  r2 : bf007034  r1 : 00000000  r0 : bf000000
  Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
  Control: 0000397f  Table: c1cd0018  DAC: 00000051
  Process insmod (pid: 606, stack limit = 0xe1c3e198)
  [<c000e7c8>] (apply_relocate) from [<c005ce5c>] (load_module+0x1248/0x1f5c)
  [<c005ce5c>] (load_module) from [<c005dc54>] (SyS_init_module+0xe4/0x170)
  [<c005dc54>] (SyS_init_module) from [<c000a420>] (ret_fast_syscall+0x0/0x38)

Fix this by ensuring entries in __bug_table are all aligned to at least
of multiple of 4. This transforms a module section  __bug_table as :
-   [12] __bug_table       PROGBITS        00000000 002232 000018 00   A  0   0  1
+   [12] __bug_table       PROGBITS        00000000 002232 000018 00   A  0   0  4

Signed-off-by: Robert Jarzmik <[email protected]>
Reviewed-by: Dave Martin <[email protected]>
Signed-off-by: Russell King <[email protected]>
shenki pushed a commit that referenced this pull request Nov 13, 2015
The renesas-irqc interrupt controller is cascaded to the GIC. Hence when
propagating wake-up settings to its parent interrupt controller, the
following lockdep warning is printed:

    =============================================
    [ INFO: possible recursive locking detected ]
    4.2.0-ape6evm-10725-g50fcd7643c034198 torvalds#280 Not tainted
    ---------------------------------------------
    s2ram/1072 is trying to acquire lock:
    (&irq_desc_lock_class){-.-...}, at: [<c008d3fc>] __irq_get_desc_lock+0x58/0x98

    but task is already holding lock:
    (&irq_desc_lock_class){-.-...}, at: [<c008d3fc>] __irq_get_desc_lock+0x58/0x98

    other info that might help us debug this:
    Possible unsafe locking scenario:

	  CPU0
	  ----
     lock(&irq_desc_lock_class);
     lock(&irq_desc_lock_class);

    *** DEADLOCK ***

    May be due to missing lock nesting notation

    6 locks held by s2ram/1072:
    #0:  (sb_writers#7){.+.+.+}, at: [<c012eb14>] __sb_start_write+0xa0/0xa8
    #1:  (&of->mutex){+.+.+.}, at: [<c019396c>] kernfs_fop_write+0x4c/0x1bc
    #2:  (s_active#24){.+.+.+}, at: [<c0193974>] kernfs_fop_write+0x54/0x1bc
    #3:  (pm_mutex){+.+.+.}, at: [<c008213c>] pm_suspend+0x10c/0x510
    #4:  (&dev->mutex){......}, at: [<c02af3c4>] __device_suspend+0xdc/0x2cc
    #5:  (&irq_desc_lock_class){-.-...}, at: [<c008d3fc>] __irq_get_desc_lock+0x58/0x98

    stack backtrace:
    CPU: 0 PID: 1072 Comm: s2ram Not tainted 4.2.0-ape6evm-10725-g50fcd7643c034198 torvalds#280
    Hardware name: Generic R8A73A4 (Flattened Device Tree)
    [<c0018078>] (unwind_backtrace) from [<c00144f0>] (show_stack+0x10/0x14)
    [<c00144f0>] (show_stack) from [<c0451f14>] (dump_stack+0x88/0x98)
    [<c0451f14>] (dump_stack) from [<c007b29c>] (__lock_acquire+0x15cc/0x20e4)
    [<c007b29c>] (__lock_acquire) from [<c007c6e0>] (lock_acquire+0xac/0x12c)
    [<c007c6e0>] (lock_acquire) from [<c0457c00>] (_raw_spin_lock_irqsave+0x40/0x54)
    [<c0457c00>] (_raw_spin_lock_irqsave) from [<c008d3fc>] (__irq_get_desc_lock+0x58/0x98)
    [<c008d3fc>] (__irq_get_desc_lock) from [<c008ebbc>] (irq_set_irq_wake+0x20/0xf8)
    [<c008ebbc>] (irq_set_irq_wake) from [<c0260770>] (irqc_irq_set_wake+0x20/0x4c)
    [<c0260770>] (irqc_irq_set_wake) from [<c008ec28>] (irq_set_irq_wake+0x8c/0xf8)
    [<c008ec28>] (irq_set_irq_wake) from [<c02cb8c0>] (gpio_keys_suspend+0x74/0xc0)
    [<c02cb8c0>] (gpio_keys_suspend) from [<c02ae8cc>] (dpm_run_callback+0x54/0x124)

Avoid this false positive by using a separate lockdep class for IRQC
interrupts.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Cc: Grygorii Strashko <[email protected]>
Cc: Magnus Damm <[email protected]>
Cc: Jason Cooper <[email protected]>
Link: http://lkml.kernel.org/r/1441798974-25716-2-git-send-email-geert%[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
shenki pushed a commit that referenced this pull request Nov 13, 2015
The renesas-intc-irqpin interrupt controller is cascaded to the GIC.
Hence when propagating wake-up settings to its parent interrupt
controller, the following lockdep warning is printed:

    =============================================
    [ INFO: possible recursive locking detected ]
    4.2.0-armadillo-10725-g50fcd7643c034198 torvalds#781 Not tainted
    ---------------------------------------------
    s2ram/1179 is trying to acquire lock:
    (&irq_desc_lock_class){-.-...}, at: [<c005bb54>] __irq_get_desc_lock+0x78/0x94

    but task is already holding lock:
    (&irq_desc_lock_class){-.-...}, at: [<c005bb54>] __irq_get_desc_lock+0x78/0x94

    other info that might help us debug this:
    Possible unsafe locking scenario:

	  CPU0
	  ----
     lock(&irq_desc_lock_class);
     lock(&irq_desc_lock_class);

    *** DEADLOCK ***

    May be due to missing lock nesting notation

    7 locks held by s2ram/1179:
    #0:  (sb_writers#7){.+.+.+}, at: [<c00c9708>] __sb_start_write+0x64/0xb8
    #1:  (&of->mutex){+.+.+.}, at: [<c0125a00>] kernfs_fop_write+0x78/0x1a0
    #2:  (s_active#23){.+.+.+}, at: [<c0125a08>] kernfs_fop_write+0x80/0x1a0
    #3:  (autosleep_lock){+.+.+.}, at: [<c0058244>] pm_autosleep_lock+0x18/0x20
    #4:  (pm_mutex){+.+.+.}, at: [<c0057e50>] pm_suspend+0x54/0x248
    #5:  (&dev->mutex){......}, at: [<c0243a20>] __device_suspend+0xdc/0x240
    #6:  (&irq_desc_lock_class){-.-...}, at: [<c005bb54>] __irq_get_desc_lock+0x78/0x94

    stack backtrace:
    CPU: 0 PID: 1179 Comm: s2ram Not tainted 4.2.0-armadillo-10725-g50fcd7643c034198

    Hardware name: Generic R8A7740 (Flattened Device Tree)
    [<c00129f4>] (dump_backtrace) from [<c0012bec>] (show_stack+0x18/0x1c)
    [<c0012bd4>] (show_stack) from [<c03f5d94>] (dump_stack+0x20/0x28)
    [<c03f5d74>] (dump_stack) from [<c00514d4>] (__lock_acquire+0x67c/0x1b88)
    [<c0050e58>] (__lock_acquire) from [<c0052df8>] (lock_acquire+0x9c/0xbc)
    [<c0052d5c>] (lock_acquire) from [<c03fb068>] (_raw_spin_lock_irqsave+0x44/0x58)
    [<c03fb024>] (_raw_spin_lock_irqsave) from [<c005bb54>] (__irq_get_desc_lock+0x78/0x94
    [<c005badc>] (__irq_get_desc_lock) from [<c005c3d8>] (irq_set_irq_wake+0x28/0x100)
    [<c005c3b0>] (irq_set_irq_wake) from [<c01e50d0>] (intc_irqpin_irq_set_wake+0x24/0x4c)
    [<c01e50ac>] (intc_irqpin_irq_set_wake) from [<c005c17c>] (set_irq_wake_real+0x3c/0x50
    [<c005c140>] (set_irq_wake_real) from [<c005c414>] (irq_set_irq_wake+0x64/0x100)
    [<c005c3b0>] (irq_set_irq_wake) from [<c02a19b4>] (gpio_keys_suspend+0x60/0xa0)
    [<c02a1954>] (gpio_keys_suspend) from [<c023b750>] (platform_pm_suspend+0x3c/0x5c)

Avoid this false positive by using a separate lockdep class for INTC
External IRQ Pin interrupts.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Cc: Grygorii Strashko <[email protected]>
Cc: Magnus Damm <[email protected]>
Cc: Jason Cooper <[email protected]>
Link: http://lkml.kernel.org/r/1441798974-25716-3-git-send-email-geert%[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
shenki pushed a commit that referenced this pull request Nov 13, 2015
The OPP list needs to be protected against concurrent accesses. Using
simple RCU read locks does the trick and gets rid of the following
lockdep warning:

	===============================
	[ INFO: suspicious RCU usage. ]
	4.2.0-next-20150908 #1 Not tainted
	-------------------------------
	drivers/base/power/opp.c:460 Missing rcu_read_lock() or dev_opp_list_lock protection!

	other info that might help us debug this:

	rcu_scheduler_active = 1, debug_locks = 0
	4 locks held by kworker/u8:0/6:
	 #0:  ("%s""deferwq"){++++.+}, at: [<c0040d8c>] process_one_work+0x118/0x4bc
	 #1:  (deferred_probe_work){+.+.+.}, at: [<c0040d8c>] process_one_work+0x118/0x4bc
	 #2:  (&dev->mutex){......}, at: [<c03b8194>] __device_attach+0x20/0x118
	 #3:  (prepare_lock){+.+...}, at: [<c054bc08>] clk_prepare_lock+0x10/0xf8

	stack backtrace:
	CPU: 2 PID: 6 Comm: kworker/u8:0 Not tainted 4.2.0-next-20150908 #1
	Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
	Workqueue: deferwq deferred_probe_work_func
	[<c001802c>] (unwind_backtrace) from [<c00135a4>] (show_stack+0x10/0x14)
	[<c00135a4>] (show_stack) from [<c02a8418>] (dump_stack+0x94/0xd4)
	[<c02a8418>] (dump_stack) from [<c03c6f6c>] (dev_pm_opp_find_freq_ceil+0x108/0x114)
	[<c03c6f6c>] (dev_pm_opp_find_freq_ceil) from [<c0551a3c>] (dfll_calculate_rate_request+0xb8/0x170)
	[<c0551a3c>] (dfll_calculate_rate_request) from [<c0551b10>] (dfll_clk_round_rate+0x1c/0x2c)
	[<c0551b10>] (dfll_clk_round_rate) from [<c054de2c>] (clk_calc_new_rates+0x1b8/0x228)
	[<c054de2c>] (clk_calc_new_rates) from [<c054e44c>] (clk_core_set_rate_nolock+0x44/0xac)
	[<c054e44c>] (clk_core_set_rate_nolock) from [<c054e4d8>] (clk_set_rate+0x24/0x34)
	[<c054e4d8>] (clk_set_rate) from [<c0512460>] (tegra124_cpufreq_probe+0x120/0x230)
	[<c0512460>] (tegra124_cpufreq_probe) from [<c03b9cbc>] (platform_drv_probe+0x44/0xac)
	[<c03b9cbc>] (platform_drv_probe) from [<c03b84c8>] (driver_probe_device+0x218/0x304)
	[<c03b84c8>] (driver_probe_device) from [<c03b69b0>] (bus_for_each_drv+0x60/0x94)
	[<c03b69b0>] (bus_for_each_drv) from [<c03b8228>] (__device_attach+0xb4/0x118)
	ata1: SATA link down (SStatus 0 SControl 300)
	[<c03b8228>] (__device_attach) from [<c03b77c8>] (bus_probe_device+0x88/0x90)
	[<c03b77c8>] (bus_probe_device) from [<c03b7be8>] (deferred_probe_work_func+0x58/0x8c)
	[<c03b7be8>] (deferred_probe_work_func) from [<c0040dfc>] (process_one_work+0x188/0x4bc)
	[<c0040dfc>] (process_one_work) from [<c004117c>] (worker_thread+0x4c/0x4f4)
	[<c004117c>] (worker_thread) from [<c0047230>] (kthread+0xe4/0xf8)
	[<c0047230>] (kthread) from [<c000f7d0>] (ret_from_fork+0x14/0x24)

Signed-off-by: Thierry Reding <[email protected]>
Fixes: c4fe70a ("clk: tegra: Add closed loop support for the DFLL")
[[email protected]: Unlock rcu on error path]
Signed-off-by: Vince Hsu <[email protected]>
[[email protected]: Dropped second hunk that nested the rcu
read lock unnecessarily]
Signed-off-by: Stephen Boyd <[email protected]>
shenki pushed a commit that referenced this pull request Nov 13, 2015
If filelayout_decode_layout fail, _filelayout_free_lseg will causes
a double freeing of fh_array.

[ 1179.279800] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1179.280198] IP: [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
[ 1179.281010] PGD 0
[ 1179.281443] Oops: 0000 [#1]
[ 1179.281831] Modules linked in: nfs_layout_nfsv41_files(OE) nfsv4(OE) nfs(OE) fscache(E) xfs libcrc32c coretemp nfsd crct10dif_pclmul ppdev crc32_pclmul crc32c_intel auth_rpcgss ghash_clmulni_intel nfs_acl lockd vmw_balloon grace sunrpc parport_pc vmw_vmci parport shpchp i2c_piix4 vmwgfx drm_kms_helper ttm drm serio_raw mptspi scsi_transport_spi mptscsih e1000 mptbase ata_generic pata_acpi [last unloaded: fscache]
[ 1179.283891] CPU: 0 PID: 13336 Comm: cat Tainted: G           OE   4.3.0-rc1-pnfs+ torvalds#244
[ 1179.284323] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[ 1179.285206] task: ffff8800501d48c0 ti: ffff88003e3c4000 task.ti: ffff88003e3c4000
[ 1179.285668] RIP: 0010:[<ffffffffa027222d>]  [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
[ 1179.286612] RSP: 0018:ffff88003e3c77f8  EFLAGS: 00010202
[ 1179.287092] RAX: 0000000000000000 RBX: ffff88001fe78900 RCX: 0000000000000000
[ 1179.287731] RDX: ffffea0000f40760 RSI: ffff88001fe789c8 RDI: ffff88001fe789c0
[ 1179.288383] RBP: ffff88003e3c7810 R08: ffffea0000f40760 R09: 0000000000000000
[ 1179.289170] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88001fe789c8
[ 1179.289959] R13: ffff88001fe789c0 R14: ffff88004ec05a80 R15: ffff88004f935b88
[ 1179.290791] FS:  00007f4e66bb5700(0000) GS:ffffffff81c29000(0000) knlGS:0000000000000000
[ 1179.291580] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1179.292209] CR2: 0000000000000000 CR3: 00000000203f8000 CR4: 00000000001406f0
[ 1179.292731] Stack:
[ 1179.293195]  ffff88001fe78900 00000000000000d0 ffff88001fe78178 ffff88003e3c7868
[ 1179.293676]  ffffffffa0272737 0000000000000001 0000000000000001 ffff88001fe78800
[ 1179.294151]  00000000614fffce ffffffff81727671 ffff88001fe78100 ffff88001fe78100
[ 1179.294623] Call Trace:
[ 1179.295092]  [<ffffffffa0272737>] filelayout_alloc_lseg+0xa7/0x2d0 [nfs_layout_nfsv41_files]
[ 1179.295625]  [<ffffffff81727671>] ? out_of_line_wait_on_bit+0x81/0xb0
[ 1179.296133]  [<ffffffffa040407e>] pnfs_layout_process+0xae/0x320 [nfsv4]
[ 1179.296632]  [<ffffffffa03e0a01>] nfs4_proc_layoutget+0x2b1/0x360 [nfsv4]
[ 1179.297134]  [<ffffffffa0402983>] pnfs_update_layout+0x853/0xb30 [nfsv4]
[ 1179.297632]  [<ffffffffa039db24>] ? nfs_get_lock_context+0x74/0x170 [nfs]
[ 1179.298158]  [<ffffffffa0271807>] filelayout_pg_init_read+0x37/0x50 [nfs_layout_nfsv41_files]
[ 1179.298834]  [<ffffffffa03a72d9>] __nfs_pageio_add_request+0x119/0x460 [nfs]
[ 1179.299385]  [<ffffffffa03a6bd7>] ? nfs_create_request.part.9+0x37/0x2e0 [nfs]
[ 1179.299872]  [<ffffffffa03a7cc3>] nfs_pageio_add_request+0xa3/0x1b0 [nfs]
[ 1179.300362]  [<ffffffffa03a8635>] readpage_async_filler+0x85/0x260 [nfs]
[ 1179.300907]  [<ffffffff81180cb1>] read_cache_pages+0x91/0xd0
[ 1179.301391]  [<ffffffffa03a85b0>] ? nfs_read_completion+0x220/0x220 [nfs]
[ 1179.301867]  [<ffffffffa03a8dc8>] nfs_readpages+0x128/0x200 [nfs]
[ 1179.302330]  [<ffffffff81180ef3>] __do_page_cache_readahead+0x203/0x280
[ 1179.302784]  [<ffffffff81180dc8>] ? __do_page_cache_readahead+0xd8/0x280
[ 1179.303413]  [<ffffffff81181116>] ondemand_readahead+0x1a6/0x2f0
[ 1179.303855]  [<ffffffff81181371>] page_cache_sync_readahead+0x31/0x50
[ 1179.304286]  [<ffffffff811750a6>] generic_file_read_iter+0x4a6/0x5c0
[ 1179.304711]  [<ffffffffa03a0316>] ? __nfs_revalidate_mapping+0x1f6/0x240 [nfs]
[ 1179.305132]  [<ffffffffa039ccf2>] nfs_file_read+0x52/0xa0 [nfs]
[ 1179.305540]  [<ffffffff811e343c>] __vfs_read+0xcc/0x100
[ 1179.305936]  [<ffffffff811e3d15>] vfs_read+0x85/0x130
[ 1179.306326]  [<ffffffff811e4a98>] SyS_read+0x58/0xd0
[ 1179.306708]  [<ffffffff8172caaf>] entry_SYSCALL_64_fastpath+0x12/0x76
[ 1179.307094] Code: c4 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 8b 07 49 89 f4 85 c0 74 47 48 8b 06 49 89 fd <48> 8b 38 48 85 ff 74 22 31 db eb 0c 48 63 d3 48 8b 3c d0 48 85
[ 1179.308357] RIP  [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
[ 1179.309177]  RSP <ffff88003e3c77f8>
[ 1179.309582] CR2: 0000000000000000

Signed-off-by: Kinglong Mee <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
shenki pushed a commit that referenced this pull request Nov 13, 2015
If we didn't call ATMARP_MKIP before ATMARP_ENCAP the VCC descriptor is
non-existant and we'll end up dereferencing a NULL ptr:

[1033173.491930] kasan: GPF could be caused by NULL-ptr deref or user memory accessirq event stamp: 123386
[1033173.493678] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[1033173.493689] Modules linked in:
[1033173.493697] CPU: 9 PID: 23815 Comm: trinity-c64 Not tainted 4.2.0-next-20150911-sasha-00043-g353d875-dirty #2545
[1033173.493706] task: ffff8800630c4000 ti: ffff880063110000 task.ti: ffff880063110000
[1033173.493823] RIP: clip_ioctl (net/atm/clip.c:320 net/atm/clip.c:689)
[1033173.493826] RSP: 0018:ffff880063117a88  EFLAGS: 00010203
[1033173.493828] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 000000000000000c
[1033173.493830] RDX: 0000000000000002 RSI: ffffffffb3f10720 RDI: 0000000000000014
[1033173.493832] RBP: ffff880063117b80 R08: ffff88047574d9a4 R09: 0000000000000000
[1033173.493834] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff1000c622f53
[1033173.493836] R13: ffff8800cb905500 R14: ffff8808d6da2000 R15: 00000000fffffdfd
[1033173.493840] FS:  00007fa56b92d700(0000) GS:ffff880478000000(0000) knlGS:0000000000000000
[1033173.493843] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[1033173.493845] CR2: 0000000000000000 CR3: 00000000630e8000 CR4: 00000000000006a0
[1033173.493855] Stack:
[1033173.493862]  ffffffffb0b60444 000000000000eaea 0000000041b58ab3 ffffffffb3c3ce32
[1033173.493867]  ffffffffb0b6f3e0 ffffffffb0b60444 ffffffffb5ea2e50 1ffff1000c622f5e
[1033173.493873]  ffff8800630c4cd8 00000000000ee09a ffffffffb3ec4888 ffffffffb5ea2de8
[1033173.493874] Call Trace:
[1033173.494108] do_vcc_ioctl (net/atm/ioctl.c:170)
[1033173.494113] vcc_ioctl (net/atm/ioctl.c:189)
[1033173.494116] svc_ioctl (net/atm/svc.c:605)
[1033173.494200] sock_do_ioctl (net/socket.c:874)
[1033173.494204] sock_ioctl (net/socket.c:958)
[1033173.494244] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607)
[1033173.494290] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613)
[1033173.494295] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:186)
[1033173.494362] Code: fa 48 c1 ea 03 80 3c 02 00 0f 85 50 09 00 00 49 8b 9e 60 06 00 00 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 14 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 14 09 00
All code

========
   0:   fa                      cli
   1:   48 c1 ea 03             shr    $0x3,%rdx
   5:   80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)
   9:   0f 85 50 09 00 00       jne    0x95f
   f:   49 8b 9e 60 06 00 00    mov    0x660(%r14),%rbx
  16:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
  1d:   fc ff df
  20:   48 8d 7b 14             lea    0x14(%rbx),%rdi
  24:   48 89 fa                mov    %rdi,%rdx
  27:   48 c1 ea 03             shr    $0x3,%rdx
  2b:*  0f b6 04 02             movzbl (%rdx,%rax,1),%eax               <-- trapping instruction
  2f:   48 89 fa                mov    %rdi,%rdx
  32:   83 e2 07                and    $0x7,%edx
  35:   38 d0                   cmp    %dl,%al
  37:   7f 08                   jg     0x41
  39:   84 c0                   test   %al,%al
  3b:   0f 85 14 09 00 00       jne    0x955

Code starting with the faulting instruction
===========================================
   0:   0f b6 04 02             movzbl (%rdx,%rax,1),%eax
   4:   48 89 fa                mov    %rdi,%rdx
   7:   83 e2 07                and    $0x7,%edx
   a:   38 d0                   cmp    %dl,%al
   c:   7f 08                   jg     0x16
   e:   84 c0                   test   %al,%al
  10:   0f 85 14 09 00 00       jne    0x92a
[1033173.494366] RIP clip_ioctl (net/atm/clip.c:320 net/atm/clip.c:689)
[1033173.494368]  RSP <ffff880063117a88>

Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
shenki pushed a commit that referenced this pull request Nov 13, 2015
Commit b6d3096 (Input: uinput - switch to
using for_each_set_bit()) switched driver to use for_each_set_bit().
However during initial write of the uinput structure that contains min/max
data for all possible axes none of them are reflected in dev->absbit yet
and so we were skipping over all of them and were not allocating absinfo
memory which caused crash later when driver tried to sens EV_ABS events:

<1>[   15.064330] BUG: unable to handle kernel NULL pointer dereference at 0000000000000024
<1>[   15.064336] IP: [<ffffffff8163f142>] input_handle_event+0x232/0x4e0
<4>[   15.064343] PGD 0
<4>[   15.064345] Oops: 0000 [#1] SMP

Fixes: b6d3096
Cc: [email protected]
Reported-by: Stephen Chandler Paul <[email protected]>
Tested-by: Stephen Chandler Paul <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
shenki pushed a commit that referenced this pull request Nov 13, 2015
A deadlock trace is seen in netcp driver with lockup detector enabled.
The trace log is provided below for reference. This patch fixes the
bug by removing the usage of netcp_modules_lock within ndo_ops functions.
ndo_{open/close/ioctl)() is already called with rtnl_lock held. So there
is no need to hold another mutex for serialization across processes on
multiple cores.  So remove use of netcp_modules_lock mutex from these
ndo ops functions.

ndo_set_rx_mode() shouldn't be using a mutex as it is called from atomic
context. In the case of ndo_set_rx_mode(), there can be call to this API
without rtnl_lock held from an atomic context. As the underlying modules
are expected to add address to a hardware table, it is to be protected
across concurrent updates and hence a spin lock is used to synchronize
the access. Same with ndo_vlan_rx_add_vid() & ndo_vlan_rx_kill_vid().

Probably the netcp_modules_lock is used to protect the module not being
removed as part of rmmod. Currently this is not fully implemented and
assumes the interface is brought down before doing rmmod of modules.
The support for rmmmod while interface is up is expected in a future
patch set when additional modules such as pa, qos are added. For now
all of the tests such as if up/down, reboot, iperf works fine with this
patch applied.

Deadlock trace seen with lockup detector enabled is shown below for
reference.

[   16.863014] ======================================================
[   16.869183] [ INFO: possible circular locking dependency detected ]
[   16.875441] 4.1.6-01265-gfb1e101 #1 Tainted: G        W
[   16.881176] -------------------------------------------------------
[   16.887432] ifconfig/1662 is trying to acquire lock:
[   16.892386]  (netcp_modules_lock){+.+.+.}, at: [<c03e8110>]
netcp_ndo_open+0x168/0x518
[   16.900321]
[   16.900321] but task is already holding lock:
[   16.906144]  (rtnl_mutex){+.+.+.}, at: [<c053a418>] devinet_ioctl+0xf8/0x7e4
[   16.913206]
[   16.913206] which lock already depends on the new lock.
[   16.913206]
[   16.921372]
[   16.921372] the existing dependency chain (in reverse order) is:
[   16.928844]
-> #1 (rtnl_mutex){+.+.+.}:
[   16.932865]        [<c06023f0>] mutex_lock_nested+0x68/0x4a8
[   16.938521]        [<c04c5758>] register_netdev+0xc/0x24
[   16.943831]        [<c03e65c0>] netcp_module_probe+0x214/0x2ec
[   16.949660]        [<c03e8a54>] netcp_register_module+0xd4/0x140
[   16.955663]        [<c089654c>] keystone_gbe_init+0x10/0x28
[   16.961233]        [<c000977c>] do_one_initcall+0xb8/0x1f8
[   16.966714]        [<c0867e04>] kernel_init_freeable+0x148/0x1e8
[   16.972720]        [<c05f9994>] kernel_init+0xc/0xe8
[   16.977682]        [<c0010038>] ret_from_fork+0x14/0x3c
[   16.982905]
-> #0 (netcp_modules_lock){+.+.+.}:
[   16.987619]        [<c006eab0>] lock_acquire+0x118/0x320
[   16.992928]        [<c06023f0>] mutex_lock_nested+0x68/0x4a8
[   16.998582]        [<c03e8110>] netcp_ndo_open+0x168/0x518
[   17.004064]        [<c04c48f0>] __dev_open+0xa8/0x10c
[   17.009112]        [<c04c4b74>] __dev_change_flags+0x94/0x144
[   17.014853]        [<c04c4c3c>] dev_change_flags+0x18/0x48
[   17.020334]        [<c053a9fc>] devinet_ioctl+0x6dc/0x7e4
[   17.025729]        [<c04a59ec>] sock_ioctl+0x1d0/0x2a8
[   17.030865]        [<c0142844>] do_vfs_ioctl+0x41c/0x688
[   17.036173]        [<c0142ae4>] SyS_ioctl+0x34/0x5c
[   17.041046]        [<c000ff60>] ret_fast_syscall+0x0/0x54
[   17.046441]
[   17.046441] other info that might help us debug this:
[   17.046441]
[   17.054434]  Possible unsafe locking scenario:
[   17.054434]
[   17.060343]        CPU0                    CPU1
[   17.064862]        ----                    ----
[   17.069381]   lock(rtnl_mutex);
[   17.072522]                                lock(netcp_modules_lock);
[   17.078875]                                lock(rtnl_mutex);
[   17.084532]   lock(netcp_modules_lock);
[   17.088366]
[   17.088366]  *** DEADLOCK ***
[   17.088366]
[   17.094279] 1 lock held by ifconfig/1662:
[   17.098278]  #0:  (rtnl_mutex){+.+.+.}, at: [<c053a418>]
devinet_ioctl+0xf8/0x7e4
[   17.105774]
[   17.105774] stack backtrace:
[   17.110124] CPU: 1 PID: 1662 Comm: ifconfig Tainted: G        W
4.1.6-01265-gfb1e101 #1
[   17.118637] Hardware name: Keystone
[   17.122123] [<c00178e4>] (unwind_backtrace) from [<c0013cbc>]
(show_stack+0x10/0x14)
[   17.129862] [<c0013cbc>] (show_stack) from [<c05ff450>]
(dump_stack+0x84/0xc4)
[   17.137079] [<c05ff450>] (dump_stack) from [<c0068e34>]
(print_circular_bug+0x210/0x330)
[   17.145161] [<c0068e34>] (print_circular_bug) from [<c006ab7c>]
(validate_chain.isra.35+0xf98/0x13ac)
[   17.154372] [<c006ab7c>] (validate_chain.isra.35) from [<c006da60>]
(__lock_acquire+0x52c/0xcc0)
[   17.163149] [<c006da60>] (__lock_acquire) from [<c006eab0>]
(lock_acquire+0x118/0x320)
[   17.171058] [<c006eab0>] (lock_acquire) from [<c06023f0>]
(mutex_lock_nested+0x68/0x4a8)
[   17.179140] [<c06023f0>] (mutex_lock_nested) from [<c03e8110>]
(netcp_ndo_open+0x168/0x518)
[   17.187484] [<c03e8110>] (netcp_ndo_open) from [<c04c48f0>]
(__dev_open+0xa8/0x10c)
[   17.195133] [<c04c48f0>] (__dev_open) from [<c04c4b74>]
(__dev_change_flags+0x94/0x144)
[   17.203129] [<c04c4b74>] (__dev_change_flags) from [<c04c4c3c>]
(dev_change_flags+0x18/0x48)
[   17.211560] [<c04c4c3c>] (dev_change_flags) from [<c053a9fc>]
(devinet_ioctl+0x6dc/0x7e4)
[   17.219729] [<c053a9fc>] (devinet_ioctl) from [<c04a59ec>]
(sock_ioctl+0x1d0/0x2a8)
[   17.227378] [<c04a59ec>] (sock_ioctl) from [<c0142844>]
(do_vfs_ioctl+0x41c/0x688)
[   17.234939] [<c0142844>] (do_vfs_ioctl) from [<c0142ae4>]
(SyS_ioctl+0x34/0x5c)
[   17.242242] [<c0142ae4>] (SyS_ioctl) from [<c000ff60>]
(ret_fast_syscall+0x0/0x54)
[   17.258855] netcp-1.0 2620110.netcp eth0: Link is Up - 1Gbps/Full - flow
control off
[   17.271282] BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:616
[   17.279712] in_atomic(): 1, irqs_disabled(): 0, pid: 1662, name: ifconfig
[   17.286500] INFO: lockdep is turned off.
[   17.290413] Preemption disabled at:[<  (null)>]   (null)
[   17.295728]
[   17.297214] CPU: 1 PID: 1662 Comm: ifconfig Tainted: G        W
4.1.6-01265-gfb1e101 #1
[   17.305735] Hardware name: Keystone
[   17.309223] [<c00178e4>] (unwind_backtrace) from [<c0013cbc>]
(show_stack+0x10/0x14)
[   17.316970] [<c0013cbc>] (show_stack) from [<c05ff450>]
(dump_stack+0x84/0xc4)
[   17.324194] [<c05ff450>] (dump_stack) from [<c06023b0>]
(mutex_lock_nested+0x28/0x4a8)
[   17.332112] [<c06023b0>] (mutex_lock_nested) from [<c03e9840>]
(netcp_set_rx_mode+0x160/0x210)
[   17.340724] [<c03e9840>] (netcp_set_rx_mode) from [<c04c483c>]
(dev_set_rx_mode+0x1c/0x28)
[   17.348982] [<c04c483c>] (dev_set_rx_mode) from [<c04c490c>]
(__dev_open+0xc4/0x10c)
[   17.356724] [<c04c490c>] (__dev_open) from [<c04c4b74>]
(__dev_change_flags+0x94/0x144)
[   17.364729] [<c04c4b74>] (__dev_change_flags) from [<c04c4c3c>]
(dev_change_flags+0x18/0x48)
[   17.373166] [<c04c4c3c>] (dev_change_flags) from [<c053a9fc>]
(devinet_ioctl+0x6dc/0x7e4)
[   17.381344] [<c053a9fc>] (devinet_ioctl) from [<c04a59ec>]
(sock_ioctl+0x1d0/0x2a8)
[   17.388994] [<c04a59ec>] (sock_ioctl) from [<c0142844>]
(do_vfs_ioctl+0x41c/0x688)
[   17.396563] [<c0142844>] (do_vfs_ioctl) from [<c0142ae4>]
(SyS_ioctl+0x34/0x5c)
[   17.403873] [<c0142ae4>] (SyS_ioctl) from [<c000ff60>]
(ret_fast_syscall+0x0/0x54)
[   17.413772] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
udhcpc (v1.20.2) started
Sending discover...
[   18.690666] netcp-1.0 2620110.netcp eth0: Link is Up - 1Gbps/Full - flow
control off
Sending discover...
[   22.250972] netcp-1.0 2620110.netcp eth0: Link is Up - 1Gbps/Full - flow
control off
[   22.258721] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   22.265458] BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:616
[   22.273896] in_atomic(): 1, irqs_disabled(): 0, pid: 342, name: kworker/1:1
[   22.280854] INFO: lockdep is turned off.
[   22.284767] Preemption disabled at:[<  (null)>]   (null)
[   22.290074]
[   22.291568] CPU: 1 PID: 342 Comm: kworker/1:1 Tainted: G        W
4.1.6-01265-gfb1e101 #1
[   22.300255] Hardware name: Keystone
[   22.303750] Workqueue: ipv6_addrconf addrconf_dad_work
[   22.308895] [<c00178e4>] (unwind_backtrace) from [<c0013cbc>]
(show_stack+0x10/0x14)
[   22.316643] [<c0013cbc>] (show_stack) from [<c05ff450>]
(dump_stack+0x84/0xc4)
[   22.323867] [<c05ff450>] (dump_stack) from [<c06023b0>]
(mutex_lock_nested+0x28/0x4a8)
[   22.331786] [<c06023b0>] (mutex_lock_nested) from [<c03e9840>]
(netcp_set_rx_mode+0x160/0x210)
[   22.340394] [<c03e9840>] (netcp_set_rx_mode) from [<c04c9d18>]
(__dev_mc_add+0x54/0x68)
[   22.348401] [<c04c9d18>] (__dev_mc_add) from [<c05ab358>]
(igmp6_group_added+0x168/0x1b4)
[   22.356580] [<c05ab358>] (igmp6_group_added) from [<c05ad2cc>]
(ipv6_dev_mc_inc+0x4f0/0x5a8)
[   22.365019] [<c05ad2cc>] (ipv6_dev_mc_inc) from [<c058f0d0>]
(addrconf_dad_work+0x21c/0x33c)
[   22.373460] [<c058f0d0>] (addrconf_dad_work) from [<c0042850>]
(process_one_work+0x214/0x8d0)
[   22.381986] [<c0042850>] (process_one_work) from [<c0042f54>]
(worker_thread+0x48/0x4bc)
[   22.390071] [<c0042f54>] (worker_thread) from [<c004868c>]
(kthread+0xf0/0x108)
[   22.397381] [<c004868c>] (kthread) from [<c0010038>]

Trace related to incorrect usage of mutex inside ndo_set_rx_mode

[   24.086066] BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:616
[   24.094506] in_atomic(): 1, irqs_disabled(): 0, pid: 1682, name: ifconfig
[   24.101291] INFO: lockdep is turned off.
[   24.105203] Preemption disabled at:[<  (null)>]   (null)
[   24.110511]
[   24.112005] CPU: 2 PID: 1682 Comm: ifconfig Tainted: G        W
4.1.6-01265-gfb1e101 #1
[   24.120518] Hardware name: Keystone
[   24.124018] [<c00178e4>] (unwind_backtrace) from [<c0013cbc>]
(show_stack+0x10/0x14)
[   24.131772] [<c0013cbc>] (show_stack) from [<c05ff450>]
(dump_stack+0x84/0xc4)
[   24.138989] [<c05ff450>] (dump_stack) from [<c06023b0>]
(mutex_lock_nested+0x28/0x4a8)
[   24.146908] [<c06023b0>] (mutex_lock_nested) from [<c03e9840>]
(netcp_set_rx_mode+0x160/0x210)
[   24.155523] [<c03e9840>] (netcp_set_rx_mode) from [<c04c483c>]
(dev_set_rx_mode+0x1c/0x28)
[   24.163787] [<c04c483c>] (dev_set_rx_mode) from [<c04c490c>]
(__dev_open+0xc4/0x10c)
[   24.171531] [<c04c490c>] (__dev_open) from [<c04c4b74>]
(__dev_change_flags+0x94/0x144)
[   24.179528] [<c04c4b74>] (__dev_change_flags) from [<c04c4c3c>]
(dev_change_flags+0x18/0x48)
[   24.187966] [<c04c4c3c>] (dev_change_flags) from [<c053a9fc>]
(devinet_ioctl+0x6dc/0x7e4)
[   24.196145] [<c053a9fc>] (devinet_ioctl) from [<c04a59ec>]
(sock_ioctl+0x1d0/0x2a8)
[   24.203803] [<c04a59ec>] (sock_ioctl) from [<c0142844>]
(do_vfs_ioctl+0x41c/0x688)
[   24.211373] [<c0142844>] (do_vfs_ioctl) from [<c0142ae4>]
(SyS_ioctl+0x34/0x5c)
[   24.218676] [<c0142ae4>] (SyS_ioctl) from [<c000ff60>]
(ret_fast_syscall+0x0/0x54)
[   24.227156] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready

Signed-off-by: Murali Karicheri <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
shenki pushed a commit that referenced this pull request Nov 13, 2015
Jonathan Liu reports that the recent addition of CPU_SW_DOMAIN_PAN
causes wpa_supplicant to die due to the following kernel oops:

Unhandled fault: page domain fault (0x81b) at 0x001017a2
pgd = ee1b8000
[001017a2] *pgd=6ebee831, *pte=6c35475f, *ppte=6c354c7f
Internal error: : 81b [#1] SMP ARM
Modules linked in: rt2800usb rt2x00usb rt2800librt2x00lib crc_ccitt mac80211
CPU: 1 PID: 202 Comm: wpa_supplicant Not tainted 4.3.0-rc2 #1
Hardware name: Allwinner sun7i (A20) Family
task: ec872f80 ti: ee364000 task.ti: ee364000
PC is at do_alignment_ldmstm+0x1d4/0x238
LR is at 0x0
pc : [<c001d1d8>]    lr : [<00000000>]    psr: 600c0113
sp : ee365e18  ip : 00000000  fp : 00000002
r10: 001017a2  r9 : 00000002  r8 : 001017aa
r7 : ee365fb0  r6 : e8820018  r5 : 001017a2  r4 : 00000003
r3 : d49e30e0  r2 : 00000000  r1 : ee365fbc  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none[   34.393106] Control: 10c5387d  Table: 6e1b806a  DAC: 00000051
Process wpa_supplicant (pid: 202, stack limit = 0xee364210)
Stack: (0xee365e18 to 0xee366000)
...
[<c001d1d8>] (do_alignment_ldmstm) from [<c001d510>] (do_alignment+0x1f0/0x904)
[<c001d510>] (do_alignment) from [<c00092a0>] (do_DataAbort+0x38/0xb4)
[<c00092a0>] (do_DataAbort) from [<c0013d7c>] (__dabt_usr+0x3c/0x40)
Exception stack(0xee365fb0 to 0xee365ff8)
5fa0:                                     00000000 56c728c0 001017a2 d49e30e0
5fc0: 775448d2 597d4e74 00200800 7a9e1625 00802001 00000021 b6deec84 00000100
5fe0: 08020200 be9f4f20 0c0b0d0a b6d9b3e0 600c0010 ffffffff
Code: e1a0a005 e1a0000c 1affffe8 e5913000 (e4ea3001)
---[ end trace 0acd3882fcfdf9dd ]---

This is caused by the alignment handler not being fixed up for the
uaccess changes, and userspace issuing an unaligned LDM instruction.
So, fix the problem by adding the necessary fixups.

Reported-by: Jonathan Liu <[email protected]>
Tested-by: Jonathan Liu <[email protected]>
Signed-off-by: Russell King <[email protected]>
shenki pushed a commit that referenced this pull request Nov 13, 2015
kvm_set_cr0 may want to call kvm_zap_gfn_range and thus access the
memslots array (SRCU protected).  Using a mini SRCU critical section
is ugly, and adding it to kvm_arch_vcpu_create doesn't work because
the VMX vcpu_create callback calls synchronize_srcu.

Fixes this lockdep splat:

===============================
[ INFO: suspicious RCU usage. ]
4.3.0-rc1+ #1 Not tainted
-------------------------------
include/linux/kvm_host.h:488 suspicious rcu_dereference_check() usage!

other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
1 lock held by qemu-system-i38/17000:
 #0:  (&(&kvm->mmu_lock)->rlock){+.+...}, at: kvm_zap_gfn_range+0x24/0x1a0 [kvm]

[...]
Call Trace:
 dump_stack+0x4e/0x84
 lockdep_rcu_suspicious+0xfd/0x130
 kvm_zap_gfn_range+0x188/0x1a0 [kvm]
 kvm_set_cr0+0xde/0x1e0 [kvm]
 init_vmcb+0x760/0xad0 [kvm_amd]
 svm_create_vcpu+0x197/0x250 [kvm_amd]
 kvm_arch_vcpu_create+0x47/0x70 [kvm]
 kvm_vm_ioctl+0x302/0x7e0 [kvm]
 ? __lock_is_held+0x51/0x70
 ? __fget+0x101/0x210
 do_vfs_ioctl+0x2f4/0x560
 ? __fget_light+0x29/0x90
 SyS_ioctl+0x4c/0x90
 entry_SYSCALL_64_fastpath+0x16/0x73

Reported-by: Borislav Petkov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
shenki pushed a commit that referenced this pull request Nov 13, 2015
ppp_dev_uninit() locks all_ppp_mutex while under rtnl mutex protection.
ppp_create_interface() must then lock these mutexes in that same order
to avoid possible deadlock.

[  120.880011] ======================================================
[  120.880011] [ INFO: possible circular locking dependency detected ]
[  120.880011] 4.2.0 #1 Not tainted
[  120.880011] -------------------------------------------------------
[  120.880011] ppp-apitest/15827 is trying to acquire lock:
[  120.880011]  (&pn->all_ppp_mutex){+.+.+.}, at: [<ffffffffa0145f56>] ppp_dev_uninit+0x64/0xb0 [ppp_generic]
[  120.880011]
[  120.880011] but task is already holding lock:
[  120.880011]  (rtnl_mutex){+.+.+.}, at: [<ffffffff812e4255>] rtnl_lock+0x12/0x14
[  120.880011]
[  120.880011] which lock already depends on the new lock.
[  120.880011]
[  120.880011]
[  120.880011] the existing dependency chain (in reverse order) is:
[  120.880011]
[  120.880011] -> #1 (rtnl_mutex){+.+.+.}:
[  120.880011]        [<ffffffff81073a6f>] lock_acquire+0xcf/0x10e
[  120.880011]        [<ffffffff813ab18a>] mutex_lock_nested+0x56/0x341
[  120.880011]        [<ffffffff812e4255>] rtnl_lock+0x12/0x14
[  120.880011]        [<ffffffff812d9d94>] register_netdev+0x11/0x27
[  120.880011]        [<ffffffffa0147b17>] ppp_ioctl+0x289/0xc98 [ppp_generic]
[  120.880011]        [<ffffffff8113b367>] do_vfs_ioctl+0x4ea/0x532
[  120.880011]        [<ffffffff8113b3fd>] SyS_ioctl+0x4e/0x7d
[  120.880011]        [<ffffffff813ad7d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
[  120.880011]
[  120.880011] -> #0 (&pn->all_ppp_mutex){+.+.+.}:
[  120.880011]        [<ffffffff8107334e>] __lock_acquire+0xb07/0xe76
[  120.880011]        [<ffffffff81073a6f>] lock_acquire+0xcf/0x10e
[  120.880011]        [<ffffffff813ab18a>] mutex_lock_nested+0x56/0x341
[  120.880011]        [<ffffffffa0145f56>] ppp_dev_uninit+0x64/0xb0 [ppp_generic]
[  120.880011]        [<ffffffff812d5263>] rollback_registered_many+0x19e/0x252
[  120.880011]        [<ffffffff812d5381>] rollback_registered+0x29/0x38
[  120.880011]        [<ffffffff812d53fa>] unregister_netdevice_queue+0x6a/0x77
[  120.880011]        [<ffffffffa0146a94>] ppp_release+0x42/0x79 [ppp_generic]
[  120.880011]        [<ffffffff8112d9f6>] __fput+0xec/0x192
[  120.880011]        [<ffffffff8112dacc>] ____fput+0x9/0xb
[  120.880011]        [<ffffffff8105447a>] task_work_run+0x66/0x80
[  120.880011]        [<ffffffff81001801>] prepare_exit_to_usermode+0x8c/0xa7
[  120.880011]        [<ffffffff81001900>] syscall_return_slowpath+0xe4/0x104
[  120.880011]        [<ffffffff813ad931>] int_ret_from_sys_call+0x25/0x9f
[  120.880011]
[  120.880011] other info that might help us debug this:
[  120.880011]
[  120.880011]  Possible unsafe locking scenario:
[  120.880011]
[  120.880011]        CPU0                    CPU1
[  120.880011]        ----                    ----
[  120.880011]   lock(rtnl_mutex);
[  120.880011]                                lock(&pn->all_ppp_mutex);
[  120.880011]                                lock(rtnl_mutex);
[  120.880011]   lock(&pn->all_ppp_mutex);
[  120.880011]
[  120.880011]  *** DEADLOCK ***

Fixes: 8cb775b ("ppp: fix device unregistration upon netns deletion")
Reported-by: Sedat Dilek <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
shenki pushed a commit that referenced this pull request Nov 13, 2015
Andrey reported a panic:

[ 7249.865507] BUG: unable to handle kernel pointer dereference at 000000b4
[ 7249.865559] IP: [<c16afeca>] icmp_route_lookup+0xaa/0x320
[ 7249.865598] *pdpt = 0000000030f7f001 *pde = 0000000000000000
[ 7249.865637] Oops: 0000 [#1]
...
[ 7249.866811] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
4.3.0-999-generic #201509220155
[ 7249.866876] Hardware name: MSI MS-7250/MS-7250, BIOS 080014  08/02/2006
[ 7249.866916] task: c1a5ab00 ti: c1a52000 task.ti: c1a52000
[ 7249.866949] EIP: 0060:[<c16afeca>] EFLAGS: 00210246 CPU: 0
[ 7249.866981] EIP is at icmp_route_lookup+0xaa/0x320
[ 7249.867012] EAX: 00000000 EBX: f483ba48 ECX: 00000000 EDX: f2e18a00
[ 7249.867045] ESI: 000000c0 EDI: f483ba70 EBP: f483b9ec ESP: f483b974
[ 7249.867077]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 7249.867108] CR0: 8005003b CR2: 000000b4 CR3: 36ee07c0 CR4: 000006f0
[ 7249.867141] Stack:
[ 7249.867165]  320310ee 00000000 00000042 320310ee 00000000 c1aeca00
f3920240 f0c69180
[ 7249.867268]  f483ba04 f855058b a89b66cd f483ba44 f8962f4b 00000000
e659266c f483ba54
[ 7249.867361]  8004753c f483ba5c f8962f4b f2031140 000003c1 ffbd8fa0
c16b0e00 00000064
[ 7249.867448] Call Trace:
[ 7249.867494]  [<f855058b>] ? e1000_xmit_frame+0x87b/0xdc0 [e1000e]
[ 7249.867534]  [<f8962f4b>] ? tcp_in_window+0xeb/0xb10 [nf_conntrack]
[ 7249.867576]  [<f8962f4b>] ? tcp_in_window+0xeb/0xb10 [nf_conntrack]
[ 7249.867615]  [<c16b0e00>] ? icmp_send+0xa0/0x380
[ 7249.867648]  [<c16b102f>] icmp_send+0x2cf/0x380
[ 7249.867681]  [<f89c8126>] nf_send_unreach+0xa6/0xc0 [nf_reject_ipv4]
[ 7249.867714]  [<f89cd0da>] reject_tg+0x7a/0x9f [ipt_REJECT]
[ 7249.867746]  [<f88c29a7>] ipt_do_table+0x317/0x70c [ip_tables]
[ 7249.867780]  [<f895e0a6>] ? __nf_conntrack_find_get+0x166/0x3b0
[nf_conntrack]
[ 7249.867838]  [<f895eea8>] ? nf_conntrack_in+0x398/0x600 [nf_conntrack]
[ 7249.867889]  [<f84c0035>] iptable_filter_hook+0x35/0x80 [iptable_filter]
[ 7249.867933]  [<c16776a1>] nf_iterate+0x71/0x80
[ 7249.867970]  [<c1677715>] nf_hook_slow+0x65/0xc0
[ 7249.868002]  [<c1681811>] __ip_local_out_sk+0xc1/0xd0
[ 7249.868034]  [<c1680f30>] ? ip_forward_options+0x1a0/0x1a0
[ 7249.868066]  [<c1681836>] ip_local_out_sk+0x16/0x30
[ 7249.868097]  [<c1684054>] ip_send_skb+0x14/0x80
[ 7249.868129]  [<c16840f4>] ip_push_pending_frames+0x34/0x40
[ 7249.868163]  [<c16844a2>] ip_send_unicast_reply+0x282/0x310
[ 7249.868196]  [<c16a0863>] tcp_v4_send_reset+0x1b3/0x380
[ 7249.868227]  [<c16a1b63>] tcp_v4_rcv+0x323/0x990
[ 7249.868257]  [<c16776a1>] ? nf_iterate+0x71/0x80
[ 7249.868289]  [<c167dc2b>] ip_local_deliver_finish+0x8b/0x230
[ 7249.868322]  [<c167df4c>] ip_local_deliver+0x4c/0xa0
[ 7249.868353]  [<c167dba0>] ? ip_rcv_finish+0x390/0x390
[ 7249.868384]  [<c167d88c>] ip_rcv_finish+0x7c/0x390
[ 7249.868415]  [<c167e280>] ip_rcv+0x2e0/0x420
...

Prior to the VRF change the oif was not set in the flow struct, so the
VRF support should really have only added the vrf_master_ifindex lookup.

Fixes: 613d09b ("net: Use VRF device index for lookups on TX")
Cc: Andrey Melnikov <[email protected]>
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
shenki pushed a commit that referenced this pull request Nov 13, 2015
it was broken by 35d5d20
"mtd: mxc_nand: cleanup copy_spare function"

else we get the following error :
[   22.709507] ubi0: attaching mtd3
[   23.613470] ubi0: scanning is finished
[   23.617278] ubi0: empty MTD device detected
[   23.623219] Unhandled fault: imprecise external abort (0x1c06) at 0x9e62f0ec
[   23.630291] pgd = 9df80000
[   23.633005] [9e62f0ec] *pgd=8e60041e(bad)
[   23.637064] Internal error: : 1c06 [#1] SMP ARM
[   23.641605] Modules linked in:
[   23.644687] CPU: 0 PID: 99 Comm: ubiattach Not tainted 4.2.0-dirty #22
[   23.651222] Hardware name: Freescale i.MX53 (Device Tree Support)
[   23.657322] task: 9e687300 ti: 9dcfc000 task.ti: 9dcfc000
[   23.662744] PC is at memcpy16_toio+0x4c/0x74
[   23.667026] LR is at mxc_nand_command+0x484/0x640
[   23.671739] pc : [<803f9c08>]    lr : [<803faeb0>]    psr: 60000013
[   23.671739] sp : 9dcfdb10  ip : 9e62f0ea  fp : 9dcfdb1c
[   23.683222] r10: a09c1000  r9 : 0000001a  r8 : ffffffff
[   23.688453] r7 : ffffffff  r6 : 9e674810  r5 : 9e674810  r4 : 000000b6
[   23.694985] r3 : a09c16a4  r2 : a09c16a4  r1 : a09c16a4  r0 : 0000ffff
[   23.701521] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   23.708662] Control: 10c5387d  Table: 8df80019  DAC: 00000015
[   23.714413] Process ubiattach (pid: 99, stack limit = 0x9dcfc210)
[   23.720514] Stack: (0x9dcfdb10 to 0x9dcfe000)
[   23.724881] db00:                                     9dcfdb6c 9dcfdb20 803faeb0 803f9bc8
[   23.733069] db20: 803f227c 803f9b74 ffffffff 9e674810 9e674810 9e674810 00000040 9e62f010
[   23.741255] db40: 803faa2c 9e674b40 9e674810 803faa2c 00000400 803faa2c 00000000 9df42800
[   23.749441] db60: 9dcfdb9c 9dcfdb70 803f2024 803faa38 9e4201cc 00000000 803f0a78 9e674b40
[   23.757627] db80: 803f1f80 9e674810 00000400 00000400 9dcfdc14 9dcfdba0 803f3bd8 803f1f8c
[   23.765814] dba0: 9e4201cc 00000000 00000580 00000000 00000000 800718c0 0000007f 00001000
[   23.774000] dbc0: 9df42800 000000e0 00000000 00000000 9e4201cc 00000000 00000000 00000000
[   23.782186] dbe0: 00000580 00000580 00000000 9e674810 9dcfdc20 9dcfdce8 9df42800 00580000
[   23.790372] dc00: 00000000 00000400 9dcfdc6c 9dcfdc18 803f3f94 803f39a4 9dcfdc20 00000000
[   23.798558] dc20: 00000000 00000400 00000000 00000000 00000000 00000000 9df42800 00000000
[   23.806744] dc40: 9dcfdd0c 00580000 00000000 00000400 00000000 9df42800 9dee1000 9d802000
[   23.814930] dc60: 9dcfdc94 9dcfdc70 803eb63c 803f3f38 00000400 9dcfdce8 9df42800 dead4ead
[   23.823116] dc80: 803eb5f4 00000000 9dcfdcc4 9dcfdc98 803e82ac 803eb600 00000400 9dcfdce8
[   23.831301] dca0: 9df42800 00000400 9dee0000 00000000 00000400 00000000 9dcfdd1c 9dcfdcc8
[   23.839488] dcc0: 80406048 803e8230 00000400 9dcfdce8 9df42800 9dcfdc78 00000008 00000000
[   23.847673] dce0: 00000000 00000000 00000000 00000004 00000000 9df42800 9dee0000 00000000
[   23.855859] dd00: 9d802030 00000000 9dc8b214 9d802000 9dcfdd44 9dcfdd20 804066cc 80405f50
[   23.864047] dd20: 00000400 9dc8b200 9d802030 9df42800 9dee0000 9dc8b200 9dcfdd84 9dcfdd48
[   23.872233] dd40: 8040a544 804065ac 9e401c80 000080d0 9dcfdd84 00000001 800fc828 9df42400
[   23.880418] dd60: 00000000 00000080 9dc8b200 9dc8b200 9dc8b200 9dee0000 9dcfdddc 9dcfdd8
[   23.888605] dd80: 803fb560 8040a440 9dcfddc4 9dcfdd98 800f1428 9dee1000 a0acf000 00000000
[   23.896792] dda0: 00000000 ffffffff 00000006 00000000 9dee0000 9dee0000 00005600 00000080
[   23.904979] ddc0: 9dc8b200 a0acf000 9dc8b200 8112514c 9dcfde24 9dcfdde0 803fc08c 803fb4f0
[   23.913165] dde0: 9e401c80 00000013 9dcfde04 9dcfddf8 8006bbf8 8006ba00 9dcfde24 00000000
[   23.921351] de00: 9dee0000 00000065 9dee0000 00000001 9dc8b200 8112514c 9dcfde84 9dcfde28
[   23.929538] de20: 8040afa0 803fb948 ffffffff 00000000 9dc8b214 9dcfde40 800f1428 800f11dc
[   23.937724] de40: 9dc8b21c 9dc8b20c 9dc8b204 9dee1000 9dc8b214 8069bb60 fffff000 fffff000
[   23.945911] de60: 9e7b5400 00000000 9dee0000 9dee1000 00001000 9e7b5400 9dcfdecc 9dcfde88
[   23.954097] de80: 803ff1bc 8040a630 9dcfdea4 9dcfde98 00000800 00000800 9dcfdecc 9dcfdea8
[   23.962284] dea0: 803e8f6c 00000000 7e87ab70 9e7b5400 80113e30 00000003 9dcfc000 00000000
[   23.970470] dec0: 9dcfdf04 9dcfded0 804008cc 803feb98 ffffffff 00000003 00000000 00000000
[   23.978656] dee0: 00000000 00000000 9e7cb000 9dc193e0 7e87ab70 9dd92140 9dcfdf7c 9dcfdf08
[   23.986842] df00: 80113b5c 8040080c 800fbed8 8006bbf0 9e7cb000 00000003 9e7cb000 9dd92140
[   23.995029] df20: 9dc193e0 9dd92148 9dcfdf4c 9dcfdf38 8011022c 800fbe78 8000f9cc 9e687300
[   24.003216] df40: 9dcfdf6c 9dcfdf50 8011f798 8007ffe8 7e87ab70 9dd92140 00000003 9dd92140
[   24.011402] df60: 40186f40 7e87ab70 9dcfc000 00000000 9dcfdfa4 9dcfdf80 80113e30 8011373c
[   24.019588] df80: 7e87ab70 7e87ab70 7e87aea9 00000036 8000fb8 9dcfc000 00000000 9dcfdfa8
[   24.027775] dfa0: 8000f9a0 80113e00 7e87ab70 7e87ab70 00000003 40186f40 7e87ab70 00000000
[   24.035962] dfc0: 7e87ab70 7e87ab70 7e87aea9 00000036 00000000 00000000 76fd1f70 00000000
[   24.044148] dfe0: 76f80f8c 7e87ab28 00009810 76f80fc4 60000010 00000003 00000000 00000000
[   24.052328] Backtrace:
[   24.054806] [<803f9bbc>] (memcpy16_toio) from [<803faeb0>] (mxc_nand_command+0x484/0x640)
[   24.062996] [<803faa2c>] (mxc_nand_command) from [<803f2024>] (nand_write_page+0xa4/0x154)
[   24.071264]  r10:9df42800 r9:00000000 r8:803faa2c r7:00000400 r6:803faa2c r5:9e674810
[   24.079180]  r4:9e674b40
[   24.081738] [<803f1f80>] (nand_write_page) from [<803f3bd8>] (nand_do_write_ops+0x240/0x444)
[   24.090180]  r8:00000400 r7:00000400 r6:9e674810 r5:803f1f80 r4:9e674b40
[   24.096970] [<803f3998>] (nand_do_write_ops) from [<803f3f94>] (nand_write+0x68/0x88)
[   24.104804]  r10:00000400 r9:00000000 r8:00580000 r7:9df42800 r6:9dcfdce8 r5:9dcfdc20
[   24.112719]  r4:9e674810
[   24.115287] [<803f3f2c>] (nand_write) from [<803eb63c>] (part_write+0x48/0x50)
[   24.122514]  r10:9d802000 r9:9dee1000 r8:9df42800 r7:00000000 r6:00000400 r5:00000000
[   24.130429]  r4:00580000
[   24.132989] [<803eb5f4>] (part_write) from [<803e82ac>] (mtd_write+0x88/0xa0)
[   24.140129]  r5:00000000 r4:803eb5f4
[   24.143748] [<803e8224>] (mtd_write) from [<80406048>] (ubi_io_write+0x104/0x65c)
[   24.151235]  r7:00000000 r6:00000400 r5:00000000 r4:9dee0000
[   24.156968] [<80405f44>] (ubi_io_write) from [<804066cc>] (ubi_io_write_ec_hdr+0x12c/0x190)
[   24.165323]  r10:9d802000 r9:9dc8b214 r8:00000000 r7:9d802030 r6:00000000 r5:9dee0000
[   24.173239]  r4:9df42800
[   24.175798] [<804065a0>] (ubi_io_write_ec_hdr) from [<8040a544>] (ubi_early_get_peb+0x110/0x1f0)
[   24.184587]  r6:9dc8b200 r5:9dee0000 r4:9df42800
[   24.189262] [<8040a434>] (ubi_early_get_peb) from [<803fb560>] (create_vtbl+0x7c/0x238)
[   24.197271]  r10:9dee0000 r9:9dc8b200 r8:9dc8b200 r7:9dc8b200 r6:00000080 r5:00000000
[   24.205187]  r4:9df42400
[   24.207746] [<803fb4e4>] (create_vtbl) from [<803fc08c>] (ubi_read_volume_table+0x750/0xa64)
[   24.216187]  r10:8112514c r9:9dc8b200 r8:a0acf000 r7:9dc8b200 r6:00000080 r5:00005600
[   24.224103]  r4:9dee0000
[   24.226662] [<803fb93c>] (ubi_read_volume_table) from [<8040afa0>] (ubi_attach+0x97c/0x152c)
[   24.235103]  r10:8112514c r9:9dc8b200 r8:00000001 r7:9dee0000 r6:00000065 r5:9dee0000
[   24.243018]  r4:00000000
[   24.245579] [<8040a624>] (ubi_attach) from [<803ff1bc>] (ubi_attach_mtd_dev+0x630/0xbac)
[   24.253673]  r10:9e7b5400 r9:00001000 r8:9dee1000 r7:9dee0000 r6:00000000 r5:9e7b5400
[   24.261588]  r4:fffff000
[   24.264148] [<803feb8c>] (ubi_attach_mtd_dev) from [<804008cc>] (ctrl_cdev_ioctl+0xcc/0x1cc)
[   24.272589]  r10:00000000 r9:9dcfc000 r8:00000003 r7:80113e30 r6:9e7b5400 r5:7e87ab70
[   24.280505]  r4:00000000
[   24.283070] [<80400800>] (ctrl_cdev_ioctl) from [<80113b5c>] (do_vfs_ioctl+0x42c/0x6c4)
[   24.291077]  r6:9dd92140 r5:7e87ab70 r4:9dc193e0
[   24.295753] [<80113730>] (do_vfs_ioctl) from [<80113e30>] (SyS_ioctl+0x3c/0x64)
[   24.303066]  r10:00000000 r9:9dcfc000 r8:7e87ab70 r7:40186f40 r6:9dd92140 r5:00000003
[   24.310981]  r4:9dd92140
[   24.313549] [<80113df4>] (SyS_ioctl) from [<8000f9a0>] (ret_fast_syscall+0x0/0x54)
[   24.321123]  r9:9dcfc000 r8:8000fb84 r7:00000036 r6:7e87aea9 r5:7e87ab70 r4:7e87ab70
[   24.328957] Code: e1c300b0 e1510002 e1a03001 1afffff9 (e89da800)
[   24.335066] ---[ end trace ab1cb17887f21bbb ]---
[   24.340249] Unhandled fault: imprecise external abort (0x1c06) at 0x7ee8bcf0
[   24.347310] pgd = 9df3c000
[   24.350023] [7ee8bcf0] *pgd=8dcbf831, *pte=8eb3334f, *ppte=8eb3383f
Segmentation fault

Fixes: 35d5d20 ("mtd: mxc_nand: cleanup copy_spare function")
Signed-off-by: Eric Bénard <[email protected]>
Reviewed-by: Uwe Kleine-König <[email protected]>
Reviewed-by: Baruch Siach <[email protected]>
Cc: <[email protected]>
Signed-off-by: Brian Norris <[email protected]>
shenki pushed a commit that referenced this pull request Nov 13, 2015
…t initialized.

In case something goes wrong with power well initialization we were calling
intel_prepare_ddi during boot while encoder list isnt't initilized.

[    9.618747] i915 0000:00:02.0: Invalid ROM contents
[    9.631446] [drm] failed to find VBIOS tables
[    9.720036] BUG: unable to handle kernel NULL pointer dereference at 00000000
00000058
[    9.721986] IP: [<ffffffffa014eb72>] ddi_get_encoder_port+0x82/0x190 [i915]
[    9.723736] PGD 0
[    9.724286] Oops: 0000 [#1] PREEMPT SMP
[    9.725386] Modules linked in: intel_powerclamp snd_hda_intel(+) coretemp crc
32c_intel snd_hda_codec snd_hda_core serio_raw snd_pcm snd_timer i915(+) parport
_pc parport pinctrl_sunrisepoint pinctrl_intel nfsd nfs_acl
[    9.730635] CPU: 0 PID: 497 Comm: systemd-udevd Not tainted 4.3.0-rc2-eywa-10
967-g72de2cfd-dirty #2
[    9.732785] Hardware name: Intel Corporation Cannonlake Client platform/Skyla
ke DT DDR4 RVP8, BIOS CNLSE2R1.R00.X021.B00.1508040310 08/04/2015
[    9.735785] task: ffff88008a704700 ti: ffff88016a1ac000 task.ti: ffff88016a1a
c000
[    9.737584] RIP: 0010:[<ffffffffa014eb72>]  [<ffffffffa014eb72>] ddi_get_enco
der_port+0x82/0x190 [i915]
[    9.739934] RSP: 0000:ffff88016a1af710  EFLAGS: 00010296
[    9.741184] RAX: 000000000000004e RBX: ffff88008a9edc98 RCX: 0000000000000001
[    9.742934] RDX: 000000000000004e RSI: ffffffff81fc1e82 RDI: 00000000ffffffff
[    9.744634] RBP: ffff88016a1af730 R08: 0000000000000000 R09: 0000000000000578
[    9.746333] R10: 0000000000001065 R11: 0000000000000578 R12: fffffffffffffff8
[    9.748033] R13: ffff88016a1af7a8 R14: ffff88016a1af794 R15: 0000000000000000
[    9.749733] FS:  00007eff2e1e07c0(0000) GS:ffff88016fc00000(0000) knlGS:00000
00000000000
[    9.751683] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    9.753083] CR2: 0000000000000058 CR3: 000000016922b000 CR4: 00000000003406f0
[    9.754782] Stack:
[    9.755332]  ffff88008a9edc98 ffff88008a9ed800 ffffffffa01d07b0 00000000fffb9
09e
[    9.757232]  ffff88016a1af7d8 ffffffffa0154ea7 0000000000000246 ffff88016a370
080
[    9.759182]  ffff88016a370080 ffff88008a9ed800 0000000000000246 ffff88008a9ed
c98
[    9.761132] Call Trace:
[    9.761782]  [<ffffffffa0154ea7>] intel_prepare_ddi+0x67/0x860 [i915]
[    9.763332]  [<ffffffff81a56996>] ? _raw_spin_unlock_irqrestore+0x26/0x40
[    9.765031]  [<ffffffffa00fad01>] ? gen9_read32+0x141/0x360 [i915]
[    9.766531]  [<ffffffffa00b43e1>] skl_set_power_well+0x431/0xa80 [i915]
[    9.768181]  [<ffffffffa00b4a63>] skl_power_well_enable+0x13/0x20 [i915]
[    9.769781]  [<ffffffffa00b2188>] intel_power_well_enable+0x28/0x50 [i915]
[    9.771481]  [<ffffffffa00b4d52>] intel_display_power_get+0x92/0xc0 [i915]
[    9.773180]  [<ffffffffa00b4fcb>] intel_display_set_init_power+0x3b/0x40 [i91
5]
[    9.774980]  [<ffffffffa00b5170>] intel_power_domains_init_hw+0x120/0x520 [i9
15]
[    9.776780]  [<ffffffffa0194c61>] i915_driver_load+0xb21/0xf40 [i915]

So let's protect this case.

My first attempt was to remove the intel_prepare_ddi, but Daniel had pointed out
this is really needed to restore those registers values. And Imre pointed out
that this case was without the flag protection and this was actually where things
were going bad. So I've just checked and this indeed solves my issue.

The regressing intel_prepare_ddi call was added in

commit 1d2b952
Author: Damien Lespiau <[email protected]>
Date:   Fri Mar 6 18:50:53 2015 +0000

    drm/i915/skl: Restore the DDI translation tables when enabling PW1

Cc: Imre Deak <[email protected]>
Cc: Daniel Vetter <[email protected]>
Signed-off-by: Rodrigo Vivi <[email protected]>
Reviewed-by: Imre Deak <[email protected]>
[Jani: regression reference]
Signed-off-by: Jani Nikula <[email protected]>
shenki pushed a commit that referenced this pull request Nov 13, 2015
…ut event

A case can occur when sctp_accept() is called by the user during
a heartbeat timeout event after the 4-way handshake.  Since
sctp_assoc_migrate() changes both assoc->base.sk and assoc->ep, the
bh_sock_lock in sctp_generate_heartbeat_event() will be taken with
the listening socket but released with the new association socket.
The result is a deadlock on any future attempts to take the listening
socket lock.

Note that this race can occur with other SCTP timeouts that take
the bh_lock_sock() in the event sctp_accept() is called.

 BUG: soft lockup - CPU#9 stuck for 67s! [swapper:0]
 ...
 RIP: 0010:[<ffffffff8152d48e>]  [<ffffffff8152d48e>] _spin_lock+0x1e/0x30
 RSP: 0018:ffff880028323b20  EFLAGS: 00000206
 RAX: 0000000000000002 RBX: ffff880028323b20 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffff880028323be0 RDI: ffff8804632c4b48
 RBP: ffffffff8100bb93 R08: 0000000000000000 R09: 0000000000000000
 R10: ffff880610662280 R11: 0000000000000100 R12: ffff880028323aa0
 R13: ffff8804383c3880 R14: ffff880028323a90 R15: ffffffff81534225
 FS:  0000000000000000(0000) GS:ffff880028320000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
 CR2: 00000000006df528 CR3: 0000000001a85000 CR4: 00000000000006e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process swapper (pid: 0, threadinfo ffff880616b70000, task ffff880616b6cab0)
 Stack:
 ffff880028323c40 ffffffffa01c2582 ffff880614cfb020 0000000000000000
 <d> 0100000000000000 00000014383a6c44 ffff8804383c3880 ffff880614e93c00
 <d> ffff880614e93c00 0000000000000000 ffff8804632c4b00 ffff8804383c38b8
 Call Trace:
 <IRQ>
 [<ffffffffa01c2582>] ? sctp_rcv+0x492/0xa10 [sctp]
 [<ffffffff8148c559>] ? nf_iterate+0x69/0xb0
 [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff8148c716>] ? nf_hook_slow+0x76/0x120
 [<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff8149757d>] ? ip_local_deliver_finish+0xdd/0x2d0
 [<ffffffff81497808>] ? ip_local_deliver+0x98/0xa0
 [<ffffffff81496ccd>] ? ip_rcv_finish+0x12d/0x440
 [<ffffffff81497255>] ? ip_rcv+0x275/0x350
 [<ffffffff8145cfeb>] ? __netif_receive_skb+0x4ab/0x750
 ...

With lockdep debugging:

 =====================================
 [ BUG: bad unlock balance detected! ]
 -------------------------------------
 CslRx/12087 is trying to release lock (slock-AF_INET) at:
 [<ffffffffa01bcae0>] sctp_generate_timeout_event+0x40/0xe0 [sctp]
 but there are no more locks to release!

 other info that might help us debug this:
 2 locks held by CslRx/12087:
 #0:  (&asoc->timers[i]){+.-...}, at: [<ffffffff8108ce1f>] run_timer_softirq+0x16f/0x3e0
 #1:  (slock-AF_INET){+.-...}, at: [<ffffffffa01bcac3>] sctp_generate_timeout_event+0x23/0xe0 [sctp]

Ensure the socket taken is also the same one that is released by
saving a copy of the socket before entering the timeout event
critical section.

Signed-off-by: Karl Heiss <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
shenki pushed a commit that referenced this pull request Nov 13, 2015
Fixes the following lockdep splat:
[    1.244527] =============================================
[    1.245193] [ INFO: possible recursive locking detected ]
[    1.245193] 4.2.0-rc1+ #37 Not tainted
[    1.245193] ---------------------------------------------
[    1.245193] cp/742 is trying to acquire lock:
[    1.245193]  (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff812b3f69>] ubifs_init_security+0x29/0xb0
[    1.245193]
[    1.245193] but task is already holding lock:
[    1.245193]  (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff81198e7f>] path_openat+0x3af/0x1280
[    1.245193]
[    1.245193] other info that might help us debug this:
[    1.245193]  Possible unsafe locking scenario:
[    1.245193]
[    1.245193]        CPU0
[    1.245193]        ----
[    1.245193]   lock(&sb->s_type->i_mutex_key#9);
[    1.245193]   lock(&sb->s_type->i_mutex_key#9);
[    1.245193]
[    1.245193]  *** DEADLOCK ***
[    1.245193]
[    1.245193]  May be due to missing lock nesting notation
[    1.245193]
[    1.245193] 2 locks held by cp/742:
[    1.245193]  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff811ad37f>] mnt_want_write+0x1f/0x50
[    1.245193]  #1:  (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff81198e7f>] path_openat+0x3af/0x1280
[    1.245193]
[    1.245193] stack backtrace:
[    1.245193] CPU: 2 PID: 742 Comm: cp Not tainted 4.2.0-rc1+ #37
[    1.245193] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140816_022509-build35 04/01/2014
[    1.245193]  ffffffff8252d530 ffff88007b023a38 ffffffff814f6f49 ffffffff810b56c5
[    1.245193]  ffff88007c30cc80 ffff88007b023af8 ffffffff810a150d ffff88007b023a68
[    1.245193]  000000008101302a ffff880000000000 00000008f447e23f ffffffff8252d500
[    1.245193] Call Trace:
[    1.245193]  [<ffffffff814f6f49>] dump_stack+0x4c/0x65
[    1.245193]  [<ffffffff810b56c5>] ? console_unlock+0x1c5/0x510
[    1.245193]  [<ffffffff810a150d>] __lock_acquire+0x1a6d/0x1ea0
[    1.245193]  [<ffffffff8109fa78>] ? __lock_is_held+0x58/0x80
[    1.245193]  [<ffffffff810a1a93>] lock_acquire+0xd3/0x270
[    1.245193]  [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0
[    1.245193]  [<ffffffff814fc83b>] mutex_lock_nested+0x6b/0x3a0
[    1.245193]  [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0
[    1.245193]  [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0
[    1.245193]  [<ffffffff812b3f69>] ubifs_init_security+0x29/0xb0
[    1.245193]  [<ffffffff8128e286>] ubifs_create+0xa6/0x1f0
[    1.245193]  [<ffffffff81198e7f>] ? path_openat+0x3af/0x1280
[    1.245193]  [<ffffffff81195d15>] vfs_create+0x95/0xc0
[    1.245193]  [<ffffffff8119929c>] path_openat+0x7cc/0x1280
[    1.245193]  [<ffffffff8109ffe3>] ? __lock_acquire+0x543/0x1ea0
[    1.245193]  [<ffffffff81088f20>] ? sched_clock_cpu+0x90/0xc0
[    1.245193]  [<ffffffff81088c00>] ? calc_global_load_tick+0x60/0x90
[    1.245193]  [<ffffffff81088f20>] ? sched_clock_cpu+0x90/0xc0
[    1.245193]  [<ffffffff811a9cef>] ? __alloc_fd+0xaf/0x180
[    1.245193]  [<ffffffff8119ac55>] do_filp_open+0x75/0xd0
[    1.245193]  [<ffffffff814ffd86>] ? _raw_spin_unlock+0x26/0x40
[    1.245193]  [<ffffffff811a9cef>] ? __alloc_fd+0xaf/0x180
[    1.245193]  [<ffffffff81189bd9>] do_sys_open+0x129/0x200
[    1.245193]  [<ffffffff81189cc9>] SyS_open+0x19/0x20
[    1.245193]  [<ffffffff81500717>] entry_SYSCALL_64_fastpath+0x12/0x6f

While the lockdep splat is a false positive, becuase path_openat holds i_mutex
of the parent directory and ubifs_init_security() tries to acquire i_mutex
of a new inode, it reveals that taking i_mutex in ubifs_init_security() is
in vain because it is only being called in the inode allocation path
and therefore nobody else can see the inode yet.

Cc: [email protected] # 3.20-
Reported-and-tested-by: Boris Brezillon <[email protected]>
Reviewed-and-tested-by: Dongsheng Yang <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
Signed-off-by: [email protected]
shenki pushed a commit that referenced this pull request Nov 13, 2015
My system keeps crashing with below message. vmstat_update() schedules a delayed
work in current cpu and expects the work runs in the cpu.
schedule_delayed_work() is expected to make delayed work run in local cpu. The
problem is timer can be migrated with NO_HZ. __queue_work() queues work in
timer handler, which could run in a different cpu other than where the delayed
work is scheduled. The end result is the delayed work runs in different cpu.
The patch makes __queue_delayed_work records local cpu earlier. Where the timer
runs doesn't change where the work runs with the change.

[   28.010131] ------------[ cut here ]------------
[   28.010609] kernel BUG at ../mm/vmstat.c:1392!
[   28.011099] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[   28.011860] Modules linked in:
[   28.012245] CPU: 0 PID: 289 Comm: kworker/0:3 Tainted: G        W4.3.0-rc3+ torvalds#634
[   28.013065] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153802- 04/01/2014
[   28.014160] Workqueue: events vmstat_update
[   28.014571] task: ffff880117682580 ti: ffff8800ba428000 task.ti: ffff8800ba428000
[   28.015445] RIP: 0010:[<ffffffff8115f921>]  [<ffffffff8115f921>]vmstat_update+0x31/0x80
[   28.016282] RSP: 0018:ffff8800ba42fd80  EFLAGS: 00010297
[   28.016812] RAX: 0000000000000000 RBX: ffff88011a858dc0 RCX:0000000000000000
[   28.017585] RDX: ffff880117682580 RSI: ffffffff81f14d8c RDI:ffffffff81f4df8d
[   28.018366] RBP: ffff8800ba42fd90 R08: 0000000000000001 R09:0000000000000000
[   28.019169] R10: 0000000000000000 R11: 0000000000000121 R12:ffff8800baa9f640
[   28.019947] R13: ffff88011a81e340 R14: ffff88011a823700 R15:0000000000000000
[   28.020071] FS:  0000000000000000(0000) GS:ffff88011a800000(0000)knlGS:0000000000000000
[   28.020071] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   28.020071] CR2: 00007ff6144b01d0 CR3: 00000000b8e93000 CR4:00000000000006f0
[   28.020071] Stack:
[   28.020071]  ffff88011a858dc0 ffff8800baa9f640 ffff8800ba42fe00ffffffff8106bd88
[   28.020071]  ffffffff8106bd0b 0000000000000096 0000000000000000ffffffff82f9b1e8
[   28.020071]  ffffffff829f0b10 0000000000000000 ffffffff81f18460ffff88011a81e340
[   28.020071] Call Trace:
[   28.020071]  [<ffffffff8106bd88>] process_one_work+0x1c8/0x540
[   28.020071]  [<ffffffff8106bd0b>] ? process_one_work+0x14b/0x540
[   28.020071]  [<ffffffff8106c214>] worker_thread+0x114/0x460
[   28.020071]  [<ffffffff8106c100>] ? process_one_work+0x540/0x540
[   28.020071]  [<ffffffff81071bf8>] kthread+0xf8/0x110
[   28.020071]  [<ffffffff81071b00>] ?kthread_create_on_node+0x200/0x200
[   28.020071]  [<ffffffff81a6522f>] ret_from_fork+0x3f/0x70
[   28.020071]  [<ffffffff81071b00>] ?kthread_create_on_node+0x200/0x200

Signed-off-by: Shaohua Li <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Cc: [email protected] # v2.6.31+
shenki pushed a commit that referenced this pull request Nov 13, 2015
Dmitry Vyukov reported the following using trinity and the memory
error detector AddressSanitizer
(https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel).

[ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on
address ffff88002e280000
[ 124.576801] ffff88002e280000 is located 131938492886538 bytes to
the left of 28857600-byte region [ffffffff81282e0a, ffffffff82e0830a)
[ 124.578633] Accessed by thread T10915:
[ 124.579295] inlined in describe_heap_address
./arch/x86/mm/asan/report.c:164
[ 124.579295] #0 ffffffff810dd277 in asan_report_error
./arch/x86/mm/asan/report.c:278
[ 124.580137] #1 ffffffff810dc6a0 in asan_check_region
./arch/x86/mm/asan/asan.c:37
[ 124.581050] #2 ffffffff810dd423 in __tsan_read8 ??:0
[ 124.581893] #3 ffffffff8107c093 in get_wchan
./arch/x86/kernel/process_64.c:444

The address checks in the 64bit implementation of get_wchan() are
wrong in several ways:

 - The lower bound of the stack is not the start of the stack
   page. It's the start of the stack page plus sizeof (struct
   thread_info)

 - The upper bound must be:

       top_of_stack - TOP_OF_KERNEL_STACK_PADDING - 2 * sizeof(unsigned long).

   The 2 * sizeof(unsigned long) is required because the stack pointer
   points at the frame pointer. The layout on the stack is: ... IP FP
   ... IP FP. So we need to make sure that both IP and FP are in the
   bounds.

Fix the bound checks and get rid of the mix of numeric constants, u64
and unsigned long. Making all unsigned long allows us to use the same
function for 32bit as well.

Use READ_ONCE() when accessing the stack. This does not prevent a
concurrent wakeup of the task and the stack changing, but at least it
avoids TOCTOU.

Also check task state at the end of the loop. Again that does not
prevent concurrent changes, but it avoids walking for nothing.

Add proper comments while at it.

Reported-by: Dmitry Vyukov <[email protected]>
Reported-by: Sasha Levin <[email protected]>
Based-on-patch-from: Wolfram Gloger <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Reviewed-by: Dmitry Vyukov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Kostya Serebryany <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: kasan-dev <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Wolfram Gloger <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
shenki pushed a commit that referenced this pull request Nov 13, 2015
…fy a fault

SunDong reported the following on

  https://bugzilla.kernel.org/show_bug.cgi?id=103841

	I think I find a linux bug, I have the test cases is constructed. I
	can stable recurring problems in fedora22(4.0.4) kernel version,
	arch for x86_64.  I construct transparent huge page, when the parent
	and child process with MAP_SHARE, MAP_PRIVATE way to access the same
	huge page area, it has the opportunity to lead to huge page copy on
	write failure, and then it will munmap the child corresponding mmap
	area, but then the child mmap area with VM_MAYSHARE attributes, child
	process munmap this area can trigger VM_BUG_ON in set_vma_resv_flags
	functions (vma - > vm_flags & VM_MAYSHARE).

There were a number of problems with the report (e.g.  it's hugetlbfs that
triggers this, not transparent huge pages) but it was fundamentally
correct in that a VM_BUG_ON in set_vma_resv_flags() can be triggered that
looks like this

	 vma ffff8804651fd0d0 start 00007fc474e00000 end 00007fc475e00000
	 next ffff8804651fd018 prev ffff8804651fd188 mm ffff88046b1b1800
	 prot 8000000000000027 anon_vma           (null) vm_ops ffffffff8182a7a0
	 pgoff 0 file ffff88106bdb9800 private_data           (null)
	 flags: 0x84400fb(read|write|shared|mayread|maywrite|mayexec|mayshare|dontexpand|hugetlb)
	 ------------
	 kernel BUG at mm/hugetlb.c:462!
	 SMP
	 Modules linked in: xt_pkttype xt_LOG xt_limit [..]
	 CPU: 38 PID: 26839 Comm: map Not tainted 4.0.4-default #1
	 Hardware name: Dell Inc. PowerEdge R810/0TT6JF, BIOS 2.7.4 04/26/2012
	 set_vma_resv_flags+0x2d/0x30

The VM_BUG_ON is correct because private and shared mappings have
different reservation accounting but the warning clearly shows that the
VMA is shared.

When a private COW fails to allocate a new page then only the process
that created the VMA gets the page -- all the children unmap the page.
If the children access that data in the future then they get killed.

The problem is that the same file is mapped shared and private.  During
the COW, the allocation fails, the VMAs are traversed to unmap the other
private pages but a shared VMA is found and the bug is triggered.  This
patch identifies such VMAs and skips them.

Signed-off-by: Mel Gorman <[email protected]>
Reported-by: SunDong <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: David Rientjes <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
shenki pushed a commit that referenced this pull request Nov 13, 2015
Since 9eb1e57
drm/dp/mst: make sure mst_primary mstb is valid in work function

we validate the mstb structs in the work function, and doing
that takes a reference. So we should never get here with the
work function running using the mstb device, only if the work
function hasn't run yet or is running for another mstb.

So we don't need to sync the work here, this was causing
lockdep spew as below.

[  +0.000160] =============================================
[  +0.000001] [ INFO: possible recursive locking detected ]
[  +0.000002] 3.10.0-320.el7.rhel72.stable.backport.3.x86_64.debug #1 Tainted: G        W      ------------
[  +0.000001] ---------------------------------------------
[  +0.000001] kworker/4:2/1262 is trying to acquire lock:
[  +0.000001]  ((&mgr->work)){+.+.+.}, at: [<ffffffff810b29a5>] flush_work+0x5/0x2e0
[  +0.000007]
but task is already holding lock:
[  +0.000001]  ((&mgr->work)){+.+.+.}, at: [<ffffffff810b57e4>] process_one_work+0x1b4/0x710
[  +0.000004]
other info that might help us debug this:
[  +0.000001]  Possible unsafe locking scenario:

[  +0.000002]        CPU0
[  +0.000000]        ----
[  +0.000001]   lock((&mgr->work));
[  +0.000002]   lock((&mgr->work));
[  +0.000001]
 *** DEADLOCK ***

[  +0.000001]  May be due to missing lock nesting notation

[  +0.000002] 2 locks held by kworker/4:2/1262:
[  +0.000001]  #0:  (events_long){.+.+.+}, at: [<ffffffff810b57e4>] process_one_work+0x1b4/0x710
[  +0.000004]  #1:  ((&mgr->work)){+.+.+.}, at: [<ffffffff810b57e4>] process_one_work+0x1b4/0x710
[  +0.000003]
stack backtrace:
[  +0.000003] CPU: 4 PID: 1262 Comm: kworker/4:2 Tainted: G        W      ------------   3.10.0-320.el7.rhel72.stable.backport.3.x86_64.debug #1
[  +0.000001] Hardware name: LENOVO 20EGS0R600/20EGS0R600, BIOS GNET71WW (2.19 ) 02/05/2015
[  +0.000008] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
[  +0.000001]  ffffffff82c26c90 00000000a527b914 ffff88046399bae8 ffffffff816fe04d
[  +0.000004]  ffff88046399bb58 ffffffff8110f47f ffff880461438000 0001009b840fc003
[  +0.000002]  ffff880461438a98 0000000000000000 0000000804dc26e1 ffffffff824a2c00
[  +0.000003] Call Trace:
[  +0.000004]  [<ffffffff816fe04d>] dump_stack+0x19/0x1b
[  +0.000004]  [<ffffffff8110f47f>] __lock_acquire+0x115f/0x1250
[  +0.000002]  [<ffffffff8110fd49>] lock_acquire+0x99/0x1e0
[  +0.000002]  [<ffffffff810b29a5>] ? flush_work+0x5/0x2e0
[  +0.000002]  [<ffffffff810b29ee>] flush_work+0x4e/0x2e0
[  +0.000002]  [<ffffffff810b29a5>] ? flush_work+0x5/0x2e0
[  +0.000004]  [<ffffffff81025905>] ? native_sched_clock+0x35/0x80
[  +0.000002]  [<ffffffff81025959>] ? sched_clock+0x9/0x10
[  +0.000002]  [<ffffffff810da1f5>] ? local_clock+0x25/0x30
[  +0.000002]  [<ffffffff8110dca9>] ? mark_held_locks+0xb9/0x140
[  +0.000003]  [<ffffffff810b4ed5>] ? __cancel_work_timer+0x95/0x160
[  +0.000002]  [<ffffffff810b4ee8>] __cancel_work_timer+0xa8/0x160
[  +0.000002]  [<ffffffff810b4fb0>] cancel_work_sync+0x10/0x20
[  +0.000007]  [<ffffffffa0160d17>] drm_dp_destroy_mst_branch_device+0x27/0x120 [drm_kms_helper]
[  +0.000006]  [<ffffffffa0163968>] drm_dp_mst_link_probe_work+0x78/0xa0 [drm_kms_helper]
[  +0.000002]  [<ffffffff810b5850>] process_one_work+0x220/0x710
[  +0.000002]  [<ffffffff810b57e4>] ? process_one_work+0x1b4/0x710
[  +0.000005]  [<ffffffff810b5e5b>] worker_thread+0x11b/0x3a0
[  +0.000003]  [<ffffffff810b5d40>] ? process_one_work+0x710/0x710
[  +0.000002]  [<ffffffff810beced>] kthread+0xed/0x100
[  +0.000003]  [<ffffffff810bec00>] ? insert_kthread_work+0x80/0x80
[  +0.000003]  [<ffffffff817121d8>] ret_from_fork+0x58/0x90

v2: add flush_work.

Reviewed-by: Daniel Vetter <[email protected]>
Cc: [email protected]
Signed-off-by: Dave Airlie <[email protected]>
shenki pushed a commit that referenced this pull request Nov 13, 2015
When function graph tracer is enabled, the following operation
will trigger panic:

mount -t debugfs nodev /sys/kernel
echo next_tgid > /sys/kernel/tracing/set_ftrace_filter
echo function_graph > /sys/kernel/tracing/current_tracer
ls /proc/

------------[ cut here ]------------
[  198.501417] Unable to handle kernel paging request at virtual address cb88537fdc8ba316
[  198.506126] pgd = ffffffc008f79000
[  198.509363] [cb88537fdc8ba316] *pgd=00000000488c6003, *pud=00000000488c6003, *pmd=0000000000000000
[  198.517726] Internal error: Oops: 94000005 [#1] SMP
[  198.518798] Modules linked in:
[  198.520582] CPU: 1 PID: 1388 Comm: ls Tainted: G
[  198.521800] Hardware name: linux,dummy-virt (DT)
[  198.522852] task: ffffffc0fa9e8000 ti: ffffffc0f9ab0000 task.ti: ffffffc0f9ab0000
[  198.524306] PC is at next_tgid+0x30/0x100
[  198.525205] LR is at return_to_handler+0x0/0x20
[  198.526090] pc : [<ffffffc0002a1070>] lr : [<ffffffc0000907c0>] pstate: 60000145
[  198.527392] sp : ffffffc0f9ab3d40
[  198.528084] x29: ffffffc0f9ab3d40 x28: ffffffc0f9ab0000
[  198.529406] x27: ffffffc000d6a000 x26: ffffffc000b786e8
[  198.530659] x25: ffffffc0002a1900 x24: ffffffc0faf16c00
[  198.531942] x23: ffffffc0f9ab3ea0 x22: 0000000000000002
[  198.533202] x21: ffffffc000d85050 x20: 0000000000000002
[  198.534446] x19: 0000000000000002 x18: 0000000000000000
[  198.535719] x17: 000000000049fa08 x16: ffffffc000242efc
[  198.537030] x15: 0000007fa472b54c x14: ffffffffff000000
[  198.538347] x13: ffffffc0fada84a0 x12: 0000000000000001
[  198.539634] x11: ffffffc0f9ab3d70 x10: ffffffc0f9ab3d70
[  198.540915] x9 : ffffffc0000907c0 x8 : ffffffc0f9ab3d40
[  198.542215] x7 : 0000002e330f08f0 x6 : 0000000000000015
[  198.543508] x5 : 0000000000000f08 x4 : ffffffc0f9835ec0
[  198.544792] x3 : cb88537fdc8ba316 x2 : cb88537fdc8ba306
[  198.546108] x1 : 0000000000000002 x0 : ffffffc000d85050
[  198.547432]
[  198.547920] Process ls (pid: 1388, stack limit = 0xffffffc0f9ab0020)
[  198.549170] Stack: (0xffffffc0f9ab3d40 to 0xffffffc0f9ab4000)
[  198.582568] Call trace:
[  198.583313] [<ffffffc0002a1070>] next_tgid+0x30/0x100
[  198.584359] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
[  198.585503] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
[  198.586574] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
[  198.587660] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
[  198.588896] Code: aa0003f5 2a0103f4 b4000102 91004043 (885f7c60)
[  198.591092] ---[ end trace 6a346f8f20949ac8 ]---

This is because when using function graph tracer, if the traced
function return value is in multi regs ([x0-x7]), return_to_handler
may corrupt them. So in return_to_handler, the parameter regs should
be protected properly.

Cc: <[email protected]> # 3.18+
Signed-off-by: Li Bin <[email protected]>
Acked-by: AKASHI Takahiro <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
shenki pushed a commit that referenced this pull request Nov 13, 2015
Commit 1a3d595 ("MIPS: Tidy up FPU context switching") removed FP
context saving from the asm-written resume function in favour of reusing
existing code to perform the same task. However it only removed the FP
context saving code from the r4k_switch.S implementation of resume.
Octeon uses its own implementation in octeon_switch.S, so remove FP
context saving there too in order to prevent attempting to save context
twice. That formerly led to an exception from the second save as follows
because the FPU had already been disabled by the first save:

    do_cpu invoked from kernel context![#1]:
    CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.3.0-rc2-dirty #2
    task: 800000041f84a008 ti: 800000041f864000 task.ti: 800000041f864000
    $ 0   : 0000000000000000 0000000010008ce1 0000000000100000 ffffffffbfffffff
    $ 4   : 800000041f84a008 800000041f84ac08 800000041f84c000 0000000000000004
    $ 8   : 0000000000000001 0000000000000000 0000000000000000 0000000000000001
    $12   : 0000000010008ce3 0000000000119c60 0000000000000036 800000041f864000
    $16   : 800000041f84ac08 800000000792ce80 800000041f84a008 ffffffff81758b00
    $20   : 0000000000000000 ffffffff8175ae50 0000000000000000 ffffffff8176c740
    $24   : 0000000000000006 ffffffff81170300
    $28   : 800000041f864000 800000041f867d90 0000000000000000 ffffffff815f3fa0
    Hi    : 0000000000fa8257
    Lo    : ffffffffe15cfc00
    epc   : ffffffff8112821c resume+0x9c/0x200
    ra    : ffffffff815f3fa0 __schedule+0x3f0/0x7d8
    Status: 10008ce2        KX SX UX KERNEL EXL
    Cause : 1080002c (ExcCode 0b)
    PrId  : 000d0601 (Cavium Octeon+)
    Modules linked in:
    Process kthreadd (pid: 2, threadinfo=800000041f864000, task=800000041f84a008, tls=0000000000000000)
    Stack : ffffffff81604218 ffffffff815f7e08 800000041f84a008 ffffffff811681b0
              800000041f84a008 ffffffff817e9878 0000000000000000 ffffffff81770000
              ffffffff81768340 ffffffff81161398 0000000000000001 0000000000000000
              0000000000000000 ffffffff815f4424 0000000000000000 ffffffff81161d68
              ffffffff81161be8 0000000000000000 0000000000000000 0000000000000000
              0000000000000000 0000000000000000 0000000000000000 ffffffff8111e16c
              0000000000000000 0000000000000000 0000000000000000 0000000000000000
              0000000000000000 0000000000000000 0000000000000000 0000000000000000
              0000000000000000 0000000000000000 0000000000000000 0000000000000000
              0000000000000000 0000000000000000 0000000000000000 0000000000000000
              ...
    Call Trace:
    [<ffffffff8112821c>] resume+0x9c/0x200
    [<ffffffff815f3fa0>] __schedule+0x3f0/0x7d8
    [<ffffffff815f4424>] schedule+0x34/0x98
    [<ffffffff81161d68>] kthreadd+0x180/0x198
    [<ffffffff8111e16c>] ret_from_kernel_thread+0x14/0x1c

Tested using cavium_octeon_defconfig on an EdgeRouter Lite.

Fixes: 1a3d595 ("MIPS: Tidy up FPU context switching")
Reported-by: Aaro Koskinen <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: Aleksey Makarov <[email protected]>
Cc: [email protected]
Cc: Chandrakala Chavva <[email protected]>
Cc: David Daney <[email protected]>
Cc: Leonid Rosenboim <[email protected]>
Patchwork: https://patchwork.linux-mips.org/patch/11166/
Signed-off-by: Ralf Baechle <[email protected]>
shenki pushed a commit that referenced this pull request Nov 13, 2015
My colleague ran into a program stall on a x86_64 server, where
n_tty_read() was waiting for data even if there was data in the buffer
in the pty.  kernel stack for the stuck process looks like below.
 #0 [ffff88303d107b58] __schedule at ffffffff815c4b20
 #1 [ffff88303d107bd0] schedule at ffffffff815c513e
 #2 [ffff88303d107bf0] schedule_timeout at ffffffff815c7818
 #3 [ffff88303d107ca0] wait_woken at ffffffff81096bd2
 #4 [ffff88303d107ce0] n_tty_read at ffffffff8136fa23
 #5 [ffff88303d107dd0] tty_read at ffffffff81368013
 #6 [ffff88303d107e20] __vfs_read at ffffffff811a3704
 #7 [ffff88303d107ec0] vfs_read at ffffffff811a3a57
 #8 [ffff88303d107f00] sys_read at ffffffff811a4306
 #9 [ffff88303d107f50] entry_SYSCALL_64_fastpath at ffffffff815c86d7

There seems to be two problems causing this issue.

First, in drivers/tty/n_tty.c, __receive_buf() stores the data and
updates ldata->commit_head using smp_store_release() and then checks
the wait queue using waitqueue_active().  However, since there is no
memory barrier, __receive_buf() could return without calling
wake_up_interactive_poll(), and at the same time, n_tty_read() could
start to wait in wait_woken() as in the following chart.

        __receive_buf()                         n_tty_read()
------------------------------------------------------------------------
if (waitqueue_active(&tty->read_wait))
/* Memory operations issued after the
   RELEASE may be completed before the
   RELEASE operation has completed */
                                        add_wait_queue(&tty->read_wait, &wait);
                                        ...
                                        if (!input_available_p(tty, 0)) {
smp_store_release(&ldata->commit_head,
                  ldata->read_head);
                                        ...
                                        timeout = wait_woken(&wait,
                                          TASK_INTERRUPTIBLE, timeout);
------------------------------------------------------------------------

The second problem is that n_tty_read() also lacks a memory barrier
call and could also cause __receive_buf() to return without calling
wake_up_interactive_poll(), and n_tty_read() to wait in wait_woken()
as in the chart below.

        __receive_buf()                         n_tty_read()
------------------------------------------------------------------------
                                        spin_lock_irqsave(&q->lock, flags);
                                        /* from add_wait_queue() */
                                        ...
                                        if (!input_available_p(tty, 0)) {
                                        /* Memory operations issued after the
                                           RELEASE may be completed before the
                                           RELEASE operation has completed */
smp_store_release(&ldata->commit_head,
                  ldata->read_head);
if (waitqueue_active(&tty->read_wait))
                                        __add_wait_queue(q, wait);
                                        spin_unlock_irqrestore(&q->lock,flags);
                                        /* from add_wait_queue() */
                                        ...
                                        timeout = wait_woken(&wait,
                                          TASK_INTERRUPTIBLE, timeout);
------------------------------------------------------------------------

There are also other places in drivers/tty/n_tty.c which have similar
calls to waitqueue_active(), so instead of adding many memory barrier
calls, this patch simply removes the call to waitqueue_active(),
leaving just wake_up*() behind.

This fixes both problems because, even though the memory access before
or after the spinlocks in both wake_up*() and add_wait_queue() can
sneak into the critical section, it cannot go past it and the critical
section assures that they will be serialized (please see "INTER-CPU
ACQUIRING BARRIER EFFECTS" in Documentation/memory-barriers.txt for a
better explanation).  Moreover, the resulting code is much simpler.

Latency measurement using a ping-pong test over a pty doesn't show any
visible performance drop.

Signed-off-by: Kosuke Tatsukawa <[email protected]>
Cc: [email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
shenki pushed a commit that referenced this pull request Nov 13, 2015
Since commit 2b018d5 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release"),
pppoe_release() calls dev_put(po->pppoe_dev) if sk is in the
PPPOX_ZOMBIE state. But pppoe_flush_dev() can set sk->sk_state to
PPPOX_ZOMBIE _and_ reset po->pppoe_dev to NULL. This leads to the
following oops:

[  570.140800] BUG: unable to handle kernel NULL pointer dereference at 00000000000004e0
[  570.142931] IP: [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe]
[  570.144601] PGD 3d119067 PUD 3dbc1067 PMD 0
[  570.144601] Oops: 0000 [#1] SMP
[  570.144601] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppoe pppox ppp_generic slhc loop crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ansi_cprng aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper acpi_cpufreq evdev serio_raw processor button ext4 crc16 mbcache jbd2 virtio_net virtio_blk virtio_pci virtio_ring virtio
[  570.144601] CPU: 1 PID: 15738 Comm: ppp-apitest Not tainted 4.2.0 #1
[  570.144601] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[  570.144601] task: ffff88003d30d600 ti: ffff880036b60000 task.ti: ffff880036b60000
[  570.144601] RIP: 0010:[<ffffffffa018c701>]  [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe]
[  570.144601] RSP: 0018:ffff880036b63e08  EFLAGS: 00010202
[  570.144601] RAX: 0000000000000000 RBX: ffff880034340000 RCX: 0000000000000206
[  570.144601] RDX: 0000000000000006 RSI: ffff88003d30dd20 RDI: ffff88003d30dd20
[  570.144601] RBP: ffff880036b63e28 R08: 0000000000000001 R09: 0000000000000000
[  570.144601] R10: 00007ffee9b50420 R11: ffff880034340078 R12: ffff8800387ec780
[  570.144601] R13: ffff8800387ec7b0 R14: ffff88003e222aa0 R15: ffff8800387ec7b0
[  570.144601] FS:  00007f5672f48700(0000) GS:ffff88003fc80000(0000) knlGS:0000000000000000
[  570.144601] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  570.144601] CR2: 00000000000004e0 CR3: 0000000037f7e000 CR4: 00000000000406a0
[  570.144601] Stack:
[  570.144601]  ffffffffa018f240 ffff8800387ec780 ffffffffa018f240 ffff8800387ec7b0
[  570.144601]  ffff880036b63e48 ffffffff812caabe ffff880039e4e000 0000000000000008
[  570.144601]  ffff880036b63e58 ffffffff812cabad ffff880036b63ea8 ffffffff811347f5
[  570.144601] Call Trace:
[  570.144601]  [<ffffffff812caabe>] sock_release+0x1a/0x75
[  570.144601]  [<ffffffff812cabad>] sock_close+0xd/0x11
[  570.144601]  [<ffffffff811347f5>] __fput+0xff/0x1a5
[  570.144601]  [<ffffffff811348cb>] ____fput+0x9/0xb
[  570.144601]  [<ffffffff81056682>] task_work_run+0x66/0x90
[  570.144601]  [<ffffffff8100189e>] prepare_exit_to_usermode+0x8c/0xa7
[  570.144601]  [<ffffffff81001a26>] syscall_return_slowpath+0x16d/0x19b
[  570.144601]  [<ffffffff813babb1>] int_ret_from_sys_call+0x25/0x9f
[  570.144601] Code: 48 8b 83 c8 01 00 00 a8 01 74 12 48 89 df e8 8b 27 14 e1 b8 f7 ff ff ff e9 b7 00 00 00 8a 43 12 a8 0b 74 1c 48 8b 83 a8 04 00 00 <48> 8b 80 e0 04 00 00 65 ff 08 48 c7 83 a8 04 00 00 00 00 00 00
[  570.144601] RIP  [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe]
[  570.144601]  RSP <ffff880036b63e08>
[  570.144601] CR2: 00000000000004e0
[  570.200518] ---[ end trace 46956baf17349563 ]---

pppoe_flush_dev() has no reason to override sk->sk_state with
PPPOX_ZOMBIE. pppox_unbind_sock() already sets sk->sk_state to
PPPOX_DEAD, which is the correct state given that sk is unbound and
po->pppoe_dev is NULL.

Fixes: 2b018d5 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release")
Tested-by: Oleksii Berezhniak <[email protected]>
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
amboar pushed a commit that referenced this pull request Oct 10, 2024
…acntion

commit c3b47f4 upstream.

[BUG]
Syzbot reported a NULL pointer dereference with the following crash:

  FAULT_INJECTION: forcing a failure.
   start_transaction+0x830/0x1670 fs/btrfs/transaction.c:676
   prepare_to_relocate+0x31f/0x4c0 fs/btrfs/relocation.c:3642
   relocate_block_group+0x169/0xd20 fs/btrfs/relocation.c:3678
  ...
  BTRFS info (device loop0): balance: ended with status: -12
  Oops: general protection fault, probably for non-canonical address 0xdffffc00000000cc: 0000 [#1] PREEMPT SMP KASAN NOPTI
  KASAN: null-ptr-deref in range [0x0000000000000660-0x0000000000000667]
  RIP: 0010:btrfs_update_reloc_root+0x362/0xa80 fs/btrfs/relocation.c:926
  Call Trace:
   <TASK>
   commit_fs_roots+0x2ee/0x720 fs/btrfs/transaction.c:1496
   btrfs_commit_transaction+0xfaf/0x3740 fs/btrfs/transaction.c:2430
   del_balance_item fs/btrfs/volumes.c:3678 [inline]
   reset_balance_state+0x25e/0x3c0 fs/btrfs/volumes.c:3742
   btrfs_balance+0xead/0x10c0 fs/btrfs/volumes.c:4574
   btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3673
   vfs_ioctl fs/ioctl.c:51 [inline]
   __do_sys_ioctl fs/ioctl.c:907 [inline]
   __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

[CAUSE]
The allocation failure happens at the start_transaction() inside
prepare_to_relocate(), and during the error handling we call
unset_reloc_control(), which makes fs_info->balance_ctl to be NULL.

Then we continue the error path cleanup in btrfs_balance() by calling
reset_balance_state() which will call del_balance_item() to fully delete
the balance item in the root tree.

However during the small window between set_reloc_contrl() and
unset_reloc_control(), we can have a subvolume tree update and created a
reloc_root for that subvolume.

Then we go into the final btrfs_commit_transaction() of
del_balance_item(), and into btrfs_update_reloc_root() inside
commit_fs_roots().

That function checks if fs_info->reloc_ctl is in the merge_reloc_tree
stage, but since fs_info->reloc_ctl is NULL, it results a NULL pointer
dereference.

[FIX]
Just add extra check on fs_info->reloc_ctl inside
btrfs_update_reloc_root(), before checking
fs_info->reloc_ctl->merge_reloc_tree.

That DEAD_RELOC_TREE handling is to prevent further modification to the
reloc tree during merge stage, but since there is no reloc_ctl at all,
we do not need to bother that.

Reported-by: [email protected]
Link: https://lore.kernel.org/linux-btrfs/[email protected]/
CC: [email protected] # 4.19+
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
amboar pushed a commit that referenced this pull request Oct 10, 2024
[ Upstream commit cf7c278 ]

Syzkaller reported the following bug:

  general protection fault, probably for non-canonical address 0xdffffc0000000038: 0000 [#1] SMP KASAN
  KASAN: null-ptr-deref in range [0x00000000000001c0-0x00000000000001c7]
  Call Trace:
   lock_acquire
   lock_acquire+0x1ce/0x4f0
   down_read+0x93/0x4a0
   iommufd_test_syz_conv_iova+0x56/0x1f0
   iommufd_test_access_rw.isra.0+0x2ec/0x390
   iommufd_test+0x1058/0x1e30
   iommufd_fops_ioctl+0x381/0x510
   vfs_ioctl
   __do_sys_ioctl
   __se_sys_ioctl
   __x64_sys_ioctl+0x170/0x1e0
   do_syscall_x64
   do_syscall_64+0x71/0x140

This is because the new iommufd_access_change_ioas() sets access->ioas to
NULL during its process, so the lock might be gone in a concurrent racing
context.

Fix this by doing the same access->ioas sanity as iommufd_access_rw() and
iommufd_access_pin_pages() functions do.

Cc: [email protected]
Fixes: 9227da7 ("iommufd: Add iommufd_access_change_ioas(_id) helpers")
Link: https://lore.kernel.org/r/3f1932acaf1dd494d404c04364d73ce8f57f3e5e.1708636627.git.nicolinc@nvidia.com
Reported-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
(cherry picked from commit cf7c278)
[Harshit: CVE-2024-26785; Resolve conflicts due to missing commit:
 bd7a282 ("iommufd: Add iommufd_ctx to iommufd_put_object()") in
 6.6.y]
Signed-off-by: Harshit Mogalapalli <[email protected]>
Signed-off-by: Vegard Nossum <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
amboar pushed a commit that referenced this pull request Oct 10, 2024
[ Upstream commit aeedaee ]

Moved IRQ registration down to end of adv7511_probe().

If an IRQ already is pending during adv7511_probe
(before adv7511_cec_init) then cec_received_msg_ts
could crash using uninitialized data:

    Unable to handle kernel read from unreadable memory at virtual address 00000000000003d5
    Internal error: Oops: 96000004 [#1] PREEMPT_RT SMP
    Call trace:
     cec_received_msg_ts+0x48/0x990 [cec]
     adv7511_cec_irq_process+0x1cc/0x308 [adv7511]
     adv7511_irq_process+0xd8/0x120 [adv7511]
     adv7511_irq_handler+0x1c/0x30 [adv7511]
     irq_thread_fn+0x30/0xa0
     irq_thread+0x14c/0x238
     kthread+0x190/0x1a8

Fixes: 3b1b975 ("drm: adv7511/33: add HDMI CEC support")
Signed-off-by: Mads Bligaard Nielsen <[email protected]>
Signed-off-by: Alvin Šipraga <[email protected]>
Reviewed-by: Robert Foss <[email protected]>
Signed-off-by: Robert Foss <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/20240219-adv7511-cec-irq-crash-fix-v2-1-245e53c4b96f@bang-olufsen.dk
(cherry picked from commit aeedaee)
[Harshit: CVE-2024-26876; Resolve conflicts due to missing commit:
 c755512 ("drm: adv7511: Add has_dsi variable to struct
 adv7511_chip_info") in 6.6.y and adv7511_chip_info struct is also not
 defined]
Signed-off-by: Harshit Mogalapalli <[email protected]>
Signed-off-by: Vegard Nossum <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
amboar pushed a commit that referenced this pull request Oct 10, 2024
…mit_queues'

[ Upstream commit a2db328 ]

Writing 'power' and 'submit_queues' concurrently will trigger kernel
panic:

Test script:

modprobe null_blk nr_devices=0
mkdir -p /sys/kernel/config/nullb/nullb0
while true; do echo 1 > submit_queues; echo 4 > submit_queues; done &
while true; do echo 1 > power; echo 0 > power; done

Test result:

BUG: kernel NULL pointer dereference, address: 0000000000000148
Oops: 0000 [#1] PREEMPT SMP
RIP: 0010:__lock_acquire+0x41d/0x28f0
Call Trace:
 <TASK>
 lock_acquire+0x121/0x450
 down_write+0x5f/0x1d0
 simple_recursive_removal+0x12f/0x5c0
 blk_mq_debugfs_unregister_hctxs+0x7c/0x100
 blk_mq_update_nr_hw_queues+0x4a3/0x720
 nullb_update_nr_hw_queues+0x71/0xf0 [null_blk]
 nullb_device_submit_queues_store+0x79/0xf0 [null_blk]
 configfs_write_iter+0x119/0x1e0
 vfs_write+0x326/0x730
 ksys_write+0x74/0x150

This is because del_gendisk() can concurrent with
blk_mq_update_nr_hw_queues():

nullb_device_power_store	nullb_apply_submit_queues
 null_del_dev
 del_gendisk
				 nullb_update_nr_hw_queues
				  if (!dev->nullb)
				  // still set while gendisk is deleted
				   return 0
				  blk_mq_update_nr_hw_queues
 dev->nullb = NULL

Fix this problem by resuing the global mutex to protect
nullb_device_power_store() and nullb_update_nr_hw_queues() from configfs.

Fixes: 45919fb ("null_blk: Enable modifying 'submit_queues' after an instance has been configured")
Reported-and-tested-by: Yi Zhang <[email protected]>
Closes: https://lore.kernel.org/all/CAHj4cs9LgsHLnjg8z06LQ3Pr5cax-+Ps+xT7AP7TPnEjStuwZA@mail.gmail.com/
Signed-off-by: Yu Kuai <[email protected]>
Reviewed-by: Zhu Yanjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
(cherry picked from commit a2db328)
[Harshit: CVE-2024-36478; Resolve conflicts due to missing commit:
 e440626 ("null_blk: pass queue_limits to blk_mq_alloc_disk") in
 6.6.y]
Signed-off-by: Harshit Mogalapalli <[email protected]>
Signed-off-by: Vegard Nossum <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
amboar pushed a commit that referenced this pull request Oct 10, 2024
commit 221af82 upstream.

Since commit 3f8ca2e ("vhost/scsi: Extract common handling code
from control queue handler") a null pointer dereference bug can be
triggered when guest sends an SCSI AN request.

In vhost_scsi_ctl_handle_vq(), `vc.target` is assigned with
`&v_req.tmf.lun[1]` within a switch-case block and is then passed to
vhost_scsi_get_req() which extracts `vc->req` and `tpg`. However, for
a `VIRTIO_SCSI_T_AN_*` request, tpg is not required, so `vc.target` is
set to NULL in this branch. Later, in vhost_scsi_get_req(),
`vc->target` is dereferenced without being checked, leading to a null
pointer dereference bug. This bug can be triggered from guest.

When this bug occurs, the vhost_worker process is killed while holding
`vq->mutex` and the corresponding tpg will remain occupied
indefinitely.

Below is the KASAN report:
Oops: general protection fault, probably for non-canonical address
0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 PID: 840 Comm: poc Not tainted 6.10.0+ #1
Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS
1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:vhost_scsi_get_req+0x165/0x3a0
Code: 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 2b 02 00 00
48 b8 00 00 00 00 00 fc ff df 4d 8b 65 30 4c 89 e2 48 c1 ea 03 <0f> b6
04 02 4c 89 e2 83 e2 07 38 d0 7f 08 84 c0 0f 85 be 01 00 00
RSP: 0018:ffff888017affb50 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffff88801b000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888017affcb8
RBP: ffff888017affb80 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888017affc88 R14: ffff888017affd1c R15: ffff888017993000
FS:  000055556e076500(0000) GS:ffff88806b100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000200027c0 CR3: 0000000010ed0004 CR4: 0000000000370ef0
Call Trace:
 <TASK>
 ? show_regs+0x86/0xa0
 ? die_addr+0x4b/0xd0
 ? exc_general_protection+0x163/0x260
 ? asm_exc_general_protection+0x27/0x30
 ? vhost_scsi_get_req+0x165/0x3a0
 vhost_scsi_ctl_handle_vq+0x2a4/0xca0
 ? __pfx_vhost_scsi_ctl_handle_vq+0x10/0x10
 ? __switch_to+0x721/0xeb0
 ? __schedule+0xda5/0x5710
 ? __kasan_check_write+0x14/0x30
 ? _raw_spin_lock+0x82/0xf0
 vhost_scsi_ctl_handle_kick+0x52/0x90
 vhost_run_work_list+0x134/0x1b0
 vhost_task_fn+0x121/0x350
...
 </TASK>
---[ end trace 0000000000000000 ]---

Let's add a check in vhost_scsi_get_req.

Fixes: 3f8ca2e ("vhost/scsi: Extract common handling code from control queue handler")
Signed-off-by: Haoran Zhang <[email protected]>
[whitespace fixes]
Signed-off-by: Mike Christie <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
amboar pushed a commit that referenced this pull request Oct 10, 2024
commit 9af2efe upstream.

The fields in the hist_entry are filled on-demand which means they only
have meaningful values when relevant sort keys are used.

So if neither of 'dso' nor 'sym' sort keys are used, the map/symbols in
the hist entry can be garbage.  So it shouldn't access it
unconditionally.

I got a segfault, when I wanted to see cgroup profiles.

  $ sudo perf record -a --all-cgroups --synth=cgroup true

  $ sudo perf report -s cgroup

  Program received signal SIGSEGV, Segmentation fault.
  0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
  48		return RC_CHK_ACCESS(map)->dso;
  (gdb) bt
  #0  0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
  #1  0x00005555557aa39b in map__load (map=0x0) at util/map.c:344
  #2  0x00005555557aa592 in map__find_symbol (map=0x0, addr=140736115941088) at util/map.c:385
  #3  0x00005555557ef000 in hists__findnew_entry (hists=0x555556039d60, entry=0x7fffffffa4c0, al=0x7fffffffa8c0, sample_self=true)
      at util/hist.c:644
  #4  0x00005555557ef61c in __hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
      block_info=0x0, sample=0x7fffffffaa90, sample_self=true, ops=0x0) at util/hist.c:761
  #5  0x00005555557ef71f in hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
      sample=0x7fffffffaa90, sample_self=true) at util/hist.c:779
  #6  0x00005555557f00fb in iter_add_single_normal_entry (iter=0x7fffffffa900, al=0x7fffffffa8c0) at util/hist.c:1015
  #7  0x00005555557f09a7 in hist_entry_iter__add (iter=0x7fffffffa900, al=0x7fffffffa8c0, max_stack_depth=127, arg=0x7fffffffbce0)
      at util/hist.c:1260
  #8  0x00005555555ba7ce in process_sample_event (tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0,
      machine=0x5555560388e8) at builtin-report.c:334
  #9  0x00005555557b30c8 in evlist__deliver_sample (evlist=0x555556039010, tool=0x7fffffffbce0, event=0x7ffff7c14128,
      sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at util/session.c:1232
  #10 0x00005555557b32bc in machines__deliver_event (machines=0x5555560388e8, evlist=0x555556039010, event=0x7ffff7c14128,
      sample=0x7fffffffaa90, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1271
  #11 0x00005555557b3848 in perf_session__deliver_event (session=0x5555560386d0, event=0x7ffff7c14128, tool=0x7fffffffbce0,
      file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1354
  #12 0x00005555557affaf in ordered_events__deliver_event (oe=0x555556038e60, event=0x555556135aa0) at util/session.c:132
  #13 0x00005555557bb605 in do_flush (oe=0x555556038e60, show_progress=false) at util/ordered-events.c:245
  #14 0x00005555557bb95c in __ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND, timestamp=0) at util/ordered-events.c:324
  #15 0x00005555557bba46 in ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND) at util/ordered-events.c:342
  #16 0x00005555557b1b3b in perf_event__process_finished_round (tool=0x7fffffffbce0, event=0x7ffff7c15bb8, oe=0x555556038e60)
      at util/session.c:780
  #17 0x00005555557b3b27 in perf_session__process_user_event (session=0x5555560386d0, event=0x7ffff7c15bb8, file_offset=117688,
      file_path=0x555556038ff0 "perf.data") at util/session.c:1406

As you can see the entry->ms.map was NULL even if he->ms.map has a
value.  This is because 'sym' sort key is not given, so it cannot assume
whether he->ms.sym and entry->ms.sym is the same.  I only checked the
'sym' sort key here as it implies 'dso' behavior (so maps are the same).

Fixes: ac01c8c ("perf hist: Update hist symbol when updating maps")
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
dkodihal pushed a commit to NVIDIA/linux that referenced this pull request Oct 15, 2024
commit 146a15b upstream.

Prior to LLVM 15.0.0, LLVM's integrated assembler would incorrectly
byte-swap NOP when compiling for big-endian, and the resulting series of
bytes happened to match the encoding of FNMADD S21, S30, S0, S0.

This went unnoticed until commit:

  34f66c4 ("arm64: Use a positive cpucap for FP/SIMD")

Prior to that commit, the kernel would always enable the use of FPSIMD
early in boot when __cpu_setup() initialized CPACR_EL1, and so usage of
FNMADD within the kernel was not detected, but could result in the
corruption of user or kernel FPSIMD state.

After that commit, the instructions happen to trap during boot prior to
FPSIMD being detected and enabled, e.g.

| Unhandled 64-bit el1h sync exception on CPU0, ESR 0x000000001fe00000 -- ASIMD
| CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc3-00013-g34f66c4c4d55 openbmc#1
| Hardware name: linux,dummy-virt (DT)
| pstate: 400000c9 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : __pi_strcmp+0x1c/0x150
| lr : populate_properties+0xe4/0x254
| sp : ffffd014173d3ad0
| x29: ffffd014173d3af0 x28: fffffbfffddffcb8 x27: 0000000000000000
| x26: 0000000000000058 x25: fffffbfffddfe054 x24: 0000000000000008
| x23: fffffbfffddfe000 x22: fffffbfffddfe000 x21: fffffbfffddfe044
| x20: ffffd014173d3b70 x19: 0000000000000001 x18: 0000000000000005
| x17: 0000000000000010 x16: 0000000000000000 x15: 00000000413e7000
| x14: 0000000000000000 x13: 0000000000001bcc x12: 0000000000000000
| x11: 00000000d00dfeed x10: ffffd414193f2cd0 x9 : 0000000000000000
| x8 : 0101010101010101 x7 : ffffffffffffffc0 x6 : 0000000000000000
| x5 : 0000000000000000 x4 : 0101010101010101 x3 : 000000000000002a
| x2 : 0000000000000001 x1 : ffffd014171f2988 x0 : fffffbfffddffcb8
| Kernel panic - not syncing: Unhandled exception
| CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc3-00013-g34f66c4c4d55 openbmc#1
| Hardware name: linux,dummy-virt (DT)
| Call trace:
|  dump_backtrace+0xec/0x108
|  show_stack+0x18/0x2c
|  dump_stack_lvl+0x50/0x68
|  dump_stack+0x18/0x24
|  panic+0x13c/0x340
|  el1t_64_irq_handler+0x0/0x1c
|  el1_abort+0x0/0x5c
|  el1h_64_sync+0x64/0x68
|  __pi_strcmp+0x1c/0x150
|  unflatten_dt_nodes+0x1e8/0x2d8
|  __unflatten_device_tree+0x5c/0x15c
|  unflatten_device_tree+0x38/0x50
|  setup_arch+0x164/0x1e0
|  start_kernel+0x64/0x38c
|  __primary_switched+0xbc/0xc4

Restrict CONFIG_CPU_BIG_ENDIAN to a known good assembler, which is
either GNU as or LLVM's IAS 15.0.0 and newer, which contains the linked
commit.

Closes: ClangBuiltLinux#1948
Link: llvm/llvm-project@1379b15
Signed-off-by: Nathan Chancellor <[email protected]>
Cc: [email protected]
Acked-by: Mark Rutland <[email protected]>
Link: https://lore.kernel.org/r/20231025-disable-arm64-be-ias-b4-llvm-15-v1-1-b25263ed8b23@kernel.org
Signed-off-by: Catalin Marinas <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
dkodihal pushed a commit to NVIDIA/linux that referenced this pull request Oct 15, 2024
commit bb32500 upstream.

The following can crash the kernel:

 # cd /sys/kernel/tracing
 # echo 'p:sched schedule' > kprobe_events
 # exec 5>>events/kprobes/sched/enable
 # > kprobe_events
 # exec 5>&-

The above commands:

 1. Change directory to the tracefs directory
 2. Create a kprobe event (doesn't matter what one)
 3. Open bash file descriptor 5 on the enable file of the kprobe event
 4. Delete the kprobe event (removes the files too)
 5. Close the bash file descriptor 5

The above causes a crash!

 BUG: kernel NULL pointer dereference, address: 0000000000000028
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [openbmc#1] PREEMPT SMP PTI
 CPU: 6 PID: 877 Comm: bash Not tainted 6.5.0-rc4-test-00008-g2c6b6b1029d4-dirty openbmc#186
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
 RIP: 0010:tracing_release_file_tr+0xc/0x50

What happens here is that the kprobe event creates a trace_event_file
"file" descriptor that represents the file in tracefs to the event. It
maintains state of the event (is it enabled for the given instance?).
Opening the "enable" file gets a reference to the event "file" descriptor
via the open file descriptor. When the kprobe event is deleted, the file is
also deleted from the tracefs system which also frees the event "file"
descriptor.

But as the tracefs file is still opened by user space, it will not be
totally removed until the final dput() is called on it. But this is not
true with the event "file" descriptor that is already freed. If the user
does a write to or simply closes the file descriptor it will reference the
event "file" descriptor that was just freed, causing a use-after-free bug.

To solve this, add a ref count to the event "file" descriptor as well as a
new flag called "FREED". The "file" will not be freed until the last
reference is released. But the FREE flag will be set when the event is
removed to prevent any more modifications to that event from happening,
even if there's still a reference to the event "file" descriptor.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]

Cc: [email protected]
Cc: Mark Rutland <[email protected]>
Fixes: f5ca233 ("tracing: Increase trace array ref count on enable and filter files")
Reported-by: Beau Belgrave <[email protected]>
Tested-by: Beau Belgrave <[email protected]>
Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
dkodihal pushed a commit to NVIDIA/linux that referenced this pull request Oct 15, 2024
commit af1689a upstream.

Validate offsets and lengths before dereferencing create contexts in
smb2_parse_contexts().

This fixes following oops when accessing invalid create contexts from
server:

  BUG: unable to handle page fault for address: ffff8881178d8cc3
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 4a01067 P4D 4a01067 PUD 0
  Oops: 0000 [openbmc#1] PREEMPT SMP NOPTI
  CPU: 3 PID: 1736 Comm: mount.cifs Not tainted 6.7.0-rc4 openbmc#1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
  rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
  RIP: 0010:smb2_parse_contexts+0xa0/0x3a0 [cifs]
  Code: f8 10 75 13 48 b8 93 ad 25 50 9c b4 11 e7 49 39 06 0f 84 d2 00
  00 00 8b 45 00 85 c0 74 61 41 29 c5 48 01 c5 41 83 fd 0f 76 55 <0f> b7
  7d 04 0f b7 45 06 4c 8d 74 3d 00 66 83 f8 04 75 bc ba 04 00
  RSP: 0018:ffffc900007939e0 EFLAGS: 00010216
  RAX: ffffc90000793c78 RBX: ffff8880180cc000 RCX: ffffc90000793c90
  RDX: ffffc90000793cc0 RSI: ffff8880178d8cc0 RDI: ffff8880180cc000
  RBP: ffff8881178d8cbf R08: ffffc90000793c22 R09: 0000000000000000
  R10: ffff8880180cc000 R11: 0000000000000024 R12: 0000000000000000
  R13: 0000000000000020 R14: 0000000000000000 R15: ffffc90000793c22
  FS: 00007f873753cbc0(0000) GS:ffff88806bc00000(0000)
  knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffff8881178d8cc3 CR3: 00000000181ca000 CR4: 0000000000750ef0
  PKRU: 55555554
  Call Trace:
   <TASK>
   ? __die+0x23/0x70
   ? page_fault_oops+0x181/0x480
   ? search_module_extables+0x19/0x60
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? exc_page_fault+0x1b6/0x1c0
   ? asm_exc_page_fault+0x26/0x30
   ? smb2_parse_contexts+0xa0/0x3a0 [cifs]
   SMB2_open+0x38d/0x5f0 [cifs]
   ? smb2_is_path_accessible+0x138/0x260 [cifs]
   smb2_is_path_accessible+0x138/0x260 [cifs]
   cifs_is_path_remote+0x8d/0x230 [cifs]
   cifs_mount+0x7e/0x350 [cifs]
   cifs_smb3_do_mount+0x128/0x780 [cifs]
   smb3_get_tree+0xd9/0x290 [cifs]
   vfs_get_tree+0x2c/0x100
   ? capable+0x37/0x70
   path_mount+0x2d7/0xb80
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? _raw_spin_unlock_irqrestore+0x44/0x60
   __x64_sys_mount+0x11a/0x150
   do_syscall_64+0x47/0xf0
   entry_SYSCALL_64_after_hwframe+0x6f/0x77
  RIP: 0033:0x7f8737657b1e

Reported-by: Robert Morris <[email protected]>
Cc: [email protected]
Signed-off-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
dkodihal pushed a commit to NVIDIA/linux that referenced this pull request Oct 15, 2024
[ Upstream commit 23d05d5 ]

Once again syzbot is able to crash the kernel in skb_segment() [1]

GSO_BY_FRAGS is a forbidden value, but unfortunately the following
computation in skb_segment() can reach it quite easily :

	mss = mss * partial_segs;

65535 = 3 * 5 * 17 * 257, so many initial values of mss can lead to
a bad final result.

Make sure to limit segmentation so that the new mss value is smaller
than GSO_BY_FRAGS.

[1]

general protection fault, probably for non-canonical address 0xdffffc000000000e: 0000 [openbmc#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077]
CPU: 1 PID: 5079 Comm: syz-executor993 Not tainted 6.7.0-rc4-syzkaller-00141-g1ae4cd3cbdd0 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
RIP: 0010:skb_segment+0x181d/0x3f30 net/core/skbuff.c:4551
Code: 83 e3 02 e9 fb ed ff ff e8 90 68 1c f9 48 8b 84 24 f8 00 00 00 48 8d 78 70 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 8a 21 00 00 48 8b 84 24 f8 00
RSP: 0018:ffffc900043473d0 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000010046 RCX: ffffffff886b1597
RDX: 000000000000000e RSI: ffffffff886b2520 RDI: 0000000000000070
RBP: ffffc90004347578 R08: 0000000000000005 R09: 000000000000ffff
R10: 000000000000ffff R11: 0000000000000002 R12: ffff888063202ac0
R13: 0000000000010000 R14: 000000000000ffff R15: 0000000000000046
FS: 0000555556e7e380(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020010000 CR3: 0000000027ee2000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
udp6_ufo_fragment+0xa0e/0xd00 net/ipv6/udp_offload.c:109
ipv6_gso_segment+0x534/0x17e0 net/ipv6/ip6_offload.c:120
skb_mac_gso_segment+0x290/0x610 net/core/gso.c:53
__skb_gso_segment+0x339/0x710 net/core/gso.c:124
skb_gso_segment include/net/gso.h:83 [inline]
validate_xmit_skb+0x36c/0xeb0 net/core/dev.c:3626
__dev_queue_xmit+0x6f3/0x3d60 net/core/dev.c:4338
dev_queue_xmit include/linux/netdevice.h:3134 [inline]
packet_xmit+0x257/0x380 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3087 [inline]
packet_sendmsg+0x24c6/0x5220 net/packet/af_packet.c:3119
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0xd5/0x180 net/socket.c:745
__sys_sendto+0x255/0x340 net/socket.c:2190
__do_sys_sendto net/socket.c:2202 [inline]
__se_sys_sendto net/socket.c:2198 [inline]
__x64_sys_sendto+0xe0/0x1b0 net/socket.c:2198
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7f8692032aa9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 d1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff8d685418 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f8692032aa9
RDX: 0000000000010048 RSI: 00000000200000c0 RDI: 0000000000000003
RBP: 00000000000f4240 R08: 0000000020000540 R09: 0000000000000014
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff8d685480
R13: 0000000000000001 R14: 00007fff8d685480 R15: 0000000000000003
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:skb_segment+0x181d/0x3f30 net/core/skbuff.c:4551
Code: 83 e3 02 e9 fb ed ff ff e8 90 68 1c f9 48 8b 84 24 f8 00 00 00 48 8d 78 70 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 8a 21 00 00 48 8b 84 24 f8 00
RSP: 0018:ffffc900043473d0 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000010046 RCX: ffffffff886b1597
RDX: 000000000000000e RSI: ffffffff886b2520 RDI: 0000000000000070
RBP: ffffc90004347578 R08: 0000000000000005 R09: 000000000000ffff
R10: 000000000000ffff R11: 0000000000000002 R12: ffff888063202ac0
R13: 0000000000010000 R14: 000000000000ffff R15: 0000000000000046
FS: 0000555556e7e380(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020010000 CR3: 0000000027ee2000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fixes: 3953c46 ("sk_buff: allow segmenting based on frag sizes")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Marcelo Ricardo Leitner <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
dkodihal pushed a commit to NVIDIA/linux that referenced this pull request Oct 15, 2024
commit 93ec4a3 upstream.

The lock_class_key is still registered and can be found in
lock_keys_hash hlist after subsys_private is freed in error
handler path.A task who iterate over the lock_keys_hash
later may cause use-after-free.So fix that up and unregister
the lock_class_key before kfree(cp).

On our platform, a driver fails to kset_register because of
creating duplicate filename '/class/xxx'.With Kasan enabled,
it prints a invalid-access bug report.

KASAN bug report:

BUG: KASAN: invalid-access in lockdep_register_key+0x19c/0x1bc
Write of size 8 at addr 15ffff808b8c0368 by task modprobe/252
Pointer tag: [15], memory tag: [fe]

CPU: 7 PID: 252 Comm: modprobe Tainted: G        W
 6.6.0-mainline-maybe-dirty openbmc#1

Call trace:
dump_backtrace+0x1b0/0x1e4
show_stack+0x2c/0x40
dump_stack_lvl+0xac/0xe0
print_report+0x18c/0x4d8
kasan_report+0xe8/0x148
__hwasan_store8_noabort+0x88/0x98
lockdep_register_key+0x19c/0x1bc
class_register+0x94/0x1ec
init_module+0xbc/0xf48 [rfkill]
do_one_initcall+0x17c/0x72c
do_init_module+0x19c/0x3f8
...
Memory state around the buggy address:
ffffff808b8c0100: 8a 8a 8a 8a 8a 8a 8a 8a 8a 8a 8a 8a 8a 8a 8a 8a
ffffff808b8c0200: 8a 8a 8a 8a 8a 8a 8a 8a fe fe fe fe fe fe fe fe
>ffffff808b8c0300: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
                                     ^
ffffff808b8c0400: 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03

As CONFIG_KASAN_GENERIC is not set, Kasan reports invalid-access
not use-after-free here.In this case, modprobe is manipulating
the corrupted lock_keys_hash hlish where lock_class_key is already
freed before.

It's worth noting that this only can happen if lockdep is enabled,
which is not true for normal system.

Fixes: dcfbb67 ("driver core: class: use lock_class_key already present in struct subsys_private")
Cc: stable <[email protected]>
Signed-off-by: Jing Xia <[email protected]>
Signed-off-by: Xuewen Yan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
dkodihal pushed a commit to NVIDIA/linux that referenced this pull request Oct 15, 2024
[ Upstream commit fa765c4 ]

shutdown_pirq and startup_pirq are not taking the
irq_mapping_update_lock because they can't due to lock inversion. Both
are called with the irq_desc->lock being taking. The lock order,
however, is first irq_mapping_update_lock and then irq_desc->lock.

This opens multiple races:
- shutdown_pirq can be interrupted by a function that allocates an event
  channel:

  CPU0                        CPU1
  shutdown_pirq {
    xen_evtchn_close(e)
                              __startup_pirq {
                                EVTCHNOP_bind_pirq
                                  -> returns just freed evtchn e
                                set_evtchn_to_irq(e, irq)
                              }
    xen_irq_info_cleanup() {
      set_evtchn_to_irq(e, -1)
    }
  }

  Assume here event channel e refers here to the same event channel
  number.
  After this race the evtchn_to_irq mapping for e is invalid (-1).

- __startup_pirq races with __unbind_from_irq in a similar way. Because
  __startup_pirq doesn't take irq_mapping_update_lock it can grab the
  evtchn that __unbind_from_irq is currently freeing and cleaning up. In
  this case even though the event channel is allocated, its mapping can
  be unset in evtchn_to_irq.

The fix is to first cleanup the mappings and then close the event
channel. In this way, when an event channel gets allocated it's
potential previous evtchn_to_irq mappings are guaranteed to be unset already.
This is also the reverse order of the allocation where first the event
channel is allocated and then the mappings are setup.

On a 5.10 kernel prior to commit 3fcdaf3 ("xen/events: modify internal
[un]bind interfaces"), we hit a BUG like the following during probing of NVMe
devices. The issue is that during nvme_setup_io_queues, pci_free_irq
is called for every device which results in a call to shutdown_pirq.
With many nvme devices it's therefore likely to hit this race during
boot because there will be multiple calls to shutdown_pirq and
startup_pirq are running potentially in parallel.

  ------------[ cut here ]------------
  blkfront: xvda: barrier or flush: disabled; persistent grants: enabled; indirect descriptors: enabled; bounce buffer: enabled
  kernel BUG at drivers/xen/events/events_base.c:499!
  invalid opcode: 0000 [openbmc#1] SMP PTI
  CPU: 44 PID: 375 Comm: kworker/u257:23 Not tainted 5.10.201-191.748.amzn2.x86_64 openbmc#1
  Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006
  Workqueue: nvme-reset-wq nvme_reset_work
  RIP: 0010:bind_evtchn_to_cpu+0xdf/0xf0
  Code: 5d 41 5e c3 cc cc cc cc 44 89 f7 e8 2b 55 ad ff 49 89 c5 48 85 c0 0f 84 64 ff ff ff 4c 8b 68 30 41 83 fe ff 0f 85 60 ff ff ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00
  RSP: 0000:ffffc9000d533b08 EFLAGS: 00010046
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
  RDX: 0000000000000028 RSI: 00000000ffffffff RDI: 00000000ffffffff
  RBP: ffff888107419680 R08: 0000000000000000 R09: ffffffff82d72b00
  R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000001ed
  R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000002
  FS:  0000000000000000(0000) GS:ffff88bc8b500000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000000002610001 CR4: 00000000001706e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   ? show_trace_log_lvl+0x1c1/0x2d9
   ? show_trace_log_lvl+0x1c1/0x2d9
   ? set_affinity_irq+0xdc/0x1c0
   ? __die_body.cold+0x8/0xd
   ? die+0x2b/0x50
   ? do_trap+0x90/0x110
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? do_error_trap+0x65/0x80
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? exc_invalid_op+0x4e/0x70
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? asm_exc_invalid_op+0x12/0x20
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? bind_evtchn_to_cpu+0xc5/0xf0
   set_affinity_irq+0xdc/0x1c0
   irq_do_set_affinity+0x1d7/0x1f0
   irq_setup_affinity+0xd6/0x1a0
   irq_startup+0x8a/0xf0
   __setup_irq+0x639/0x6d0
   ? nvme_suspend+0x150/0x150
   request_threaded_irq+0x10c/0x180
   ? nvme_suspend+0x150/0x150
   pci_request_irq+0xa8/0xf0
   ? __blk_mq_free_request+0x74/0xa0
   queue_request_irq+0x6f/0x80
   nvme_create_queue+0x1af/0x200
   nvme_create_io_queues+0xbd/0xf0
   nvme_setup_io_queues+0x246/0x320
   ? nvme_irq_check+0x30/0x30
   nvme_reset_work+0x1c8/0x400
   process_one_work+0x1b0/0x350
   worker_thread+0x49/0x310
   ? process_one_work+0x350/0x350
   kthread+0x11b/0x140
   ? __kthread_bind_mask+0x60/0x60
   ret_from_fork+0x22/0x30
  Modules linked in:
  ---[ end trace a11715de1eee1873 ]---

Fixes: d46a78b ("xen: implement pirq type event channels")
Cc: [email protected]
Co-debugged-by: Andrew Panyakin <[email protected]>
Signed-off-by: Maximilian Heyne <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
dkodihal pushed a commit to NVIDIA/linux that referenced this pull request Oct 15, 2024
[ Upstream commit 2774f25 ]

With numa balancing on, when a numa system is running where a numa node
doesn't have its local memory so it has no managed zones, the following
oops has been observed.  It's because wakeup_kswapd() is called with a
wrong zone index, -1.  Fixed it by checking the index before calling
wakeup_kswapd().

> BUG: unable to handle page fault for address: 00000000000033f3
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [openbmc#1] PREEMPT SMP NOPTI
> CPU: 2 PID: 895 Comm: masim Not tainted 6.6.0-dirty torvalds#255
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>    rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> RIP: 0010:wakeup_kswapd (./linux/mm/vmscan.c:7812)
> Code: (omitted)
> RSP: 0000:ffffc90004257d58 EFLAGS: 00010286
> RAX: ffffffffffffffff RBX: ffff88883fff0480 RCX: 0000000000000003
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88883fff0480
> RBP: ffffffffffffffff R08: ff0003ffffffffff R09: ffffffffffffffff
> R10: ffff888106c95540 R11: 0000000055555554 R12: 0000000000000003
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff88883fff0940
> FS:  00007fc4b8124740(0000) GS:ffff888827c00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000033f3 CR3: 000000026cc08004 CR4: 0000000000770ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
>  <TASK>
> ? __die
> ? page_fault_oops
> ? __pte_offset_map_lock
> ? exc_page_fault
> ? asm_exc_page_fault
> ? wakeup_kswapd
> migrate_misplaced_page
> __handle_mm_fault
> handle_mm_fault
> do_user_addr_fault
> exc_page_fault
> asm_exc_page_fault
> RIP: 0033:0x55b897ba0808
> Code: (omitted)
> RSP: 002b:00007ffeefa821a0 EFLAGS: 00010287
> RAX: 000055b89983acd0 RBX: 00007ffeefa823f8 RCX: 000055b89983acd0
> RDX: 00007fc2f8122010 RSI: 0000000000020000 RDI: 000055b89983acd0
> RBP: 00007ffeefa821a0 R08: 0000000000000037 R09: 0000000000000075
> R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
> R13: 00007ffeefa82410 R14: 000055b897ba5dd8 R15: 00007fc4b8340000
>  </TASK>

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Byungchul Park <[email protected]>
Reported-by: Hyeongtak Ji <[email protected]>
Fixes: c574bbe ("NUMA balancing: optimize page placement for memory tiering system")
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
dkodihal pushed a commit to NVIDIA/linux that referenced this pull request Oct 15, 2024
[ Upstream commit 65e8fbd ]

There is this reported crash when experimenting with the lvm2 testsuite.
The list corruption is caused by the fact that the postsuspend and resume
methods were not paired correctly; there were two consecutive calls to the
origin_postsuspend function. The second call attempts to remove the
"hash_list" entry from a list, while it was already removed by the first
call.

Fix __dm_internal_resume so that it calls the preresume and resume
methods of the table's targets.

If a preresume method of some target fails, we are in a tricky situation.
We can't return an error because dm_internal_resume isn't supposed to
return errors. We can't return success, because then the "resume" and
"postsuspend" methods would not be paired correctly. So, we set the
DMF_SUSPENDED flag and we fake normal suspend - it may confuse userspace
tools, but it won't cause a kernel crash.

------------[ cut here ]------------
kernel BUG at lib/list_debug.c:56!
invalid opcode: 0000 [openbmc#1] PREEMPT SMP
CPU: 1 PID: 8343 Comm: dmsetup Not tainted 6.8.0-rc6 openbmc#4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
RIP: 0010:__list_del_entry_valid_or_report+0x77/0xc0
<snip>
RSP: 0018:ffff8881b831bcc0 EFLAGS: 00010282
RAX: 000000000000004e RBX: ffff888143b6eb80 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffffff819053d0 RDI: 00000000ffffffff
RBP: ffff8881b83a3400 R08: 00000000fffeffff R09: 0000000000000058
R10: 0000000000000000 R11: ffffffff81a24080 R12: 0000000000000001
R13: ffff88814538e000 R14: ffff888143bc6dc0 R15: ffffffffa02e4bb0
FS:  00000000f7c0f780(0000) GS:ffff8893f0a40000(0000) knlGS:0000000000000000
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000057fb5000 CR3: 0000000143474000 CR4: 00000000000006b0
Call Trace:
 <TASK>
 ? die+0x2d/0x80
 ? do_trap+0xeb/0xf0
 ? __list_del_entry_valid_or_report+0x77/0xc0
 ? do_error_trap+0x60/0x80
 ? __list_del_entry_valid_or_report+0x77/0xc0
 ? exc_invalid_op+0x49/0x60
 ? __list_del_entry_valid_or_report+0x77/0xc0
 ? asm_exc_invalid_op+0x16/0x20
 ? table_deps+0x1b0/0x1b0 [dm_mod]
 ? __list_del_entry_valid_or_report+0x77/0xc0
 origin_postsuspend+0x1a/0x50 [dm_snapshot]
 dm_table_postsuspend_targets+0x34/0x50 [dm_mod]
 dm_suspend+0xd8/0xf0 [dm_mod]
 dev_suspend+0x1f2/0x2f0 [dm_mod]
 ? table_deps+0x1b0/0x1b0 [dm_mod]
 ctl_ioctl+0x300/0x5f0 [dm_mod]
 dm_compat_ctl_ioctl+0x7/0x10 [dm_mod]
 __x64_compat_sys_ioctl+0x104/0x170
 do_syscall_64+0x184/0x1b0
 entry_SYSCALL_64_after_hwframe+0x46/0x4e
RIP: 0033:0xf7e6aead
<snip>
---[ end trace 0000000000000000 ]---

Fixes: ffcc393 ("dm: enhance internal suspend and resume interface")
Signed-off-by: Mikulas Patocka <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
dkodihal pushed a commit to NVIDIA/linux that referenced this pull request Oct 15, 2024
commit d21d406 upstream.

syzkaller reported infinite recursive calls of fib6_dump_done() during
netlink socket destruction.  [1]

From the log, syzkaller sent an AF_UNSPEC RTM_GETROUTE message, and then
the response was generated.  The following recvmmsg() resumed the dump
for IPv6, but the first call of inet6_dump_fib() failed at kzalloc() due
to the fault injection.  [0]

  12:01:34 executing program 3:
  r0 = socket$nl_route(0x10, 0x3, 0x0)
  sendmsg$nl_route(r0, ... snip ...)
  recvmmsg(r0, ... snip ...) (fail_nth: 8)

Here, fib6_dump_done() was set to nlk_sk(sk)->cb.done, and the next call
of inet6_dump_fib() set it to nlk_sk(sk)->cb.args[3].  syzkaller stopped
receiving the response halfway through, and finally netlink_sock_destruct()
called nlk_sk(sk)->cb.done().

fib6_dump_done() calls fib6_dump_end() and nlk_sk(sk)->cb.done() if it
is still not NULL.  fib6_dump_end() rewrites nlk_sk(sk)->cb.done() by
nlk_sk(sk)->cb.args[3], but it has the same function, not NULL, calling
itself recursively and hitting the stack guard page.

To avoid the issue, let's set the destructor after kzalloc().

[0]:
FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
CPU: 1 PID: 432110 Comm: syz-executor.3 Not tainted 6.8.0-12821-g537c2e91d354-dirty openbmc#11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl (lib/dump_stack.c:117)
 should_fail_ex (lib/fault-inject.c:52 lib/fault-inject.c:153)
 should_failslab (mm/slub.c:3733)
 kmalloc_trace (mm/slub.c:3748 mm/slub.c:3827 mm/slub.c:3992)
 inet6_dump_fib (./include/linux/slab.h:628 ./include/linux/slab.h:749 net/ipv6/ip6_fib.c:662)
 rtnl_dump_all (net/core/rtnetlink.c:4029)
 netlink_dump (net/netlink/af_netlink.c:2269)
 netlink_recvmsg (net/netlink/af_netlink.c:1988)
 ____sys_recvmsg (net/socket.c:1046 net/socket.c:2801)
 ___sys_recvmsg (net/socket.c:2846)
 do_recvmmsg (net/socket.c:2943)
 __x64_sys_recvmmsg (net/socket.c:3041 net/socket.c:3034 net/socket.c:3034)

[1]:
BUG: TASK stack guard page was hit at 00000000f2fa9af1 (stack is 00000000b7912430..000000009a436beb)
stack guard page: 0000 [openbmc#1] PREEMPT SMP KASAN
CPU: 1 PID: 223719 Comm: kworker/1:3 Not tainted 6.8.0-12821-g537c2e91d354-dirty openbmc#11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: events netlink_sock_destruct_work
RIP: 0010:fib6_dump_done (net/ipv6/ip6_fib.c:570)
Code: 3c 24 e8 f3 e9 51 fd e9 28 fd ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 41 57 41 56 41 55 41 54 55 48 89 fd <53> 48 8d 5d 60 e8 b6 4d 07 fd 48 89 da 48 b8 00 00 00 00 00 fc ff
RSP: 0018:ffffc9000d980000 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffffffff84405990 RCX: ffffffff844059d3
RDX: ffff8881028e0000 RSI: ffffffff84405ac2 RDI: ffff88810c02f358
RBP: ffff88810c02f358 R08: 0000000000000007 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000224 R12: 0000000000000000
R13: ffff888007c82c78 R14: ffff888007c82c68 R15: ffff888007c82c68
FS:  0000000000000000(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc9000d97fff8 CR3: 0000000102309002 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
 <#DF>
 </#DF>
 <TASK>
 fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
 fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
 ...
 fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
 fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
 netlink_sock_destruct (net/netlink/af_netlink.c:401)
 __sk_destruct (net/core/sock.c:2177 (discriminator 2))
 sk_destruct (net/core/sock.c:2224)
 __sk_free (net/core/sock.c:2235)
 sk_free (net/core/sock.c:2246)
 process_one_work (kernel/workqueue.c:3259)
 worker_thread (kernel/workqueue.c:3329 kernel/workqueue.c:3416)
 kthread (kernel/kthread.c:388)
 ret_from_fork (arch/x86/kernel/process.c:153)
 ret_from_fork_asm (arch/x86/entry/entry_64.S:256)
Modules linked in:

Fixes: 1da177e ("Linux-2.6.12-rc2")
Reported-by: syzkaller <[email protected]>
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
dkodihal pushed a commit to NVIDIA/linux that referenced this pull request Oct 15, 2024
[ Upstream commit 58a4c9b ]

syzbot was able to trigger a NULL deref in fib_validate_source()
in an old tree [1].

It appears the bug exists in latest trees.

All calls to __in_dev_get_rcu() must be checked for a NULL result.

[1]
general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [openbmc#1] SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 2 PID: 3257 Comm: syz-executor.3 Not tainted 5.10.0-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
 RIP: 0010:fib_validate_source+0xbf/0x15a0 net/ipv4/fib_frontend.c:425
Code: 18 f2 f2 f2 f2 42 c7 44 20 23 f3 f3 f3 f3 48 89 44 24 78 42 c6 44 20 27 f3 e8 5d 88 48 fc 4c 89 e8 48 c1 e8 03 48 89 44 24 18 <42> 80 3c 20 00 74 08 4c 89 ef e8 d2 15 98 fc 48 89 5c 24 10 41 bf
RSP: 0018:ffffc900015fee40 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88800f7a4000 RCX: ffff88800f4f90c0
RDX: 0000000000000000 RSI: 0000000004001eac RDI: ffff8880160c64c0
RBP: ffffc900015ff060 R08: 0000000000000000 R09: ffff88800f7a4000
R10: 0000000000000002 R11: ffff88800f4f90c0 R12: dffffc0000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88800f7a4000
FS:  00007f938acfe6c0(0000) GS:ffff888058c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f938acddd58 CR3: 000000001248e000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  ip_route_use_hint+0x410/0x9b0 net/ipv4/route.c:2231
  ip_rcv_finish_core+0x2c4/0x1a30 net/ipv4/ip_input.c:327
  ip_list_rcv_finish net/ipv4/ip_input.c:612 [inline]
  ip_sublist_rcv+0x3ed/0xe50 net/ipv4/ip_input.c:638
  ip_list_rcv+0x422/0x470 net/ipv4/ip_input.c:673
  __netif_receive_skb_list_ptype net/core/dev.c:5572 [inline]
  __netif_receive_skb_list_core+0x6b1/0x890 net/core/dev.c:5620
  __netif_receive_skb_list net/core/dev.c:5672 [inline]
  netif_receive_skb_list_internal+0x9f9/0xdc0 net/core/dev.c:5764
  netif_receive_skb_list+0x55/0x3e0 net/core/dev.c:5816
  xdp_recv_frames net/bpf/test_run.c:257 [inline]
  xdp_test_run_batch net/bpf/test_run.c:335 [inline]
  bpf_test_run_xdp_live+0x1818/0x1d00 net/bpf/test_run.c:363
  bpf_prog_test_run_xdp+0x81f/0x1170 net/bpf/test_run.c:1376
  bpf_prog_test_run+0x349/0x3c0 kernel/bpf/syscall.c:3736
  __sys_bpf+0x45c/0x710 kernel/bpf/syscall.c:5115
  __do_sys_bpf kernel/bpf/syscall.c:5201 [inline]
  __se_sys_bpf kernel/bpf/syscall.c:5199 [inline]
  __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5199

Fixes: 02b2494 ("ipv4: use dst hint for ipv4 list receive")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
dkodihal pushed a commit to NVIDIA/linux that referenced this pull request Oct 15, 2024
[ Upstream commit f2db723 ]

Anderson Nascimento reported a use-after-free splat in tcp_twsk_unique()
with nice analysis.

Since commit ec94c26 ("tcp/dccp: avoid one atomic operation for
timewait hashdance"), inet_twsk_hashdance() sets TIME-WAIT socket's
sk_refcnt after putting it into ehash and releasing the bucket lock.

Thus, there is a small race window where other threads could try to
reuse the port during connect() and call sock_hold() in tcp_twsk_unique()
for the TIME-WAIT socket with zero refcnt.

If that happens, the refcnt taken by tcp_twsk_unique() is overwritten
and sock_put() will cause underflow, triggering a real use-after-free
somewhere else.

To avoid the use-after-free, we need to use refcount_inc_not_zero() in
tcp_twsk_unique() and give up on reusing the port if it returns false.

[0]:
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 0 PID: 1039313 at lib/refcount.c:25 refcount_warn_saturate+0xe5/0x110
CPU: 0 PID: 1039313 Comm: trigger Not tainted 6.8.6-200.fc39.x86_64 openbmc#1
Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023
RIP: 0010:refcount_warn_saturate+0xe5/0x110
Code: 42 8e ff 0f 0b c3 cc cc cc cc 80 3d aa 13 ea 01 00 0f 85 5e ff ff ff 48 c7 c7 f8 8e b7 82 c6 05 96 13 ea 01 01 e8 7b 42 8e ff <0f> 0b c3 cc cc cc cc 48 c7 c7 50 8f b7 82 c6 05 7a 13 ea 01 01 e8
RSP: 0018:ffffc90006b43b60 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff888009bb3ef0 RCX: 0000000000000027
RDX: ffff88807be218c8 RSI: 0000000000000001 RDI: ffff88807be218c0
RBP: 0000000000069d70 R08: 0000000000000000 R09: ffffc90006b439f0
R10: ffffc90006b439e8 R11: 0000000000000003 R12: ffff8880029ede84
R13: 0000000000004e20 R14: ffffffff84356dc0 R15: ffff888009bb3ef0
FS:  00007f62c10926c0(0000) GS:ffff88807be00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020ccb000 CR3: 000000004628c005 CR4: 0000000000f70ef0
PKRU: 55555554
Call Trace:
 <TASK>
 ? refcount_warn_saturate+0xe5/0x110
 ? __warn+0x81/0x130
 ? refcount_warn_saturate+0xe5/0x110
 ? report_bug+0x171/0x1a0
 ? refcount_warn_saturate+0xe5/0x110
 ? handle_bug+0x3c/0x80
 ? exc_invalid_op+0x17/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? refcount_warn_saturate+0xe5/0x110
 tcp_twsk_unique+0x186/0x190
 __inet_check_established+0x176/0x2d0
 __inet_hash_connect+0x74/0x7d0
 ? __pfx___inet_check_established+0x10/0x10
 tcp_v4_connect+0x278/0x530
 __inet_stream_connect+0x10f/0x3d0
 inet_stream_connect+0x3a/0x60
 __sys_connect+0xa8/0xd0
 __x64_sys_connect+0x18/0x20
 do_syscall_64+0x83/0x170
 entry_SYSCALL_64_after_hwframe+0x78/0x80
RIP: 0033:0x7f62c11a885d
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a3 45 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007f62c1091e58 EFLAGS: 00000296 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000020ccb004 RCX: 00007f62c11a885d
RDX: 0000000000000010 RSI: 0000000020ccb000 RDI: 0000000000000003
RBP: 00007f62c1091e90 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000296 R12: 00007f62c10926c0
R13: ffffffffffffff88 R14: 0000000000000000 R15: 00007ffe237885b0
 </TASK>

Fixes: ec94c26 ("tcp/dccp: avoid one atomic operation for timewait hashdance")
Reported-by: Anderson Nascimento <[email protected]>
Closes: https://lore.kernel.org/netdev/[email protected]/
Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
dkodihal pushed a commit to NVIDIA/linux that referenced this pull request Oct 15, 2024
[ Upstream commit d101291 ]

syzbot is able to trigger the following crash [1],
caused by unsafe ip6_dst_idev() use.

Indeed ip6_dst_idev() can return NULL, and must always be checked.

[1]

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [openbmc#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 0 PID: 31648 Comm: syz-executor.0 Not tainted 6.9.0-rc4-next-20240417-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
 RIP: 0010:__fib6_rule_action net/ipv6/fib6_rules.c:237 [inline]
 RIP: 0010:fib6_rule_action+0x241/0x7b0 net/ipv6/fib6_rules.c:267
Code: 02 00 00 49 8d 9f d8 00 00 00 48 89 d8 48 c1 e8 03 42 80 3c 20 00 74 08 48 89 df e8 f9 32 bf f7 48 8b 1b 48 89 d8 48 c1 e8 03 <42> 80 3c 20 00 74 08 48 89 df e8 e0 32 bf f7 4c 8b 03 48 89 ef 4c
RSP: 0018:ffffc9000fc1f2f0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 1a772f98c8186700
RDX: 0000000000000003 RSI: ffffffff8bcac4e0 RDI: ffffffff8c1f9760
RBP: ffff8880673fb980 R08: ffffffff8fac15ef R09: 1ffffffff1f582bd
R10: dffffc0000000000 R11: fffffbfff1f582be R12: dffffc0000000000
R13: 0000000000000080 R14: ffff888076509000 R15: ffff88807a029a00
FS:  00007f55e82ca6c0(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b31d23000 CR3: 0000000022b66000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  fib_rules_lookup+0x62c/0xdb0 net/core/fib_rules.c:317
  fib6_rule_lookup+0x1fd/0x790 net/ipv6/fib6_rules.c:108
  ip6_route_output_flags_noref net/ipv6/route.c:2637 [inline]
  ip6_route_output_flags+0x38e/0x610 net/ipv6/route.c:2649
  ip6_route_output include/net/ip6_route.h:93 [inline]
  ip6_dst_lookup_tail+0x189/0x11a0 net/ipv6/ip6_output.c:1120
  ip6_dst_lookup_flow+0xb9/0x180 net/ipv6/ip6_output.c:1250
  sctp_v6_get_dst+0x792/0x1e20 net/sctp/ipv6.c:326
  sctp_transport_route+0x12c/0x2e0 net/sctp/transport.c:455
  sctp_assoc_add_peer+0x614/0x15c0 net/sctp/associola.c:662
  sctp_connect_new_asoc+0x31d/0x6c0 net/sctp/socket.c:1099
  __sctp_connect+0x66d/0xe30 net/sctp/socket.c:1197
  sctp_connect net/sctp/socket.c:4819 [inline]
  sctp_inet_connect+0x149/0x1f0 net/sctp/socket.c:4834
  __sys_connect_file net/socket.c:2048 [inline]
  __sys_connect+0x2df/0x310 net/socket.c:2065
  __do_sys_connect net/socket.c:2075 [inline]
  __se_sys_connect net/socket.c:2072 [inline]
  __x64_sys_connect+0x7a/0x90 net/socket.c:2072
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: 5e5f3f0 ("[IPV6] ADDRCONF: Convert ipv6_get_saddr() to ipv6_dev_get_saddr().")
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
dkodihal pushed a commit to NVIDIA/linux that referenced this pull request Oct 15, 2024
[ Upstream commit 4db783d ]

According to syzbot, there is a chance that ip6_dst_idev()
returns NULL in ip6_output(). Most places in IPv6 stack
deal with a NULL idev just fine, but not here.

syzbot reported:

general protection fault, probably for non-canonical address 0xdffffc00000000bc: 0000 [openbmc#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x00000000000005e0-0x00000000000005e7]
CPU: 0 PID: 9775 Comm: syz-executor.4 Not tainted 6.9.0-rc5-syzkaller-00157-g6a30653b604a #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
 RIP: 0010:ip6_output+0x231/0x3f0 net/ipv6/ip6_output.c:237
Code: 3c 1e 00 49 89 df 74 08 4c 89 ef e8 19 58 db f7 48 8b 44 24 20 49 89 45 00 49 89 c5 48 8d 9d e0 05 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 38 84 c0 4c 8b 74 24 28 0f 85 61 01 00 00 8b 1b 31 ff
RSP: 0018:ffffc9000927f0d8 EFLAGS: 00010202
RAX: 00000000000000bc RBX: 00000000000005e0 RCX: 0000000000040000
RDX: ffffc900131f9000 RSI: 0000000000004f47 RDI: 0000000000004f48
RBP: 0000000000000000 R08: ffffffff8a1f0b9a R09: 1ffffffff1f51fad
R10: dffffc0000000000 R11: fffffbfff1f51fae R12: ffff8880293ec8c0
R13: ffff88805d7fc000 R14: 1ffff1100527d91a R15: dffffc0000000000
FS:  00007f135c6856c0(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000080 CR3: 0000000064096000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  NF_HOOK include/linux/netfilter.h:314 [inline]
  ip6_xmit+0xefe/0x17f0 net/ipv6/ip6_output.c:358
  sctp_v6_xmit+0x9f2/0x13f0 net/sctp/ipv6.c:248
  sctp_packet_transmit+0x26ad/0x2ca0 net/sctp/output.c:653
  sctp_packet_singleton+0x22c/0x320 net/sctp/outqueue.c:783
  sctp_outq_flush_ctrl net/sctp/outqueue.c:914 [inline]
  sctp_outq_flush+0x6d5/0x3e20 net/sctp/outqueue.c:1212
  sctp_side_effects net/sctp/sm_sideeffect.c:1198 [inline]
  sctp_do_sm+0x59cc/0x60c0 net/sctp/sm_sideeffect.c:1169
  sctp_primitive_ASSOCIATE+0x95/0xc0 net/sctp/primitive.c:73
  __sctp_connect+0x9cd/0xe30 net/sctp/socket.c:1234
  sctp_connect net/sctp/socket.c:4819 [inline]
  sctp_inet_connect+0x149/0x1f0 net/sctp/socket.c:4834
  __sys_connect_file net/socket.c:2048 [inline]
  __sys_connect+0x2df/0x310 net/socket.c:2065
  __do_sys_connect net/socket.c:2075 [inline]
  __se_sys_connect net/socket.c:2072 [inline]
  __x64_sys_connect+0x7a/0x90 net/socket.c:2072
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: 778d80b ("ipv6: Add disable_ipv6 sysctl to disable IPv6 operaion on specific interface.")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Larysa Zaremba <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
amboar pushed a commit that referenced this pull request Oct 18, 2024
…tion to perf_sched__replay()

[ Upstream commit c690786 ]

The start_work_mutex and work_done_wait_mutex are used only for the
'perf sched replay'. Put their initialization in perf_sched__replay () to
reduce unnecessary actions in other commands.

Simple functional testing:

  # perf sched record perf bench sched messaging
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 10 groups == 400 processes run

       Total time: 0.197 [sec]
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 14.952 MB perf.data (134165 samples) ]

  # perf sched replay
  run measurement overhead: 108 nsecs
  sleep measurement overhead: 65658 nsecs
  the run test took 999991 nsecs
  the sleep test took 1079324 nsecs
  nr_run_events:        42378
  nr_sleep_events:      43102
  nr_wakeup_events:     31852
  target-less wakeups:  17
  multi-target wakeups: 712
  task      0 (             swapper:         0), nr_events: 10451
  task      1 (             swapper:         1), nr_events: 3
  task      2 (             swapper:         2), nr_events: 1
  <SNIP>
  task    717 (     sched-messaging:     74483), nr_events: 152
  task    718 (     sched-messaging:     74484), nr_events: 1944
  task    719 (     sched-messaging:     74485), nr_events: 73
  task    720 (     sched-messaging:     74486), nr_events: 163
  task    721 (     sched-messaging:     74487), nr_events: 942
  task    722 (     sched-messaging:     74488), nr_events: 78
  task    723 (     sched-messaging:     74489), nr_events: 1090
  ------------------------------------------------------------
  #1  : 1366.507, ravg: 1366.51, cpu: 7682.70 / 7682.70
  #2  : 1410.072, ravg: 1370.86, cpu: 7723.88 / 7686.82
  #3  : 1396.296, ravg: 1373.41, cpu: 7568.20 / 7674.96
  #4  : 1381.019, ravg: 1374.17, cpu: 7531.81 / 7660.64
  #5  : 1393.826, ravg: 1376.13, cpu: 7725.25 / 7667.11
  #6  : 1401.581, ravg: 1378.68, cpu: 7594.82 / 7659.88
  #7  : 1381.337, ravg: 1378.94, cpu: 7371.22 / 7631.01
  #8  : 1373.842, ravg: 1378.43, cpu: 7894.92 / 7657.40
  #9  : 1364.697, ravg: 1377.06, cpu: 7324.91 / 7624.15
  #10 : 1363.613, ravg: 1375.72, cpu: 7209.55 / 7582.69
  # echo $?
  0

Signed-off-by: Yang Jihong <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Stable-dep-of: 1a5efc9 ("libsubcmd: Don't free the usage string")
Signed-off-by: Sasha Levin <[email protected]>
amboar pushed a commit that referenced this pull request Oct 18, 2024
…f_sched__{lat|map|replay}()

[ Upstream commit bd2cdf2 ]

The curr_pid and cpu_last_switched are used only for the
'perf sched replay/latency/map'. Put their initialization in
perf_sched__{lat|map|replay () to reduce unnecessary actions in other
commands.

Simple functional testing:

  # perf sched record perf bench sched messaging
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 10 groups == 400 processes run

       Total time: 0.209 [sec]
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 16.456 MB perf.data (147907 samples) ]

  # perf sched lat

   -------------------------------------------------------------------------------------------------------------------------------------------
    Task                  |   Runtime ms  | Switches | Avg delay ms    | Max delay ms    | Max delay start           | Max delay end          |
   -------------------------------------------------------------------------------------------------------------------------------------------
    sched-messaging:(401) |   2990.699 ms |    38705 | avg:   0.661 ms | max:  67.046 ms | max start: 456532.624830 s | max end: 456532.691876 s
    qemu-system-x86:(7)   |    179.764 ms |     2191 | avg:   0.152 ms | max:  21.857 ms | max start: 456532.576434 s | max end: 456532.598291 s
    sshd:48125            |      0.522 ms |        2 | avg:   0.037 ms | max:   0.046 ms | max start: 456532.514610 s | max end: 456532.514656 s
  <SNIP>
    ksoftirqd/11:82       |      0.063 ms |        1 | avg:   0.005 ms | max:   0.005 ms | max start: 456532.769366 s | max end: 456532.769371 s
    kworker/9:0-mm_:34624 |      0.233 ms |       20 | avg:   0.004 ms | max:   0.007 ms | max start: 456532.690804 s | max end: 456532.690812 s
    migration/13:93       |      0.000 ms |        1 | avg:   0.004 ms | max:   0.004 ms | max start: 456532.512669 s | max end: 456532.512674 s
   -----------------------------------------------------------------------------------------------------------------
    TOTAL:                |   3180.750 ms |    41368 |
   ---------------------------------------------------

  # echo $?
  0

  # perf sched map
    *A0                                                               456532.510141 secs A0 => migration/0:15
    *.                                                                456532.510171 secs .  => swapper:0
     .  *B0                                                           456532.510261 secs B0 => migration/1:21
     .  *.                                                            456532.510279 secs
  <SNIP>
     L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7 *L7  .   .   .   .    456532.785979 secs
     L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7 *L7  .   .   .    456532.786054 secs
     L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7 *L7  .   .    456532.786127 secs
     L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7 *L7  .    456532.786197 secs
     L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7 *L7   456532.786270 secs
  # echo $?
  0

  # perf sched replay
  run measurement overhead: 108 nsecs
  sleep measurement overhead: 66473 nsecs
  the run test took 1000002 nsecs
  the sleep test took 1082686 nsecs
  nr_run_events:        49334
  nr_sleep_events:      50054
  nr_wakeup_events:     34701
  target-less wakeups:  165
  multi-target wakeups: 766
  task      0 (             swapper:         0), nr_events: 15419
  task      1 (             swapper:         1), nr_events: 1
  task      2 (             swapper:         2), nr_events: 1
  <SNIP>
  task    715 (     sched-messaging:    110248), nr_events: 1438
  task    716 (     sched-messaging:    110249), nr_events: 512
  task    717 (     sched-messaging:    110250), nr_events: 500
  task    718 (     sched-messaging:    110251), nr_events: 537
  task    719 (     sched-messaging:    110252), nr_events: 823
  ------------------------------------------------------------
  #1  : 1325.288, ravg: 1325.29, cpu: 7823.35 / 7823.35
  #2  : 1363.606, ravg: 1329.12, cpu: 7655.53 / 7806.56
  #3  : 1349.494, ravg: 1331.16, cpu: 7544.80 / 7780.39
  #4  : 1311.488, ravg: 1329.19, cpu: 7495.13 / 7751.86
  #5  : 1309.902, ravg: 1327.26, cpu: 7266.65 / 7703.34
  #6  : 1309.535, ravg: 1325.49, cpu: 7843.86 / 7717.39
  #7  : 1316.482, ravg: 1324.59, cpu: 7854.41 / 7731.09
  #8  : 1366.604, ravg: 1328.79, cpu: 7955.81 / 7753.57
  #9  : 1326.286, ravg: 1328.54, cpu: 7466.86 / 7724.90
  #10 : 1356.653, ravg: 1331.35, cpu: 7566.60 / 7709.07
  # echo $?
  0

Signed-off-by: Yang Jihong <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Stable-dep-of: 1a5efc9 ("libsubcmd: Don't free the usage string")
Signed-off-by: Sasha Levin <[email protected]>
amboar pushed a commit that referenced this pull request Oct 18, 2024
[ Upstream commit d1bc560 ]

Add nested locking with I_MUTEX_XATTR subclass to avoid lockdep warning
while handling xattr inode on file open syscall at ext4_xattr_inode_iget.

Backtrace
EXT4-fs (loop0): Ignoring removed oldalloc option
======================================================
WARNING: possible circular locking dependency detected
5.10.0-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor543/2794 is trying to acquire lock:
ffff8880215e1a48 (&ea_inode->i_rwsem#7/1){+.+.}-{3:3}, at: inode_lock include/linux/fs.h:782 [inline]
ffff8880215e1a48 (&ea_inode->i_rwsem#7/1){+.+.}-{3:3}, at: ext4_xattr_inode_iget+0x42a/0x5c0 fs/ext4/xattr.c:425

but task is already holding lock:
ffff8880215e3278 (&ei->i_data_sem/3){++++}-{3:3}, at: ext4_setattr+0x136d/0x19c0 fs/ext4/inode.c:5559

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&ei->i_data_sem/3){++++}-{3:3}:
       lock_acquire+0x197/0x480 kernel/locking/lockdep.c:5566
       down_write+0x93/0x180 kernel/locking/rwsem.c:1564
       ext4_update_i_disksize fs/ext4/ext4.h:3267 [inline]
       ext4_xattr_inode_write fs/ext4/xattr.c:1390 [inline]
       ext4_xattr_inode_lookup_create fs/ext4/xattr.c:1538 [inline]
       ext4_xattr_set_entry+0x331a/0x3d80 fs/ext4/xattr.c:1662
       ext4_xattr_ibody_set+0x124/0x390 fs/ext4/xattr.c:2228
       ext4_xattr_set_handle+0xc27/0x14e0 fs/ext4/xattr.c:2385
       ext4_xattr_set+0x219/0x390 fs/ext4/xattr.c:2498
       ext4_xattr_user_set+0xc9/0xf0 fs/ext4/xattr_user.c:40
       __vfs_setxattr+0x404/0x450 fs/xattr.c:177
       __vfs_setxattr_noperm+0x11d/0x4f0 fs/xattr.c:208
       __vfs_setxattr_locked+0x1f9/0x210 fs/xattr.c:266
       vfs_setxattr+0x112/0x2c0 fs/xattr.c:283
       setxattr+0x1db/0x3e0 fs/xattr.c:548
       path_setxattr+0x15a/0x240 fs/xattr.c:567
       __do_sys_setxattr fs/xattr.c:582 [inline]
       __se_sys_setxattr fs/xattr.c:578 [inline]
       __x64_sys_setxattr+0xc5/0xe0 fs/xattr.c:578
       do_syscall_64+0x6d/0xa0 arch/x86/entry/common.c:62
       entry_SYSCALL_64_after_hwframe+0x61/0xcb

-> #0 (&ea_inode->i_rwsem#7/1){+.+.}-{3:3}:
       check_prev_add kernel/locking/lockdep.c:2988 [inline]
       check_prevs_add kernel/locking/lockdep.c:3113 [inline]
       validate_chain+0x1695/0x58f0 kernel/locking/lockdep.c:3729
       __lock_acquire+0x12fd/0x20d0 kernel/locking/lockdep.c:4955
       lock_acquire+0x197/0x480 kernel/locking/lockdep.c:5566
       down_write+0x93/0x180 kernel/locking/rwsem.c:1564
       inode_lock include/linux/fs.h:782 [inline]
       ext4_xattr_inode_iget+0x42a/0x5c0 fs/ext4/xattr.c:425
       ext4_xattr_inode_get+0x138/0x410 fs/ext4/xattr.c:485
       ext4_xattr_move_to_block fs/ext4/xattr.c:2580 [inline]
       ext4_xattr_make_inode_space fs/ext4/xattr.c:2682 [inline]
       ext4_expand_extra_isize_ea+0xe70/0x1bb0 fs/ext4/xattr.c:2774
       __ext4_expand_extra_isize+0x304/0x3f0 fs/ext4/inode.c:5898
       ext4_try_to_expand_extra_isize fs/ext4/inode.c:5941 [inline]
       __ext4_mark_inode_dirty+0x591/0x810 fs/ext4/inode.c:6018
       ext4_setattr+0x1400/0x19c0 fs/ext4/inode.c:5562
       notify_change+0xbb6/0xe60 fs/attr.c:435
       do_truncate+0x1de/0x2c0 fs/open.c:64
       handle_truncate fs/namei.c:2970 [inline]
       do_open fs/namei.c:3311 [inline]
       path_openat+0x29f3/0x3290 fs/namei.c:3425
       do_filp_open+0x20b/0x450 fs/namei.c:3452
       do_sys_openat2+0x124/0x460 fs/open.c:1207
       do_sys_open fs/open.c:1223 [inline]
       __do_sys_open fs/open.c:1231 [inline]
       __se_sys_open fs/open.c:1227 [inline]
       __x64_sys_open+0x221/0x270 fs/open.c:1227
       do_syscall_64+0x6d/0xa0 arch/x86/entry/common.c:62
       entry_SYSCALL_64_after_hwframe+0x61/0xcb

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&ei->i_data_sem/3);
                               lock(&ea_inode->i_rwsem#7/1);
                               lock(&ei->i_data_sem/3);
  lock(&ea_inode->i_rwsem#7/1);

 *** DEADLOCK ***

5 locks held by syz-executor543/2794:
 #0: ffff888026fbc448 (sb_writers#4){.+.+}-{0:0}, at: mnt_want_write+0x4a/0x2a0 fs/namespace.c:365
 #1: ffff8880215e3488 (&sb->s_type->i_mutex_key#7){++++}-{3:3}, at: inode_lock include/linux/fs.h:782 [inline]
 #1: ffff8880215e3488 (&sb->s_type->i_mutex_key#7){++++}-{3:3}, at: do_truncate+0x1cf/0x2c0 fs/open.c:62
 #2: ffff8880215e3310 (&ei->i_mmap_sem){++++}-{3:3}, at: ext4_setattr+0xec4/0x19c0 fs/ext4/inode.c:5519
 #3: ffff8880215e3278 (&ei->i_data_sem/3){++++}-{3:3}, at: ext4_setattr+0x136d/0x19c0 fs/ext4/inode.c:5559
 #4: ffff8880215e30c8 (&ei->xattr_sem){++++}-{3:3}, at: ext4_write_trylock_xattr fs/ext4/xattr.h:162 [inline]
 #4: ffff8880215e30c8 (&ei->xattr_sem){++++}-{3:3}, at: ext4_try_to_expand_extra_isize fs/ext4/inode.c:5938 [inline]
 #4: ffff8880215e30c8 (&ei->xattr_sem){++++}-{3:3}, at: __ext4_mark_inode_dirty+0x4fb/0x810 fs/ext4/inode.c:6018

stack backtrace:
CPU: 1 PID: 2794 Comm: syz-executor543 Not tainted 5.10.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x177/0x211 lib/dump_stack.c:118
 print_circular_bug+0x146/0x1b0 kernel/locking/lockdep.c:2002
 check_noncircular+0x2cc/0x390 kernel/locking/lockdep.c:2123
 check_prev_add kernel/locking/lockdep.c:2988 [inline]
 check_prevs_add kernel/locking/lockdep.c:3113 [inline]
 validate_chain+0x1695/0x58f0 kernel/locking/lockdep.c:3729
 __lock_acquire+0x12fd/0x20d0 kernel/locking/lockdep.c:4955
 lock_acquire+0x197/0x480 kernel/locking/lockdep.c:5566
 down_write+0x93/0x180 kernel/locking/rwsem.c:1564
 inode_lock include/linux/fs.h:782 [inline]
 ext4_xattr_inode_iget+0x42a/0x5c0 fs/ext4/xattr.c:425
 ext4_xattr_inode_get+0x138/0x410 fs/ext4/xattr.c:485
 ext4_xattr_move_to_block fs/ext4/xattr.c:2580 [inline]
 ext4_xattr_make_inode_space fs/ext4/xattr.c:2682 [inline]
 ext4_expand_extra_isize_ea+0xe70/0x1bb0 fs/ext4/xattr.c:2774
 __ext4_expand_extra_isize+0x304/0x3f0 fs/ext4/inode.c:5898
 ext4_try_to_expand_extra_isize fs/ext4/inode.c:5941 [inline]
 __ext4_mark_inode_dirty+0x591/0x810 fs/ext4/inode.c:6018
 ext4_setattr+0x1400/0x19c0 fs/ext4/inode.c:5562
 notify_change+0xbb6/0xe60 fs/attr.c:435
 do_truncate+0x1de/0x2c0 fs/open.c:64
 handle_truncate fs/namei.c:2970 [inline]
 do_open fs/namei.c:3311 [inline]
 path_openat+0x29f3/0x3290 fs/namei.c:3425
 do_filp_open+0x20b/0x450 fs/namei.c:3452
 do_sys_openat2+0x124/0x460 fs/open.c:1207
 do_sys_open fs/open.c:1223 [inline]
 __do_sys_open fs/open.c:1231 [inline]
 __se_sys_open fs/open.c:1227 [inline]
 __x64_sys_open+0x221/0x270 fs/open.c:1227
 do_syscall_64+0x6d/0xa0 arch/x86/entry/common.c:62
 entry_SYSCALL_64_after_hwframe+0x61/0xcb
RIP: 0033:0x7f0cde4ea229
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 21 18 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffd81d1c978 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
RAX: ffffffffffffffda RBX: 0030656c69662f30 RCX: 00007f0cde4ea229
RDX: 0000000000000089 RSI: 00000000000a0a00 RDI: 00000000200001c0
RBP: 2f30656c69662f2e R08: 0000000000208000 R09: 0000000000208000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd81d1c9c0
R13: 00007ffd81d1ca00 R14: 0000000000080000 R15: 0000000000000003
EXT4-fs error (device loop0): ext4_expand_extra_isize_ea:2730: inode #13: comm syz-executor543: corrupted in-inode xattr

Signed-off-by: Wojciech Gładysz <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
amboar pushed a commit that referenced this pull request Oct 18, 2024
[ Upstream commit 2a77767 ]

Current timeout handler of mad agent acquires/releases mad_agent_priv
lock for every timed out WRs. This causes heavy locking contention
when higher no. of WRs are to be handled inside timeout handler.

This leads to softlockup with below trace in some use cases where
rdma-cm path is used to establish connection between peer nodes

Trace:
-----
 BUG: soft lockup - CPU#4 stuck for 26s! [kworker/u128:3:19767]
 CPU: 4 PID: 19767 Comm: kworker/u128:3 Kdump: loaded Tainted: G OE
     -------  ---  5.14.0-427.13.1.el9_4.x86_64 #1
 Hardware name: Dell Inc. PowerEdge R740/01YM03, BIOS 2.4.8 11/26/2019
 Workqueue: ib_mad1 timeout_sends [ib_core]
 RIP: 0010:__do_softirq+0x78/0x2ac
 RSP: 0018:ffffb253449e4f98 EFLAGS: 00000246
 RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 000000000000001f
 RDX: 000000000000001d RSI: 000000003d1879ab RDI: fff363b66fd3a86b
 RBP: ffffb253604cbcd8 R08: 0000009065635f3b R09: 0000000000000000
 R10: 0000000000000040 R11: ffffb253449e4ff8 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000040
 FS:  0000000000000000(0000) GS:ffff8caa1fc80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fd9ec9db900 CR3: 0000000891934006 CR4: 00000000007706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  <IRQ>
  ? show_trace_log_lvl+0x1c4/0x2df
  ? show_trace_log_lvl+0x1c4/0x2df
  ? __irq_exit_rcu+0xa1/0xc0
  ? watchdog_timer_fn+0x1b2/0x210
  ? __pfx_watchdog_timer_fn+0x10/0x10
  ? __hrtimer_run_queues+0x127/0x2c0
  ? hrtimer_interrupt+0xfc/0x210
  ? __sysvec_apic_timer_interrupt+0x5c/0x110
  ? sysvec_apic_timer_interrupt+0x37/0x90
  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
  ? __do_softirq+0x78/0x2ac
  ? __do_softirq+0x60/0x2ac
  __irq_exit_rcu+0xa1/0xc0
  sysvec_call_function_single+0x72/0x90
  </IRQ>
  <TASK>
  asm_sysvec_call_function_single+0x16/0x20
 RIP: 0010:_raw_spin_unlock_irq+0x14/0x30
 RSP: 0018:ffffb253604cbd88 EFLAGS: 00000247
 RAX: 000000000001960d RBX: 0000000000000002 RCX: ffff8cad2a064800
 RDX: 000000008020001b RSI: 0000000000000001 RDI: ffff8cad5d39f66c
 RBP: ffff8cad5d39f600 R08: 0000000000000001 R09: 0000000000000000
 R10: ffff8caa443e0c00 R11: ffffb253604cbcd8 R12: ffff8cacb8682538
 R13: 0000000000000005 R14: ffffb253604cbd90 R15: ffff8cad5d39f66c
  cm_process_send_error+0x122/0x1d0 [ib_cm]
  timeout_sends+0x1dd/0x270 [ib_core]
  process_one_work+0x1e2/0x3b0
  ? __pfx_worker_thread+0x10/0x10
  worker_thread+0x50/0x3a0
  ? __pfx_worker_thread+0x10/0x10
  kthread+0xdd/0x100
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x29/0x50
  </TASK>

Simplified timeout handler by creating local list of timed out WRs
and invoke send handler post creating the list. The new method acquires/
releases lock once to fetch the list and hence helps to reduce locking
contetiong when processing higher no. of WRs

Signed-off-by: Saravanan Vajravel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
amboar pushed a commit that referenced this pull request Oct 18, 2024
[ Upstream commit a848c29 ]

On the node of an NFS client, some files saved in the mountpoint of the
NFS server were copied to another location of the same NFS server.
Accidentally, the nfs42_complete_copies() got a NULL-pointer dereference
crash with the following syslog:

[232064.838881] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
[232064.839360] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
[232066.588183] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058
[232066.588586] Mem abort info:
[232066.588701]   ESR = 0x0000000096000007
[232066.588862]   EC = 0x25: DABT (current EL), IL = 32 bits
[232066.589084]   SET = 0, FnV = 0
[232066.589216]   EA = 0, S1PTW = 0
[232066.589340]   FSC = 0x07: level 3 translation fault
[232066.589559] Data abort info:
[232066.589683]   ISV = 0, ISS = 0x00000007
[232066.589842]   CM = 0, WnR = 0
[232066.589967] user pgtable: 64k pages, 48-bit VAs, pgdp=00002000956ff400
[232066.590231] [0000000000000058] pgd=08001100ae100003, p4d=08001100ae100003, pud=08001100ae100003, pmd=08001100b3c00003, pte=0000000000000000
[232066.590757] Internal error: Oops: 96000007 [#1] SMP
[232066.590958] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm vhost_net vhost vhost_iotlb tap tun ipt_rpfilter xt_multiport ip_set_hash_ip ip_set_hash_net xfrm_interface xfrm6_tunnel tunnel4 tunnel6 esp4 ah4 wireguard libcurve25519_generic veth xt_addrtype xt_set nf_conntrack_netlink ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipport dummy ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_filter sch_ingress nfnetlink_cttimeout vport_gre ip_gre ip_tunnel gre vport_geneve geneve vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conncount dm_round_robin dm_service_time dm_multipath xt_nat xt_MASQUERADE nft_chain_nat nf_nat xt_mark xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink ocfs2 ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_ssif nbd overlay 8021q garp mrp bonding tls rfkill sunrpc ext4 mbcache jbd2
[232066.591052]  vfat fat cas_cache cas_disk ses enclosure scsi_transport_sas sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler ip_tables vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc fuse xfs libcrc32c ast drm_vram_helper qla2xxx drm_kms_helper syscopyarea crct10dif_ce sysfillrect ghash_ce sysimgblt sha2_ce fb_sys_fops cec sha256_arm64 sha1_ce drm_ttm_helper ttm nvme_fc igb sbsa_gwdt nvme_fabrics drm nvme_core i2c_algo_bit i40e scsi_transport_fc megaraid_sas aes_neon_bs
[232066.596953] CPU: 6 PID: 4124696 Comm: 10.253.166.125- Kdump: loaded Not tainted 5.15.131-9.cl9_ocfs2.aarch64 #1
[232066.597356] Hardware name: Great Wall .\x93\x8e...RF6260 V5/GWMSSE2GL1T, BIOS T656FBE_V3.0.18 2024-01-06
[232066.597721] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[232066.598034] pc : nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
[232066.598327] lr : nfs4_reclaim_open_state+0x12c/0x800 [nfsv4]
[232066.598595] sp : ffff8000f568fc70
[232066.598731] x29: ffff8000f568fc70 x28: 0000000000001000 x27: ffff21003db33000
[232066.599030] x26: ffff800005521ae0 x25: ffff0100f98fa3f0 x24: 0000000000000001
[232066.599319] x23: ffff800009920008 x22: ffff21003db33040 x21: ffff21003db33050
[232066.599628] x20: ffff410172fe9e40 x19: ffff410172fe9e00 x18: 0000000000000000
[232066.599914] x17: 0000000000000000 x16: 0000000000000004 x15: 0000000000000000
[232066.600195] x14: 0000000000000000 x13: ffff800008e685a8 x12: 00000000eac0c6e6
[232066.600498] x11: 0000000000000000 x10: 0000000000000008 x9 : ffff8000054e5828
[232066.600784] x8 : 00000000ffffffbf x7 : 0000000000000001 x6 : 000000000a9eb14a
[232066.601062] x5 : 0000000000000000 x4 : ffff70ff8a14a800 x3 : 0000000000000058
[232066.601348] x2 : 0000000000000001 x1 : 54dce46366daa6c6 x0 : 0000000000000000
[232066.601636] Call trace:
[232066.601749]  nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
[232066.601998]  nfs4_do_reclaim+0x1b8/0x28c [nfsv4]
[232066.602218]  nfs4_state_manager+0x928/0x10f0 [nfsv4]
[232066.602455]  nfs4_run_state_manager+0x78/0x1b0 [nfsv4]
[232066.602690]  kthread+0x110/0x114
[232066.602830]  ret_from_fork+0x10/0x20
[232066.602985] Code: 1400000d f9403f20 f9402e61 91016003 (f9402c00)
[232066.603284] SMP: stopping secondary CPUs
[232066.606936] Starting crashdump kernel...
[232066.607146] Bye!

Analysing the vmcore, we know that nfs4_copy_state listed by destination
nfs_server->ss_copies was added by the field copies in handle_async_copy(),
and we found a waiting copy process with the stack as:
PID: 3511963  TASK: ffff710028b47e00  CPU: 0   COMMAND: "cp"
 #0 [ffff8001116ef740] __switch_to at ffff8000081b92f4
 #1 [ffff8001116ef760] __schedule at ffff800008dd0650
 #2 [ffff8001116ef7c0] schedule at ffff800008dd0a00
 #3 [ffff8001116ef7e0] schedule_timeout at ffff800008dd6aa0
 #4 [ffff8001116ef860] __wait_for_common at ffff800008dd166c
 #5 [ffff8001116ef8e0] wait_for_completion_interruptible at ffff800008dd1898
 #6 [ffff8001116ef8f0] handle_async_copy at ffff8000055142f4 [nfsv4]
 #7 [ffff8001116ef970] _nfs42_proc_copy at ffff8000055147c8 [nfsv4]
 #8 [ffff8001116efa80] nfs42_proc_copy at ffff800005514cf0 [nfsv4]
 #9 [ffff8001116efc50] __nfs4_copy_file_range.constprop.0 at ffff8000054ed694 [nfsv4]

The NULL-pointer dereference was due to nfs42_complete_copies() listed
the nfs_server->ss_copies by the field ss_copies of nfs4_copy_state.
So the nfs4_copy_state address ffff0100f98fa3f0 was offset by 0x10 and
the data accessed through this pointer was also incorrect. Generally,
the ordered list nfs4_state_owner->so_states indicate open(O_RDWR) or
open(O_WRITE) states are reclaimed firstly by nfs4_reclaim_open_state().
When destination state reclaim is failed with NFS_STATE_RECOVERY_FAILED
and copies are not deleted in nfs_server->ss_copies, the source state
may be passed to the nfs42_complete_copies() process earlier, resulting
in this crash scene finally. To solve this issue, we add a list_head
nfs_server->ss_src_copies for a server-to-server copy specially.

Fixes: 0e65a32 ("NFS: handle source server reboot")
Signed-off-by: Yanjun Zhang <[email protected]>
Reviewed-by: Trond Myklebust <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
amboar pushed a commit that referenced this pull request Oct 18, 2024
[ Upstream commit f9ff766 ]

Fix a kernel panic in the br_netfilter module when sending untagged
traffic via a VxLAN device.
This happens during the check for fragmentation in br_nf_dev_queue_xmit.

It is dependent on:
1) the br_netfilter module being loaded;
2) net.bridge.bridge-nf-call-iptables set to 1;
3) a bridge with a VxLAN (single-vxlan-device) netdevice as a bridge port;
4) untagged frames with size higher than the VxLAN MTU forwarded/flooded

When forwarding the untagged packet to the VxLAN bridge port, before
the netfilter hooks are called, br_handle_egress_vlan_tunnel is called and
changes the skb_dst to the tunnel dst. The tunnel_dst is a metadata type
of dst, i.e., skb_valid_dst(skb) is false, and metadata->dst.dev is NULL.

Then in the br_netfilter hooks, in br_nf_dev_queue_xmit, there's a check
for frames that needs to be fragmented: frames with higher MTU than the
VxLAN device end up calling br_nf_ip_fragment, which in turns call
ip_skb_dst_mtu.

The ip_dst_mtu tries to use the skb_dst(skb) as if it was a valid dst
with valid dst->dev, thus the crash.

This case was never supported in the first place, so drop the packet
instead.

PING 10.0.0.2 (10.0.0.2) from 0.0.0.0 h1-eth0: 2000(2028) bytes of data.
[  176.291791] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000110
[  176.292101] Mem abort info:
[  176.292184]   ESR = 0x0000000096000004
[  176.292322]   EC = 0x25: DABT (current EL), IL = 32 bits
[  176.292530]   SET = 0, FnV = 0
[  176.292709]   EA = 0, S1PTW = 0
[  176.292862]   FSC = 0x04: level 0 translation fault
[  176.293013] Data abort info:
[  176.293104]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[  176.293488]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  176.293787]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  176.293995] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000043ef5000
[  176.294166] [0000000000000110] pgd=0000000000000000,
p4d=0000000000000000
[  176.294827] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[  176.295252] Modules linked in: vxlan ip6_udp_tunnel udp_tunnel veth
br_netfilter bridge stp llc ipv6 crct10dif_ce
[  176.295923] CPU: 0 PID: 188 Comm: ping Not tainted
6.8.0-rc3-g5b3fbd61b9d1 #2
[  176.296314] Hardware name: linux,dummy-virt (DT)
[  176.296535] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS
BTYPE=--)
[  176.296808] pc : br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter]
[  176.297382] lr : br_nf_dev_queue_xmit+0x2ac/0x4ec [br_netfilter]
[  176.297636] sp : ffff800080003630
[  176.297743] x29: ffff800080003630 x28: 0000000000000008 x27:
ffff6828c49ad9f8
[  176.298093] x26: ffff6828c49ad000 x25: 0000000000000000 x24:
00000000000003e8
[  176.298430] x23: 0000000000000000 x22: ffff6828c4960b40 x21:
ffff6828c3b16d28
[  176.298652] x20: ffff6828c3167048 x19: ffff6828c3b16d00 x18:
0000000000000014
[  176.298926] x17: ffffb0476322f000 x16: ffffb7e164023730 x15:
0000000095744632
[  176.299296] x14: ffff6828c3f1c880 x13: 0000000000000002 x12:
ffffb7e137926a70
[  176.299574] x11: 0000000000000001 x10: ffff6828c3f1c898 x9 :
0000000000000000
[  176.300049] x8 : ffff6828c49bf070 x7 : 0008460f18d5f20e x6 :
f20e0100bebafeca
[  176.300302] x5 : ffff6828c7f918fe x4 : ffff6828c49bf070 x3 :
0000000000000000
[  176.300586] x2 : 0000000000000000 x1 : ffff6828c3c7ad00 x0 :
ffff6828c7f918f0
[  176.300889] Call trace:
[  176.301123]  br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter]
[  176.301411]  br_nf_post_routing+0x2a8/0x3e4 [br_netfilter]
[  176.301703]  nf_hook_slow+0x48/0x124
[  176.302060]  br_forward_finish+0xc8/0xe8 [bridge]
[  176.302371]  br_nf_hook_thresh+0x124/0x134 [br_netfilter]
[  176.302605]  br_nf_forward_finish+0x118/0x22c [br_netfilter]
[  176.302824]  br_nf_forward_ip.part.0+0x264/0x290 [br_netfilter]
[  176.303136]  br_nf_forward+0x2b8/0x4e0 [br_netfilter]
[  176.303359]  nf_hook_slow+0x48/0x124
[  176.303803]  __br_forward+0xc4/0x194 [bridge]
[  176.304013]  br_flood+0xd4/0x168 [bridge]
[  176.304300]  br_handle_frame_finish+0x1d4/0x5c4 [bridge]
[  176.304536]  br_nf_hook_thresh+0x124/0x134 [br_netfilter]
[  176.304978]  br_nf_pre_routing_finish+0x29c/0x494 [br_netfilter]
[  176.305188]  br_nf_pre_routing+0x250/0x524 [br_netfilter]
[  176.305428]  br_handle_frame+0x244/0x3cc [bridge]
[  176.305695]  __netif_receive_skb_core.constprop.0+0x33c/0xecc
[  176.306080]  __netif_receive_skb_one_core+0x40/0x8c
[  176.306197]  __netif_receive_skb+0x18/0x64
[  176.306369]  process_backlog+0x80/0x124
[  176.306540]  __napi_poll+0x38/0x17c
[  176.306636]  net_rx_action+0x124/0x26c
[  176.306758]  __do_softirq+0x100/0x26c
[  176.307051]  ____do_softirq+0x10/0x1c
[  176.307162]  call_on_irq_stack+0x24/0x4c
[  176.307289]  do_softirq_own_stack+0x1c/0x2c
[  176.307396]  do_softirq+0x54/0x6c
[  176.307485]  __local_bh_enable_ip+0x8c/0x98
[  176.307637]  __dev_queue_xmit+0x22c/0xd28
[  176.307775]  neigh_resolve_output+0xf4/0x1a0
[  176.308018]  ip_finish_output2+0x1c8/0x628
[  176.308137]  ip_do_fragment+0x5b4/0x658
[  176.308279]  ip_fragment.constprop.0+0x48/0xec
[  176.308420]  __ip_finish_output+0xa4/0x254
[  176.308593]  ip_finish_output+0x34/0x130
[  176.308814]  ip_output+0x6c/0x108
[  176.308929]  ip_send_skb+0x50/0xf0
[  176.309095]  ip_push_pending_frames+0x30/0x54
[  176.309254]  raw_sendmsg+0x758/0xaec
[  176.309568]  inet_sendmsg+0x44/0x70
[  176.309667]  __sys_sendto+0x110/0x178
[  176.309758]  __arm64_sys_sendto+0x28/0x38
[  176.309918]  invoke_syscall+0x48/0x110
[  176.310211]  el0_svc_common.constprop.0+0x40/0xe0
[  176.310353]  do_el0_svc+0x1c/0x28
[  176.310434]  el0_svc+0x34/0xb4
[  176.310551]  el0t_64_sync_handler+0x120/0x12c
[  176.310690]  el0t_64_sync+0x190/0x194
[  176.311066] Code: f9402e61 79402aa2 927ff821 f9400023 (f9408860)
[  176.315743] ---[ end trace 0000000000000000 ]---
[  176.316060] Kernel panic - not syncing: Oops: Fatal exception in
interrupt
[  176.316371] Kernel Offset: 0x37e0e3000000 from 0xffff800080000000
[  176.316564] PHYS_OFFSET: 0xffff97d780000000
[  176.316782] CPU features: 0x0,88000203,3c020000,0100421b
[  176.317210] Memory Limit: none
[  176.317527] ---[ end Kernel panic - not syncing: Oops: Fatal
Exception in interrupt ]---\

Fixes: 11538d0 ("bridge: vlan dst_metadata hooks in ingress and egress paths")
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: Andy Roulin <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
amboar pushed a commit that referenced this pull request Oct 18, 2024
[ Upstream commit 330a699 ]

Commit 004d250 ("igb: Fix igb_down hung on surprise removal")
changed igb_io_error_detected() to ignore non-fatal pcie errors in order
to avoid hung task that can happen when igb_down() is called multiple
times. This caused an issue when processing transient non-fatal errors.
igb_io_resume(), which is called after igb_io_error_detected(), assumes
that device is brought down by igb_io_error_detected() if the interface
is up. This resulted in panic with stacktrace below.

[ T3256] igb 0000:09:00.0 haeth0: igb: haeth0 NIC Link is Down
[  T292] pcieport 0000:00:1c.5: AER: Uncorrected (Non-Fatal) error received: 0000:09:00.0
[  T292] igb 0000:09:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[  T292] igb 0000:09:00.0:   device [8086:1537] error status/mask=00004000/00000000
[  T292] igb 0000:09:00.0:    [14] CmpltTO [  200.105524,009][  T292] igb 0000:09:00.0: AER:   TLP Header: 00000000 00000000 00000000 00000000
[  T292] pcieport 0000:00:1c.5: AER: broadcast error_detected message
[  T292] igb 0000:09:00.0: Non-correctable non-fatal error reported.
[  T292] pcieport 0000:00:1c.5: AER: broadcast mmio_enabled message
[  T292] pcieport 0000:00:1c.5: AER: broadcast resume message
[  T292] ------------[ cut here ]------------
[  T292] kernel BUG at net/core/dev.c:6539!
[  T292] invalid opcode: 0000 [#1] PREEMPT SMP
[  T292] RIP: 0010:napi_enable+0x37/0x40
[  T292] Call Trace:
[  T292]  <TASK>
[  T292]  ? die+0x33/0x90
[  T292]  ? do_trap+0xdc/0x110
[  T292]  ? napi_enable+0x37/0x40
[  T292]  ? do_error_trap+0x70/0xb0
[  T292]  ? napi_enable+0x37/0x40
[  T292]  ? napi_enable+0x37/0x40
[  T292]  ? exc_invalid_op+0x4e/0x70
[  T292]  ? napi_enable+0x37/0x40
[  T292]  ? asm_exc_invalid_op+0x16/0x20
[  T292]  ? napi_enable+0x37/0x40
[  T292]  igb_up+0x41/0x150
[  T292]  igb_io_resume+0x25/0x70
[  T292]  report_resume+0x54/0x70
[  T292]  ? report_frozen_detected+0x20/0x20
[  T292]  pci_walk_bus+0x6c/0x90
[  T292]  ? aer_print_port_info+0xa0/0xa0
[  T292]  pcie_do_recovery+0x22f/0x380
[  T292]  aer_process_err_devices+0x110/0x160
[  T292]  aer_isr+0x1c1/0x1e0
[  T292]  ? disable_irq_nosync+0x10/0x10
[  T292]  irq_thread_fn+0x1a/0x60
[  T292]  irq_thread+0xe3/0x1a0
[  T292]  ? irq_set_affinity_notifier+0x120/0x120
[  T292]  ? irq_affinity_notify+0x100/0x100
[  T292]  kthread+0xe2/0x110
[  T292]  ? kthread_complete_and_exit+0x20/0x20
[  T292]  ret_from_fork+0x2d/0x50
[  T292]  ? kthread_complete_and_exit+0x20/0x20
[  T292]  ret_from_fork_asm+0x11/0x20
[  T292]  </TASK>

To fix this issue igb_io_resume() checks if the interface is running and
the device is not down this means igb_io_error_detected() did not bring
the device down and there is no need to bring it up.

Signed-off-by: Mohamed Khalfella <[email protected]>
Reviewed-by: Yuanyuan Zhong <[email protected]>
Fixes: 004d250 ("igb: Fix igb_down hung on surprise removal")
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
amboar pushed a commit that referenced this pull request Oct 18, 2024
[ Upstream commit 3cb7cf1 ]

Most qdiscs maintain their backlog using qdisc_pkt_len(skb)
on the assumption it is invariant between the enqueue()
and dequeue() handlers.

Unfortunately syzbot can crash a host rather easily using
a TBF + SFQ combination, with an STAB on SFQ [1]

We can't support TCA_STAB on arbitrary level, this would
require to maintain per-qdisc storage.

[1]
[   88.796496] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   88.798611] #PF: supervisor read access in kernel mode
[   88.799014] #PF: error_code(0x0000) - not-present page
[   88.799506] PGD 0 P4D 0
[   88.799829] Oops: Oops: 0000 [#1] SMP NOPTI
[   88.800569] CPU: 14 UID: 0 PID: 2053 Comm: b371744477 Not tainted 6.12.0-rc1-virtme #1117
[   88.801107] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   88.801779] RIP: 0010:sfq_dequeue (net/sched/sch_sfq.c:272 net/sched/sch_sfq.c:499) sch_sfq
[ 88.802544] Code: 0f b7 50 12 48 8d 04 d5 00 00 00 00 48 89 d6 48 29 d0 48 8b 91 c0 01 00 00 48 c1 e0 03 48 01 c2 66 83 7a 1a 00 7e c0 48 8b 3a <4c> 8b 07 4c 89 02 49 89 50 08 48 c7 47 08 00 00 00 00 48 c7 07 00
All code
========
   0:	0f b7 50 12          	movzwl 0x12(%rax),%edx
   4:	48 8d 04 d5 00 00 00 	lea    0x0(,%rdx,8),%rax
   b:	00
   c:	48 89 d6             	mov    %rdx,%rsi
   f:	48 29 d0             	sub    %rdx,%rax
  12:	48 8b 91 c0 01 00 00 	mov    0x1c0(%rcx),%rdx
  19:	48 c1 e0 03          	shl    $0x3,%rax
  1d:	48 01 c2             	add    %rax,%rdx
  20:	66 83 7a 1a 00       	cmpw   $0x0,0x1a(%rdx)
  25:	7e c0                	jle    0xffffffffffffffe7
  27:	48 8b 3a             	mov    (%rdx),%rdi
  2a:*	4c 8b 07             	mov    (%rdi),%r8		<-- trapping instruction
  2d:	4c 89 02             	mov    %r8,(%rdx)
  30:	49 89 50 08          	mov    %rdx,0x8(%r8)
  34:	48 c7 47 08 00 00 00 	movq   $0x0,0x8(%rdi)
  3b:	00
  3c:	48                   	rex.W
  3d:	c7                   	.byte 0xc7
  3e:	07                   	(bad)
	...

Code starting with the faulting instruction
===========================================
   0:	4c 8b 07             	mov    (%rdi),%r8
   3:	4c 89 02             	mov    %r8,(%rdx)
   6:	49 89 50 08          	mov    %rdx,0x8(%r8)
   a:	48 c7 47 08 00 00 00 	movq   $0x0,0x8(%rdi)
  11:	00
  12:	48                   	rex.W
  13:	c7                   	.byte 0xc7
  14:	07                   	(bad)
	...
[   88.803721] RSP: 0018:ffff9a1f892b7d58 EFLAGS: 00000206
[   88.804032] RAX: 0000000000000000 RBX: ffff9a1f8420c800 RCX: ffff9a1f8420c800
[   88.804560] RDX: ffff9a1f81bc1440 RSI: 0000000000000000 RDI: 0000000000000000
[   88.805056] RBP: ffffffffc04bb0e0 R08: 0000000000000001 R09: 00000000ff7f9a1f
[   88.805473] R10: 000000000001001b R11: 0000000000009a1f R12: 0000000000000140
[   88.806194] R13: 0000000000000001 R14: ffff9a1f886df400 R15: ffff9a1f886df4ac
[   88.806734] FS:  00007f445601a740(0000) GS:ffff9a2e7fd80000(0000) knlGS:0000000000000000
[   88.807225] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   88.807672] CR2: 0000000000000000 CR3: 000000050cc46000 CR4: 00000000000006f0
[   88.808165] Call Trace:
[   88.808459]  <TASK>
[   88.808710] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
[   88.809261] ? page_fault_oops (arch/x86/mm/fault.c:715)
[   88.809561] ? exc_page_fault (./arch/x86/include/asm/irqflags.h:26 ./arch/x86/include/asm/irqflags.h:87 ./arch/x86/include/asm/irqflags.h:147 arch/x86/mm/fault.c:1489 arch/x86/mm/fault.c:1539)
[   88.809806] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623)
[   88.810074] ? sfq_dequeue (net/sched/sch_sfq.c:272 net/sched/sch_sfq.c:499) sch_sfq
[   88.810411] sfq_reset (net/sched/sch_sfq.c:525) sch_sfq
[   88.810671] qdisc_reset (./include/linux/skbuff.h:2135 ./include/linux/skbuff.h:2441 ./include/linux/skbuff.h:3304 ./include/linux/skbuff.h:3310 net/sched/sch_generic.c:1036)
[   88.810950] tbf_reset (./include/linux/timekeeping.h:169 net/sched/sch_tbf.c:334) sch_tbf
[   88.811208] qdisc_reset (./include/linux/skbuff.h:2135 ./include/linux/skbuff.h:2441 ./include/linux/skbuff.h:3304 ./include/linux/skbuff.h:3310 net/sched/sch_generic.c:1036)
[   88.811484] netif_set_real_num_tx_queues (./include/linux/spinlock.h:396 ./include/net/sch_generic.h:768 net/core/dev.c:2958)
[   88.811870] __tun_detach (drivers/net/tun.c:590 drivers/net/tun.c:673)
[   88.812271] tun_chr_close (drivers/net/tun.c:702 drivers/net/tun.c:3517)
[   88.812505] __fput (fs/file_table.c:432 (discriminator 1))
[   88.812735] task_work_run (kernel/task_work.c:230)
[   88.813016] do_exit (kernel/exit.c:940)
[   88.813372] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:58 (discriminator 4))
[   88.813639] ? handle_mm_fault (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/memcontrol.h:1022 ./include/linux/memcontrol.h:1045 ./include/linux/memcontrol.h:1052 mm/memory.c:5928 mm/memory.c:6088)
[   88.813867] do_group_exit (kernel/exit.c:1070)
[   88.814138] __x64_sys_exit_group (kernel/exit.c:1099)
[   88.814490] x64_sys_call (??:?)
[   88.814791] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
[   88.815012] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[   88.815495] RIP: 0033:0x7f44560f1975

Fixes: 175f9c1 ("net_sched: Add size table for qdiscs")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
amboar pushed a commit that referenced this pull request Oct 18, 2024
[ Upstream commit f7345cc ]

After a CPU has set itself offline and before it eventually calls
rcutree_report_cpu_dead(), there are still opportunities for callbacks
to be enqueued, for example from a softirq. When that happens on NOCB,
the rcuog wake-up is deferred through an IPI to an online CPU in order
not to call into the scheduler and risk arming the RT-bandwidth after
hrtimers have been migrated out and disabled.

But performing a synchronized IPI from a softirq is buggy as reported in
the following scenario:

        WARNING: CPU: 1 PID: 26 at kernel/smp.c:633 smp_call_function_single
        Modules linked in: rcutorture torture
        CPU: 1 UID: 0 PID: 26 Comm: migration/1 Not tainted 6.11.0-rc1-00012-g9139f93209d1 #1
        Stopper: multi_cpu_stop+0x0/0x320 <- __stop_cpus+0xd0/0x120
        RIP: 0010:smp_call_function_single
        <IRQ>
        swake_up_one_online
        __call_rcu_nocb_wake
        __call_rcu_common
        ? rcu_torture_one_read
        call_timer_fn
        __run_timers
        run_timer_softirq
        handle_softirqs
        irq_exit_rcu
        ? tick_handle_periodic
        sysvec_apic_timer_interrupt
        </IRQ>

Fix this with forcing deferred rcuog wake up through the NOCB timer when
the CPU is offline. The actual wake up will happen from
rcutree_report_cpu_dead().

Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Fixes: 9139f93 ("rcu/nocb: Fix RT throttling hrtimer armed from offline CPU")
Reviewed-by: "Joel Fernandes (Google)" <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Neeraj Upadhyay <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
amboar pushed a commit that referenced this pull request Oct 18, 2024
commit 7d1fd36 upstream.

When running `kmscube` with one or more performance monitors enabled
via `GALLIUM_HUD`, the following kernel panic can occur:

[   55.008324] Unable to handle kernel paging request at virtual address 00000000052004a4
[   55.008368] Mem abort info:
[   55.008377]   ESR = 0x0000000096000005
[   55.008387]   EC = 0x25: DABT (current EL), IL = 32 bits
[   55.008402]   SET = 0, FnV = 0
[   55.008412]   EA = 0, S1PTW = 0
[   55.008421]   FSC = 0x05: level 1 translation fault
[   55.008434] Data abort info:
[   55.008442]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[   55.008455]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[   55.008467]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[   55.008481] user pgtable: 4k pages, 39-bit VAs, pgdp=00000001046c6000
[   55.008497] [00000000052004a4] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[   55.008525] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
[   55.008542] Modules linked in: rfcomm [...] vc4 v3d snd_soc_hdmi_codec drm_display_helper
gpu_sched drm_shmem_helper cec drm_dma_helper drm_kms_helper i2c_brcmstb
drm drm_panel_orientation_quirks snd_soc_core snd_compress snd_pcm_dmaengine snd_pcm snd_timer snd backlight
[   55.008799] CPU: 2 PID: 166 Comm: v3d_bin Tainted: G         C         6.6.47+rpt-rpi-v8 #1  Debian 1:6.6.47-1+rpt1
[   55.008824] Hardware name: Raspberry Pi 4 Model B Rev 1.5 (DT)
[   55.008838] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   55.008855] pc : __mutex_lock.constprop.0+0x90/0x608
[   55.008879] lr : __mutex_lock.constprop.0+0x58/0x608
[   55.008895] sp : ffffffc080673cf0
[   55.008904] x29: ffffffc080673cf0 x28: 0000000000000000 x27: ffffff8106188a28
[   55.008926] x26: ffffff8101e78040 x25: ffffff8101baa6c0 x24: ffffffd9d989f148
[   55.008947] x23: ffffffda1c2a4008 x22: 0000000000000002 x21: ffffffc080673d38
[   55.008968] x20: ffffff8101238000 x19: ffffff8104f83188 x18: 0000000000000000
[   55.008988] x17: 0000000000000000 x16: ffffffda1bd04d18 x15: 00000055bb08bc90
[   55.009715] x14: 0000000000000000 x13: 0000000000000000 x12: ffffffda1bd4cbb0
[   55.010433] x11: 00000000fa83b2da x10: 0000000000001a40 x9 : ffffffda1bd04d04
[   55.011162] x8 : ffffff8102097b80 x7 : 0000000000000000 x6 : 00000000030a5857
[   55.011880] x5 : 00ffffffffffffff x4 : 0300000005200470 x3 : 0300000005200470
[   55.012598] x2 : ffffff8101238000 x1 : 0000000000000021 x0 : 0300000005200470
[   55.013292] Call trace:
[   55.013959]  __mutex_lock.constprop.0+0x90/0x608
[   55.014646]  __mutex_lock_slowpath+0x1c/0x30
[   55.015317]  mutex_lock+0x50/0x68
[   55.015961]  v3d_perfmon_stop+0x40/0xe0 [v3d]
[   55.016627]  v3d_bin_job_run+0x10c/0x2d8 [v3d]
[   55.017282]  drm_sched_main+0x178/0x3f8 [gpu_sched]
[   55.017921]  kthread+0x11c/0x128
[   55.018554]  ret_from_fork+0x10/0x20
[   55.019168] Code: f9400260 f1001c1f 54001ea9 927df000 (b9403401)
[   55.019776] ---[ end trace 0000000000000000 ]---
[   55.020411] note: v3d_bin[166] exited with preempt_count 1

This issue arises because, upon closing the file descriptor (which happens
when we interrupt `kmscube`), the active performance monitor is not
stopped. Although all perfmons are destroyed in `v3d_perfmon_close_file()`,
the active performance monitor's pointer (`v3d->active_perfmon`) is still
retained.

If `kmscube` is run again, the driver will attempt to stop the active
performance monitor using the stale pointer in `v3d->active_perfmon`.
However, this pointer is no longer valid because the previous process has
already terminated, and all performance monitors associated with it have
been destroyed and freed.

To fix this, when the active performance monitor belongs to a given
process, explicitly stop it before destroying and freeing it.

Cc: [email protected] # v5.15+
Closes: raspberrypi/linux#6389
Fixes: 26a4dc2 ("drm/v3d: Expose performance counters to userspace")
Signed-off-by: Maíra Canal <[email protected]>
Reviewed-by: Juan A. Suarez <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
amboar pushed a commit that referenced this pull request Oct 18, 2024
commit 1dae9f1 upstream.

The kernel may crash when deleting a genetlink family if there are still
listeners for that family:

Oops: Kernel access of bad area, sig: 11 [#1]
  ...
  NIP [c000000000c080bc] netlink_update_socket_mc+0x3c/0xc0
  LR [c000000000c0f764] __netlink_clear_multicast_users+0x74/0xc0
  Call Trace:
__netlink_clear_multicast_users+0x74/0xc0
genl_unregister_family+0xd4/0x2d0

Change the unsafe loop on the list to a safe one, because inside the
loop there is an element removal from this list.

Fixes: b827357 ("genetlink: fix netns vs. netlink table locking (2)")
Cc: [email protected]
Signed-off-by: Anastasia Kovaleva <[email protected]>
Reviewed-by: Dmitry Bogdanov <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant