-
Notifications
You must be signed in to change notification settings - Fork 238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix reading of thermal grade OTP value on i.MX7D #29
Conversation
Is it fixed in Linux mainline? @fabioestevam could you review this as well? |
On Fri, Jun 29, 2018 at 5:45 PM, Otavio Salvador ***@***.***> wrote:
Is it fixed in Linux mainline?
@fabioestevam <https://github.com/fabioestevam> could you review this as
well?
We need a detailed commit log explaining the original issue and how it
fixes the problem.
What is the thermal grade result prior and after this patch? What MX7D part
number does this fix affect?
It also needs to sent to the U-Boot list for proper review.
Thanks
|
This affects at least part MCIMX7D5EVM10SC and probably MCIMX7D5EVM10SD and other extended temperature range models, too (the letter 'E' being the key here). Before this patch these extended temperature range models (-20...105 C) were incorrectly detected as consumer models (0...95 C) and trip points were set accordingly too low. After the commit, the range is detected correctly by the driver. I don't know about the U-Boot list. Perhaps you point to the right direction or forward this there? |
If by mainline you mean the Linus tree (https://github.com/torvalds/linux/commits/master/drivers/thermal/imx_thermal.c), then no, it is not fixed there. |
Hi Jari,
On Mon, Jul 2, 2018 at 2:54 AM, Jari Tenhunen ***@***.***> wrote:
This affects at least part MCIMX7D5EVM10SC and probably MCIMX7D5EVM10SD
and other extended temperature range models, too (the letter 'E' being the
key here). Before this patch these extended temperature range models
(-20...105 C) were incorrectly detected as consumer models (0...95 C) and
trip points were set accordingly too low. After the commit, the range is
detected correctly by the driver.
Good. Please add such information into the patch commit log.
You also need to add your Signed-off-by tag.
I don't know about the U-Boot list. Perhaps you point to the right
direction or forward this there?
U-Boot list is [email protected] . You need to send it there (preferably
via git send-email).
|
@fabioestevam, in fact, his patch is for the Linux kernel, so it should be fixed on mainline Linux and send upstream. Am I mistaken? |
Added required information to revised commit. |
Hi Otavio,
On Mon, Jul 2, 2018 at 10:08 AM, Otavio Salvador ***@***.***> wrote:
@fabioestevam <https://github.com/fabioestevam>, in fact, his patch is
for the Linux kernel, so it should be fixed on mainline Linux and send
upstream. Am I mistaken?
Ops, sorry. I thought it was a U-Boot patch.
Yes, it would be really nice to fix this in the mainline kernel first.
Jari,
Please generate the patch against mainline kernel and run
./scripts/get_maintainer 0001-yourpatch.patch to see the people and lists
that you need to send this patch to.
Thaks
|
According to fusemap in reference manual, speed/thermal grading is configured at fuse address 0x440[9:8]. By reading the correct bits, this change fixes the detection of thermal grade at least on extended consumer grade models (P/N char 5E, 8E), possibly industrial and automative models, too. Tested on a board with P/N MCIMX7D5EVM10SC. Signed-off-by: Jari Tenhunen <[email protected]>
Thanks, but it looks like that the mainline kernel doesn't contain the required parent commits that your 4.9-2.0.x-imx tree has. I'll just leave it here. |
On Mon, Jul 2, 2018 at 11:36 AM, Jari Tenhunen ***@***.***> wrote:
Thanks, but it looks like that the mainline kernel doesn't contain the
required parent commits that your 4.9-2.0.x-imx tree has. I'll just leave
it here.
Your reworked patch with the commit log looks good, thanks.
Yes, the fix in mainline needs to be a bit different. Would you have
interest in fixing there as well?
|
[ Upstream commit e117cb5 ] Mark noticed that syzkaller is able to reliably trigger the following warning: dl_rq->running_bw > dl_rq->this_bw WARNING: CPU: 1 PID: 153 at kernel/sched/deadline.c:124 switched_from_dl+0x454/0x608 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 153 Comm: syz-executor253 Not tainted 4.18.0-rc3+ #29 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x458 show_stack+0x20/0x30 dump_stack+0x180/0x250 panic+0x2dc/0x4ec __warn_printk+0x0/0x150 report_bug+0x228/0x2d8 bug_handler+0xa0/0x1a0 brk_handler+0x2f0/0x568 do_debug_exception+0x1bc/0x5d0 el1_dbg+0x18/0x78 switched_from_dl+0x454/0x608 __sched_setscheduler+0x8cc/0x2018 sys_sched_setattr+0x340/0x758 el0_svc_naked+0x30/0x34 syzkaller reproducer runs a bunch of threads that constantly switch between DEADLINE and NORMAL classes while interacting through futexes. The splat above is caused by the fact that if a DEADLINE task is setattr back to NORMAL while in non_contending state (blocked on a futex - inactive timer armed), its contribution to running_bw is not removed before sub_rq_bw() gets called (!task_on_rq_queued() branch) and the latter sees running_bw > this_bw. Fix it by removing a task contribution from running_bw if the task is not queued and in non_contending state while switched to a different class. Reported-by: Mark Rutland <[email protected]> Signed-off-by: Juri Lelli <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Daniel Bristot de Oliveira <[email protected]> Reviewed-by: Luca Abeni <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit d3f07c0 upstream. syzbot found the following crash on: HEAD commit: d9bd94c Add linux-next specific files for 20180801 git tree: linux-next console output: https://syzkaller.appspot.com/x/log.txt?x=1001189c400000 kernel config: https://syzkaller.appspot.com/x/.config?x=cc8964ea4d04518c dashboard link: https://syzkaller.appspot.com/bug?extid=c966a82db0b14aa37e81 compiler: gcc (GCC) 8.0.1 20180413 (experimental) Unfortunately, I don't have any reproducer for this crash yet. IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: [email protected] loop7: rw=12288, want=8200, limit=20 netlink: 65342 bytes leftover after parsing attributes in process `syz-executor4'. openvswitch: netlink: Message has 8 unknown bytes. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [Freescale#1] SMP KASAN CPU: 1 PID: 7615 Comm: syz-executor7 Not tainted 4.18.0-rc7-next-20180801+ Freescale#29 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__read_once_size include/linux/compiler.h:188 [inline] RIP: 0010:compound_head include/linux/page-flags.h:142 [inline] RIP: 0010:PageLocked include/linux/page-flags.h:272 [inline] RIP: 0010:f2fs_put_page fs/f2fs/f2fs.h:2011 [inline] RIP: 0010:validate_checkpoint+0x66d/0xec0 fs/f2fs/checkpoint.c:835 Code: e8 58 05 7f fe 4c 8d 6b 80 4d 8d 74 24 08 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 c6 04 02 00 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 f4 06 00 00 4c 89 ea 4d 8b 7c 24 08 48 b8 00 00 RSP: 0018:ffff8801937cebe8 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff8801937cef30 RCX: ffffc90006035000 RDX: 0000000000000000 RSI: ffffffff82fd9658 RDI: 0000000000000005 RBP: ffff8801937cef58 R08: ffff8801ab254700 R09: fffff94000d9e026 R10: fffff94000d9e026 R11: ffffea0006cf0137 R12: fffffffffffffffb R13: ffff8801937ceeb0 R14: 0000000000000003 R15: ffff880193419b40 FS: 00007f36a61d5700(0000) GS:ffff8801db100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc04ff93000 CR3: 00000001d0562000 CR4: 00000000001426e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: f2fs_get_valid_checkpoint+0x436/0x1ec0 fs/f2fs/checkpoint.c:860 f2fs_fill_super+0x2d42/0x8110 fs/f2fs/super.c:2883 mount_bdev+0x314/0x3e0 fs/super.c:1344 f2fs_mount+0x3c/0x50 fs/f2fs/super.c:3133 legacy_get_tree+0x131/0x460 fs/fs_context.c:729 vfs_get_tree+0x1cb/0x5c0 fs/super.c:1743 do_new_mount fs/namespace.c:2603 [inline] do_mount+0x6f2/0x1e20 fs/namespace.c:2927 ksys_mount+0x12d/0x140 fs/namespace.c:3143 __do_sys_mount fs/namespace.c:3157 [inline] __se_sys_mount fs/namespace.c:3154 [inline] __x64_sys_mount+0xbe/0x150 fs/namespace.c:3154 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x45943a Code: b8 a6 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 bd 8a fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 9a 8a fb ff c3 66 0f 1f 84 00 00 00 00 00 RSP: 002b:00007f36a61d4a88 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 00007f36a61d4b30 RCX: 000000000045943a RDX: 00007f36a61d4ad0 RSI: 0000000020000100 RDI: 00007f36a61d4af0 RBP: 0000000020000100 R08: 00007f36a61d4b30 R09: 00007f36a61d4ad0 R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000013 R13: 0000000000000000 R14: 00000000004c8ea0 R15: 0000000000000000 Modules linked in: Dumping ftrace buffer: (ftrace buffer empty) ---[ end trace bd8550c129352286 ]--- RIP: 0010:__read_once_size include/linux/compiler.h:188 [inline] RIP: 0010:compound_head include/linux/page-flags.h:142 [inline] RIP: 0010:PageLocked include/linux/page-flags.h:272 [inline] RIP: 0010:f2fs_put_page fs/f2fs/f2fs.h:2011 [inline] RIP: 0010:validate_checkpoint+0x66d/0xec0 fs/f2fs/checkpoint.c:835 Code: e8 58 05 7f fe 4c 8d 6b 80 4d 8d 74 24 08 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 c6 04 02 00 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 f4 06 00 00 4c 89 ea 4d 8b 7c 24 08 48 b8 00 00 RSP: 0018:ffff8801937cebe8 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff8801937cef30 RCX: ffffc90006035000 RDX: 0000000000000000 RSI: ffffffff82fd9658 RDI: 0000000000000005 netlink: 65342 bytes leftover after parsing attributes in process `syz-executor4'. RBP: ffff8801937cef58 R08: ffff8801ab254700 R09: fffff94000d9e026 openvswitch: netlink: Message has 8 unknown bytes. R10: fffff94000d9e026 R11: ffffea0006cf0137 R12: fffffffffffffffb R13: ffff8801937ceeb0 R14: 0000000000000003 R15: ffff880193419b40 FS: 00007f36a61d5700(0000) GS:ffff8801db100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc04ff93000 CR3: 00000001d0562000 CR4: 00000000001426e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 In validate_checkpoint(), if we failed to call get_checkpoint_version(), we will pass returned invalid page pointer into f2fs_put_page, cause accessing invalid memory, this patch tries to handle error path correctly to fix this issue. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
commit 36eb8ff upstream. Crash dump shows following instructions crash> bt PID: 0 TASK: ffffffffbe412480 CPU: 0 COMMAND: "swapper/0" #0 [ffff891ee0003868] machine_kexec at ffffffffbd063ef1 Freescale#1 [ffff891ee00038c8] __crash_kexec at ffffffffbd12b6f2 Freescale#2 [ffff891ee0003998] crash_kexec at ffffffffbd12c84c Freescale#3 [ffff891ee00039b8] oops_end at ffffffffbd030f0a Freescale#4 [ffff891ee00039e0] no_context at ffffffffbd074643 Freescale#5 [ffff891ee0003a40] __bad_area_nosemaphore at ffffffffbd07496e Freescale#6 [ffff891ee0003a90] bad_area_nosemaphore at ffffffffbd074a64 Freescale#7 [ffff891ee0003aa0] __do_page_fault at ffffffffbd074b0a Freescale#8 [ffff891ee0003b18] do_page_fault at ffffffffbd074fc8 Freescale#9 [ffff891ee0003b50] page_fault at ffffffffbda01925 [exception RIP: qlt_schedule_sess_for_deletion+15] RIP: ffffffffc02e526f RSP: ffff891ee0003c08 RFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffc0307847 RDX: 00000000000020e6 RSI: ffff891edbc377c8 RDI: 0000000000000000 RBP: ffff891ee0003c18 R8: ffffffffc02f0b20 R9: 0000000000000250 R10: 0000000000000258 R11: 000000000000b780 R12: ffff891ed9b43000 R13: 00000000000000f0 R14: 0000000000000006 R15: ffff891edbc377c8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 Freescale#10 [ffff891ee0003c20] qla2x00_fcport_event_handler at ffffffffc02853d3 [qla2xxx] Freescale#11 [ffff891ee0003cf0] __dta_qla24xx_async_gnl_sp_done_333 at ffffffffc0285a1d [qla2xxx] Freescale#12 [ffff891ee0003de8] qla24xx_process_response_queue at ffffffffc02a2eb5 [qla2xxx] Freescale#13 [ffff891ee0003e88] qla24xx_msix_rsp_q at ffffffffc02a5403 [qla2xxx] Freescale#14 [ffff891ee0003ec0] __handle_irq_event_percpu at ffffffffbd0f4c59 Freescale#15 [ffff891ee0003f10] handle_irq_event_percpu at ffffffffbd0f4e02 Freescale#16 [ffff891ee0003f40] handle_irq_event at ffffffffbd0f4e90 Freescale#17 [ffff891ee0003f68] handle_edge_irq at ffffffffbd0f8984 Freescale#18 [ffff891ee0003f88] handle_irq at ffffffffbd0305d5 Freescale#19 [ffff891ee0003fb8] do_IRQ at ffffffffbda02a18 --- <IRQ stack> --- Freescale#20 [ffffffffbe403d30] ret_from_intr at ffffffffbda0094e [exception RIP: unknown or invalid address] RIP: 000000000000001f RSP: 0000000000000000 RFLAGS: fff3b8c2091ebb3f RAX: ffffbba5a0000200 RBX: 0000be8cdfa8f9fa RCX: 0000000000000018 RDX: 0000000000000101 RSI: 000000000000015d RDI: 0000000000000193 RBP: 0000000000000083 R8: ffffffffbe403e38 R9: 0000000000000002 R10: 0000000000000000 R11: ffffffffbe56b820 R12: ffff891ee001cf00 R13: ffffffffbd11c0a4 R14: ffffffffbe403d60 R15: 0000000000000001 ORIG_RAX: ffff891ee0022ac0 CS: 0000 SS: ffffffffffffffb9 bt: WARNING: possibly bogus exception frame Freescale#21 [ffffffffbe403dd8] cpuidle_enter_state at ffffffffbd67c6fd Freescale#22 [ffffffffbe403e40] cpuidle_enter at ffffffffbd67c907 Freescale#23 [ffffffffbe403e50] call_cpuidle at ffffffffbd0d98f3 Freescale#24 [ffffffffbe403e60] do_idle at ffffffffbd0d9b42 Freescale#25 [ffffffffbe403e98] cpu_startup_entry at ffffffffbd0d9da3 Freescale#26 [ffffffffbe403ec0] rest_init at ffffffffbd81d4aa Freescale#27 [ffffffffbe403ed0] start_kernel at ffffffffbe67d2ca Freescale#28 [ffffffffbe403f28] x86_64_start_reservations at ffffffffbe67c675 Freescale#29 [ffffffffbe403f38] x86_64_start_kernel at ffffffffbe67c6eb Freescale#30 [ffffffffbe403f50] secondary_startup_64 at ffffffffbd0000d5 Fixes: 040036b ("scsi: qla2xxx: Delay loop id allocation at login") Cc: <[email protected]> # v4.17+ Signed-off-by: Chuck Anderson <[email protected]> Signed-off-by: Himanshu Madhani <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit fc27fe7 upstream. ALSA sequencer core has a mechanism to load the enumerated devices automatically, and it's performed in an off-load work. This seems causing some race when a sequencer is removed while the pending autoload work is running. As syzkaller spotted, it may lead to some use-after-free: BUG: KASAN: use-after-free in snd_rawmidi_dev_seq_free+0x69/0x70 sound/core/rawmidi.c:1617 Write of size 8 at addr ffff88006c611d90 by task kworker/2:1/567 CPU: 2 PID: 567 Comm: kworker/2:1 Not tainted 4.13.0+ Freescale#29 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: events autoload_drivers Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x192/0x22c lib/dump_stack.c:52 print_address_description+0x78/0x280 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x230/0x340 mm/kasan/report.c:409 __asan_report_store8_noabort+0x1c/0x20 mm/kasan/report.c:435 snd_rawmidi_dev_seq_free+0x69/0x70 sound/core/rawmidi.c:1617 snd_seq_dev_release+0x4f/0x70 sound/core/seq_device.c:192 device_release+0x13f/0x210 drivers/base/core.c:814 kobject_cleanup lib/kobject.c:648 [inline] kobject_release lib/kobject.c:677 [inline] kref_put include/linux/kref.h:70 [inline] kobject_put+0x145/0x240 lib/kobject.c:694 put_device+0x25/0x30 drivers/base/core.c:1799 klist_devices_put+0x36/0x40 drivers/base/bus.c:827 klist_next+0x264/0x4a0 lib/klist.c:403 next_device drivers/base/bus.c:270 [inline] bus_for_each_dev+0x17e/0x210 drivers/base/bus.c:312 autoload_drivers+0x3b/0x50 sound/core/seq_device.c:117 process_one_work+0x9fb/0x1570 kernel/workqueue.c:2097 worker_thread+0x1e4/0x1350 kernel/workqueue.c:2231 kthread+0x324/0x3f0 kernel/kthread.c:231 ret_from_fork+0x25/0x30 arch/x86/entry/entry_64.S:425 The fix is simply to assure canceling the autoload work at removing the device. Reported-by: Andrey Konovalov <[email protected]> Tested-by: Andrey Konovalov <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit 0d9c9a2 ] These functions are called from atomic context: [ 9.150239] BUG: sleeping function called from invalid context at /home/scott/git/linux/mm/slab.h:421 [ 9.158159] in_atomic(): 1, irqs_disabled(): 0, pid: 4432, name: ip [ 9.163128] CPU: 8 PID: 4432 Comm: ip Not tainted 4.20.0-rc2-00169-g63d86876f324 #29 [ 9.163130] Call Trace: [ 9.170701] [c0000002e899a980] [c0000000009c1068] .dump_stack+0xa8/0xec (unreliable) [ 9.177140] [c0000002e899aa10] [c00000000007a7b4] .___might_sleep+0x138/0x164 [ 9.184440] [c0000002e899aa80] [c0000000001d5bac] .kmem_cache_alloc_trace+0x238/0x30c [ 9.191216] [c0000002e899ab40] [c00000000065ea1c] .memac_add_hash_mac_address+0x104/0x198 [ 9.199464] [c0000002e899abd0] [c00000000065a788] .set_multi+0x1c8/0x218 [ 9.206242] [c0000002e899ac80] [c0000000006615ec] .dpaa_set_rx_mode+0xdc/0x17c [ 9.213544] [c0000002e899ad00] [c00000000083d2b0] .__dev_set_rx_mode+0x80/0xd4 [ 9.219535] [c0000002e899ad90] [c00000000083d334] .dev_set_rx_mode+0x30/0x54 [ 9.225271] [c0000002e899ae10] [c00000000083d4a0] .__dev_open+0x148/0x1c8 [ 9.230751] [c0000002e899aeb0] [c00000000083d934] .__dev_change_flags+0x19c/0x1e0 [ 9.230755] [c0000002e899af60] [c00000000083d9a4] .dev_change_flags+0x2c/0x80 [ 9.242752] [c0000002e899aff0] [c0000000008554ec] .do_setlink+0x350/0xf08 [ 9.248228] [c0000002e899b170] [c000000000857ad0] .rtnl_newlink+0x588/0x7e0 [ 9.253965] [c0000002e899b740] [c000000000852424] .rtnetlink_rcv_msg+0x3e0/0x498 [ 9.261440] [c0000002e899b820] [c000000000884790] .netlink_rcv_skb+0x134/0x14c [ 9.267607] [c0000002e899b8e0] [c000000000851840] .rtnetlink_rcv+0x18/0x2c [ 9.274558] [c0000002e899b950] [c000000000883c8c] .netlink_unicast+0x214/0x318 [ 9.281163] [c0000002e899ba00] [c000000000884220] .netlink_sendmsg+0x348/0x444 [ 9.287076] [c0000002e899bae0] [c00000000080d13c] .sock_sendmsg+0x2c/0x54 [ 9.287080] [c0000002e899bb50] [c0000000008106c0] .___sys_sendmsg+0x2d0/0x2d8 [ 9.298375] [c0000002e899bd30] [c000000000811a80] .__sys_sendmsg+0x5c/0xb0 [ 9.303939] [c0000002e899be20] [c0000000000006b0] system_call+0x60/0x6c Signed-off-by: Scott Wood <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 0d9c9a2 ] These functions are called from atomic context: [ 9.150239] BUG: sleeping function called from invalid context at /home/scott/git/linux/mm/slab.h:421 [ 9.158159] in_atomic(): 1, irqs_disabled(): 0, pid: 4432, name: ip [ 9.163128] CPU: 8 PID: 4432 Comm: ip Not tainted 4.20.0-rc2-00169-g63d86876f324 Freescale#29 [ 9.163130] Call Trace: [ 9.170701] [c0000002e899a980] [c0000000009c1068] .dump_stack+0xa8/0xec (unreliable) [ 9.177140] [c0000002e899aa10] [c00000000007a7b4] .___might_sleep+0x138/0x164 [ 9.184440] [c0000002e899aa80] [c0000000001d5bac] .kmem_cache_alloc_trace+0x238/0x30c [ 9.191216] [c0000002e899ab40] [c00000000065ea1c] .memac_add_hash_mac_address+0x104/0x198 [ 9.199464] [c0000002e899abd0] [c00000000065a788] .set_multi+0x1c8/0x218 [ 9.206242] [c0000002e899ac80] [c0000000006615ec] .dpaa_set_rx_mode+0xdc/0x17c [ 9.213544] [c0000002e899ad00] [c00000000083d2b0] .__dev_set_rx_mode+0x80/0xd4 [ 9.219535] [c0000002e899ad90] [c00000000083d334] .dev_set_rx_mode+0x30/0x54 [ 9.225271] [c0000002e899ae10] [c00000000083d4a0] .__dev_open+0x148/0x1c8 [ 9.230751] [c0000002e899aeb0] [c00000000083d934] .__dev_change_flags+0x19c/0x1e0 [ 9.230755] [c0000002e899af60] [c00000000083d9a4] .dev_change_flags+0x2c/0x80 [ 9.242752] [c0000002e899aff0] [c0000000008554ec] .do_setlink+0x350/0xf08 [ 9.248228] [c0000002e899b170] [c000000000857ad0] .rtnl_newlink+0x588/0x7e0 [ 9.253965] [c0000002e899b740] [c000000000852424] .rtnetlink_rcv_msg+0x3e0/0x498 [ 9.261440] [c0000002e899b820] [c000000000884790] .netlink_rcv_skb+0x134/0x14c [ 9.267607] [c0000002e899b8e0] [c000000000851840] .rtnetlink_rcv+0x18/0x2c [ 9.274558] [c0000002e899b950] [c000000000883c8c] .netlink_unicast+0x214/0x318 [ 9.281163] [c0000002e899ba00] [c000000000884220] .netlink_sendmsg+0x348/0x444 [ 9.287076] [c0000002e899bae0] [c00000000080d13c] .sock_sendmsg+0x2c/0x54 [ 9.287080] [c0000002e899bb50] [c0000000008106c0] .___sys_sendmsg+0x2d0/0x2d8 [ 9.298375] [c0000002e899bd30] [c000000000811a80] .__sys_sendmsg+0x5c/0xb0 [ 9.303939] [c0000002e899be20] [c0000000000006b0] system_call+0x60/0x6c Signed-off-by: Scott Wood <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit c4317b1 ] In case devlink reload failed, it is possible to trigger a use-after-free when querying the kernel for device info via 'devlink dev info' [1]. This happens because as part of the reload error path the PCI command interface is de-initialized and its mailboxes are freed. When the devlink '->info_get()' callback is invoked the device is queried via the command interface and the freed mailboxes are accessed. Fix this by initializing the command interface once during probe and not during every reload. This is consistent with the other bus used by mlxsw (i.e., 'mlxsw_i2c') and also allows user space to query the running firmware version (for example) from the device after a failed reload. [1] BUG: KASAN: use-after-free in memcpy include/linux/string.h:406 [inline] BUG: KASAN: use-after-free in mlxsw_pci_cmd_exec+0x177/0xa60 drivers/net/ethernet/mellanox/mlxsw/pci.c:1675 Write of size 4096 at addr ffff88810ae32000 by task syz-executor.1/2355 CPU: 1 PID: 2355 Comm: syz-executor.1 Not tainted 5.8.0-rc2+ Freescale#29 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xf6/0x16e lib/dump_stack.c:118 print_address_description.constprop.0+0x1c/0x250 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 check_memory_region_inline mm/kasan/generic.c:186 [inline] check_memory_region+0x14e/0x1b0 mm/kasan/generic.c:192 memcpy+0x39/0x60 mm/kasan/common.c:106 memcpy include/linux/string.h:406 [inline] mlxsw_pci_cmd_exec+0x177/0xa60 drivers/net/ethernet/mellanox/mlxsw/pci.c:1675 mlxsw_cmd_exec+0x249/0x550 drivers/net/ethernet/mellanox/mlxsw/core.c:2335 mlxsw_cmd_access_reg drivers/net/ethernet/mellanox/mlxsw/cmd.h:859 [inline] mlxsw_core_reg_access_cmd drivers/net/ethernet/mellanox/mlxsw/core.c:1938 [inline] mlxsw_core_reg_access+0x2f6/0x540 drivers/net/ethernet/mellanox/mlxsw/core.c:1985 mlxsw_reg_query drivers/net/ethernet/mellanox/mlxsw/core.c:2000 [inline] mlxsw_devlink_info_get+0x17f/0x6e0 drivers/net/ethernet/mellanox/mlxsw/core.c:1090 devlink_nl_info_fill.constprop.0+0x13c/0x2d0 net/core/devlink.c:4588 devlink_nl_cmd_info_get_dumpit+0x246/0x460 net/core/devlink.c:4648 genl_lock_dumpit+0x85/0xc0 net/netlink/genetlink.c:575 netlink_dump+0x515/0xe50 net/netlink/af_netlink.c:2245 __netlink_dump_start+0x53d/0x830 net/netlink/af_netlink.c:2353 genl_family_rcv_msg_dumpit.isra.0+0x296/0x300 net/netlink/genetlink.c:638 genl_family_rcv_msg net/netlink/genetlink.c:733 [inline] genl_rcv_msg+0x78d/0x9d0 net/netlink/genetlink.c:753 netlink_rcv_skb+0x152/0x440 net/netlink/af_netlink.c:2469 genl_rcv+0x24/0x40 net/netlink/genetlink.c:764 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x53a/0x750 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x850/0xd90 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0x150/0x190 net/socket.c:672 ____sys_sendmsg+0x6d8/0x840 net/socket.c:2363 ___sys_sendmsg+0xff/0x170 net/socket.c:2417 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2450 do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: a9c8336 ("mlxsw: core: Add support for devlink info command") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit b514191 ] The commit cited below removed the RCU read-side critical section from rtnl_fdb_dump() which means that the ndo_fdb_dump() callback is invoked without RCU protection. This results in the following warning [1] in the VXLAN driver, which relied on the callback being invoked from an RCU read-side critical section. Fix this by calling rcu_read_lock() in the VXLAN driver, as already done in the bridge driver. [1] WARNING: suspicious RCU usage 5.8.0-rc4-custom-01521-g481007553ce6 Freescale#29 Not tainted ----------------------------- drivers/net/vxlan.c:1379 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by bridge/166: #0: ffffffff85a27850 (rtnl_mutex){+.+.}-{3:3}, at: netlink_dump+0xea/0x1090 stack backtrace: CPU: 1 PID: 166 Comm: bridge Not tainted 5.8.0-rc4-custom-01521-g481007553ce6 Freescale#29 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: dump_stack+0x100/0x184 lockdep_rcu_suspicious+0x153/0x15d vxlan_fdb_dump+0x51e/0x6d0 rtnl_fdb_dump+0x4dc/0xad0 netlink_dump+0x540/0x1090 __netlink_dump_start+0x695/0x950 rtnetlink_rcv_msg+0x802/0xbd0 netlink_rcv_skb+0x17a/0x480 rtnetlink_rcv+0x22/0x30 netlink_unicast+0x5ae/0x890 netlink_sendmsg+0x98a/0xf40 __sys_sendto+0x279/0x3b0 __x64_sys_sendto+0xe6/0x1a0 do_syscall_64+0x54/0xa0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fe14fa2ade0 Code: Bad RIP value. RSP: 002b:00007fff75bb5b88 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00005614b1ba0020 RCX: 00007fe14fa2ade0 RDX: 000000000000011c RSI: 00007fff75bb5b90 RDI: 0000000000000003 RBP: 00007fff75bb5b90 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00005614b1b89160 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Fixes: 5e6d243 ("bridge: netlink dump interface at par with brctl") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit 96298f6 ] According to Core Spec Version 5.2 | Vol 3, Part A 6.1.5, the incoming L2CAP_ConfigReq should be handled during OPEN state. The section below shows the btmon trace when running L2CAP/COS/CFD/BV-12-C before and after this change. === Before === ... > ACL Data RX: Handle 256 flags 0x02 dlen 12 Freescale#22 L2CAP: Connection Request (0x02) ident 2 len 4 PSM: 1 (0x0001) Source CID: 65 < ACL Data TX: Handle 256 flags 0x00 dlen 16 Freescale#23 L2CAP: Connection Response (0x03) ident 2 len 8 Destination CID: 64 Source CID: 65 Result: Connection successful (0x0000) Status: No further information available (0x0000) < ACL Data TX: Handle 256 flags 0x00 dlen 12 Freescale#24 L2CAP: Configure Request (0x04) ident 2 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 Freescale#25 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 Freescale#26 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 16 Freescale#27 L2CAP: Configure Request (0x04) ident 3 len 8 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 .. < ACL Data TX: Handle 256 flags 0x00 dlen 18 Freescale#28 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 > HCI Event: Number of Completed Packets (0x13) plen 5 Freescale#29 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 14 Freescale#30 L2CAP: Configure Response (0x05) ident 2 len 6 Source CID: 64 Flags: 0x0000 Result: Success (0x0000) > ACL Data RX: Handle 256 flags 0x02 dlen 20 Freescale#31 L2CAP: Configure Request (0x04) ident 3 len 12 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 91 02 11 11 ...... < ACL Data TX: Handle 256 flags 0x00 dlen 14 Freescale#32 L2CAP: Command Reject (0x01) ident 3 len 6 Reason: Invalid CID in request (0x0002) Destination CID: 64 Source CID: 65 > HCI Event: Number of Completed Packets (0x13) plen 5 Freescale#33 Num handles: 1 Handle: 256 Count: 1 ... === After === ... > ACL Data RX: Handle 256 flags 0x02 dlen 12 Freescale#22 L2CAP: Connection Request (0x02) ident 2 len 4 PSM: 1 (0x0001) Source CID: 65 < ACL Data TX: Handle 256 flags 0x00 dlen 16 Freescale#23 L2CAP: Connection Response (0x03) ident 2 len 8 Destination CID: 64 Source CID: 65 Result: Connection successful (0x0000) Status: No further information available (0x0000) < ACL Data TX: Handle 256 flags 0x00 dlen 12 Freescale#24 L2CAP: Configure Request (0x04) ident 2 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 Freescale#25 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 Freescale#26 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 16 Freescale#27 L2CAP: Configure Request (0x04) ident 3 len 8 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 .. < ACL Data TX: Handle 256 flags 0x00 dlen 18 Freescale#28 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 > HCI Event: Number of Completed Packets (0x13) plen 5 Freescale#29 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 14 Freescale#30 L2CAP: Configure Response (0x05) ident 2 len 6 Source CID: 64 Flags: 0x0000 Result: Success (0x0000) > ACL Data RX: Handle 256 flags 0x02 dlen 20 Freescale#31 L2CAP: Configure Request (0x04) ident 3 len 12 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 91 02 11 11 ..... < ACL Data TX: Handle 256 flags 0x00 dlen 18 Freescale#32 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 < ACL Data TX: Handle 256 flags 0x00 dlen 12 Freescale#33 L2CAP: Configure Request (0x04) ident 3 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 Freescale#34 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 Freescale#35 Num handles: 1 Handle: 256 Count: 1 ... Signed-off-by: Howard Chung <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit b514191 ] The commit cited below removed the RCU read-side critical section from rtnl_fdb_dump() which means that the ndo_fdb_dump() callback is invoked without RCU protection. This results in the following warning [1] in the VXLAN driver, which relied on the callback being invoked from an RCU read-side critical section. Fix this by calling rcu_read_lock() in the VXLAN driver, as already done in the bridge driver. [1] WARNING: suspicious RCU usage 5.8.0-rc4-custom-01521-g481007553ce6 Freescale#29 Not tainted ----------------------------- drivers/net/vxlan.c:1379 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by bridge/166: #0: ffffffff85a27850 (rtnl_mutex){+.+.}-{3:3}, at: netlink_dump+0xea/0x1090 stack backtrace: CPU: 1 PID: 166 Comm: bridge Not tainted 5.8.0-rc4-custom-01521-g481007553ce6 Freescale#29 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: dump_stack+0x100/0x184 lockdep_rcu_suspicious+0x153/0x15d vxlan_fdb_dump+0x51e/0x6d0 rtnl_fdb_dump+0x4dc/0xad0 netlink_dump+0x540/0x1090 __netlink_dump_start+0x695/0x950 rtnetlink_rcv_msg+0x802/0xbd0 netlink_rcv_skb+0x17a/0x480 rtnetlink_rcv+0x22/0x30 netlink_unicast+0x5ae/0x890 netlink_sendmsg+0x98a/0xf40 __sys_sendto+0x279/0x3b0 __x64_sys_sendto+0xe6/0x1a0 do_syscall_64+0x54/0xa0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fe14fa2ade0 Code: Bad RIP value. RSP: 002b:00007fff75bb5b88 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00005614b1ba0020 RCX: 00007fe14fa2ade0 RDX: 000000000000011c RSI: 00007fff75bb5b90 RDI: 0000000000000003 RBP: 00007fff75bb5b90 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00005614b1b89160 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Fixes: 5e6d243 ("bridge: netlink dump interface at par with brctl") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit 96298f6 ] According to Core Spec Version 5.2 | Vol 3, Part A 6.1.5, the incoming L2CAP_ConfigReq should be handled during OPEN state. The section below shows the btmon trace when running L2CAP/COS/CFD/BV-12-C before and after this change. === Before === ... > ACL Data RX: Handle 256 flags 0x02 dlen 12 Freescale#22 L2CAP: Connection Request (0x02) ident 2 len 4 PSM: 1 (0x0001) Source CID: 65 < ACL Data TX: Handle 256 flags 0x00 dlen 16 Freescale#23 L2CAP: Connection Response (0x03) ident 2 len 8 Destination CID: 64 Source CID: 65 Result: Connection successful (0x0000) Status: No further information available (0x0000) < ACL Data TX: Handle 256 flags 0x00 dlen 12 Freescale#24 L2CAP: Configure Request (0x04) ident 2 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 Freescale#25 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 Freescale#26 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 16 Freescale#27 L2CAP: Configure Request (0x04) ident 3 len 8 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 .. < ACL Data TX: Handle 256 flags 0x00 dlen 18 Freescale#28 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 > HCI Event: Number of Completed Packets (0x13) plen 5 Freescale#29 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 14 Freescale#30 L2CAP: Configure Response (0x05) ident 2 len 6 Source CID: 64 Flags: 0x0000 Result: Success (0x0000) > ACL Data RX: Handle 256 flags 0x02 dlen 20 Freescale#31 L2CAP: Configure Request (0x04) ident 3 len 12 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 91 02 11 11 ...... < ACL Data TX: Handle 256 flags 0x00 dlen 14 Freescale#32 L2CAP: Command Reject (0x01) ident 3 len 6 Reason: Invalid CID in request (0x0002) Destination CID: 64 Source CID: 65 > HCI Event: Number of Completed Packets (0x13) plen 5 Freescale#33 Num handles: 1 Handle: 256 Count: 1 ... === After === ... > ACL Data RX: Handle 256 flags 0x02 dlen 12 Freescale#22 L2CAP: Connection Request (0x02) ident 2 len 4 PSM: 1 (0x0001) Source CID: 65 < ACL Data TX: Handle 256 flags 0x00 dlen 16 Freescale#23 L2CAP: Connection Response (0x03) ident 2 len 8 Destination CID: 64 Source CID: 65 Result: Connection successful (0x0000) Status: No further information available (0x0000) < ACL Data TX: Handle 256 flags 0x00 dlen 12 Freescale#24 L2CAP: Configure Request (0x04) ident 2 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 Freescale#25 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 Freescale#26 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 16 Freescale#27 L2CAP: Configure Request (0x04) ident 3 len 8 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 .. < ACL Data TX: Handle 256 flags 0x00 dlen 18 Freescale#28 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 > HCI Event: Number of Completed Packets (0x13) plen 5 Freescale#29 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 14 Freescale#30 L2CAP: Configure Response (0x05) ident 2 len 6 Source CID: 64 Flags: 0x0000 Result: Success (0x0000) > ACL Data RX: Handle 256 flags 0x02 dlen 20 Freescale#31 L2CAP: Configure Request (0x04) ident 3 len 12 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 91 02 11 11 ..... < ACL Data TX: Handle 256 flags 0x00 dlen 18 Freescale#32 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 < ACL Data TX: Handle 256 flags 0x00 dlen 12 Freescale#33 L2CAP: Configure Request (0x04) ident 3 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 Freescale#34 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 Freescale#35 Num handles: 1 Handle: 256 Count: 1 ... Signed-off-by: Howard Chung <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
This patch fixes issue introduced by a previous commit where iWARP doorbell address wasn't initialized, causing call trace when any RDMA application wants to use this interface: Illegal doorbell address: 0000000000000000. Legal range for doorbell addresses is [0000000011431e08..00000000ec3799d3] WARNING: CPU: 11 PID: 11990 at drivers/net/ethernet/qlogic/qed/qed_dev.c:93 qed_db_rec_sanity.isra.12+0x48/0x70 [qed] ... hpsa scsi_transport_sas [last unloaded: crc8] CPU: 11 PID: 11990 Comm: rping Tainted: G S 5.10.0-rc1 #29 Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018 RIP: 0010:qed_db_rec_sanity.isra.12+0x48/0x70 [qed] ... RSP: 0018:ffffafc28458fa88 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff8d0d4c620000 RCX: 0000000000000000 RDX: ffff8d10afde7d50 RSI: ffff8d10afdd8b40 RDI: ffff8d10afdd8b40 RBP: ffffafc28458fe38 R08: 0000000000000003 R09: 0000000000007fff R10: 0000000000000001 R11: ffffafc28458f888 R12: 0000000000000000 R13: 0000000000000000 R14: ffff8d0d43ccbbd0 R15: ffff8d0d48dae9c0 FS: 00007fbd5267e740(0000) GS:ffff8d10afdc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fbd4f258fb8 CR3: 0000000108d96003 CR4: 00000000001706e0 Call Trace: qed_db_recovery_add+0x6d/0x1f0 [qed] qedr_create_user_qp+0x57e/0xd30 [qedr] qedr_create_qp+0x5f3/0xab0 [qedr] ? lookup_get_idr_uobject.part.12+0x45/0x90 [ib_uverbs] create_qp+0x45d/0xb30 [ib_uverbs] ? ib_uverbs_cq_event_handler+0x30/0x30 [ib_uverbs] ib_uverbs_create_qp+0xb9/0xe0 [ib_uverbs] ib_uverbs_write+0x3f9/0x570 [ib_uverbs] ? security_mmap_file+0x62/0xe0 vfs_write+0xb7/0x200 ksys_write+0xaf/0xd0 ? syscall_trace_enter.isra.25+0x152/0x200 do_syscall_64+0x2d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 06e8d1d ("RDMA/qedr: Add support for user mode XRC-SRQ's") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michal Kalderon <[email protected]> Signed-off-by: Igor Russkikh <[email protected]> Signed-off-by: Alok Prasad <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
[ Upstream commit d412137 ] The perf_buffer fails on system with offline cpus: # test_progs -t perf_buffer test_perf_buffer:PASS:nr_cpus 0 nsec test_perf_buffer:PASS:nr_on_cpus 0 nsec test_perf_buffer:PASS:skel_load 0 nsec test_perf_buffer:PASS:attach_kprobe 0 nsec test_perf_buffer:PASS:perf_buf__new 0 nsec test_perf_buffer:PASS:epoll_fd 0 nsec skipping offline CPU Freescale#24 skipping offline CPU Freescale#25 skipping offline CPU Freescale#26 skipping offline CPU Freescale#27 skipping offline CPU Freescale#28 skipping offline CPU Freescale#29 skipping offline CPU Freescale#30 skipping offline CPU Freescale#31 test_perf_buffer:PASS:perf_buffer__poll 0 nsec test_perf_buffer:PASS:seen_cpu_cnt 0 nsec test_perf_buffer:FAIL:buf_cnt got 24, expected 32 Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED Changing the test to check online cpus instead of possible. Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit d412137 ] The perf_buffer fails on system with offline cpus: # test_progs -t perf_buffer test_perf_buffer:PASS:nr_cpus 0 nsec test_perf_buffer:PASS:nr_on_cpus 0 nsec test_perf_buffer:PASS:skel_load 0 nsec test_perf_buffer:PASS:attach_kprobe 0 nsec test_perf_buffer:PASS:perf_buf__new 0 nsec test_perf_buffer:PASS:epoll_fd 0 nsec skipping offline CPU Freescale#24 skipping offline CPU Freescale#25 skipping offline CPU Freescale#26 skipping offline CPU Freescale#27 skipping offline CPU Freescale#28 skipping offline CPU Freescale#29 skipping offline CPU Freescale#30 skipping offline CPU Freescale#31 test_perf_buffer:PASS:perf_buffer__poll 0 nsec test_perf_buffer:PASS:seen_cpu_cnt 0 nsec test_perf_buffer:FAIL:buf_cnt got 24, expected 32 Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED Changing the test to check online cpus instead of possible. Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Sasha Levin <[email protected]>
commit 22e2100 upstream. The trace_hardirqs_{on,off}() require the caller to setup frame pointer properly. This because these two functions use macro 'CALLER_ADDR1' (aka. __builtin_return_address(1)) to acquire caller info. If the $fp is used for other purpose, the code generated this macro (as below) could trigger memory access fault. 0xffffffff8011510e <+80>: ld a1,-16(s0) 0xffffffff80115112 <+84>: ld s2,-8(a1) # <-- paging fault here The oops message during booting if compiled with 'irqoff' tracer enabled: [ 0.039615][ T0] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000f8 [ 0.041925][ T0] Oops [Freescale#1] [ 0.042063][ T0] Modules linked in: [ 0.042864][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.17.0-rc1-00233-g9a20c48d1ed2 Freescale#29 [ 0.043568][ T0] Hardware name: riscv-virtio,qemu (DT) [ 0.044343][ T0] epc : trace_hardirqs_on+0x56/0xe2 [ 0.044601][ T0] ra : restore_all+0x12/0x6e [ 0.044721][ T0] epc : ffffffff80126a5c ra : ffffffff80003b94 sp : ffffffff81403db0 [ 0.044801][ T0] gp : ffffffff8163acd8 tp : ffffffff81414880 t0 : 0000000000000020 [ 0.044882][ T0] t1 : 0098968000000000 t2 : 0000000000000000 s0 : ffffffff81403de0 [ 0.044967][ T0] s1 : 0000000000000000 a0 : 0000000000000001 a1 : 0000000000000100 [ 0.045046][ T0] a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000 [ 0.045124][ T0] a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000054494d45 [ 0.045210][ T0] s2 : ffffffff80003b94 s3 : ffffffff81a8f1b0 s4 : ffffffff80e27b50 [ 0.045289][ T0] s5 : ffffffff81414880 s6 : ffffffff8160fa00 s7 : 00000000800120e8 [ 0.045389][ T0] s8 : 0000000080013100 s9 : 000000000000007f s10: 0000000000000000 [ 0.045474][ T0] s11: 0000000000000000 t3 : 7fffffffffffffff t4 : 0000000000000000 [ 0.045548][ T0] t5 : 0000000000000000 t6 : ffffffff814aa368 [ 0.045620][ T0] status: 0000000200000100 badaddr: 00000000000000f8 cause: 000000000000000d [ 0.046402][ T0] [<ffffffff80003b94>] restore_all+0x12/0x6e This because the $fp(aka. $s0) register is not used as frame pointer in the assembly entry code. resume_kernel: REG_L s0, TASK_TI_PREEMPT_COUNT(tp) bnez s0, restore_all REG_L s0, TASK_TI_FLAGS(tp) andi s0, s0, _TIF_NEED_RESCHED beqz s0, restore_all call preempt_schedule_irq j restore_all To fix above issue, here we add one extra level wrapper for function trace_hardirqs_{on,off}() so they can be safely called by low level entry code. Signed-off-by: Changbin Du <[email protected]> Fixes: 3c46979 ("riscv: Enable LOCKDEP_SUPPORT & fixup TRACE_IRQFLAGS_SUPPORT") Cc: [email protected] Signed-off-by: Palmer Dabbelt <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 22e2100 upstream. The trace_hardirqs_{on,off}() require the caller to setup frame pointer properly. This because these two functions use macro 'CALLER_ADDR1' (aka. __builtin_return_address(1)) to acquire caller info. If the $fp is used for other purpose, the code generated this macro (as below) could trigger memory access fault. 0xffffffff8011510e <+80>: ld a1,-16(s0) 0xffffffff80115112 <+84>: ld s2,-8(a1) # <-- paging fault here The oops message during booting if compiled with 'irqoff' tracer enabled: [ 0.039615][ T0] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000f8 [ 0.041925][ T0] Oops [Freescale#1] [ 0.042063][ T0] Modules linked in: [ 0.042864][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.17.0-rc1-00233-g9a20c48d1ed2 Freescale#29 [ 0.043568][ T0] Hardware name: riscv-virtio,qemu (DT) [ 0.044343][ T0] epc : trace_hardirqs_on+0x56/0xe2 [ 0.044601][ T0] ra : restore_all+0x12/0x6e [ 0.044721][ T0] epc : ffffffff80126a5c ra : ffffffff80003b94 sp : ffffffff81403db0 [ 0.044801][ T0] gp : ffffffff8163acd8 tp : ffffffff81414880 t0 : 0000000000000020 [ 0.044882][ T0] t1 : 0098968000000000 t2 : 0000000000000000 s0 : ffffffff81403de0 [ 0.044967][ T0] s1 : 0000000000000000 a0 : 0000000000000001 a1 : 0000000000000100 [ 0.045046][ T0] a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000 [ 0.045124][ T0] a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000054494d45 [ 0.045210][ T0] s2 : ffffffff80003b94 s3 : ffffffff81a8f1b0 s4 : ffffffff80e27b50 [ 0.045289][ T0] s5 : ffffffff81414880 s6 : ffffffff8160fa00 s7 : 00000000800120e8 [ 0.045389][ T0] s8 : 0000000080013100 s9 : 000000000000007f s10: 0000000000000000 [ 0.045474][ T0] s11: 0000000000000000 t3 : 7fffffffffffffff t4 : 0000000000000000 [ 0.045548][ T0] t5 : 0000000000000000 t6 : ffffffff814aa368 [ 0.045620][ T0] status: 0000000200000100 badaddr: 00000000000000f8 cause: 000000000000000d [ 0.046402][ T0] [<ffffffff80003b94>] restore_all+0x12/0x6e This because the $fp(aka. $s0) register is not used as frame pointer in the assembly entry code. resume_kernel: REG_L s0, TASK_TI_PREEMPT_COUNT(tp) bnez s0, restore_all REG_L s0, TASK_TI_FLAGS(tp) andi s0, s0, _TIF_NEED_RESCHED beqz s0, restore_all call preempt_schedule_irq j restore_all To fix above issue, here we add one extra level wrapper for function trace_hardirqs_{on,off}() so they can be safely called by low level entry code. Signed-off-by: Changbin Du <[email protected]> Fixes: 3c46979 ("riscv: Enable LOCKDEP_SUPPORT & fixup TRACE_IRQFLAGS_SUPPORT") Cc: [email protected] Signed-off-by: Palmer Dabbelt <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 71e2d66 upstream. The following has been observed when running stressng mmap since commit b653db7 ("mm: Clear page->private when splitting or migrating a page") watchdog: BUG: soft lockup - CPU#75 stuck for 26s! [stress-ng:9546] CPU: 75 PID: 9546 Comm: stress-ng Tainted: G E 6.0.0-revert-b653db77-fix+ Freescale#29 0357d79b60fb09775f678e4f3f64ef0579ad1374 Hardware name: SGI.COM C2112-4GP3/X10DRT-P-Series, BIOS 2.0a 05/09/2016 RIP: 0010:xas_descend+0x28/0x80 Code: cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 <48> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02 89 c2 RSP: 0018:ffffbbf02a2236a8 EFLAGS: 00000246 RAX: ffff9cab7d6a0002 RBX: ffffe04b0af88040 RCX: 0000000000000002 RDX: 0000000000000030 RSI: ffff9cab60509b60 RDI: ffffbbf02a2236c0 RBP: 0000000000000000 R08: ffff9cab60509b60 R09: ffffbbf02a2236c0 R10: 0000000000000001 R11: ffffbbf02a223698 R12: 0000000000000000 R13: ffff9cab4e28da80 R14: 0000000000039c01 R15: ffff9cab4e28da88 FS: 00007fab89b85e40(0000) GS:ffff9cea3fcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fab84e00000 CR3: 00000040b73a4003 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> xas_load+0x3a/0x50 __filemap_get_folio+0x80/0x370 ? put_swap_page+0x163/0x360 pagecache_get_page+0x13/0x90 __try_to_reclaim_swap+0x50/0x190 scan_swap_map_slots+0x31e/0x670 get_swap_pages+0x226/0x3c0 folio_alloc_swap+0x1cc/0x240 add_to_swap+0x14/0x70 shrink_page_list+0x968/0xbc0 reclaim_page_list+0x70/0xf0 reclaim_pages+0xdd/0x120 madvise_cold_or_pageout_pte_range+0x814/0xf30 walk_pgd_range+0x637/0xa30 __walk_page_range+0x142/0x170 walk_page_range+0x146/0x170 madvise_pageout+0xb7/0x280 ? asm_common_interrupt+0x22/0x40 madvise_vma_behavior+0x3b7/0xac0 ? find_vma+0x4a/0x70 ? find_vma+0x64/0x70 ? madvise_vma_anon_name+0x40/0x40 madvise_walk_vmas+0xa6/0x130 do_madvise+0x2f4/0x360 __x64_sys_madvise+0x26/0x30 do_syscall_64+0x5b/0x80 ? do_syscall_64+0x67/0x80 ? syscall_exit_to_user_mode+0x17/0x40 ? do_syscall_64+0x67/0x80 ? syscall_exit_to_user_mode+0x17/0x40 ? do_syscall_64+0x67/0x80 ? do_syscall_64+0x67/0x80 ? common_interrupt+0x8b/0xa0 entry_SYSCALL_64_after_hwframe+0x63/0xcd The problem can be reproduced with the mmtests config config-workload-stressng-mmap. It does not always happen and when it triggers is variable but it has happened on multiple machines. The intent of commit b653db7 patch was to avoid the case where PG_private is clear but folio->private is not-NULL. However, THP tail pages uses page->private for "swp_entry_t if folio_test_swapcache()" as stated in the documentation for struct folio. This patch only clobbers page->private for tail pages if the head page was not in swapcache and warns once if page->private had an unexpected value. Link: https://lkml.kernel.org/r/[email protected] Fixes: b653db7 ("mm: Clear page->private when splitting or migrating a page") Signed-off-by: Mel Gorman <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Yang Shi <[email protected]> Cc: Brian Foster <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Oleksandr Natalenko <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Vitaly Wool <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit 743edc8 ] As previously explained, the rehash delayed work migrates filters from one region to another. This is done by iterating over all chunks (all the filters with the same priority) in the region and in each chunk iterating over all the filters. When the work runs out of credits it stores the current chunk and entry as markers in the per-work context so that it would know where to resume the migration from the next time the work is scheduled. Upon error, the chunk marker is reset to NULL, but without resetting the entry markers despite being relative to it. This can result in migration being resumed from an entry that does not belong to the chunk being migrated. In turn, this will eventually lead to a chunk being iterated over as if it is an entry. Because of how the two structures happen to be defined, this does not lead to KASAN splats, but to warnings such as [1]. Fix by creating a helper that resets all the markers and call it from all the places the currently only reset the chunk marker. For good measures also call it when starting a completely new rehash. Add a warning to avoid future cases. [1] WARNING: CPU: 7 PID: 1076 at drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c:407 mlxsw_afk_encode+0x242/0x2f0 Modules linked in: CPU: 7 PID: 1076 Comm: kworker/7:24 Tainted: G W 6.9.0-rc3-custom-00880-g29e61d91b77b Freescale#29 Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019 Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work RIP: 0010:mlxsw_afk_encode+0x242/0x2f0 [...] Call Trace: <TASK> mlxsw_sp_acl_atcam_entry_add+0xd9/0x3c0 mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x109/0x290 mlxsw_sp_acl_tcam_vregion_rehash_work+0x6c/0x470 process_one_work+0x151/0x370 worker_thread+0x2cb/0x3e0 kthread+0xd0/0x100 ret_from_fork+0x34/0x50 </TASK> Fixes: 6f9579d ("mlxsw: spectrum_acl: Remember where to continue rehash migration") Signed-off-by: Ido Schimmel <[email protected]> Tested-by: Alexander Zubkov <[email protected]> Reviewed-by: Petr Machata <[email protected]> Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/cc17eed86b41dd829d39b07906fec074a9ce580e.1713797103.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 743edc8 ] As previously explained, the rehash delayed work migrates filters from one region to another. This is done by iterating over all chunks (all the filters with the same priority) in the region and in each chunk iterating over all the filters. When the work runs out of credits it stores the current chunk and entry as markers in the per-work context so that it would know where to resume the migration from the next time the work is scheduled. Upon error, the chunk marker is reset to NULL, but without resetting the entry markers despite being relative to it. This can result in migration being resumed from an entry that does not belong to the chunk being migrated. In turn, this will eventually lead to a chunk being iterated over as if it is an entry. Because of how the two structures happen to be defined, this does not lead to KASAN splats, but to warnings such as [1]. Fix by creating a helper that resets all the markers and call it from all the places the currently only reset the chunk marker. For good measures also call it when starting a completely new rehash. Add a warning to avoid future cases. [1] WARNING: CPU: 7 PID: 1076 at drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c:407 mlxsw_afk_encode+0x242/0x2f0 Modules linked in: CPU: 7 PID: 1076 Comm: kworker/7:24 Tainted: G W 6.9.0-rc3-custom-00880-g29e61d91b77b #29 Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019 Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work RIP: 0010:mlxsw_afk_encode+0x242/0x2f0 [...] Call Trace: <TASK> mlxsw_sp_acl_atcam_entry_add+0xd9/0x3c0 mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0 mlxsw_sp_acl_tcam_vchunk_migrate_all+0x109/0x290 mlxsw_sp_acl_tcam_vregion_rehash_work+0x6c/0x470 process_one_work+0x151/0x370 worker_thread+0x2cb/0x3e0 kthread+0xd0/0x100 ret_from_fork+0x34/0x50 </TASK> Fixes: 6f9579d ("mlxsw: spectrum_acl: Remember where to continue rehash migration") Signed-off-by: Ido Schimmel <[email protected]> Tested-by: Alexander Zubkov <[email protected]> Reviewed-by: Petr Machata <[email protected]> Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/cc17eed86b41dd829d39b07906fec074a9ce580e.1713797103.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit e64746e ] cpumask_of_node() can be called for NUMA_NO_NODE inside do_map_benchmark() resulting in the following sanitizer report: UBSAN: array-index-out-of-bounds in ./arch/x86/include/asm/topology.h:72:28 index -1 is out of range for type 'cpumask [64][1]' CPU: 1 PID: 990 Comm: dma_map_benchma Not tainted 6.9.0-rc6 #29 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:117) ubsan_epilogue (lib/ubsan.c:232) __ubsan_handle_out_of_bounds (lib/ubsan.c:429) cpumask_of_node (arch/x86/include/asm/topology.h:72) [inline] do_map_benchmark (kernel/dma/map_benchmark.c:104) map_benchmark_ioctl (kernel/dma/map_benchmark.c:246) full_proxy_unlocked_ioctl (fs/debugfs/file.c:333) __x64_sys_ioctl (fs/ioctl.c:890) do_syscall_64 (arch/x86/entry/common.c:83) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) Use cpumask_of_node() in place when binding a kernel thread to a cpuset of a particular node. Note that the provided node id is checked inside map_benchmark_ioctl(). It's just a NUMA_NO_NODE case which is not handled properly later. Found by Linux Verification Center (linuxtesting.org). Fixes: 65789da ("dma-mapping: add benchmark support for streaming DMA APIs") Signed-off-by: Fedor Pchelkin <[email protected]> Acked-by: Barry Song <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit e64746e ] cpumask_of_node() can be called for NUMA_NO_NODE inside do_map_benchmark() resulting in the following sanitizer report: UBSAN: array-index-out-of-bounds in ./arch/x86/include/asm/topology.h:72:28 index -1 is out of range for type 'cpumask [64][1]' CPU: 1 PID: 990 Comm: dma_map_benchma Not tainted 6.9.0-rc6 Freescale#29 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:117) ubsan_epilogue (lib/ubsan.c:232) __ubsan_handle_out_of_bounds (lib/ubsan.c:429) cpumask_of_node (arch/x86/include/asm/topology.h:72) [inline] do_map_benchmark (kernel/dma/map_benchmark.c:104) map_benchmark_ioctl (kernel/dma/map_benchmark.c:246) full_proxy_unlocked_ioctl (fs/debugfs/file.c:333) __x64_sys_ioctl (fs/ioctl.c:890) do_syscall_64 (arch/x86/entry/common.c:83) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) Use cpumask_of_node() in place when binding a kernel thread to a cpuset of a particular node. Note that the provided node id is checked inside map_benchmark_ioctl(). It's just a NUMA_NO_NODE case which is not handled properly later. Found by Linux Verification Center (linuxtesting.org). Fixes: 65789da ("dma-mapping: add benchmark support for streaming DMA APIs") Signed-off-by: Fedor Pchelkin <[email protected]> Acked-by: Barry Song <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit eef0532 ] Run bpf_tcp_ca selftests (./test_progs -t bpf_tcp_ca) on a Loongarch platform, some "Segmentation fault" errors occur: ''' test_dctcp:PASS:bpf_dctcp__open_and_load 0 nsec test_dctcp:FAIL:bpf_map__attach_struct_ops unexpected error: -524 Freescale#29/1 bpf_tcp_ca/dctcp:FAIL test_cubic:PASS:bpf_cubic__open_and_load 0 nsec test_cubic:FAIL:bpf_map__attach_struct_ops unexpected error: -524 Freescale#29/2 bpf_tcp_ca/cubic:FAIL test_dctcp_fallback:PASS:dctcp_skel 0 nsec test_dctcp_fallback:PASS:bpf_dctcp__load 0 nsec test_dctcp_fallback:FAIL:dctcp link unexpected error: -524 Freescale#29/4 bpf_tcp_ca/dctcp_fallback:FAIL test_write_sk_pacing:PASS:open_and_load 0 nsec test_write_sk_pacing:FAIL:attach_struct_ops unexpected error: -524 Freescale#29/6 bpf_tcp_ca/write_sk_pacing:FAIL test_update_ca:PASS:open 0 nsec test_update_ca:FAIL:attach_struct_ops unexpected error: -524 settcpca:FAIL:setsockopt unexpected setsockopt: \ actual -1 == expected -1 (network_helpers.c:99: errno: No such file or directory) \ Failed to call post_socket_cb start_test:FAIL:start_server_str unexpected start_server_str: \ actual -1 == expected -1 test_update_ca:FAIL:ca1_ca1_cnt unexpected ca1_ca1_cnt: \ actual 0 <= expected 0 Freescale#29/9 bpf_tcp_ca/update_ca:FAIL Freescale#29 bpf_tcp_ca:FAIL Caught signal Freescale#11! Stack trace: ./test_progs(crash_handler+0x28)[0x5555567ed91c] linux-vdso.so.1(__vdso_rt_sigreturn+0x0)[0x7ffffee408b0] ./test_progs(bpf_link__update_map+0x80)[0x555556824a78] ./test_progs(+0x94d68)[0x5555564c4d68] ./test_progs(test_bpf_tcp_ca+0xe8)[0x5555564c6a88] ./test_progs(+0x3bde54)[0x5555567ede54] ./test_progs(main+0x61c)[0x5555567efd54] /usr/lib64/libc.so.6(+0x22208)[0x7ffff2aaa208] /usr/lib64/libc.so.6(__libc_start_main+0xac)[0x7ffff2aaa30c] ./test_progs(_start+0x48)[0x55555646bca8] Segmentation fault ''' This is because BPF trampoline is not implemented on Loongarch yet, "link" returned by bpf_map__attach_struct_ops() is NULL. test_progs crashs when this NULL link passes to bpf_link__update_map(). This patch adds NULL checks for all links in bpf_tcp_ca to fix these errors. If "link" is NULL, goto the newly added label "out" to destroy the skel. v2: - use "goto out" instead of "return" as Eduard suggested. Fixes: 06da9f3 ("selftests/bpf: Test switching TCP Congestion Control algorithms.") Signed-off-by: Geliang Tang <[email protected]> Reviewed-by: Alan Maguire <[email protected]> Link: https://lore.kernel.org/r/b4c841492bd4ed97964e4e61e92827ce51bf1dc9.1720615848.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
No description provided.