-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bootloader UART setup #18
Comments
You would like the code inside your arch_decomp_setup (5 peripheral writes) to be done by GPU before kernel starts? |
Yep, just so the UART is good to go. Perhaps even have a config.txt option to change the baud rate if someone wants a different speed. |
This link is no longer valid: I'd quite like to see the code in that commit. |
Ahh sorry I had a bit of a reorganisation on github so it's moved to: |
I should add, I will happily provide you with the generated ARM assembly if that's any help. |
Can you try: which should set up UART to 115200 with a 3MHz clock as requested elsewhere. You can modify these: |
This is the same start.elf from #16 and writes to the UART without prior initialisation by the kernel worked too. |
Firmware now pushed to github. Please close issue if happy. |
I haven't actually managed to test this myself yet, but Simon has and assures me it's fine. So I'll close it for now and reopen it if I run into any trouble. Thanks! |
…S block during isolation for migration commit 0bf380b upstream. When isolating for migration, migration starts at the start of a zone which is not necessarily pageblock aligned. Further, it stops isolating when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally not aligned. This allows isolate_migratepages() to call pfn_to_page() on an invalid PFN which can result in a crash. This was originally reported against a 3.0-based kernel with the following trace in a crash dump. PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s" #0 [d72d3ad0] crash_kexec at c028cfdb raspberrypi#1 [d72d3b24] oops_end at c05c5322 raspberrypi#2 [d72d3b38] __bad_area_nosemaphore at c0227e60 raspberrypi#3 [d72d3bec] bad_area at c0227fb6 raspberrypi#4 [d72d3c00] do_page_fault at c05c72ec raspberrypi#5 [d72d3c80] error_code (via page_fault) at c05c47a4 EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000 DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50 CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002 raspberrypi#6 [d72d3cb4] isolate_migratepages at c030b15a raspberrypi#7 [d72d3d14] zone_watermark_ok at c02d26cb raspberrypi#8 [d72d3d2c] compact_zone at c030b8de raspberrypi#9 [d72d3d68] compact_zone_order at c030bba1 raspberrypi#10 [d72d3db4] try_to_compact_pages at c030bc84 raspberrypi#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7 raspberrypi#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7 raspberrypi#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97 raspberrypi#14 [d72d3eb8] alloc_pages_vma at c030a845 raspberrypi#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb raspberrypi#16 [d72d3f00] handle_mm_fault at c02f36c6 raspberrypi#17 [d72d3f30] do_page_fault at c05c70ed raspberrypi#18 [d72d3fb0] error_code (via page_fault) at c05c47a4 EAX: b71ff00 EBX: 00000001 ECX: 00001600 EDX: 00000431 DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788 SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50 CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202 It was also reported by Herbert van den Bergh against 3.1-based kernel with the following snippet from the console log. BUG: unable to handle kernel paging request at 01c00008 IP: [<c0522399>] isolate_migratepages+0x119/0x390 *pdpt = 000000002f7ce001 *pde = 0000000000000000 It is expected that it also affects 3.2.x and current mainline. The problem is that pfn_valid is only called on the first PFN being checked and that PFN is not necessarily aligned. Lets say we have a case like this H = MAX_ORDER_NR_PAGES boundary | = pageblock boundary m = cc->migrate_pfn f = cc->free_pfn o = memory hole H------|------H------|----m-Hoooooo|ooooooH-f----|------H The migrate_pfn is just below a memory hole and the free scanner is beyond the hole. When isolate_migratepages started, it scans from migrate_pfn to migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks pfn_valid() on the first PFN but then scans into the hole where there are not necessarily valid struct pages. This patch ensures that isolate_migratepages calls pfn_valid when necessary. Reported-by: Herbert van den Bergh <[email protected]> Tested-by: Herbert van den Bergh <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Michal Nazarewicz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…d reasons commit 5cf02d0 upstream. We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 raspberrypi#1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] raspberrypi#2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f raspberrypi#3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 raspberrypi#4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] raspberrypi#5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] raspberrypi#6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 raspberrypi#7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 raspberrypi#8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 raspberrypi#9 [ffff8810343bf818] shrink_zone at ffffffff8112788f raspberrypi#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e raspberrypi#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f raspberrypi#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad raspberrypi#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 raspberrypi#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a raspberrypi#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 raspberrypi#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b raspberrypi#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 raspberrypi#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c raspberrypi#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 raspberrypi#20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 raspberrypi#21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] raspberrypi#22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] raspberrypi#23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 raspberrypi#24 [ffff8810343bfee8] kthread at ffffffff8108dd96 raspberrypi#25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Ben Hutchings <[email protected]>
This is not working for me, please see: #105 |
Printing the "start_ip" for every secondary cpu is very noisy on a large system - and doesn't add any value. Drop this message. Console log before: Booting Node 0, Processors #1 smpboot cpu 1: start_ip = 96000 #2 smpboot cpu 2: start_ip = 96000 #3 smpboot cpu 3: start_ip = 96000 #4 smpboot cpu 4: start_ip = 96000 ... #31 smpboot cpu 31: start_ip = 96000 Brought up 32 CPUs Console log after: Booting Node 0, Processors #1 #2 #3 #4 #5 #6 #7 Ok. Booting Node 1, Processors #8 #9 #10 #11 #12 #13 #14 #15 Ok. Booting Node 0, Processors #16 #17 #18 #19 #20 #21 #22 #23 Ok. Booting Node 1, Processors #24 #25 #26 #27 #28 #29 #30 #31 Brought up 32 CPUs Acked-by: Borislav Petkov <[email protected]> Signed-off-by: Tony Luck <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
This patch simplifies the buffer allocation functions for MCI and removes unneeded memset calls. Also, a couple of unused variables are removed and a memory leak in DMA allocation is fixed. [ 1263.788267] WARNING: at /home/sujith/dev/wireless-testing/lib/dma-debug.c:875 check_unmap+0x173/0x7e0() [ 1263.788273] ath9k 0000:06:00.0: DMA-API: device driver frees DMA memory with different size [device address=0x0000000071908000] [map size=512 bytes] [unmap size=256 bytes] [ 1263.788345] Pid: 774, comm: rmmod Tainted: G W O 3.3.0-rc3-wl #18 [ 1263.788348] Call Trace: [ 1263.788355] [<ffffffff8105110f>] warn_slowpath_common+0x7f/0xc0 [ 1263.788359] [<ffffffff81051206>] warn_slowpath_fmt+0x46/0x50 [ 1263.788363] [<ffffffff8125a713>] check_unmap+0x173/0x7e0 [ 1263.788368] [<ffffffff8123fc22>] ? prio_tree_left+0x32/0xc0 [ 1263.788373] [<ffffffff8125aded>] debug_dma_free_coherent+0x6d/0x80 [ 1263.788381] [<ffffffffa0701c3c>] ath_mci_cleanup+0x7c/0x110 [ath9k] [ 1263.788387] [<ffffffffa06f4033>] ath9k_deinit_softc+0x113/0x120 [ath9k] [ 1263.788392] [<ffffffffa06f55cc>] ath9k_deinit_device+0x5c/0x70 [ath9k] [ 1263.788397] [<ffffffffa0704934>] ath_pci_remove+0x54/0xa0 [ath9k] [ 1263.788401] [<ffffffff81267d06>] pci_device_remove+0x46/0x110 [ 1263.788406] [<ffffffff813102bc>] __device_release_driver+0x7c/0xe0 [ 1263.788410] [<ffffffff81310a00>] driver_detach+0xd0/0xe0 [ 1263.788414] [<ffffffff81310118>] bus_remove_driver+0x88/0xe0 [ 1263.788418] [<ffffffff813111c2>] driver_unregister+0x62/0xa0 [ 1263.788421] [<ffffffff812680c4>] pci_unregister_driver+0x44/0xc0 [ 1263.788427] [<ffffffffa0705015>] ath_pci_exit+0x15/0x20 [ath9k] [ 1263.788432] [<ffffffffa070a92d>] ath9k_exit+0x15/0x31 [ath9k] [ 1263.788436] [<ffffffff810b971c>] sys_delete_module+0x18c/0x270 [ 1263.788441] [<ffffffff81436edd>] ? retint_swapgs+0x13/0x1b [ 1263.788446] [<ffffffff812483be>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 1263.788450] [<ffffffff814378e9>] system_call_fastpath+0x16/0x1b [ 1263.788453] ---[ end trace 3ab4d030ffde40d4 ]--- Signed-off-by: Sujith Manoharan <[email protected]> Signed-off-by: John W. Linville <[email protected]>
The warning below triggers on AMD MCM packages because physical package IDs on the cores of a _physical_ socket are the same. I.e., this field says which CPUs belong to the same physical package. However, the same two CPUs belong to two different internal, i.e. "logical" nodes in the same physical socket which is reflected in the CPU-to-node map on x86 with NUMA. Which makes this check wrong on the above topologies so circumvent it. [ 0.444413] Booting Node 0, Processors #1 #2 #3 #4 #5 Ok. [ 0.461388] ------------[ cut here ]------------ [ 0.465997] WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81() [ 0.473960] Hardware name: Dinar [ 0.477170] sched: CPU #6's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency. [ 0.486860] Booting Node 1, Processors #6 [ 0.491104] Modules linked in: [ 0.494141] Pid: 0, comm: swapper/6 Not tainted 3.4.0+ #1 [ 0.499510] Call Trace: [ 0.501946] [<ffffffff8144bf92>] ? topology_sane.clone.1+0x6e/0x81 [ 0.508185] [<ffffffff8102f1fc>] warn_slowpath_common+0x85/0x9d [ 0.514163] [<ffffffff8102f2b7>] warn_slowpath_fmt+0x46/0x48 [ 0.519881] [<ffffffff8144bf92>] topology_sane.clone.1+0x6e/0x81 [ 0.525943] [<ffffffff8144c234>] set_cpu_sibling_map+0x251/0x371 [ 0.532004] [<ffffffff8144c4ee>] start_secondary+0x19a/0x218 [ 0.537729] ---[ end trace 4eaa2a86a8e2da22 ]--- [ 0.628197] #7 #8 #9 #10 #11 Ok. [ 0.807108] Booting Node 3, Processors #12 #13 #14 #15 #16 #17 Ok. [ 0.897587] Booting Node 2, Processors #18 #19 #20 #21 #22 #23 Ok. [ 0.917443] Brought up 24 CPUs We ran a topology sanity check test we have here on it and it all looks ok... hopefully :). Signed-off-by: Borislav Petkov <[email protected]> Cc: Andreas Herrmann <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
…d reasons We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Cc: [email protected]
commit ce2ac52 upstream. Kjell Braden reported this oops: [ 833.211970] BUG: unable to handle kernel NULL pointer dereference at (null) [ 833.212816] IP: [< (null)>] (null) [ 833.213280] PGD 1b9b2067 PUD e9f7067 PMD 0 [ 833.213874] Oops: 0010 [#1] SMP [ 833.214344] CPU 0 [ 833.214458] Modules linked in: des_generic md4 nls_utf8 cifs vboxvideo drm snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq bnep rfcomm snd_timer bluetooth snd_seq_device ppdev snd vboxguest parport_pc joydev mac_hid soundcore snd_page_alloc psmouse i2c_piix4 serio_raw lp parport usbhid hid e1000 [ 833.215629] [ 833.215629] Pid: 1752, comm: mount.cifs Not tainted 3.0.0-rc7-bisectcifs-fec11dd9a0+ #18 innotek GmbH VirtualBox/VirtualBox [ 833.215629] RIP: 0010:[<0000000000000000>] [< (null)>] (null) [ 833.215629] RSP: 0018:ffff8800119c9c50 EFLAGS: 00010282 [ 833.215629] RAX: ffffffffa02186c0 RBX: ffff88000c427780 RCX: 0000000000000000 [ 833.215629] RDX: 0000000000000000 RSI: ffff88000c427780 RDI: ffff88000c4362e8 [ 833.215629] RBP: ffff8800119c9c88 R08: ffff88001fc15e30 R09: 00000000d69515c7 [ 833.215629] R10: ffffffffa0201972 R11: ffff88000e8f6a28 R12: ffff88000c4362e8 [ 833.215629] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88001181aaa6 [ 833.215629] FS: 00007f2986171700(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000 [ 833.215629] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 833.215629] CR2: 0000000000000000 CR3: 000000001b982000 CR4: 00000000000006f0 [ 833.215629] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 833.215629] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 833.215629] Process mount.cifs (pid: 1752, threadinfo ffff8800119c8000, task ffff88001c1c16f0) [ 833.215629] Stack: [ 833.215629] ffffffff8116a9b5 ffff8800119c9c88 ffffffff81178075 0000000000000286 [ 833.215629] 0000000000000000 ffff88000c4276c0 ffff8800119c9ce8 ffff8800119c9cc8 [ 833.215629] ffffffff8116b06e ffff88001bc6fc00 ffff88000c4276c0 ffff88000c4276c0 [ 833.215629] Call Trace: [ 833.215629] [<ffffffff8116a9b5>] ? d_alloc_and_lookup+0x45/0x90 [ 833.215629] [<ffffffff81178075>] ? d_lookup+0x35/0x60 [ 833.215629] [<ffffffff8116b06e>] __lookup_hash.part.14+0x9e/0xc0 [ 833.215629] [<ffffffff8116b1d6>] lookup_one_len+0x146/0x1e0 [ 833.215629] [<ffffffff815e4f7e>] ? _raw_spin_lock+0xe/0x20 [ 833.215629] [<ffffffffa01eef0d>] cifs_do_mount+0x26d/0x500 [cifs] [ 833.215629] [<ffffffff81163bd3>] mount_fs+0x43/0x1b0 [ 833.215629] [<ffffffff8117d41a>] vfs_kern_mount+0x6a/0xd0 [ 833.215629] [<ffffffff8117e584>] do_kern_mount+0x54/0x110 [ 833.215629] [<ffffffff8117fdc2>] do_mount+0x262/0x840 [ 833.215629] [<ffffffff81108a0e>] ? __get_free_pages+0xe/0x50 [ 833.215629] [<ffffffff8117f9ca>] ? copy_mount_options+0x3a/0x180 [ 833.215629] [<ffffffff8118075d>] sys_mount+0x8d/0xe0 [ 833.215629] [<ffffffff815ece82>] system_call_fastpath+0x16/0x1b [ 833.215629] Code: Bad RIP value. [ 833.215629] RIP [< (null)>] (null) [ 833.215629] RSP <ffff8800119c9c50> [ 833.215629] CR2: 0000000000000000 [ 833.238525] ---[ end trace ec00758b8d44f529 ]--- When walking down the path on the server, it's possible to hit a symlink. The path walking code assumes that the caller will handle that situation properly, but cifs_get_root() isn't set up for it. This patch prevents the oops by simply returning an error. A better solution would be to try and chase the symlinks here, but that's fairly complicated to handle. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=53221 Reported-and-tested-by: Kjell Braden <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Steve French <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…s struct file commit e4daf1f upstream. The following call chain: ------------------------------------------------------------ nfs4_get_vfs_file - nfsd_open - dentry_open - do_dentry_open - __get_file_write_access - get_write_access - return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY; ------------------------------------------------------------ can result in the following state: ------------------------------------------------------------ struct nfs4_file { ... fi_fds = {0xffff880c1fa65c80, 0xffffffffffffffe6, 0x0}, fi_access = {{ counter = 0x1 }, { counter = 0x0 }}, ... ------------------------------------------------------------ 1) First time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is NULL, hence nfsd_open() is called where we get status set to an error and fp->fi_fds[O_WRONLY] to -ETXTBSY. Thus we do not reach nfs4_file_get_access() and fi_access[O_WRONLY] is not incremented. 2) Second time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is NOT NULL (-ETXTBSY), so nfsd_open() is NOT called, but nfs4_file_get_access() IS called and fi_access[O_WRONLY] is incremented. Thus we leave a landmine in the form of the nfs4_file data structure in an incorrect state. 3) Eventually, when __nfs4_file_put_access() is called it finds fi_access[O_WRONLY] being non-zero, it decrements it and calls nfs4_file_put_fd() which tries to fput -ETXTBSY. ------------------------------------------------------------ ... [exception RIP: fput+0x9] RIP: ffffffff81177fa9 RSP: ffff88062e365c90 RFLAGS: 00010282 RAX: ffff880c2b3d99cc RBX: ffff880c2b3d9978 RCX: 0000000000000002 RDX: dead000000100101 RSI: 0000000000000001 RDI: ffffffffffffffe6 RBP: ffff88062e365c90 R8: ffff88041fe797d8 R9: ffff88062e365d58 R10: 0000000000000008 R11: 0000000000000000 R12: 0000000000000001 R13: 0000000000000007 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffff88062e365c98] __nfs4_file_put_access at ffffffffa0562334 [nfsd] #10 [ffff88062e365cc8] nfs4_file_put_access at ffffffffa05623ab [nfsd] #11 [ffff88062e365ce8] free_generic_stateid at ffffffffa056634d [nfsd] #12 [ffff88062e365d18] release_open_stateid at ffffffffa0566e4b [nfsd] #13 [ffff88062e365d38] nfsd4_close at ffffffffa0567401 [nfsd] #14 [ffff88062e365d88] nfsd4_proc_compound at ffffffffa0557f28 [nfsd] #15 [ffff88062e365dd8] nfsd_dispatch at ffffffffa054543e [nfsd] #16 [ffff88062e365e18] svc_process_common at ffffffffa04ba5a4 [sunrpc] #17 [ffff88062e365e98] svc_process at ffffffffa04babe0 [sunrpc] #18 [ffff88062e365eb8] nfsd at ffffffffa0545b62 [nfsd] #19 [ffff88062e365ee8] kthread at ffffffff81090886 #20 [ffff88062e365f48] kernel_thread at ffffffff8100c14a ------------------------------------------------------------ Signed-off-by: Harshula Jayasuriya <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…s struct file The following call chain: ------------------------------------------------------------ nfs4_get_vfs_file - nfsd_open - dentry_open - do_dentry_open - __get_file_write_access - get_write_access - return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY; ------------------------------------------------------------ can result in the following state: ------------------------------------------------------------ struct nfs4_file { ... fi_fds = {0xffff880c1fa65c80, 0xffffffffffffffe6, 0x0}, fi_access = {{ counter = 0x1 }, { counter = 0x0 }}, ... ------------------------------------------------------------ 1) First time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is NULL, hence nfsd_open() is called where we get status set to an error and fp->fi_fds[O_WRONLY] to -ETXTBSY. Thus we do not reach nfs4_file_get_access() and fi_access[O_WRONLY] is not incremented. 2) Second time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is NOT NULL (-ETXTBSY), so nfsd_open() is NOT called, but nfs4_file_get_access() IS called and fi_access[O_WRONLY] is incremented. Thus we leave a landmine in the form of the nfs4_file data structure in an incorrect state. 3) Eventually, when __nfs4_file_put_access() is called it finds fi_access[O_WRONLY] being non-zero, it decrements it and calls nfs4_file_put_fd() which tries to fput -ETXTBSY. ------------------------------------------------------------ ... [exception RIP: fput+0x9] RIP: ffffffff81177fa9 RSP: ffff88062e365c90 RFLAGS: 00010282 RAX: ffff880c2b3d99cc RBX: ffff880c2b3d9978 RCX: 0000000000000002 RDX: dead000000100101 RSI: 0000000000000001 RDI: ffffffffffffffe6 RBP: ffff88062e365c90 R8: ffff88041fe797d8 R9: ffff88062e365d58 R10: 0000000000000008 R11: 0000000000000000 R12: 0000000000000001 R13: 0000000000000007 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffff88062e365c98] __nfs4_file_put_access at ffffffffa0562334 [nfsd] #10 [ffff88062e365cc8] nfs4_file_put_access at ffffffffa05623ab [nfsd] #11 [ffff88062e365ce8] free_generic_stateid at ffffffffa056634d [nfsd] #12 [ffff88062e365d18] release_open_stateid at ffffffffa0566e4b [nfsd] #13 [ffff88062e365d38] nfsd4_close at ffffffffa0567401 [nfsd] #14 [ffff88062e365d88] nfsd4_proc_compound at ffffffffa0557f28 [nfsd] #15 [ffff88062e365dd8] nfsd_dispatch at ffffffffa054543e [nfsd] #16 [ffff88062e365e18] svc_process_common at ffffffffa04ba5a4 [sunrpc] #17 [ffff88062e365e98] svc_process at ffffffffa04babe0 [sunrpc] #18 [ffff88062e365eb8] nfsd at ffffffffa0545b62 [nfsd] #19 [ffff88062e365ee8] kthread at ffffffff81090886 #20 [ffff88062e365f48] kernel_thread at ffffffff8100c14a ------------------------------------------------------------ Cc: [email protected] Signed-off-by: Harshula Jayasuriya <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]>
current_cpu_type() is not preemption-safe. If CONFIG_PREEMPT is enabled then mipsxx_reg_setup() can be called from preemptible state. Added get_cpu()/put_cpu() pair to make it preemption-safe. This was found while testing oprofile with CONFIG_DEBUG_PREEMPT enable. /usr/zntestsuite # opcontrol --init /usr/zntestsuite # opcontrol --setup --event=L2_CACHE_ACCESSES:500 --event=L2_CACHE_MISSES:500 --no-vmlinux /usr/zntestsuite # opcontrol --start Using 2.6+ OProfile kernel interface. BUG: using smp_processor_id() in preemptible [00000000] code: oprofiled/1362 caller is mipsxx_reg_setup+0x11c/0x164 CPU: 0 PID: 1362 Comm: oprofiled Not tainted 3.10.4 #18 Stack : 00000006 70757465 00000000 00000000 00000000 00000000 80b173f6 00000037 80b10000 00000000 80b21614 88f5a220 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 89c49c00 89c49c2c 80721254 807b7927 8012c1d0 80b10000 80721254 00000000 00000552 88f5a220 80b1335c 807b78e6 89c49ba8 ... Call Trace: [<801099a4>] show_stack+0x64/0x7c [<80665520>] dump_stack+0x20/0x2c [<803a2250>] debug_smp_processor_id+0xe0/0xf0 [<8052df24>] mipsxx_reg_setup+0x11c/0x164 [<8052cd70>] op_mips_setup+0x24/0x4c [<80529cfc>] oprofile_setup+0x5c/0x12c [<8052b9f8>] event_buffer_open+0x78/0xf8 [<801c3150>] do_dentry_open.isra.15+0x2b8/0x3b0 [<801c3270>] finish_open+0x28/0x4c [<801d49b8>] do_last.isra.41+0x2cc/0xd00 [<801d54a0>] path_openat+0xb4/0x4c4 [<801d5c44>] do_filp_open+0x3c/0xac [<801c4744>] do_sys_open+0x110/0x1f4 [<8010f47c>] stack_done+0x20/0x44 Bug reported and original patch by Jerin Jacob <[email protected]>. Signed-off-by: Ralf Baechle <[email protected]> Acked-by: Jerin Jacob <[email protected]>
When booting secondary CPUs, announce_cpu() is called to show which cpu has been brought up. For example: [ 0.402751] smpboot: Booting Node 0, Processors #1 #2 #3 #4 #5 OK [ 0.525667] smpboot: Booting Node 1, Processors #6 #7 #8 #9 #10 #11 OK [ 0.755592] smpboot: Booting Node 0, Processors #12 #13 #14 #15 #16 #17 OK [ 0.890495] smpboot: Booting Node 1, Processors #18 #19 #20 #21 #22 #23 But the last "OK" is lost, because 'nr_cpu_ids-1' represents the maximum possible cpu id. It should use the maximum present cpu id in case not all CPUs booted up. Signed-off-by: Libin <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ tweaked the changelog, removed unnecessary line break, tweaked the format to align the fields vertically. ] Signed-off-by: Ingo Molnar <[email protected]>
When memcg code needs to know whether any given memcg has children, it uses the cgroup child iteration primitives and returns true/false depending on whether the iteration loop is executed at least once or not. Because a cgroup's list of children is RCU protected, these primitives require the RCU read-lock to be held, which is not the case for all memcg callers. This results in the following splat when e.g. enabling hierarchy mode: WARNING: CPU: 3 PID: 1 at kernel/cgroup.c:3043 css_next_child+0xa3/0x160() CPU: 3 PID: 1 Comm: systemd Not tainted 3.12.0-rc5-00117-g83f11a9-dirty #18 Hardware name: LENOVO 3680B56/3680B56, BIOS 6QET69WW (1.39 ) 04/26/2012 Call Trace: dump_stack+0x54/0x74 warn_slowpath_common+0x78/0xa0 warn_slowpath_null+0x1a/0x20 css_next_child+0xa3/0x160 mem_cgroup_hierarchy_write+0x5b/0xa0 cgroup_file_write+0x108/0x2a0 vfs_write+0xbd/0x1e0 SyS_write+0x4c/0xa0 system_call_fastpath+0x16/0x1b In the memcg case, we only care about children when we are attempting to modify inheritable attributes interactively. Racing with deletion could mean a spurious -EBUSY, no problem. Racing with addition is handled just fine as well through the memcg_create_mutex: if the child group is not on the list after the mutex is acquired, it won't be initialized from the parent's attributes until after the unlock. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
commit a207f59 upstream. The probe function is supposed to return NULL on failure (as we can see in kobj_lookup: kobj = probe(dev, index, data); ... if (kobj) return kobj; However, in loop and brd, it returns negative error from ERR_PTR. This causes a crash if we simulate disk allocation failure and run less -f /dev/loop0 because the negative number is interpreted as a pointer: BUG: unable to handle kernel NULL pointer dereference at 00000000000002b4 IP: [<ffffffff8118b188>] __blkdev_get+0x28/0x450 PGD 23c677067 PUD 23d6d1067 PMD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: loop hpfs nvidia(PO) ip6table_filter ip6_tables uvesafb cfbcopyarea cfbimgblt cfbfillrect fbcon font bitblit fbcon_rotate fbcon_cw fbcon_ud fbcon_ccw softcursor fb fbdev msr ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc tun ipv6 cpufreq_stats cpufreq_ondemand cpufreq_userspace cpufreq_powersave cpufreq_conservative hid_generic spadfs usbhid hid fuse raid0 snd_usb_audio snd_pcm_oss snd_mixer_oss md_mod snd_pcm snd_timer snd_page_alloc snd_hwdep snd_usbmidi_lib dmi_sysfs snd_rawmidi nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack snd soundcore lm85 hwmon_vid ohci_hcd ehci_pci ehci_hcd serverworks sata_svw libata acpi_cpufreq freq_table mperf ide_core usbcore kvm_amd kvm tg3 i2c_piix4 libphy microcode e100 usb_common ptp skge i2c_core pcspkr k10temp evdev floppy hwmon pps_core mii rtc_cmos button processor unix [last unloaded: nvidia] CPU: 1 PID: 6831 Comm: less Tainted: P W O 3.10.15-devel #18 Hardware name: empty empty/S3992-E, BIOS 'V1.06 ' 06/09/2009 task: ffff880203cc6bc0 ti: ffff88023e47c000 task.ti: ffff88023e47c000 RIP: 0010:[<ffffffff8118b188>] [<ffffffff8118b188>] __blkdev_get+0x28/0x450 RSP: 0018:ffff88023e47dbd8 EFLAGS: 00010286 RAX: ffffffffffffff74 RBX: ffffffffffffff74 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001 RBP: ffff88023e47dc18 R08: 0000000000000002 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff88023f519658 R13: ffffffff8118c300 R14: 0000000000000000 R15: ffff88023f519640 FS: 00007f2070bf7700(0000) GS:ffff880247400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000002b4 CR3: 000000023da1d000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: 0000000000000002 0000001d00000000 000000003e47dc50 ffff88023f519640 ffff88043d5bb668 ffffffff8118c300 ffff88023d683550 ffff88023e47de60 ffff88023e47dc98 ffffffff8118c10d 0000001d81605698 0000000000000292 Call Trace: [<ffffffff8118c300>] ? blkdev_get_by_dev+0x60/0x60 [<ffffffff8118c10d>] blkdev_get+0x1dd/0x370 [<ffffffff8118c300>] ? blkdev_get_by_dev+0x60/0x60 [<ffffffff813cea6c>] ? _raw_spin_unlock+0x2c/0x50 [<ffffffff8118c300>] ? blkdev_get_by_dev+0x60/0x60 [<ffffffff8118c365>] blkdev_open+0x65/0x80 [<ffffffff8114d12e>] do_dentry_open.isra.18+0x23e/0x2f0 [<ffffffff8114d214>] finish_open+0x34/0x50 [<ffffffff8115e122>] do_last.isra.62+0x2d2/0xc50 [<ffffffff8115eb58>] path_openat.isra.63+0xb8/0x4d0 [<ffffffff81115a8e>] ? might_fault+0x4e/0xa0 [<ffffffff8115f4f0>] do_filp_open+0x40/0x90 [<ffffffff813cea6c>] ? _raw_spin_unlock+0x2c/0x50 [<ffffffff8116db85>] ? __alloc_fd+0xa5/0x1f0 [<ffffffff8114e45f>] do_sys_open+0xef/0x1d0 [<ffffffff8114e559>] SyS_open+0x19/0x20 [<ffffffff813cff16>] system_call_fastpath+0x1a/0x1f Code: 44 00 00 55 48 89 e5 41 57 49 89 ff 41 56 41 89 d6 41 55 41 54 4c 8d 67 18 53 48 83 ec 18 89 75 cc e9 f2 00 00 00 0f 1f 44 00 00 <48> 8b 80 40 03 00 00 48 89 df 4c 8b 68 58 e8 d5 a4 07 00 44 89 RIP [<ffffffff8118b188>] __blkdev_get+0x28/0x450 RSP <ffff88023e47dbd8> CR2: 00000000000002b4 ---[ end trace bb7f32dbf02398dc ]--- The brd change should be backported to stable kernels starting with 2.6.25. The loop change should be backported to stable kernels starting with 2.6.22. Signed-off-by: Mikulas Patocka <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit a207f59 upstream. The probe function is supposed to return NULL on failure (as we can see in kobj_lookup: kobj = probe(dev, index, data); ... if (kobj) return kobj; However, in loop and brd, it returns negative error from ERR_PTR. This causes a crash if we simulate disk allocation failure and run less -f /dev/loop0 because the negative number is interpreted as a pointer: BUG: unable to handle kernel NULL pointer dereference at 00000000000002b4 IP: [<ffffffff8118b188>] __blkdev_get+0x28/0x450 PGD 23c677067 PUD 23d6d1067 PMD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: loop hpfs nvidia(PO) ip6table_filter ip6_tables uvesafb cfbcopyarea cfbimgblt cfbfillrect fbcon font bitblit fbcon_rotate fbcon_cw fbcon_ud fbcon_ccw softcursor fb fbdev msr ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc tun ipv6 cpufreq_stats cpufreq_ondemand cpufreq_userspace cpufreq_powersave cpufreq_conservative hid_generic spadfs usbhid hid fuse raid0 snd_usb_audio snd_pcm_oss snd_mixer_oss md_mod snd_pcm snd_timer snd_page_alloc snd_hwdep snd_usbmidi_lib dmi_sysfs snd_rawmidi nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack snd soundcore lm85 hwmon_vid ohci_hcd ehci_pci ehci_hcd serverworks sata_svw libata acpi_cpufreq freq_table mperf ide_core usbcore kvm_amd kvm tg3 i2c_piix4 libphy microcode e100 usb_common ptp skge i2c_core pcspkr k10temp evdev floppy hwmon pps_core mii rtc_cmos button processor unix [last unloaded: nvidia] CPU: 1 PID: 6831 Comm: less Tainted: P W O 3.10.15-devel #18 Hardware name: empty empty/S3992-E, BIOS 'V1.06 ' 06/09/2009 task: ffff880203cc6bc0 ti: ffff88023e47c000 task.ti: ffff88023e47c000 RIP: 0010:[<ffffffff8118b188>] [<ffffffff8118b188>] __blkdev_get+0x28/0x450 RSP: 0018:ffff88023e47dbd8 EFLAGS: 00010286 RAX: ffffffffffffff74 RBX: ffffffffffffff74 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001 RBP: ffff88023e47dc18 R08: 0000000000000002 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff88023f519658 R13: ffffffff8118c300 R14: 0000000000000000 R15: ffff88023f519640 FS: 00007f2070bf7700(0000) GS:ffff880247400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000002b4 CR3: 000000023da1d000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: 0000000000000002 0000001d00000000 000000003e47dc50 ffff88023f519640 ffff88043d5bb668 ffffffff8118c300 ffff88023d683550 ffff88023e47de60 ffff88023e47dc98 ffffffff8118c10d 0000001d81605698 0000000000000292 Call Trace: [<ffffffff8118c300>] ? blkdev_get_by_dev+0x60/0x60 [<ffffffff8118c10d>] blkdev_get+0x1dd/0x370 [<ffffffff8118c300>] ? blkdev_get_by_dev+0x60/0x60 [<ffffffff813cea6c>] ? _raw_spin_unlock+0x2c/0x50 [<ffffffff8118c300>] ? blkdev_get_by_dev+0x60/0x60 [<ffffffff8118c365>] blkdev_open+0x65/0x80 [<ffffffff8114d12e>] do_dentry_open.isra.18+0x23e/0x2f0 [<ffffffff8114d214>] finish_open+0x34/0x50 [<ffffffff8115e122>] do_last.isra.62+0x2d2/0xc50 [<ffffffff8115eb58>] path_openat.isra.63+0xb8/0x4d0 [<ffffffff81115a8e>] ? might_fault+0x4e/0xa0 [<ffffffff8115f4f0>] do_filp_open+0x40/0x90 [<ffffffff813cea6c>] ? _raw_spin_unlock+0x2c/0x50 [<ffffffff8116db85>] ? __alloc_fd+0xa5/0x1f0 [<ffffffff8114e45f>] do_sys_open+0xef/0x1d0 [<ffffffff8114e559>] SyS_open+0x19/0x20 [<ffffffff813cff16>] system_call_fastpath+0x1a/0x1f Code: 44 00 00 55 48 89 e5 41 57 49 89 ff 41 56 41 89 d6 41 55 41 54 4c 8d 67 18 53 48 83 ec 18 89 75 cc e9 f2 00 00 00 0f 1f 44 00 00 <48> 8b 80 40 03 00 00 48 89 df 4c 8b 68 58 e8 d5 a4 07 00 44 89 RIP [<ffffffff8118b188>] __blkdev_get+0x28/0x450 RSP <ffff88023e47dbd8> CR2: 00000000000002b4 ---[ end trace bb7f32dbf02398dc ]--- The brd change should be backported to stable kernels starting with 2.6.25. The loop change should be backported to stable kernels starting with 2.6.22. Signed-off-by: Mikulas Patocka <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
The .config() calls clk_get_rate() which might sleep, so we need to set pwm_chip can_sleep flag. Otherwise, we see the following warning when using PWM driven heartbeat led. WARNING: CPU: 0 PID: 0 at kernel/locking/mutex.c:856 mutex_trylock+0x184/0x1a4() DEBUG_LOCKS_WARN_ON(in_interrupt()) Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.0-rc5 #18 [<c0015420>] (unwind_backtrace) from [<c0012cb0>] (show_stack+0x10/0x14) [<c0012cb0>] (show_stack) from [<c001daf8>] (warn_slowpath_common+0x6c/0x8c) [<c001daf8>] (warn_slowpath_common) from [<c001dbac>] (warn_slowpath_fmt+0x30/0x40) [<c001dbac>] (warn_slowpath_fmt) from [<c045df74>] (mutex_trylock+0x184/0x1a4) [<c045df74>] (mutex_trylock) from [<c0360950>] (clk_prepare_lock+0xc/0xec) [<c0360950>] (clk_prepare_lock) from [<c0362020>] (clk_get_rate+0xc/0x68) [<c0362020>] (clk_get_rate) from [<c028d07c>] (mxs_pwm_config+0x20/0x198) [<c028d07c>] (mxs_pwm_config) from [<c028bde8>] (pwm_config+0x60/0x70) [<c028bde8>] (pwm_config) from [<c034b61c>] (__led_pwm_set+0x1c/0x3c) [<c034b61c>] (__led_pwm_set) from [<c034bc3c>] (led_heartbeat_function+0x70/0x110) [<c034bc3c>] (led_heartbeat_function) from [<c00292f0>] (call_timer_fn+0x7c/0x164) [<c00292f0>] (call_timer_fn) from [<c00295c8>] (run_timer_softirq+0x1f0/0x260) [<c00295c8>] (run_timer_softirq) from [<c002255c>] (__do_softirq+0xc4/0x2f0) [<c002255c>] (__do_softirq) from [<c0022890>] (irq_exit+0xa4/0x10c) [<c0022890>] (irq_exit) from [<c0010240>] (handle_IRQ+0x34/0x84) [<c0010240>] (handle_IRQ) from [<c0013524>] (__irq_svc+0x44/0x54) [<c0013524>] (__irq_svc) from [<c00107f8>] (arch_cpu_idle+0x40/0x48) [<c00107f8>] (arch_cpu_idle) from [<c005deb8>] (cpu_startup_entry+0x70/0x198) [<c005deb8>] (cpu_startup_entry) from [<c060aac8>] (start_kernel+0x2a8/0x2f8) Reported-by: Stefan Wahren <[email protected]> Signed-off-by: Shawn Guo <[email protected]> Acked-by: Alexandre Belloni <[email protected]> Tested-by: Stefan Wahren <[email protected]> Signed-off-by: Thierry Reding <[email protected]>
This patch tries to fix this crash: #5 [ffff88003c1cd690] do_invalid_op at ffffffff810166d5 #6 [ffff88003c1cd730] invalid_op at ffffffff8159b2de [exception RIP: ocfs2_direct_IO_get_blocks+359] RIP: ffffffffa05dfa27 RSP: ffff88003c1cd7e8 RFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff88003c1cdaa8 RCX: 0000000000000000 RDX: 000000000000000c RSI: ffff880027a95000 RDI: ffff88003c79b540 RBP: ffff88003c1cd858 R8: 0000000000000000 R9: ffffffff815f6ba0 R10: 00000000000001c9 R11: 00000000000001c9 R12: ffff88002d271500 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000001000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffff88003c1cd860] do_direct_IO at ffffffff811cd31b #8 [ffff88003c1cd950] direct_IO_iovec at ffffffff811cde9c #9 [ffff88003c1cd9b0] do_blockdev_direct_IO at ffffffff811ce764 #10 [ffff88003c1cdb80] __blockdev_direct_IO at ffffffff811ce7cc #11 [ffff88003c1cdbb0] ocfs2_direct_IO at ffffffffa05df756 [ocfs2] #12 [ffff88003c1cdbe0] generic_file_direct_write_iter at ffffffff8112f935 #13 [ffff88003c1cdc40] ocfs2_file_write_iter at ffffffffa0600ccc [ocfs2] #14 [ffff88003c1cdd50] do_aio_write at ffffffff8119126c #15 [ffff88003c1cddc0] aio_rw_vect_retry at ffffffff811d9bb4 #16 [ffff88003c1cddf0] aio_run_iocb at ffffffff811db880 #17 [ffff88003c1cde30] io_submit_one at ffffffff811dc238 #18 [ffff88003c1cde80] do_io_submit at ffffffff811dc437 #19 [ffff88003c1cdf70] sys_io_submit at ffffffff811dc530 #20 [ffff88003c1cdf80] system_call_fastpath at ffffffff8159a159 It crashes at BUG_ON(create && (ext_flags & OCFS2_EXT_REFCOUNTED)); in ocfs2_direct_IO_get_blocks. ocfs2_direct_IO_get_blocks is expecting the OCFS2_EXT_REFCOUNTED be removed in ocfs2_prepare_inode_for_write() if it was there. But no cluster lock is taken during the time before (or inside) ocfs2_prepare_inode_for_write() and after ocfs2_direct_IO_get_blocks(). It can happen in this case: Node A(which crashes) Node B ------------------------ --------------------------- ocfs2_file_aio_write ocfs2_prepare_inode_for_write ocfs2_inode_lock ... ocfs2_inode_unlock #no refcount found .... ocfs2_reflink ocfs2_inode_lock ... ocfs2_inode_unlock #now, refcount flag set on extent ... flush change to disk ocfs2_direct_IO_get_blocks ocfs2_get_clusters #extent map miss #buffer_head miss read extents from disk found refcount flag on extent crash.. Fix: Take rw_lock in ocfs2_reflink path Signed-off-by: Wengang Wang <[email protected]> Reviewed-by: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
We have to return the boolean here if the tag presents or not, not just ANDing the TCI with the mask which results to: [ 709.412097] test_bpf: #18 LD_VLAN_TAG_PRESENT [ 709.412245] ret 4096 != 1 [ 709.412332] ret 4096 != 1 [ 709.412333] FAIL (2 times) Signed-off-by: Denis Kirjanov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
[ Upstream commit e3e82fc ] When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be dereferenced as wrong struct in irdma_free_pending_cqp_request(). PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0" #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34 #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2 #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f #3 [fffffe0000549eb8] do_nmi at ffffffff81079582 #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4 [exception RIP: native_queued_spin_lock_slowpath+1291] RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046 RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0 RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000 R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 -- <NMI exception stack> -- #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4 #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363 #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma] #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma] #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma] #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma] #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6 #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278 #15 [ffff88aa841efb88] device_del at ffffffff82179d23 #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice] #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice] #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff #20 [ffff88aa841eff10] kthread at ffffffff811d87a0 #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/[email protected] Suggested-by: "Ismail, Mustafa" <[email protected]> Signed-off-by: Shifeng Li <[email protected]> Reviewed-by: Shiraz Saleem <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit e3e82fc ] When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be dereferenced as wrong struct in irdma_free_pending_cqp_request(). PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0" #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34 #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2 #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f #3 [fffffe0000549eb8] do_nmi at ffffffff81079582 #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4 [exception RIP: native_queued_spin_lock_slowpath+1291] RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046 RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0 RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000 R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 -- <NMI exception stack> -- #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4 #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363 #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma] #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma] #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma] #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma] #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6 #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278 #15 [ffff88aa841efb88] device_del at ffffffff82179d23 #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice] #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice] #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff #20 [ffff88aa841eff10] kthread at ffffffff811d87a0 #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/[email protected] Suggested-by: "Ismail, Mustafa" <[email protected]> Signed-off-by: Shifeng Li <[email protected]> Reviewed-by: Shiraz Saleem <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Andrii Nakryiko says: ==================== BPF register bounds logic and testing improvements This patch set adds a big set of manual and auto-generated test cases validating BPF verifier's register bounds tracking and deduction logic. See details in the last patch. We start with building a tester that validates existing <range> vs <scalar> verifier logic for range bounds. To make all this work, BPF verifier's logic needed a bunch of improvements to handle some cases that previously were not covered. This had no implications as to correctness of verifier logic, but it was incomplete enough to cause significant disagreements with alternative implementation of register bounds logic that tests in this patch set implement. So we need BPF verifier logic improvements to make all the tests pass. This is what we do in patches #3 through raspberrypi#9. The end goal of this work, though, is to extend BPF verifier range state tracking such as to allow to derive new range bounds when comparing non-const registers. There is some more investigative work required to investigate and fix existing potential issues with range tracking as part of ALU/ALU64 operations, so <range> x <range> part of v5 patch set ([0]) is dropped until these issues are sorted out. For now, we include preparatory refactorings and clean ups, that set up BPF verifier code base to extend the logic to <range> vs <range> logic in subsequent patch set. Patches raspberrypi#10-raspberrypi#16 perform preliminary refactorings without functionally changing anything. But they do clean up check_cond_jmp_op() logic and generalize a bunch of other pieces in is_branch_taken() logic. [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=797178&state=* v5->v6: - dropped <range> vs <range> patches (original patches raspberrypi#18 through raspberrypi#23) to add more register range sanity checks and fix preexisting issues; - comments improvements, addressing other feedback on first 17 patches (Eduard, Alexei); v4->v5: - added entirety of verifier reg bounds tracking changes, now handling <range> vs <range> cases (Alexei); - added way more comments trying to explain why deductions added are correct, hopefully they are useful and clarify things a bit (Daniel, Shung-Hsi); - added two preliminary selftests fixes necessary for RELEASE=1 build to work again, it keeps breaking. v3->v4: - improvements to reg_bounds tester (progress report, split 32-bit and 64-bit ranges, fix various verbosity output issues, etc); v2->v3: - fix a subtle little-endianness assumption inside parge_reg_state() (CI); v1->v2: - fix compilation when building selftests with llvm-16 toolchain (CI). ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
sg_init_one() relies on linearly mapped low memory for the safe utilization of virt_to_page(). Otherwise, we trigger a kernel BUG, kernel BUG at include/linux/scatterlist.h:187! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM Modules linked in: CPU: 0 PID: 2997 Comm: syz-executor198 Not tainted 6.8.0-syzkaller #0 Hardware name: ARM-Versatile Express PC is at sg_set_buf include/linux/scatterlist.h:187 [inline] PC is at sg_init_one+0x9c/0xa8 lib/scatterlist.c:143 LR is at sg_init_table+0x2c/0x40 lib/scatterlist.c:128 Backtrace: [<807e16ac>] (sg_init_one) from [<804c1824>] (zswap_decompress+0xbc/0x208 mm/zswap.c:1089) r7:83471c80 r6:def6d08c r5:844847d0 r4:ff7e7ef4 [<804c1768>] (zswap_decompress) from [<804c4468>] (zswap_load+0x15c/0x198 mm/zswap.c:1637) r9:8446eb80 r8:8446eb80 r7:8446eb84 r6:def6d08c r5:00000001 r4:844847d0 [<804c430c>] (zswap_load) from [<804b9644>] (swap_read_folio+0xa8/0x498 mm/page_io.c:518) r9:844ac800 r8:835e6c00 r7:00000000 r6:df955d4c r5:00000001 r4:def6d08c [<804b959c>] (swap_read_folio) from [<804bb064>] (swap_cluster_readahead+0x1c4/0x34c mm/swap_state.c:684) r10:00000000 r9:00000007 r8:df955d4b r7:00000000 r6:00000000 r5:00100cca r4:00000001 [<804baea0>] (swap_cluster_readahead) from [<804bb3b8>] (swapin_readahead+0x68/0x4a8 mm/swap_state.c:904) r10:df955eb8 r9:00000000 r8:00100cca r7:84476480 r6:00000001 r5:00000000 r4:00000001 [<804bb350>] (swapin_readahead) from [<8047cde0>] (do_swap_page+0x200/0xcc4 mm/memory.c:4046) r10:00000040 r9:00000000 r8:844ac800 r7:84476480 r6:00000001 r5:00000000 r4:df955eb8 [<8047cbe0>] (do_swap_page) from [<8047e6c4>] (handle_pte_fault mm/memory.c:5301 [inline]) [<8047cbe0>] (do_swap_page) from [<8047e6c4>] (__handle_mm_fault mm/memory.c:5439 [inline]) [<8047cbe0>] (do_swap_page) from [<8047e6c4>] (handle_mm_fault+0x3d8/0x12b8 mm/memory.c:5604) r10:00000040 r9:842b3900 r8:7eb0d000 r7:84476480 r6:7eb0d000 r5:835e6c00 r4:00000254 [<8047e2ec>] (handle_mm_fault) from [<80215d28>] (do_page_fault+0x148/0x3a8 arch/arm/mm/fault.c:326) r10:00000007 r9:842b3900 r8:7eb0d000 r7:00000207 r6:00000254 r5:7eb0d9b4 r4:df955fb0 [<80215be0>] (do_page_fault) from [<80216170>] (do_DataAbort+0x38/0xa8 arch/arm/mm/fault.c:558) r10:7eb0da7c r9:00000000 r8:80215be0 r7:df955fb0 r6:7eb0d9b4 r5:00000207 r4:8261d0e0 [<80216138>] (do_DataAbort) from [<80200e3c>] (__dabt_usr+0x5c/0x60 arch/arm/kernel/entry-armv.S:427) Exception stack(0xdf955fb0 to 0xdf955ff8) 5fa0: 00000000 00000000 22d5f800 0008d158 5fc0: 00000000 7eb0d9a4 00000000 00000109 00000000 00000000 7eb0da7c 7eb0da3c 5fe0: 00000000 7eb0d9a0 00000001 00066bd4 00000010 ffffffff r8:824a9044 r7:835e6c00 r6:ffffffff r5:00000010 r4:00066bd4 Code: 1a000004 e1822003 e8860094 e89da8f0 (e7f001f2) ---[ end trace 0000000000000000 ]--- ---------------- Code disassembly (best guess): 0: 1a000004 bne 0x18 4: e1822003 orr r2, r2, r3 8: e8860094 stm r6, {r2, r4, r7} c: e89da8f0 ldm sp, {r4, r5, r6, r7, fp, sp, pc} * 10: e7f001f2 udf #18 <-- trapping instruction Consequently, we have two choices: either employ kmap_to_page() alongside sg_set_page(), or resort to copying high memory contents to a temporary buffer residing in low memory. However, considering the introduction of the WARN_ON_ONCE in commit ef6e06b ("highmem: fix kmap_to_page() for kmap_local_page() addresses"), which specifically addresses high memory concerns, it appears that memcpy remains the sole viable option. Link: https://lkml.kernel.org/r/[email protected] Fixes: 270700d ("mm/zswap: remove the memcpy if acomp is not sleepable") Signed-off-by: Barry Song <[email protected]> Reported-by: [email protected] Closes: https://lore.kernel.org/all/[email protected]/ Tested-by: [email protected] Acked-by: Yosry Ahmed <[email protected]> Reviewed-by: Nhat Pham <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Chris Li <[email protected]> Cc: Ira Weiny <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
vhost_worker will call tun call backs to receive packets. If too many illegal packets arrives, tun_do_read will keep dumping packet contents. When console is enabled, it will costs much more cpu time to dump packet and soft lockup will be detected. net_ratelimit mechanism can be used to limit the dumping rate. PID: 33036 TASK: ffff949da6f20000 CPU: 23 COMMAND: "vhost-32980" #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e #3 [fffffe00003fced0] do_nmi at ffffffff8922660d #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663 [exception RIP: io_serial_in+20] RIP: ffffffff89792594 RSP: ffffa655314979e8 RFLAGS: 00000002 RAX: ffffffff89792500 RBX: ffffffff8af428a0 RCX: 0000000000000000 RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff8af428a0 RBP: 0000000000002710 R8: 0000000000000004 R9: 000000000000000f R10: 0000000000000000 R11: ffffffff8acbf64f R12: 0000000000000020 R13: ffffffff8acbf698 R14: 0000000000000058 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07 #12 [ffffa65531497b68] printk at ffffffff89318306 #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765 #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun] #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun] #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net] #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost] #18 [ffffa65531497f10] kthread at ffffffff892d2e72 #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors") Signed-off-by: Lei Chen <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Acked-by: Jason Wang <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
[ Upstream commit f8bbc07 ] vhost_worker will call tun call backs to receive packets. If too many illegal packets arrives, tun_do_read will keep dumping packet contents. When console is enabled, it will costs much more cpu time to dump packet and soft lockup will be detected. net_ratelimit mechanism can be used to limit the dumping rate. PID: 33036 TASK: ffff949da6f20000 CPU: 23 COMMAND: "vhost-32980" #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e #3 [fffffe00003fced0] do_nmi at ffffffff8922660d #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663 [exception RIP: io_serial_in+20] RIP: ffffffff89792594 RSP: ffffa655314979e8 RFLAGS: 00000002 RAX: ffffffff89792500 RBX: ffffffff8af428a0 RCX: 0000000000000000 RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff8af428a0 RBP: 0000000000002710 R8: 0000000000000004 R9: 000000000000000f R10: 0000000000000000 R11: ffffffff8acbf64f R12: 0000000000000020 R13: ffffffff8acbf698 R14: 0000000000000058 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07 #12 [ffffa65531497b68] printk at ffffffff89318306 #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765 #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun] #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun] #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net] #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost] #18 [ffffa65531497f10] kthread at ffffffff892d2e72 #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors") Signed-off-by: Lei Chen <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Acked-by: Jason Wang <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit f8bbc07 ] vhost_worker will call tun call backs to receive packets. If too many illegal packets arrives, tun_do_read will keep dumping packet contents. When console is enabled, it will costs much more cpu time to dump packet and soft lockup will be detected. net_ratelimit mechanism can be used to limit the dumping rate. PID: 33036 TASK: ffff949da6f20000 CPU: 23 COMMAND: "vhost-32980" #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e #3 [fffffe00003fced0] do_nmi at ffffffff8922660d #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663 [exception RIP: io_serial_in+20] RIP: ffffffff89792594 RSP: ffffa655314979e8 RFLAGS: 00000002 RAX: ffffffff89792500 RBX: ffffffff8af428a0 RCX: 0000000000000000 RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff8af428a0 RBP: 0000000000002710 R8: 0000000000000004 R9: 000000000000000f R10: 0000000000000000 R11: ffffffff8acbf64f R12: 0000000000000020 R13: ffffffff8acbf698 R14: 0000000000000058 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07 #12 [ffffa65531497b68] printk at ffffffff89318306 #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765 #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun] #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun] #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net] #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost] #18 [ffffa65531497f10] kthread at ffffffff892d2e72 #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors") Signed-off-by: Lei Chen <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Acked-by: Jason Wang <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 769e6a1 ] ui_browser__show() is capturing the input title that is stack allocated memory in hist_browser__run(). Avoid a use after return by strdup-ing the string. Committer notes: Further explanation from Ian Rogers: My command line using tui is: $ sudo bash -c 'rm /tmp/asan.log*; export ASAN_OPTIONS="log_path=/tmp/asan.log"; /tmp/perf/perf mem record -a sleep 1; /tmp/perf/perf mem report' I then go to the perf annotate view and quit. This triggers the asan error (from the log file): ``` ==1254591==ERROR: AddressSanitizer: stack-use-after-return on address 0x7f2813331920 at pc 0x7f28180 65991 bp 0x7fff0a21c750 sp 0x7fff0a21bf10 READ of size 80 at 0x7f2813331920 thread T0 #0 0x7f2818065990 in __interceptor_strlen ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:461 #1 0x7f2817698251 in SLsmg_write_wrapped_string (/lib/x86_64-linux-gnu/libslang.so.2+0x98251) #2 0x7f28176984b9 in SLsmg_write_nstring (/lib/x86_64-linux-gnu/libslang.so.2+0x984b9) #3 0x55c94045b365 in ui_browser__write_nstring ui/browser.c:60 #4 0x55c94045c558 in __ui_browser__show_title ui/browser.c:266 #5 0x55c94045c776 in ui_browser__show ui/browser.c:288 #6 0x55c94045c06d in ui_browser__handle_resize ui/browser.c:206 #7 0x55c94047979b in do_annotate ui/browsers/hists.c:2458 #8 0x55c94047fb17 in evsel__hists_browse ui/browsers/hists.c:3412 #9 0x55c940480a0c in perf_evsel_menu__run ui/browsers/hists.c:3527 #10 0x55c940481108 in __evlist__tui_browse_hists ui/browsers/hists.c:3613 #11 0x55c9404813f7 in evlist__tui_browse_hists ui/browsers/hists.c:3661 #12 0x55c93ffa253f in report__browse_hists tools/perf/builtin-report.c:671 #13 0x55c93ffa58ca in __cmd_report tools/perf/builtin-report.c:1141 #14 0x55c93ffaf159 in cmd_report tools/perf/builtin-report.c:1805 #15 0x55c94000c05c in report_events tools/perf/builtin-mem.c:374 #16 0x55c94000d96d in cmd_mem tools/perf/builtin-mem.c:516 #17 0x55c9400e44ee in run_builtin tools/perf/perf.c:350 #18 0x55c9400e4a5a in handle_internal_command tools/perf/perf.c:403 #19 0x55c9400e4e22 in run_argv tools/perf/perf.c:447 #20 0x55c9400e53ad in main tools/perf/perf.c:561 #21 0x7f28170456c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 #22 0x7f2817045784 in __libc_start_main_impl ../csu/libc-start.c:360 #23 0x55c93ff544c0 in _start (/tmp/perf/perf+0x19a4c0) (BuildId: 84899b0e8c7d3a3eaa67b2eb35e3d8b2f8cd4c93) Address 0x7f2813331920 is located in stack of thread T0 at offset 32 in frame #0 0x55c94046e85e in hist_browser__run ui/browsers/hists.c:746 This frame has 1 object(s): [32, 192) 'title' (line 747) <== Memory access at offset 32 is inside this variable HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork ``` hist_browser__run isn't on the stack so the asan error looks legit. There's no clean init/exit on struct ui_browser so I may be trading a use-after-return for a memory leak, but that seems look a good trade anyway. Fixes: 05e8b08 ("perf ui browser: Stop using 'self'") Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ben Gainey <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Li Dong <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Paran Lee <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Tim Chen <[email protected]> Cc: Yanteng Si <[email protected]> Cc: Yicong Yang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 769e6a1 ] ui_browser__show() is capturing the input title that is stack allocated memory in hist_browser__run(). Avoid a use after return by strdup-ing the string. Committer notes: Further explanation from Ian Rogers: My command line using tui is: $ sudo bash -c 'rm /tmp/asan.log*; export ASAN_OPTIONS="log_path=/tmp/asan.log"; /tmp/perf/perf mem record -a sleep 1; /tmp/perf/perf mem report' I then go to the perf annotate view and quit. This triggers the asan error (from the log file): ``` ==1254591==ERROR: AddressSanitizer: stack-use-after-return on address 0x7f2813331920 at pc 0x7f28180 65991 bp 0x7fff0a21c750 sp 0x7fff0a21bf10 READ of size 80 at 0x7f2813331920 thread T0 #0 0x7f2818065990 in __interceptor_strlen ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:461 #1 0x7f2817698251 in SLsmg_write_wrapped_string (/lib/x86_64-linux-gnu/libslang.so.2+0x98251) #2 0x7f28176984b9 in SLsmg_write_nstring (/lib/x86_64-linux-gnu/libslang.so.2+0x984b9) #3 0x55c94045b365 in ui_browser__write_nstring ui/browser.c:60 #4 0x55c94045c558 in __ui_browser__show_title ui/browser.c:266 #5 0x55c94045c776 in ui_browser__show ui/browser.c:288 #6 0x55c94045c06d in ui_browser__handle_resize ui/browser.c:206 #7 0x55c94047979b in do_annotate ui/browsers/hists.c:2458 #8 0x55c94047fb17 in evsel__hists_browse ui/browsers/hists.c:3412 #9 0x55c940480a0c in perf_evsel_menu__run ui/browsers/hists.c:3527 #10 0x55c940481108 in __evlist__tui_browse_hists ui/browsers/hists.c:3613 #11 0x55c9404813f7 in evlist__tui_browse_hists ui/browsers/hists.c:3661 #12 0x55c93ffa253f in report__browse_hists tools/perf/builtin-report.c:671 #13 0x55c93ffa58ca in __cmd_report tools/perf/builtin-report.c:1141 #14 0x55c93ffaf159 in cmd_report tools/perf/builtin-report.c:1805 #15 0x55c94000c05c in report_events tools/perf/builtin-mem.c:374 #16 0x55c94000d96d in cmd_mem tools/perf/builtin-mem.c:516 #17 0x55c9400e44ee in run_builtin tools/perf/perf.c:350 #18 0x55c9400e4a5a in handle_internal_command tools/perf/perf.c:403 #19 0x55c9400e4e22 in run_argv tools/perf/perf.c:447 #20 0x55c9400e53ad in main tools/perf/perf.c:561 #21 0x7f28170456c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 #22 0x7f2817045784 in __libc_start_main_impl ../csu/libc-start.c:360 #23 0x55c93ff544c0 in _start (/tmp/perf/perf+0x19a4c0) (BuildId: 84899b0e8c7d3a3eaa67b2eb35e3d8b2f8cd4c93) Address 0x7f2813331920 is located in stack of thread T0 at offset 32 in frame #0 0x55c94046e85e in hist_browser__run ui/browsers/hists.c:746 This frame has 1 object(s): [32, 192) 'title' (line 747) <== Memory access at offset 32 is inside this variable HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork ``` hist_browser__run isn't on the stack so the asan error looks legit. There's no clean init/exit on struct ui_browser so I may be trading a use-after-return for a memory leak, but that seems look a good trade anyway. Fixes: 05e8b08 ("perf ui browser: Stop using 'self'") Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ben Gainey <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Li Dong <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Paran Lee <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Tim Chen <[email protected]> Cc: Yanteng Si <[email protected]> Cc: Yicong Yang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 769e6a1 ] ui_browser__show() is capturing the input title that is stack allocated memory in hist_browser__run(). Avoid a use after return by strdup-ing the string. Committer notes: Further explanation from Ian Rogers: My command line using tui is: $ sudo bash -c 'rm /tmp/asan.log*; export ASAN_OPTIONS="log_path=/tmp/asan.log"; /tmp/perf/perf mem record -a sleep 1; /tmp/perf/perf mem report' I then go to the perf annotate view and quit. This triggers the asan error (from the log file): ``` ==1254591==ERROR: AddressSanitizer: stack-use-after-return on address 0x7f2813331920 at pc 0x7f28180 65991 bp 0x7fff0a21c750 sp 0x7fff0a21bf10 READ of size 80 at 0x7f2813331920 thread T0 #0 0x7f2818065990 in __interceptor_strlen ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:461 #1 0x7f2817698251 in SLsmg_write_wrapped_string (/lib/x86_64-linux-gnu/libslang.so.2+0x98251) #2 0x7f28176984b9 in SLsmg_write_nstring (/lib/x86_64-linux-gnu/libslang.so.2+0x984b9) #3 0x55c94045b365 in ui_browser__write_nstring ui/browser.c:60 #4 0x55c94045c558 in __ui_browser__show_title ui/browser.c:266 #5 0x55c94045c776 in ui_browser__show ui/browser.c:288 #6 0x55c94045c06d in ui_browser__handle_resize ui/browser.c:206 #7 0x55c94047979b in do_annotate ui/browsers/hists.c:2458 #8 0x55c94047fb17 in evsel__hists_browse ui/browsers/hists.c:3412 #9 0x55c940480a0c in perf_evsel_menu__run ui/browsers/hists.c:3527 #10 0x55c940481108 in __evlist__tui_browse_hists ui/browsers/hists.c:3613 #11 0x55c9404813f7 in evlist__tui_browse_hists ui/browsers/hists.c:3661 #12 0x55c93ffa253f in report__browse_hists tools/perf/builtin-report.c:671 #13 0x55c93ffa58ca in __cmd_report tools/perf/builtin-report.c:1141 #14 0x55c93ffaf159 in cmd_report tools/perf/builtin-report.c:1805 #15 0x55c94000c05c in report_events tools/perf/builtin-mem.c:374 #16 0x55c94000d96d in cmd_mem tools/perf/builtin-mem.c:516 #17 0x55c9400e44ee in run_builtin tools/perf/perf.c:350 #18 0x55c9400e4a5a in handle_internal_command tools/perf/perf.c:403 #19 0x55c9400e4e22 in run_argv tools/perf/perf.c:447 #20 0x55c9400e53ad in main tools/perf/perf.c:561 #21 0x7f28170456c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 #22 0x7f2817045784 in __libc_start_main_impl ../csu/libc-start.c:360 #23 0x55c93ff544c0 in _start (/tmp/perf/perf+0x19a4c0) (BuildId: 84899b0e8c7d3a3eaa67b2eb35e3d8b2f8cd4c93) Address 0x7f2813331920 is located in stack of thread T0 at offset 32 in frame #0 0x55c94046e85e in hist_browser__run ui/browsers/hists.c:746 This frame has 1 object(s): [32, 192) 'title' (line 747) <== Memory access at offset 32 is inside this variable HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork ``` hist_browser__run isn't on the stack so the asan error looks legit. There's no clean init/exit on struct ui_browser so I may be trading a use-after-return for a memory leak, but that seems look a good trade anyway. Fixes: 05e8b08 ("perf ui browser: Stop using 'self'") Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ben Gainey <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Li Dong <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Paran Lee <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Tim Chen <[email protected]> Cc: Yanteng Si <[email protected]> Cc: Yicong Yang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents. Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem. To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written(). Heming Zhao said: ------ PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error" PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA" #0 machine_kexec at ffffffff8c069932 #1 __crash_kexec at ffffffff8c1338fa #2 panic at ffffffff8c1d69b9 #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] #6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] #7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] #8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] #9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] #10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] #11 dio_complete at ffffffff8c2b9fa7 #12 do_blockdev_direct_IO at ffffffff8c2bc09f #13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] #14 generic_file_direct_write at ffffffff8c1dcf14 #15 __generic_file_write_iter at ffffffff8c1dd07b #16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] #17 aio_write at ffffffff8c2cc72e #18 kmem_cache_alloc at ffffffff8c248dde #19 do_io_submit at ffffffff8c2ccada #20 do_syscall_64 at ffffffff8c004984 #21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Reviewed-by: Heming Zhao <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
commit be346c1 upstream. The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents. Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem. To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written(). Heming Zhao said: ------ PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error" PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA" #0 machine_kexec at ffffffff8c069932 #1 __crash_kexec at ffffffff8c1338fa #2 panic at ffffffff8c1d69b9 #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] #6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] #7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] #8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] #9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] #10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] #11 dio_complete at ffffffff8c2b9fa7 #12 do_blockdev_direct_IO at ffffffff8c2bc09f #13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] #14 generic_file_direct_write at ffffffff8c1dcf14 #15 __generic_file_write_iter at ffffffff8c1dd07b #16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] #17 aio_write at ffffffff8c2cc72e #18 kmem_cache_alloc at ffffffff8c248dde #19 do_io_submit at ffffffff8c2ccada #20 do_syscall_64 at ffffffff8c004984 #21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Reviewed-by: Heming Zhao <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit be346c1 upstream. The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents. Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem. To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written(). Heming Zhao said: ------ PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error" PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA" #0 machine_kexec at ffffffff8c069932 #1 __crash_kexec at ffffffff8c1338fa #2 panic at ffffffff8c1d69b9 #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] #6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] #7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] #8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] #9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] #10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] #11 dio_complete at ffffffff8c2b9fa7 #12 do_blockdev_direct_IO at ffffffff8c2bc09f #13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] #14 generic_file_direct_write at ffffffff8c1dcf14 #15 __generic_file_write_iter at ffffffff8c1dd07b #16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] #17 aio_write at ffffffff8c2cc72e #18 kmem_cache_alloc at ffffffff8c248dde #19 do_io_submit at ffffffff8c2ccada #20 do_syscall_64 at ffffffff8c004984 #21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Reviewed-by: Heming Zhao <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
When running BPF selftests (./test_progs -t sockmap_basic) on a Loongarch platform, the following kernel panic occurs: [...] Oops[#1]: CPU: 22 PID: 2824 Comm: test_progs Tainted: G OE 6.10.0-rc2+ #18 Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018 ... ... ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560 ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0 CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) PRMD: 0000000c (PPLV0 +PIE +PWE) EUEN: 00000007 (+FPE +SXE +ASXE -BTE) ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0) BADV: 0000000000000040 PRID: 0014c011 (Loongson-64bit, Loongson-3C5000) Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=...) Stack : ... Call Trace: [<9000000004162774>] copy_page_to_iter+0x74/0x1c0 [<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560 [<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0 [<90000000049aae34>] inet_recvmsg+0x54/0x100 [<900000000481ad5c>] sock_recvmsg+0x7c/0xe0 [<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0 [<900000000481e27c>] sys_recvfrom+0x1c/0x40 [<9000000004c076ec>] do_syscall+0x8c/0xc0 [<9000000003731da4>] handle_syscall+0xc4/0x160 Code: ... ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Fatal exception Kernel relocated by 0x3510000 .text @ 0x9000000003710000 .data @ 0x9000000004d70000 .bss @ 0x9000000006469400 ---[ end Kernel panic - not syncing: Fatal exception ]--- [...] This crash happens every time when running sockmap_skb_verdict_shutdown subtest in sockmap_basic. This crash is because a NULL pointer is passed to page_address() in the sk_msg_recvmsg(). Due to the different implementations depending on the architecture, page_address(NULL) will trigger a panic on Loongarch platform but not on x86 platform. So this bug was hidden on x86 platform for a while, but now it is exposed on Loongarch platform. The root cause is that a zero length skb (skb->len == 0) was put on the queue. This zero length skb is a TCP FIN packet, which was sent by shutdown(), invoked in test_sockmap_skb_verdict_shutdown(): shutdown(p1, SHUT_WR); In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero, and no page is put to this sge (see sg_set_page in sg_set_page), but this empty sge is queued into ingress_msg list. And in sk_msg_recvmsg(), this empty sge is used, and a NULL page is got by sg_page(sge). Pass this NULL page to copy_page_to_iter(), which passes it to kmap_local_page() and to page_address(), then kernel panics. To solve this, we should skip this zero length skb. So in sk_msg_recvmsg(), if copy is zero, that means it's a zero length skb, skip invoking copy_page_to_iter(). We are using the EFAULT return triggered by copy_page_to_iter to check for is_fin in tcp_bpf.c. Fixes: 604326b ("bpf, sockmap: convert to generic sk_msg interface") Suggested-by: John Fastabend <[email protected]> Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/e3a16eacdc6740658ee02a33489b1b9d4912f378.1719992715.git.tanggeliang@kylinos.cn
[ Upstream commit f0c1802 ] When running BPF selftests (./test_progs -t sockmap_basic) on a Loongarch platform, the following kernel panic occurs: [...] Oops[raspberrypi#1]: CPU: 22 PID: 2824 Comm: test_progs Tainted: G OE 6.10.0-rc2+ raspberrypi#18 Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018 ... ... ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560 ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0 CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) PRMD: 0000000c (PPLV0 +PIE +PWE) EUEN: 00000007 (+FPE +SXE +ASXE -BTE) ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0) BADV: 0000000000000040 PRID: 0014c011 (Loongson-64bit, Loongson-3C5000) Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=...) Stack : ... Call Trace: [<9000000004162774>] copy_page_to_iter+0x74/0x1c0 [<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560 [<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0 [<90000000049aae34>] inet_recvmsg+0x54/0x100 [<900000000481ad5c>] sock_recvmsg+0x7c/0xe0 [<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0 [<900000000481e27c>] sys_recvfrom+0x1c/0x40 [<9000000004c076ec>] do_syscall+0x8c/0xc0 [<9000000003731da4>] handle_syscall+0xc4/0x160 Code: ... ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Fatal exception Kernel relocated by 0x3510000 .text @ 0x9000000003710000 .data @ 0x9000000004d70000 .bss @ 0x9000000006469400 ---[ end Kernel panic - not syncing: Fatal exception ]--- [...] This crash happens every time when running sockmap_skb_verdict_shutdown subtest in sockmap_basic. This crash is because a NULL pointer is passed to page_address() in the sk_msg_recvmsg(). Due to the different implementations depending on the architecture, page_address(NULL) will trigger a panic on Loongarch platform but not on x86 platform. So this bug was hidden on x86 platform for a while, but now it is exposed on Loongarch platform. The root cause is that a zero length skb (skb->len == 0) was put on the queue. This zero length skb is a TCP FIN packet, which was sent by shutdown(), invoked in test_sockmap_skb_verdict_shutdown(): shutdown(p1, SHUT_WR); In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero, and no page is put to this sge (see sg_set_page in sg_set_page), but this empty sge is queued into ingress_msg list. And in sk_msg_recvmsg(), this empty sge is used, and a NULL page is got by sg_page(sge). Pass this NULL page to copy_page_to_iter(), which passes it to kmap_local_page() and to page_address(), then kernel panics. To solve this, we should skip this zero length skb. So in sk_msg_recvmsg(), if copy is zero, that means it's a zero length skb, skip invoking copy_page_to_iter(). We are using the EFAULT return triggered by copy_page_to_iter to check for is_fin in tcp_bpf.c. Fixes: 604326b ("bpf, sockmap: convert to generic sk_msg interface") Suggested-by: John Fastabend <[email protected]> Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/e3a16eacdc6740658ee02a33489b1b9d4912f378.1719992715.git.tanggeliang@kylinos.cn Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit f0c1802 ] When running BPF selftests (./test_progs -t sockmap_basic) on a Loongarch platform, the following kernel panic occurs: [...] Oops[#1]: CPU: 22 PID: 2824 Comm: test_progs Tainted: G OE 6.10.0-rc2+ #18 Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018 ... ... ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560 ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0 CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) PRMD: 0000000c (PPLV0 +PIE +PWE) EUEN: 00000007 (+FPE +SXE +ASXE -BTE) ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0) BADV: 0000000000000040 PRID: 0014c011 (Loongson-64bit, Loongson-3C5000) Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=...) Stack : ... Call Trace: [<9000000004162774>] copy_page_to_iter+0x74/0x1c0 [<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560 [<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0 [<90000000049aae34>] inet_recvmsg+0x54/0x100 [<900000000481ad5c>] sock_recvmsg+0x7c/0xe0 [<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0 [<900000000481e27c>] sys_recvfrom+0x1c/0x40 [<9000000004c076ec>] do_syscall+0x8c/0xc0 [<9000000003731da4>] handle_syscall+0xc4/0x160 Code: ... ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Fatal exception Kernel relocated by 0x3510000 .text @ 0x9000000003710000 .data @ 0x9000000004d70000 .bss @ 0x9000000006469400 ---[ end Kernel panic - not syncing: Fatal exception ]--- [...] This crash happens every time when running sockmap_skb_verdict_shutdown subtest in sockmap_basic. This crash is because a NULL pointer is passed to page_address() in the sk_msg_recvmsg(). Due to the different implementations depending on the architecture, page_address(NULL) will trigger a panic on Loongarch platform but not on x86 platform. So this bug was hidden on x86 platform for a while, but now it is exposed on Loongarch platform. The root cause is that a zero length skb (skb->len == 0) was put on the queue. This zero length skb is a TCP FIN packet, which was sent by shutdown(), invoked in test_sockmap_skb_verdict_shutdown(): shutdown(p1, SHUT_WR); In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero, and no page is put to this sge (see sg_set_page in sg_set_page), but this empty sge is queued into ingress_msg list. And in sk_msg_recvmsg(), this empty sge is used, and a NULL page is got by sg_page(sge). Pass this NULL page to copy_page_to_iter(), which passes it to kmap_local_page() and to page_address(), then kernel panics. To solve this, we should skip this zero length skb. So in sk_msg_recvmsg(), if copy is zero, that means it's a zero length skb, skip invoking copy_page_to_iter(). We are using the EFAULT return triggered by copy_page_to_iter to check for is_fin in tcp_bpf.c. Fixes: 604326b ("bpf, sockmap: convert to generic sk_msg interface") Suggested-by: John Fastabend <[email protected]> Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/e3a16eacdc6740658ee02a33489b1b9d4912f378.1719992715.git.tanggeliang@kylinos.cn Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit 769e6a1 ] ui_browser__show() is capturing the input title that is stack allocated memory in hist_browser__run(). Avoid a use after return by strdup-ing the string. Committer notes: Further explanation from Ian Rogers: My command line using tui is: $ sudo bash -c 'rm /tmp/asan.log*; export ASAN_OPTIONS="log_path=/tmp/asan.log"; /tmp/perf/perf mem record -a sleep 1; /tmp/perf/perf mem report' I then go to the perf annotate view and quit. This triggers the asan error (from the log file): ``` ==1254591==ERROR: AddressSanitizer: stack-use-after-return on address 0x7f2813331920 at pc 0x7f28180 65991 bp 0x7fff0a21c750 sp 0x7fff0a21bf10 READ of size 80 at 0x7f2813331920 thread T0 #0 0x7f2818065990 in __interceptor_strlen ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:461 #1 0x7f2817698251 in SLsmg_write_wrapped_string (/lib/x86_64-linux-gnu/libslang.so.2+0x98251) #2 0x7f28176984b9 in SLsmg_write_nstring (/lib/x86_64-linux-gnu/libslang.so.2+0x984b9) #3 0x55c94045b365 in ui_browser__write_nstring ui/browser.c:60 #4 0x55c94045c558 in __ui_browser__show_title ui/browser.c:266 #5 0x55c94045c776 in ui_browser__show ui/browser.c:288 #6 0x55c94045c06d in ui_browser__handle_resize ui/browser.c:206 #7 0x55c94047979b in do_annotate ui/browsers/hists.c:2458 #8 0x55c94047fb17 in evsel__hists_browse ui/browsers/hists.c:3412 #9 0x55c940480a0c in perf_evsel_menu__run ui/browsers/hists.c:3527 #10 0x55c940481108 in __evlist__tui_browse_hists ui/browsers/hists.c:3613 #11 0x55c9404813f7 in evlist__tui_browse_hists ui/browsers/hists.c:3661 #12 0x55c93ffa253f in report__browse_hists tools/perf/builtin-report.c:671 #13 0x55c93ffa58ca in __cmd_report tools/perf/builtin-report.c:1141 #14 0x55c93ffaf159 in cmd_report tools/perf/builtin-report.c:1805 #15 0x55c94000c05c in report_events tools/perf/builtin-mem.c:374 #16 0x55c94000d96d in cmd_mem tools/perf/builtin-mem.c:516 #17 0x55c9400e44ee in run_builtin tools/perf/perf.c:350 #18 0x55c9400e4a5a in handle_internal_command tools/perf/perf.c:403 #19 0x55c9400e4e22 in run_argv tools/perf/perf.c:447 #20 0x55c9400e53ad in main tools/perf/perf.c:561 #21 0x7f28170456c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 #22 0x7f2817045784 in __libc_start_main_impl ../csu/libc-start.c:360 #23 0x55c93ff544c0 in _start (/tmp/perf/perf+0x19a4c0) (BuildId: 84899b0e8c7d3a3eaa67b2eb35e3d8b2f8cd4c93) Address 0x7f2813331920 is located in stack of thread T0 at offset 32 in frame #0 0x55c94046e85e in hist_browser__run ui/browsers/hists.c:746 This frame has 1 object(s): [32, 192) 'title' (line 747) <== Memory access at offset 32 is inside this variable HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork ``` hist_browser__run isn't on the stack so the asan error looks legit. There's no clean init/exit on struct ui_browser so I may be trading a use-after-return for a memory leak, but that seems look a good trade anyway. Fixes: 05e8b08 ("perf ui browser: Stop using 'self'") Signed-off-by: Ian Rogers <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Ben Gainey <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Li Dong <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Oliver Upton <[email protected]> Cc: Paran Lee <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sun Haiyong <[email protected]> Cc: Tim Chen <[email protected]> Cc: Yanteng Si <[email protected]> Cc: Yicong Yang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
When PG_hwpoison pages are freed they are treated differently in free_pages_prepare() and instead of being released they are isolated. Page allocation tag counters are decremented at this point since the page is considered not in use. Later on when such pages are released by unpoison_memory(), the allocation tag counters will be decremented again and the following warning gets reported: [ 113.930443][ T3282] ------------[ cut here ]------------ [ 113.931105][ T3282] alloc_tag was not set [ 113.931576][ T3282] WARNING: CPU: 2 PID: 3282 at ./include/linux/alloc_tag.h:130 pgalloc_tag_sub.part.66+0x154/0x164 [ 113.932866][ T3282] Modules linked in: hwpoison_inject fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_man4 [ 113.941638][ T3282] CPU: 2 UID: 0 PID: 3282 Comm: madvise11 Kdump: loaded Tainted: G W 6.11.0-rc4-dirty #18 [ 113.943003][ T3282] Tainted: [W]=WARN [ 113.943453][ T3282] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022 [ 113.944378][ T3282] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 113.945319][ T3282] pc : pgalloc_tag_sub.part.66+0x154/0x164 [ 113.946016][ T3282] lr : pgalloc_tag_sub.part.66+0x154/0x164 [ 113.946706][ T3282] sp : ffff800087093a10 [ 113.947197][ T3282] x29: ffff800087093a10 x28: ffff0000d7a9d400 x27: ffff80008249f0a0 [ 113.948165][ T3282] x26: 0000000000000000 x25: ffff80008249f2b0 x24: 0000000000000000 [ 113.949134][ T3282] x23: 0000000000000001 x22: 0000000000000001 x21: 0000000000000000 [ 113.950597][ T3282] x20: ffff0000c08fcad8 x19: ffff80008251e000 x18: ffffffffffffffff [ 113.952207][ T3282] x17: 0000000000000000 x16: 0000000000000000 x15: ffff800081746210 [ 113.953161][ T3282] x14: 0000000000000000 x13: 205d323832335420 x12: 5b5d353031313339 [ 113.954120][ T3282] x11: ffff800087093500 x10: 000000000000005d x9 : 00000000ffffffd0 [ 113.955078][ T3282] x8 : 7f7f7f7f7f7f7f7f x7 : ffff80008236ba90 x6 : c0000000ffff7fff [ 113.956036][ T3282] x5 : ffff000b34bf4dc8 x4 : ffff8000820aba90 x3 : 0000000000000001 [ 113.956994][ T3282] x2 : ffff800ab320f000 x1 : 841d1e35ac932e00 x0 : 0000000000000000 [ 113.957962][ T3282] Call trace: [ 113.958350][ T3282] pgalloc_tag_sub.part.66+0x154/0x164 [ 113.959000][ T3282] pgalloc_tag_sub+0x14/0x1c [ 113.959539][ T3282] free_unref_page+0xf4/0x4b8 [ 113.960096][ T3282] __folio_put+0xd4/0x120 [ 113.960614][ T3282] folio_put+0x24/0x50 [ 113.961103][ T3282] unpoison_memory+0x4f0/0x5b0 [ 113.961678][ T3282] hwpoison_unpoison+0x30/0x48 [hwpoison_inject] [ 113.962436][ T3282] simple_attr_write_xsigned.isra.34+0xec/0x1cc [ 113.963183][ T3282] simple_attr_write+0x38/0x48 [ 113.963750][ T3282] debugfs_attr_write+0x54/0x80 [ 113.964330][ T3282] full_proxy_write+0x68/0x98 [ 113.964880][ T3282] vfs_write+0xdc/0x4d0 [ 113.965372][ T3282] ksys_write+0x78/0x100 [ 113.965875][ T3282] __arm64_sys_write+0x24/0x30 [ 113.966440][ T3282] invoke_syscall+0x7c/0x104 [ 113.966984][ T3282] el0_svc_common.constprop.1+0x88/0x104 [ 113.967652][ T3282] do_el0_svc+0x2c/0x38 [ 113.968893][ T3282] el0_svc+0x3c/0x1b8 [ 113.969379][ T3282] el0t_64_sync_handler+0x98/0xbc [ 113.969980][ T3282] el0t_64_sync+0x19c/0x1a0 [ 113.970511][ T3282] ---[ end trace 0000000000000000 ]--- To fix this, clear the page tag reference after the page got isolated and accounted for. Link: https://lkml.kernel.org/r/[email protected] Fixes: d224eb0 ("codetag: debug: mark codetags for reserved pages as empty") Signed-off-by: Hao Ge <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Acked-by: Suren Baghdasaryan <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Hao Ge <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: <[email protected]> [6.10+] Signed-off-by: Andrew Morton <[email protected]>
commit 5e9784e upstream. When PG_hwpoison pages are freed they are treated differently in free_pages_prepare() and instead of being released they are isolated. Page allocation tag counters are decremented at this point since the page is considered not in use. Later on when such pages are released by unpoison_memory(), the allocation tag counters will be decremented again and the following warning gets reported: [ 113.930443][ T3282] ------------[ cut here ]------------ [ 113.931105][ T3282] alloc_tag was not set [ 113.931576][ T3282] WARNING: CPU: 2 PID: 3282 at ./include/linux/alloc_tag.h:130 pgalloc_tag_sub.part.66+0x154/0x164 [ 113.932866][ T3282] Modules linked in: hwpoison_inject fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_man4 [ 113.941638][ T3282] CPU: 2 UID: 0 PID: 3282 Comm: madvise11 Kdump: loaded Tainted: G W 6.11.0-rc4-dirty #18 [ 113.943003][ T3282] Tainted: [W]=WARN [ 113.943453][ T3282] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022 [ 113.944378][ T3282] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 113.945319][ T3282] pc : pgalloc_tag_sub.part.66+0x154/0x164 [ 113.946016][ T3282] lr : pgalloc_tag_sub.part.66+0x154/0x164 [ 113.946706][ T3282] sp : ffff800087093a10 [ 113.947197][ T3282] x29: ffff800087093a10 x28: ffff0000d7a9d400 x27: ffff80008249f0a0 [ 113.948165][ T3282] x26: 0000000000000000 x25: ffff80008249f2b0 x24: 0000000000000000 [ 113.949134][ T3282] x23: 0000000000000001 x22: 0000000000000001 x21: 0000000000000000 [ 113.950597][ T3282] x20: ffff0000c08fcad8 x19: ffff80008251e000 x18: ffffffffffffffff [ 113.952207][ T3282] x17: 0000000000000000 x16: 0000000000000000 x15: ffff800081746210 [ 113.953161][ T3282] x14: 0000000000000000 x13: 205d323832335420 x12: 5b5d353031313339 [ 113.954120][ T3282] x11: ffff800087093500 x10: 000000000000005d x9 : 00000000ffffffd0 [ 113.955078][ T3282] x8 : 7f7f7f7f7f7f7f7f x7 : ffff80008236ba90 x6 : c0000000ffff7fff [ 113.956036][ T3282] x5 : ffff000b34bf4dc8 x4 : ffff8000820aba90 x3 : 0000000000000001 [ 113.956994][ T3282] x2 : ffff800ab320f000 x1 : 841d1e35ac932e00 x0 : 0000000000000000 [ 113.957962][ T3282] Call trace: [ 113.958350][ T3282] pgalloc_tag_sub.part.66+0x154/0x164 [ 113.959000][ T3282] pgalloc_tag_sub+0x14/0x1c [ 113.959539][ T3282] free_unref_page+0xf4/0x4b8 [ 113.960096][ T3282] __folio_put+0xd4/0x120 [ 113.960614][ T3282] folio_put+0x24/0x50 [ 113.961103][ T3282] unpoison_memory+0x4f0/0x5b0 [ 113.961678][ T3282] hwpoison_unpoison+0x30/0x48 [hwpoison_inject] [ 113.962436][ T3282] simple_attr_write_xsigned.isra.34+0xec/0x1cc [ 113.963183][ T3282] simple_attr_write+0x38/0x48 [ 113.963750][ T3282] debugfs_attr_write+0x54/0x80 [ 113.964330][ T3282] full_proxy_write+0x68/0x98 [ 113.964880][ T3282] vfs_write+0xdc/0x4d0 [ 113.965372][ T3282] ksys_write+0x78/0x100 [ 113.965875][ T3282] __arm64_sys_write+0x24/0x30 [ 113.966440][ T3282] invoke_syscall+0x7c/0x104 [ 113.966984][ T3282] el0_svc_common.constprop.1+0x88/0x104 [ 113.967652][ T3282] do_el0_svc+0x2c/0x38 [ 113.968893][ T3282] el0_svc+0x3c/0x1b8 [ 113.969379][ T3282] el0t_64_sync_handler+0x98/0xbc [ 113.969980][ T3282] el0t_64_sync+0x19c/0x1a0 [ 113.970511][ T3282] ---[ end trace 0000000000000000 ]--- To fix this, clear the page tag reference after the page got isolated and accounted for. Link: https://lkml.kernel.org/r/[email protected] Fixes: d224eb0 ("codetag: debug: mark codetags for reserved pages as empty") Signed-off-by: Hao Ge <[email protected]> Reviewed-by: Miaohe Lin <[email protected]> Acked-by: Suren Baghdasaryan <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Hao Ge <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: <[email protected]> [6.10+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit c145eea ] mwifiex_get_priv_by_id() returns the priv pointer corresponding to the bss_num and bss_type, but without checking if the priv is actually currently in use. Unused priv pointers do not have a wiphy attached to them which can lead to NULL pointer dereferences further down the callstack. Fix this by returning only used priv pointers which have priv->bss_mode set to something else than NL80211_IFTYPE_UNSPECIFIED. Said NULL pointer dereference happened when an Accesspoint was started with wpa_supplicant -i mlan0 with this config: network={ ssid="somessid" mode=2 frequency=2412 key_mgmt=WPA-PSK WPA-PSK-SHA256 proto=RSN group=CCMP pairwise=CCMP psk="12345678" } When waiting for the AP to be established, interrupting wpa_supplicant with <ctrl-c> and starting it again this happens: | Unable to handle kernel NULL pointer dereference at virtual address 0000000000000140 | Mem abort info: | ESR = 0x0000000096000004 | EC = 0x25: DABT (current EL), IL = 32 bits | SET = 0, FnV = 0 | EA = 0, S1PTW = 0 | FSC = 0x04: level 0 translation fault | Data abort info: | ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 | CM = 0, WnR = 0, TnD = 0, TagAccess = 0 | GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 | user pgtable: 4k pages, 48-bit VAs, pgdp=0000000046d96000 | [0000000000000140] pgd=0000000000000000, p4d=0000000000000000 | Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP | Modules linked in: caam_jr caamhash_desc spidev caamalg_desc crypto_engine authenc libdes mwifiex_sdio +mwifiex crct10dif_ce cdc_acm onboard_usb_hub fsl_imx8_ddr_perf imx8m_ddrc rtc_ds1307 lm75 rtc_snvs +imx_sdma caam imx8mm_thermal spi_imx error imx_cpufreq_dt fuse ip_tables x_tables ipv6 | CPU: 0 PID: 8 Comm: kworker/0:1 Not tainted 6.9.0-00007-g937242013fce-dirty #18 | Hardware name: somemachine (DT) | Workqueue: events sdio_irq_work | pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : mwifiex_get_cfp+0xd8/0x15c [mwifiex] | lr : mwifiex_get_cfp+0x34/0x15c [mwifiex] | sp : ffff8000818b3a70 | x29: ffff8000818b3a70 x28: ffff000006bfd8a5 x27: 0000000000000004 | x26: 000000000000002c x25: 0000000000001511 x24: 0000000002e86bc9 | x23: ffff000006bfd996 x22: 0000000000000004 x21: ffff000007bec000 | x20: 000000000000002c x19: 0000000000000000 x18: 0000000000000000 | x17: 000000040044ffff x16: 00500072b5503510 x15: ccc283740681e517 | x14: 0201000101006d15 x13: 0000000002e8ff43 x12: 002c01000000ffb1 | x11: 0100000000000000 x10: 02e8ff43002c0100 x9 : 0000ffb100100157 | x8 : ffff000003d20000 x7 : 00000000000002f1 x6 : 00000000ffffe124 | x5 : 0000000000000001 x4 : 0000000000000003 x3 : 0000000000000000 | x2 : 0000000000000000 x1 : 0001000000011001 x0 : 0000000000000000 | Call trace: | mwifiex_get_cfp+0xd8/0x15c [mwifiex] | mwifiex_parse_single_response_buf+0x1d0/0x504 [mwifiex] | mwifiex_handle_event_ext_scan_report+0x19c/0x2f8 [mwifiex] | mwifiex_process_sta_event+0x298/0xf0c [mwifiex] | mwifiex_process_event+0x110/0x238 [mwifiex] | mwifiex_main_process+0x428/0xa44 [mwifiex] | mwifiex_sdio_interrupt+0x64/0x12c [mwifiex_sdio] | process_sdio_pending_irqs+0x64/0x1b8 | sdio_irq_work+0x4c/0x7c | process_one_work+0x148/0x2a0 | worker_thread+0x2fc/0x40c | kthread+0x110/0x114 | ret_from_fork+0x10/0x20 | Code: a94153f3 a8c37bfd d50323bf d65f03c0 (f940a000) | ---[ end trace 0000000000000000 ]--- Signed-off-by: Sascha Hauer <[email protected]> Acked-by: Brian Norris <[email protected]> Reviewed-by: Francesco Dolcini <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit c145eea ] mwifiex_get_priv_by_id() returns the priv pointer corresponding to the bss_num and bss_type, but without checking if the priv is actually currently in use. Unused priv pointers do not have a wiphy attached to them which can lead to NULL pointer dereferences further down the callstack. Fix this by returning only used priv pointers which have priv->bss_mode set to something else than NL80211_IFTYPE_UNSPECIFIED. Said NULL pointer dereference happened when an Accesspoint was started with wpa_supplicant -i mlan0 with this config: network={ ssid="somessid" mode=2 frequency=2412 key_mgmt=WPA-PSK WPA-PSK-SHA256 proto=RSN group=CCMP pairwise=CCMP psk="12345678" } When waiting for the AP to be established, interrupting wpa_supplicant with <ctrl-c> and starting it again this happens: | Unable to handle kernel NULL pointer dereference at virtual address 0000000000000140 | Mem abort info: | ESR = 0x0000000096000004 | EC = 0x25: DABT (current EL), IL = 32 bits | SET = 0, FnV = 0 | EA = 0, S1PTW = 0 | FSC = 0x04: level 0 translation fault | Data abort info: | ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 | CM = 0, WnR = 0, TnD = 0, TagAccess = 0 | GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 | user pgtable: 4k pages, 48-bit VAs, pgdp=0000000046d96000 | [0000000000000140] pgd=0000000000000000, p4d=0000000000000000 | Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP | Modules linked in: caam_jr caamhash_desc spidev caamalg_desc crypto_engine authenc libdes mwifiex_sdio +mwifiex crct10dif_ce cdc_acm onboard_usb_hub fsl_imx8_ddr_perf imx8m_ddrc rtc_ds1307 lm75 rtc_snvs +imx_sdma caam imx8mm_thermal spi_imx error imx_cpufreq_dt fuse ip_tables x_tables ipv6 | CPU: 0 PID: 8 Comm: kworker/0:1 Not tainted 6.9.0-00007-g937242013fce-dirty #18 | Hardware name: somemachine (DT) | Workqueue: events sdio_irq_work | pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : mwifiex_get_cfp+0xd8/0x15c [mwifiex] | lr : mwifiex_get_cfp+0x34/0x15c [mwifiex] | sp : ffff8000818b3a70 | x29: ffff8000818b3a70 x28: ffff000006bfd8a5 x27: 0000000000000004 | x26: 000000000000002c x25: 0000000000001511 x24: 0000000002e86bc9 | x23: ffff000006bfd996 x22: 0000000000000004 x21: ffff000007bec000 | x20: 000000000000002c x19: 0000000000000000 x18: 0000000000000000 | x17: 000000040044ffff x16: 00500072b5503510 x15: ccc283740681e517 | x14: 0201000101006d15 x13: 0000000002e8ff43 x12: 002c01000000ffb1 | x11: 0100000000000000 x10: 02e8ff43002c0100 x9 : 0000ffb100100157 | x8 : ffff000003d20000 x7 : 00000000000002f1 x6 : 00000000ffffe124 | x5 : 0000000000000001 x4 : 0000000000000003 x3 : 0000000000000000 | x2 : 0000000000000000 x1 : 0001000000011001 x0 : 0000000000000000 | Call trace: | mwifiex_get_cfp+0xd8/0x15c [mwifiex] | mwifiex_parse_single_response_buf+0x1d0/0x504 [mwifiex] | mwifiex_handle_event_ext_scan_report+0x19c/0x2f8 [mwifiex] | mwifiex_process_sta_event+0x298/0xf0c [mwifiex] | mwifiex_process_event+0x110/0x238 [mwifiex] | mwifiex_main_process+0x428/0xa44 [mwifiex] | mwifiex_sdio_interrupt+0x64/0x12c [mwifiex_sdio] | process_sdio_pending_irqs+0x64/0x1b8 | sdio_irq_work+0x4c/0x7c | process_one_work+0x148/0x2a0 | worker_thread+0x2fc/0x40c | kthread+0x110/0x114 | ret_from_fork+0x10/0x20 | Code: a94153f3 a8c37bfd d50323bf d65f03c0 (f940a000) | ---[ end trace 0000000000000000 ]--- Signed-off-by: Sascha Hauer <[email protected]> Acked-by: Brian Norris <[email protected]> Reviewed-by: Francesco Dolcini <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Sasha Levin <[email protected]>
…the Crashkernel Scenario [ Upstream commit 4058c39 ] The issue is that before entering the crash kernel, the DWC USB controller did not perform operations such as resetting the interrupt mask bits. After entering the crash kernel,before the USB interrupt handler registration was completed while loading the DWC USB driver,an GINTSTS_SOF interrupt was received.This triggered the misroute_irq process within the GIC handling framework,ultimately leading to the misrouting of the interrupt,causing it to be handled by the wrong interrupt handler and resulting in the issue. Summary:In a scenario where the kernel triggers a panic and enters the crash kernel,it is necessary to ensure that the interrupt mask bit is not enabled before the interrupt registration is complete. If an interrupt reaches the CPU at this moment,it will certainly not be handled correctly,especially in cases where this interrupt is reported frequently. Please refer to the Crashkernel dmesg information as follows (the message on line 3 was added before devm_request_irq is called by the dwc2_driver_probe function): [ 5.866837][ T1] dwc2 JMIC0010:01: supply vusb_d not found, using dummy regulator [ 5.874588][ T1] dwc2 JMIC0010:01: supply vusb_a not found, using dummy regulator [ 5.882335][ T1] dwc2 JMIC0010:01: before devm_request_irq irq: [71], gintmsk[0xf300080e], gintsts[0x04200009] [ 5.892686][ C0] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-jmnd1.2_RC #18 [ 5.900327][ C0] Hardware name: CMSS HyperCard4-25G/HyperCard4-25G, BIOS 1.6.4 Jul 8 2024 [ 5.908836][ C0] Call trace: [ 5.911965][ C0] dump_backtrace+0x0/0x1f0 [ 5.916308][ C0] show_stack+0x20/0x30 [ 5.920304][ C0] dump_stack+0xd8/0x140 [ 5.924387][ C0] pcie_xxx_handler+0x3c/0x1d8 [ 5.930121][ C0] __handle_irq_event_percpu+0x64/0x1e0 [ 5.935506][ C0] handle_irq_event+0x80/0x1d0 [ 5.940109][ C0] try_one_irq+0x138/0x174 [ 5.944365][ C0] misrouted_irq+0x134/0x140 [ 5.948795][ C0] note_interrupt+0x1d0/0x30c [ 5.953311][ C0] handle_irq_event+0x13c/0x1d0 [ 5.958001][ C0] handle_fasteoi_irq+0xd4/0x260 [ 5.962779][ C0] __handle_domain_irq+0x88/0xf0 [ 5.967555][ C0] gic_handle_irq+0x9c/0x2f0 [ 5.971985][ C0] el1_irq+0xb8/0x140 [ 5.975807][ C0] __setup_irq+0x3dc/0x7cc [ 5.980064][ C0] request_threaded_irq+0xf4/0x1b4 [ 5.985015][ C0] devm_request_threaded_irq+0x80/0x100 [ 5.990400][ C0] dwc2_driver_probe+0x1b8/0x6b0 [ 5.995178][ C0] platform_drv_probe+0x5c/0xb0 [ 5.999868][ C0] really_probe+0xf8/0x51c [ 6.004125][ C0] driver_probe_device+0xfc/0x170 [ 6.008989][ C0] device_driver_attach+0xc8/0xd0 [ 6.013853][ C0] __driver_attach+0xe8/0x1b0 [ 6.018369][ C0] bus_for_each_dev+0x7c/0xdc [ 6.022886][ C0] driver_attach+0x2c/0x3c [ 6.027143][ C0] bus_add_driver+0xdc/0x240 [ 6.031573][ C0] driver_register+0x80/0x13c [ 6.036090][ C0] __platform_driver_register+0x50/0x5c [ 6.041476][ C0] dwc2_platform_driver_init+0x24/0x30 [ 6.046774][ C0] do_one_initcall+0x50/0x25c [ 6.051291][ C0] do_initcall_level+0xe4/0xfc [ 6.055894][ C0] do_initcalls+0x80/0xa4 [ 6.060064][ C0] kernel_init_freeable+0x198/0x240 [ 6.065102][ C0] kernel_init+0x1c/0x12c Signed-off-by: Shawn Shao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
…the Crashkernel Scenario [ Upstream commit 4058c39 ] The issue is that before entering the crash kernel, the DWC USB controller did not perform operations such as resetting the interrupt mask bits. After entering the crash kernel,before the USB interrupt handler registration was completed while loading the DWC USB driver,an GINTSTS_SOF interrupt was received.This triggered the misroute_irq process within the GIC handling framework,ultimately leading to the misrouting of the interrupt,causing it to be handled by the wrong interrupt handler and resulting in the issue. Summary:In a scenario where the kernel triggers a panic and enters the crash kernel,it is necessary to ensure that the interrupt mask bit is not enabled before the interrupt registration is complete. If an interrupt reaches the CPU at this moment,it will certainly not be handled correctly,especially in cases where this interrupt is reported frequently. Please refer to the Crashkernel dmesg information as follows (the message on line 3 was added before devm_request_irq is called by the dwc2_driver_probe function): [ 5.866837][ T1] dwc2 JMIC0010:01: supply vusb_d not found, using dummy regulator [ 5.874588][ T1] dwc2 JMIC0010:01: supply vusb_a not found, using dummy regulator [ 5.882335][ T1] dwc2 JMIC0010:01: before devm_request_irq irq: [71], gintmsk[0xf300080e], gintsts[0x04200009] [ 5.892686][ C0] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-jmnd1.2_RC #18 [ 5.900327][ C0] Hardware name: CMSS HyperCard4-25G/HyperCard4-25G, BIOS 1.6.4 Jul 8 2024 [ 5.908836][ C0] Call trace: [ 5.911965][ C0] dump_backtrace+0x0/0x1f0 [ 5.916308][ C0] show_stack+0x20/0x30 [ 5.920304][ C0] dump_stack+0xd8/0x140 [ 5.924387][ C0] pcie_xxx_handler+0x3c/0x1d8 [ 5.930121][ C0] __handle_irq_event_percpu+0x64/0x1e0 [ 5.935506][ C0] handle_irq_event+0x80/0x1d0 [ 5.940109][ C0] try_one_irq+0x138/0x174 [ 5.944365][ C0] misrouted_irq+0x134/0x140 [ 5.948795][ C0] note_interrupt+0x1d0/0x30c [ 5.953311][ C0] handle_irq_event+0x13c/0x1d0 [ 5.958001][ C0] handle_fasteoi_irq+0xd4/0x260 [ 5.962779][ C0] __handle_domain_irq+0x88/0xf0 [ 5.967555][ C0] gic_handle_irq+0x9c/0x2f0 [ 5.971985][ C0] el1_irq+0xb8/0x140 [ 5.975807][ C0] __setup_irq+0x3dc/0x7cc [ 5.980064][ C0] request_threaded_irq+0xf4/0x1b4 [ 5.985015][ C0] devm_request_threaded_irq+0x80/0x100 [ 5.990400][ C0] dwc2_driver_probe+0x1b8/0x6b0 [ 5.995178][ C0] platform_drv_probe+0x5c/0xb0 [ 5.999868][ C0] really_probe+0xf8/0x51c [ 6.004125][ C0] driver_probe_device+0xfc/0x170 [ 6.008989][ C0] device_driver_attach+0xc8/0xd0 [ 6.013853][ C0] __driver_attach+0xe8/0x1b0 [ 6.018369][ C0] bus_for_each_dev+0x7c/0xdc [ 6.022886][ C0] driver_attach+0x2c/0x3c [ 6.027143][ C0] bus_add_driver+0xdc/0x240 [ 6.031573][ C0] driver_register+0x80/0x13c [ 6.036090][ C0] __platform_driver_register+0x50/0x5c [ 6.041476][ C0] dwc2_platform_driver_init+0x24/0x30 [ 6.046774][ C0] do_one_initcall+0x50/0x25c [ 6.051291][ C0] do_initcall_level+0xe4/0xfc [ 6.055894][ C0] do_initcalls+0x80/0xa4 [ 6.060064][ C0] kernel_init_freeable+0x198/0x240 [ 6.065102][ C0] kernel_init+0x1c/0x12c Signed-off-by: Shawn Shao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
During the migration of Soundwire runtime stream allocation from the Qualcomm Soundwire controller to SoC's soundcard drivers the sdm845 soundcard was forgotten. At this point any playback attempt or audio daemon startup, for instance on sdm845-db845c (Qualcomm RB3 board), will result in stream pointer NULL dereference: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101ecf000 [0000000000000020] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP Modules linked in: ... CPU: 5 UID: 0 PID: 1198 Comm: aplay Not tainted 6.12.0-rc2-qcomlt-arm64-00059-g9d78f315a362-dirty #18 Hardware name: Thundercomm Dragonboard 845c (DT) pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : sdw_stream_add_slave+0x44/0x380 [soundwire_bus] lr : sdw_stream_add_slave+0x44/0x380 [soundwire_bus] sp : ffff80008a2035c0 x29: ffff80008a2035c0 x28: ffff80008a203978 x27: 0000000000000000 x26: 00000000000000c0 x25: 0000000000000000 x24: ffff1676025f4800 x23: ffff167600ff1cb8 x22: ffff167600ff1c98 x21: 0000000000000003 x20: ffff167607316000 x19: ffff167604e64e80 x18: 0000000000000000 x17: 0000000000000000 x16: ffffcec265074160 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff167600ff1cec x5 : ffffcec22cfa2010 x4 : 0000000000000000 x3 : 0000000000000003 x2 : ffff167613f836c0 x1 : 0000000000000000 x0 : ffff16761feb60b8 Call trace: sdw_stream_add_slave+0x44/0x380 [soundwire_bus] wsa881x_hw_params+0x68/0x80 [snd_soc_wsa881x] snd_soc_dai_hw_params+0x3c/0xa4 __soc_pcm_hw_params+0x230/0x660 dpcm_be_dai_hw_params+0x1d0/0x3f8 dpcm_fe_dai_hw_params+0x98/0x268 snd_pcm_hw_params+0x124/0x460 snd_pcm_common_ioctl+0x998/0x16e8 snd_pcm_ioctl+0x34/0x58 __arm64_sys_ioctl+0xac/0xf8 invoke_syscall+0x48/0x104 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x34/0xe0 el0t_64_sync_handler+0x120/0x12c el0t_64_sync+0x190/0x194 Code: aa0403fb f9418400 9100e000 9400102f (f8420f22) ---[ end trace 0000000000000000 ]--- 0000000000006108 <sdw_stream_add_slave>: 6108: d503233f paciasp 610c: a9b97bfd stp x29, x30, [sp, #-112]! 6110: 910003fd mov x29, sp 6114: a90153f3 stp x19, x20, [sp, #16] 6118: a9025bf5 stp x21, x22, [sp, #32] 611c: aa0103f6 mov x22, x1 6120: 2a0303f5 mov w21, w3 6124: a90363f7 stp x23, x24, [sp, #48] 6128: aa0003f8 mov x24, x0 612c: aa0203f7 mov x23, x2 6130: a9046bf9 stp x25, x26, [sp, #64] 6134: aa0403f9 mov x25, x4 <-- x4 copied to x25 6138: a90573fb stp x27, x28, [sp, #80] 613c: aa0403fb mov x27, x4 6140: f9418400 ldr x0, [x0, #776] 6144: 9100e000 add x0, x0, #0x38 6148: 94000000 bl 0 <mutex_lock> 614c: f8420f22 ldr x2, [x25, #32]! <-- offset 0x44 ^^^ This is 0x6108 + offset 0x44 from the beginning of sdw_stream_add_slave() where data abort happens. wsa881x_hw_params() is called with stream = NULL and passes it further in register x4 (5th argument) to sdw_stream_add_slave() without any checks. Value from x4 is copied to x25 and finally it aborts on trying to load a value from address in x25 plus offset 32 (in dec) which corresponds to master_list member in struct sdw_stream_runtime: struct sdw_stream_runtime { const char * name; /* 0 8 */ struct sdw_stream_params params; /* 8 12 */ enum sdw_stream_state state; /* 20 4 */ enum sdw_stream_type type; /* 24 4 */ /* XXX 4 bytes hole, try to pack */ here-> struct list_head master_list; /* 32 16 */ int m_rt_count; /* 48 4 */ /* size: 56, cachelines: 1, members: 6 */ /* sum members: 48, holes: 1, sum holes: 4 */ /* padding: 4 */ /* last cacheline: 56 bytes */ Fix this by adding required calls to qcom_snd_sdw_startup() and sdw_release_stream() to startup and shutdown routines which restores the previous correct behaviour when ->set_stream() method is called to set a valid stream runtime pointer on playback startup. Reproduced and then fix was tested on db845c RB3 board. Reported-by: Dmitry Baryshkov <[email protected]> Cc: [email protected] Fixes: 15c7fab ("ASoC: qcom: Move Soundwire runtime stream alloc to soundcards") Cc: Srinivas Kandagatla <[email protected]> Cc: Dmitry Baryshkov <[email protected]> Cc: Krzysztof Kozlowski <[email protected]> Cc: Pierre-Louis Bossart <[email protected]> Signed-off-by: Alexey Klimov <[email protected]> Tested-by: Steev Klimaszewski <[email protected]> # Lenovo Yoga C630 Reviewed-by: Krzysztof Kozlowski <[email protected]> Reviewed-by: Srinivas Kandagatla <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Mark Brown <[email protected]>
Uprobe needs to fetch args into a percpu buffer, and then copy to ring buffer to avoid non-atomic context problem. Sometimes user-space strings, arrays can be very large, but the size of percpu buffer is only page size. And store_trace_args() won't check whether these data exceeds a single page or not, caused out-of-bounds memory access. It could be reproduced by following steps: 1. build kernel with CONFIG_KASAN enabled 2. save follow program as test.c ``` \#include <stdio.h> \#include <stdlib.h> \#include <string.h> // If string length large than MAX_STRING_SIZE, the fetch_store_strlen() // will return 0, cause __get_data_size() return shorter size, and // store_trace_args() will not trigger out-of-bounds access. // So make string length less than 4096. \#define STRLEN 4093 void generate_string(char *str, int n) { int i; for (i = 0; i < n; ++i) { char c = i % 26 + 'a'; str[i] = c; } str[n-1] = '\0'; } void print_string(char *str) { printf("%s\n", str); } int main() { char tmp[STRLEN]; generate_string(tmp, STRLEN); print_string(tmp); return 0; } ``` 3. compile program `gcc -o test test.c` 4. get the offset of `print_string()` ``` objdump -t test | grep -w print_string 0000000000401199 g F .text 000000000000001b print_string ``` 5. configure uprobe with offset 0x1199 ``` off=0x1199 cd /sys/kernel/debug/tracing/ echo "p /root/test:${off} arg1=+0(%di):ustring arg2=\$comm arg3=+0(%di):ustring" > uprobe_events echo 1 > events/uprobes/enable echo 1 > tracing_on ``` 6. run `test`, and kasan will report error. ================================================================== BUG: KASAN: use-after-free in strncpy_from_user+0x1d6/0x1f0 Write of size 8 at addr ffff88812311c004 by task test/499CPU: 0 UID: 0 PID: 499 Comm: test Not tainted 6.12.0-rc3+ #18 Hardware name: Red Hat KVM, BIOS 1.16.0-4.al8 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x55/0x70 print_address_description.constprop.0+0x27/0x310 kasan_report+0x10f/0x120 ? strncpy_from_user+0x1d6/0x1f0 strncpy_from_user+0x1d6/0x1f0 ? rmqueue.constprop.0+0x70d/0x2ad0 process_fetch_insn+0xb26/0x1470 ? __pfx_process_fetch_insn+0x10/0x10 ? _raw_spin_lock+0x85/0xe0 ? __pfx__raw_spin_lock+0x10/0x10 ? __pte_offset_map+0x1f/0x2d0 ? unwind_next_frame+0xc5f/0x1f80 ? arch_stack_walk+0x68/0xf0 ? is_bpf_text_address+0x23/0x30 ? kernel_text_address.part.0+0xbb/0xd0 ? __kernel_text_address+0x66/0xb0 ? unwind_get_return_address+0x5e/0xa0 ? __pfx_stack_trace_consume_entry+0x10/0x10 ? arch_stack_walk+0xa2/0xf0 ? _raw_spin_lock_irqsave+0x8b/0xf0 ? __pfx__raw_spin_lock_irqsave+0x10/0x10 ? depot_alloc_stack+0x4c/0x1f0 ? _raw_spin_unlock_irqrestore+0xe/0x30 ? stack_depot_save_flags+0x35d/0x4f0 ? kasan_save_stack+0x34/0x50 ? kasan_save_stack+0x24/0x50 ? mutex_lock+0x91/0xe0 ? __pfx_mutex_lock+0x10/0x10 prepare_uprobe_buffer.part.0+0x2cd/0x500 uprobe_dispatcher+0x2c3/0x6a0 ? __pfx_uprobe_dispatcher+0x10/0x10 ? __kasan_slab_alloc+0x4d/0x90 handler_chain+0xdd/0x3e0 handle_swbp+0x26e/0x3d0 ? __pfx_handle_swbp+0x10/0x10 ? uprobe_pre_sstep_notifier+0x151/0x1b0 irqentry_exit_to_user_mode+0xe2/0x1b0 asm_exc_int3+0x39/0x40 RIP: 0033:0x401199 Code: 01 c2 0f b6 45 fb 88 02 83 45 fc 01 8b 45 fc 3b 45 e4 7c b7 8b 45 e4 48 98 48 8d 50 ff 48 8b 45 e8 48 01 d0 ce RSP: 002b:00007ffdf00576a8 EFLAGS: 00000206 RAX: 00007ffdf00576b0 RBX: 0000000000000000 RCX: 0000000000000ff2 RDX: 0000000000000ffc RSI: 0000000000000ffd RDI: 00007ffdf00576b0 RBP: 00007ffdf00586b0 R08: 00007feb2f9c0d20 R09: 00007feb2f9c0d20 R10: 0000000000000001 R11: 0000000000000202 R12: 0000000000401040 R13: 00007ffdf0058780 R14: 0000000000000000 R15: 0000000000000000 </TASK> This commit enforces the buffer's maxlen less than a page-size to avoid store_trace_args() out-of-memory access. Link: https://lore.kernel.org/all/[email protected]/ Fixes: dcad1a2 ("tracing/uprobes: Fetch args before reserving a ring buffer") Signed-off-by: Qiao Ma <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
[ Upstream commit 373b933 ] Uprobe needs to fetch args into a percpu buffer, and then copy to ring buffer to avoid non-atomic context problem. Sometimes user-space strings, arrays can be very large, but the size of percpu buffer is only page size. And store_trace_args() won't check whether these data exceeds a single page or not, caused out-of-bounds memory access. It could be reproduced by following steps: 1. build kernel with CONFIG_KASAN enabled 2. save follow program as test.c ``` \#include <stdio.h> \#include <stdlib.h> \#include <string.h> // If string length large than MAX_STRING_SIZE, the fetch_store_strlen() // will return 0, cause __get_data_size() return shorter size, and // store_trace_args() will not trigger out-of-bounds access. // So make string length less than 4096. \#define STRLEN 4093 void generate_string(char *str, int n) { int i; for (i = 0; i < n; ++i) { char c = i % 26 + 'a'; str[i] = c; } str[n-1] = '\0'; } void print_string(char *str) { printf("%s\n", str); } int main() { char tmp[STRLEN]; generate_string(tmp, STRLEN); print_string(tmp); return 0; } ``` 3. compile program `gcc -o test test.c` 4. get the offset of `print_string()` ``` objdump -t test | grep -w print_string 0000000000401199 g F .text 000000000000001b print_string ``` 5. configure uprobe with offset 0x1199 ``` off=0x1199 cd /sys/kernel/debug/tracing/ echo "p /root/test:${off} arg1=+0(%di):ustring arg2=\$comm arg3=+0(%di):ustring" > uprobe_events echo 1 > events/uprobes/enable echo 1 > tracing_on ``` 6. run `test`, and kasan will report error. ================================================================== BUG: KASAN: use-after-free in strncpy_from_user+0x1d6/0x1f0 Write of size 8 at addr ffff88812311c004 by task test/499CPU: 0 UID: 0 PID: 499 Comm: test Not tainted 6.12.0-rc3+ #18 Hardware name: Red Hat KVM, BIOS 1.16.0-4.al8 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x55/0x70 print_address_description.constprop.0+0x27/0x310 kasan_report+0x10f/0x120 ? strncpy_from_user+0x1d6/0x1f0 strncpy_from_user+0x1d6/0x1f0 ? rmqueue.constprop.0+0x70d/0x2ad0 process_fetch_insn+0xb26/0x1470 ? __pfx_process_fetch_insn+0x10/0x10 ? _raw_spin_lock+0x85/0xe0 ? __pfx__raw_spin_lock+0x10/0x10 ? __pte_offset_map+0x1f/0x2d0 ? unwind_next_frame+0xc5f/0x1f80 ? arch_stack_walk+0x68/0xf0 ? is_bpf_text_address+0x23/0x30 ? kernel_text_address.part.0+0xbb/0xd0 ? __kernel_text_address+0x66/0xb0 ? unwind_get_return_address+0x5e/0xa0 ? __pfx_stack_trace_consume_entry+0x10/0x10 ? arch_stack_walk+0xa2/0xf0 ? _raw_spin_lock_irqsave+0x8b/0xf0 ? __pfx__raw_spin_lock_irqsave+0x10/0x10 ? depot_alloc_stack+0x4c/0x1f0 ? _raw_spin_unlock_irqrestore+0xe/0x30 ? stack_depot_save_flags+0x35d/0x4f0 ? kasan_save_stack+0x34/0x50 ? kasan_save_stack+0x24/0x50 ? mutex_lock+0x91/0xe0 ? __pfx_mutex_lock+0x10/0x10 prepare_uprobe_buffer.part.0+0x2cd/0x500 uprobe_dispatcher+0x2c3/0x6a0 ? __pfx_uprobe_dispatcher+0x10/0x10 ? __kasan_slab_alloc+0x4d/0x90 handler_chain+0xdd/0x3e0 handle_swbp+0x26e/0x3d0 ? __pfx_handle_swbp+0x10/0x10 ? uprobe_pre_sstep_notifier+0x151/0x1b0 irqentry_exit_to_user_mode+0xe2/0x1b0 asm_exc_int3+0x39/0x40 RIP: 0033:0x401199 Code: 01 c2 0f b6 45 fb 88 02 83 45 fc 01 8b 45 fc 3b 45 e4 7c b7 8b 45 e4 48 98 48 8d 50 ff 48 8b 45 e8 48 01 d0 ce RSP: 002b:00007ffdf00576a8 EFLAGS: 00000206 RAX: 00007ffdf00576b0 RBX: 0000000000000000 RCX: 0000000000000ff2 RDX: 0000000000000ffc RSI: 0000000000000ffd RDI: 00007ffdf00576b0 RBP: 00007ffdf00586b0 R08: 00007feb2f9c0d20 R09: 00007feb2f9c0d20 R10: 0000000000000001 R11: 0000000000000202 R12: 0000000000401040 R13: 00007ffdf0058780 R14: 0000000000000000 R15: 0000000000000000 </TASK> This commit enforces the buffer's maxlen less than a page-size to avoid store_trace_args() out-of-memory access. Link: https://lore.kernel.org/all/[email protected]/ Fixes: dcad1a2 ("tracing/uprobes: Fetch args before reserving a ring buffer") Signed-off-by: Qiao Ma <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
commit d0e806b upstream. During the migration of Soundwire runtime stream allocation from the Qualcomm Soundwire controller to SoC's soundcard drivers the sdm845 soundcard was forgotten. At this point any playback attempt or audio daemon startup, for instance on sdm845-db845c (Qualcomm RB3 board), will result in stream pointer NULL dereference: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101ecf000 [0000000000000020] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP Modules linked in: ... CPU: 5 UID: 0 PID: 1198 Comm: aplay Not tainted 6.12.0-rc2-qcomlt-arm64-00059-g9d78f315a362-dirty #18 Hardware name: Thundercomm Dragonboard 845c (DT) pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : sdw_stream_add_slave+0x44/0x380 [soundwire_bus] lr : sdw_stream_add_slave+0x44/0x380 [soundwire_bus] sp : ffff80008a2035c0 x29: ffff80008a2035c0 x28: ffff80008a203978 x27: 0000000000000000 x26: 00000000000000c0 x25: 0000000000000000 x24: ffff1676025f4800 x23: ffff167600ff1cb8 x22: ffff167600ff1c98 x21: 0000000000000003 x20: ffff167607316000 x19: ffff167604e64e80 x18: 0000000000000000 x17: 0000000000000000 x16: ffffcec265074160 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff167600ff1cec x5 : ffffcec22cfa2010 x4 : 0000000000000000 x3 : 0000000000000003 x2 : ffff167613f836c0 x1 : 0000000000000000 x0 : ffff16761feb60b8 Call trace: sdw_stream_add_slave+0x44/0x380 [soundwire_bus] wsa881x_hw_params+0x68/0x80 [snd_soc_wsa881x] snd_soc_dai_hw_params+0x3c/0xa4 __soc_pcm_hw_params+0x230/0x660 dpcm_be_dai_hw_params+0x1d0/0x3f8 dpcm_fe_dai_hw_params+0x98/0x268 snd_pcm_hw_params+0x124/0x460 snd_pcm_common_ioctl+0x998/0x16e8 snd_pcm_ioctl+0x34/0x58 __arm64_sys_ioctl+0xac/0xf8 invoke_syscall+0x48/0x104 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x34/0xe0 el0t_64_sync_handler+0x120/0x12c el0t_64_sync+0x190/0x194 Code: aa0403fb f9418400 9100e000 9400102f (f8420f22) ---[ end trace 0000000000000000 ]--- 0000000000006108 <sdw_stream_add_slave>: 6108: d503233f paciasp 610c: a9b97bfd stp x29, x30, [sp, #-112]! 6110: 910003fd mov x29, sp 6114: a90153f3 stp x19, x20, [sp, #16] 6118: a9025bf5 stp x21, x22, [sp, #32] 611c: aa0103f6 mov x22, x1 6120: 2a0303f5 mov w21, w3 6124: a90363f7 stp x23, x24, [sp, #48] 6128: aa0003f8 mov x24, x0 612c: aa0203f7 mov x23, x2 6130: a9046bf9 stp x25, x26, [sp, #64] 6134: aa0403f9 mov x25, x4 <-- x4 copied to x25 6138: a90573fb stp x27, x28, [sp, #80] 613c: aa0403fb mov x27, x4 6140: f9418400 ldr x0, [x0, #776] 6144: 9100e000 add x0, x0, #0x38 6148: 94000000 bl 0 <mutex_lock> 614c: f8420f22 ldr x2, [x25, #32]! <-- offset 0x44 ^^^ This is 0x6108 + offset 0x44 from the beginning of sdw_stream_add_slave() where data abort happens. wsa881x_hw_params() is called with stream = NULL and passes it further in register x4 (5th argument) to sdw_stream_add_slave() without any checks. Value from x4 is copied to x25 and finally it aborts on trying to load a value from address in x25 plus offset 32 (in dec) which corresponds to master_list member in struct sdw_stream_runtime: struct sdw_stream_runtime { const char * name; /* 0 8 */ struct sdw_stream_params params; /* 8 12 */ enum sdw_stream_state state; /* 20 4 */ enum sdw_stream_type type; /* 24 4 */ /* XXX 4 bytes hole, try to pack */ here-> struct list_head master_list; /* 32 16 */ int m_rt_count; /* 48 4 */ /* size: 56, cachelines: 1, members: 6 */ /* sum members: 48, holes: 1, sum holes: 4 */ /* padding: 4 */ /* last cacheline: 56 bytes */ Fix this by adding required calls to qcom_snd_sdw_startup() and sdw_release_stream() to startup and shutdown routines which restores the previous correct behaviour when ->set_stream() method is called to set a valid stream runtime pointer on playback startup. Reproduced and then fix was tested on db845c RB3 board. Reported-by: Dmitry Baryshkov <[email protected]> Cc: [email protected] Fixes: 15c7fab ("ASoC: qcom: Move Soundwire runtime stream alloc to soundcards") Cc: Srinivas Kandagatla <[email protected]> Cc: Dmitry Baryshkov <[email protected]> Cc: Krzysztof Kozlowski <[email protected]> Cc: Pierre-Louis Bossart <[email protected]> Signed-off-by: Alexey Klimov <[email protected]> Tested-by: Steev Klimaszewski <[email protected]> # Lenovo Yoga C630 Reviewed-by: Krzysztof Kozlowski <[email protected]> Reviewed-by: Srinivas Kandagatla <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Mark Brown <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit 373b933 ] Uprobe needs to fetch args into a percpu buffer, and then copy to ring buffer to avoid non-atomic context problem. Sometimes user-space strings, arrays can be very large, but the size of percpu buffer is only page size. And store_trace_args() won't check whether these data exceeds a single page or not, caused out-of-bounds memory access. It could be reproduced by following steps: 1. build kernel with CONFIG_KASAN enabled 2. save follow program as test.c ``` \#include <stdio.h> \#include <stdlib.h> \#include <string.h> // If string length large than MAX_STRING_SIZE, the fetch_store_strlen() // will return 0, cause __get_data_size() return shorter size, and // store_trace_args() will not trigger out-of-bounds access. // So make string length less than 4096. \#define STRLEN 4093 void generate_string(char *str, int n) { int i; for (i = 0; i < n; ++i) { char c = i % 26 + 'a'; str[i] = c; } str[n-1] = '\0'; } void print_string(char *str) { printf("%s\n", str); } int main() { char tmp[STRLEN]; generate_string(tmp, STRLEN); print_string(tmp); return 0; } ``` 3. compile program `gcc -o test test.c` 4. get the offset of `print_string()` ``` objdump -t test | grep -w print_string 0000000000401199 g F .text 000000000000001b print_string ``` 5. configure uprobe with offset 0x1199 ``` off=0x1199 cd /sys/kernel/debug/tracing/ echo "p /root/test:${off} arg1=+0(%di):ustring arg2=\$comm arg3=+0(%di):ustring" > uprobe_events echo 1 > events/uprobes/enable echo 1 > tracing_on ``` 6. run `test`, and kasan will report error. ================================================================== BUG: KASAN: use-after-free in strncpy_from_user+0x1d6/0x1f0 Write of size 8 at addr ffff88812311c004 by task test/499CPU: 0 UID: 0 PID: 499 Comm: test Not tainted 6.12.0-rc3+ #18 Hardware name: Red Hat KVM, BIOS 1.16.0-4.al8 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x55/0x70 print_address_description.constprop.0+0x27/0x310 kasan_report+0x10f/0x120 ? strncpy_from_user+0x1d6/0x1f0 strncpy_from_user+0x1d6/0x1f0 ? rmqueue.constprop.0+0x70d/0x2ad0 process_fetch_insn+0xb26/0x1470 ? __pfx_process_fetch_insn+0x10/0x10 ? _raw_spin_lock+0x85/0xe0 ? __pfx__raw_spin_lock+0x10/0x10 ? __pte_offset_map+0x1f/0x2d0 ? unwind_next_frame+0xc5f/0x1f80 ? arch_stack_walk+0x68/0xf0 ? is_bpf_text_address+0x23/0x30 ? kernel_text_address.part.0+0xbb/0xd0 ? __kernel_text_address+0x66/0xb0 ? unwind_get_return_address+0x5e/0xa0 ? __pfx_stack_trace_consume_entry+0x10/0x10 ? arch_stack_walk+0xa2/0xf0 ? _raw_spin_lock_irqsave+0x8b/0xf0 ? __pfx__raw_spin_lock_irqsave+0x10/0x10 ? depot_alloc_stack+0x4c/0x1f0 ? _raw_spin_unlock_irqrestore+0xe/0x30 ? stack_depot_save_flags+0x35d/0x4f0 ? kasan_save_stack+0x34/0x50 ? kasan_save_stack+0x24/0x50 ? mutex_lock+0x91/0xe0 ? __pfx_mutex_lock+0x10/0x10 prepare_uprobe_buffer.part.0+0x2cd/0x500 uprobe_dispatcher+0x2c3/0x6a0 ? __pfx_uprobe_dispatcher+0x10/0x10 ? __kasan_slab_alloc+0x4d/0x90 handler_chain+0xdd/0x3e0 handle_swbp+0x26e/0x3d0 ? __pfx_handle_swbp+0x10/0x10 ? uprobe_pre_sstep_notifier+0x151/0x1b0 irqentry_exit_to_user_mode+0xe2/0x1b0 asm_exc_int3+0x39/0x40 RIP: 0033:0x401199 Code: 01 c2 0f b6 45 fb 88 02 83 45 fc 01 8b 45 fc 3b 45 e4 7c b7 8b 45 e4 48 98 48 8d 50 ff 48 8b 45 e8 48 01 d0 ce RSP: 002b:00007ffdf00576a8 EFLAGS: 00000206 RAX: 00007ffdf00576b0 RBX: 0000000000000000 RCX: 0000000000000ff2 RDX: 0000000000000ffc RSI: 0000000000000ffd RDI: 00007ffdf00576b0 RBP: 00007ffdf00586b0 R08: 00007feb2f9c0d20 R09: 00007feb2f9c0d20 R10: 0000000000000001 R11: 0000000000000202 R12: 0000000000401040 R13: 00007ffdf0058780 R14: 0000000000000000 R15: 0000000000000000 </TASK> This commit enforces the buffer's maxlen less than a page-size to avoid store_trace_args() out-of-memory access. Link: https://lore.kernel.org/all/[email protected]/ Fixes: dcad1a2 ("tracing/uprobes: Fetch args before reserving a ring buffer") Signed-off-by: Qiao Ma <[email protected]> Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
commit 011e69a upstream. In binder_add_freeze_work() we iterate over the proc->nodes with the proc->inner_lock held. However, this lock is temporarily dropped to acquire the node->lock first (lock nesting order). This can race with binder_deferred_release() which removes the nodes from the proc->nodes rbtree and adds them into binder_dead_nodes list. This leads to a broken iteration in binder_add_freeze_work() as rb_next() will use data from binder_dead_nodes, triggering an out-of-bounds access: ================================================================== BUG: KASAN: global-out-of-bounds in rb_next+0xfc/0x124 Read of size 8 at addr ffffcb84285f7170 by task freeze/660 CPU: 8 UID: 0 PID: 660 Comm: freeze Not tainted 6.11.0-07343-ga727812a8d45 #18 Hardware name: linux,dummy-virt (DT) Call trace: rb_next+0xfc/0x124 binder_add_freeze_work+0x344/0x534 binder_ioctl+0x1e70/0x25ac __arm64_sys_ioctl+0x124/0x190 The buggy address belongs to the variable: binder_dead_nodes+0x10/0x40 [...] ================================================================== This is possible because proc->nodes (rbtree) and binder_dead_nodes (list) share entries in binder_node through a union: struct binder_node { [...] union { struct rb_node rb_node; struct hlist_node dead_node; }; Fix the race by checking that the proc is still alive. If not, simply break out of the iteration. Fixes: d579b04 ("binder: frozen notification") Cc: [email protected] Reviewed-by: Alice Ryhl <[email protected]> Acked-by: Todd Kjos <[email protected]> Signed-off-by: Carlos Llamas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Could the bootloader set up the PL011 UART before it hands over to the kernel? Currently I have code in bootc/linux@a91fe13 that sets up the serial port during the kernel decompression stage so that it can be used during decompression and then during early printk.
It would really be ideal if that was done outside the kernel as is more customary, and would mean you could boot compressed or uncompressed kernels and get the same behaviour.
The text was updated successfully, but these errors were encountered: