forked from grate-driver/linux
-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARM: dts: nexus4: add volume keys #7
Merged
okias
merged 1 commit into
okias:qcom-apq8064-mainline
from
devbis:qcom-apq8064-mainline-mako-keypad
Sep 10, 2021
Merged
ARM: dts: nexus4: add volume keys #7
okias
merged 1 commit into
okias:qcom-apq8064-mainline
from
devbis:qcom-apq8064-mainline-mako-keypad
Sep 10, 2021
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add qcom,pm8921-keypad based keypad to support vol_up and vol_down keys Signed-off-by: Ivan Belokobylskiy <[email protected]>
okias
pushed a commit
that referenced
this pull request
Sep 8, 2021
Sven Eckelmann reports [1] that the addition of local_lock to kmem_cache_cpu breaks a config with 64BIT+LOCK_STAT: general protection fault, maybe for address 0xffff888007fcf1c8: 0000 [#1] NOPTI CPU: 0 PID: 0 Comm: swapper Not tainted 5.14.0-rc5+ #7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 RIP: 0010:kmem_cache_alloc+0x81/0x180 Code: 79 48 00 4c 8b 41 38 0f 84 89 00 00 00 4d 85 c0 0f 84 80 00 00 00 41 8b 44 24 28 49 8b 3c 24 48 8d 4a 01 49 8b 1c 00 4c 89 c0 <48> 0f c7 4f 38 0f 943 RSP: 0000:ffffffff81803c10 EFLAGS: 00000286 RAX: ffff88800244e7c0 RBX: ffff88800244e800 RCX: 0000000000000024 RDX: 0000000000000023 RSI: 0000000000000100 RDI: ffff888007fcf190 RBP: ffffffff81803c38 R08: ffff88800244e7c0 R09: 0000000000000dc0 R10: 0000000000004000 R11: 0000000000000000 R12: ffff8880024413c0 R13: ffffffff810d18f4 R14: 0000000000000dc0 R15: 0000000000000100 FS: 0000000000000000(0000) GS:ffffffff81840000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff888002001000 CR3: 0000000001824000 CR4: 00000000000006b0 Call Trace: __get_vm_area_node.constprop.0.isra.0+0x74/0x150 __vmalloc_node_range+0x5a/0x2b0 ? kernel_clone+0x88/0x390 ? copy_process+0x1ac/0x17e0 copy_process+0x768/0x17e0 ? kernel_clone+0x88/0x390 kernel_clone+0x88/0x390 ? _vm_unmap_aliases.part.0+0xe9/0x110 ? change_page_attr_set_clr+0x10d/0x180 kernel_thread+0x43/0x50 ? rest_init+0x100/0x100 rest_init+0x1e/0x100 arch_call_rest_init+0x9/0xc start_kernel+0x481/0x493 x86_64_start_reservations+0x24/0x26 x86_64_start_kernel+0x80/0x84 secondary_startup_64_no_verify+0xc2/0xcb random: get_random_bytes called from oops_exit+0x34/0x60 with crng_init=0 ---[ end trace 2cac18ac38f640c1 ]--- RIP: 0010:kmem_cache_alloc+0x81/0x180 Code: 79 48 00 4c 8b 41 38 0f 84 89 00 00 00 4d 85 c0 0f 84 80 00 00 00 41 8b 44 24 28 49 8b 3c 24 48 8d 4a 01 49 8b 1c 00 4c 89 c0 <48> 0f c7 4f 38 0f 943 RSP: 0000:ffffffff81803c10 EFLAGS: 00000286 RAX: ffff88800244e7c0 RBX: ffff88800244e800 RCX: 0000000000000024 RDX: 0000000000000023 RSI: 0000000000000100 RDI: ffff888007fcf190 RBP: ffffffff81803c38 R08: ffff88800244e7c0 R09: 0000000000000dc0 R10: 0000000000004000 R11: 0000000000000000 R12: ffff8880024413c0 R13: ffffffff810d18f4 R14: 0000000000000dc0 R15: 0000000000000100 FS: 0000000000000000(0000) GS:ffffffff81840000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff888002001000 CR3: 0000000001824000 CR4: 00000000000006b0 Kernel panic - not syncing: Attempted to kill the idle task! ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- Decoding the RIP points to this_cpu_cmpxchg_double() call in slab_alloc_node(). The problem is the particular size of local_lock_t with LOCK_STAT resulting in the following layout: struct kmem_cache_cpu { local_lock_t lock; /* 0 56 */ void * * freelist; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ long unsigned int tid; /* 64 8 */ struct page * page; /* 72 8 */ struct page * partial; /* 80 8 */ /* size: 88, cachelines: 2, members: 5 */ /* last cacheline: 24 bytes */ }; As pointed out by Sebastian Andrzej Siewior, this_cpu_cmpxchg_double() needs the freelist and tid fields to be aligned to sum of their sizes (16 bytes) but they are not in this configuration. This didn't happen with non-debug RT and !RT configs as well as lockdep. To fix this, move the lock field below partial field, so that it doesn't affect the layout. [1] https://lore.kernel.org/linux-mm/2666777.vCjUEy5FO1@sven-desktop/ This is a fixup for mmotm patch mm-slub-convert-kmem_cpu_slab-protection-to-local_lock.patch Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Vlastimil Babka <[email protected]> Reported-by: Sven Eckelmann <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]>
okias
pushed a commit
that referenced
this pull request
Sep 16, 2021
System crash was seen when I/O was run against an NVMe target and aborts were occurring. Crash stack is: -- relevant crash stack -- BUG: kernel NULL pointer dereference, address: 0000000000000010 : #6 [ffffae1f8666bdd0] page_fault at ffffffffa740122e [exception RIP: qla_nvme_abort_work+339] RIP: ffffffffc0f592e3 RSP: ffffae1f8666be80 RFLAGS: 00010297 RAX: 0000000000000000 RBX: ffff9b581fc8af80 RCX: ffffffffc0f83bd0 RDX: 0000000000000001 RSI: ffff9b5839c6c7c8 RDI: 0000000008000000 RBP: ffff9b6832f85000 R8: ffffffffc0f68160 R9: ffffffffc0f70652 R10: ffffae1f862ffdc8 R11: 0000000000000300 R12: 000000000000010d R13: 0000000000000000 R14: ffff9b5839cea000 R15: 0ffff9b583fab170 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffffae1f8666be98] process_one_work at ffffffffa6aba184 #8 [ffffae1f8666bed8] worker_thread at ffffffffa6aba39d grate-driver#9 [ffffae1f8666bf10] kthread at ffffffffa6ac06ed The crash was due to a stale SRB structure access after it was aborted. Fix the issue by removing stale access. Link: https://lore.kernel.org/r/[email protected] Fixes: 2cabf10 ("scsi: qla2xxx: Fix hang on NVMe command timeouts") Cc: [email protected] Reviewed-by: Himanshu Madhani <[email protected]> Signed-off-by: Arun Easi <[email protected]> Signed-off-by: Nilesh Javali <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
okias
pushed a commit
that referenced
this pull request
Sep 16, 2021
Ido Schimmel says: ==================== mlxsw: Add support for transceiver modules reset This patchset prepares mlxsw for future transceiver modules related [1] changes and adds reset support via the existing 'ETHTOOL_RESET' interface. Patches #1-#6 are relatively straightforward preparations. Patch #7 tracks the number of logical ports that are mapped to the transceiver module and the number of logical ports using it that are administratively up. Needed for both reset support and power mode policy support. Patches #8-grate-driver#9 add required fields in device registers. Patch grate-driver#10 implements support for ethtool_ops::reset in order to reset transceiver modules. [1] https://lore.kernel.org/netdev/[email protected]/ ==================== Signed-off-by: David S. Miller <[email protected]>
okias
pushed a commit
that referenced
this pull request
Sep 20, 2021
It's later supposed to be either a correct address or NULL. Without the initialization, it may contain an undefined value which results in the following segmentation fault: # perf top --sort comm -g --ignore-callees=do_idle terminates with: #0 0x00007ffff56b7685 in __strlen_avx2 () from /lib64/libc.so.6 #1 0x00007ffff55e3802 in strdup () from /lib64/libc.so.6 #2 0x00005555558cb139 in hist_entry__init (callchain_size=<optimized out>, sample_self=true, template=0x7fffde7fb110, he=0x7fffd801c250) at util/hist.c:489 #3 hist_entry__new (template=template@entry=0x7fffde7fb110, sample_self=sample_self@entry=true) at util/hist.c:564 #4 0x00005555558cb4ba in hists__findnew_entry (hists=hists@entry=0x5555561d9e38, entry=entry@entry=0x7fffde7fb110, al=al@entry=0x7fffde7fb420, sample_self=sample_self@entry=true) at util/hist.c:657 #5 0x00005555558cba1b in __hists__add_entry (hists=hists@entry=0x5555561d9e38, al=0x7fffde7fb420, sym_parent=<optimized out>, bi=bi@entry=0x0, mi=mi@entry=0x0, sample=sample@entry=0x7fffde7fb4b0, sample_self=true, ops=0x0, block_info=0x0) at util/hist.c:288 #6 0x00005555558cbb70 in hists__add_entry (sample_self=true, sample=0x7fffde7fb4b0, mi=0x0, bi=0x0, sym_parent=<optimized out>, al=<optimized out>, hists=0x5555561d9e38) at util/hist.c:1056 #7 iter_add_single_cumulative_entry (iter=0x7fffde7fb460, al=<optimized out>) at util/hist.c:1056 #8 0x00005555558cc8a4 in hist_entry_iter__add (iter=iter@entry=0x7fffde7fb460, al=al@entry=0x7fffde7fb420, max_stack_depth=<optimized out>, arg=arg@entry=0x7fffffff7db0) at util/hist.c:1231 grate-driver#9 0x00005555557cdc9a in perf_event__process_sample (machine=<optimized out>, sample=0x7fffde7fb4b0, evsel=<optimized out>, event=<optimized out>, tool=0x7fffffff7db0) at builtin-top.c:842 grate-driver#10 deliver_event (qe=<optimized out>, qevent=<optimized out>) at builtin-top.c:1202 grate-driver#11 0x00005555558a9318 in do_flush (show_progress=false, oe=0x7fffffff80e0) at util/ordered-events.c:244 grate-driver#12 __ordered_events__flush (oe=oe@entry=0x7fffffff80e0, how=how@entry=OE_FLUSH__TOP, timestamp=timestamp@entry=0) at util/ordered-events.c:323 grate-driver#13 0x00005555558a9789 in __ordered_events__flush (timestamp=<optimized out>, how=<optimized out>, oe=<optimized out>) at util/ordered-events.c:339 grate-driver#14 ordered_events__flush (how=OE_FLUSH__TOP, oe=0x7fffffff80e0) at util/ordered-events.c:341 grate-driver#15 ordered_events__flush (oe=oe@entry=0x7fffffff80e0, how=how@entry=OE_FLUSH__TOP) at util/ordered-events.c:339 grate-driver#16 0x00005555557cd631 in process_thread (arg=0x7fffffff7db0) at builtin-top.c:1114 grate-driver#17 0x00007ffff7bb817a in start_thread () from /lib64/libpthread.so.0 grate-driver#18 0x00007ffff5656dc3 in clone () from /lib64/libc.so.6 If you look at the frame #2, the code is: 488 if (he->srcline) { 489 he->srcline = strdup(he->srcline); 490 if (he->srcline == NULL) 491 goto err_rawdata; 492 } If he->srcline is not NULL (it is not NULL if it is uninitialized rubbish), it gets strdupped and strdupping a rubbish random string causes the problem. Also, if you look at the commit 1fb7d06, it adds the srcline property into the struct, but not initializing it everywhere needed. Committer notes: Now I see, when using --ignore-callees=do_idle we end up here at line 2189 in add_callchain_ip(): 2181 if (al.sym != NULL) { 2182 if (perf_hpp_list.parent && !*parent && 2183 symbol__match_regex(al.sym, &parent_regex)) 2184 *parent = al.sym; 2185 else if (have_ignore_callees && root_al && 2186 symbol__match_regex(al.sym, &ignore_callees_regex)) { 2187 /* Treat this symbol as the root, 2188 forgetting its callees. */ 2189 *root_al = al; 2190 callchain_cursor_reset(cursor); 2191 } 2192 } And the al that doesn't have the ->srcline field initialized will be copied to the root_al, so then, back to: 1211 int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al, 1212 int max_stack_depth, void *arg) 1213 { 1214 int err, err2; 1215 struct map *alm = NULL; 1216 1217 if (al) 1218 alm = map__get(al->map); 1219 1220 err = sample__resolve_callchain(iter->sample, &callchain_cursor, &iter->parent, 1221 iter->evsel, al, max_stack_depth); 1222 if (err) { 1223 map__put(alm); 1224 return err; 1225 } 1226 1227 err = iter->ops->prepare_entry(iter, al); 1228 if (err) 1229 goto out; 1230 1231 err = iter->ops->add_single_entry(iter, al); 1232 if (err) 1233 goto out; 1234 That al at line 1221 is what hist_entry_iter__add() (called from sample__resolve_callchain()) saw as 'root_al', and then: iter->ops->add_single_entry(iter, al); will go on with al->srcline with a bogus value, I'll add the above sequence to the cset and apply, thanks! Signed-off-by: Michael Petlan <[email protected]> CC: Milian Wolff <[email protected]> Cc: Jiri Olsa <[email protected]> Fixes: 1fb7d06 ("perf report Use srcline from callchain for hist entries") Link: https //lore.kernel.org/r/[email protected] Reported-by: Juri Lelli <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
okias
pushed a commit
that referenced
this pull request
Sep 20, 2021
FD uses xyarray__entry that may return NULL if an index is out of bounds. If NULL is returned then a segv happens as FD unconditionally dereferences the pointer. This was happening in a case of with perf iostat as shown below. The fix is to make FD an "int*" rather than an int and handle the NULL case as either invalid input or a closed fd. $ sudo gdb --args perf stat --iostat list ... Breakpoint 1, perf_evsel__alloc_fd (evsel=0x5555560951a0, ncpus=1, nthreads=1) at evsel.c:50 50 { (gdb) bt #0 perf_evsel__alloc_fd (evsel=0x5555560951a0, ncpus=1, nthreads=1) at evsel.c:50 #1 0x000055555585c188 in evsel__open_cpu (evsel=0x5555560951a0, cpus=0x555556093410, threads=0x555556086fb0, start_cpu=0, end_cpu=1) at util/evsel.c:1792 #2 0x000055555585cfb2 in evsel__open (evsel=0x5555560951a0, cpus=0x0, threads=0x555556086fb0) at util/evsel.c:2045 #3 0x000055555585d0db in evsel__open_per_thread (evsel=0x5555560951a0, threads=0x555556086fb0) at util/evsel.c:2065 #4 0x00005555558ece64 in create_perf_stat_counter (evsel=0x5555560951a0, config=0x555555c34700 <stat_config>, target=0x555555c2f1c0 <target>, cpu=0) at util/stat.c:590 #5 0x000055555578e927 in __run_perf_stat (argc=1, argv=0x7fffffffe4a0, run_idx=0) at builtin-stat.c:833 #6 0x000055555578f3c6 in run_perf_stat (argc=1, argv=0x7fffffffe4a0, run_idx=0) at builtin-stat.c:1048 #7 0x0000555555792ee5 in cmd_stat (argc=1, argv=0x7fffffffe4a0) at builtin-stat.c:2534 #8 0x0000555555835ed3 in run_builtin (p=0x555555c3f540 <commands+288>, argc=3, argv=0x7fffffffe4a0) at perf.c:313 grate-driver#9 0x0000555555836154 in handle_internal_command (argc=3, argv=0x7fffffffe4a0) at perf.c:365 grate-driver#10 0x000055555583629f in run_argv (argcp=0x7fffffffe2ec, argv=0x7fffffffe2e0) at perf.c:409 grate-driver#11 0x0000555555836692 in main (argc=3, argv=0x7fffffffe4a0) at perf.c:539 ... (gdb) c Continuing. Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (uncore_iio_0/event=0x83,umask=0x04,ch_mask=0xF,fc_mask=0x07/). /bin/dmesg | grep -i perf may provide additional information. Program received signal SIGSEGV, Segmentation fault. 0x00005555559b03ea in perf_evsel__close_fd_cpu (evsel=0x5555560951a0, cpu=1) at evsel.c:166 166 if (FD(evsel, cpu, thread) >= 0) v3. fixes a bug in perf_evsel__run_ioctl where the sense of a branch was backward. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
okias
pushed a commit
that referenced
this pull request
Sep 27, 2021
Host crashes when pci_enable_atomic_ops_to_root() is called for VFs with virtual buses. The virtual buses added to SR-IOV have bus->self set to NULL and host crashes due to this. PID: 4481 TASK: ffff89c6941b0000 CPU: 53 COMMAND: "bash" ... #3 [ffff9a9481713808] oops_end at ffffffffb9025cd6 #4 [ffff9a9481713828] page_fault_oops at ffffffffb906e417 #5 [ffff9a9481713888] exc_page_fault at ffffffffb9a0ad14 #6 [ffff9a94817138b0] asm_exc_page_fault at ffffffffb9c00ace [exception RIP: pcie_capability_read_dword+28] RIP: ffffffffb952fd5c RSP: ffff9a9481713960 RFLAGS: 00010246 RAX: 0000000000000001 RBX: ffff89c6b1096000 RCX: 0000000000000000 RDX: ffff9a9481713990 RSI: 0000000000000024 RDI: 0000000000000000 RBP: 0000000000000080 R8: 0000000000000008 R9: ffff89c64341a2f8 R10: 0000000000000002 R11: 0000000000000000 R12: ffff89c648bab000 R13: 0000000000000000 R14: 0000000000000000 R15: ffff89c648bab0c8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffff9a9481713988] pci_enable_atomic_ops_to_root at ffffffffb95359a6 #8 [ffff9a94817139c0] bnxt_qplib_determine_atomics at ffffffffc08c1a33 [bnxt_re] grate-driver#9 [ffff9a94817139d0] bnxt_re_dev_init at ffffffffc08ba2d1 [bnxt_re] Per PCIe r5.0, sec 9.3.5.10, the AtomicOp Requester Enable bit in Device Control 2 is reserved for VFs. The PF value applies to all associated PFs. Return -EINVAL if pci_enable_atomic_ops_to_root() is called for a VF. Link: https://lore.kernel.org/r/[email protected] Fixes: 35f5ace ("RDMA/bnxt_re: Enable global atomic ops if platform supports") Fixes: 430a236 ("PCI: Add pci_enable_atomic_ops_to_root()") Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Andy Gospodarek <[email protected]>
okias
pushed a commit
that referenced
this pull request
Oct 5, 2021
As a full union is always sent, ensure all bytes of the union are initialized with memset to avoid msan warnings of use of uninitialized memory. An example warning from the daemon test: Uninitialized bytes in __interceptor_write at offset 71 inside [0x7ffd98da6280, 72) ==11602==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x5597edccdbe4 in ion tools/lib/perf/lib.c:18:6 #1 0x5597edccdbe4 in writen tools/lib/perf/lib.c:47:9 #2 0x5597ed221d30 in send_cmd tools/perf/builtin-daemon.c:1376:22 #3 0x5597ed21b48c in cmd_daemon tools/perf/builtin-daemon.c #4 0x5597ed1d6b67 in run_builtin tools/perf/perf.c:313:11 #5 0x5597ed1d6036 in handle_internal_command tools/perf/perf.c:365:8 #6 0x5597ed1d6036 in run_argv tools/perf/perf.c:409:2 #7 0x5597ed1d6036 in main tools/perf/perf.c:539:3 SUMMARY: MemorySanitizer: use-of-uninitialized-value tools/lib/perf/lib.c:18:6 in ion Exiting Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
okias
pushed a commit
that referenced
this pull request
Oct 8, 2021
After removing /dev/kmem, sanitizing /proc/kcore and handling /dev/mem, this series tackles the last sane way how a VM could accidentially access logically unplugged memory managed by a virtio-mem device: /proc/vmcore When dumping memory via "makedumpfile", PG_offline pages, used by virtio-mem to flag logically unplugged memory, are already properly excluded; however, especially when accessing/copying /proc/vmcore "the usual way", we can still end up reading logically unplugged memory part of a virtio-mem device. Patch #1-#3 are cleanups. Patch #4 extends the existing oldmem_pfn_is_ram mechanism. Patch #5-#7 are virtio-mem refactorings for patch #8, which implements the virtio-mem logic to query the state of device blocks. Patch #8: " Although virtio-mem currently supports reading unplugged memory in the hypervisor, this will change in the future, indicated to the device via a new feature flag. We similarly sanitized /proc/kcore access recently. [...] Distributions that support virtio-mem+kdump have to make sure that the virtio_mem module will be part of the kdump kernel or the kdump initrd; dracut was recently [2] extended to include virtio-mem in the generated initrd. As long as no special kdump kernels are used, this will automatically make sure that virtio-mem will be around in the kdump initrd and sanitize /proc/vmcore access -- with dracut. " This is the last remaining bit to support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE [3] in the Linux implementation of virtio-mem. Note: this is best-effort. We'll never be able to control what runs inside the second kernel, really, but we also don't have to care: we only care about sane setups where we don't want our VM getting zapped once we touch the wrong memory location while dumping. While we usually expect sane setups to use "makedumfile", nothing really speaks against just copying /proc/vmcore, especially in environments where HWpoisioning isn't typically expected. Also, we really don't want to put all our trust completely on the memmap, so sanitizing also makes sense when just using "makedumpfile". [1] https://lkml.kernel.org/r/[email protected] [2] dracutdevs/dracut#1157 [3] https://lists.oasis-open.org/archives/virtio-comment/202109/msg00021.html This patch (of 9): The callback is only used for the vmcore nowadays. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Jason Wang <[email protected]> Cc: Dave Young <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]>
okias
pushed a commit
that referenced
this pull request
Oct 12, 2021
After removing /dev/kmem, sanitizing /proc/kcore and handling /dev/mem, this series tackles the last sane way how a VM could accidentially access logically unplugged memory managed by a virtio-mem device: /proc/vmcore When dumping memory via "makedumpfile", PG_offline pages, used by virtio-mem to flag logically unplugged memory, are already properly excluded; however, especially when accessing/copying /proc/vmcore "the usual way", we can still end up reading logically unplugged memory part of a virtio-mem device. Patch #1-#3 are cleanups. Patch #4 extends the existing oldmem_pfn_is_ram mechanism. Patch #5-#7 are virtio-mem refactorings for patch #8, which implements the virtio-mem logic to query the state of device blocks. Patch #8: " Although virtio-mem currently supports reading unplugged memory in the hypervisor, this will change in the future, indicated to the device via a new feature flag. We similarly sanitized /proc/kcore access recently. [...] Distributions that support virtio-mem+kdump have to make sure that the virtio_mem module will be part of the kdump kernel or the kdump initrd; dracut was recently [2] extended to include virtio-mem in the generated initrd. As long as no special kdump kernels are used, this will automatically make sure that virtio-mem will be around in the kdump initrd and sanitize /proc/vmcore access -- with dracut. " This is the last remaining bit to support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE [3] in the Linux implementation of virtio-mem. Note: this is best-effort. We'll never be able to control what runs inside the second kernel, really, but we also don't have to care: we only care about sane setups where we don't want our VM getting zapped once we touch the wrong memory location while dumping. While we usually expect sane setups to use "makedumfile", nothing really speaks against just copying /proc/vmcore, especially in environments where HWpoisioning isn't typically expected. Also, we really don't want to put all our trust completely on the memmap, so sanitizing also makes sense when just using "makedumpfile". [1] https://lkml.kernel.org/r/[email protected] [2] dracutdevs/dracut#1157 [3] https://lists.oasis-open.org/archives/virtio-comment/202109/msg00021.html This patch (of 9): The callback is only used for the vmcore nowadays. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Jason Wang <[email protected]> Cc: Dave Young <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]>
okias
pushed a commit
that referenced
this pull request
Oct 23, 2021
Attempting to defragment a Btrfs file containing a transparent huge page immediately deadlocks with the following stack trace: #0 context_switch (kernel/sched/core.c:4940:2) #1 __schedule (kernel/sched/core.c:6287:8) #2 schedule (kernel/sched/core.c:6366:3) #3 io_schedule (kernel/sched/core.c:8389:2) #4 wait_on_page_bit_common (mm/filemap.c:1356:4) #5 __lock_page (mm/filemap.c:1648:2) #6 lock_page (./include/linux/pagemap.h:625:3) #7 pagecache_get_page (mm/filemap.c:1910:4) #8 find_or_create_page (./include/linux/pagemap.h:420:9) grate-driver#9 defrag_prepare_one_page (fs/btrfs/ioctl.c:1068:9) grate-driver#10 defrag_one_range (fs/btrfs/ioctl.c:1326:14) grate-driver#11 defrag_one_cluster (fs/btrfs/ioctl.c:1421:9) grate-driver#12 btrfs_defrag_file (fs/btrfs/ioctl.c:1523:9) grate-driver#13 btrfs_ioctl_defrag (fs/btrfs/ioctl.c:3117:9) grate-driver#14 btrfs_ioctl (fs/btrfs/ioctl.c:4872:10) grate-driver#15 vfs_ioctl (fs/ioctl.c:51:10) grate-driver#16 __do_sys_ioctl (fs/ioctl.c:874:11) grate-driver#17 __se_sys_ioctl (fs/ioctl.c:860:1) grate-driver#18 __x64_sys_ioctl (fs/ioctl.c:860:1) grate-driver#19 do_syscall_x64 (arch/x86/entry/common.c:50:14) grate-driver#20 do_syscall_64 (arch/x86/entry/common.c:80:7) grate-driver#21 entry_SYSCALL_64+0x7c/0x15b (arch/x86/entry/entry_64.S:113) A huge page is represented by a compound page, which consists of a struct page for each PAGE_SIZE page within the huge page. The first struct page is the "head page", and the remaining are "tail pages". Defragmentation attempts to lock each page in the range. However, lock_page() on a tail page actually locks the corresponding head page. So, if defragmentation tries to lock more than one struct page in a compound page, it tries to lock the same head page twice and deadlocks with itself. Ideally, we should be able to defragment transparent huge pages. However, THP for filesystems is currently read-only, so a lot of code is not ready to use huge pages for I/O. For now, let's just return ETXTBUSY. This can be reproduced with the following on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y: $ cat create_thp_file.c #include <fcntl.h> #include <stdbool.h> #include <stdio.h> #include <stdint.h> #include <stdlib.h> #include <unistd.h> #include <sys/mman.h> static const char zeroes[1024 * 1024]; static const size_t FILE_SIZE = 2 * 1024 * 1024; int main(int argc, char **argv) { if (argc != 2) { fprintf(stderr, "usage: %s PATH\n", argv[0]); return EXIT_FAILURE; } int fd = creat(argv[1], 0777); if (fd == -1) { perror("creat"); return EXIT_FAILURE; } size_t written = 0; while (written < FILE_SIZE) { ssize_t ret = write(fd, zeroes, sizeof(zeroes) < FILE_SIZE - written ? sizeof(zeroes) : FILE_SIZE - written); if (ret < 0) { perror("write"); return EXIT_FAILURE; } written += ret; } close(fd); fd = open(argv[1], O_RDONLY); if (fd == -1) { perror("open"); return EXIT_FAILURE; } /* * Reserve some address space so that we can align the file mapping to * the huge page size. */ void *placeholder_map = mmap(NULL, FILE_SIZE * 2, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); if (placeholder_map == MAP_FAILED) { perror("mmap (placeholder)"); return EXIT_FAILURE; } void *aligned_address = (void *)(((uintptr_t)placeholder_map + FILE_SIZE - 1) & ~(FILE_SIZE - 1)); void *map = mmap(aligned_address, FILE_SIZE, PROT_READ | PROT_EXEC, MAP_SHARED | MAP_FIXED, fd, 0); if (map == MAP_FAILED) { perror("mmap"); return EXIT_FAILURE; } if (madvise(map, FILE_SIZE, MADV_HUGEPAGE) < 0) { perror("madvise"); return EXIT_FAILURE; } char *line = NULL; size_t line_capacity = 0; FILE *smaps_file = fopen("/proc/self/smaps", "r"); if (!smaps_file) { perror("fopen"); return EXIT_FAILURE; } for (;;) { for (size_t off = 0; off < FILE_SIZE; off += 4096) ((volatile char *)map)[off]; ssize_t ret; bool this_mapping = false; while ((ret = getline(&line, &line_capacity, smaps_file)) > 0) { unsigned long start, end, huge; if (sscanf(line, "%lx-%lx", &start, &end) == 2) { this_mapping = (start <= (uintptr_t)map && (uintptr_t)map < end); } else if (this_mapping && sscanf(line, "FilePmdMapped: %ld", &huge) == 1 && huge > 0) { return EXIT_SUCCESS; } } sleep(6); rewind(smaps_file); fflush(smaps_file); } } $ ./create_thp_file huge $ btrfs fi defrag -czstd ./huge Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
okias
pushed a commit
that referenced
this pull request
Oct 23, 2021
After removing /dev/kmem, sanitizing /proc/kcore and handling /dev/mem, this series tackles the last sane way how a VM could accidentially access logically unplugged memory managed by a virtio-mem device: /proc/vmcore When dumping memory via "makedumpfile", PG_offline pages, used by virtio-mem to flag logically unplugged memory, are already properly excluded; however, especially when accessing/copying /proc/vmcore "the usual way", we can still end up reading logically unplugged memory part of a virtio-mem device. Patch #1-#3 are cleanups. Patch #4 extends the existing oldmem_pfn_is_ram mechanism. Patch #5-#7 are virtio-mem refactorings for patch #8, which implements the virtio-mem logic to query the state of device blocks. Patch #8: " Although virtio-mem currently supports reading unplugged memory in the hypervisor, this will change in the future, indicated to the device via a new feature flag. We similarly sanitized /proc/kcore access recently. [...] Distributions that support virtio-mem+kdump have to make sure that the virtio_mem module will be part of the kdump kernel or the kdump initrd; dracut was recently [2] extended to include virtio-mem in the generated initrd. As long as no special kdump kernels are used, this will automatically make sure that virtio-mem will be around in the kdump initrd and sanitize /proc/vmcore access -- with dracut. " This is the last remaining bit to support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE [3] in the Linux implementation of virtio-mem. Note: this is best-effort. We'll never be able to control what runs inside the second kernel, really, but we also don't have to care: we only care about sane setups where we don't want our VM getting zapped once we touch the wrong memory location while dumping. While we usually expect sane setups to use "makedumfile", nothing really speaks against just copying /proc/vmcore, especially in environments where HWpoisioning isn't typically expected. Also, we really don't want to put all our trust completely on the memmap, so sanitizing also makes sense when just using "makedumpfile". [1] https://lkml.kernel.org/r/[email protected] [2] dracutdevs/dracut#1157 [3] https://lists.oasis-open.org/archives/virtio-comment/202109/msg00021.html This patch (of 9): The callback is only used for the vmcore nowadays. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Jason Wang <[email protected]> Cc: Dave Young <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]>
okias
pushed a commit
that referenced
this pull request
Oct 25, 2021
After removing /dev/kmem, sanitizing /proc/kcore and handling /dev/mem, this series tackles the last sane way how a VM could accidentially access logically unplugged memory managed by a virtio-mem device: /proc/vmcore When dumping memory via "makedumpfile", PG_offline pages, used by virtio-mem to flag logically unplugged memory, are already properly excluded; however, especially when accessing/copying /proc/vmcore "the usual way", we can still end up reading logically unplugged memory part of a virtio-mem device. Patch #1-#3 are cleanups. Patch #4 extends the existing oldmem_pfn_is_ram mechanism. Patch #5-#7 are virtio-mem refactorings for patch #8, which implements the virtio-mem logic to query the state of device blocks. Patch #8: " Although virtio-mem currently supports reading unplugged memory in the hypervisor, this will change in the future, indicated to the device via a new feature flag. We similarly sanitized /proc/kcore access recently. [...] Distributions that support virtio-mem+kdump have to make sure that the virtio_mem module will be part of the kdump kernel or the kdump initrd; dracut was recently [2] extended to include virtio-mem in the generated initrd. As long as no special kdump kernels are used, this will automatically make sure that virtio-mem will be around in the kdump initrd and sanitize /proc/vmcore access -- with dracut. " This is the last remaining bit to support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE [3] in the Linux implementation of virtio-mem. Note: this is best-effort. We'll never be able to control what runs inside the second kernel, really, but we also don't have to care: we only care about sane setups where we don't want our VM getting zapped once we touch the wrong memory location while dumping. While we usually expect sane setups to use "makedumfile", nothing really speaks against just copying /proc/vmcore, especially in environments where HWpoisioning isn't typically expected. Also, we really don't want to put all our trust completely on the memmap, so sanitizing also makes sense when just using "makedumpfile". [1] https://lkml.kernel.org/r/[email protected] [2] dracutdevs/dracut#1157 [3] https://lists.oasis-open.org/archives/virtio-comment/202109/msg00021.html This patch (of 9): The callback is only used for the vmcore nowadays. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Jason Wang <[email protected]> Cc: Dave Young <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]>
okias
pushed a commit
that referenced
this pull request
Oct 27, 2021
Attempting to defragment a Btrfs file containing a transparent huge page immediately deadlocks with the following stack trace: #0 context_switch (kernel/sched/core.c:4940:2) #1 __schedule (kernel/sched/core.c:6287:8) #2 schedule (kernel/sched/core.c:6366:3) #3 io_schedule (kernel/sched/core.c:8389:2) #4 wait_on_page_bit_common (mm/filemap.c:1356:4) #5 __lock_page (mm/filemap.c:1648:2) #6 lock_page (./include/linux/pagemap.h:625:3) #7 pagecache_get_page (mm/filemap.c:1910:4) #8 find_or_create_page (./include/linux/pagemap.h:420:9) grate-driver#9 defrag_prepare_one_page (fs/btrfs/ioctl.c:1068:9) grate-driver#10 defrag_one_range (fs/btrfs/ioctl.c:1326:14) grate-driver#11 defrag_one_cluster (fs/btrfs/ioctl.c:1421:9) grate-driver#12 btrfs_defrag_file (fs/btrfs/ioctl.c:1523:9) grate-driver#13 btrfs_ioctl_defrag (fs/btrfs/ioctl.c:3117:9) grate-driver#14 btrfs_ioctl (fs/btrfs/ioctl.c:4872:10) grate-driver#15 vfs_ioctl (fs/ioctl.c:51:10) grate-driver#16 __do_sys_ioctl (fs/ioctl.c:874:11) grate-driver#17 __se_sys_ioctl (fs/ioctl.c:860:1) grate-driver#18 __x64_sys_ioctl (fs/ioctl.c:860:1) grate-driver#19 do_syscall_x64 (arch/x86/entry/common.c:50:14) grate-driver#20 do_syscall_64 (arch/x86/entry/common.c:80:7) grate-driver#21 entry_SYSCALL_64+0x7c/0x15b (arch/x86/entry/entry_64.S:113) A huge page is represented by a compound page, which consists of a struct page for each PAGE_SIZE page within the huge page. The first struct page is the "head page", and the remaining are "tail pages". Defragmentation attempts to lock each page in the range. However, lock_page() on a tail page actually locks the corresponding head page. So, if defragmentation tries to lock more than one struct page in a compound page, it tries to lock the same head page twice and deadlocks with itself. Ideally, we should be able to defragment transparent huge pages. However, THP for filesystems is currently read-only, so a lot of code is not ready to use huge pages for I/O. For now, let's just return ETXTBUSY. This can be reproduced with the following on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y: $ cat create_thp_file.c #include <fcntl.h> #include <stdbool.h> #include <stdio.h> #include <stdint.h> #include <stdlib.h> #include <unistd.h> #include <sys/mman.h> static const char zeroes[1024 * 1024]; static const size_t FILE_SIZE = 2 * 1024 * 1024; int main(int argc, char **argv) { if (argc != 2) { fprintf(stderr, "usage: %s PATH\n", argv[0]); return EXIT_FAILURE; } int fd = creat(argv[1], 0777); if (fd == -1) { perror("creat"); return EXIT_FAILURE; } size_t written = 0; while (written < FILE_SIZE) { ssize_t ret = write(fd, zeroes, sizeof(zeroes) < FILE_SIZE - written ? sizeof(zeroes) : FILE_SIZE - written); if (ret < 0) { perror("write"); return EXIT_FAILURE; } written += ret; } close(fd); fd = open(argv[1], O_RDONLY); if (fd == -1) { perror("open"); return EXIT_FAILURE; } /* * Reserve some address space so that we can align the file mapping to * the huge page size. */ void *placeholder_map = mmap(NULL, FILE_SIZE * 2, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); if (placeholder_map == MAP_FAILED) { perror("mmap (placeholder)"); return EXIT_FAILURE; } void *aligned_address = (void *)(((uintptr_t)placeholder_map + FILE_SIZE - 1) & ~(FILE_SIZE - 1)); void *map = mmap(aligned_address, FILE_SIZE, PROT_READ | PROT_EXEC, MAP_SHARED | MAP_FIXED, fd, 0); if (map == MAP_FAILED) { perror("mmap"); return EXIT_FAILURE; } if (madvise(map, FILE_SIZE, MADV_HUGEPAGE) < 0) { perror("madvise"); return EXIT_FAILURE; } char *line = NULL; size_t line_capacity = 0; FILE *smaps_file = fopen("/proc/self/smaps", "r"); if (!smaps_file) { perror("fopen"); return EXIT_FAILURE; } for (;;) { for (size_t off = 0; off < FILE_SIZE; off += 4096) ((volatile char *)map)[off]; ssize_t ret; bool this_mapping = false; while ((ret = getline(&line, &line_capacity, smaps_file)) > 0) { unsigned long start, end, huge; if (sscanf(line, "%lx-%lx", &start, &end) == 2) { this_mapping = (start <= (uintptr_t)map && (uintptr_t)map < end); } else if (this_mapping && sscanf(line, "FilePmdMapped: %ld", &huge) == 1 && huge > 0) { return EXIT_SUCCESS; } } sleep(6); rewind(smaps_file); fflush(smaps_file); } } $ ./create_thp_file huge $ btrfs fi defrag -czstd ./huge Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
okias
pushed a commit
that referenced
this pull request
Oct 29, 2021
After removing /dev/kmem, sanitizing /proc/kcore and handling /dev/mem, this series tackles the last sane way how a VM could accidentially access logically unplugged memory managed by a virtio-mem device: /proc/vmcore When dumping memory via "makedumpfile", PG_offline pages, used by virtio-mem to flag logically unplugged memory, are already properly excluded; however, especially when accessing/copying /proc/vmcore "the usual way", we can still end up reading logically unplugged memory part of a virtio-mem device. Patch #1-#3 are cleanups. Patch #4 extends the existing oldmem_pfn_is_ram mechanism. Patch #5-#7 are virtio-mem refactorings for patch #8, which implements the virtio-mem logic to query the state of device blocks. Patch #8: " Although virtio-mem currently supports reading unplugged memory in the hypervisor, this will change in the future, indicated to the device via a new feature flag. We similarly sanitized /proc/kcore access recently. [...] Distributions that support virtio-mem+kdump have to make sure that the virtio_mem module will be part of the kdump kernel or the kdump initrd; dracut was recently [2] extended to include virtio-mem in the generated initrd. As long as no special kdump kernels are used, this will automatically make sure that virtio-mem will be around in the kdump initrd and sanitize /proc/vmcore access -- with dracut. " This is the last remaining bit to support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE [3] in the Linux implementation of virtio-mem. Note: this is best-effort. We'll never be able to control what runs inside the second kernel, really, but we also don't have to care: we only care about sane setups where we don't want our VM getting zapped once we touch the wrong memory location while dumping. While we usually expect sane setups to use "makedumfile", nothing really speaks against just copying /proc/vmcore, especially in environments where HWpoisioning isn't typically expected. Also, we really don't want to put all our trust completely on the memmap, so sanitizing also makes sense when just using "makedumpfile". [1] https://lkml.kernel.org/r/[email protected] [2] dracutdevs/dracut#1157 [3] https://lists.oasis-open.org/archives/virtio-comment/202109/msg00021.html This patch (of 9): The callback is only used for the vmcore nowadays. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Jason Wang <[email protected]> Cc: Dave Young <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]>
okias
pushed a commit
that referenced
this pull request
Oct 29, 2021
After commit 9298e63 ("bpf/tests: Add exhaustive tests of ALU operand magnitudes"), when modprobe test_bpf.ko with JIT on mips64, there exists segment fault due to the following reason: [...] ALU64_MOV_X: all register value magnitudes jited:1 Break instruction in kernel code[#1] [...] It seems that the related JIT implementations of some test cases in test_bpf() have problems. At this moment, I do not care about the segment fault while I just want to verify the test cases of tail calls. Based on the above background and motivation, add the following module parameter test_suite to the test_bpf.ko: test_suite=<string>: only the specified test suite will be run, the string can be "test_bpf", "test_tail_calls" or "test_skb_segment". If test_suite is not specified, but test_id, test_name or test_range is specified, set 'test_bpf' as the default test suite. This is useful to only test the corresponding test suite when specifying the valid test_suite string. Any invalid test suite will result in -EINVAL being returned and no tests being run. If the test_suite is not specified or specified as empty string, it does not change the current logic, all of the test cases will be run. Here are some test results: # dmesg -c # modprobe test_bpf # dmesg | grep Summary test_bpf: Summary: 1009 PASSED, 0 FAILED, [0/997 JIT'ed] test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [0/8 JIT'ed] test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED # rmmod test_bpf # dmesg -c # modprobe test_bpf test_suite=test_bpf # dmesg | tail -1 test_bpf: Summary: 1009 PASSED, 0 FAILED, [0/997 JIT'ed] # rmmod test_bpf # dmesg -c # modprobe test_bpf test_suite=test_tail_calls # dmesg test_bpf: #0 Tail call leaf jited:0 21 PASS [...] test_bpf: #7 Tail call error path, index out of range jited:0 32 PASS test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [0/8 JIT'ed] # rmmod test_bpf # dmesg -c # modprobe test_bpf test_suite=test_skb_segment # dmesg test_bpf: #0 gso_with_rx_frags PASS test_bpf: #1 gso_linear_no_head_frag PASS test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED # rmmod test_bpf # dmesg -c # modprobe test_bpf test_id=1 # dmesg test_bpf: test_bpf: set 'test_bpf' as the default test_suite. test_bpf: #1 TXA jited:0 54 51 50 PASS test_bpf: Summary: 1 PASSED, 0 FAILED, [0/1 JIT'ed] # rmmod test_bpf # dmesg -c # modprobe test_bpf test_suite=test_bpf test_name=TXA # dmesg test_bpf: #1 TXA jited:0 54 50 51 PASS test_bpf: Summary: 1 PASSED, 0 FAILED, [0/1 JIT'ed] # rmmod test_bpf # dmesg -c # modprobe test_bpf test_suite=test_tail_calls test_range=6,7 # dmesg test_bpf: #6 Tail call error path, NULL target jited:0 41 PASS test_bpf: #7 Tail call error path, index out of range jited:0 32 PASS test_bpf: test_tail_calls: Summary: 2 PASSED, 0 FAILED, [0/2 JIT'ed] # rmmod test_bpf # dmesg -c # modprobe test_bpf test_suite=test_skb_segment test_id=1 # dmesg test_bpf: #1 gso_linear_no_head_frag PASS test_bpf: test_skb_segment: Summary: 1 PASSED, 0 FAILED By the way, the above segment fault has been fixed in the latest bpf-next tree which contains the mips64 JIT rework. Signed-off-by: Tiezhu Yang <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Tested-by: Johan Almbladh <[email protected]> Acked-by: Johan Almbladh <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
okias
pushed a commit
that referenced
this pull request
Nov 1, 2021
After removing /dev/kmem, sanitizing /proc/kcore and handling /dev/mem, this series tackles the last sane way how a VM could accidentially access logically unplugged memory managed by a virtio-mem device: /proc/vmcore When dumping memory via "makedumpfile", PG_offline pages, used by virtio-mem to flag logically unplugged memory, are already properly excluded; however, especially when accessing/copying /proc/vmcore "the usual way", we can still end up reading logically unplugged memory part of a virtio-mem device. Patch #1-#3 are cleanups. Patch #4 extends the existing oldmem_pfn_is_ram mechanism. Patch #5-#7 are virtio-mem refactorings for patch #8, which implements the virtio-mem logic to query the state of device blocks. Patch #8: " Although virtio-mem currently supports reading unplugged memory in the hypervisor, this will change in the future, indicated to the device via a new feature flag. We similarly sanitized /proc/kcore access recently. [...] Distributions that support virtio-mem+kdump have to make sure that the virtio_mem module will be part of the kdump kernel or the kdump initrd; dracut was recently [2] extended to include virtio-mem in the generated initrd. As long as no special kdump kernels are used, this will automatically make sure that virtio-mem will be around in the kdump initrd and sanitize /proc/vmcore access -- with dracut. " This is the last remaining bit to support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE [3] in the Linux implementation of virtio-mem. Note: this is best-effort. We'll never be able to control what runs inside the second kernel, really, but we also don't have to care: we only care about sane setups where we don't want our VM getting zapped once we touch the wrong memory location while dumping. While we usually expect sane setups to use "makedumfile", nothing really speaks against just copying /proc/vmcore, especially in environments where HWpoisioning isn't typically expected. Also, we really don't want to put all our trust completely on the memmap, so sanitizing also makes sense when just using "makedumpfile". [1] https://lkml.kernel.org/r/[email protected] [2] dracutdevs/dracut#1157 [3] https://lists.oasis-open.org/archives/virtio-comment/202109/msg00021.html This patch (of 9): The callback is only used for the vmcore nowadays. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Jason Wang <[email protected]> Cc: Dave Young <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]>
okias
pushed a commit
that referenced
this pull request
Nov 7, 2021
Host crashes when pci_enable_atomic_ops_to_root() is called for VFs with virtual buses. The virtual buses added to SR-IOV have bus->self set to NULL and host crashes due to this. PID: 4481 TASK: ffff89c6941b0000 CPU: 53 COMMAND: "bash" ... #3 [ffff9a9481713808] oops_end at ffffffffb9025cd6 #4 [ffff9a9481713828] page_fault_oops at ffffffffb906e417 #5 [ffff9a9481713888] exc_page_fault at ffffffffb9a0ad14 #6 [ffff9a94817138b0] asm_exc_page_fault at ffffffffb9c00ace [exception RIP: pcie_capability_read_dword+28] RIP: ffffffffb952fd5c RSP: ffff9a9481713960 RFLAGS: 00010246 RAX: 0000000000000001 RBX: ffff89c6b1096000 RCX: 0000000000000000 RDX: ffff9a9481713990 RSI: 0000000000000024 RDI: 0000000000000000 RBP: 0000000000000080 R8: 0000000000000008 R9: ffff89c64341a2f8 R10: 0000000000000002 R11: 0000000000000000 R12: ffff89c648bab000 R13: 0000000000000000 R14: 0000000000000000 R15: ffff89c648bab0c8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffff9a9481713988] pci_enable_atomic_ops_to_root at ffffffffb95359a6 #8 [ffff9a94817139c0] bnxt_qplib_determine_atomics at ffffffffc08c1a33 [bnxt_re] grate-driver#9 [ffff9a94817139d0] bnxt_re_dev_init at ffffffffc08ba2d1 [bnxt_re] Per PCIe r5.0, sec 9.3.5.10, the AtomicOp Requester Enable bit in Device Control 2 is reserved for VFs. The PF value applies to all associated VFs. Return -EINVAL if pci_enable_atomic_ops_to_root() is called for a VF. Link: https://lore.kernel.org/r/[email protected] Fixes: 35f5ace ("RDMA/bnxt_re: Enable global atomic ops if platform supports") Fixes: 430a236 ("PCI: Add pci_enable_atomic_ops_to_root()") Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Andy Gospodarek <[email protected]>
okias
pushed a commit
that referenced
this pull request
Nov 9, 2021
After removing /dev/kmem, sanitizing /proc/kcore and handling /dev/mem, this series tackles the last sane way how a VM could accidentially access logically unplugged memory managed by a virtio-mem device: /proc/vmcore When dumping memory via "makedumpfile", PG_offline pages, used by virtio-mem to flag logically unplugged memory, are already properly excluded; however, especially when accessing/copying /proc/vmcore "the usual way", we can still end up reading logically unplugged memory part of a virtio-mem device. Patch #1-#3 are cleanups. Patch #4 extends the existing oldmem_pfn_is_ram mechanism. Patch #5-#7 are virtio-mem refactorings for patch #8, which implements the virtio-mem logic to query the state of device blocks. Patch #8: " Although virtio-mem currently supports reading unplugged memory in the hypervisor, this will change in the future, indicated to the device via a new feature flag. We similarly sanitized /proc/kcore access recently. [...] Distributions that support virtio-mem+kdump have to make sure that the virtio_mem module will be part of the kdump kernel or the kdump initrd; dracut was recently [2] extended to include virtio-mem in the generated initrd. As long as no special kdump kernels are used, this will automatically make sure that virtio-mem will be around in the kdump initrd and sanitize /proc/vmcore access -- with dracut. " This is the last remaining bit to support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE [3] in the Linux implementation of virtio-mem. Note: this is best-effort. We'll never be able to control what runs inside the second kernel, really, but we also don't have to care: we only care about sane setups where we don't want our VM getting zapped once we touch the wrong memory location while dumping. While we usually expect sane setups to use "makedumfile", nothing really speaks against just copying /proc/vmcore, especially in environments where HWpoisioning isn't typically expected. Also, we really don't want to put all our trust completely on the memmap, so sanitizing also makes sense when just using "makedumpfile". [1] https://lkml.kernel.org/r/[email protected] [2] dracutdevs/dracut#1157 [3] https://lists.oasis-open.org/archives/virtio-comment/202109/msg00021.html This patch (of 9): The callback is only used for the vmcore nowadays. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Jason Wang <[email protected]> Cc: Dave Young <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]>
okias
pushed a commit
that referenced
this pull request
Nov 10, 2021
After removing /dev/kmem, sanitizing /proc/kcore and handling /dev/mem, this series tackles the last sane way how a VM could accidentially access logically unplugged memory managed by a virtio-mem device: /proc/vmcore When dumping memory via "makedumpfile", PG_offline pages, used by virtio-mem to flag logically unplugged memory, are already properly excluded; however, especially when accessing/copying /proc/vmcore "the usual way", we can still end up reading logically unplugged memory part of a virtio-mem device. Patch #1-#3 are cleanups. Patch #4 extends the existing oldmem_pfn_is_ram mechanism. Patch #5-#7 are virtio-mem refactorings for patch #8, which implements the virtio-mem logic to query the state of device blocks. Patch #8: "Although virtio-mem currently supports reading unplugged memory in the hypervisor, this will change in the future, indicated to the device via a new feature flag. We similarly sanitized /proc/kcore access recently. [...] Distributions that support virtio-mem+kdump have to make sure that the virtio_mem module will be part of the kdump kernel or the kdump initrd; dracut was recently [2] extended to include virtio-mem in the generated initrd. As long as no special kdump kernels are used, this will automatically make sure that virtio-mem will be around in the kdump initrd and sanitize /proc/vmcore access -- with dracut" This is the last remaining bit to support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE [3] in the Linux implementation of virtio-mem. Note: this is best-effort. We'll never be able to control what runs inside the second kernel, really, but we also don't have to care: we only care about sane setups where we don't want our VM getting zapped once we touch the wrong memory location while dumping. While we usually expect sane setups to use "makedumfile", nothing really speaks against just copying /proc/vmcore, especially in environments where HWpoisioning isn't typically expected. Also, we really don't want to put all our trust completely on the memmap, so sanitizing also makes sense when just using "makedumpfile". [1] https://lkml.kernel.org/r/[email protected] [2] dracutdevs/dracut#1157 [3] https://lists.oasis-open.org/archives/virtio-comment/202109/msg00021.html This patch (of 9): The callback is only used for the vmcore nowadays. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Hildenbrand <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Jason Wang <[email protected]> Cc: Dave Young <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
okias
pushed a commit
that referenced
this pull request
Nov 23, 2021
When IPv6 module gets initialized, but it's hitting an error in inet6_init() where it then needs to undo all the prior initialization work, it also might do a call to ndisc_cleanup() which then calls neigh_table_clear(). In there is a missing timer cancellation of the table's managed_work item. The kernel test robot explicitly triggered this error path and caused a UAF crash similar to the below: [...] [ 28.833183][ C0] BUG: unable to handle page fault for address: f7a43288 [ 28.833973][ C0] #PF: supervisor write access in kernel mode [ 28.834660][ C0] #PF: error_code(0x0002) - not-present page [ 28.835319][ C0] *pde = 06b2c067 *pte = 00000000 [ 28.835853][ C0] Oops: 0002 [#1] PREEMPT [ 28.836367][ C0] CPU: 0 PID: 303 Comm: sed Not tainted 5.16.0-rc1-00233-g83ff5faa0d3b #7 [ 28.837293][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1 04/01/2014 [ 28.838338][ C0] EIP: __run_timers.constprop.0+0x82/0x440 [...] [ 28.845607][ C0] Call Trace: [ 28.845942][ C0] <SOFTIRQ> [ 28.846333][ C0] ? check_preemption_disabled.isra.0+0x2a/0x80 [ 28.846975][ C0] ? __this_cpu_preempt_check+0x8/0xa [ 28.847570][ C0] run_timer_softirq+0xd/0x40 [ 28.848050][ C0] __do_softirq+0xf5/0x576 [ 28.848547][ C0] ? __softirqentry_text_start+0x10/0x10 [ 28.849127][ C0] do_softirq_own_stack+0x2b/0x40 [ 28.849749][ C0] </SOFTIRQ> [ 28.850087][ C0] irq_exit_rcu+0x7d/0xc0 [ 28.850587][ C0] common_interrupt+0x2a/0x40 [ 28.851068][ C0] asm_common_interrupt+0x119/0x120 [...] Note that IPv6 module cannot be unloaded as per 8ce4406 ("ipv6: do not allow ipv6 module to be removed") hence this can only be seen during module initialization error. Tested with kernel test robot's reproducer. Fixes: 7482e38 ("net, neigh: Add NTF_MANAGED flag for managed neighbor entries") Reported-by: kernel test robot <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Cc: Li Zhijian <[email protected]> Signed-off-by: David S. Miller <[email protected]>
okias
pushed a commit
that referenced
this pull request
Jul 31, 2023
[ Upstream commit 99d4850 ] Found by leak sanitizer: ``` ==1632594==ERROR: LeakSanitizer: detected memory leaks Direct leak of 21 byte(s) in 1 object(s) allocated from: #0 0x7f2953a7077b in __interceptor_strdup ../../../../src/libsanitizer/asan/asan_interceptors.cpp:439 #1 0x556701d6fbbf in perf_env__read_cpuid util/env.c:369 #2 0x556701d70589 in perf_env__cpuid util/env.c:465 #3 0x55670204bba2 in x86__is_amd_cpu arch/x86/util/env.c:14 #4 0x5567020487a2 in arch__post_evsel_config arch/x86/util/evsel.c:83 #5 0x556701d8f78b in evsel__config util/evsel.c:1366 #6 0x556701ef5872 in evlist__config util/record.c:108 #7 0x556701cd6bcd in test__PERF_RECORD tests/perf-record.c:112 #8 0x556701cacd07 in run_test tests/builtin-test.c:236 grate-driver#9 0x556701cacfac in test_and_print tests/builtin-test.c:265 grate-driver#10 0x556701cadddb in __cmd_test tests/builtin-test.c:402 grate-driver#11 0x556701caf2aa in cmd_test tests/builtin-test.c:559 grate-driver#12 0x556701d3b557 in run_builtin tools/perf/perf.c:323 grate-driver#13 0x556701d3bac8 in handle_internal_command tools/perf/perf.c:377 grate-driver#14 0x556701d3be90 in run_argv tools/perf/perf.c:421 grate-driver#15 0x556701d3c3f8 in main tools/perf/perf.c:537 grate-driver#16 0x7f2952a46189 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 SUMMARY: AddressSanitizer: 21 byte(s) leaked in 1 allocation(s). ``` Fixes: f7b58cb ("perf mem/c2c: Add load store event mappings for AMD") Signed-off-by: Ian Rogers <[email protected]> Acked-by: Ravi Bangoria <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
okias
pushed a commit
that referenced
this pull request
Jul 31, 2023
[ Upstream commit b684c09 ] ppc_save_regs() skips one stack frame while saving the CPU register states. Instead of saving current R1, it pulls the previous stack frame pointer. When vmcores caused by direct panic call (such as `echo c > /proc/sysrq-trigger`), are debugged with gdb, gdb fails to show the backtrace correctly. On further analysis, it was found that it was because of mismatch between r1 and NIP. GDB uses NIP to get current function symbol and uses corresponding debug info of that function to unwind previous frames, but due to the mismatching r1 and NIP, the unwinding does not work, and it fails to unwind to the 2nd frame and hence does not show the backtrace. GDB backtrace with vmcore of kernel without this patch: --------- (gdb) bt #0 0xc0000000002a53e8 in crash_setup_regs (oldregs=<optimized out>, newregs=0xc000000004f8f8d8) at ./arch/powerpc/include/asm/kexec.h:69 #1 __crash_kexec (regs=<optimized out>) at kernel/kexec_core.c:974 #2 0x0000000000000063 in ?? () #3 0xc000000003579320 in ?? () --------- Further analysis revealed that the mismatch occurred because "ppc_save_regs" was saving the previous stack's SP instead of the current r1. This patch fixes this by storing current r1 in the saved pt_regs. GDB backtrace with vmcore of patched kernel: -------- (gdb) bt #0 0xc0000000002a53e8 in crash_setup_regs (oldregs=0x0, newregs=0xc00000000670b8d8) at ./arch/powerpc/include/asm/kexec.h:69 #1 __crash_kexec (regs=regs@entry=0x0) at kernel/kexec_core.c:974 #2 0xc000000000168918 in panic (fmt=fmt@entry=0xc000000001654a60 "sysrq triggered crash\n") at kernel/panic.c:358 #3 0xc000000000b735f8 in sysrq_handle_crash (key=<optimized out>) at drivers/tty/sysrq.c:155 #4 0xc000000000b742cc in __handle_sysrq (key=key@entry=99, check_mask=check_mask@entry=false) at drivers/tty/sysrq.c:602 #5 0xc000000000b7506c in write_sysrq_trigger (file=<optimized out>, buf=<optimized out>, count=2, ppos=<optimized out>) at drivers/tty/sysrq.c:1163 #6 0xc00000000069a7bc in pde_write (ppos=<optimized out>, count=<optimized out>, buf=<optimized out>, file=<optimized out>, pde=0xc00000000362cb40) at fs/proc/inode.c:340 #7 proc_reg_write (file=<optimized out>, buf=<optimized out>, count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:352 #8 0xc0000000005b3bbc in vfs_write (file=file@entry=0xc000000006aa6b00, buf=buf@entry=0x61f498b4f60 <error: Cannot access memory at address 0x61f498b4f60>, count=count@entry=2, pos=pos@entry=0xc00000000670bda0) at fs/read_write.c:582 grate-driver#9 0xc0000000005b4264 in ksys_write (fd=<optimized out>, buf=0x61f498b4f60 <error: Cannot access memory at address 0x61f498b4f60>, count=2) at fs/read_write.c:637 grate-driver#10 0xc00000000002ea2c in system_call_exception (regs=0xc00000000670be80, r0=<optimized out>) at arch/powerpc/kernel/syscall.c:171 grate-driver#11 0xc00000000000c270 in system_call_vectored_common () at arch/powerpc/kernel/interrupt_64.S:192 -------- Nick adds: So this now saves regs as though it was an interrupt taken in the caller, at the instruction after the call to ppc_save_regs, whereas previously the NIP was there, but R1 came from the caller's caller and that mismatch is what causes gdb's dwarf unwinder to go haywire. Signed-off-by: Aditya Gupta <[email protected]> Fixes: d16a58f ("powerpc: Improve ppc_save_regs()") Reivewed-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected] Signed-off-by: Sasha Levin <[email protected]>
okias
pushed a commit
that referenced
this pull request
Dec 4, 2023
When scanning namespaces, it is possible to get valid data from the first call to nvme_identify_ns() in nvme_alloc_ns(), but not from the second call in nvme_update_ns_info_block(). In particular, if the NSID becomes inactive between the two commands, a storage device may return a buffer filled with zero as per 4.1.5.1. In this case, we can get a kernel crash due to a divide-by-zero in blk_stack_limits() because ns->lba_shift will be set to zero. PID: 326 TASK: ffff95fec3cd8000 CPU: 29 COMMAND: "kworker/u98:10" #0 [ffffad8f8702f9e0] machine_kexec at ffffffff91c76ec7 #1 [ffffad8f8702fa38] __crash_kexec at ffffffff91dea4fa #2 [ffffad8f8702faf8] crash_kexec at ffffffff91deb788 #3 [ffffad8f8702fb00] oops_end at ffffffff91c2e4bb #4 [ffffad8f8702fb20] do_trap at ffffffff91c2a4ce #5 [ffffad8f8702fb70] do_error_trap at ffffffff91c2a595 #6 [ffffad8f8702fbb0] exc_divide_error at ffffffff928506e6 #7 [ffffad8f8702fbd0] asm_exc_divide_error at ffffffff92a00926 [exception RIP: blk_stack_limits+434] RIP: ffffffff92191872 RSP: ffffad8f8702fc80 RFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff95efa0c91800 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001 RBP: 00000000ffffffff R8: ffff95fec7df35a8 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ffff95fed33c09a8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ffffad8f8702fce0] nvme_update_ns_info_block at ffffffffc06d3533 [nvme_core] grate-driver#9 [ffffad8f8702fd18] nvme_scan_ns at ffffffffc06d6fa7 [nvme_core] This happened when the check for valid data was moved out of nvme_identify_ns() into one of the callers. Fix this by checking in both callers. Link: https://bugzilla.kernel.org/show_bug.cgi?id=218186 Fixes: 0dd6fff ("nvme: bring back auto-removal of deleted namespaces during sequential scan") Cc: [email protected] Signed-off-by: Ewan D. Milne <[email protected]> Signed-off-by: Keith Busch <[email protected]>
okias
pushed a commit
that referenced
this pull request
Dec 27, 2023
Jiri Pirko says: ==================== devlink: introduce notifications filtering From: Jiri Pirko <[email protected]> Currently the user listening on a socket for devlink notifications gets always all messages for all existing devlink instances and objects, even if he is interested only in one of those. That may cause unnecessary overhead on setups with thousands of instances present. User is currently able to narrow down the devlink objects replies to dump commands by specifying select attributes. Allow similar approach for notifications providing user a new notify-filter-set command to select attributes with values the notification message has to match. In that case, it is delivered to the socket. Note that the filtering is done per-socket, so multiple users may specify different selection of attributes with values. This patchset initially introduces support for following attributes: DEVLINK_ATTR_BUS_NAME DEVLINK_ATTR_DEV_NAME DEVLINK_ATTR_PORT_INDEX Patches #1 - #4 are preparations in devlink code, patch #3 is an optimization done on the way. Patches #5 - #7 are preparations in netlink and generic netlink code. Patch #8 is the main one in this set implementing of the notify-filter-set command and the actual per-socket filtering. Patch grate-driver#9 extends the infrastructure allowing to filter according to a port index. Example: $ devlink mon port pci/0000:08:00.0/32768 [port,new] pci/0000:08:00.0/32768: type notset flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable [port,new] pci/0000:08:00.0/32768: type eth flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable [port,new] pci/0000:08:00.0/32768: type eth netdev eth3 flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable [port,new] pci/0000:08:00.0/32768: type eth netdev eth3 flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable [port,new] pci/0000:08:00.0/32768: type eth flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable [port,new] pci/0000:08:00.0/32768: type notset flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable [port,del] pci/0000:08:00.0/32768: type notset flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
okias
pushed a commit
that referenced
this pull request
Dec 27, 2023
Andrii Nakryiko says: ==================== Enhance BPF global subprogs with argument tags This patch set adds verifier support for annotating user's global BPF subprog arguments with few commonly requested annotations, to improve global subprog verification experience. These tags are: - ability to annotate a special PTR_TO_CTX argument; - ability to annotate a generic PTR_TO_MEM as non-null. We utilize btf_decl_tag attribute for this and provide two helper macros as part of bpf_helpers.h in libbpf (patch #8). Besides this we also add abilit to pass a pointer to dynptr into global subprog. This is done based on type name match (struct bpf_dynptr *). This allows to pass dynptrs into global subprogs, for use cases that deal with variable-sized generic memory pointers. Big chunk of the patch set (patches #1 through #5) are various refactorings to make verifier internals around global subprog validation logic easier to extend and support long term, eliminating BTF parsing logic duplication, factoring out argument expectation definitions from BTF parsing, etc. New functionality is added in patch #6 (ctx and non-null) and patch #7 (dynptr), extending global subprog checks with awareness for arg tags. Patch grate-driver#9 adds simple tests validating each of the added tags and dynptr argument passing. Patch grate-driver#10 adds a simple negative case for freplace programs to make sure that target BPF programs with "unreliable" BTF func proto cannot be freplaced. v2->v3: - patch grate-driver#10 improved by checking expected verifier error (Eduard); v1->v2: - dropped packet args for now (Eduard); - added back unreliable=true detection for entry BPF programs (Eduard); - improved subprog arg validation (Eduard); - switched dynptr arg from tag to just type name based check (Eduard). ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
okias
pushed a commit
that referenced
this pull request
Dec 27, 2023
Ido Schimmel says: ==================== Add MDB bulk deletion support This patchset adds MDB bulk deletion support, allowing user space to request the deletion of matching entries instead of dumping the entire MDB and issuing a separate deletion request for each matching entry. Support is added in both the bridge and VXLAN drivers in a similar fashion to the existing FDB bulk deletion support. The parameters according to which bulk deletion can be performed are similar to the FDB ones, namely: Destination port, VLAN ID, state (e.g., "permanent"), routing protocol, source / destination VNI, destination IP and UDP port. Flushing based on flags (e.g., "offload", "fast_leave", "added_by_star_ex", "blocked") is not currently supported, but can be added in the future, if a use case arises. Patch #1 adds a new uAPI attribute to allow specifying the state mask according to which bulk deletion will be performed, if any. Patch #2 adds a new policy according to which bulk deletion requests (with 'NLM_F_BULK' flag set) will be parsed. Patches #3-#4 add a new NDO for MDB bulk deletion and invoke it from the rtnetlink code when a bulk deletion request is made. Patches #5-#6 implement the MDB bulk deletion NDO in the bridge and VXLAN drivers, respectively. Patch #7 allows user space to issue MDB bulk deletion requests by no longer rejecting the 'NLM_F_BULK' flag when it is set in 'RTM_DELMDB' requests. Patches #8-grate-driver#9 add selftests for both drivers, for both good and bad flows. iproute2 changes can be found here [1]. https://github.com/idosch/iproute2/tree/submit/mdb_flush_v1 ==================== Signed-off-by: David S. Miller <[email protected]>
okias
pushed a commit
that referenced
this pull request
Dec 27, 2023
An issue occurred while reading an ELF file in libbpf.c during fuzzing: Program received signal SIGSEGV, Segmentation fault. 0x0000000000958e97 in bpf_object.collect_prog_relos () at libbpf.c:4206 4206 in libbpf.c (gdb) bt #0 0x0000000000958e97 in bpf_object.collect_prog_relos () at libbpf.c:4206 #1 0x000000000094f9d6 in bpf_object.collect_relos () at libbpf.c:6706 #2 0x000000000092bef3 in bpf_object_open () at libbpf.c:7437 #3 0x000000000092c046 in bpf_object.open_mem () at libbpf.c:7497 #4 0x0000000000924afa in LLVMFuzzerTestOneInput () at fuzz/bpf-object-fuzzer.c:16 #5 0x000000000060be11 in testblitz_engine::fuzzer::Fuzzer::run_one () #6 0x000000000087ad92 in tracing::span::Span::in_scope () #7 0x00000000006078aa in testblitz_engine::fuzzer::util::walkdir () #8 0x00000000005f3217 in testblitz_engine::entrypoint::main::{{closure}} () grate-driver#9 0x00000000005f2601 in main () (gdb) scn_data was null at this code(tools/lib/bpf/src/libbpf.c): if (rel->r_offset % BPF_INSN_SZ || rel->r_offset >= scn_data->d_size) { The scn_data is derived from the code above: scn = elf_sec_by_idx(obj, sec_idx); scn_data = elf_sec_data(obj, scn); relo_sec_name = elf_sec_str(obj, shdr->sh_name); sec_name = elf_sec_name(obj, scn); if (!relo_sec_name || !sec_name)// don't check whether scn_data is NULL return -EINVAL; In certain special scenarios, such as reading a malformed ELF file, it is possible that scn_data may be a null pointer Signed-off-by: Mingyi Zhang <[email protected]> Signed-off-by: Xin Liu <[email protected]> Signed-off-by: Changye Wu <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
okias
pushed a commit
that referenced
this pull request
Jan 7, 2024
[ Upstream commit a84fbf2 ] Generating metrics llc_code_read_mpi_demand_plus_prefetch, llc_data_read_mpi_demand_plus_prefetch, llc_miss_local_memory_bandwidth_read, llc_miss_local_memory_bandwidth_write, nllc_miss_remote_memory_bandwidth_read, memory_bandwidth_read, memory_bandwidth_write, uncore_frequency, upi_data_transmit_bw, C2_Pkg_Residency, C3_Core_Residency, C3_Pkg_Residency, C6_Core_Residency, C6_Pkg_Residency, C7_Core_Residency, C7_Pkg_Residency, UNCORE_FREQ and tma_info_system_socket_clks would trigger an address sanitizer heap-buffer-overflows on a SkylakeX. ``` ==2567752==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x5020003ed098 at pc 0x5621a816654e bp 0x7fffb55d4da0 sp 0x7fffb55d4d98 READ of size 4 at 0x5020003eee78 thread T0 #0 0x558265d6654d in aggr_cpu_id__is_empty tools/perf/util/cpumap.c:694:12 #1 0x558265c914da in perf_stat__get_aggr tools/perf/builtin-stat.c:1490:6 #2 0x558265c914da in perf_stat__get_global_cached tools/perf/builtin-stat.c:1530:9 #3 0x558265e53290 in should_skip_zero_counter tools/perf/util/stat-display.c:947:31 #4 0x558265e53290 in print_counter_aggrdata tools/perf/util/stat-display.c:985:18 #5 0x558265e51931 in print_counter tools/perf/util/stat-display.c:1110:3 #6 0x558265e51931 in evlist__print_counters tools/perf/util/stat-display.c:1571:5 #7 0x558265c8ec87 in print_counters tools/perf/builtin-stat.c:981:2 #8 0x558265c8cc71 in cmd_stat tools/perf/builtin-stat.c:2837:3 grate-driver#9 0x558265bb9bd4 in run_builtin tools/perf/perf.c:323:11 grate-driver#10 0x558265bb98eb in handle_internal_command tools/perf/perf.c:377:8 grate-driver#11 0x558265bb9389 in run_argv tools/perf/perf.c:421:2 grate-driver#12 0x558265bb9389 in main tools/perf/perf.c:537:3 ``` The issue was the use of testing a cpumap with NULL rather than using empty, as a map containing the dummy value isn't NULL and the -1 results in an empty aggr map being allocated which legitimately overflows when any member is accessed. Fixes: 8a96f45 ("perf stat: Avoid SEGV if core.cpus isn't set") Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
okias
pushed a commit
that referenced
this pull request
Jan 7, 2024
[ Upstream commit ede72dc ] Fuzzing found that an invalid tracepoint name would create a memory leak with an address sanitizer build: ``` $ perf stat -e '*:o/' true event syntax error: '*:o/' \___ parser error Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events ================================================================= ==59380==ERROR: LeakSanitizer: detected memory leaks Direct leak of 4 byte(s) in 2 object(s) allocated from: #0 0x7f38ac07077b in __interceptor_strdup ../../../../src/libsanitizer/asan/asan_interceptors.cpp:439 #1 0x55f2f41be73b in str util/parse-events.l:49 #2 0x55f2f41d08e8 in parse_events_lex util/parse-events.l:338 #3 0x55f2f41dc3b1 in parse_events_parse util/parse-events-bison.c:1464 #4 0x55f2f410b8b3 in parse_events__scanner util/parse-events.c:1822 #5 0x55f2f410d1b9 in __parse_events util/parse-events.c:2094 #6 0x55f2f410e57f in parse_events_option util/parse-events.c:2279 #7 0x55f2f4427b56 in get_value tools/lib/subcmd/parse-options.c:251 #8 0x55f2f4428d98 in parse_short_opt tools/lib/subcmd/parse-options.c:351 grate-driver#9 0x55f2f4429d80 in parse_options_step tools/lib/subcmd/parse-options.c:539 grate-driver#10 0x55f2f442acb9 in parse_options_subcommand tools/lib/subcmd/parse-options.c:654 grate-driver#11 0x55f2f3ec99fc in cmd_stat tools/perf/builtin-stat.c:2501 grate-driver#12 0x55f2f4093289 in run_builtin tools/perf/perf.c:322 grate-driver#13 0x55f2f40937f5 in handle_internal_command tools/perf/perf.c:375 grate-driver#14 0x55f2f4093bbd in run_argv tools/perf/perf.c:419 grate-driver#15 0x55f2f409412b in main tools/perf/perf.c:535 SUMMARY: AddressSanitizer: 4 byte(s) leaked in 2 allocation(s). ``` Fix by adding the missing destructor. Fixes: 865582c ("perf tools: Adds the tracepoint name parsing support") Signed-off-by: Ian Rogers <[email protected]> Cc: He Kuang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
okias
pushed a commit
that referenced
this pull request
Jan 7, 2024
commit 1a5352a upstream. The ath11k active pdevs are protected by RCU but the temperature event handling code calling ath11k_mac_get_ar_by_pdev_id() was not marked as a read-side critical section as reported by RCU lockdep: ============================= WARNING: suspicious RCU usage 6.6.0-rc6 #7 Not tainted ----------------------------- drivers/net/wireless/ath/ath11k/mac.c:638 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 no locks held by swapper/0/0. ... Call trace: ... lockdep_rcu_suspicious+0x16c/0x22c ath11k_mac_get_ar_by_pdev_id+0x194/0x1b0 [ath11k] ath11k_wmi_tlv_op_rx+0xa84/0x2c1c [ath11k] ath11k_htc_rx_completion_handler+0x388/0x510 [ath11k] Mark the code in question as an RCU read-side critical section to avoid any potential use-after-free issues. Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23 Fixes: a41d103 ("ath11k: add thermal sensor device support") Cc: [email protected] # 5.7 Signed-off-by: Johan Hovold <[email protected]> Acked-by: Jeff Johnson <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
okias
pushed a commit
that referenced
this pull request
Jan 7, 2024
commit d8b90d6 upstream. When scanning namespaces, it is possible to get valid data from the first call to nvme_identify_ns() in nvme_alloc_ns(), but not from the second call in nvme_update_ns_info_block(). In particular, if the NSID becomes inactive between the two commands, a storage device may return a buffer filled with zero as per 4.1.5.1. In this case, we can get a kernel crash due to a divide-by-zero in blk_stack_limits() because ns->lba_shift will be set to zero. PID: 326 TASK: ffff95fec3cd8000 CPU: 29 COMMAND: "kworker/u98:10" #0 [ffffad8f8702f9e0] machine_kexec at ffffffff91c76ec7 #1 [ffffad8f8702fa38] __crash_kexec at ffffffff91dea4fa #2 [ffffad8f8702faf8] crash_kexec at ffffffff91deb788 #3 [ffffad8f8702fb00] oops_end at ffffffff91c2e4bb #4 [ffffad8f8702fb20] do_trap at ffffffff91c2a4ce #5 [ffffad8f8702fb70] do_error_trap at ffffffff91c2a595 #6 [ffffad8f8702fbb0] exc_divide_error at ffffffff928506e6 #7 [ffffad8f8702fbd0] asm_exc_divide_error at ffffffff92a00926 [exception RIP: blk_stack_limits+434] RIP: ffffffff92191872 RSP: ffffad8f8702fc80 RFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff95efa0c91800 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001 RBP: 00000000ffffffff R8: ffff95fec7df35a8 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ffff95fed33c09a8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ffffad8f8702fce0] nvme_update_ns_info_block at ffffffffc06d3533 [nvme_core] grate-driver#9 [ffffad8f8702fd18] nvme_scan_ns at ffffffffc06d6fa7 [nvme_core] This happened when the check for valid data was moved out of nvme_identify_ns() into one of the callers. Fix this by checking in both callers. Link: https://bugzilla.kernel.org/show_bug.cgi?id=218186 Fixes: 0dd6fff ("nvme: bring back auto-removal of deleted namespaces during sequential scan") Cc: [email protected] Signed-off-by: Ewan D. Milne <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
okias
pushed a commit
that referenced
this pull request
Jan 7, 2024
[ Upstream commit e3e82fc ] When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be dereferenced as wrong struct in irdma_free_pending_cqp_request(). PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0" #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34 #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2 #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f #3 [fffffe0000549eb8] do_nmi at ffffffff81079582 #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4 [exception RIP: native_queued_spin_lock_slowpath+1291] RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046 RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0 RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000 R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 -- <NMI exception stack> -- #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4 #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363 #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma] grate-driver#9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma] grate-driver#10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma] grate-driver#11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma] grate-driver#12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb grate-driver#13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6 grate-driver#14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278 grate-driver#15 [ffff88aa841efb88] device_del at ffffffff82179d23 grate-driver#16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice] grate-driver#17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice] grate-driver#18 [ffff88aa841efde8] process_one_work at ffffffff811c589a grate-driver#19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff grate-driver#20 [ffff88aa841eff10] kthread at ffffffff811d87a0 grate-driver#21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/[email protected] Suggested-by: "Ismail, Mustafa" <[email protected]> Signed-off-by: Shifeng Li <[email protected]> Reviewed-by: Shiraz Saleem <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
okias
pushed a commit
that referenced
this pull request
Jan 7, 2024
commit fe2b122 upstream. When working on LED support for r8169 I got the following lockdep warning. Easiest way to prevent this scenario seems to be to take the RTNL lock before the trigger_data lock in set_device_name(). ====================================================== WARNING: possible circular locking dependency detected 6.7.0-rc2-next-20231124+ #2 Not tainted ------------------------------------------------------ bash/383 is trying to acquire lock: ffff888103aa1c68 (&trigger_data->lock){+.+.}-{3:3}, at: netdev_trig_notify+0xec/0x190 [ledtrig_netdev] but task is already holding lock: ffffffff8cddf808 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x12/0x20 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (rtnl_mutex){+.+.}-{3:3}: __mutex_lock+0x9b/0xb50 mutex_lock_nested+0x16/0x20 rtnl_lock+0x12/0x20 set_device_name+0xa9/0x120 [ledtrig_netdev] netdev_trig_activate+0x1a1/0x230 [ledtrig_netdev] led_trigger_set+0x172/0x2c0 led_trigger_write+0xf1/0x140 sysfs_kf_bin_write+0x5d/0x80 kernfs_fop_write_iter+0x15d/0x210 vfs_write+0x1f0/0x510 ksys_write+0x6c/0xf0 __x64_sys_write+0x14/0x20 do_syscall_64+0x3f/0xf0 entry_SYSCALL_64_after_hwframe+0x6c/0x74 -> #0 (&trigger_data->lock){+.+.}-{3:3}: __lock_acquire+0x1459/0x25a0 lock_acquire+0xc8/0x2d0 __mutex_lock+0x9b/0xb50 mutex_lock_nested+0x16/0x20 netdev_trig_notify+0xec/0x190 [ledtrig_netdev] call_netdevice_register_net_notifiers+0x5a/0x100 register_netdevice_notifier+0x85/0x120 netdev_trig_activate+0x1d4/0x230 [ledtrig_netdev] led_trigger_set+0x172/0x2c0 led_trigger_write+0xf1/0x140 sysfs_kf_bin_write+0x5d/0x80 kernfs_fop_write_iter+0x15d/0x210 vfs_write+0x1f0/0x510 ksys_write+0x6c/0xf0 __x64_sys_write+0x14/0x20 do_syscall_64+0x3f/0xf0 entry_SYSCALL_64_after_hwframe+0x6c/0x74 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(rtnl_mutex); lock(&trigger_data->lock); lock(rtnl_mutex); lock(&trigger_data->lock); *** DEADLOCK *** 8 locks held by bash/383: #0: ffff888103ff33f0 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x6c/0xf0 #1: ffff888103aa1e88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x114/0x210 #2: ffff8881036f1890 (kn->active#82){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x11d/0x210 #3: ffff888108e2c358 (&led_cdev->led_access){+.+.}-{3:3}, at: led_trigger_write+0x30/0x140 #4: ffffffff8cdd9e10 (triggers_list_lock){++++}-{3:3}, at: led_trigger_write+0x75/0x140 #5: ffff888108e2c270 (&led_cdev->trigger_lock){++++}-{3:3}, at: led_trigger_write+0xe3/0x140 #6: ffffffff8cdde3d0 (pernet_ops_rwsem){++++}-{3:3}, at: register_netdevice_notifier+0x1c/0x120 #7: ffffffff8cddf808 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x12/0x20 stack backtrace: CPU: 0 PID: 383 Comm: bash Not tainted 6.7.0-rc2-next-20231124+ #2 Hardware name: Default string Default string/Default string, BIOS ADLN.M6.SODIMM.ZB.CY.015 08/08/2023 Call Trace: <TASK> dump_stack_lvl+0x5c/0xd0 dump_stack+0x10/0x20 print_circular_bug+0x2dd/0x410 check_noncircular+0x131/0x150 __lock_acquire+0x1459/0x25a0 lock_acquire+0xc8/0x2d0 ? netdev_trig_notify+0xec/0x190 [ledtrig_netdev] __mutex_lock+0x9b/0xb50 ? netdev_trig_notify+0xec/0x190 [ledtrig_netdev] ? __this_cpu_preempt_check+0x13/0x20 ? netdev_trig_notify+0xec/0x190 [ledtrig_netdev] ? __cancel_work_timer+0x11c/0x1b0 ? __mutex_lock+0x123/0xb50 mutex_lock_nested+0x16/0x20 ? mutex_lock_nested+0x16/0x20 netdev_trig_notify+0xec/0x190 [ledtrig_netdev] call_netdevice_register_net_notifiers+0x5a/0x100 register_netdevice_notifier+0x85/0x120 netdev_trig_activate+0x1d4/0x230 [ledtrig_netdev] led_trigger_set+0x172/0x2c0 ? preempt_count_add+0x49/0xc0 led_trigger_write+0xf1/0x140 sysfs_kf_bin_write+0x5d/0x80 kernfs_fop_write_iter+0x15d/0x210 vfs_write+0x1f0/0x510 ksys_write+0x6c/0xf0 __x64_sys_write+0x14/0x20 do_syscall_64+0x3f/0xf0 entry_SYSCALL_64_after_hwframe+0x6c/0x74 RIP: 0033:0x7f269055d034 Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d 35 c3 0d 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48 RSP: 002b:00007ffddb7ef748 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f269055d034 RDX: 0000000000000007 RSI: 000055bf5f4af3c0 RDI: 0000000000000001 RBP: 000055bf5f4af3c0 R08: 0000000000000073 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000007 R13: 00007f26906325c0 R14: 00007f269062ff20 R15: 0000000000000000 </TASK> Fixes: d5e0126 ("leds: trigger: netdev: add additional specific link speed mode") Cc: [email protected] Signed-off-by: Heiner Kallweit <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Acked-by: Lee Jones <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Add qcom,pm8921-keypad based keypad to support vol_up and vol_down keys
Signed-off-by: Ivan Belokobylskiy [email protected]