Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standing pull request from jadonk #4

Closed
wants to merge 35 commits into from
Closed

Conversation

jadonk
Copy link
Member

@jadonk jadonk commented Aug 27, 2014

Please don't pull this request. I'm using this to test the ability for http://builds.beagleboard.org to build pull requests.

cape-universal-00A0.dts: added

Imported from https://github.com/cdsteinkuehler/beaglebone-universal-io
Commit-id: 829efbf1594ab60f16dccce3e0bc86757f179683

RobertCNelson and others added 30 commits August 25, 2014 17:05
Signed-off-by: Robert Nelson <[email protected]>
Signed-off-by: Robert Nelson <[email protected]>
(split into pinmux and cape patches rcn-ee)

Signed-off-by: John Syn <[email protected]>
Signed-off-by: Robert Nelson <[email protected]>
Signed-off-by: Robert Nelson <[email protected]>
Signed-off-by: Robert Nelson <[email protected]>
Rewritten using includes, v3.16.1

Signed-off-by: Dave Lambert <[email protected]>
Signed-off-by: Robert Nelson <[email protected]>
Signed-off-by: Robert Nelson <[email protected]>
Signed-off-by: Robert Nelson <[email protected]>
Signed-off-by: Robert Nelson <[email protected]>
Signed-off-by: Robert Nelson <[email protected]>
Signed-off-by: Robert Nelson <[email protected]>
#git://git.ti.com/ti-cm3-pm-firmware/amx3-cm3.git
#fb0117edd5810a8d3bd9b1cd8abe34e12ff2d0ba

Signed-off-by: Robert Nelson <[email protected]>
Signed-off-by: Robert Nelson <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Dec 17, 2021
[ Upstream commit 8f96a5b ]

We update the ctime/mtime of a block device when we remove it so that
blkid knows the device changed.  However we do this by re-opening the
block device and calling filp_update_time.  This is more correct because
it'll call the inode->i_op->update_time if it exists, but the block dev
inodes do not do this.  Instead call generic_update_time() on the
bd_inode in order to avoid the blkdev_open path and get rid of the
following lockdep splat:

======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc2+ #406 Not tainted
------------------------------------------------------
losetup/11596 is trying to acquire lock:
ffff939640d2f538 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0

but task is already holding lock:
ffff939655510c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (&lo->lo_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7d/0x750
       lo_open+0x28/0x60 [loop]
       blkdev_get_whole+0x25/0xf0
       blkdev_get_by_dev.part.0+0x168/0x3c0
       blkdev_open+0xd2/0xe0
       do_dentry_open+0x161/0x390
       path_openat+0x3cc/0xa20
       do_filp_open+0x96/0x120
       do_sys_openat2+0x7b/0x130
       __x64_sys_openat+0x46/0x70
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #3 (&disk->open_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7d/0x750
       blkdev_get_by_dev.part.0+0x56/0x3c0
       blkdev_open+0xd2/0xe0
       do_dentry_open+0x161/0x390
       path_openat+0x3cc/0xa20
       do_filp_open+0x96/0x120
       file_open_name+0xc7/0x170
       filp_open+0x2c/0x50
       btrfs_scratch_superblocks.part.0+0x10f/0x170
       btrfs_rm_device.cold+0xe8/0xed
       btrfs_ioctl+0x2a31/0x2e70
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #2 (sb_writers#12){.+.+}-{0:0}:
       lo_write_bvec+0xc2/0x240 [loop]
       loop_process_work+0x238/0xd00 [loop]
       process_one_work+0x26b/0x560
       worker_thread+0x55/0x3c0
       kthread+0x140/0x160
       ret_from_fork+0x1f/0x30

-> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
       process_one_work+0x245/0x560
       worker_thread+0x55/0x3c0
       kthread+0x140/0x160
       ret_from_fork+0x1f/0x30

-> #0 ((wq_completion)loop0){+.+.}-{0:0}:
       __lock_acquire+0x10ea/0x1d90
       lock_acquire+0xb5/0x2b0
       flush_workqueue+0x91/0x5e0
       drain_workqueue+0xa0/0x110
       destroy_workqueue+0x36/0x250
       __loop_clr_fd+0x9a/0x660 [loop]
       block_ioctl+0x3f/0x50
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:

Chain exists of:
  (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&lo->lo_mutex);
                               lock(&disk->open_mutex);
                               lock(&lo->lo_mutex);
  lock((wq_completion)loop0);

 *** DEADLOCK ***

1 lock held by losetup/11596:
 #0: ffff939655510c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]

stack backtrace:
CPU: 1 PID: 11596 Comm: losetup Not tainted 5.14.0-rc2+ #406
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack_lvl+0x57/0x72
 check_noncircular+0xcf/0xf0
 ? stack_trace_save+0x3b/0x50
 __lock_acquire+0x10ea/0x1d90
 lock_acquire+0xb5/0x2b0
 ? flush_workqueue+0x67/0x5e0
 ? lockdep_init_map_type+0x47/0x220
 flush_workqueue+0x91/0x5e0
 ? flush_workqueue+0x67/0x5e0
 ? verify_cpu+0xf0/0x100
 drain_workqueue+0xa0/0x110
 destroy_workqueue+0x36/0x250
 __loop_clr_fd+0x9a/0x660 [loop]
 ? blkdev_ioctl+0x8d/0x2a0
 block_ioctl+0x3f/0x50
 __x64_sys_ioctl+0x80/0xb0
 do_syscall_64+0x38/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Reviewed-by: Anand Jain <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Dec 17, 2021
[ Upstream commit 8ef9dc0 ]

We got the following lockdep splat while running fstests (specifically
btrfs/003 and btrfs/020 in a row) with the new rc.  This was uncovered
by 87579e9 ("loop: use worker per cgroup instead of kworker") which
converted loop to using workqueues, which comes with lockdep
annotations that don't exist with kworkers.  The lockdep splat is as
follows:

  WARNING: possible circular locking dependency detected
  5.14.0-rc2-custom+ #34 Not tainted
  ------------------------------------------------------
  losetup/156417 is trying to acquire lock:
  ffff9c7645b02d38 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x84/0x600

  but task is already holding lock:
  ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #5 (&lo->lo_mutex){+.+.}-{3:3}:
	 __mutex_lock+0xba/0x7c0
	 lo_open+0x28/0x60 [loop]
	 blkdev_get_whole+0x28/0xf0
	 blkdev_get_by_dev.part.0+0x168/0x3c0
	 blkdev_open+0xd2/0xe0
	 do_dentry_open+0x163/0x3a0
	 path_openat+0x74d/0xa40
	 do_filp_open+0x9c/0x140
	 do_sys_openat2+0xb1/0x170
	 __x64_sys_openat+0x54/0x90
	 do_syscall_64+0x3b/0x90
	 entry_SYSCALL_64_after_hwframe+0x44/0xae

  -> #4 (&disk->open_mutex){+.+.}-{3:3}:
	 __mutex_lock+0xba/0x7c0
	 blkdev_get_by_dev.part.0+0xd1/0x3c0
	 blkdev_get_by_path+0xc0/0xd0
	 btrfs_scan_one_device+0x52/0x1f0 [btrfs]
	 btrfs_control_ioctl+0xac/0x170 [btrfs]
	 __x64_sys_ioctl+0x83/0xb0
	 do_syscall_64+0x3b/0x90
	 entry_SYSCALL_64_after_hwframe+0x44/0xae

  -> #3 (uuid_mutex){+.+.}-{3:3}:
	 __mutex_lock+0xba/0x7c0
	 btrfs_rm_device+0x48/0x6a0 [btrfs]
	 btrfs_ioctl+0x2d1c/0x3110 [btrfs]
	 __x64_sys_ioctl+0x83/0xb0
	 do_syscall_64+0x3b/0x90
	 entry_SYSCALL_64_after_hwframe+0x44/0xae

  -> #2 (sb_writers#11){.+.+}-{0:0}:
	 lo_write_bvec+0x112/0x290 [loop]
	 loop_process_work+0x25f/0xcb0 [loop]
	 process_one_work+0x28f/0x5d0
	 worker_thread+0x55/0x3c0
	 kthread+0x140/0x170
	 ret_from_fork+0x22/0x30

  -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
	 process_one_work+0x266/0x5d0
	 worker_thread+0x55/0x3c0
	 kthread+0x140/0x170
	 ret_from_fork+0x22/0x30

  -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
	 __lock_acquire+0x1130/0x1dc0
	 lock_acquire+0xf5/0x320
	 flush_workqueue+0xae/0x600
	 drain_workqueue+0xa0/0x110
	 destroy_workqueue+0x36/0x250
	 __loop_clr_fd+0x9a/0x650 [loop]
	 lo_ioctl+0x29d/0x780 [loop]
	 block_ioctl+0x3f/0x50
	 __x64_sys_ioctl+0x83/0xb0
	 do_syscall_64+0x3b/0x90
	 entry_SYSCALL_64_after_hwframe+0x44/0xae

  other info that might help us debug this:
  Chain exists of:
    (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
   Possible unsafe locking scenario:
	 CPU0                    CPU1
	 ----                    ----
    lock(&lo->lo_mutex);
				 lock(&disk->open_mutex);
				 lock(&lo->lo_mutex);
    lock((wq_completion)loop0);

   *** DEADLOCK ***
  1 lock held by losetup/156417:
   #0: ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]

  stack backtrace:
  CPU: 8 PID: 156417 Comm: losetup Not tainted 5.14.0-rc2-custom+ #34
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  Call Trace:
   dump_stack_lvl+0x57/0x72
   check_noncircular+0x10a/0x120
   __lock_acquire+0x1130/0x1dc0
   lock_acquire+0xf5/0x320
   ? flush_workqueue+0x84/0x600
   flush_workqueue+0xae/0x600
   ? flush_workqueue+0x84/0x600
   drain_workqueue+0xa0/0x110
   destroy_workqueue+0x36/0x250
   __loop_clr_fd+0x9a/0x650 [loop]
   lo_ioctl+0x29d/0x780 [loop]
   ? __lock_acquire+0x3a0/0x1dc0
   ? update_dl_rq_load_avg+0x152/0x360
   ? lock_is_held_type+0xa5/0x120
   ? find_held_lock.constprop.0+0x2b/0x80
   block_ioctl+0x3f/0x50
   __x64_sys_ioctl+0x83/0xb0
   do_syscall_64+0x3b/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x7f645884de6b

Usually the uuid_mutex exists to protect the fs_devices that map
together all of the devices that match a specific uuid.  In rm_device
we're messing with the uuid of a device, so it makes sense to protect
that here.

However in doing that it pulls in a whole host of lockdep dependencies,
as we call mnt_may_write() on the sb before we grab the uuid_mutex, thus
we end up with the dependency chain under the uuid_mutex being added
under the normal sb write dependency chain, which causes problems with
loop devices.

We don't need the uuid mutex here however.  If we call
btrfs_scan_one_device() before we scratch the super block we will find
the fs_devices and not find the device itself and return EBUSY because
the fs_devices is open.  If we call it after the scratch happens it will
not appear to be a valid btrfs file system.

We do not need to worry about other fs_devices modifying operations here
because we're protected by the exclusive operations locking.

So drop the uuid_mutex here in order to fix the lockdep splat.

A more detailed explanation from the discussion:

We are worried about rm and scan racing with each other, before this
change we'll zero the device out under the UUID mutex so when scan does
run it'll make sure that it can go through the whole device scan thing
without rm messing with us.

We aren't worried if the scratch happens first, because the result is we
don't think this is a btrfs device and we bail out.

The only case we are concerned with is we scratch _after_ scan is able
to read the superblock and gets a seemingly valid super block, so lets
consider this case.

Scan will call device_list_add() with the device we're removing.  We'll
call find_fsid_with_metadata_uuid() and get our fs_devices for this
UUID.  At this point we lock the fs_devices->device_list_mutex.  This is
what protects us in this case, but we have two cases here.

1. We aren't to the device removal part of the RM.  We found our device,
   and device name matches our path, we go down and we set total_devices
   to our super number of devices, which doesn't affect anything because
   we haven't done the remove yet.

2. We are past the device removal part, which is protected by the
   device_list_mutex.  Scan doesn't find the device, it goes down and
   does the

   if (fs_devices->opened)
	   return -EBUSY;

   check and we bail out.

Nothing about this situation is ideal, but the lockdep splat is real,
and the fix is safe, tho admittedly a bit scary looking.

Reviewed-by: Anand Jain <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
[ copy more from the discussion ]
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Dec 17, 2021
[ Upstream commit f0caea8 ]

Olga reports seeing the following Oops when doing O_DIRECT writes to a
pNFS flexfiles server:

Oops: 0000 [#1] SMP PTI
CPU: 1 PID: 234186 Comm: kworker/u8:1 Not tainted 5.15.0-rc4+ #4
Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014
Workqueue: nfsiod rpc_async_release [sunrpc]
RIP: 0010:nfs_mark_request_commit+0x12/0x30 [nfs]
Code: ff ff be 03 00 00 00 e8 ac 34 83 eb e9 29 ff ff
ff e8 22 bc d7 eb 66 90 0f 1f 44 00 00 48 85 f6 74 16 48 8b 42 10 48
8b 40 18 <48> 8b 40 18 48 85 c0 74 05 e9 70 fc 15 ec 48 89 d6 e9 68 ed
ff ff
RSP: 0018:ffffa82f0159fe00 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8f3393141880 RCX: 0000000000000000
RDX: ffffa82f0159fe08 RSI: ffff8f3381252500 RDI: ffff8f3393141880
RBP: ffff8f33ac317c00 R08: 0000000000000000 R09: ffff8f3487724cb0
R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000001
R13: ffff8f3485bccee0 R14: ffff8f33ac317c10 R15: ffff8f33ac317cd8
FS:  0000000000000000(0000) GS:ffff8f34fbc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000018 CR3: 0000000122120006 CR4: 0000000000770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 nfs_direct_write_completion+0x13b/0x250 [nfs]
 rpc_free_task+0x39/0x60 [sunrpc]
 rpc_async_release+0x29/0x40 [sunrpc]
 process_one_work+0x1ce/0x370
 worker_thread+0x30/0x380
 ? process_one_work+0x370/0x370
 kthread+0x11a/0x140
 ? set_kthread_struct+0x40/0x40
 ret_from_fork+0x22/0x30

Reported-by: Olga Kornievskaia <[email protected]>
Fixes: 9c455a8 ("NFS/pNFS: Clean up pNFS commit operations")
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Jan 12, 2022
commit f35838a upstream.

Line 1169 (#3) allocates a memory chunk for victim_name by kmalloc(),
but  when the function returns in line 1184 (#4) victim_name allocated
by line 1169 (#3) is not freed, which will lead to a memory leak.
There is a similar snippet of code in this function as allocating a memory
chunk for victim_name in line 1104 (#1) as well as releasing the memory
in line 1116 (#2).

We should kfree() victim_name when the return value of backref_in_log()
is less than zero and before the function returns in line 1184 (#4).

1057 static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
1058 				  struct btrfs_root *root,
1059 				  struct btrfs_path *path,
1060 				  struct btrfs_root *log_root,
1061 				  struct btrfs_inode *dir,
1062 				  struct btrfs_inode *inode,
1063 				  u64 inode_objectid, u64 parent_objectid,
1064 				  u64 ref_index, char *name, int namelen,
1065 				  int *search_done)
1066 {

1104 	victim_name = kmalloc(victim_name_len, GFP_NOFS);
	// #1: kmalloc (victim_name-1)
1105 	if (!victim_name)
1106 		return -ENOMEM;

1112	ret = backref_in_log(log_root, &search_key,
1113			parent_objectid, victim_name,
1114			victim_name_len);
1115	if (ret < 0) {
1116		kfree(victim_name); // #2: kfree (victim_name-1)
1117		return ret;
1118	} else if (!ret) {

1169 	victim_name = kmalloc(victim_name_len, GFP_NOFS);
	// #3: kmalloc (victim_name-2)
1170 	if (!victim_name)
1171 		return -ENOMEM;

1180 	ret = backref_in_log(log_root, &search_key,
1181 			parent_objectid, victim_name,
1182 			victim_name_len);
1183 	if (ret < 0) {
1184 		return ret; // #4: missing kfree (victim_name-2)
1185 	} else if (!ret) {

1241 	return 0;
1242 }

Fixes: d3316c8 ("btrfs: Properly handle backref_in_log retval")
CC: [email protected] # 5.10+
Reviewed-by: Qu Wenruo <[email protected]>
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Jianglei Nie <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Feb 10, 2022
icss_iep_perout_enable_hw() is invoked with spin_lock_irqsave() and
hence cannot use sleeping lock. Enable regmap fast_io for
iep_regmap_config so that it will use spinlock instead of mutex.

This fixes the following dump
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:947
in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 1103, name: sh
5 locks held by sh/1103:
 #0: ffff00002b814438 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x6c/0xf8
 #1: ffff0000322ae888 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf0/0x1b8
 #2: ffff0000744837d0 (kn->active#158){++++}-{0:0}, at: kernfs_fop_write_iter+0xf8/0x1b8
 #3: ffff0000344401b8 (&iep->ptp_clk_mutex){+.+.}-{3:3}, at: icss_iep_ptp_enable+0x168/0x2f0 [icss_iep]
 #4: ffff0000344401f8 (&iep->irq_lock){....}-{2:2}, at: icss_iep_ptp_enable+0x19c/0x2f0 [icss_iep]
irq event stamp: 92858
hardirqs last  enabled at (92857): [<ffff800010cbefa0>] _raw_spin_unlock_irqrestore+0x80/0x98
hardirqs last disabled at (92858): [<ffff800010cbf660>] _raw_spin_lock_irqsave+0xb0/0x144
softirqs last  enabled at (92658): [<ffff80001001062c>] efi_header_end+0x62c/0x6b0
softirqs last disabled at (92645): [<ffff80001005993c>] irq_exit+0x1c4/0x1d8
Preemption disabled at:
[<ffff800009491484>] icss_iep_ptp_enable+0x19c/0x2f0 [icss_iep]
CPU: 1 PID: 1103 Comm: sh Not tainted 5.10.65-08510-g2bea885230fd-dirty #204
Hardware name: Texas Instruments AM642 EVM (DT)
Call trace:
 dump_backtrace+0x0/0x1d0
 show_stack+0x18/0x28
 dump_stack+0xec/0x154
 ___might_sleep+0x194/0x240
 __might_sleep+0x50/0x88
 __mutex_lock+0x5c/0x900
 mutex_lock_nested+0x34/0x50
 regmap_lock_mutex+0x14/0x20
 regmap_write+0x3c/0x78
 icss_iep_perout_enable_hw+0xd0/0x2c0 [icss_iep]
 icss_iep_ptp_enable+0x2b4/0x2f0 [icss_iep]
 pps_enable_store+0xc0/0xe0
 dev_attr_store+0x18/0x30
 sysfs_kf_write+0x4c/0x78
 kernfs_fop_write_iter+0x120/0x1b8
 new_sync_write+0xe8/0x188
 vfs_write+0x2ac/0x450
 ksys_write+0x6c/0xf8
 __arm64_sys_write+0x1c/0x28
 el0_svc_common.constprop.0+0x7c/0x1f0
 do_el0_svc+0x24/0x90
 el0_svc+0x20/0x30
 el0_sync_handler+0xb0/0xb8
 el0_sync+0x180/0x1c0

=============================
[ BUG: Invalid wait context ]
5.10.65-08510-g2bea885230fd-dirty #204 Tainted: G        W
-----------------------------
sh/1103 is trying to lock:
ffff000034440468 (icss_iep:959:(iep->plat_data->config)->lock){+.+.}-{3:3}, at: regmap_lock_mutex+0x14/0x20
other info that might help us debug this:
context-{4:4}
5 locks held by sh/1103:
 #0: ffff00002b814438 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x6c/0xf8
 #1: ffff0000322ae888 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf0/0x1b8
 #2: ffff0000744837d0 (kn->active#158){++++}-{0:0}, at: kernfs_fop_write_iter+0xf8/0x1b8
 #3: ffff0000344401b8 (&iep->ptp_clk_mutex){+.+.}-{3:3}, at: icss_iep_ptp_enable+0x168/0x2f0 [icss_iep]
 #4: ffff0000344401f8 (&iep->irq_lock){....}-{2:2}, at: icss_iep_ptp_enable+0x19c/0x2f0 [icss_iep]
stack backtrace:
CPU: 1 PID: 1103 Comm: sh Tainted: G        W         5.10.65-08510-g2bea885230fd-dirty #204
Hardware name: Texas Instruments AM642 EVM (DT)
Call trace:
 dump_backtrace+0x0/0x1d0
 show_stack+0x18/0x28
 dump_stack+0xec/0x154
 __lock_acquire+0x1d38/0x1d60
 lock_acquire+0x154/0x410
 __mutex_lock+0x9c/0x900
 mutex_lock_nested+0x34/0x50
 regmap_lock_mutex+0x14/0x20
 regmap_write+0x3c/0x78
 icss_iep_perout_enable_hw+0xd0/0x2c0 [icss_iep]
 icss_iep_ptp_enable+0x2b4/0x2f0 [icss_iep]
 pps_enable_store+0xc0/0xe0
 dev_attr_store+0x18/0x30
 sysfs_kf_write+0x4c/0x78
 kernfs_fop_write_iter+0x120/0x1b8
 new_sync_write+0xe8/0x188
 vfs_write+0x2ac/0x450
 ksys_write+0x6c/0xf8
 __arm64_sys_write+0x1c/0x28
 el0_svc_common.constprop.0+0x7c/0x1f0
 do_el0_svc+0x24/0x90
 el0_svc+0x20/0x30
 el0_sync_handler+0xb0/0xb8
 el0_sync+0x180/0x1c0

Fixes: 5d58df8 ("net: ethernet: ti: icss_iep: fix pps irq race vs pps disable")
Signed-off-by: Kishon Vijay Abraham I <[email protected]>
Signed-off-by: Vignesh Raghavendra <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Feb 16, 2022
commit 232796d upstream.

When enabling quotas, we attempt to commit a transaction while holding the
mutex fs_info->qgroup_ioctl_lock. This can result on a deadlock with other
quota operations such as:

- qgroup creation and deletion, ioctl BTRFS_IOC_QGROUP_CREATE;

- adding and removing qgroup relations, ioctl BTRFS_IOC_QGROUP_ASSIGN.

This is because these operations join a transaction and after that they
attempt to lock the mutex fs_info->qgroup_ioctl_lock. Acquiring that mutex
after joining or starting a transaction is a pattern followed everywhere
in qgroups, so the quota enablement operation is the one at fault here,
and should not commit a transaction while holding that mutex.

Fix this by making the transaction commit while not holding the mutex.
We are safe from two concurrent tasks trying to enable quotas because
we are serialized by the rw semaphore fs_info->subvol_sem at
btrfs_ioctl_quota_ctl(), which is the only call site for enabling
quotas.

When this deadlock happens, it produces a trace like the following:

  INFO: task syz-executor:25604 blocked for more than 143 seconds.
  Not tainted 5.15.0-rc6 #4
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:syz-executor state:D stack:24800 pid:25604 ppid: 24873 flags:0x00004004
  Call Trace:
  context_switch kernel/sched/core.c:4940 [inline]
  __schedule+0xcd9/0x2530 kernel/sched/core.c:6287
  schedule+0xd3/0x270 kernel/sched/core.c:6366
  btrfs_commit_transaction+0x994/0x2e90 fs/btrfs/transaction.c:2201
  btrfs_quota_enable+0x95c/0x1790 fs/btrfs/qgroup.c:1120
  btrfs_ioctl_quota_ctl fs/btrfs/ioctl.c:4229 [inline]
  btrfs_ioctl+0x637e/0x7b70 fs/btrfs/ioctl.c:5010
  vfs_ioctl fs/ioctl.c:51 [inline]
  __do_sys_ioctl fs/ioctl.c:874 [inline]
  __se_sys_ioctl fs/ioctl.c:860 [inline]
  __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x7f86920b2c4d
  RSP: 002b:00007f868f61ac58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007f86921d90a0 RCX: 00007f86920b2c4d
  RDX: 0000000020005e40 RSI: 00000000c0109428 RDI: 0000000000000008
  RBP: 00007f869212bd80 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00007f86921d90a0
  R13: 00007fff6d233e4f R14: 00007fff6d233ff0 R15: 00007f868f61adc0
  INFO: task syz-executor:25628 blocked for more than 143 seconds.
  Not tainted 5.15.0-rc6 #4
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:syz-executor state:D stack:29080 pid:25628 ppid: 24873 flags:0x00004004
  Call Trace:
  context_switch kernel/sched/core.c:4940 [inline]
  __schedule+0xcd9/0x2530 kernel/sched/core.c:6287
  schedule+0xd3/0x270 kernel/sched/core.c:6366
  schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6425
  __mutex_lock_common kernel/locking/mutex.c:669 [inline]
  __mutex_lock+0xc96/0x1680 kernel/locking/mutex.c:729
  btrfs_remove_qgroup+0xb7/0x7d0 fs/btrfs/qgroup.c:1548
  btrfs_ioctl_qgroup_create fs/btrfs/ioctl.c:4333 [inline]
  btrfs_ioctl+0x683c/0x7b70 fs/btrfs/ioctl.c:5014
  vfs_ioctl fs/ioctl.c:51 [inline]
  __do_sys_ioctl fs/ioctl.c:874 [inline]
  __se_sys_ioctl fs/ioctl.c:860 [inline]
  __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Reported-by: Hao Sun <[email protected]>
Link: https://lore.kernel.org/linux-btrfs/CACkBjsZQF19bQ1C6=yetF3BvL10OSORpFUcWXTP6HErshDB4dQ@mail.gmail.com/
Fixes: 340f1aa ("btrfs: qgroups: Move transaction management inside btrfs_quota_enable/disable")
CC: [email protected] # 4.19
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Mar 31, 2022
commit b7d6f44 upstream.

Rolf Eike Beer reported the following bug:

[1274934.746891] Bad Address (null pointer deref?): Code=15 (Data TLB miss fault) at addr 0000004140000018
[1274934.746891] CPU: 3 PID: 5549 Comm: cmake Not tainted 5.15.4-gentoo-parisc64 #4
[1274934.746891] Hardware name: 9000/785/C8000
[1274934.746891]
[1274934.746891]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[1274934.746891] PSW: 00001000000001001111111000001110 Not tainted
[1274934.746891] r00-03  000000ff0804fe0e 0000000040bc9bc0 00000000406760e4 0000004140000000
[1274934.746891] r04-07  0000000040b693c0 0000004140000000 000000004a2b08b0 0000000000000001
[1274934.746891] r08-11  0000000041f98810 0000000000000000 000000004a0a7000 0000000000000001
[1274934.746891] r12-15  0000000040bddbc0 0000000040c0cbc0 0000000040bddbc0 0000000040bddbc0
[1274934.746891] r16-19  0000000040bde3c0 0000000040bddbc0 0000000040bde3c0 0000000000000007
[1274934.746891] r20-23  0000000000000006 000000004a368950 0000000000000000 0000000000000001
[1274934.746891] r24-27  0000000000001fff 000000000800000e 000000004a1710f0 0000000040b693c0
[1274934.746891] r28-31  0000000000000001 0000000041f988b0 0000000041f98840 000000004a171118
[1274934.746891] sr00-03  00000000066e5800 0000000000000000 0000000000000000 00000000066e5800
[1274934.746891] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[1274934.746891]
[1274934.746891] IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000406760e8 00000000406760ec
[1274934.746891]  IIR: 48780030    ISR: 0000000000000000  IOR: 0000004140000018
[1274934.746891]  CPU:        3   CR30: 00000040e3a9c000 CR31: ffffffffffffffff
[1274934.746891]  ORIG_R28: 0000000040acdd58
[1274934.746891]  IAOQ[0]: sba_unmap_sg+0xb0/0x118
[1274934.746891]  IAOQ[1]: sba_unmap_sg+0xb4/0x118
[1274934.746891]  RP(r2): sba_unmap_sg+0xac/0x118
[1274934.746891] Backtrace:
[1274934.746891]  [<00000000402740cc>] dma_unmap_sg_attrs+0x6c/0x70
[1274934.746891]  [<000000004074d6bc>] scsi_dma_unmap+0x54/0x60
[1274934.746891]  [<00000000407a3488>] mptscsih_io_done+0x150/0xd70
[1274934.746891]  [<0000000040798600>] mpt_interrupt+0x168/0xa68
[1274934.746891]  [<0000000040255a48>] __handle_irq_event_percpu+0xc8/0x278
[1274934.746891]  [<0000000040255c34>] handle_irq_event_percpu+0x3c/0xd8
[1274934.746891]  [<000000004025ecb4>] handle_percpu_irq+0xb4/0xf0
[1274934.746891]  [<00000000402548e0>] generic_handle_irq+0x50/0x70
[1274934.746891]  [<000000004019a254>] call_on_stack+0x18/0x24
[1274934.746891]
[1274934.746891] Kernel panic - not syncing: Bad Address (null pointer deref?)

The bug is caused by overrunning the sglist and incorrectly testing
sg_dma_len(sglist) before nents. Normally this doesn't cause a crash,
but in this case sglist crossed a page boundary. This occurs in the
following code:

	while (sg_dma_len(sglist) && nents--) {

The fix is simply to test nents first and move the decrement of nents
into the loop.

Reported-by: Rolf Eike Beer <[email protected]>
Signed-off-by: John David Anglin <[email protected]>
Cc: [email protected]
Signed-off-by: Helge Deller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Mar 31, 2022
[ Upstream commit 3d6cc98 ]

When cifs_get_root() fails during cifs_smb3_do_mount() we call
deactivate_locked_super() which eventually will call delayed_free() which
will free the context.
In this situation we should not proceed to enter the out: section in
cifs_smb3_do_mount() and free the same resources a second time.

[Thu Feb 10 12:59:06 2022] BUG: KASAN: use-after-free in rcu_cblist_dequeue+0x32/0x60
[Thu Feb 10 12:59:06 2022] Read of size 8 at addr ffff888364f4d110 by task swapper/1/0

[Thu Feb 10 12:59:06 2022] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G           OE     5.17.0-rc3+ #4
[Thu Feb 10 12:59:06 2022] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 12/17/2019
[Thu Feb 10 12:59:06 2022] Call Trace:
[Thu Feb 10 12:59:06 2022]  <IRQ>
[Thu Feb 10 12:59:06 2022]  dump_stack_lvl+0x5d/0x78
[Thu Feb 10 12:59:06 2022]  print_address_description.constprop.0+0x24/0x150
[Thu Feb 10 12:59:06 2022]  ? rcu_cblist_dequeue+0x32/0x60
[Thu Feb 10 12:59:06 2022]  kasan_report.cold+0x7d/0x117
[Thu Feb 10 12:59:06 2022]  ? rcu_cblist_dequeue+0x32/0x60
[Thu Feb 10 12:59:06 2022]  __asan_load8+0x86/0xa0
[Thu Feb 10 12:59:06 2022]  rcu_cblist_dequeue+0x32/0x60
[Thu Feb 10 12:59:06 2022]  rcu_core+0x547/0xca0
[Thu Feb 10 12:59:06 2022]  ? call_rcu+0x3c0/0x3c0
[Thu Feb 10 12:59:06 2022]  ? __this_cpu_preempt_check+0x13/0x20
[Thu Feb 10 12:59:06 2022]  ? lock_is_held_type+0xea/0x140
[Thu Feb 10 12:59:06 2022]  rcu_core_si+0xe/0x10
[Thu Feb 10 12:59:06 2022]  __do_softirq+0x1d4/0x67b
[Thu Feb 10 12:59:06 2022]  __irq_exit_rcu+0x100/0x150
[Thu Feb 10 12:59:06 2022]  irq_exit_rcu+0xe/0x30
[Thu Feb 10 12:59:06 2022]  sysvec_hyperv_stimer0+0x9d/0xc0
...
[Thu Feb 10 12:59:07 2022] Freed by task 58179:
[Thu Feb 10 12:59:07 2022]  kasan_save_stack+0x26/0x50
[Thu Feb 10 12:59:07 2022]  kasan_set_track+0x25/0x30
[Thu Feb 10 12:59:07 2022]  kasan_set_free_info+0x24/0x40
[Thu Feb 10 12:59:07 2022]  ____kasan_slab_free+0x137/0x170
[Thu Feb 10 12:59:07 2022]  __kasan_slab_free+0x12/0x20
[Thu Feb 10 12:59:07 2022]  slab_free_freelist_hook+0xb3/0x1d0
[Thu Feb 10 12:59:07 2022]  kfree+0xcd/0x520
[Thu Feb 10 12:59:07 2022]  cifs_smb3_do_mount+0x149/0xbe0 [cifs]
[Thu Feb 10 12:59:07 2022]  smb3_get_tree+0x1a0/0x2e0 [cifs]
[Thu Feb 10 12:59:07 2022]  vfs_get_tree+0x52/0x140
[Thu Feb 10 12:59:07 2022]  path_mount+0x635/0x10c0
[Thu Feb 10 12:59:07 2022]  __x64_sys_mount+0x1bf/0x210
[Thu Feb 10 12:59:07 2022]  do_syscall_64+0x5c/0xc0
[Thu Feb 10 12:59:07 2022]  entry_SYSCALL_64_after_hwframe+0x44/0xae

[Thu Feb 10 12:59:07 2022] Last potentially related work creation:
[Thu Feb 10 12:59:07 2022]  kasan_save_stack+0x26/0x50
[Thu Feb 10 12:59:07 2022]  __kasan_record_aux_stack+0xb6/0xc0
[Thu Feb 10 12:59:07 2022]  kasan_record_aux_stack_noalloc+0xb/0x10
[Thu Feb 10 12:59:07 2022]  call_rcu+0x76/0x3c0
[Thu Feb 10 12:59:07 2022]  cifs_umount+0xce/0xe0 [cifs]
[Thu Feb 10 12:59:07 2022]  cifs_kill_sb+0xc8/0xe0 [cifs]
[Thu Feb 10 12:59:07 2022]  deactivate_locked_super+0x5d/0xd0
[Thu Feb 10 12:59:07 2022]  cifs_smb3_do_mount+0xab9/0xbe0 [cifs]
[Thu Feb 10 12:59:07 2022]  smb3_get_tree+0x1a0/0x2e0 [cifs]
[Thu Feb 10 12:59:07 2022]  vfs_get_tree+0x52/0x140
[Thu Feb 10 12:59:07 2022]  path_mount+0x635/0x10c0
[Thu Feb 10 12:59:07 2022]  __x64_sys_mount+0x1bf/0x210
[Thu Feb 10 12:59:07 2022]  do_syscall_64+0x5c/0xc0
[Thu Feb 10 12:59:07 2022]  entry_SYSCALL_64_after_hwframe+0x44/0xae

Reported-by: Shyam Prasad N <[email protected]>
Reviewed-by: Shyam Prasad N <[email protected]>
Signed-off-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Jun 28, 2022
[ Upstream commit 841aee4 ]

Put NVMe/TCP sockets in their own class to avoid some lockdep warnings.
Sockets created by nvme-tcp are not exposed to user-space, and will not
trigger certain code paths that the general socket API exposes.

Lockdep complains about a circular dependency between the socket and
filesystem locks, because setsockopt can trigger a page fault with a
socket lock held, but nvme-tcp sends requests on the socket while file
system locks are held.

  ======================================================
  WARNING: possible circular locking dependency detected
  5.15.0-rc3 #1 Not tainted
  ------------------------------------------------------
  fio/1496 is trying to acquire lock:
  (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendpage+0x23/0x80

  but task is already holding lock:
  (&xfs_dir_ilock_class/5){+.+.}-{3:3}, at: xfs_ilock+0xcf/0x290 [xfs]

  which lock already depends on the new lock.

  other info that might help us debug this:

  chain exists of:
   sk_lock-AF_INET --> sb_internal --> &xfs_dir_ilock_class/5

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&xfs_dir_ilock_class/5);
                                lock(sb_internal);
                                lock(&xfs_dir_ilock_class/5);
   lock(sk_lock-AF_INET);

  *** DEADLOCK ***

  6 locks held by fio/1496:
   #0: (sb_writers#13){.+.+}-{0:0}, at: path_openat+0x9fc/0xa20
   #1: (&inode->i_sb->s_type->i_mutex_dir_key){++++}-{3:3}, at: path_openat+0x296/0xa20
   #2: (sb_internal){.+.+}-{0:0}, at: xfs_trans_alloc_icreate+0x41/0xd0 [xfs]
   #3: (&xfs_dir_ilock_class/5){+.+.}-{3:3}, at: xfs_ilock+0xcf/0x290 [xfs]
   #4: (hctx->srcu){....}-{0:0}, at: hctx_lock+0x51/0xd0
   #5: (&queue->send_mutex){+.+.}-{3:3}, at: nvme_tcp_queue_rq+0x33e/0x380 [nvme_tcp]

This annotation lets lockdep analyze nvme-tcp controlled sockets
independently of what the user-space sockets API does.

Link: https://lore.kernel.org/linux-nvme/CAHj4cs9MDYLJ+q+2_GXUK9HxFizv2pxUryUR0toX974M040z7g@mail.gmail.com/

Signed-off-by: Chris Leech <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Jun 28, 2022
commit a80ced6 upstream.

As guest_irq is coming from KVM_IRQFD API call, it may trigger
crash in svm_update_pi_irte() due to out-of-bounds:

crash> bt
PID: 22218  TASK: ffff951a6ad74980  CPU: 73  COMMAND: "vcpu8"
 #0 [ffffb1ba6707fa40] machine_kexec at ffffffff8565b397
 #1 [ffffb1ba6707fa90] __crash_kexec at ffffffff85788a6d
 #2 [ffffb1ba6707fb58] crash_kexec at ffffffff8578995d
 #3 [ffffb1ba6707fb70] oops_end at ffffffff85623c0d
 #4 [ffffb1ba6707fb90] no_context at ffffffff856692c9
 #5 [ffffb1ba6707fbf8] exc_page_fault at ffffffff85f95b51
 #6 [ffffb1ba6707fc50] asm_exc_page_fault at ffffffff86000ace
    [exception RIP: svm_update_pi_irte+227]
    RIP: ffffffffc0761b53  RSP: ffffb1ba6707fd08  RFLAGS: 00010086
    RAX: ffffb1ba6707fd78  RBX: ffffb1ba66d91000  RCX: 0000000000000001
    RDX: 00003c803f63f1c0  RSI: 000000000000019a  RDI: ffffb1ba66db2ab8
    RBP: 000000000000019a   R8: 0000000000000040   R9: ffff94ca41b82200
    R10: ffffffffffffffcf  R11: 0000000000000001  R12: 0000000000000001
    R13: 0000000000000001  R14: ffffffffffffffcf  R15: 000000000000005f
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffb1ba6707fdb8] kvm_irq_routing_update at ffffffffc09f19a1 [kvm]
 #8 [ffffb1ba6707fde0] kvm_set_irq_routing at ffffffffc09f2133 [kvm]
 #9 [ffffb1ba6707fe18] kvm_vm_ioctl at ffffffffc09ef544 [kvm]
    RIP: 00007f143c36488b  RSP: 00007f143a4e04b8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007f05780041d0  RCX: 00007f143c36488b
    RDX: 00007f05780041d0  RSI: 000000004008ae6a  RDI: 0000000000000020
    RBP: 00000000000004e8   R8: 0000000000000008   R9: 00007f05780041e0
    R10: 00007f0578004560  R11: 0000000000000246  R12: 00000000000004e0
    R13: 000000000000001a  R14: 00007f1424001c60  R15: 00007f0578003bc0
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b

Vmx have been fix this in commit 3a8b067 (KVM: VMX: Do not BUG() on
out-of-bounds guest IRQ), so we can just copy source from that to fix
this.

Co-developed-by: Yi Liu <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Signed-off-by: Yi Wang <[email protected]>
Message-Id: <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Jun 28, 2022
[ Upstream commit 447c799 ]

Noticed the below warning while running a pytorch workload on vega10
GPUs. Change to trylock to avoid conflicts with already held reservation
locks.

[  +0.000003] WARNING: possible recursive locking detected
[  +0.000003] 5.13.0-kfd-rajneesh #1030 Not tainted
[  +0.000004] --------------------------------------------
[  +0.000002] python/4822 is trying to acquire lock:
[  +0.000004] ffff932cd9a259f8 (reservation_ww_class_mutex){+.+.}-{3:3},
at: amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000203]
              but task is already holding lock:
[  +0.000003] ffff932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3},
at: ttm_eu_reserve_buffers+0x270/0x470 [ttm]
[  +0.000017]
              other info that might help us debug this:
[  +0.000002]  Possible unsafe locking scenario:

[  +0.000003]        CPU0
[  +0.000002]        ----
[  +0.000002]   lock(reservation_ww_class_mutex);
[  +0.000004]   lock(reservation_ww_class_mutex);
[  +0.000003]
               *** DEADLOCK ***

[  +0.000002]  May be due to missing lock nesting notation

[  +0.000003] 7 locks held by python/4822:
[  +0.000003]  #0: ffff932c4ac028d0 (&process->mutex){+.+.}-{3:3}, at:
kfd_ioctl_map_memory_to_gpu+0x10b/0x320 [amdgpu]
[  +0.000232]  #1: ffff932c55e830a8 (&info->lock#2){+.+.}-{3:3}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x64/0xf60 [amdgpu]
[  +0.000241]  #2: ffff932cc45b5e68 (&(*mem)->lock){+.+.}-{3:3}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0xdf/0xf60 [amdgpu]
[  +0.000236]  #3: ffffb2b35606fd28
(reservation_ww_class_acquire){+.+.}-{0:0}, at:
amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x232/0xf60 [amdgpu]
[  +0.000235]  #4: ffff932cbb7181f8
(reservation_ww_class_mutex){+.+.}-{3:3}, at:
ttm_eu_reserve_buffers+0x270/0x470 [ttm]
[  +0.000015]  #5: ffffffffc045f700 (*(sspp++)){....}-{0:0}, at:
drm_dev_enter+0x5/0xa0 [drm]
[  +0.000038]  #6: ffff932c52da7078 (&vm->eviction_lock){+.+.}-{3:3},
at: amdgpu_vm_bo_update_mapping+0xd5/0x4f0 [amdgpu]
[  +0.000195]
              stack backtrace:
[  +0.000003] CPU: 11 PID: 4822 Comm: python Not tainted
5.13.0-kfd-rajneesh #1030
[  +0.000005] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F02
08/29/2018
[  +0.000003] Call Trace:
[  +0.000003]  dump_stack+0x6d/0x89
[  +0.000010]  __lock_acquire+0xb93/0x1a90
[  +0.000009]  lock_acquire+0x25d/0x2d0
[  +0.000005]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000184]  ? lock_is_held_type+0xa2/0x110
[  +0.000006]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000184]  __ww_mutex_lock.constprop.17+0xca/0x1060
[  +0.000007]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000183]  ? lock_release+0x13f/0x270
[  +0.000005]  ? lock_is_held_type+0xa2/0x110
[  +0.000006]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000183]  amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
[  +0.000185]  ttm_bo_release+0x4c6/0x580 [ttm]
[  +0.000010]  amdgpu_bo_unref+0x1a/0x30 [amdgpu]
[  +0.000183]  amdgpu_vm_free_table+0x76/0xa0 [amdgpu]
[  +0.000189]  amdgpu_vm_free_pts+0xb8/0xf0 [amdgpu]
[  +0.000189]  amdgpu_vm_update_ptes+0x411/0x770 [amdgpu]
[  +0.000191]  amdgpu_vm_bo_update_mapping+0x324/0x4f0 [amdgpu]
[  +0.000191]  amdgpu_vm_bo_update+0x251/0x610 [amdgpu]
[  +0.000191]  update_gpuvm_pte+0xcc/0x290 [amdgpu]
[  +0.000229]  ? amdgpu_vm_bo_map+0xd7/0x130 [amdgpu]
[  +0.000190]  amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x912/0xf60
[amdgpu]
[  +0.000234]  kfd_ioctl_map_memory_to_gpu+0x182/0x320 [amdgpu]
[  +0.000218]  kfd_ioctl+0x2b9/0x600 [amdgpu]
[  +0.000216]  ? kfd_ioctl_unmap_memory_from_gpu+0x270/0x270 [amdgpu]
[  +0.000216]  ? lock_release+0x13f/0x270
[  +0.000006]  ? __fget_files+0x107/0x1e0
[  +0.000007]  __x64_sys_ioctl+0x8b/0xd0
[  +0.000007]  do_syscall_64+0x36/0x70
[  +0.000004]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  +0.000007] RIP: 0033:0x7fbff90a7317
[  +0.000004] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00
48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
[  +0.000005] RSP: 002b:00007fbe301fe648 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[  +0.000006] RAX: ffffffffffffffda RBX: 00007fbcc402d820 RCX:
00007fbff90a7317
[  +0.000003] RDX: 00007fbe301fe690 RSI: 00000000c0184b18 RDI:
0000000000000004
[  +0.000003] RBP: 00007fbe301fe690 R08: 0000000000000000 R09:
00007fbcc402d880
[  +0.000003] R10: 0000000002001000 R11: 0000000000000246 R12:
00000000c0184b18
[  +0.000003] R13: 0000000000000004 R14: 00007fbf689593a0 R15:
00007fbcc402d820

Cc: Christian König <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Alex Deucher <[email protected]>

Reviewed-by: Christian König <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Rajneesh Bhardwaj <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Jun 28, 2022
[ Upstream commit af68656 ]

While handling PCI errors (AER flow) driver tries to
disable NAPI [napi_disable()] after NAPI is deleted
[__netif_napi_del()] which causes unexpected system
hang/crash.

System message log shows the following:
=======================================
[ 3222.537510] EEH: Detected PCI bus error on PHB#384-PE#800000 [ 3222.537511] EEH: This PCI device has failed 2 times in the last hour and will be permanently disabled after 5 failures.
[ 3222.537512] EEH: Notify device drivers to shutdown [ 3222.537513] EEH: Beginning: 'error_detected(IO frozen)'
[ 3222.537514] EEH: PE#800000 (PCI 0384:80:00.0): Invoking
bnx2x->error_detected(IO frozen)
[ 3222.537516] bnx2x: [bnx2x_io_error_detected:14236(eth14)]IO error detected [ 3222.537650] EEH: PE#800000 (PCI 0384:80:00.0): bnx2x driver reports:
'need reset'
[ 3222.537651] EEH: PE#800000 (PCI 0384:80:00.1): Invoking
bnx2x->error_detected(IO frozen)
[ 3222.537651] bnx2x: [bnx2x_io_error_detected:14236(eth13)]IO error detected [ 3222.537729] EEH: PE#800000 (PCI 0384:80:00.1): bnx2x driver reports:
'need reset'
[ 3222.537729] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset'
[ 3222.537890] EEH: Collect temporary log [ 3222.583481] EEH: of node=0384:80:00.0 [ 3222.583519] EEH: PCI device/vendor: 168e14e4 [ 3222.583557] EEH: PCI cmd/status register: 00100140 [ 3222.583557] EEH: PCI-E capabilities and status follow:
[ 3222.583744] EEH: PCI-E 00: 00020010 012c8da2 00095d5e 00455c82 [ 3222.583892] EEH: PCI-E 10: 10820000 00000000 00000000 00000000 [ 3222.583893] EEH: PCI-E 20: 00000000 [ 3222.583893] EEH: PCI-E AER capability register set follows:
[ 3222.584079] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030 [ 3222.584230] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000 [ 3222.584378] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000 [ 3222.584416] EEH: PCI-E AER 30: 00000000 00000000 [ 3222.584416] EEH: of node=0384:80:00.1 [ 3222.584454] EEH: PCI device/vendor: 168e14e4 [ 3222.584491] EEH: PCI cmd/status register: 00100140 [ 3222.584492] EEH: PCI-E capabilities and status follow:
[ 3222.584677] EEH: PCI-E 00: 00020010 012c8da2 00095d5e 00455c82 [ 3222.584825] EEH: PCI-E 10: 10820000 00000000 00000000 00000000 [ 3222.584826] EEH: PCI-E 20: 00000000 [ 3222.584826] EEH: PCI-E AER capability register set follows:
[ 3222.585011] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030 [ 3222.585160] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000 [ 3222.585309] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000 [ 3222.585347] EEH: PCI-E AER 30: 00000000 00000000 [ 3222.586872] RTAS: event: 5, Type: Platform Error (224), Severity: 2 [ 3222.586873] EEH: Reset without hotplug activity [ 3224.762767] EEH: Beginning: 'slot_reset'
[ 3224.762770] EEH: PE#800000 (PCI 0384:80:00.0): Invoking
bnx2x->slot_reset()
[ 3224.762771] bnx2x: [bnx2x_io_slot_reset:14271(eth14)]IO slot reset initializing...
[ 3224.762887] bnx2x 0384:80:00.0: enabling device (0140 -> 0142) [ 3224.768157] bnx2x: [bnx2x_io_slot_reset:14287(eth14)]IO slot reset
--> driver unload

Uninterruptible tasks
=====================
crash> ps | grep UN
     213      2  11  c000000004c89e00  UN   0.0       0      0  [eehd]
     215      2   0  c000000004c80000  UN   0.0       0      0
[kworker/0:2]
    2196      1  28  c000000004504f00  UN   0.1   15936  11136  wickedd
    4287      1   9  c00000020d076800  UN   0.0    4032   3008  agetty
    4289      1  20  c00000020d056680  UN   0.0    7232   3840  agetty
   32423      2  26  c00000020038c580  UN   0.0       0      0
[kworker/26:3]
   32871   4241  27  c0000002609ddd00  UN   0.1   18624  11648  sshd
   32920  10130  16  c00000027284a100  UN   0.1   48512  12608  sendmail
   33092  32987   0  c000000205218b00  UN   0.1   48512  12608  sendmail
   33154   4567  16  c000000260e51780  UN   0.1   48832  12864  pickup
   33209   4241  36  c000000270cb6500  UN   0.1   18624  11712  sshd
   33473  33283   0  c000000205211480  UN   0.1   48512  12672  sendmail
   33531   4241  37  c00000023c902780  UN   0.1   18624  11648  sshd

EEH handler hung while bnx2x sleeping and holding RTNL lock
===========================================================
crash> bt 213
PID: 213    TASK: c000000004c89e00  CPU: 11  COMMAND: "eehd"
  #0 [c000000004d477e0] __schedule at c000000000c70808
  #1 [c000000004d478b0] schedule at c000000000c70ee0
  #2 [c000000004d478e0] schedule_timeout at c000000000c76dec
  #3 [c000000004d479c0] msleep at c0000000002120cc
  #4 [c000000004d479f0] napi_disable at c000000000a06448
                                        ^^^^^^^^^^^^^^^^
  #5 [c000000004d47a30] bnx2x_netif_stop at c0080000018dba94 [bnx2x]
  #6 [c000000004d47a60] bnx2x_io_slot_reset at c0080000018a551c [bnx2x]
  #7 [c000000004d47b20] eeh_report_reset at c00000000004c9bc
  #8 [c000000004d47b90] eeh_pe_report at c00000000004d1a8
  #9 [c000000004d47c40] eeh_handle_normal_event at c00000000004da64

And the sleeping source code
============================
crash> dis -ls c000000000a06448
FILE: ../net/core/dev.c
LINE: 6702

   6697  {
   6698          might_sleep();
   6699          set_bit(NAPI_STATE_DISABLE, &n->state);
   6700
   6701          while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
* 6702                  msleep(1);
   6703          while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
   6704                  msleep(1);
   6705
   6706          hrtimer_cancel(&n->timer);
   6707
   6708          clear_bit(NAPI_STATE_DISABLE, &n->state);
   6709  }

EEH calls into bnx2x twice based on the system log above, first through
bnx2x_io_error_detected() and then bnx2x_io_slot_reset(), and executes
the following call chains:

bnx2x_io_error_detected()
  +-> bnx2x_eeh_nic_unload()
       +-> bnx2x_del_all_napi()
            +-> __netif_napi_del()

bnx2x_io_slot_reset()
  +-> bnx2x_netif_stop()
       +-> bnx2x_napi_disable()
            +->napi_disable()

Fix this by correcting the sequence of NAPI APIs usage,
that is delete the NAPI after disabling it.

Fixes: 7fa6f34 ("bnx2x: AER revised")
Reported-by: David Christensen <[email protected]>
Tested-by: David Christensen <[email protected]>
Signed-off-by: Manish Chopra <[email protected]>
Signed-off-by: Ariel Elior <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Aug 26, 2022
[ Upstream commit 1b0e814 ]

Commit 3e3b5df ("NFC: reorder the logic in nfc_{un,}register_device")
assumes the device_is_registered() in function nfc_dev_up() will help
to check when the rfkill is unregistered. However, this check only
take effect when device_del(&dev->dev) is done in nfc_unregister_device().
Hence, the rfkill object is still possible be dereferenced.

The crash trace in latest kernel (5.18-rc2):

[   68.760105] ==================================================================
[   68.760330] BUG: KASAN: use-after-free in __lock_acquire+0x3ec1/0x6750
[   68.760756] Read of size 8 at addr ffff888009c93018 by task fuzz/313
[   68.760756]
[   68.760756] CPU: 0 PID: 313 Comm: fuzz Not tainted 5.18.0-rc2 #4
[   68.760756] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[   68.760756] Call Trace:
[   68.760756]  <TASK>
[   68.760756]  dump_stack_lvl+0x57/0x7d
[   68.760756]  print_report.cold+0x5e/0x5db
[   68.760756]  ? __lock_acquire+0x3ec1/0x6750
[   68.760756]  kasan_report+0xbe/0x1c0
[   68.760756]  ? __lock_acquire+0x3ec1/0x6750
[   68.760756]  __lock_acquire+0x3ec1/0x6750
[   68.760756]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[   68.760756]  ? register_lock_class+0x18d0/0x18d0
[   68.760756]  lock_acquire+0x1ac/0x4f0
[   68.760756]  ? rfkill_blocked+0xe/0x60
[   68.760756]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[   68.760756]  ? mutex_lock_io_nested+0x12c0/0x12c0
[   68.760756]  ? nla_get_range_signed+0x540/0x540
[   68.760756]  ? _raw_spin_lock_irqsave+0x4e/0x50
[   68.760756]  _raw_spin_lock_irqsave+0x39/0x50
[   68.760756]  ? rfkill_blocked+0xe/0x60
[   68.760756]  rfkill_blocked+0xe/0x60
[   68.760756]  nfc_dev_up+0x84/0x260
[   68.760756]  nfc_genl_dev_up+0x90/0xe0
[   68.760756]  genl_family_rcv_msg_doit+0x1f4/0x2f0
[   68.760756]  ? genl_family_rcv_msg_attrs_parse.constprop.0+0x230/0x230
[   68.760756]  ? security_capable+0x51/0x90
[   68.760756]  genl_rcv_msg+0x280/0x500
[   68.760756]  ? genl_get_cmd+0x3c0/0x3c0
[   68.760756]  ? lock_acquire+0x1ac/0x4f0
[   68.760756]  ? nfc_genl_dev_down+0xe0/0xe0
[   68.760756]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[   68.760756]  netlink_rcv_skb+0x11b/0x340
[   68.760756]  ? genl_get_cmd+0x3c0/0x3c0
[   68.760756]  ? netlink_ack+0x9c0/0x9c0
[   68.760756]  ? netlink_deliver_tap+0x136/0xb00
[   68.760756]  genl_rcv+0x1f/0x30
[   68.760756]  netlink_unicast+0x430/0x710
[   68.760756]  ? memset+0x20/0x40
[   68.760756]  ? netlink_attachskb+0x740/0x740
[   68.760756]  ? __build_skb_around+0x1f4/0x2a0
[   68.760756]  netlink_sendmsg+0x75d/0xc00
[   68.760756]  ? netlink_unicast+0x710/0x710
[   68.760756]  ? netlink_unicast+0x710/0x710
[   68.760756]  sock_sendmsg+0xdf/0x110
[   68.760756]  __sys_sendto+0x19e/0x270
[   68.760756]  ? __ia32_sys_getpeername+0xa0/0xa0
[   68.760756]  ? fd_install+0x178/0x4c0
[   68.760756]  ? fd_install+0x195/0x4c0
[   68.760756]  ? kernel_fpu_begin_mask+0x1c0/0x1c0
[   68.760756]  __x64_sys_sendto+0xd8/0x1b0
[   68.760756]  ? lockdep_hardirqs_on+0xbf/0x130
[   68.760756]  ? syscall_enter_from_user_mode+0x1d/0x50
[   68.760756]  do_syscall_64+0x3b/0x90
[   68.760756]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   68.760756] RIP: 0033:0x7f67fb50e6b3
...
[   68.760756] RSP: 002b:00007f67fa91fe90 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
[   68.760756] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f67fb50e6b3
[   68.760756] RDX: 000000000000001c RSI: 0000559354603090 RDI: 0000000000000003
[   68.760756] RBP: 00007f67fa91ff00 R08: 00007f67fa91fedc R09: 000000000000000c
[   68.760756] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffe824d496e
[   68.760756] R13: 00007ffe824d496f R14: 00007f67fa120000 R15: 0000000000000003

[   68.760756]  </TASK>
[   68.760756]
[   68.760756] Allocated by task 279:
[   68.760756]  kasan_save_stack+0x1e/0x40
[   68.760756]  __kasan_kmalloc+0x81/0xa0
[   68.760756]  rfkill_alloc+0x7f/0x280
[   68.760756]  nfc_register_device+0xa3/0x1a0
[   68.760756]  nci_register_device+0x77a/0xad0
[   68.760756]  nfcmrvl_nci_register_dev+0x20b/0x2c0
[   68.760756]  nfcmrvl_nci_uart_open+0xf2/0x1dd
[   68.760756]  nci_uart_tty_ioctl+0x2c3/0x4a0
[   68.760756]  tty_ioctl+0x764/0x1310
[   68.760756]  __x64_sys_ioctl+0x122/0x190
[   68.760756]  do_syscall_64+0x3b/0x90
[   68.760756]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   68.760756]
[   68.760756] Freed by task 314:
[   68.760756]  kasan_save_stack+0x1e/0x40
[   68.760756]  kasan_set_track+0x21/0x30
[   68.760756]  kasan_set_free_info+0x20/0x30
[   68.760756]  __kasan_slab_free+0x108/0x170
[   68.760756]  kfree+0xb0/0x330
[   68.760756]  device_release+0x96/0x200
[   68.760756]  kobject_put+0xf9/0x1d0
[   68.760756]  nfc_unregister_device+0x77/0x190
[   68.760756]  nfcmrvl_nci_unregister_dev+0x88/0xd0
[   68.760756]  nci_uart_tty_close+0xdf/0x180
[   68.760756]  tty_ldisc_kill+0x73/0x110
[   68.760756]  tty_ldisc_hangup+0x281/0x5b0
[   68.760756]  __tty_hangup.part.0+0x431/0x890
[   68.760756]  tty_release+0x3a8/0xc80
[   68.760756]  __fput+0x1f0/0x8c0
[   68.760756]  task_work_run+0xc9/0x170
[   68.760756]  exit_to_user_mode_prepare+0x194/0x1a0
[   68.760756]  syscall_exit_to_user_mode+0x19/0x50
[   68.760756]  do_syscall_64+0x48/0x90
[   68.760756]  entry_SYSCALL_64_after_hwframe+0x44/0xae

This patch just add the null out of dev->rfkill to make sure such
dereference cannot happen. This is safe since the device_lock() already
protect the check/write from data race.

Fixes: 3e3b5df ("NFC: reorder the logic in nfc_{un,}register_device")
Signed-off-by: Lin Ma <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Nov 5, 2022
[ Upstream commit 84a5358 ]

The SRv6 layer allows defining HMAC data that can later be used to sign IPv6
Segment Routing Headers. This configuration is realised via netlink through
four attributes: SEG6_ATTR_HMACKEYID, SEG6_ATTR_SECRET, SEG6_ATTR_SECRETLEN and
SEG6_ATTR_ALGID. Because the SECRETLEN attribute is decoupled from the actual
length of the SECRET attribute, it is possible to provide invalid combinations
(e.g., secret = "", secretlen = 64). This case is not checked in the code and
with an appropriately crafted netlink message, an out-of-bounds read of up
to 64 bytes (max secret length) can occur past the skb end pointer and into
skb_shared_info:

Breakpoint 1, seg6_genl_sethmac (skb=<optimized out>, info=<optimized out>) at net/ipv6/seg6.c:208
208		memcpy(hinfo->secret, secret, slen);
(gdb) bt
 #0  seg6_genl_sethmac (skb=<optimized out>, info=<optimized out>) at net/ipv6/seg6.c:208
 #1  0xffffffff81e012e9 in genl_family_rcv_msg_doit (skb=skb@entry=0xffff88800b1f9f00, nlh=nlh@entry=0xffff88800b1b7600,
    extack=extack@entry=0xffffc90000ba7af0, ops=ops@entry=0xffffc90000ba7a80, hdrlen=4, net=0xffffffff84237580 <init_net>, family=<optimized out>,
    family=<optimized out>) at net/netlink/genetlink.c:731
 #2  0xffffffff81e01435 in genl_family_rcv_msg (extack=0xffffc90000ba7af0, nlh=0xffff88800b1b7600, skb=0xffff88800b1f9f00,
    family=0xffffffff82fef6c0 <seg6_genl_family>) at net/netlink/genetlink.c:775
 #3  genl_rcv_msg (skb=0xffff88800b1f9f00, nlh=0xffff88800b1b7600, extack=0xffffc90000ba7af0) at net/netlink/genetlink.c:792
 #4  0xffffffff81dfffc3 in netlink_rcv_skb (skb=skb@entry=0xffff88800b1f9f00, cb=cb@entry=0xffffffff81e01350 <genl_rcv_msg>)
    at net/netlink/af_netlink.c:2501
 #5  0xffffffff81e00919 in genl_rcv (skb=0xffff88800b1f9f00) at net/netlink/genetlink.c:803
 #6  0xffffffff81dff6ae in netlink_unicast_kernel (ssk=0xffff888010eec800, skb=0xffff88800b1f9f00, sk=0xffff888004aed000)
    at net/netlink/af_netlink.c:1319
 #7  netlink_unicast (ssk=ssk@entry=0xffff888010eec800, skb=skb@entry=0xffff88800b1f9f00, portid=portid@entry=0, nonblock=<optimized out>)
    at net/netlink/af_netlink.c:1345
 #8  0xffffffff81dff9a4 in netlink_sendmsg (sock=<optimized out>, msg=0xffffc90000ba7e48, len=<optimized out>) at net/netlink/af_netlink.c:1921
...
(gdb) p/x ((struct sk_buff *)0xffff88800b1f9f00)->head + ((struct sk_buff *)0xffff88800b1f9f00)->end
$1 = 0xffff88800b1b76c0
(gdb) p/x secret
$2 = 0xffff88800b1b76c0
(gdb) p slen
$3 = 64 '@'

The OOB data can then be read back from userspace by dumping HMAC state. This
commit fixes this by ensuring SECRETLEN cannot exceed the actual length of
SECRET.

Reported-by: Lucas Leong <[email protected]>
Tested: verified that EINVAL is correctly returned when secretlen > len(secret)
Fixes: 4f4853d ("ipv6: sr: implement API to control SR HMAC structure")
Signed-off-by: David Lebrun <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
fhgwright pushed a commit to fhgwright/bb-linux that referenced this pull request Dec 4, 2022
…tion

commit 07fd5b6 upstream.

Each cset (css_set) is pinned by its tasks. When we're moving tasks around
across csets for a migration, we need to hold the source and destination
csets to ensure that they don't go away while we're moving tasks about. This
is done by linking cset->mg_preload_node on either the
mgctx->preloaded_src_csets or mgctx->preloaded_dst_csets list. Using the
same cset->mg_preload_node for both the src and dst lists was deemed okay as
a cset can't be both the source and destination at the same time.

Unfortunately, this overloading becomes problematic when multiple tasks are
involved in a migration and some of them are identity noop migrations while
others are actually moving across cgroups. For example, this can happen with
the following sequence on cgroup1:

 beagleboard#1> mkdir -p /sys/fs/cgroup/misc/a/b
 beagleboard#2> echo $$ > /sys/fs/cgroup/misc/a/cgroup.procs
 beagleboard#3> RUN_A_COMMAND_WHICH_CREATES_MULTIPLE_THREADS &
 beagleboard#4> PID=$!
 beagleboard#5> echo $PID > /sys/fs/cgroup/misc/a/b/tasks
 beagleboard#6> echo $PID > /sys/fs/cgroup/misc/a/cgroup.procs

the process including the group leader back into a. In this final migration,
non-leader threads would be doing identity migration while the group leader
is doing an actual one.

After beagleboard#3, let's say the whole process was in cset A, and that after beagleboard#4, the
leader moves to cset B. Then, during beagleboard#6, the following happens:

 1. cgroup_migrate_add_src() is called on B for the leader.

 2. cgroup_migrate_add_src() is called on A for the other threads.

 3. cgroup_migrate_prepare_dst() is called. It scans the src list.

 4. It notices that B wants to migrate to A, so it tries to A to the dst
    list but realizes that its ->mg_preload_node is already busy.

 5. and then it notices A wants to migrate to A as it's an identity
    migration, it culls it by list_del_init()'ing its ->mg_preload_node and
    putting references accordingly.

 6. The rest of migration takes place with B on the src list but nothing on
    the dst list.

This means that A isn't held while migration is in progress. If all tasks
leave A before the migration finishes and the incoming task pins it, the
cset will be destroyed leading to use-after-free.

This is caused by overloading cset->mg_preload_node for both src and dst
preload lists. We wanted to exclude the cset from the src list but ended up
inadvertently excluding it from the dst list too.

This patch fixes the issue by separating out cset->mg_preload_node into
->mg_src_preload_node and ->mg_dst_preload_node, so that the src and dst
preloadings don't interfere with each other.

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Mukesh Ojha <[email protected]>
Reported-by: shisiyuan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: https://www.spinics.net/lists/cgroups/msg33313.html
Fixes: f817de9 ("cgroup: prepare migration path for unified hierarchy")
Cc: [email protected] # v3.16+
Signed-off-by: Greg Kroah-Hartman <[email protected]>
fhgwright pushed a commit to fhgwright/bb-linux that referenced this pull request Dec 4, 2022
[ Upstream commit dc4d316 ]

[BUG]
If we have a btrfs image with dirty log, along with an unsupported RO
compatible flag:

log_root		30474240
...
compat_flags		0x0
compat_ro_flags		0x40000003
			( FREE_SPACE_TREE |
			  FREE_SPACE_TREE_VALID |
			  unknown flag: 0x40000000 )

Then even if we can only mount it RO, we will still cause metadata
update for log replay:

  BTRFS info (device dm-1): flagging fs with big metadata feature
  BTRFS info (device dm-1): using free space tree
  BTRFS info (device dm-1): has skinny extents
  BTRFS info (device dm-1): start tree-log replay

This is definitely against RO compact flag requirement.

[CAUSE]
RO compact flag only forces us to do RO mount, but we will still do log
replay for plain RO mount.

Thus this will result us to do log replay and update metadata.

This can be very problematic for new RO compat flag, for example older
kernel can not understand v2 cache, and if we allow metadata update on
RO mount and invalidate/corrupt v2 cache.

[FIX]
Just reject the mount unless rescue=nologreplay is provided:

  BTRFS error (device dm-1): cannot replay dirty log with unsupport optional features (0x40000000), try rescue=nologreplay instead

We don't want to set rescue=nologreply directly, as this would make the
end user to read the old data, and cause confusion.

Since the such case is really rare, we're mostly fine to just reject the
mount with an error message, which also includes the proper workaround.

CC: [email protected] beagleboard#4.9+
Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Dec 8, 2022
commit 29a5b8a upstream.

When walking through an inode extents, the ext4_ext_binsearch_idx() function
assumes that the extent header has been previously validated.  However, there
are no checks that verify that the number of entries (eh->eh_entries) is
non-zero when depth is > 0.  And this will lead to problems because the
EXT_FIRST_INDEX() and EXT_LAST_INDEX() will return garbage and result in this:

[  135.245946] ------------[ cut here ]------------
[  135.247579] kernel BUG at fs/ext4/extents.c:2258!
[  135.249045] invalid opcode: 0000 [#1] PREEMPT SMP
[  135.250320] CPU: 2 PID: 238 Comm: tmp118 Not tainted 5.19.0-rc8+ #4
[  135.252067] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
[  135.255065] RIP: 0010:ext4_ext_map_blocks+0xc20/0xcb0
[  135.256475] Code:
[  135.261433] RSP: 0018:ffffc900005939f8 EFLAGS: 00010246
[  135.262847] RAX: 0000000000000024 RBX: ffffc90000593b70 RCX: 0000000000000023
[  135.264765] RDX: ffff8880038e5f10 RSI: 0000000000000003 RDI: ffff8880046e922c
[  135.266670] RBP: ffff8880046e9348 R08: 0000000000000001 R09: ffff888002ca580c
[  135.268576] R10: 0000000000002602 R11: 0000000000000000 R12: 0000000000000024
[  135.270477] R13: 0000000000000000 R14: 0000000000000024 R15: 0000000000000000
[  135.272394] FS:  00007fdabdc56740(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
[  135.274510] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  135.276075] CR2: 00007ffc26bd4f00 CR3: 0000000006261004 CR4: 0000000000170ea0
[  135.277952] Call Trace:
[  135.278635]  <TASK>
[  135.279247]  ? preempt_count_add+0x6d/0xa0
[  135.280358]  ? percpu_counter_add_batch+0x55/0xb0
[  135.281612]  ? _raw_read_unlock+0x18/0x30
[  135.282704]  ext4_map_blocks+0x294/0x5a0
[  135.283745]  ? xa_load+0x6f/0xa0
[  135.284562]  ext4_mpage_readpages+0x3d6/0x770
[  135.285646]  read_pages+0x67/0x1d0
[  135.286492]  ? folio_add_lru+0x51/0x80
[  135.287441]  page_cache_ra_unbounded+0x124/0x170
[  135.288510]  filemap_get_pages+0x23d/0x5a0
[  135.289457]  ? path_openat+0xa72/0xdd0
[  135.290332]  filemap_read+0xbf/0x300
[  135.291158]  ? _raw_spin_lock_irqsave+0x17/0x40
[  135.292192]  new_sync_read+0x103/0x170
[  135.293014]  vfs_read+0x15d/0x180
[  135.293745]  ksys_read+0xa1/0xe0
[  135.294461]  do_syscall_64+0x3c/0x80
[  135.295284]  entry_SYSCALL_64_after_hwframe+0x46/0xb0

This patch simply adds an extra check in __ext4_ext_check(), verifying that
eh_entries is not 0 when eh_depth is > 0.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215941
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216283
Cc: Baokun Li <[email protected]>
Cc: [email protected]
Signed-off-by: Luís Henriques <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Baokun Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Feb 4, 2023
[ Upstream commit 13be2ef ]

As previously noted in commit 66e4f4a ("rtc: cmos: Use
spin_lock_irqsave() in cmos_interrupt()"):

<4>[  254.192378] WARNING: inconsistent lock state
<4>[  254.192384] 5.12.0-rc1-CI-CI_DRM_9834+ #1 Not tainted
<4>[  254.192396] --------------------------------
<4>[  254.192400] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
<4>[  254.192409] rtcwake/5309 [HC0[0]:SC0[0]:HE1:SE1] takes:
<4>[  254.192429] ffffffff8263c5f8 (rtc_lock){?...}-{2:2}, at: cmos_interrupt+0x18/0x100
<4>[  254.192481] {IN-HARDIRQ-W} state was registered at:
<4>[  254.192488]   lock_acquire+0xd1/0x3d0
<4>[  254.192504]   _raw_spin_lock+0x2a/0x40
<4>[  254.192519]   cmos_interrupt+0x18/0x100
<4>[  254.192536]   rtc_handler+0x1f/0xc0
<4>[  254.192553]   acpi_ev_fixed_event_detect+0x109/0x13c
<4>[  254.192574]   acpi_ev_sci_xrupt_handler+0xb/0x28
<4>[  254.192596]   acpi_irq+0x13/0x30
<4>[  254.192620]   __handle_irq_event_percpu+0x43/0x2c0
<4>[  254.192641]   handle_irq_event_percpu+0x2b/0x70
<4>[  254.192661]   handle_irq_event+0x2f/0x50
<4>[  254.192680]   handle_fasteoi_irq+0x9e/0x150
<4>[  254.192693]   __common_interrupt+0x76/0x140
<4>[  254.192715]   common_interrupt+0x96/0xc0
<4>[  254.192732]   asm_common_interrupt+0x1e/0x40
<4>[  254.192750]   _raw_spin_unlock_irqrestore+0x38/0x60
<4>[  254.192767]   resume_irqs+0xba/0xf0
<4>[  254.192786]   dpm_resume_noirq+0x245/0x3d0
<4>[  254.192811]   suspend_devices_and_enter+0x230/0xaa0
<4>[  254.192835]   pm_suspend.cold.8+0x301/0x34a
<4>[  254.192859]   state_store+0x7b/0xe0
<4>[  254.192879]   kernfs_fop_write_iter+0x11d/0x1c0
<4>[  254.192899]   new_sync_write+0x11d/0x1b0
<4>[  254.192916]   vfs_write+0x265/0x390
<4>[  254.192933]   ksys_write+0x5a/0xd0
<4>[  254.192949]   do_syscall_64+0x33/0x80
<4>[  254.192965]   entry_SYSCALL_64_after_hwframe+0x44/0xae
<4>[  254.192986] irq event stamp: 43775
<4>[  254.192994] hardirqs last  enabled at (43775): [<ffffffff81c00c42>] asm_sysvec_apic_timer_interrupt+0x12/0x20
<4>[  254.193023] hardirqs last disabled at (43774): [<ffffffff81aa691a>] sysvec_apic_timer_interrupt+0xa/0xb0
<4>[  254.193049] softirqs last  enabled at (42548): [<ffffffff81e00342>] __do_softirq+0x342/0x48e
<4>[  254.193074] softirqs last disabled at (42543): [<ffffffff810b45fd>] irq_exit_rcu+0xad/0xd0
<4>[  254.193101]
                  other info that might help us debug this:
<4>[  254.193107]  Possible unsafe locking scenario:

<4>[  254.193112]        CPU0
<4>[  254.193117]        ----
<4>[  254.193121]   lock(rtc_lock);
<4>[  254.193137]   <Interrupt>
<4>[  254.193142]     lock(rtc_lock);
<4>[  254.193156]
                   *** DEADLOCK ***

<4>[  254.193161] 6 locks held by rtcwake/5309:
<4>[  254.193174]  #0: ffff888104861430 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x5a/0xd0
<4>[  254.193232]  #1: ffff88810f823288 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xe7/0x1c0
<4>[  254.193282]  #2: ffff888100cef3c0 (kn->active#285
<7>[  254.192706] i915 0000:00:02.0: [drm:intel_modeset_setup_hw_state [i915]] [CRTC:51:pipe A] hw state readout: disabled
<4>[  254.193307] ){.+.+}-{0:0}, at: kernfs_fop_write_iter+0xf0/0x1c0
<4>[  254.193333]  #3: ffffffff82649fa8 (system_transition_mutex){+.+.}-{3:3}, at: pm_suspend.cold.8+0xce/0x34a
<4>[  254.193387]  #4: ffffffff827a2108 (acpi_scan_lock){+.+.}-{3:3}, at: acpi_suspend_begin+0x47/0x70
<4>[  254.193433]  #5: ffff8881019ea178 (&dev->mutex){....}-{3:3}, at: device_resume+0x68/0x1e0
<4>[  254.193485]
                  stack backtrace:
<4>[  254.193492] CPU: 1 PID: 5309 Comm: rtcwake Not tainted 5.12.0-rc1-CI-CI_DRM_9834+ #1
<4>[  254.193514] Hardware name: Google Soraka/Soraka, BIOS MrChromebox-4.10 08/25/2019
<4>[  254.193524] Call Trace:
<4>[  254.193536]  dump_stack+0x7f/0xad
<4>[  254.193567]  mark_lock.part.47+0x8ca/0xce0
<4>[  254.193604]  __lock_acquire+0x39b/0x2590
<4>[  254.193626]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
<4>[  254.193660]  lock_acquire+0xd1/0x3d0
<4>[  254.193677]  ? cmos_interrupt+0x18/0x100
<4>[  254.193716]  _raw_spin_lock+0x2a/0x40
<4>[  254.193735]  ? cmos_interrupt+0x18/0x100
<4>[  254.193758]  cmos_interrupt+0x18/0x100
<4>[  254.193785]  cmos_resume+0x2ac/0x2d0
<4>[  254.193813]  ? acpi_pm_set_device_wakeup+0x1f/0x110
<4>[  254.193842]  ? pnp_bus_suspend+0x10/0x10
<4>[  254.193864]  pnp_bus_resume+0x5e/0x90
<4>[  254.193885]  dpm_run_callback+0x5f/0x240
<4>[  254.193914]  device_resume+0xb2/0x1e0
<4>[  254.193942]  ? pm_dev_err+0x25/0x25
<4>[  254.193974]  dpm_resume+0xea/0x3f0
<4>[  254.194005]  dpm_resume_end+0x8/0x10
<4>[  254.194030]  suspend_devices_and_enter+0x29b/0xaa0
<4>[  254.194066]  pm_suspend.cold.8+0x301/0x34a
<4>[  254.194094]  state_store+0x7b/0xe0
<4>[  254.194124]  kernfs_fop_write_iter+0x11d/0x1c0
<4>[  254.194151]  new_sync_write+0x11d/0x1b0
<4>[  254.194183]  vfs_write+0x265/0x390
<4>[  254.194207]  ksys_write+0x5a/0xd0
<4>[  254.194232]  do_syscall_64+0x33/0x80
<4>[  254.194251]  entry_SYSCALL_64_after_hwframe+0x44/0xae
<4>[  254.194274] RIP: 0033:0x7f07d79691e7
<4>[  254.194293] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
<4>[  254.194312] RSP: 002b:00007ffd9cc2c768 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4>[  254.194337] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f07d79691e7
<4>[  254.194352] RDX: 0000000000000004 RSI: 0000556ebfc63590 RDI: 000000000000000b
<4>[  254.194366] RBP: 0000556ebfc63590 R08: 0000000000000000 R09: 0000000000000004
<4>[  254.194379] R10: 0000556ebf0ec2a6 R11: 0000000000000246 R12: 0000000000000004

which breaks S3-resume on fi-kbl-soraka presumably as that's slow enough
to trigger the alarm during the suspend.

Fixes: 6950d04 ("rtc: cmos: Replace spin_lock_irqsave with spin_lock in hard IRQ")
References: 66e4f4a ("rtc: cmos: Use spin_lock_irqsave() in cmos_interrupt()"):
Signed-off-by: Chris Wilson <[email protected]>
Cc: Xiaofei Tan <[email protected]>
Cc: Alexandre Belloni <[email protected]>
Cc: Alessandro Zummo <[email protected]>
Cc: Ville Syrjälä <[email protected]>
Reviewed-by: Ville Syrjälä <[email protected]>
Signed-off-by: Alexandre Belloni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
nmenon pushed a commit to nmenon/nm-beagle-linux that referenced this pull request May 22, 2023
[ Upstream commit 93c660c ]

ASAN reports an use-after-free in btf_dump_name_dups:

ERROR: AddressSanitizer: heap-use-after-free on address 0xffff927006db at pc 0xaaaab5dfb618 bp 0xffffdd89b890 sp 0xffffdd89b928
READ of size 2 at 0xffff927006db thread T0
    #0 0xaaaab5dfb614 in __interceptor_strcmp.part.0 (test_progs+0x21b614)
    beagleboard#1 0xaaaab635f144 in str_equal_fn tools/lib/bpf/btf_dump.c:127
    beagleboard#2 0xaaaab635e3e0 in hashmap_find_entry tools/lib/bpf/hashmap.c:143
    beagleboard#3 0xaaaab635e72c in hashmap__find tools/lib/bpf/hashmap.c:212
    beagleboard#4 0xaaaab6362258 in btf_dump_name_dups tools/lib/bpf/btf_dump.c:1525
    beagleboard#5 0xaaaab636240c in btf_dump_resolve_name tools/lib/bpf/btf_dump.c:1552
    beagleboard#6 0xaaaab6362598 in btf_dump_type_name tools/lib/bpf/btf_dump.c:1567
    beagleboard#7 0xaaaab6360b48 in btf_dump_emit_struct_def tools/lib/bpf/btf_dump.c:912
    beagleboard#8 0xaaaab6360630 in btf_dump_emit_type tools/lib/bpf/btf_dump.c:798
    beagleboard#9 0xaaaab635f720 in btf_dump__dump_type tools/lib/bpf/btf_dump.c:282
    beagleboard#10 0xaaaab608523c in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:236
    beagleboard#11 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
    beagleboard#12 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
    beagleboard#13 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
    beagleboard#14 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
    beagleboard#15 0xaaaab5d65990  (test_progs+0x185990)

0xffff927006db is located 11 bytes inside of 16-byte region [0xffff927006d0,0xffff927006e0)
freed by thread T0 here:
    #0 0xaaaab5e2c7c4 in realloc (test_progs+0x24c7c4)
    beagleboard#1 0xaaaab634f4a0 in libbpf_reallocarray tools/lib/bpf/libbpf_internal.h:191
    beagleboard#2 0xaaaab634f840 in libbpf_add_mem tools/lib/bpf/btf.c:163
    beagleboard#3 0xaaaab636643c in strset_add_str_mem tools/lib/bpf/strset.c:106
    beagleboard#4 0xaaaab6366560 in strset__add_str tools/lib/bpf/strset.c:157
    beagleboard#5 0xaaaab6352d70 in btf__add_str tools/lib/bpf/btf.c:1519
    beagleboard#6 0xaaaab6353e10 in btf__add_field tools/lib/bpf/btf.c:2032
    beagleboard#7 0xaaaab6084fcc in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:232
    beagleboard#8 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
    beagleboard#9 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
    beagleboard#10 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
    beagleboard#11 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
    beagleboard#12 0xaaaab5d65990  (test_progs+0x185990)

previously allocated by thread T0 here:
    #0 0xaaaab5e2c7c4 in realloc (test_progs+0x24c7c4)
    beagleboard#1 0xaaaab634f4a0 in libbpf_reallocarray tools/lib/bpf/libbpf_internal.h:191
    beagleboard#2 0xaaaab634f840 in libbpf_add_mem tools/lib/bpf/btf.c:163
    beagleboard#3 0xaaaab636643c in strset_add_str_mem tools/lib/bpf/strset.c:106
    beagleboard#4 0xaaaab6366560 in strset__add_str tools/lib/bpf/strset.c:157
    beagleboard#5 0xaaaab6352d70 in btf__add_str tools/lib/bpf/btf.c:1519
    beagleboard#6 0xaaaab6353ff0 in btf_add_enum_common tools/lib/bpf/btf.c:2070
    beagleboard#7 0xaaaab6354080 in btf__add_enum tools/lib/bpf/btf.c:2102
    beagleboard#8 0xaaaab6082f50 in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:162
    beagleboard#9 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
    beagleboard#10 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
    beagleboard#11 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
    beagleboard#12 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
    beagleboard#13 0xaaaab5d65990  (test_progs+0x185990)

The reason is that the key stored in hash table name_map is a string
address, and the string memory is allocated by realloc() function, when
the memory is resized by realloc() later, the old memory may be freed,
so the address stored in name_map references to a freed memory, causing
use-after-free.

Fix it by storing duplicated string address in name_map.

Fixes: 919d2b1 ("libbpf: Allow modification of BTF and add btf__add_str API")
Signed-off-by: Xu Kuohai <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
nmenon pushed a commit to nmenon/nm-beagle-linux that referenced this pull request May 22, 2023
…g the sock

[ Upstream commit 3cf7203 ]

There is a race condition in vxlan that when deleting a vxlan device
during receiving packets, there is a possibility that the sock is
released after getting vxlan_sock vs from sk_user_data. Then in
later vxlan_ecn_decapsulate(), vxlan_get_sk_family() we will got
NULL pointer dereference. e.g.

   #0 [ffffa25ec6978a38] machine_kexec at ffffffff8c669757
   beagleboard#1 [ffffa25ec6978a90] __crash_kexec at ffffffff8c7c0a4d
   beagleboard#2 [ffffa25ec6978b58] crash_kexec at ffffffff8c7c1c48
   beagleboard#3 [ffffa25ec6978b60] oops_end at ffffffff8c627f2b
   beagleboard#4 [ffffa25ec6978b80] page_fault_oops at ffffffff8c678fcb
   beagleboard#5 [ffffa25ec6978bd8] exc_page_fault at ffffffff8d109542
   beagleboard#6 [ffffa25ec6978c00] asm_exc_page_fault at ffffffff8d200b62
      [exception RIP: vxlan_ecn_decapsulate+0x3b]
      RIP: ffffffffc1014e7b  RSP: ffffa25ec6978cb0  RFLAGS: 00010246
      RAX: 0000000000000008  RBX: ffff8aa000888000  RCX: 0000000000000000
      RDX: 000000000000000e  RSI: ffff8a9fc7ab803e  RDI: ffff8a9fd1168700
      RBP: ffff8a9fc7ab803e   R8: 0000000000700000   R9: 00000000000010ae
      R10: ffff8a9fcb748980  R11: 0000000000000000  R12: ffff8a9fd1168700
      R13: ffff8aa000888000  R14: 00000000002a0000  R15: 00000000000010ae
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
   beagleboard#7 [ffffa25ec6978ce8] vxlan_rcv at ffffffffc10189cd [vxlan]
   beagleboard#8 [ffffa25ec6978d90] udp_queue_rcv_one_skb at ffffffff8cfb6507
   beagleboard#9 [ffffa25ec6978dc0] udp_unicast_rcv_skb at ffffffff8cfb6e45
  beagleboard#10 [ffffa25ec6978dc8] __udp4_lib_rcv at ffffffff8cfb8807
  beagleboard#11 [ffffa25ec6978e20] ip_protocol_deliver_rcu at ffffffff8cf76951
  beagleboard#12 [ffffa25ec6978e48] ip_local_deliver at ffffffff8cf76bde
  beagleboard#13 [ffffa25ec6978ea0] __netif_receive_skb_one_core at ffffffff8cecde9b
  beagleboard#14 [ffffa25ec6978ec8] process_backlog at ffffffff8cece139
  beagleboard#15 [ffffa25ec6978f00] __napi_poll at ffffffff8ceced1a
  beagleboard#16 [ffffa25ec6978f28] net_rx_action at ffffffff8cecf1f3
  beagleboard#17 [ffffa25ec6978fa0] __softirqentry_text_start at ffffffff8d4000ca
  beagleboard#18 [ffffa25ec6978ff0] do_softirq at ffffffff8c6fbdc3

Reproducer: https://github.com/Mellanox/ovs-tests/blob/master/test-ovs-vxlan-remove-tunnel-during-traffic.sh

Fix this by waiting for all sk_user_data reader to finish before
releasing the sock.

Reported-by: Jianlin Shi <[email protected]>
Suggested-by: Jakub Sitnicki <[email protected]>
Fixes: 6a93cc9 ("udp-tunnel: Add a few more UDP tunnel APIs")
Signed-off-by: Hangbin Liu <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
nmenon pushed a commit to nmenon/nm-beagle-linux that referenced this pull request May 22, 2023
[ Upstream commit b18cba0 ]

Commit 9130b8d ("SUNRPC: allow for upcalls for the same uid
but different gss service") introduced `auth` argument to
__gss_find_upcall(), but in gss_pipe_downcall() it was left as NULL
since it (and auth->service) was not (yet) determined.

When multiple upcalls with the same uid and different service are
ongoing, it could happen that __gss_find_upcall(), which returns the
first match found in the pipe->in_downcall list, could not find the
correct gss_msg corresponding to the downcall we are looking for.
Moreover, it might return a msg which is not sent to rpc.gssd yet.

We could see mount.nfs process hung in D state with multiple mount.nfs
are executed in parallel.  The call trace below is of CentOS 7.9
kernel-3.10.0-1160.24.1.el7.x86_64 but we observed the same hang w/
elrepo kernel-ml-6.0.7-1.el7.

PID: 71258  TASK: ffff91ebd4be0000  CPU: 36  COMMAND: "mount.nfs"
 #0 [ffff9203ca3234f8] __schedule at ffffffffa3b8899f
 beagleboard#1 [ffff9203ca323580] schedule at ffffffffa3b88eb9
 beagleboard#2 [ffff9203ca323590] gss_cred_init at ffffffffc0355818 [auth_rpcgss]
 beagleboard#3 [ffff9203ca323658] rpcauth_lookup_credcache at ffffffffc0421ebc
[sunrpc]
 beagleboard#4 [ffff9203ca3236d8] gss_lookup_cred at ffffffffc0353633 [auth_rpcgss]
 beagleboard#5 [ffff9203ca3236e8] rpcauth_lookupcred at ffffffffc0421581 [sunrpc]
 beagleboard#6 [ffff9203ca323740] rpcauth_refreshcred at ffffffffc04223d3 [sunrpc]
 beagleboard#7 [ffff9203ca3237a0] call_refresh at ffffffffc04103dc [sunrpc]
 beagleboard#8 [ffff9203ca3237b8] __rpc_execute at ffffffffc041e1c9 [sunrpc]
 beagleboard#9 [ffff9203ca323820] rpc_execute at ffffffffc0420a48 [sunrpc]

The scenario is like this. Let's say there are two upcalls for
services A and B, A -> B in pipe->in_downcall, B -> A in pipe->pipe.

When rpc.gssd reads pipe to get the upcall msg corresponding to
service B from pipe->pipe and then writes the response, in
gss_pipe_downcall the msg corresponding to service A will be picked
because only uid is used to find the msg and it is before the one for
B in pipe->in_downcall.  And the process waiting for the msg
corresponding to service A will be woken up.

Actual scheduing of that process might be after rpc.gssd processes the
next msg.  In rpc_pipe_generic_upcall it clears msg->errno (for A).
The process is scheduled to see gss_msg->ctx == NULL and
gss_msg->msg.errno == 0, therefore it cannot break the loop in
gss_create_upcall and is never woken up after that.

This patch adds a simple check to ensure that a msg which is not
sent to rpc.gssd yet is not chosen as the matching upcall upon
receiving a downcall.

Signed-off-by: minoura makoto <[email protected]>
Signed-off-by: Hiroshi Shimamoto <[email protected]>
Tested-by: Hiroshi Shimamoto <[email protected]>
Cc: Trond Myklebust <[email protected]>
Fixes: 9130b8d ("SUNRPC: allow for upcalls for same uid but different gss service")
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
nmenon pushed a commit to nmenon/nm-beagle-linux that referenced this pull request May 22, 2023
[ Upstream commit 6c4ca03 ]

During EEH error injection testing, a deadlock was encountered in the tg3
driver when tg3_io_error_detected() was attempting to cancel outstanding
reset tasks:

crash> foreach UN bt
...
PID: 159    TASK: c0000000067c6000  CPU: 8   COMMAND: "eehd"
...
 beagleboard#5 [c00000000681f990] __cancel_work_timer at c00000000019fd18
 beagleboard#6 [c00000000681fa30] tg3_io_error_detected at c00800000295f098 [tg3]
 beagleboard#7 [c00000000681faf0] eeh_report_error at c00000000004e25c
...

PID: 290    TASK: c000000036e5f800  CPU: 6   COMMAND: "kworker/6:1"
...
 beagleboard#4 [c00000003721fbc0] rtnl_lock at c000000000c940d8
 beagleboard#5 [c00000003721fbe0] tg3_reset_task at c008000002969358 [tg3]
 beagleboard#6 [c00000003721fc60] process_one_work at c00000000019e5c4
...

PID: 296    TASK: c000000037a65800  CPU: 21  COMMAND: "kworker/21:1"
...
 beagleboard#4 [c000000037247bc0] rtnl_lock at c000000000c940d8
 beagleboard#5 [c000000037247be0] tg3_reset_task at c008000002969358 [tg3]
 beagleboard#6 [c000000037247c60] process_one_work at c00000000019e5c4
...

PID: 655    TASK: c000000036f49000  CPU: 16  COMMAND: "kworker/16:2"
...:1

 beagleboard#4 [c0000000373ebbc0] rtnl_lock at c000000000c940d8
 beagleboard#5 [c0000000373ebbe0] tg3_reset_task at c008000002969358 [tg3]
 beagleboard#6 [c0000000373ebc60] process_one_work at c00000000019e5c4
...

Code inspection shows that both tg3_io_error_detected() and
tg3_reset_task() attempt to acquire the RTNL lock at the beginning of
their code blocks.  If tg3_reset_task() should happen to execute between
the times when tg3_io_error_deteced() acquires the RTNL lock and
tg3_reset_task_cancel() is called, a deadlock will occur.

Moving tg3_reset_task_cancel() call earlier within the code block, prior
to acquiring RTNL, prevents this from happening, but also exposes another
deadlock issue where tg3_reset_task() may execute AFTER
tg3_io_error_detected() has executed:

crash> foreach UN bt
PID: 159    TASK: c0000000067d2000  CPU: 9   COMMAND: "eehd"
...
 beagleboard#4 [c000000006867a60] rtnl_lock at c000000000c940d8
 beagleboard#5 [c000000006867a80] tg3_io_slot_reset at c0080000026c2ea8 [tg3]
 beagleboard#6 [c000000006867b00] eeh_report_reset at c00000000004de88
...
PID: 363    TASK: c000000037564000  CPU: 6   COMMAND: "kworker/6:1"
...
 beagleboard#3 [c000000036c1bb70] msleep at c000000000259e6c
 beagleboard#4 [c000000036c1bba0] napi_disable at c000000000c6b848
 beagleboard#5 [c000000036c1bbe0] tg3_reset_task at c0080000026d942c [tg3]
 beagleboard#6 [c000000036c1bc60] process_one_work at c00000000019e5c4
...

This issue can be avoided by aborting tg3_reset_task() if EEH error
recovery is already in progress.

Fixes: db84bf4 ("tg3: tg3_reset_task() needs to use rtnl_lock to synchronize")
Signed-off-by: David Christensen <[email protected]>
Reviewed-by: Pavan Chebbi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Mar 4, 2024
[ Upstream commit 1469417 ]

Trying to suspend to RAM on SAMA5D27 EVK leads to the following lockdep
warning:

 ============================================
 WARNING: possible recursive locking detected
 6.7.0-rc5-wt+ #532 Not tainted
 --------------------------------------------
 sh/92 is trying to acquire lock:
 c3cf306c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0xe8/0x100

 but task is already holding lock:
 c3d7c46c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0xe8/0x100

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&irq_desc_lock_class);
   lock(&irq_desc_lock_class);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 6 locks held by sh/92:
  #0: c3aa0258 (sb_writers#6){.+.+}-{0:0}, at: ksys_write+0xd8/0x178
  #1: c4c2df44 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x138/0x284
  #2: c32684a0 (kn->active){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x148/0x284
  #3: c232b6d4 (system_transition_mutex){+.+.}-{3:3}, at: pm_suspend+0x13c/0x4e8
  #4: c387b088 (&dev->mutex){....}-{3:3}, at: __device_suspend+0x1e8/0x91c
  #5: c3d7c46c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0xe8/0x100

 stack backtrace:
 CPU: 0 PID: 92 Comm: sh Not tainted 6.7.0-rc5-wt+ #532
 Hardware name: Atmel SAMA5
  unwind_backtrace from show_stack+0x18/0x1c
  show_stack from dump_stack_lvl+0x34/0x48
  dump_stack_lvl from __lock_acquire+0x19ec/0x3a0c
  __lock_acquire from lock_acquire.part.0+0x124/0x2d0
  lock_acquire.part.0 from _raw_spin_lock_irqsave+0x5c/0x78
  _raw_spin_lock_irqsave from __irq_get_desc_lock+0xe8/0x100
  __irq_get_desc_lock from irq_set_irq_wake+0xa8/0x204
  irq_set_irq_wake from atmel_gpio_irq_set_wake+0x58/0xb4
  atmel_gpio_irq_set_wake from irq_set_irq_wake+0x100/0x204
  irq_set_irq_wake from gpio_keys_suspend+0xec/0x2b8
  gpio_keys_suspend from dpm_run_callback+0xe4/0x248
  dpm_run_callback from __device_suspend+0x234/0x91c
  __device_suspend from dpm_suspend+0x224/0x43c
  dpm_suspend from dpm_suspend_start+0x9c/0xa8
  dpm_suspend_start from suspend_devices_and_enter+0x1e0/0xa84
  suspend_devices_and_enter from pm_suspend+0x460/0x4e8
  pm_suspend from state_store+0x78/0xe4
  state_store from kernfs_fop_write_iter+0x1a0/0x284
  kernfs_fop_write_iter from vfs_write+0x38c/0x6f4
  vfs_write from ksys_write+0xd8/0x178
  ksys_write from ret_fast_syscall+0x0/0x1c
 Exception stack(0xc52b3fa8 to 0xc52b3ff0)
 3fa0:                   00000004 005a0ae8 00000001 005a0ae8 00000004 00000001
 3fc0: 00000004 005a0ae8 00000001 00000004 00000004 b6c616c0 00000020 0059d190
 3fe0: 00000004 b6c61678 aec5a041 aebf1a26

This warning is raised because pinctrl-at91-pio4 uses chained IRQ. Whenever
a wake up source configures an IRQ through irq_set_irq_wake, it will
lock the corresponding IRQ desc, and then call irq_set_irq_wake on "parent"
IRQ which will do the same on its own IRQ desc, but since those two locks
share the same class, lockdep reports this as an issue.

Fix lockdep false positive by setting a different class for parent and
children IRQ

Fixes: 7761808 ("pinctrl: introduce driver for Atmel PIO4 controller")
Signed-off-by: Alexis Lothoré <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Mar 4, 2024
[ Upstream commit fc3a553 ]

An issue occurred while reading an ELF file in libbpf.c during fuzzing:

	Program received signal SIGSEGV, Segmentation fault.
	0x0000000000958e97 in bpf_object.collect_prog_relos () at libbpf.c:4206
	4206 in libbpf.c
	(gdb) bt
	#0 0x0000000000958e97 in bpf_object.collect_prog_relos () at libbpf.c:4206
	#1 0x000000000094f9d6 in bpf_object.collect_relos () at libbpf.c:6706
	#2 0x000000000092bef3 in bpf_object_open () at libbpf.c:7437
	#3 0x000000000092c046 in bpf_object.open_mem () at libbpf.c:7497
	#4 0x0000000000924afa in LLVMFuzzerTestOneInput () at fuzz/bpf-object-fuzzer.c:16
	#5 0x000000000060be11 in testblitz_engine::fuzzer::Fuzzer::run_one ()
	#6 0x000000000087ad92 in tracing::span::Span::in_scope ()
	#7 0x00000000006078aa in testblitz_engine::fuzzer::util::walkdir ()
	#8 0x00000000005f3217 in testblitz_engine::entrypoint::main::{{closure}} ()
	#9 0x00000000005f2601 in main ()
	(gdb)

scn_data was null at this code(tools/lib/bpf/src/libbpf.c):

	if (rel->r_offset % BPF_INSN_SZ || rel->r_offset >= scn_data->d_size) {

The scn_data is derived from the code above:

	scn = elf_sec_by_idx(obj, sec_idx);
	scn_data = elf_sec_data(obj, scn);

	relo_sec_name = elf_sec_str(obj, shdr->sh_name);
	sec_name = elf_sec_name(obj, scn);
	if (!relo_sec_name || !sec_name)// don't check whether scn_data is NULL
		return -EINVAL;

In certain special scenarios, such as reading a malformed ELF file,
it is possible that scn_data may be a null pointer

Signed-off-by: Mingyi Zhang <[email protected]>
Signed-off-by: Xin Liu <[email protected]>
Signed-off-by: Changye Wu <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Mar 12, 2024
[ Upstream commit fc3a553 ]

An issue occurred while reading an ELF file in libbpf.c during fuzzing:

	Program received signal SIGSEGV, Segmentation fault.
	0x0000000000958e97 in bpf_object.collect_prog_relos () at libbpf.c:4206
	4206 in libbpf.c
	(gdb) bt
	#0 0x0000000000958e97 in bpf_object.collect_prog_relos () at libbpf.c:4206
	#1 0x000000000094f9d6 in bpf_object.collect_relos () at libbpf.c:6706
	#2 0x000000000092bef3 in bpf_object_open () at libbpf.c:7437
	#3 0x000000000092c046 in bpf_object.open_mem () at libbpf.c:7497
	#4 0x0000000000924afa in LLVMFuzzerTestOneInput () at fuzz/bpf-object-fuzzer.c:16
	#5 0x000000000060be11 in testblitz_engine::fuzzer::Fuzzer::run_one ()
	#6 0x000000000087ad92 in tracing::span::Span::in_scope ()
	#7 0x00000000006078aa in testblitz_engine::fuzzer::util::walkdir ()
	#8 0x00000000005f3217 in testblitz_engine::entrypoint::main::{{closure}} ()
	#9 0x00000000005f2601 in main ()
	(gdb)

scn_data was null at this code(tools/lib/bpf/src/libbpf.c):

	if (rel->r_offset % BPF_INSN_SZ || rel->r_offset >= scn_data->d_size) {

The scn_data is derived from the code above:

	scn = elf_sec_by_idx(obj, sec_idx);
	scn_data = elf_sec_data(obj, scn);

	relo_sec_name = elf_sec_str(obj, shdr->sh_name);
	sec_name = elf_sec_name(obj, scn);
	if (!relo_sec_name || !sec_name)// don't check whether scn_data is NULL
		return -EINVAL;

In certain special scenarios, such as reading a malformed ELF file,
it is possible that scn_data may be a null pointer

Signed-off-by: Mingyi Zhang <[email protected]>
Signed-off-by: Xin Liu <[email protected]>
Signed-off-by: Changye Wu <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Jun 27, 2024
[ Upstream commit 65e8fbd ]

There is this reported crash when experimenting with the lvm2 testsuite.
The list corruption is caused by the fact that the postsuspend and resume
methods were not paired correctly; there were two consecutive calls to the
origin_postsuspend function. The second call attempts to remove the
"hash_list" entry from a list, while it was already removed by the first
call.

Fix __dm_internal_resume so that it calls the preresume and resume
methods of the table's targets.

If a preresume method of some target fails, we are in a tricky situation.
We can't return an error because dm_internal_resume isn't supposed to
return errors. We can't return success, because then the "resume" and
"postsuspend" methods would not be paired correctly. So, we set the
DMF_SUSPENDED flag and we fake normal suspend - it may confuse userspace
tools, but it won't cause a kernel crash.

------------[ cut here ]------------
kernel BUG at lib/list_debug.c:56!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 8343 Comm: dmsetup Not tainted 6.8.0-rc6 #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
RIP: 0010:__list_del_entry_valid_or_report+0x77/0xc0
<snip>
RSP: 0018:ffff8881b831bcc0 EFLAGS: 00010282
RAX: 000000000000004e RBX: ffff888143b6eb80 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffffff819053d0 RDI: 00000000ffffffff
RBP: ffff8881b83a3400 R08: 00000000fffeffff R09: 0000000000000058
R10: 0000000000000000 R11: ffffffff81a24080 R12: 0000000000000001
R13: ffff88814538e000 R14: ffff888143bc6dc0 R15: ffffffffa02e4bb0
FS:  00000000f7c0f780(0000) GS:ffff8893f0a40000(0000) knlGS:0000000000000000
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000057fb5000 CR3: 0000000143474000 CR4: 00000000000006b0
Call Trace:
 <TASK>
 ? die+0x2d/0x80
 ? do_trap+0xeb/0xf0
 ? __list_del_entry_valid_or_report+0x77/0xc0
 ? do_error_trap+0x60/0x80
 ? __list_del_entry_valid_or_report+0x77/0xc0
 ? exc_invalid_op+0x49/0x60
 ? __list_del_entry_valid_or_report+0x77/0xc0
 ? asm_exc_invalid_op+0x16/0x20
 ? table_deps+0x1b0/0x1b0 [dm_mod]
 ? __list_del_entry_valid_or_report+0x77/0xc0
 origin_postsuspend+0x1a/0x50 [dm_snapshot]
 dm_table_postsuspend_targets+0x34/0x50 [dm_mod]
 dm_suspend+0xd8/0xf0 [dm_mod]
 dev_suspend+0x1f2/0x2f0 [dm_mod]
 ? table_deps+0x1b0/0x1b0 [dm_mod]
 ctl_ioctl+0x300/0x5f0 [dm_mod]
 dm_compat_ctl_ioctl+0x7/0x10 [dm_mod]
 __x64_compat_sys_ioctl+0x104/0x170
 do_syscall_64+0x184/0x1b0
 entry_SYSCALL_64_after_hwframe+0x46/0x4e
RIP: 0033:0xf7e6aead
<snip>
---[ end trace 0000000000000000 ]---

Fixes: ffcc393 ("dm: enhance internal suspend and resume interface")
Signed-off-by: Mikulas Patocka <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Oct 30, 2024
Syzkaller reported a lockdep splat:

  ============================================
  WARNING: possible recursive locking detected
  6.11.0-rc6-syzkaller-00019-g67784a74e258 #0 Not tainted
  --------------------------------------------
  syz-executor364/5113 is trying to acquire lock:
  ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
  ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328

  but task is already holding lock:
  ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
  ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(k-slock-AF_INET);
    lock(k-slock-AF_INET);

   *** DEADLOCK ***

   May be due to missing lock nesting notation

  7 locks held by syz-executor364/5113:
   #0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline]
   #0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0x153/0x1b10 net/mptcp/protocol.c:1806
   #1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline]
   #1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg_fastopen+0x11f/0x530 net/mptcp/protocol.c:1727
   #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline]
   #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
   #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5f/0x1b80 net/ipv4/ip_output.c:470
   #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline]
   #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
   #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x45f/0x1390 net/ipv4/ip_output.c:228
   #4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
   #4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x33b/0x15b0 net/core/dev.c:6104
   #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline]
   #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]
   #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x230/0x5f0 net/ipv4/ip_input.c:232
   #6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
   #6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328

  stack backtrace:
  CPU: 0 UID: 0 PID: 5113 Comm: syz-executor364 Not tainted 6.11.0-rc6-syzkaller-00019-g67784a74e258 #0
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
  Call Trace:
   <IRQ>
   __dump_stack lib/dump_stack.c:93 [inline]
   dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
   check_deadlock kernel/locking/lockdep.c:3061 [inline]
   validate_chain+0x15d3/0x5900 kernel/locking/lockdep.c:3855
   __lock_acquire+0x137a/0x2040 kernel/locking/lockdep.c:5142
   lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5759
   __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
   _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
   spin_lock include/linux/spinlock.h:351 [inline]
   sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328
   mptcp_sk_clone_init+0x32/0x13c0 net/mptcp/protocol.c:3279
   subflow_syn_recv_sock+0x931/0x1920 net/mptcp/subflow.c:874
   tcp_check_req+0xfe4/0x1a20 net/ipv4/tcp_minisocks.c:853
   tcp_v4_rcv+0x1c3e/0x37f0 net/ipv4/tcp_ipv4.c:2267
   ip_protocol_deliver_rcu+0x22e/0x440 net/ipv4/ip_input.c:205
   ip_local_deliver_finish+0x341/0x5f0 net/ipv4/ip_input.c:233
   NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
   NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
   __netif_receive_skb_one_core net/core/dev.c:5661 [inline]
   __netif_receive_skb+0x2bf/0x650 net/core/dev.c:5775
   process_backlog+0x662/0x15b0 net/core/dev.c:6108
   __napi_poll+0xcb/0x490 net/core/dev.c:6772
   napi_poll net/core/dev.c:6841 [inline]
   net_rx_action+0x89b/0x1240 net/core/dev.c:6963
   handle_softirqs+0x2c4/0x970 kernel/softirq.c:554
   do_softirq+0x11b/0x1e0 kernel/softirq.c:455
   </IRQ>
   <TASK>
   __local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382
   local_bh_enable include/linux/bottom_half.h:33 [inline]
   rcu_read_unlock_bh include/linux/rcupdate.h:908 [inline]
   __dev_queue_xmit+0x1763/0x3e90 net/core/dev.c:4450
   dev_queue_xmit include/linux/netdevice.h:3105 [inline]
   neigh_hh_output include/net/neighbour.h:526 [inline]
   neigh_output include/net/neighbour.h:540 [inline]
   ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:235
   ip_local_out net/ipv4/ip_output.c:129 [inline]
   __ip_queue_xmit+0x118c/0x1b80 net/ipv4/ip_output.c:535
   __tcp_transmit_skb+0x2544/0x3b30 net/ipv4/tcp_output.c:1466
   tcp_rcv_synsent_state_process net/ipv4/tcp_input.c:6542 [inline]
   tcp_rcv_state_process+0x2c32/0x4570 net/ipv4/tcp_input.c:6729
   tcp_v4_do_rcv+0x77d/0xc70 net/ipv4/tcp_ipv4.c:1934
   sk_backlog_rcv include/net/sock.h:1111 [inline]
   __release_sock+0x214/0x350 net/core/sock.c:3004
   release_sock+0x61/0x1f0 net/core/sock.c:3558
   mptcp_sendmsg_fastopen+0x1ad/0x530 net/mptcp/protocol.c:1733
   mptcp_sendmsg+0x1884/0x1b10 net/mptcp/protocol.c:1812
   sock_sendmsg_nosec net/socket.c:730 [inline]
   __sock_sendmsg+0x1a6/0x270 net/socket.c:745
   ____sys_sendmsg+0x525/0x7d0 net/socket.c:2597
   ___sys_sendmsg net/socket.c:2651 [inline]
   __sys_sendmmsg+0x3b2/0x740 net/socket.c:2737
   __do_sys_sendmmsg net/socket.c:2766 [inline]
   __se_sys_sendmmsg net/socket.c:2763 [inline]
   __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2763
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
  RIP: 0033:0x7f04fb13a6b9
  Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 01 1a 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007ffd651f42d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
  RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f04fb13a6b9
  RDX: 0000000000000001 RSI: 0000000020000d00 RDI: 0000000000000004
  RBP: 00007ffd651f4310 R08: 0000000000000001 R09: 0000000000000001
  R10: 0000000020000080 R11: 0000000000000246 R12: 00000000000f4240
  R13: 00007f04fb187449 R14: 00007ffd651f42f4 R15: 00007ffd651f4300
   </TASK>

As noted by Cong Wang, the splat is false positive, but the code
path leading to the report is an unexpected one: a client is
attempting an MPC handshake towards the in-kernel listener created
by the in-kernel PM for a port based signal endpoint.

Such connection will be never accepted; many of them can make the
listener queue full and preventing the creation of MPJ subflow via
such listener - its intended role.

Explicitly detect this scenario at initial-syn time and drop the
incoming MPC request.

Fixes: 1729cf1 ("mptcp: create the listening socket for new port")
Cc: [email protected]
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=f4aacdfef2c6a6529c3e
Cc: Cong Wang <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Nov 5, 2024
Hou Tao says:

====================
The patch set fixes several issues in bits iterator. Patch #1 fixes the
kmemleak problem of bits iterator. Patch #2~#3 fix the overflow problem
of nr_bits. Patch #4 fixes the potential stack corruption when bits
iterator is used on 32-bit host. Patch #5 adds more test cases for bits
iterator.

Please see the individual patches for more details. And comments are
always welcome.
---
v4:
 * patch #1: add ack from Yafang
 * patch #3: revert code-churn like changes:
   (1) compute nr_bytes and nr_bits before the check of nr_words.
   (2) use nr_bits == 64 to check for single u64, preventing build
       warning on 32-bit hosts.
 * patch #4: use "BITS_PER_LONG == 32" instead of "!defined(CONFIG_64BIT)"

v3: https://lore.kernel.org/bpf/[email protected]/T/#t
  * split the bits-iterator related patches from "Misc fixes for bpf"
    patch set
  * patch #1: use "!nr_bits || bits >= nr_bits" to stop the iteration
  * patch #2: add a new helper for the overflow problem
  * patch #3: decrease the limitation from 512 to 511 and check whether
    nr_bytes is too large for bpf memory allocator explicitly
  * patch #5: add two more test cases for bit iterator

v2: http://lore.kernel.org/bpf/[email protected]
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Nov 5, 2024
Petr Machata says:

====================
mlxsw: Fixes

In this patchset:

- Tx header should be pushed for each packet which is transmitted via
  Spectrum ASICs. Patch #1 adds a missing call to skb_cow_head() to make
  sure that there is both enough room to push the Tx header and that the
  SKB header is not cloned and can be modified.

- Commit b5b60bb ("mlxsw: pci: Use page pool for Rx buffers
  allocation") converted mlxsw to use page pool for Rx buffers allocation.
  Sync for CPU and for device should be done for Rx pages. In patches #2
  and #3, add the missing calls to sync pages for, respectively, CPU and
  the device.

- Patch #4 then fixes a bug to IPv6 GRE forwarding offload. Patch #5 adds
  a generic forwarding test that fails with mlxsw ports prior to the fix.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
RobertCNelson pushed a commit that referenced this pull request Nov 18, 2024
When CONFIG_KASAN_SW_TAGS and CONFIG_KASAN_STACK are enabled, the
object_is_on_stack() function may produce incorrect results due to the
presence of tags in the obj pointer, while the stack pointer does not have
tags.  This discrepancy can lead to incorrect stack object detection and
subsequently trigger warnings if CONFIG_DEBUG_OBJECTS is also enabled.

Example of the warning:

ODEBUG: object 3eff800082ea7bb0 is NOT on stack ffff800082ea0000, but annotated.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:557 __debug_object_init+0x330/0x364
Modules linked in:
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-rc5 #4
Hardware name: linux,dummy-virt (DT)
pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __debug_object_init+0x330/0x364
lr : __debug_object_init+0x330/0x364
sp : ffff800082ea7b40
x29: ffff800082ea7b40 x28: 98ff0000c0164518 x27: 98ff0000c0164534
x26: ffff800082d93ec8 x25: 0000000000000001 x24: 1cff0000c00172a0
x23: 0000000000000000 x22: ffff800082d93ed0 x21: ffff800081a24418
x20: 3eff800082ea7bb0 x19: efff800000000000 x18: 0000000000000000
x17: 00000000000000ff x16: 0000000000000047 x15: 206b63617473206e
x14: 0000000000000018 x13: ffff800082ea7780 x12: 0ffff800082ea78e
x11: 0ffff800082ea790 x10: 0ffff800082ea79d x9 : 34d77febe173e800
x8 : 34d77febe173e800 x7 : 0000000000000001 x6 : 0000000000000001
x5 : feff800082ea74b8 x4 : ffff800082870a90 x3 : ffff80008018d3c4
x2 : 0000000000000001 x1 : ffff800082858810 x0 : 0000000000000050
Call trace:
 __debug_object_init+0x330/0x364
 debug_object_init_on_stack+0x30/0x3c
 schedule_hrtimeout_range_clock+0xac/0x26c
 schedule_hrtimeout+0x1c/0x30
 wait_task_inactive+0x1d4/0x25c
 kthread_bind_mask+0x28/0x98
 init_rescuer+0x1e8/0x280
 workqueue_init+0x1a0/0x3cc
 kernel_init_freeable+0x118/0x200
 kernel_init+0x28/0x1f0
 ret_from_fork+0x10/0x20
---[ end trace 0000000000000000 ]---
ODEBUG: object 3eff800082ea7bb0 is NOT on stack ffff800082ea0000, but annotated.
------------[ cut here ]------------

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Qun-Wei Lin <[email protected]>
Cc: Andrew Yang <[email protected]>
Cc: AngeloGioacchino Del Regno <[email protected]>
Cc: Casper Li <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chinwen Chang <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Matthias Brugger <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants