Skip to content

Commit

Permalink
driver core: Fix uevent_show() vs driver detach race
Browse files Browse the repository at this point in the history
uevent_show() wants to de-reference dev->driver->name. There is no clean
way for a device attribute to de-reference dev->driver unless that
attribute is defined via (struct device_driver).dev_groups. Instead, the
anti-pattern of taking the device_lock() in the attribute handler risks
deadlocks with code paths that remove device attributes while holding
the lock.

This deadlock is typically invisible to lockdep given the device_lock()
is marked lockdep_set_novalidate_class(), but some subsystems allocate a
local lockdep key for @Dev->mutex to reveal reports of the form:

 ======================================================
 WARNING: possible circular locking dependency detected
 6.10.0-rc7+ torvalds#275 Tainted: G           OE    N
 ------------------------------------------------------
 modprobe/2374 is trying to acquire lock:
 ffff8c2270070de0 (kn->active#6){++++}-{0:0}, at: __kernfs_remove+0xde/0x220

 but task is already holding lock:
 ffff8c22016e88f8 (&cxl_root_key){+.+.}-{3:3}, at: device_release_driver_internal+0x39/0x210

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&cxl_root_key){+.+.}-{3:3}:
        __mutex_lock+0x99/0xc30
        uevent_show+0xac/0x130
        dev_attr_show+0x18/0x40
        sysfs_kf_seq_show+0xac/0xf0
        seq_read_iter+0x110/0x450
        vfs_read+0x25b/0x340
        ksys_read+0x67/0xf0
        do_syscall_64+0x75/0x190
        entry_SYSCALL_64_after_hwframe+0x76/0x7e

 -> #0 (kn->active#6){++++}-{0:0}:
        __lock_acquire+0x121a/0x1fa0
        lock_acquire+0xd6/0x2e0
        kernfs_drain+0x1e9/0x200
        __kernfs_remove+0xde/0x220
        kernfs_remove_by_name_ns+0x5e/0xa0
        device_del+0x168/0x410
        device_unregister+0x13/0x60
        devres_release_all+0xb8/0x110
        device_unbind_cleanup+0xe/0x70
        device_release_driver_internal+0x1c7/0x210
        driver_detach+0x47/0x90
        bus_remove_driver+0x6c/0xf0
        cxl_acpi_exit+0xc/0x11 [cxl_acpi]
        __do_sys_delete_module.isra.0+0x181/0x260
        do_syscall_64+0x75/0x190
        entry_SYSCALL_64_after_hwframe+0x76/0x7e

The observation though is that driver objects are typically much longer
lived than device objects. It is reasonable to perform lockless
de-reference of a @driver pointer even if it is racing detach from a
device. Given the infrequency of driver unregistration, use
synchronize_rcu() in module_remove_driver() to close any potential
races.  It is potentially overkill to suffer synchronize_rcu() just to
handle the rare module removal racing uevent_show() event.

Thanks to Tetsuo Handa for the debug analysis of the syzbot report [1].

Fixes: c0a4009 ("drivers: core: synchronize really_probe() and dev_uevent()")
Reported-by: [email protected]
Reported-by: Tetsuo Handa <[email protected]>
Closes: http://lore.kernel.org/[email protected] [1]
Link: http://lore.kernel.org/[email protected]
Cc: [email protected]
Cc: Ashish Sangwan <[email protected]>
Cc: Namjae Jeon <[email protected]>
Cc: Dirk Behme <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Link: https://lore.kernel.org/r/172081332794.577428.9738802016494057132.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Greg Kroah-Hartman <[email protected]>
  • Loading branch information
djbw authored and gregkh committed Jul 31, 2024
1 parent 86fee28 commit 15fffc6
Show file tree
Hide file tree
Showing 2 changed files with 12 additions and 5 deletions.
13 changes: 8 additions & 5 deletions drivers/base/core.c
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
#include <linux/mutex.h>
#include <linux/pm_runtime.h>
#include <linux/netdevice.h>
#include <linux/rcupdate.h>
#include <linux/sched/signal.h>
#include <linux/sched/mm.h>
#include <linux/string_helpers.h>
Expand Down Expand Up @@ -2640,6 +2641,7 @@ static const char *dev_uevent_name(const struct kobject *kobj)
static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env)
{
const struct device *dev = kobj_to_dev(kobj);
struct device_driver *driver;
int retval = 0;

/* add device node properties if present */
Expand Down Expand Up @@ -2668,8 +2670,12 @@ static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env)
if (dev->type && dev->type->name)
add_uevent_var(env, "DEVTYPE=%s", dev->type->name);

if (dev->driver)
add_uevent_var(env, "DRIVER=%s", dev->driver->name);
/* Synchronize with module_remove_driver() */
rcu_read_lock();
driver = READ_ONCE(dev->driver);
if (driver)
add_uevent_var(env, "DRIVER=%s", driver->name);
rcu_read_unlock();

/* Add common DT information about the device */
of_device_uevent(dev, env);
Expand Down Expand Up @@ -2739,11 +2745,8 @@ static ssize_t uevent_show(struct device *dev, struct device_attribute *attr,
if (!env)
return -ENOMEM;

/* Synchronize with really_probe() */
device_lock(dev);
/* let the kset specific function add its keys */
retval = kset->uevent_ops->uevent(&dev->kobj, env);
device_unlock(dev);
if (retval)
goto out;

Expand Down
4 changes: 4 additions & 0 deletions drivers/base/module.c
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/rcupdate.h>
#include "base.h"

static char *make_driver_name(const struct device_driver *drv)
Expand Down Expand Up @@ -97,6 +98,9 @@ void module_remove_driver(const struct device_driver *drv)
if (!drv)
return;

/* Synchronize with dev_uevent() */
synchronize_rcu();

sysfs_remove_link(&drv->p->kobj, "module");

if (drv->owner)
Expand Down

0 comments on commit 15fffc6

Please sign in to comment.