port patches from Lustre tree to kernel tree #24

bergwolf · 2013-11-05T09:59:57Z

This syncs kernel client with Lustre tree up to commit f5e8090ae8(LU-2940 llite: Fix for oops in vvp_pgcache_show()).

Amir Shehata (2):
staging/lustre/lnet: Fix assert on empty group in selftest module
staging/lustre/ptlrpc: Fix a crash when dereferencing NULL pointer

Andreas Dilger (2):
staging/lustre/ldlm: fix resource/fid check, use DLDLMRES
staging/lustre/idl: remove LASSERT/CLASSERT from lustre_idl.h

Andrew Perepechko (1):
staging/lustre/llite: extended attribute cache

Andriy Skulysh (2):
staging/lustre/ptlrpc: Fix race during exp_flock_hash creation
staging/lustre/ptlrpc: flock deadlock detection does not work

Artem Blagodarenko (1):
staging/lustre/mgs: set_param -P option that sets value permanently

Aurelien Degremont (1):
staging/lustre/mdt: Set HSM dirty open-for-write file when evicted.

Dmitry Eremin (2):
staging/lustre/build: clean up unused variables and dead code
staging/lustre/build: fix compilation issue with is_compat_task

Doug Oucharek (1):
staging/lustre/lnet: Add LNet Router Priority parameter

Fan Yong (2):
staging/lustre/scrub: OI scrub on OST
staging/lustre/scrub: control OI scrub on OST from user space

JC Lafoucriere (6):
staging/lustre/llite: Access to released file trigs a restore
staging/lustre/mdt: HSM coordinator client interface
staging/lustre/mdt: HSM coordinator agent interface
staging/lustre/api: HSM import uses new released pattern
staging/lustre/utils: HSM Posix CopyTool
staging/lustre/mdt: HSM coordinator main thread

James Simmons (2):
staging/lustre/autoconf: remove vectored fops tests
staging/lustre/autoconf: remove LIBCFS_HAVE_IS_COMPAT_TASK test

Jinshan Xiong (3):
staging/lustre/hsm: Implementation of exclusive open
staging/lustre/hsm: Add hsm_release feature.
staging/lustre/mdt: a new param to allocate sequences

John L. Hammond (7):
staging/lustre: validate open handle cookies
staging/lustre/llite: use correct FID in ll_och_fill()
staging/lustre/lov: convert magic to host-endian in lov_dump_lmm()
staging/lustre/mdc: prevent fall through in mdc_iocontrol()
staging/lustre/lu: shrink lu_object by 8 bytes on x86_64
staging/lustre/llite: don't check for O_CREAT in it_create_mode
staging/lustre/llite: pass correct pointer to obd_iocontrol()

Lai Siyao (1):
staging/lustre/llite: remove ll_d_root_ops

Li Xi (2):
staging/lustre/llog: fix return value of llog_alloc_handle
staging/lustre/dt: Fix assertion of method in dt_capa_get()

Mikhail Pershin (4):
staging/lustre/server: use unified request handler for MGS
staging/lustre/llog: MGC to use OSD API for backup logs
staging/lustre/target: move OUT to the unified target code
staging/lustre/seq: unified SEQ handler

Patrick Farrell (1):
staging/lustre/nfs: writing to new files will return ENOENT

Swapnil Pimpale (1):
staging/lustre/llite: Fix for oops in vvp_pgcache_show()

.../staging/lustre/include/linux/libcfs/curproc.h | 1 -
.../staging/lustre/include/linux/libcfs/libcfs.h | 2 -
.../lustre/include/linux/libcfs/libcfs_ioctl.h | 1 +
.../staging/lustre/include/linux/lnet/lib-lnet.h | 5 +-
.../staging/lustre/include/linux/lnet/lib-types.h | 2 +-
drivers/staging/lustre/lnet/lnet/api-ni.c | 5 +-
drivers/staging/lustre/lnet/lnet/config.c | 39 ++-
drivers/staging/lustre/lnet/lnet/lib-move.c | 6 +
drivers/staging/lustre/lnet/lnet/router.c | 19 +-
drivers/staging/lustre/lnet/lnet/router_proc.c | 16 +-
drivers/staging/lustre/lnet/selftest/conctl.c | 56 +-
drivers/staging/lustre/lnet/selftest/conrpc.c | 2 +-
drivers/staging/lustre/lnet/selftest/console.c | 105 +++-
drivers/staging/lustre/lnet/selftest/console.h | 6 +-
drivers/staging/lustre/lnet/selftest/rpc.c | 2 -
drivers/staging/lustre/lnet/selftest/selftest.h | 3 -
drivers/staging/lustre/lnet/selftest/timer.c | 6 +-
drivers/staging/lustre/lustre/fid/lproc_fid.c | 6 +
drivers/staging/lustre/lustre/fld/lproc_fld.c | 1 +
drivers/staging/lustre/lustre/include/cl_object.h | 6 +-
drivers/staging/lustre/lustre/include/dt_object.h | 4 +-
.../lustre/lustre/include/linux/lustre_compat25.h | 4 +-
.../lustre/lustre/include/linux/lustre_intent.h | 2 +-
.../lustre/lustre/include/linux/lustre_lite.h | 1 +
drivers/staging/lustre/lustre/include/lu_object.h | 19 -
drivers/staging/lustre/lustre/include/lu_target.h | 287 +++++++++-
.../lustre/lustre/include/lustre/lustre_idl.h | 114 ++--
.../lustre/lustre/include/lustre/lustre_user.h | 65 ++-
.../lustre/lustre/include/lustre/lustreapi.h | 208 ++++---
drivers/staging/lustre/lustre/include/lustre_cfg.h | 2 +
.../staging/lustre/lustre/include/lustre_disk.h | 2 +
.../lustre/lustre/include/lustre_dlm_flags.h | 24 +-
drivers/staging/lustre/lustre/include/lustre_fid.h | 6 -
drivers/staging/lustre/lustre/include/lustre_ha.h | 3 -
.../staging/lustre/lustre/include/lustre_handles.h | 5 +-
drivers/staging/lustre/lustre/include/lustre_lib.h | 11 +-
drivers/staging/lustre/lustre/include/lustre_log.h | 13 +-
drivers/staging/lustre/lustre/include/lustre_net.h | 11 +-
.../lustre/lustre/include/lustre_req_layout.h | 7 +
drivers/staging/lustre/lustre/include/md_object.h | 6 +-
drivers/staging/lustre/lustre/include/obd.h | 15 +-
drivers/staging/lustre/lustre/include/obd_class.h | 9 +-
.../staging/lustre/lustre/include/obd_support.h | 10 +
drivers/staging/lustre/lustre/lclient/lcommon_cl.c | 6 +
.../staging/lustre/lustre/lclient/lcommon_misc.c | 4 +-
drivers/staging/lustre/lustre/ldlm/ldlm_flock.c | 69 ++-
drivers/staging/lustre/lustre/ldlm/ldlm_lock.c | 9 +-
drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c | 18 +-
drivers/staging/lustre/lustre/ldlm/ldlm_request.c | 39 +-
drivers/staging/lustre/lustre/ldlm/ldlm_resource.c | 19 +-
.../lustre/lustre/libcfs/linux/linux-curproc.c | 13 -
drivers/staging/lustre/lustre/libcfs/workitem.c | 55 +-
drivers/staging/lustre/lustre/llite/Makefile | 2 +-
drivers/staging/lustre/lustre/llite/dcache.c | 77 +--
drivers/staging/lustre/lustre/llite/dir.c | 47 ++-
drivers/staging/lustre/lustre/llite/file.c | 589 ++++++++++++++++--
.../staging/lustre/lustre/llite/llite_internal.h | 66 ++-
drivers/staging/lustre/lustre/llite/llite_lib.c | 91 +++-
drivers/staging/lustre/lustre/llite/llite_nfs.c | 6 +-
drivers/staging/lustre/lustre/llite/lproc_llite.c | 36 ++
drivers/staging/lustre/lustre/llite/namei.c | 53 +-
drivers/staging/lustre/lustre/llite/statahead.c | 7 +-
drivers/staging/lustre/lustre/llite/super25.c | 4 +
drivers/staging/lustre/lustre/llite/vvp_dev.c | 2 +-
drivers/staging/lustre/lustre/llite/vvp_io.c | 54 ++-
drivers/staging/lustre/lustre/llite/vvp_object.c | 2 +-
drivers/staging/lustre/lustre/llite/xattr.c | 102 ++--
drivers/staging/lustre/lustre/llite/xattr_cache.c | 640 ++++++++++++++++++++
.../staging/lustre/lustre/lov/lov_cl_internal.h | 16 +
drivers/staging/lustre/lustre/lov/lov_internal.h | 4 +-
drivers/staging/lustre/lustre/lov/lov_io.c | 15 +-
drivers/staging/lustre/lustre/lov/lov_object.c | 35 +-
drivers/staging/lustre/lustre/lov/lov_pack.c | 20 +-
drivers/staging/lustre/lustre/mdc/mdc_internal.h | 3 +-
drivers/staging/lustre/lustre/mdc/mdc_lib.c | 31 +-
drivers/staging/lustre/lustre/mdc/mdc_locks.c | 111 +++-
drivers/staging/lustre/lustre/mdc/mdc_request.c | 86 ++--
drivers/staging/lustre/lustre/mgc/libmgc.c | 3 -
drivers/staging/lustre/lustre/mgc/mgc_request.c | 492 ++++++++++------
drivers/staging/lustre/lustre/obdclass/genops.c | 2 +-
drivers/staging/lustre/lustre/obdclass/llog.c | 214 ++++---
.../staging/lustre/lustre/obdclass/local_storage.c | 9 +-
.../staging/lustre/lustre/obdclass/local_storage.h | 3 +
.../lustre/lustre/obdclass/lprocfs_status.c | 5 +-
drivers/staging/lustre/lustre/obdclass/lu_object.c | 26 +-
.../lustre/lustre/obdclass/lustre_handles.c | 4 +-
drivers/staging/lustre/lustre/obdclass/md_attrs.c | 6 +-
.../staging/lustre/lustre/obdclass/obd_config.c | 50 ++-
drivers/staging/lustre/lustre/obdclass/obd_mount.c | 3 +
.../staging/lustre/lustre/obdecho/echo_client.c | 2 +-
drivers/staging/lustre/lustre/ptlrpc/client.c | 10 +-
drivers/staging/lustre/lustre/ptlrpc/import.c | 2 -
drivers/staging/lustre/lustre/ptlrpc/layout.c | 60 ++-
drivers/staging/lustre/lustre/ptlrpc/llog_server.c | 240 ++------
.../staging/lustre/lustre/ptlrpc/lproc_ptlrpc.c | 4 +-
drivers/staging/lustre/lustre/ptlrpc/niobuf.c | 16 +-
.../staging/lustre/lustre/ptlrpc/pack_generic.c | 59 +-
drivers/staging/lustre/lustre/ptlrpc/pinger.c | 65 --
drivers/staging/lustre/lustre/ptlrpc/ptlrpcd.c | 44 +-
drivers/staging/lustre/lustre/ptlrpc/wiretest.c | 60 ++-
100 files changed, 3427 insertions(+), 1400 deletions(-)

When a client accesses data in a released file, or truncate it, client must trig a restore request. During this restore, the client must not glimpse and must use size from MDT. To bring the "restore is running" information on the client we add a new t_state bit field to mdt_info which will be used to carry transient file state. To memorise this information in the inode we add a new flag LLIF_FILE_RESTORING. Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3432 Lustre-change: http://review.whamcloud.com/6537 Signed-off-by: JC Lafoucriere <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Tested-by: Oleg Drokin <[email protected]> Signed-off-by: Peng Tao <[email protected]>

Add a const void *h_owner member to struct portals_handle. Add a const void *owner parameter to class_handle2object() which must be matched by the h_owner member of the handle in addition to the cookie. Adjust the callers of class_handle2object() accordingly, using NULL as the argument to the owner parameter, except in the case of mdt_handle2mfd() where we add an explicit mdt_export_data parameter which we use as the owner when searching for a MFD. When allocating a new MFD, pass a pointer to the mdt_export_data into mdt_mfd_new() and store it in h_owner. This allows the MDT to validate that the client has not sent the wrong open handle cookie, or sent the right cookie to the wrong MDT. Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3233 Lustre-change: http://review.whamcloud.com/6938 Signed-off-by: John L. Hammond <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Fan Yong <[email protected]> Reviewed-by: Mike Pershin <[email protected]> Signed-off-by: Peng Tao <[email protected]>

When ll_intent_file_open() is called on a file with a stale dentry, ll_och_fill() may incorrectly use the FID from the struct ll_inode_info rather than the FID from the response body (which is the correct FID for the close). Fix this, remove the ll_inode_info parameter from ll_och_fill(), and move the call to ll_ioepoch_open() from ll_och_fill() to ll_local_open(). Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3233 Lustre-change: http://review.whamcloud.com/6695 Signed-off-by: John L. Hammond <[email protected]> Reviewed-by: Fan Yong <[email protected]> Reviewed-by: Mike Pershin <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: Peng Tao <[email protected]>

In this patch, a framework of lease is implemented. However, only exclusive lease is supported right now. To apply a lease, MDS_OPEN_LEASE must be set to open the file, EX mode open lock is returned to the client side to hold a lease. From that time on, if this file is opened again by other processes, the open lock will be revoked so the client who holds the lease will know the lease is already broken by checking that open lock. To release a lease, normal close is used. The client will revoke the open lock before sending CLOSE request. Lease can be applied in two ways. ll_lease_open()/close() can be called directly if the lease holder is in kernel space; or if the lease holder lives in user space, it has to open the file first and then use ioctl() with command LL_IOC_SET_LEASE to apply a lease. The lease holder has to poll the lease status itself. Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2919 Lustre-change: http://review.whamcloud.com/6730 Signed-off-by: Jinshan Xiong <[email protected]> Signed-off-by: John L. Hammond <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: Peng Tao <[email protected]>

The core of the issue is that the selftest module doesn't sanitize its own API, but it depends on lst utility to do such checks. As a result this issue manifests itself in this particular LU through an assert on an empty group. If the NID is misspelled then an empty group is added. An error output is provided, but if that's never checked in a batch script, as is the case with this issue, then the script will try to add an empty group to a test to run in a batch, and that will cause an assert The fix is two fold. Ensure that lst utility checks that a group is added with at least one node. If not the group is subsequently deleted. And the add_test command would fail, since the group no longer exists. The second fix is to ensure that the kernel module itself sanitizes its own API in this particular case, so that if a different utility is used other than lst to communicate with the selftest kernel module then this error would be caught. This fix looks up the batch and the groups, src and dst, in the ioctl handle and sanitizes that input at this point. If the group looked up either doesn't exist or doesn't have at least one ACTIVE node, then the command fails. NOTE:there are many other cases in the code where the selftest kernel module doesn't check for sanity of the input, but depends totally on the lst module to do such checks. Particularly around length of strings passed in. Thus it is possible to crash the selftest module if someone tries to create another userspace app to communicate with the selftest kernel module without ensuring sanity of the params sent to the kernel module. In effect, it's always assumed that lst is the front end for selftest and no other front end is to be used. Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3093 Lustre-change: http://review.whamcloud.com/6092 Signed-off-by: Amir Shehata <[email protected]> Reviewed-by: Isaac Huang <[email protected]> Reviewed-by: Liang Zhen <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: Peng Tao <[email protected]>

In ll_md_blocking_ast() the FID/resource comparison is incorrectly checking for fid_ver() stored in res_id.name[2] instead of name[1] changed since http://review.whamcloud.com/2271 (commit 4f91d5161d00) landed. This does not impact current clients, since name[2] and fid_ver() are always zero, but it could cause problems in the future. In ldlm_cli_enqueue_fini() use ldlm_res_eq() instead of comparing each of the resource fields separately. Use DLDLMRES/PLDLMRES when printing resource names everywhere. Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2901 Lustre-change: http://review.whamcloud.com/6592 Signed-off-by: Andreas Dilger <[email protected]> Signed-off-by: Lai Siyao <[email protected]> Reviewed-by: Johann Lombardi <[email protected]> Signed-off-by: Peng Tao <[email protected]>

- Unify request handler. It finds target for particular request and calls appropriate handler for it. Generic handlers are moved to the unified target code. The tgt_session_info is introduced to store all request-related data and passed to all handlers. - Pack reply in llog server functions early and use err_serious() - remove obsoleted llog_origin_handle_cancel(), it is not used anymore - remove push_ctxt/pop_ctxt from llog server function, it is based on OSD now. Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2145 Lustre-change: http://review.whamcloud.com/4826 Signed-off-by: Mikhail Pershin <[email protected]> Reviewed-by: Fan Yong <[email protected]> Reviewed-by: Alex Zhuravlev <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: James Simmons <[email protected]> Signed-off-by: Peng Tao <[email protected]>

MGC uses lvfs API to access local llogs blocking removal of old code - MGS is converted to use OSD API for local llogs - llog_is_empty() and llog_backup() are introduced - initial OSD start initialize run ctxt so llog can use it too - shrink dcache after initial scrub in osd-ldiskfs to avoid holding data of unlinked objects until umount. Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2059 Lustre-change: http://review.whamcloud.com/5049 Signed-off-by: Mikhail Pershin <[email protected]> Reviewed-by: Lai Siyao <[email protected]> Reviewed-by: Alex Zhuravlev <[email protected]> Reviewed-by: James Simmons <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: Peng Tao <[email protected]>

This happend with SLES11SP2 Lustre client, which in turn acts as an NFS server, exporting a subtree of an Lustre fs through NFS. We detected that whenever we are writing to a new file using, fx, 'echo blah > newfile', it will return ENOENT error. We found out that this was caused by the anonymous dentry. In SLESS11SP2, anonymous dentries are assigned '/' as the name, instead of an empty string. When MDT handles the intent_open call, it will look up the obj by the name if it is not an empty string, and thus couldn't find it. As MDS_OPEN_BY_FID is always set on this request, we never need to send the name in this request. The fid is already available and should be used in case the file has been renamed. Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3544 Lustre-change: http://review.whamcloud.com/6920 Signed-off-by: Cheng Shao <[email protected]> Signed-off-by: Patrick Farrell <[email protected]> Reviewed-by: Bob Glossman <[email protected]> Reviewed-by: Alexey Shvetsov <[email protected]> Reviewed-by: Lai Siyao <[email protected]> Reviewed-by: James Simmons <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: Peng Tao <[email protected]>

file_operations.readv/writev have been removed since v2.6.19 We can remove the test and the dead code. Lustre-change: http://review.whamcloud.com/5343 Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2800 Signed-off-by: Jeff Mahoney <[email protected]> Signed-off-by: James Simmons <[email protected]> Reviewed-by: Bob Glossman <[email protected]> Reviewed-by: Christopher J. Morrone <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: Peng Tao <[email protected]>

is_compat_task has been defined on all arches since v2.6.29. We can remove the test and dead code. Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2800 Lustre-change: http://review.whamcloud.com/5393 Signed-off-by: Jeff Mahoney <[email protected]> Signed-off-by: James Simmons <[email protected]> Reviewed-by: Bob Glossman <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: Peng Tao <[email protected]>

llog_open() calls llog_alloc_handle() taking NULL as the error return value. But llog_alloc_handle() returns ERR_PTR(-ENOMEM) instead when error. Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3470 Lustre-change: http://review.whamcloud.com/6644 Signed-off-by: Li Xi <[email protected]> Reviewed-by: Mike Pershin <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Signed-off-by: Peng Tao <[email protected]>

In lov_dump_lmm(), convert the lmm_magic from little-endian to host-endian byte order before the switch statement, as the other lov_dump_xxx() and lov_verify_xxx() functions already do. Remove the unused macro LMM_ASSERT(). Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3297 Lustre-change: http://review.whamcloud.com/6290 Signed-off-by: John L. Hammond <[email protected]> Reviewed-by: Li Wei <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Signed-off-by: Peng Tao <[email protected]>

During race exp_flock_hash can be created 2 times. It is created & assigned without any lock. Move hash initialization from ldlm_flock_blocking_link() to ldlm_init_export() Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2835 Lustre-change: http://review.whamcloud.com/5471 Signed-off-by: Andriy Skulysh <[email protected]> Reviewed-by: Alexander Boyko <[email protected]> Reviewed-by: Vitaly Fertman <[email protected]> Tested-by: Kyrylo Shatskyy <[email protected]> Reviewed-by: Keith Mannthey <[email protected]> Reviewed-by: Prakash Surya <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: Peng Tao <[email protected]>

In mdc_iocontrol() add a goto to the end of the LL_IOC_HSM_STATE_SET case, preventing fall through into the next case. In the same function, replace the return statement in OBD_IOC_QUOTACTL with a goto, so that a reference to the module is not leaked. Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3576 Lustre-change: http://review.whamcloud.com/6962 Signed-off-by: John L. Hammond <[email protected]> Reviewed-by: Aurelien Degremont <[email protected]> Reviewed-by: jacques-Charles Lafoucriere <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: Peng Tao <[email protected]>

Remove the lo_depth member from struct lu_object. This field is never set and only read in lu_object_print(). Remove the lo_flags member. This field was only used in lu_object_alloc() and can be replaced with an on-stack mask to keep trace of which layers have been allocated. Lustre-change: http://review.whamcloud.com/5890 Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3059 Signed-off-by: John L. Hammond <[email protected]> Reviewed-by: Alex Zhuravlev <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: Peng Tao <[email protected]>

This patch implements the HSM coordinator methods used by client to add/remove/list HSM actions on FID. Lustre-change: http://review.whamcloud.com/6532 Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3341 Signed-off-by: JC Lafoucriere <[email protected]> Reviewed-by: John L. Hammond <[email protected]> Reviewed-by: Jinshan Xiong <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Signed-off-by: Peng Tao <[email protected]>

To move data with external storage, HSM coordinator uses a Copy Tool running on a client named agent. This patch implements the interface for these agents. Lustre-change: http://review.whamcloud.com/6534 Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3342 Signed-off-by: JC Lafoucriere <[email protected]> Reviewed-by: Jinshan Xiong <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: John L. Hammond <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: Peng Tao <[email protected]>

OI scrub should has the ability to handle kinds of OI, including both the OI files on MDT and the /O directory on OST. We trust the FID in LMA for both MDT objects and OST objects. So if some /O sub-item does not match related LMA, then the /O will be updated, instead of the LMA. To guarantee that the OI scrub can run without MDT0 involved for FLDB, the OST object needs to store some flag in its LMA to tell OI scrub that it is for an OST object, no need to query the MDT0. Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3335 Lustre-change: http://review.whamcloud.com/6669 Signed-off-by: Fan Yong <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Alex Zhuravlev <[email protected]> Signed-off-by: Peng Tao <[email protected]>

Not all the OI inconsistency can be detected automatically, such as /O entry lost case. So the OI scrub on OST should can be triggered by the administrator manually. Lustre-change: http://review.whamcloud.com/6698 Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3335 Signed-off-by: Fan Yong <[email protected]> Reviewed-by: Alex Zhuravlev <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: Peng Tao <[email protected]>

ll_lookup_it() checks for O_CREAT in struct lookup_intent's it_create_mode member which is nonsensical, as it_create_mode is used for file mode bits (S_IFREG, S_IRUSR, ...). Fix this by just checking for IT_CREATE in it_op and do the same in llu_lookup_it(). This will not affect the behavior of either function, since if O_CREATE (0100) is actually set in o_create_mode then IT_CREATE must have been set in it_op. In ll_atomic_open() check for O_CREAT in the open_flags parameter rather than testing mode. Lustre-change: http://review.whamcloud.com/6786 Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3517 Signed-off-by: John L. Hammond <[email protected]> Reviewed-by: Lai Siyao <[email protected]> Reviewed-by: Peng Tao <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: Peng Tao <[email protected]>

Clean up the code from really unused variables and rearrange other to remove SET_BUT_UNUSED and UNUSED macro usage if possible. Clean up or comment on some dead code. Removed never implemented suspend timeouts functionality from pinger.c which was commented out since 2007 and going to be replaced by adaptive timeouts. Also removed all references to this functionality from ldlm_lockd.c, ldlm_request.c and import.c which actually nevers executes or do nothing. pinger.c commit d2d56f38da01001c92a09afc6b52b5acbd9bc13c Author: tappro <tappro> Date: Mon Jul 30 21:08:59 2007 +0000 - make HEAD from b_post_cmd3 Lustre-change: http://review.whamcloud.com/6139 Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3204 Signed-off-by: Dmitry Eremin <[email protected]> Signed-off-by: Ned Bass <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: Peng Tao <[email protected]>

After removing LIBCFS_HAVE_IS_COMPAT_TASK test we have a compilation issue with kernels configured without CONFIG_COMPAT. Lustre-change: http://review.whamcloud.com/7118 Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2800 Signed-off-by: Dmitry Eremin <[email protected]> Reviewed-by: James Simmons <[email protected]> Reviewed-by: Bob Glossman <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: Peng Tao <[email protected]>

When a system runs out of memory and the function ptlrpc_register_bulk() is called from ptl_send_rpc() the call to LNetMEAttach() fails due to failure to allocate memory. This forces the code into an error path, which most probably previously went untested. The error path: if (rc != 0) { CERROR("%s: LNetMEAttach failed x"LPU64"/%d: rc = %dn", desc->bd_export->exp_obd->obd_name, xid, posted_md, rc); break; } This print assumes that desc->bd_export is not NULL. However, it is. In fact it is expected to be NULL. desc->bd_import is the correct structure to access in this case. Lustre-change: http://review.whamcloud.com/7121 Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3585 Signed-off-by: Amir Shehata <[email protected]> Reviewed-by: Liang Zhen <[email protected]> Reviewed-by: Doug Oucharek <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: Peng Tao <[email protected]>

HSM Release is one of the key feature of HSM. To perform HSM release, clients need to acquire the file lease exclusivelt and flush dirty cache from clients. A special close REQ will be sent to the MDT to release the lease and get rid of OST objects. Lustre-change: http://review.whamcloud.com/7028 Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-1333 Signed-off-by: Aurelien Degremont <[email protected]> Signed-off-by: Jinshan Xiong <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Alex Zhuravlev <[email protected]> Signed-off-by: Peng Tao <[email protected]>

This patch implements an extended attribute cache for a Lustre client. It is organized as a write-through cache: reads are performed from cache, updates are sent synchronously to the MDS. An additional inode bit MDS_INODELOCK_XATTR is added to protect the cache. Lustre-change: http://review.whamcloud.com/5537 Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2869 Signed-off-by: Andrew Perepechko <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Jinshan Xiong <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: Peng Tao <[email protected]>

This change adds a priority parameter to the route module settings. This paramter can be >= 0. Like hops, the lower the prioirty number the higher the priority. So lower numbered priorities will be selected over higher numbers. Lustre-change: http://review.whamcloud.com/5663 Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2934 Signed-off-by: Doug Oucharek <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Amir Shehata <[email protected]> Reviewed-by: Isaac Huang <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: Peng Tao <[email protected]>

Import creates a released file using new RAID pattern flag Import used a new ioctl() to implement the import in the client kernel. Add a -L|--layout option to lfs getstripe and lfs find Lustre-change: http://review.whamcloud.com/6536 Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3363 Signed-off-by: JC Lafoucriere <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Reviewed-by: Jinshan Xiong <[email protected]> [Fix up kuid_t/guid_t conversion in llite/file.c -- Peng Tao] Signed-off-by: Peng Tao <[email protected]>

- Move OUT handler to the unified target code, so it can be used by both MDS and OST. - Make OFD able to handle OUT requests. - Rename MDS_MDS_PORTAL to the OUT_PORTAL and MDS_OUT_... defines just OUT_... since they are independent from MDS now. Lustre-change: http://review.whamcloud.com/6763 Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3467 Signed-off-by: Mikhail Pershin <[email protected]> Reviewed-by: Alex Zhuravlev <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Signed-off-by: Peng Tao <[email protected]>

move SEQ handler from MDT to the FID module so it can be used from any target. Lustre-change: http://review.whamcloud.com/6765 Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3467 Signed-off-by: Mikhail Pershin <[email protected]> Reviewed-by: Alex Zhuravlev <[email protected]> Reviewed-by: wangdi <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: Peng Tao <[email protected]>

sequence controller (aka master sequence server) exports FLDB in proc/fs/lustre/seq/ctl-*/fldb. the same entry can be used to modify FLDB using available sequences: lctl set_param seq.ctl*.fldb="[0x0000000280000400-0x0000000280000600):0:mdt" Lustre-change: http://review.whamcloud.com/7027 Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3565 Signed-off-by: Alex Zhuravlev <[email protected]> Signed-off-by: Jinshan Xiong <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> Signed-off-by: Peng Tao <[email protected]>

Hold the lock protecting page tree (coh_page_guard) till the point the page is being accessed. Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2940 Lustre-change: http://review.whamcloud.com/7192 Signed-off-by: Swapnil Pimpale <[email protected]> Reviewed-by: Jinshan Xiong <[email protected]> Reviewed-by: John L. Hammond <[email protected]> Reviewed-by: Lai Siyao <[email protected]> Reviewed-by: Oleg Drokin <[email protected]> Signed-off-by: Peng Tao <[email protected]>

…urces. When we change group_thread_cnt from sysfs entry, it can OOPS. The kernel messages are: [ 135.299021] BUG: unable to handle kernel NULL pointer dereference at (null) [ 135.299073] IP: [<ffffffff815188ab>] handle_active_stripes+0x32b/0x440 [ 135.299107] PGD 0 [ 135.299122] Oops: 0000 [#1] SMP [ 135.299144] Modules linked in: netconsole e1000e ptp pps_core [ 135.299188] CPU: 3 PID: 2225 Comm: md0_raid5 Not tainted 3.12.0+ #24 [ 135.299214] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015 11/09/2011 [ 135.299255] task: ffff8800b9638f80 ti: ffff8800b77a4000 task.ti: ffff8800b77a4000 [ 135.299283] RIP: 0010:[<ffffffff815188ab>] [<ffffffff815188ab>] handle_active_stripes+0x32b/0x440 [ 135.299323] RSP: 0018:ffff8800b77a5c48 EFLAGS: 00010002 [ 135.299344] RAX: ffff880037bb5c70 RBX: 0000000000000000 RCX: 0000000000000008 [ 135.299371] RDX: ffff880037bb5cb8 RSI: 0000000000000001 RDI: ffff880037bb5c00 [ 135.299398] RBP: ffff8800b77a5d08 R08: 0000000000000001 R09: 0000000000000000 [ 135.299425] R10: ffff8800b77a5c98 R11: 00000000ffffffff R12: ffff880037bb5c00 [ 135.299452] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880037bb5c70 [ 135.299479] FS: 0000000000000000(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000 [ 135.299510] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 135.299532] CR2: 0000000000000000 CR3: 0000000001c0b000 CR4: 00000000000407e0 [ 135.299559] Stack: [ 135.299570] ffff8800b77a5c88 ffffffff8107383e ffff8800b77a5c88 ffff880037a64300 [ 135.299611] 000000000000ec08 ffff880037bb5cb8 ffff8800b77a5c98 ffffffffffffffd8 [ 135.299654] 000000000000ec08 ffff880037bb5c60 ffff8800b77a5c98 ffff8800b77a5c98 [ 135.299696] Call Trace: [ 135.299711] [<ffffffff8107383e>] ? __wake_up+0x4e/0x70 [ 135.299733] [<ffffffff81518f88>] raid5d+0x4c8/0x680 [ 135.299756] [<ffffffff817174ed>] ? schedule_timeout+0x15d/0x1f0 [ 135.299781] [<ffffffff81524c9f>] md_thread+0x11f/0x170 [ 135.299804] [<ffffffff81069cd0>] ? wake_up_bit+0x40/0x40 [ 135.299826] [<ffffffff81524b80>] ? md_rdev_init+0x110/0x110 [ 135.299850] [<ffffffff81069656>] kthread+0xc6/0xd0 [ 135.299871] [<ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70 [ 135.299899] [<ffffffff81722ffc>] ret_from_fork+0x7c/0xb0 [ 135.299923] [<ffffffff81069590>] ? kthread_freezable_should_stop+0x70/0x70 [ 135.299951] Code: ff ff ff 0f 84 d7 fe ff ff e9 5c fe ff ff 66 90 41 8b b4 24 d8 01 00 00 45 31 ed 85 f6 0f 8e 7b fd ff ff 49 8b 9c 24 d0 01 00 00 <48> 3b 1b 49 89 dd 0f 85 67 fd ff ff 48 8d 43 28 31 d2 eb 17 90 [ 135.300005] RIP [<ffffffff815188ab>] handle_active_stripes+0x32b/0x440 [ 135.300005] RSP <ffff8800b77a5c48> [ 135.300005] CR2: 0000000000000000 [ 135.300005] ---[ end trace 504854e5bb7562ed ]--- [ 135.300005] Kernel panic - not syncing: Fatal exception This is because raid5d() can be running when the multi-thread resources are changed via system. We see need to provide locking. mddev->device_lock is suitable, but we cannot simple call alloc_thread_groups under this lock as we cannot allocate memory while holding a spinlock. So change alloc_thread_groups() to allocate and return the data structures, then raid5_store_group_thread_cnt() can take the lock while updating the pointers to the data structures. This fixes a bug introduced in 3.12 and so is suitable for the 3.12.x stable series. Fixes: b721420 Cc: [email protected] (3.12) Signed-off-by: Jianpeng Ma <[email protected]> Signed-off-by: NeilBrown <[email protected]> Reviewed-by: Shaohua Li <[email protected]>

After commit e9e4ea7 "net: smc91x: dont't use SMC_outw for fixing up halfword-aligned data" The Versatile SMSC LAN91C111 is crashing like this: ------------[ cut here ]------------ kernel BUG at /home/linus/linux/drivers/net/ethernet/smsc/smc91x.c:599! Internal error: Oops - BUG: 0 [#1] ARM Modules linked in: CPU: 0 PID: 43 Comm: udhcpc Not tainted 3.13.0-rc1+ #24 task: c6ccfaa0 ti: c6cd0000 task.ti: c6cd0000 PC is at smc_hardware_send_pkt+0x198/0x22c LR is at smc_hardware_send_pkt+0x24/0x22c pc : [<c01be324>] lr : [<c01be1b0>] psr: 20000013 sp : c6cd1d08 ip : 00000001 fp : 00000000 r10: c02adb08 r9 : 00000000 r8 : c6ced802 r7 : c786fba0 r6 : 00000146 r5 : c8800000 r4 : c78d6000 r3 : 0000000f r2 : 00000146 r1 : 00000000 r0 : 00000031 Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 0005317f Table: 06cf4000 DAC: 00000015 Process udhcpc (pid: 43, stack limit = 0xc6cd01c0) Stack: (0xc6cd1d08 to 0xc6cd2000) 1d00: 00000010 c8800000 c78d6000 c786fba0 c78d6000 c01be868 1d20: c01be7a4 00004000 00000000 c786fba0 c6c12b80 c0208554 000004d0 c780fc60 1d40: 00000220 c01fb734 00000000 00000000 00000000 c6c9a440 c6c12b80 c78d6000 1d60: c786fba0 c6c9a440 00000000 c021d1d8 00000000 00000000 c6c12b80 c78d6000 1d80: c786fba0 00000001 c6c9a440 c02087f8 c6c9a4a0 00080008 00000000 00000000 1da0: c78d6000 c786fba0 c78d6000 00000138 00000000 00000000 00000000 00000000 1dc0: 00000000 c027ba74 00000138 00000138 00000001 00000010 c6cedc00 00000000 1de0: 00000008 c7404400 c6cd1eec c6cd1f14 c067a73c c065c0b8 00000000 c067a740 1e00: 01ffffff 002040d0 00000000 00000000 00000000 00000000 00000000 ffffffff 1e20: 43004400 00110022 c6cdef20 c027ae8c c6ccfaa0 be82d65c 00000014 be82d3cc 1e40: 00000000 00000000 00000000 c01f2870 00000000 00000000 00000000 c6cd1e88 1e60: c6ccfaa0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 1e80: 00000000 00000000 00000031 c7802310 c7802300 00000138 c7404400 c0771da0 1ea0: 00000000 c6cd1eec c7800340 00000138 be82d65c 00000014 be82d3cc c6cd1f08 1ec0: 00000014 00000000 c7404400 c7404400 00000138 c01f4628 c78d6000 00000000 1ee0: 00000000 be82d3cc 00000138 c6cd1f08 00000014 c6cd1ee4 00000001 00000000 1f00: 00000000 00000000 00080011 00000002 06000000 ffffffff 0000ffff 00000002 1f20: 06000000 ffffffff 0000ffff c00928c8 c065c52 c6cd1f58 00000003 c009299c 1f40: 00000003 c065c52 c7404400 00000000 c7404400 c01f2218 c78106b0 c7441cb0 1f60: 00000000 00000006 c06799fc 00000000 00000000 00000006 00000000 c01f3ee0 1f80: 00000000 00000000 be82d678 be82d65c 00000014 00000001 00000122 c00139c8 1fa0: c6cd0000 c0013840 be82d65c 00000014 00000006 be82d3cc 00000138 00000000 1fc0: be82d65c 00000014 00000001 00000122 00000000 00000000 00018cb1 00000000 1fe0: 00003801 be82d3a8 0003a0c7 b6e9af08 60000010 00000006 00000000 00000000 [<c01be324>] (smc_hardware_send_pkt+0x198/0x22c) from [<c01be868>] (smc_hard_start_xmit+0xc4/0x1e8) [<c01be868>] (smc_hard_start_xmit+0xc4/0x1e8) from [<c0208554>] (dev_hard_start_xmit+0x460/0x4cc) [<c0208554>] (dev_hard_start_xmit+0x460/0x4cc) from [<c021d1d8>] (sch_direct_xmit+0x94/0x18c) [<c021d1d8>] (sch_direct_xmit+0x94/0x18c) from [<c02087f8>] (dev_queue_xmit+0x238/0x42c) [<c02087f8>] (dev_queue_xmit+0x238/0x42c) from [<c027ba74>] (packet_sendmsg+0xbe8/0xd28) [<c027ba74>] (packet_sendmsg+0xbe8/0xd28) from [<c01f2870>] (sock_sendmsg+0x84/0xa8) [<c01f2870>] (sock_sendmsg+0x84/0xa8) from [<c01f4628>] (SyS_sendto+0xb8/0xdc) [<c01f4628>] (SyS_sendto+0xb8/0xdc) from [<c0013840>] (ret_fast_syscall+0x0/0x2c) Code: e3130002 1a000001 e3130001 0affffcd (e7f001f2) ---[ end trace 81104fe70e8da7fe ]--- Kernel panic - not syncing: Fatal exception in interrupt This is because the macro operations in smc91x.h defined for Versatile are missing SMC_outsw() as used in this commit. The Versatile needs and uses the same accessors as the other platforms in the first if(...) clause, just switch it to using that and we have one problem less to worry about. Checkpatch complains about spacing, but I have opted to follow the style of this .h-file. Cc: Russell King <[email protected]> Cc: Nicolas Pitre <[email protected]> Cc: Eric Miao <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Signed-off-by: Linus Walleij <[email protected]> Signed-off-by: David S. Miller <[email protected]>

After commit e9e4ea7 "net: smc91x: dont't use SMC_outw for fixing up halfword-aligned data" The Versatile SMSC LAN91C111 is crashing like this: ------------[ cut here ]------------ kernel BUG at /home/linus/linux/drivers/net/ethernet/smsc/smc91x.c:599! Internal error: Oops - BUG: 0 [#1] ARM Modules linked in: CPU: 0 PID: 43 Comm: udhcpc Not tainted 3.13.0-rc1+ #24 task: c6ccfaa0 ti: c6cd0000 task.ti: c6cd0000 PC is at smc_hardware_send_pkt+0x198/0x22c LR is at smc_hardware_send_pkt+0x24/0x22c pc : [<c01be324>] lr : [<c01be1b0>] psr: 20000013 sp : c6cd1d08 ip : 00000001 fp : 00000000 r10: c02adb08 r9 : 00000000 r8 : c6ced802 r7 : c786fba0 r6 : 00000146 r5 : c8800000 r4 : c78d6000 r3 : 0000000f r2 : 00000146 r1 : 00000000 r0 : 00000031 Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 0005317f Table: 06cf4000 DAC: 00000015 Process udhcpc (pid: 43, stack limit = 0xc6cd01c0) Stack: (0xc6cd1d08 to 0xc6cd2000) 1d00: 00000010 c8800000 c78d6000 c786fba0 c78d6000 c01be868 1d20: c01be7a4 00004000 00000000 c786fba0 c6c12b80 c0208554 000004d0 c780fc60 1d40: 00000220 c01fb734 00000000 00000000 00000000 c6c9a440 c6c12b80 c78d6000 1d60: c786fba0 c6c9a440 00000000 c021d1d8 00000000 00000000 c6c12b80 c78d6000 1d80: c786fba0 00000001 c6c9a440 c02087f8 c6c9a4a0 00080008 00000000 00000000 1da0: c78d6000 c786fba0 c78d6000 00000138 00000000 00000000 00000000 00000000 1dc0: 00000000 c027ba74 00000138 00000138 00000001 00000010 c6cedc00 00000000 1de0: 00000008 c7404400 c6cd1eec c6cd1f14 c067a73c c065c0b8 00000000 c067a740 1e00: 01ffffff 002040d0 00000000 00000000 00000000 00000000 00000000 ffffffff 1e20: 43004400 00110022 c6cdef20 c027ae8c c6ccfaa0 be82d65c 00000014 be82d3cc 1e40: 00000000 00000000 00000000 c01f2870 00000000 00000000 00000000 c6cd1e88 1e60: c6ccfaa0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 1e80: 00000000 00000000 00000031 c7802310 c7802300 00000138 c7404400 c0771da0 1ea0: 00000000 c6cd1eec c7800340 00000138 be82d65c 00000014 be82d3cc c6cd1f08 1ec0: 00000014 00000000 c7404400 c7404400 00000138 c01f4628 c78d6000 00000000 1ee0: 00000000 be82d3cc 00000138 c6cd1f08 00000014 c6cd1ee4 00000001 00000000 1f00: 00000000 00000000 00080011 00000002 06000000 ffffffff 0000ffff 00000002 1f20: 06000000 ffffffff 0000ffff c00928c8 c065c52 c6cd1f58 00000003 c009299c 1f40: 00000003 c065c52 c7404400 00000000 c7404400 c01f2218 c78106b0 c7441cb0 1f60: 00000000 00000006 c06799fc 00000000 00000000 00000006 00000000 c01f3ee0 1f80: 00000000 00000000 be82d678 be82d65c 00000014 00000001 00000122 c00139c8 1fa0: c6cd0000 c0013840 be82d65c 00000014 00000006 be82d3cc 00000138 00000000 1fc0: be82d65c 00000014 00000001 00000122 00000000 00000000 00018cb1 00000000 1fe0: 00003801 be82d3a8 0003a0c7 b6e9af08 60000010 00000006 00000000 00000000 [<c01be324>] (smc_hardware_send_pkt+0x198/0x22c) from [<c01be868>] (smc_hard_start_xmit+0xc4/0x1e8) [<c01be868>] (smc_hard_start_xmit+0xc4/0x1e8) from [<c0208554>] (dev_hard_start_xmit+0x460/0x4cc) [<c0208554>] (dev_hard_start_xmit+0x460/0x4cc) from [<c021d1d8>] (sch_direct_xmit+0x94/0x18c) [<c021d1d8>] (sch_direct_xmit+0x94/0x18c) from [<c02087f8>] (dev_queue_xmit+0x238/0x42c) [<c02087f8>] (dev_queue_xmit+0x238/0x42c) from [<c027ba74>] (packet_sendmsg+0xbe8/0xd28) [<c027ba74>] (packet_sendmsg+0xbe8/0xd28) from [<c01f2870>] (sock_sendmsg+0x84/0xa8) [<c01f2870>] (sock_sendmsg+0x84/0xa8) from [<c01f4628>] (SyS_sendto+0xb8/0xdc) [<c01f4628>] (SyS_sendto+0xb8/0xdc) from [<c0013840>] (ret_fast_syscall+0x0/0x2c) Code: e3130002 1a000001 e3130001 0affffcd (e7f001f2) ---[ end trace 81104fe70e8da7fe ]--- Kernel panic - not syncing: Fatal exception in interrupt This is because the macro operations in smc91x.h defined for Versatile are missing SMC_outsw() as used in this commit. The Versatile needs and uses the same accessors as the other platforms in the first if(...) clause, just switch it to using that and we have one problem less to worry about. This includes a hunk of a patch from Will Deacon fixin the other 32bit platforms as well: Innokom, Ramses, PXA, PCM027. Checkpatch complains about spacing, but I have opted to follow the style of this .h-file. Cc: Russell King <[email protected]> Cc: Nicolas Pitre <[email protected]> Cc: Eric Miao <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: [email protected] Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Linus Walleij <[email protected]> Signed-off-by: David S. Miller <[email protected]>

[ Upstream commit 45caeaa ] As Eric Dumazet pointed out this also needs to be fixed in IPv6. v2: Contains the IPv6 tcp/Ipv6 dccp patches as well. We have seen a few incidents lately where a dst_enty has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right a crash then ensues when the freed dst_entry is referenced later on. A Common crashing back trace is: #8 [] page_fault at ffffffff8163e648 [exception RIP: __tcp_ack_snd_check+74] . . #9 [] tcp_rcv_established at ffffffff81580b64 #10 [] tcp_v4_do_rcv at ffffffff8158b54a #11 [] tcp_v4_rcv at ffffffff8158cd02 #12 [] ip_local_deliver_finish at ffffffff815668f4 #13 [] ip_local_deliver at ffffffff81566bd9 #14 [] ip_rcv_finish at ffffffff8156656d #15 [] ip_rcv at ffffffff81566f06 #16 [] __netif_receive_skb_core at ffffffff8152b3a2 #17 [] __netif_receive_skb at ffffffff8152b608 #18 [] netif_receive_skb at ffffffff8152b690 #19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3] #20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3] #21 [] net_rx_action at ffffffff8152bac2 #22 [] __do_softirq at ffffffff81084b4f #23 [] call_softirq at ffffffff8164845c #24 [] do_softirq at ffffffff81016fc5 #25 [] irq_exit at ffffffff81084ee5 #26 [] do_IRQ at ffffffff81648ff8 Of course it may happen with other NIC drivers as well. It's found the freed dst_entry here: 224 static bool tcp_in_quickack_mode(struct sock *sk)↩ 225 {↩ 226 ▹ const struct inet_connection_sock *icsk = inet_csk(sk);↩ 227 ▹ const struct dst_entry *dst = __sk_dst_get(sk);↩ 228 ↩ 229 ▹ return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩ 230 ▹ ▹ (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩ 231 }↩ But there are other backtraces attributed to the same freed dst_entry in netfilter code as well. All the vmcores showed 2 significant clues: - Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry will be added for that host. Making more dst_entrys with lower reference counts. Making this more probable. - All vmcores showed a postitive LockDroppedIcmps value, e.g: LockDroppedIcmps 267 A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the same dst_entry cached in sk->sk_dst_entry can be decremented twice for the same socket via: do_redirect()->__sk_dst_check()-> dst_release(). Which leads to the dst_entry being prematurely freed with another socket pointing to it via sk->sk_dst_cache and a subsequent crash. To fix this skip do_redirect() if usespace has the socket locked. Instead let the redirect take place later when user space does not have the socket locked. The dccp/IPv6 code is very similar in this respect, so fixing it there too. As Eric Garver pointed out the following commit now invalidates routes. Which can set the dst->obsolete flag so that ipv4_dst_check() returns null and triggers the dst_release(). Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.") Cc: Eric Garver <[email protected]> Cc: Hannes Sowa <[email protected]> Signed-off-by: Jon Maxwell <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

I compiled with AddressSanitizer and I had these memory leaks while I was using the tep_parse_format function: Direct leak of 28 byte(s) in 4 object(s) allocated from: #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe) #1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985 #2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140 #3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206 #4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291 #5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299 #6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849 #7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161 #8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207 #9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786 #10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285 #11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369 #12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335 #13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389 #14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431 #15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251 #16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284 #17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593 #18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727 #19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048 #20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127 #21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152 #22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252 #23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347 #24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461 #25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673 #26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) The token variable in the process_dynamic_array_len function is allocated in the read_expect_type function, but is not freed before calling the read_token function. Free the token variable before calling read_token in order to plug the leak. Signed-off-by: Philippe Duplessis-Guindon <[email protected]> Reviewed-by: Steven Rostedt (VMware) <[email protected]> Link: https://lore.kernel.org/linux-trace-devel/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

The following deadlock was captured. The first process is holding 'kernfs_mutex' and hung by io. The io was staging in 'r1conf.pending_bio_list' of raid1 device, this pending bio list would be flushed by second process 'md127_raid1', but it was hung by 'kernfs_mutex'. Using sysfs_notify_dirent_safe() to replace sysfs_notify() can fix it. There were other sysfs_notify() invoked from io path, removed all of them. PID: 40430 TASK: ffff8ee9c8c65c40 CPU: 29 COMMAND: "probe_file" #0 [ffffb87c4df37260] __schedule at ffffffff9a8678ec #1 [ffffb87c4df372f8] schedule at ffffffff9a867f06 #2 [ffffb87c4df37310] io_schedule at ffffffff9a0c73e6 #3 [ffffb87c4df37328] __dta___xfs_iunpin_wait_3443 at ffffffffc03a4057 [xfs] #4 [ffffb87c4df373a0] xfs_iunpin_wait at ffffffffc03a6c79 [xfs] #5 [ffffb87c4df373b0] __dta_xfs_reclaim_inode_3357 at ffffffffc039a46c [xfs] #6 [ffffb87c4df37400] xfs_reclaim_inodes_ag at ffffffffc039a8b6 [xfs] #7 [ffffb87c4df37590] xfs_reclaim_inodes_nr at ffffffffc039bb33 [xfs] #8 [ffffb87c4df375b0] xfs_fs_free_cached_objects at ffffffffc03af0e9 [xfs] #9 [ffffb87c4df375c0] super_cache_scan at ffffffff9a287ec7 #10 [ffffb87c4df37618] shrink_slab at ffffffff9a1efd93 #11 [ffffb87c4df37700] shrink_node at ffffffff9a1f5968 #12 [ffffb87c4df37788] do_try_to_free_pages at ffffffff9a1f5ea2 #13 [ffffb87c4df377f0] try_to_free_mem_cgroup_pages at ffffffff9a1f6445 #14 [ffffb87c4df37880] try_charge at ffffffff9a26cc5f #15 [ffffb87c4df37920] memcg_kmem_charge_memcg at ffffffff9a270f6a #16 [ffffb87c4df37958] new_slab at ffffffff9a251430 #17 [ffffb87c4df379c0] ___slab_alloc at ffffffff9a251c85 #18 [ffffb87c4df37a80] __slab_alloc at ffffffff9a25635d #19 [ffffb87c4df37ac0] kmem_cache_alloc at ffffffff9a251f89 #20 [ffffb87c4df37b00] alloc_inode at ffffffff9a2a2b10 #21 [ffffb87c4df37b20] iget_locked at ffffffff9a2a4854 #22 [ffffb87c4df37b60] kernfs_get_inode at ffffffff9a311377 #23 [ffffb87c4df37b80] kernfs_iop_lookup at ffffffff9a311e2b #24 [ffffb87c4df37ba8] lookup_slow at ffffffff9a290118 #25 [ffffb87c4df37c10] walk_component at ffffffff9a291e83 #26 [ffffb87c4df37c78] path_lookupat at ffffffff9a293619 #27 [ffffb87c4df37cd8] filename_lookup at ffffffff9a2953af #28 [ffffb87c4df37de8] user_path_at_empty at ffffffff9a295566 #29 [ffffb87c4df37e10] vfs_statx at ffffffff9a289787 #30 [ffffb87c4df37e70] SYSC_newlstat at ffffffff9a289d5d #31 [ffffb87c4df37f18] sys_newlstat at ffffffff9a28a60e #32 [ffffb87c4df37f28] do_syscall_64 at ffffffff9a003949 #33 [ffffb87c4df37f50] entry_SYSCALL_64_after_hwframe at ffffffff9aa001ad RIP: 00007f617a5f2905 RSP: 00007f607334f838 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 00007f6064044b20 RCX: 00007f617a5f2905 RDX: 00007f6064044b20 RSI: 00007f6064044b20 RDI: 00007f6064005890 RBP: 00007f6064044aa0 R8: 0000000000000030 R9: 000000000000011c R10: 0000000000000013 R11: 0000000000000246 R12: 00007f606417e6d0 R13: 00007f6064044aa0 R14: 00007f6064044b10 R15: 00000000ffffffff ORIG_RAX: 0000000000000006 CS: 0033 SS: 002b PID: 927 TASK: ffff8f15ac5dbd80 CPU: 42 COMMAND: "md127_raid1" #0 [ffffb87c4df07b28] __schedule at ffffffff9a8678ec #1 [ffffb87c4df07bc0] schedule at ffffffff9a867f06 #2 [ffffb87c4df07bd8] schedule_preempt_disabled at ffffffff9a86825e #3 [ffffb87c4df07be8] __mutex_lock at ffffffff9a869bcc #4 [ffffb87c4df07ca0] __mutex_lock_slowpath at ffffffff9a86a013 #5 [ffffb87c4df07cb0] mutex_lock at ffffffff9a86a04f #6 [ffffb87c4df07cc8] kernfs_find_and_get_ns at ffffffff9a311d83 #7 [ffffb87c4df07cf0] sysfs_notify at ffffffff9a314b3a #8 [ffffb87c4df07d18] md_update_sb at ffffffff9a688696 #9 [ffffb87c4df07d98] md_update_sb at ffffffff9a6886d5 #10 [ffffb87c4df07da8] md_check_recovery at ffffffff9a68ad9c #11 [ffffb87c4df07dd0] raid1d at ffffffffc01f0375 [raid1] #12 [ffffb87c4df07ea0] md_thread at ffffffff9a680348 #13 [ffffb87c4df07f08] kthread at ffffffff9a0b8005 #14 [ffffb87c4df07f50] ret_from_fork at ffffffff9aa00344 Signed-off-by: Junxiao Bi <[email protected]> Signed-off-by: Song Liu <[email protected]>

For early sections, its memmap is handled specially even sub-section is enabled. The memmap could only be populated as a whole. Quoted from the comment of section_activate(): * The early init code does not consider partially populated * initial sections, it simply assumes that memory will never be * referenced. If we hot-add memory into such a section then we * do not need to populate the memmap and can simply reuse what * is already there. While current section_deactivate() breaks this rule. When hot-remove a sub-section, section_deactivate() would depopulate its memmap. The consequence is if we hot-add this subsection again, its memmap never get proper populated. We can reproduce the case by following steps: 1. Hacking qemu to allow sub-section early section : diff --git a/hw/i386/pc.c b/hw/i386/pc.c : index 51b3050d01..c6a78d83c0 100644 : --- a/hw/i386/pc.c : +++ b/hw/i386/pc.c : @@ -1010,7 +1010,7 @@ void pc_memory_init(PCMachineState *pcms, : } : : machine->device_memory->base = : - ROUND_UP(0x100000000ULL + x86ms->above_4g_mem_size, 1 * GiB); : + 0x100000000ULL + x86ms->above_4g_mem_size; : : if (pcmc->enforce_aligned_dimm) { : /* size device region assuming 1G page max alignment per slot */ 2. Bootup qemu with PSE disabled and a sub-section aligned memory size Part of the qemu command would look like this: sudo x86_64-softmmu/qemu-system-x86_64 \ --enable-kvm -cpu host,pse=off \ -m 4160M,maxmem=20G,slots=1 \ -smp sockets=2,cores=16 \ -numa node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 \ -machine pc,nvdimm \ -nographic \ -object memory-backend-ram,id=mem0,size=8G \ -device nvdimm,id=vm0,memdev=mem0,node=0,addr=0x144000000,label-size=128k 3. Re-config a pmem device with sub-section size in guest ndctl create-namespace --force --reconfig=namespace0.0 --mode=devdax --size=16M Then you would see the following call trace: pmem0: detected capacity change from 0 to 16777216 BUG: unable to handle page fault for address: ffffec73c51000b4 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 81ff8067 P4D 81ff8067 PUD 81ff7067 PMD 1437cb067 PTE 0 Oops: 0002 [#1] SMP NOPTI CPU: 16 PID: 1348 Comm: ndctl Kdump: loaded Tainted: G W 5.8.0-rc2+ #24 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.4 RIP: 0010:memmap_init_zone+0x154/0x1c2 Code: 77 16 f6 40 10 02 74 10 48 03 48 08 48 89 cb 48 c1 eb 0c e9 3a ff ff ff 48 89 df 48 c1 e7 06 48f RSP: 0018:ffffbdc7011a39b0 EFLAGS: 00010282 RAX: ffffec73c5100088 RBX: 0000000000144002 RCX: 0000000000144000 RDX: 0000000000000004 RSI: 007ffe0000000000 RDI: ffffec73c5100080 RBP: 027ffe0000000000 R08: 0000000000000001 R09: ffff9f8d38f6d708 R10: ffffec73c0000000 R11: 0000000000000000 R12: 0000000000000004 R13: 0000000000000001 R14: 0000000000144200 R15: 0000000000000000 FS: 00007efe6b65d780(0000) GS:ffff9f8d3f780000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffec73c51000b4 CR3: 000000007d718000 CR4: 0000000000340ee0 Call Trace: move_pfn_range_to_zone+0x128/0x150 memremap_pages+0x4e4/0x5a0 devm_memremap_pages+0x1e/0x60 dev_dax_probe+0x69/0x160 [device_dax] really_probe+0x298/0x3c0 driver_probe_device+0xe1/0x150 ? driver_allows_async_probing+0x50/0x50 bus_for_each_drv+0x7e/0xc0 __device_attach+0xdf/0x160 bus_probe_device+0x8e/0xa0 device_add+0x3b9/0x740 __devm_create_dev_dax+0x127/0x1c0 __dax_pmem_probe+0x1f2/0x219 [dax_pmem_core] dax_pmem_probe+0xc/0x1b [dax_pmem] nvdimm_bus_probe+0x69/0x1c0 [libnvdimm] really_probe+0x147/0x3c0 driver_probe_device+0xe1/0x150 device_driver_attach+0x53/0x60 bind_store+0xd1/0x110 kernfs_fop_write+0xce/0x1b0 vfs_write+0xb6/0x1a0 ksys_write+0x5f/0xe0 do_syscall_64+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: ba72b4c ("mm/sparsemem: support sub-section hotplug") Signed-off-by: Wei Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Dan Williams <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>

If the specified/hinted sink is not reachable from a subset of the CPUs, we could end up unable to trace the event on those CPUs. This is the best effort we could do until we support 1:1 configurations. Fail gracefully in such cases avoiding a WARN_ON, which can be easily triggered by the user on certain platforms (Arm N1SDP), with the following trace paths : CPU0 \ -- Funnel0 --> ETF0 --> / \ CPU1 \ MainFunnel CPU2 / \ / -- Funnel1 --> ETF1 --> / CPU1 $ perf record --per-thread -e cs_etm/@ETF0/u -- <app> could trigger the following WARNING, when the event is scheduled on CPU2. [10919.513250] ------------[ cut here ]------------ [10919.517861] WARNING: CPU: 2 PID: 24021 at drivers/hwtracing/coresight/coresight-etm-perf.c:316 etm_event_start+0xf8/0x100 ... [10919.564403] CPU: 2 PID: 24021 Comm: perf Not tainted 5.8.0+ #24 [10919.570308] pstate: 80400089 (Nzcv daIf +PAN -UAO BTYPE=--) [10919.575865] pc : etm_event_start+0xf8/0x100 [10919.580034] lr : etm_event_start+0x80/0x100 [10919.584202] sp : fffffe001932f940 [10919.587502] x29: fffffe001932f940 x28: fffffc834995f800 [10919.592799] x27: 0000000000000000 x26: fffffe0011f3ced0 [10919.598095] x25: fffffc837fce244c x24: fffffc837fce2448 [10919.603391] x23: 0000000000000002 x22: fffffc8353529c00 [10919.608688] x21: fffffc835bb31000 x20: 0000000000000000 [10919.613984] x19: fffffc837fcdcc70 x18: 0000000000000000 [10919.619281] x17: 0000000000000000 x16: 0000000000000000 [10919.624577] x15: 0000000000000000 x14: 00000000000009f8 [10919.629874] x13: 00000000000009f8 x12: 0000000000000018 [10919.635170] x11: 0000000000000000 x10: 0000000000000000 [10919.640467] x9 : fffffe00108cd168 x8 : 0000000000000000 [10919.645763] x7 : 0000000000000020 x6 : 0000000000000001 [10919.651059] x5 : 0000000000000002 x4 : 0000000000000001 [10919.656356] x3 : 0000000000000000 x2 : 0000000000000000 [10919.661652] x1 : fffffe836eb40000 x0 : 0000000000000000 [10919.666949] Call trace: [10919.669382] etm_event_start+0xf8/0x100 [10919.673203] etm_event_add+0x40/0x60 [10919.676765] event_sched_in.isra.134+0xcc/0x210 [10919.681281] merge_sched_in+0xb0/0x2a8 [10919.685017] visit_groups_merge.constprop.140+0x15c/0x4b8 [10919.690400] ctx_sched_in+0x15c/0x170 [10919.694048] perf_event_sched_in+0x6c/0xa0 [10919.698130] ctx_resched+0x60/0xa0 [10919.701517] perf_event_exec+0x288/0x2f0 [10919.705425] begin_new_exec+0x4c8/0xf58 [10919.709247] load_elf_binary+0x66c/0xf30 [10919.713155] exec_binprm+0x15c/0x450 [10919.716716] __do_execve_file+0x508/0x748 [10919.720711] __arm64_sys_execve+0x40/0x50 [10919.724707] do_el0_svc+0xf4/0x1b8 [10919.728095] el0_sync_handler+0xf8/0x124 [10919.732003] el0_sync+0x140/0x180 Even though we don't support using separate sinks for the ETMs yet (e.g, for 1:1 configurations), we should at least honor the user's choice and handle the limitations gracefully, by simply skipping the tracing on ETMs which can't reach the requested sink. Fixes: f9d81a6 ("coresight: perf: Allow tracing on hotplugged CPUs") Cc: Mathieu Poirier <[email protected]> Cc: Mike Leach <[email protected]> Reported-by: Jeremy Linton <[email protected]> Tested-by: Jeremy Linton <[email protected]> Signed-off-by: Suzuki K Poulose <[email protected]> Signed-off-by: Mathieu Poirier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>

This fix is for a failure that occurred in the DWARF unwind perf test. Stack unwinders may probe memory when looking for frames. Memory sanitizer will poison and track uninitialized memory on the stack, and on the heap if the value is copied to the heap. This can lead to false memory sanitizer failures for the use of an uninitialized value. Avoid this problem by removing the poison on the copied stack. The full msan failure with track origins looks like: ==2168==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x559ceb10755b in handle_cfi elfutils/libdwfl/frame_unwind.c:648:8 #1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 #2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 #3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 #4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 #5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 #6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 #7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 #8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 #9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 #10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 #11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) #12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 #13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 #14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 #15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 #16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 #17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 #18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 #19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 #20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 #21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 #22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 #23 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559ceb106acf in __libdwfl_frame_reg_set elfutils/libdwfl/frame_unwind.c:77:22 #1 0x559ceb106acf in handle_cfi elfutils/libdwfl/frame_unwind.c:627:13 #2 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 #3 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 #4 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 #5 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 #6 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 #7 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 #8 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 #9 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 #10 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 #11 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 #12 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) #13 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 #14 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 #15 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 #16 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 #17 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 #18 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 #19 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 #20 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 #21 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 #22 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 #23 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 #24 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559ceb106a54 in handle_cfi elfutils/libdwfl/frame_unwind.c:613:9 #1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 #2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 #3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 #4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 #5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 #6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 #7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 #8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 #9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 #10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 #11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) #12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 #13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 #14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 #15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 #16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 #17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 #18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 #19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 #20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 #21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 #22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 #23 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559ceaff8800 in memory_read tools/perf/util/unwind-libdw.c:156:10 #1 0x559ceb10f053 in expr_eval elfutils/libdwfl/frame_unwind.c:501:13 #2 0x559ceb1060cc in handle_cfi elfutils/libdwfl/frame_unwind.c:603:18 #3 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 #4 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 #5 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 #6 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 #7 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 #8 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 #9 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 #10 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 #11 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 #12 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 #13 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) #14 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 #15 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 #16 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 #17 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 #18 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 #19 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 #20 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 #21 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 #22 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 #23 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 #24 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 #25 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559cea9027d9 in __msan_memcpy llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:1558:3 #1 0x559cea9d2185 in sample_ustack tools/perf/arch/x86/tests/dwarf-unwind.c:41:2 #2 0x559cea9d202c in test__arch_unwind_sample tools/perf/arch/x86/tests/dwarf-unwind.c:72:9 #3 0x559ceabc9cbd in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:106:6 #4 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 #5 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) #6 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 #7 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 #8 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 #9 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 #10 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 #11 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 #12 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 #13 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 #14 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 #15 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 #16 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 #17 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was created by an allocation of 'bf' in the stack frame of function 'perf_event__synthesize_mmap_events' #0 0x559ceafc5f60 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:445 SUMMARY: MemorySanitizer: use-of-uninitialized-value elfutils/libdwfl/frame_unwind.c:648:8 in handle_cfi Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: [email protected] Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sandeep Dasgupta <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

CHECK: Alignment should match open parenthesis #24: FILE: drivers/mfd/tps65910.c:296: + ret = regmap_clear_bits(tps65910->regmap, TPS65910_DEVCTRL, DEVCTRL_CK32K_CTRL_MASK); CHECK: Alignment should match open parenthesis #33: FILE: drivers/mfd/tps65910.c:318: + ret = regmap_set_bits(tps65910->regmap, TPS65910_DEVCTRL, DEVCTRL_DEV_SLP_MASK); CHECK: Alignment should match open parenthesis #42: FILE: drivers/mfd/tps65910.c:326: + ret = regmap_set_bits(tps65910->regmap, TPS65910_SLEEP_KEEP_RES_ON, CHECK: Alignment should match open parenthesis #51: FILE: drivers/mfd/tps65910.c:336: + ret = regmap_set_bits(tps65910->regmap, TPS65910_SLEEP_KEEP_RES_ON, CHECK: Alignment should match open parenthesis #60: FILE: drivers/mfd/tps65910.c:346: + ret = regmap_set_bits(tps65910->regmap, TPS65910_SLEEP_KEEP_RES_ON, CHECK: Alignment should match open parenthesis #69: FILE: drivers/mfd/tps65910.c:358: + regmap_clear_bits(tps65910->regmap, TPS65910_DEVCTRL, DEVCTRL_DEV_SLP_MASK); CHECK: Alignment should match open parenthesis #78: FILE: drivers/mfd/tps65910.c:440: + if (regmap_set_bits(tps65910->regmap, TPS65910_DEVCTRL, DEVCTRL_PWR_OFF_MASK) < 0) CHECK: Alignment should match open parenthesis #83: FILE: drivers/mfd/tps65910.c:444: + regmap_clear_bits(tps65910->regmap, TPS65910_DEVCTRL, DEVCTRL_DEV_ON_MASK); Signed-off-by: Lee Jones <[email protected]>

This patch cleared use_ack and use_map when dropping other suboptions to fix the following syzkaller BUG: [ 15.223006] BUG: unable to handle page fault for address: 0000000000223b10 [ 15.223700] #PF: supervisor read access in kernel mode [ 15.224209] #PF: error_code(0x0000) - not-present page [ 15.224724] PGD b8d5067 P4D b8d5067 PUD c0a5067 PMD 0 [ 15.225237] Oops: 0000 [#1] SMP [ 15.225556] CPU: 0 PID: 7747 Comm: syz-executor Not tainted 5.10.0-rc6+ #24 [ 15.226281] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 15.227292] RIP: 0010:skb_release_data+0x89/0x1e0 [ 15.227816] Code: 5b 5d 41 5c 41 5d 41 5e 41 5f e9 02 06 8a ff e8 fd 05 8a ff 45 31 ed 80 7d 02 00 4c 8d 65 30 74 55 e8 eb 05 8a ff 49 8b 1c 24 <4c> 8b 7b 08 41 f6 c7 01 0f 85 18 01 00 00 e8 d4 05 8a ff 8b 43 34 [ 15.229669] RSP: 0018:ffffc900019c7c08 EFLAGS: 00010293 [ 15.230188] RAX: ffff88800daad900 RBX: 0000000000223b08 RCX: 0000000000000006 [ 15.230895] RDX: 0000000000000000 RSI: ffffffff818e06c5 RDI: ffff88807f6dc700 [ 15.231593] RBP: ffff88807f71a4c0 R08: 0000000000000001 R09: 0000000000000001 [ 15.232299] R10: ffffc900019c7c18 R11: 0000000000000000 R12: ffff88807f71a4f0 [ 15.233007] R13: 0000000000000000 R14: ffff88807f6dc700 R15: 0000000000000002 [ 15.233714] FS: 00007f65d9b5f700(0000) GS:ffff88807c400000(0000) knlGS:0000000000000000 [ 15.234509] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.235081] CR2: 0000000000223b10 CR3: 000000000b883000 CR4: 00000000000006f0 [ 15.235788] Call Trace: [ 15.236042] skb_release_all+0x28/0x30 [ 15.236419] __kfree_skb+0x11/0x20 [ 15.236768] tcp_data_queue+0x270/0x1240 [ 15.237161] ? tcp_urg+0x50/0x2a0 [ 15.237496] tcp_rcv_established+0x39a/0x890 [ 15.237997] ? mark_held_locks+0x49/0x70 [ 15.238467] tcp_v4_do_rcv+0xb9/0x270 [ 15.238915] __release_sock+0x8a/0x160 [ 15.239365] release_sock+0x32/0xd0 [ 15.239793] __inet_stream_connect+0x1d2/0x400 [ 15.240313] ? do_wait_intr_irq+0x80/0x80 [ 15.240791] inet_stream_connect+0x36/0x50 [ 15.241275] mptcp_stream_connect+0x69/0x1b0 [ 15.241787] __sys_connect+0x122/0x140 [ 15.242236] ? syscall_enter_from_user_mode+0x17/0x50 [ 15.242836] ? lockdep_hardirqs_on_prepare+0xd4/0x170 [ 15.243436] __x64_sys_connect+0x1a/0x20 [ 15.243924] do_syscall_64+0x33/0x40 [ 15.244313] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 15.244821] RIP: 0033:0x7f65d946e469 [ 15.245183] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48 [ 15.247019] RSP: 002b:00007f65d9b5eda8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a [ 15.247770] RAX: ffffffffffffffda RBX: 000000000049bf00 RCX: 00007f65d946e469 [ 15.248471] RDX: 0000000000000010 RSI: 00000000200000c0 RDI: 0000000000000005 [ 15.249205] RBP: 000000000049bf00 R08: 0000000000000000 R09: 0000000000000000 [ 15.249908] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000049bf0c [ 15.250603] R13: 00007fffe8a25cef R14: 00007f65d9b3f000 R15: 0000000000000003 [ 15.251312] Modules linked in: [ 15.251626] CR2: 0000000000223b10 [ 15.251965] BUG: kernel NULL pointer dereference, address: 0000000000000048 [ 15.252005] ---[ end trace f5c51fe19123c773 ]--- [ 15.252822] #PF: supervisor read access in kernel mode [ 15.252823] #PF: error_code(0x0000) - not-present page [ 15.252825] PGD c6c6067 P4D c6c6067 PUD c0d8067 [ 15.253294] RIP: 0010:skb_release_data+0x89/0x1e0 [ 15.253910] PMD 0 [ 15.253914] Oops: 0000 [#2] SMP [ 15.253917] CPU: 1 PID: 7746 Comm: syz-executor Tainted: G D 5.10.0-rc6+ #24 [ 15.253920] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 15.254435] Code: 5b 5d 41 5c 41 5d 41 5e 41 5f e9 02 06 8a ff e8 fd 05 8a ff 45 31 ed 80 7d 02 00 4c 8d 65 30 74 55 e8 eb 05 8a ff 49 8b 1c 24 <4c> 8b 7b 08 41 f6 c7 01 0f 85 18 01 00 00 e8 d4 05 8a ff 8b 43 34 [ 15.254899] RIP: 0010:skb_release_data+0x89/0x1e0 [ 15.254902] Code: 5b 5d 41 5c 41 5d 41 5e 41 5f e9 02 06 8a ff e8 fd 05 8a ff 45 31 ed 80 7d 02 00 4c 8d 65 30 74 55 e8 eb 05 8a ff 49 8b 1c 24 <4c> 8b 7b 08 41 f6 c7 01 0f 85 18 01 00 00 e8 d4 05 8a ff 8b 43 34 [ 15.254905] RSP: 0018:ffffc900019bfc08 EFLAGS: 00010293 [ 15.255376] RSP: 0018:ffffc900019c7c08 EFLAGS: 00010293 [ 15.255580] [ 15.255583] RAX: ffff888004a7ac80 RBX: 0000000000000040 RCX: 0000000000000000 [ 15.255912] [ 15.256724] RDX: 0000000000000000 RSI: ffffffff818e06c5 RDI: ffff88807f6ddd00 [ 15.257620] RAX: ffff88800daad900 RBX: 0000000000223b08 RCX: 0000000000000006 [ 15.259817] RBP: ffff88800e9006c0 R08: 0000000000000000 R09: 0000000000000000 [ 15.259818] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88800e9006f0 [ 15.259820] R13: 0000000000000000 R14: ffff88807f6ddd00 R15: 0000000000000002 [ 15.259822] FS: 00007fae4a60a700(0000) GS:ffff88807c500000(0000) knlGS:0000000000000000 [ 15.259826] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.260296] RDX: 0000000000000000 RSI: ffffffff818e06c5 RDI: ffff88807f6dc700 [ 15.262514] CR2: 0000000000000048 CR3: 000000000b89c000 CR4: 00000000000006e0 [ 15.262515] Call Trace: [ 15.262519] skb_release_all+0x28/0x30 [ 15.262523] __kfree_skb+0x11/0x20 [ 15.263054] RBP: ffff88807f71a4c0 R08: 0000000000000001 R09: 0000000000000001 [ 15.263680] tcp_data_queue+0x270/0x1240 [ 15.263843] R10: ffffc900019c7c18 R11: 0000000000000000 R12: ffff88807f71a4f0 [ 15.264693] ? tcp_urg+0x50/0x2a0 [ 15.264856] R13: 0000000000000000 R14: ffff88807f6dc700 R15: 0000000000000002 [ 15.265720] tcp_rcv_established+0x39a/0x890 [ 15.266438] FS: 00007f65d9b5f700(0000) GS:ffff88807c400000(0000) knlGS:0000000000000000 [ 15.267283] ? __schedule+0x3fa/0x880 [ 15.267287] tcp_v4_do_rcv+0xb9/0x270 [ 15.267290] __release_sock+0x8a/0x160 [ 15.268049] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.268788] release_sock+0x32/0xd0 [ 15.268791] __inet_stream_connect+0x1d2/0x400 [ 15.268795] ? do_wait_intr_irq+0x80/0x80 [ 15.269593] CR2: 0000000000223b10 CR3: 000000000b883000 CR4: 00000000000006f0 [ 15.270246] inet_stream_connect+0x36/0x50 [ 15.270250] mptcp_stream_connect+0x69/0x1b0 [ 15.270253] __sys_connect+0x122/0x140 [ 15.271097] Kernel panic - not syncing: Fatal exception [ 15.271820] ? syscall_enter_from_user_mode+0x17/0x50 [ 15.283542] ? lockdep_hardirqs_on_prepare+0xd4/0x170 [ 15.284275] __x64_sys_connect+0x1a/0x20 [ 15.284853] do_syscall_64+0x33/0x40 [ 15.285369] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 15.286105] RIP: 0033:0x7fae49f19469 [ 15.286638] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48 [ 15.289295] RSP: 002b:00007fae4a609da8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a [ 15.290375] RAX: ffffffffffffffda RBX: 000000000049bf00 RCX: 00007fae49f19469 [ 15.291403] RDX: 0000000000000010 RSI: 00000000200000c0 RDI: 0000000000000005 [ 15.292437] RBP: 000000000049bf00 R08: 0000000000000000 R09: 0000000000000000 [ 15.293456] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000049bf0c [ 15.294473] R13: 00007fff0004b6bf R14: 00007fae4a5ea000 R15: 0000000000000003 [ 15.295492] Modules linked in: [ 15.295944] CR2: 0000000000000048 [ 15.296567] Kernel Offset: disabled [ 15.296941] ---[ end Kernel panic - not syncing: Fatal exception ]--- Reported-by: Christoph Paasch <[email protected]> Fixes: 84dfe36 (mptcp: send out dedicated ADD_ADDR packet) Signed-off-by: Geliang Tang <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Link: https://lore.kernel.org/r/ccca4e8f01457a1b495c5d612ed16c5f7a585706.1608010058.git.geliangtang@gmail.com Signed-off-by: Jakub Kicinski <[email protected]>

Since de78a9c ("powerpc: Add a framework for Kernel Userspace Access Protection"), user access helpers call user_{read|write}_access_{begin|end} when user space access is allowed. Commit 890274c ("powerpc/64s: Implement KUAP for Radix MMU") made the mentioned helpers program a AMR special register to allow such access for a short period of time, most of the time AMR is expected to block user memory access by the kernel. Since the code accesses the user space memory, unsafe_get_user() calls might_fault() which calls arch_local_irq_restore() if either CONFIG_PROVE_LOCKING or CONFIG_DEBUG_ATOMIC_SLEEP is enabled. arch_local_irq_restore() then attempts to replay pending soft interrupts as KUAP regions have hardware interrupts enabled. If a pending interrupt happens to do user access (performance interrupts do that), it enables access for a short period of time so after returning from the replay, the user access state remains blocked and if a user page fault happens - "Bug: Read fault blocked by AMR!" appears and SIGSEGV is sent. An example trace: Bug: Read fault blocked by AMR! WARNING: CPU: 0 PID: 1603 at /home/aik/p/kernel/arch/powerpc/include/asm/book3s/64/kup-radix.h:145 CPU: 0 PID: 1603 Comm: amr Not tainted 5.10.0-rc6_v5.10-rc6_a+fstn1 #24 NIP: c00000000009ece8 LR: c00000000009ece4 CTR: 0000000000000000 REGS: c00000000dc63560 TRAP: 0700 Not tainted (5.10.0-rc6_v5.10-rc6_a+fstn1) MSR: 8000000000021033 <SF,ME,IR,DR,RI,LE> CR: 28002888 XER: 20040000 CFAR: c0000000001fa928 IRQMASK: 1 GPR00: c00000000009ece4 c00000000dc637f0 c000000002397600 000000000000001f GPR04: c0000000020eb318 0000000000000000 c00000000dc63494 0000000000000027 GPR08: c00000007fe4de68 c00000000dfe9180 0000000000000000 0000000000000001 GPR12: 0000000000002000 c0000000030a0000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 bfffffffffffffff GPR20: 0000000000000000 c0000000134a4020 c0000000019c2218 0000000000000fe0 GPR24: 0000000000000000 0000000000000000 c00000000d106200 0000000040000000 GPR28: 0000000000000000 0000000000000300 c00000000dc63910 c000000001946730 NIP __do_page_fault+0xb38/0xde0 LR __do_page_fault+0xb34/0xde0 Call Trace: __do_page_fault+0xb34/0xde0 (unreliable) handle_page_fault+0x10/0x2c --- interrupt: 300 at strncpy_from_user+0x290/0x440 LR = strncpy_from_user+0x284/0x440 strncpy_from_user+0x2f0/0x440 (unreliable) getname_flags+0x88/0x2c0 do_sys_openat2+0x2d4/0x5f0 do_sys_open+0xcc/0x140 system_call_exception+0x160/0x240 system_call_common+0xf0/0x27c To fix it save/restore the AMR when replaying interrupts, and also add a check if AMR was not blocked prior to replaying interrupts. Originally found by syzkaller. Fixes: 890274c ("powerpc/64s: Implement KUAP for Radix MMU") Signed-off-by: Alexey Kardashevskiy <[email protected]> Reviewed-by: Nicholas Piggin <[email protected]> [mpe: Use normal commit citation format and add full oops log to change log, move kuap_check_amr() into the restore routine to avoid warnings about unreconciled IRQ state] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]

Clean up the code to use the "mmc" directly instead of "host->mmc". If the code sits in hot code path, this clean up also brings trvial performance improvement. Take the sdhci_post_req() for example: before the patch: ... 8d0: a9be7bfd stp x29, x30, [sp, #-32]! 8d4: 910003fd mov x29, sp 8d8: f9000bf3 str x19, [sp, #16] 8dc: f9400833 ldr x19, [x1, #16] 8e0: b9404261 ldr w1, [x19, #64] 8e4: 34000161 cbz w1, 910 <sdhci_post_req+0x50> 8e8: f9424400 ldr x0, [x0, #1160] 8ec: d2800004 mov x4, #0x0 // #0 8f0: b9401a61 ldr w1, [x19, #24] 8f4: b9403262 ldr w2, [x19, #48] 8f8: f9400000 ldr x0, [x0] 8fc: f278003f tst x1, #0x100 900: f9401e61 ldr x1, [x19, #56] 904: 1a9f17e3 cset w3, eq // eq = none 908: 11000463 add w3, w3, #0x1 90c: 94000000 bl 0 <dma_unmap_sg_attrs> ... After the patch: ... 8d0: a9be7bfd stp x29, x30, [sp, #-32]! 8d4: 910003fd mov x29, sp 8d8: f9000bf3 str x19, [sp, #16] 8dc: f9400833 ldr x19, [x1, #16] 8e0: b9404261 ldr w1, [x19, #64] 8e4: 34000141 cbz w1, 90c <sdhci_post_req+0x4c> 8e8: b9401a61 ldr w1, [x19, #24] 8ec: d2800004 mov x4, #0x0 // #0 8f0: b9403262 ldr w2, [x19, #48] 8f4: f9400000 ldr x0, [x0] 8f8: f278003f tst x1, #0x100 8fc: f9401e61 ldr x1, [x19, #56] 900: 1a9f17e3 cset w3, eq // eq = none 904: 11000463 add w3, w3, #0x1 908: 94000000 bl 0 <dma_unmap_sg_attrs> ... We saved one ldr instruction: "ldr x0, [x0, #1160]" Signed-off-by: Jisheng Zhang <[email protected]> Acked-by: Adrian Hunter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ulf Hansson <[email protected]>

The "auxtrace_info" and "auxtrace" functions are not set in "tool" member of "annotate". As a result, perf annotate does not support parsing itrace data. Before: # perf record -e arm_spe_0/branch_filter=1/ -a sleep 1 [ perf record: Woken up 9 times to write data ] [ perf record: Captured and wrote 20.874 MB perf.data ] # perf annotate --stdio Error: The perf.data data has no samples! Solution: 1. Add itrace options in help, 2. Set hook functions of "id_index", "auxtrace_info" and "auxtrace" in perf_tool. After: # perf record --all-user -e arm_spe_0/branch_filter=1/ ls Couldn't synthesize bpf events. perf.data [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.010 MB perf.data ] # perf annotate --stdio Percent | Source code & Disassembly of libc-2.28.so for branch-miss (1 samples, percent: local period) ------------------------------------------------------------------------------------------------------------ : : : : Disassembly of section .text: : : 0000000000066180 <__getdelim@@GLIBC_2.17>: 0.00 : 66180: stp x29, x30, [sp, #-96]! 0.00 : 66184: cmp x0, #0x0 0.00 : 66188: ccmp x1, #0x0, #0x4, ne // ne = any 0.00 : 6618c: mov x29, sp 0.00 : 66190: stp x24, x25, [sp, #56] 0.00 : 66194: stp x26, x27, [sp, #72] 0.00 : 66198: str x28, [sp, #88] 0.00 : 6619c: b.eq 66450 <__getdelim@@GLIBC_2.17+0x2d0> // b.none 0.00 : 661a0: stp x22, x23, [x29, #40] 0.00 : 661a4: mov x22, x1 0.00 : 661a8: ldr w1, [x3] 0.00 : 661ac: mov w23, w2 0.00 : 661b0: stp x20, x21, [x29, #24] 0.00 : 661b4: mov x20, x3 0.00 : 661b8: mov x21, x0 0.00 : 661bc: tbnz w1, #15, 66360 <__getdelim@@GLIBC_2.17+0x1e0> 0.00 : 661c0: ldr x0, [x3, #136] 0.00 : 661c4: ldr x2, [x0, #8] 0.00 : 661c8: str x19, [x29, #16] 0.00 : 661cc: mrs x19, tpidr_el0 0.00 : 661d0: sub x19, x19, #0x700 0.00 : 661d4: cmp x2, x19 0.00 : 661d8: b.eq 663f0 <__getdelim@@GLIBC_2.17+0x270> // b.none 0.00 : 661dc: mov w1, #0x1 // #1 0.00 : 661e0: ldaxr w2, [x0] 0.00 : 661e4: cmp w2, #0x0 0.00 : 661e8: b.ne 661f4 <__getdelim@@GLIBC_2.17+0x74> // b.any 0.00 : 661ec: stxr w3, w1, [x0] 0.00 : 661f0: cbnz w3, 661e0 <__getdelim@@GLIBC_2.17+0x60> 0.00 : 661f4: b.ne 66448 <__getdelim@@GLIBC_2.17+0x2c8> // b.any 0.00 : 661f8: ldr x0, [x20, #136] 0.00 : 661fc: ldr w1, [x20] 0.00 : 66200: ldr w2, [x0, #4] 0.00 : 66204: str x19, [x0, #8] 0.00 : 66208: add w2, w2, #0x1 0.00 : 6620c: str w2, [x0, #4] 0.00 : 66210: tbnz w1, #5, 66388 <__getdelim@@GLIBC_2.17+0x208> 0.00 : 66214: ldr x19, [x29, #16] 0.00 : 66218: ldr x0, [x21] 0.00 : 6621c: cbz x0, 66228 <__getdelim@@GLIBC_2.17+0xa8> 0.00 : 66220: ldr x0, [x22] 0.00 : 66224: cbnz x0, 6623c <__getdelim@@GLIBC_2.17+0xbc> 0.00 : 66228: mov x0, #0x78 // #120 0.00 : 6622c: str x0, [x22] 0.00 : 66230: bl 20710 <malloc@plt> 0.00 : 66234: str x0, [x21] 0.00 : 66238: cbz x0, 66428 <__getdelim@@GLIBC_2.17+0x2a8> 0.00 : 6623c: ldr x27, [x20, #8] 0.00 : 66240: str x19, [x29, #16] 0.00 : 66244: ldr x19, [x20, #16] 0.00 : 66248: sub x19, x19, x27 0.00 : 6624c: cmp x19, #0x0 0.00 : 66250: b.le 66398 <__getdelim@@GLIBC_2.17+0x218> 0.00 : 66254: mov x25, #0x0 // #0 0.00 : 66258: b 662d8 <__getdelim@@GLIBC_2.17+0x158> 0.00 : 6625c: nop 0.00 : 66260: add x24, x19, x25 0.00 : 66264: ldr x3, [x22] 0.00 : 66268: add x26, x24, #0x1 0.00 : 6626c: ldr x0, [x21] 0.00 : 66270: cmp x3, x26 0.00 : 66274: b.cs 6629c <__getdelim@@GLIBC_2.17+0x11c> // b.hs, b.nlast 0.00 : 66278: lsl x3, x3, #1 0.00 : 6627c: cmp x3, x26 0.00 : 66280: csel x26, x3, x26, cs // cs = hs, nlast 0.00 : 66284: mov x1, x26 0.00 : 66288: bl 206f0 <realloc@plt> 0.00 : 6628c: cbz x0, 66438 <__getdelim@@GLIBC_2.17+0x2b8> 0.00 : 66290: str x0, [x21] 0.00 : 66294: ldr x27, [x20, #8] 0.00 : 66298: str x26, [x22] 0.00 : 6629c: mov x2, x19 0.00 : 662a0: mov x1, x27 0.00 : 662a4: add x0, x0, x25 0.00 : 662a8: bl 87390 <explicit_bzero@@GLIBC_2.25+0x50> 0.00 : 662ac: ldr x0, [x20, #8] 0.00 : 662b0: add x19, x0, x19 0.00 : 662b4: str x19, [x20, #8] 0.00 : 662b8: cbnz x28, 66410 <__getdelim@@GLIBC_2.17+0x290> 0.00 : 662bc: mov x0, x20 0.00 : 662c0: bl 73b80 <__underflow@@GLIBC_2.17> 0.00 : 662c4: cmn w0, #0x1 0.00 : 662c8: b.eq 66410 <__getdelim@@GLIBC_2.17+0x290> // b.none 0.00 : 662cc: ldp x27, x19, [x20, #8] 0.00 : 662d0: mov x25, x24 0.00 : 662d4: sub x19, x19, x27 0.00 : 662d8: mov x2, x19 0.00 : 662dc: mov w1, w23 0.00 : 662e0: mov x0, x27 0.00 : 662e4: bl 807b0 <memchr@@GLIBC_2.17> 0.00 : 662e8: cmp x0, #0x0 0.00 : 662ec: mov x28, x0 0.00 : 662f0: sub x0, x0, x27 0.00 : 662f4: csinc x19, x19, x0, eq // eq = none 0.00 : 662f8: mov x0, #0x7fffffffffffffff // #9223372036854775807 0.00 : 662fc: sub x0, x0, x25 0.00 : 66300: cmp x19, x0 0.00 : 66304: b.lt 66260 <__getdelim@@GLIBC_2.17+0xe0> // b.tstop 0.00 : 66308: adrp x0, 17f000 <sys_sigabbrev@@GLIBC_2.17+0x320> 0.00 : 6630c: ldr x0, [x0, #3624] 0.00 : 66310: mrs x2, tpidr_el0 0.00 : 66314: ldr x19, [x29, #16] 0.00 : 66318: mov w3, #0x4b // #75 0.00 : 6631c: ldr w1, [x20] 0.00 : 66320: mov x24, #0xffffffffffffffff // #-1 0.00 : 66324: str w3, [x2, x0] 0.00 : 66328: tbnz w1, #15, 66340 <__getdelim@@GLIBC_2.17+0x1c0> 0.00 : 6632c: ldr x0, [x20, #136] 0.00 : 66330: ldr w1, [x0, #4] 0.00 : 66334: sub w1, w1, #0x1 0.00 : 66338: str w1, [x0, #4] 0.00 : 6633c: cbz w1, 663b8 <__getdelim@@GLIBC_2.17+0x238> 0.00 : 66340: mov x0, x24 0.00 : 66344: ldr x28, [sp, #88] 0.00 : 66348: ldp x20, x21, [x29, #24] 0.00 : 6634c: ldp x22, x23, [x29, #40] 0.00 : 66350: ldp x24, x25, [sp, #56] 0.00 : 66354: ldp x26, x27, [sp, #72] 0.00 : 66358: ldp x29, x30, [sp], #96 0.00 : 6635c: ret 100.00 : 66360: tbz w1, #5, 66218 <__getdelim@@GLIBC_2.17+0x98> 0.00 : 66364: ldp x20, x21, [x29, #24] 0.00 : 66368: mov x24, #0xffffffffffffffff // #-1 0.00 : 6636c: ldp x22, x23, [x29, #40] 0.00 : 66370: mov x0, x24 0.00 : 66374: ldp x24, x25, [sp, #56] 0.00 : 66378: ldp x26, x27, [sp, #72] 0.00 : 6637c: ldr x28, [sp, #88] 0.00 : 66380: ldp x29, x30, [sp], #96 0.00 : 66384: ret 0.00 : 66388: mov x24, #0xffffffffffffffff // #-1 0.00 : 6638c: ldr x19, [x29, #16] 0.00 : 66390: b 66328 <__getdelim@@GLIBC_2.17+0x1a8> 0.00 : 66394: nop 0.00 : 66398: mov x0, x20 0.00 : 6639c: bl 73b80 <__underflow@@GLIBC_2.17> 0.00 : 663a0: cmn w0, #0x1 0.00 : 663a4: b.eq 66438 <__getdelim@@GLIBC_2.17+0x2b8> // b.none 0.00 : 663a8: ldp x27, x19, [x20, #8] 0.00 : 663ac: sub x19, x19, x27 0.00 : 663b0: b 66254 <__getdelim@@GLIBC_2.17+0xd4> 0.00 : 663b4: nop 0.00 : 663b8: str xzr, [x0, #8] 0.00 : 663bc: ldxr w2, [x0] 0.00 : 663c0: stlxr w3, w1, [x0] 0.00 : 663c4: cbnz w3, 663bc <__getdelim@@GLIBC_2.17+0x23c> 0.00 : 663c8: cmp w2, #0x1 0.00 : 663cc: b.le 66340 <__getdelim@@GLIBC_2.17+0x1c0> 0.00 : 663d0: mov x1, #0x81 // #129 0.00 : 663d4: mov x2, #0x1 // #1 0.00 : 663d8: mov x3, #0x0 // #0 0.00 : 663dc: mov x8, #0x62 // #98 0.00 : 663e0: svc #0x0 0.00 : 663e4: ldp x20, x21, [x29, #24] 0.00 : 663e8: ldp x22, x23, [x29, #40] 0.00 : 663ec: b 66370 <__getdelim@@GLIBC_2.17+0x1f0> 0.00 : 663f0: ldr w2, [x0, #4] 0.00 : 663f4: add w2, w2, #0x1 0.00 : 663f8: str w2, [x0, #4] 0.00 : 663fc: tbz w1, #5, 66214 <__getdelim@@GLIBC_2.17+0x94> 0.00 : 66400: mov x24, #0xffffffffffffffff // #-1 0.00 : 66404: ldr x19, [x29, #16] 0.00 : 66408: b 66330 <__getdelim@@GLIBC_2.17+0x1b0> 0.00 : 6640c: nop 0.00 : 66410: ldr x0, [x21] 0.00 : 66414: strb wzr, [x0, x24] 0.00 : 66418: ldr w1, [x20] 0.00 : 6641c: ldr x19, [x29, #16] 0.00 : 66420: b 66328 <__getdelim@@GLIBC_2.17+0x1a8> 0.00 : 66424: nop 0.00 : 66428: mov x24, #0xffffffffffffffff // #-1 0.00 : 6642c: ldr w1, [x20] 0.00 : 66430: b 66328 <__getdelim@@GLIBC_2.17+0x1a8> 0.00 : 66434: nop 0.00 : 66438: mov x24, #0xffffffffffffffff // #-1 0.00 : 6643c: ldr w1, [x20] 0.00 : 66440: ldr x19, [x29, #16] 0.00 : 66444: b 66328 <__getdelim@@GLIBC_2.17+0x1a8> 0.00 : 66448: bl e3ba0 <pthread_setcanceltype@@GLIBC_2.17+0x30> 0.00 : 6644c: b 661f8 <__getdelim@@GLIBC_2.17+0x78> 0.00 : 66450: adrp x0, 17f000 <sys_sigabbrev@@GLIBC_2.17+0x320> 0.00 : 66454: ldr x0, [x0, #3624] 0.00 : 66458: mrs x1, tpidr_el0 0.00 : 6645c: mov w2, #0x16 // #22 0.00 : 66460: mov x24, #0xffffffffffffffff // #-1 0.00 : 66464: str w2, [x1, x0] 0.00 : 66468: b 66370 <__getdelim@@GLIBC_2.17+0x1f0> 0.00 : 6646c: ldr w1, [x20] 0.00 : 66470: mov x4, x0 0.00 : 66474: tbnz w1, #15, 6648c <__getdelim@@GLIBC_2.17+0x30c> 0.00 : 66478: ldr x0, [x20, #136] 0.00 : 6647c: ldr w1, [x0, #4] 0.00 : 66480: sub w1, w1, #0x1 0.00 : 66484: str w1, [x0, #4] 0.00 : 66488: cbz w1, 66494 <__getdelim@@GLIBC_2.17+0x314> 0.00 : 6648c: mov x0, x4 0.00 : 66490: bl 20e40 <gnu_get_libc_version@@GLIBC_2.17+0x130> 0.00 : 66494: str xzr, [x0, #8] 0.00 : 66498: ldxr w2, [x0] 0.00 : 6649c: stlxr w3, w1, [x0] 0.00 : 664a0: cbnz w3, 66498 <__getdelim@@GLIBC_2.17+0x318> 0.00 : 664a4: cmp w2, #0x1 0.00 : 664a8: b.le 6648c <__getdelim@@GLIBC_2.17+0x30c> 0.00 : 664ac: mov x1, #0x81 // #129 0.00 : 664b0: mov x2, #0x1 // #1 0.00 : 664b4: mov x3, #0x0 // #0 0.00 : 664b8: mov x8, #0x62 // #98 0.00 : 664bc: svc #0x0 0.00 : 664c0: b 6648c <__getdelim@@GLIBC_2.17+0x30c> Signed-off-by: Yang Jihong <[email protected]> Tested-by: Leo Yan <[email protected]> Acked-by: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

When setting up a read or write to the OPB memory space, we must perform five or six AHB writes. The ordering of these up until the trigger write does not matter, so use writel_relaxed. The generated code goes from (Debian GCC 10.2.1-6): mov r8, r3 mcr 15, 0, sl, cr7, cr10, {4} str sl, [r6, #20] mcr 15, 0, sl, cr7, cr10, {4} str r3, [r6, #24] mcr 15, 0, sl, cr7, cr10, {4} str r1, [r6, #28] mcr 15, 0, sl, cr7, cr10, {4} str r2, [r6, #32] mcr 15, 0, sl, cr7, cr10, {4} mov r1, #1 str r1, [r6, #64] ; 0x40 mcr 15, 0, sl, cr7, cr10, {4} str r1, [r6, #4] to this: str r3, [r7, #20] str r2, [r7, #24] str r1, [r7, #28] str r3, [r7, #64] mov r8, #0 mcr 15, 0, r8, cr7, cr10, {4} str r3, [r7, #4] Signed-off-by: Joel Stanley <[email protected]> Acked-by: Jeremy Kerr <[email protected]> Reviewed-by: Eddie James <[email protected]> Tested-by: Eddie James <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joel Stanley <[email protected]>

When bringing down the netdevice or system shutdown, a panic can be triggered while accessing the sysfs path because the device is already removed. [ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called [ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called ... [ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null) [ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280 crash> bt ... PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd" ... #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778 [exception RIP: dma_pool_alloc+0x1ab] RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090 RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00 R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0 R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core] #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core] #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core] #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core] #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core] #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core] #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core] #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46 #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208 #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3 #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596 #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10 #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5 #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92 crash> net_device.state ffff89443b0c0000 state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER) To prevent this scenario, we also make sure that the netdevice is present. Signed-off-by: suresh kumar <[email protected]> Signed-off-by: David S. Miller <[email protected]>

The rcutorture module is used to run torture tests that validate RCU. rcutorture takes a variety of module parameters that configure the functionality of the test. Amongst these parameters are the types of synchronization mechanisms that the rcu_torture_writer and rcu_torture_fakewriter tasks may use, and the torture_type of the run which determines what read and sync operations are used by the various writer and reader tasks that run throughout the test. When the module is configured to only use sync types for which the specified torture_type does not implement the necessary operations, we can end up in a state where nsynctypes is 0. This is not an erroneous state, but it currently crashes the kernel with a #DE due to nsynctypes being used with a modulo operator in rcu_torture_fakewriter(). Here is an example of such a #DE: $ insmod ./rcutorture.ko gp_cond=1 gp_cond_exp=0 gp_exp=0 gp_poll_exp=0 gp_normal=0 gp_poll=0 gp_poll_exp=0 verbose=9999 torture_type=trivial ... [ 8536.525096] divide error: 0000 [#1] PREEMPT SMP PTI [ 8536.525101] CPU: 30 PID: 392138 Comm: rcu_torture_fak Kdump: loaded Tainted: G S 5.17.0-rc1-00179-gc8c42c80febd #24 [ 8536.525105] Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A23 12/08/2020 [ 8536.525106] RIP: 0010:rcu_torture_fakewriter+0xf1/0x2d0 [rcutorture] [ 8536.525121] Code: 00 31 d2 8d 0c f5 00 00 00 00 48 63 c9 48 f7 f1 48 85 d2 0f 84 79 ff ff ff 48 89 e7 e8 78 78 01 00 48 63 0d 29 ca 00 00 31 d2 <48> f7 f1 8b 04 95 00 05 4e a0 83 f8 06 0f 84 ad 00 00 00 7f 1f 83 [ 8536.525124] RSP: 0018:ffffc9000777fef0 EFLAGS: 00010246 [ 8536.525127] RAX: 00000000223d006e RBX: cccccccccccccccd RCX: 0000000000000000 [ 8536.525130] RDX: 0000000000000000 RSI: ffffffff824315b9 RDI: ffffc9000777fef0 [ 8536.525132] RBP: ffffc9000487bb30 R08: 0000000000000002 R09: 000000000002a580 [ 8536.525134] R10: ffffffff82c5f920 R11: 0000000000000000 R12: ffff8881a2c35d00 [ 8536.525136] R13: ffff8881540c8d00 R14: ffffffffa04d39d0 R15: 0000000000000000 [ 8536.525137] FS: 0000000000000000(0000) GS:ffff88903ff80000(0000) knlGS:0000000000000000 [ 8536.525140] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8536.525142] CR2: 00007f839f022000 CR3: 0000000002c0a006 CR4: 00000000007706e0 [ 8536.525144] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 8536.525145] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 8536.525147] PKRU: 55555554 [ 8536.525148] Call Trace: [ 8536.525150] <TASK> [ 8536.525153] kthread+0xe8/0x110 [ 8536.525161] ? kthread_complete_and_exit+0x20/0x20 [ 8536.525167] ret_from_fork+0x22/0x30 [ 8536.525174] </TASK> The solution is to gracefully handle the case of nsynctypes being 0 in rcu_torture_fakewriter() by not performing any work. This is already being done in rcu_torture_writer(), though there is a missing return on that path which will be fixed in a subsequent patch. Signed-off-by: David Vernet <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>

During stress test with attaching and detaching VF from KVM and simultaneously changing VFs spoofcheck and trust there was a call trace in ice_reset_vf that VF's VSI is null. [145237.352797] WARNING: CPU: 46 PID: 840629 at drivers/net/ethernet/intel/ice/ice_vf_lib.c:508 ice_reset_vf+0x3d6/0x410 [ice] [145237.352851] Modules linked in: ice(E) vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio iavf dm_mod xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink tun bridge stp llc sunrpc intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iTCO_wdt iTC O_vendor_support irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl ipmi_si intel_cstate ipmi_devintf joydev intel_uncore m ei_me ipmi_msghandler i2c_i801 pcspkr mei lpc_ich ioatdma i2c_smbus acpi_pad acpi_power_meter ip_tables xfs libcrc32c i2c_algo_bit drm_sh mem_helper drm_kms_helper sd_mod t10_pi crc64_rocksoft syscopyarea crc64 sysfillrect sg sysimgblt fb_sys_fops drm i40e ixgbe ahci libahci libata crc32c_intel mdio dca wmi fuse [last unloaded: ice] [145237.352917] CPU: 46 PID: 840629 Comm: kworker/46:2 Tainted: G S W I E 5.19.0-rc6+ #24 [145237.352921] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015 [145237.352923] Workqueue: ice ice_service_task [ice] [145237.352948] RIP: 0010:ice_reset_vf+0x3d6/0x410 [ice] [145237.352984] Code: 30 ec f3 cc e9 28 fd ff ff 0f b7 4b 50 48 c7 c2 48 19 9c c0 4c 89 ee 48 c7 c7 30 fe 9e c0 e8 d1 21 9d cc 31 c0 e9 a 9 fe ff ff <0f> 0b b8 ea ff ff ff e9 c1 fc ff ff 0f 0b b8 fb ff ff ff e9 91 fe [145237.352987] RSP: 0018:ffffb453e257fdb8 EFLAGS: 00010246 [145237.352990] RAX: ffff8bd0040181c0 RBX: ffff8be68db8f800 RCX: 0000000000000000 [145237.352991] RDX: 000000000000ffff RSI: 0000000000000000 RDI: ffff8be68db8f800 [145237.352993] RBP: ffff8bd0040181c0 R08: 0000000000001000 R09: ffff8bcfd520e000 [145237.352995] R10: 0000000000000000 R11: 00008417b5ab0bc0 R12: 0000000000000005 [145237.352996] R13: ffff8bcee061c0d0 R14: ffff8bd004019640 R15: 0000000000000000 [145237.352998] FS: 0000000000000000(0000) GS:ffff8be5dfb00000(0000) knlGS:0000000000000000 [145237.353000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [145237.353002] CR2: 00007fd81f651d68 CR3: 0000001a0fe10001 CR4: 00000000001726e0 [145237.353003] Call Trace: [145237.353008] <TASK> [145237.353011] ice_process_vflr_event+0x8d/0xb0 [ice] [145237.353049] ice_service_task+0x79f/0xef0 [ice] [145237.353074] process_one_work+0x1c8/0x390 [145237.353081] ? process_one_work+0x390/0x390 [145237.353084] worker_thread+0x30/0x360 [145237.353087] ? process_one_work+0x390/0x390 [145237.353090] kthread+0xe8/0x110 [145237.353094] ? kthread_complete_and_exit+0x20/0x20 [145237.353097] ret_from_fork+0x22/0x30 [145237.353103] </TASK> Remove WARN_ON() from check if VSI is null in ice_reset_vf. Add "VF is already removed\n" in dev_dbg(). This WARN_ON() is unnecessary and causes call trace, despite that call trace, driver still works. There is no need for this warn because this piece of code is responsible for disabling VF's Tx/Rx queues when VF is disabled, but when VF is already removed there is no need to do reset or disable queues. Fixes: efe4186 ("ice: Fix memory corruption in VF driver") Signed-off-by: Michal Jaron <[email protected]> Signed-off-by: Jedrzej Jagielski <[email protected]> Tested-by: Marek Szlosek <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>

When we try to transmit an skb with metadata_dst attached (i.e. dst->dev == NULL) through xfrm interface we can hit a null pointer dereference[1] in xfrmi_xmit2() -> xfrm_lookup_with_ifid() due to the check for a loopback skb device when there's no policy which dereferences dst->dev unconditionally. Not having dst->dev can be interepreted as it not being a loopback device, so just add a check for a null dst_orig->dev. With this fix xfrm interface's Tx error counters go up as usual. [1] net-next calltrace captured via netconsole: BUG: kernel NULL pointer dereference, address: 00000000000000c0 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP CPU: 1 PID: 7231 Comm: ping Kdump: loaded Not tainted 5.19.0+ #24 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-1.fc36 04/01/2014 RIP: 0010:xfrm_lookup_with_ifid+0x5eb/0xa60 Code: 8d 74 24 38 e8 26 a4 37 00 48 89 c1 e9 12 fc ff ff 49 63 ed 41 83 fd be 0f 85 be 01 00 00 41 be ff ff ff ff 45 31 ed 48 8b 03 <f6> 80 c0 00 00 00 08 75 0f 41 80 bc 24 19 0d 00 00 01 0f 84 1e 02 RSP: 0018:ffffb0db82c679f0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffd0db7fcad430 RCX: ffffb0db82c67a10 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffb0db82c67a80 RBP: ffffb0db82c67a80 R08: ffffb0db82c67a14 R09: 0000000000000000 R10: 0000000000000000 R11: ffff8fa449667dc8 R12: ffffffff966db880 R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000000 FS: 00007ff35c83f000(0000) GS:ffff8fa478480000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000c0 CR3: 000000001ebb7000 CR4: 0000000000350ee0 Call Trace: <TASK> xfrmi_xmit+0xde/0x460 ? tcf_bpf_act+0x13d/0x2a0 dev_hard_start_xmit+0x72/0x1e0 __dev_queue_xmit+0x251/0xd30 ip_finish_output2+0x140/0x550 ip_push_pending_frames+0x56/0x80 raw_sendmsg+0x663/0x10a0 ? try_charge_memcg+0x3fd/0x7a0 ? __mod_memcg_lruvec_state+0x93/0x110 ? sock_sendmsg+0x30/0x40 sock_sendmsg+0x30/0x40 __sys_sendto+0xeb/0x130 ? handle_mm_fault+0xae/0x280 ? do_user_addr_fault+0x1e7/0x680 ? kvm_read_and_reset_apf_flags+0x3b/0x50 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x34/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7ff35cac1366 Code: eb 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 72 c3 90 55 48 83 ec 30 44 89 4c 24 2c 4c 89 RSP: 002b:00007fff738e4028 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007fff738e57b0 RCX: 00007ff35cac1366 RDX: 0000000000000040 RSI: 0000557164e4b450 RDI: 0000000000000003 RBP: 0000557164e4b450 R08: 00007fff738e7a2c R09: 0000000000000010 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040 R13: 00007fff738e5770 R14: 00007fff738e4030 R15: 0000001d00000001 </TASK> Modules linked in: netconsole veth br_netfilter bridge bonding virtio_net [last unloaded: netconsole] CR2: 00000000000000c0 CC: Steffen Klassert <[email protected]> CC: Daniel Borkmann <[email protected]> Fixes: 2d151d3 ("xfrm: Add possibility to set the default to block if we have no policy") Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>

…notify_init() The following WARNING message was given when rmmod cros_usbpd_notify: Unexpected driver unregister! WARNING: CPU: 0 PID: 253 at drivers/base/driver.c:270 driver_unregister+0x8a/0xb0 Modules linked in: cros_usbpd_notify(-) CPU: 0 PID: 253 Comm: rmmod Not tainted 6.1.0-rc3 #24 ... Call Trace: <TASK> cros_usbpd_notify_exit+0x11/0x1e [cros_usbpd_notify] __x64_sys_delete_module+0x3c7/0x570 ? __ia32_sys_delete_module+0x570/0x570 ? lock_is_held_type+0xe3/0x140 ? syscall_enter_from_user_mode+0x17/0x50 ? rcu_read_lock_sched_held+0xa0/0xd0 ? syscall_enter_from_user_mode+0x1c/0x50 do_syscall_64+0x37/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f333fe9b1b7 The reason is that the cros_usbpd_notify_init() does not check the return value of platform_driver_register(), and the cros_usbpd_notify can install successfully even if platform_driver_register() failed. Fix by checking the return value of platform_driver_register() and unregister cros_usbpd_notify_plat_driver when it failed. Fixes: ec2daf6 ("platform: chrome: Add cros-usbpd-notify driver") Signed-off-by: Yuan Can <[email protected]> Reviewed-by: Brian Norris <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Prashant Malani <[email protected]>

The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Reviewed-by: Vlad Buslov <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>

The following processes run into a deadlock. CPU 41 was waiting for CPU 29 to handle a CSD request while holding spinlock "crashdump_lock", but CPU 29 was hung by that spinlock with IRQs disabled. PID: 17360 TASK: ffff95c1090c5c40 CPU: 41 COMMAND: "mrdiagd" !# 0 [ffffb80edbf37b58] __read_once_size at ffffffff9b871a40 include/linux/compiler.h:185:0 !# 1 [ffffb80edbf37b58] atomic_read at ffffffff9b871a40 arch/x86/include/asm/atomic.h:27:0 !# 2 [ffffb80edbf37b58] dump_stack at ffffffff9b871a40 lib/dump_stack.c:54:0 # 3 [ffffb80edbf37b78] csd_lock_wait_toolong at ffffffff9b131ad5 kernel/smp.c:364:0 # 4 [ffffb80edbf37b78] __csd_lock_wait at ffffffff9b131ad5 kernel/smp.c:384:0 # 5 [ffffb80edbf37bf8] csd_lock_wait at ffffffff9b13267a kernel/smp.c:394:0 # 6 [ffffb80edbf37bf8] smp_call_function_many at ffffffff9b13267a kernel/smp.c:843:0 # 7 [ffffb80edbf37c50] smp_call_function at ffffffff9b13279d kernel/smp.c:867:0 # 8 [ffffb80edbf37c50] on_each_cpu at ffffffff9b13279d kernel/smp.c:976:0 # 9 [ffffb80edbf37c78] flush_tlb_kernel_range at ffffffff9b085c4b arch/x86/mm/tlb.c:742:0 #10 [ffffb80edbf37cb8] __purge_vmap_area_lazy at ffffffff9b23a1e0 mm/vmalloc.c:701:0 #11 [ffffb80edbf37ce0] try_purge_vmap_area_lazy at ffffffff9b23a2cc mm/vmalloc.c:722:0 #12 [ffffb80edbf37ce0] free_vmap_area_noflush at ffffffff9b23a2cc mm/vmalloc.c:754:0 #13 [ffffb80edbf37cf8] free_unmap_vmap_area at ffffffff9b23bb3b mm/vmalloc.c:764:0 #14 [ffffb80edbf37cf8] remove_vm_area at ffffffff9b23bb3b mm/vmalloc.c:1509:0 #15 [ffffb80edbf37d18] __vunmap at ffffffff9b23bb8a mm/vmalloc.c:1537:0 #16 [ffffb80edbf37d40] vfree at ffffffff9b23bc85 mm/vmalloc.c:1612:0 #17 [ffffb80edbf37d58] megasas_free_host_crash_buffer [megaraid_sas] at ffffffffc020b7f2 drivers/scsi/megaraid/megaraid_sas_fusion.c:3932:0 #18 [ffffb80edbf37d80] fw_crash_state_store [megaraid_sas] at ffffffffc01f804d drivers/scsi/megaraid/megaraid_sas_base.c:3291:0 #19 [ffffb80edbf37dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0 #20 [ffffb80edbf37dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0 #21 [ffffb80edbf37de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0 #22 [ffffb80edbf37e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0 #23 [ffffb80edbf37ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0 #24 [ffffb80edbf37ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0 #25 [ffffb80edbf37ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0 #26 [ffffb80edbf37f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0 #27 [ffffb80edbf37f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0 PID: 17355 TASK: ffff95c1090c3d80 CPU: 29 COMMAND: "mrdiagd" !# 0 [ffffb80f2d3c7d30] __read_once_size at ffffffff9b0f2ab0 include/linux/compiler.h:185:0 !# 1 [ffffb80f2d3c7d30] native_queued_spin_lock_slowpath at ffffffff9b0f2ab0 kernel/locking/qspinlock.c:368:0 # 2 [ffffb80f2d3c7d58] pv_queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/paravirt.h:674:0 # 3 [ffffb80f2d3c7d58] queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/qspinlock.h:53:0 # 4 [ffffb80f2d3c7d68] queued_spin_lock at ffffffff9b8961a6 include/asm-generic/qspinlock.h:90:0 # 5 [ffffb80f2d3c7d68] do_raw_spin_lock_flags at ffffffff9b8961a6 include/linux/spinlock.h:173:0 # 6 [ffffb80f2d3c7d68] __raw_spin_lock_irqsave at ffffffff9b8961a6 include/linux/spinlock_api_smp.h:122:0 # 7 [ffffb80f2d3c7d68] _raw_spin_lock_irqsave at ffffffff9b8961a6 kernel/locking/spinlock.c:160:0 # 8 [ffffb80f2d3c7d88] fw_crash_buffer_store [megaraid_sas] at ffffffffc01f8129 drivers/scsi/megaraid/megaraid_sas_base.c:3205:0 # 9 [ffffb80f2d3c7dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0 #10 [ffffb80f2d3c7dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0 #11 [ffffb80f2d3c7de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0 #12 [ffffb80f2d3c7e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0 #13 [ffffb80f2d3c7ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0 #14 [ffffb80f2d3c7ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0 #15 [ffffb80f2d3c7ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0 #16 [ffffb80f2d3c7f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0 #17 [ffffb80f2d3c7f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0 The lock is used to synchronize different sysfs operations, it doesn't protect any resource that will be touched by an interrupt. Consequently it's not required to disable IRQs. Replace the spinlock with a mutex to fix the deadlock. Signed-off-by: Junxiao Bi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Mike Christie <[email protected]> Cc: [email protected] Signed-off-by: Martin K. Petersen <[email protected]>

The current implementation of the mov instruction with sign extension has the following problems: 1. It clobbers the source register if it is not stacked because it sign extends the source and then moves it to the destination. 2. If the dst_reg is stacked, the current code doesn't write the value back in case of 64-bit mov. 3. There is room for improvement by emitting fewer instructions. The steps for fixing this and the instructions emitted by the JIT are explained below with examples in all combinations: Case A: offset == 32: ===================== Case A.1: src and dst are stacked registers: -------------------------------------------- 1. Load src_lo into tmp_lo 2. Store tmp_lo into dst_lo 3. Sign extend tmp_lo into tmp_hi 4. Store tmp_hi to dst_hi Example: r3 = (s32)r3 r3 is a stacked register ldr r6, [r11, #-16] // Load r3_lo into tmp_lo // str to dst_lo is not emitted because src_lo == dst_lo asr r7, r6, #31 // Sign extend tmp_lo into tmp_hi str r7, [r11, #-12] // Store tmp_hi into r3_hi Case A.2: src is stacked but dst is not: ---------------------------------------- 1. Load src_lo into dst_lo 2. Sign extend dst_lo into dst_hi Example: r6 = (s32)r3 r6 maps to {ARM_R5, ARM_R4} and r3 is stacked ldr r4, [r11, #-16] // Load r3_lo into r6_lo asr r5, r4, #31 // Sign extend r6_lo into r6_hi Case A.3: src is not stacked but dst is stacked: ------------------------------------------------ 1. Store src_lo into dst_lo 2. Sign extend src_lo into tmp_hi 3. Store tmp_hi to dst_hi Example: r3 = (s32)r6 r3 is stacked and r6 maps to {ARM_R5, ARM_R4} str r4, [r11, #-16] // Store r6_lo to r3_lo asr r7, r4, #31 // Sign extend r6_lo into tmp_hi str r7, [r11, #-12] // Store tmp_hi to dest_hi Case A.4: Both src and dst are not stacked: ------------------------------------------- 1. Mov src_lo into dst_lo 2. Sign extend src_lo into dst_hi Example: (bf) r6 = (s32)r6 r6 maps to {ARM_R5, ARM_R4} // Mov not emitted because dst == src asr r5, r4, #31 // Sign extend r6_lo into r6_hi Case B: offset != 32: ===================== Case B.1: src and dst are stacked registers: -------------------------------------------- 1. Load src_lo into tmp_lo 2. Sign extend tmp_lo according to offset. 3. Store tmp_lo into dst_lo 4. Sign extend tmp_lo into tmp_hi 5. Store tmp_hi to dst_hi Example: r9 = (s8)r3 r9 and r3 are both stacked registers ldr r6, [r11, #-16] // Load r3_lo into tmp_lo lsl r6, r6, #24 // Sign extend tmp_lo asr r6, r6, #24 // .. str r6, [r11, #-56] // Store tmp_lo to r9_lo asr r7, r6, #31 // Sign extend tmp_lo to tmp_hi str r7, [r11, #-52] // Store tmp_hi to r9_hi Case B.2: src is stacked but dst is not: ---------------------------------------- 1. Load src_lo into dst_lo 2. Sign extend dst_lo according to offset. 3. Sign extend tmp_lo into dst_hi Example: r6 = (s8)r3 r6 maps to {ARM_R5, ARM_R4} and r3 is stacked ldr r4, [r11, #-16] // Load r3_lo to r6_lo lsl r4, r4, #24 // Sign extend r6_lo asr r4, r4, #24 // .. asr r5, r4, #31 // Sign extend r6_lo into r6_hi Case B.3: src is not stacked but dst is stacked: ------------------------------------------------ 1. Sign extend src_lo into tmp_lo according to offset. 2. Store tmp_lo into dst_lo. 3. Sign extend src_lo into tmp_hi. 4. Store tmp_hi to dst_hi. Example: r3 = (s8)r1 r3 is stacked and r1 maps to {ARM_R3, ARM_R2} lsl r6, r2, #24 // Sign extend r1_lo to tmp_lo asr r6, r6, #24 // .. str r6, [r11, #-16] // Store tmp_lo to r3_lo asr r7, r6, #31 // Sign extend tmp_lo to tmp_hi str r7, [r11, #-12] // Store tmp_hi to r3_hi Case B.4: Both src and dst are not stacked: ------------------------------------------- 1. Sign extend src_lo into dst_lo according to offset. 2. Sign extend dst_lo into dst_hi. Example: r6 = (s8)r1 r6 maps to {ARM_R5, ARM_R4} and r1 maps to {ARM_R3, ARM_R2} lsl r4, r2, #24 // Sign extend r1_lo to r6_lo asr r4, r4, #24 // .. asr r5, r4, #31 // Sign extend r6_lo to r6_hi Fixes: fc83265 ("arm32, bpf: add support for sign-extension mov instruction") Reported-by: [email protected] Signed-off-by: Puranjay Mohan <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Closes: https://lore.kernel.org/all/[email protected] Link: https://lore.kernel.org/bpf/[email protected]

Historically, arm64 implemented raw_smp_processor_id() as a read of current_thread_info()->cpu. This changed when arm64 moved thread_info into task struct, as at the time CONFIG_THREAD_INFO_IN_TASK made core code use thread_struct::cpu for the cpu number, and due to header dependencies prevented using this in raw_smp_processor_id(). As a workaround, we moved to using a percpu variable in commit: 57c8295 ("arm64: make cpu number a percpu variable") Since then, thread_info::cpu was reintroduced, and core code was made to use this in commits: 001430c ("arm64: add CPU field to struct thread_info") bcf9033 ("sched: move CPU field back into thread_info if THREAD_INFO_IN_TASK=y") Consequently it is possible to use current_thread_info()->cpu again. This decreases the number of emitted instructions like in the following example: Dump of assembler code for function bpf_get_smp_processor_id: 0xffff8000802cd608 <+0>: nop 0xffff8000802cd60c <+4>: nop 0xffff8000802cd610 <+8>: adrp x0, 0xffff800082138000 0xffff8000802cd614 <+12>: mrs x1, tpidr_el1 0xffff8000802cd618 <+16>: add x0, x0, #0x8 0xffff8000802cd61c <+20>: ldrsw x0, [x0, x1] 0xffff8000802cd620 <+24>: ret After this patch: Dump of assembler code for function bpf_get_smp_processor_id: 0xffff8000802c9130 <+0>: nop 0xffff8000802c9134 <+4>: nop 0xffff8000802c9138 <+8>: mrs x0, sp_el0 0xffff8000802c913c <+12>: ldr w0, [x0, #24] 0xffff8000802c9140 <+16>: ret A microbenchmark[1] was built to measure the performance improvement provided by this change. It calls the following function given number of times and finds the runtime overhead: static noinline int get_cpu_id(void) { return smp_processor_id(); } Run the benchmark like: modprobe smp_processor_id nr_function_calls=1000000000 +--------------------------+------------------------+ | | Number of Calls | Time taken | +--------+-----------------+------------------------+ | Before | 1000000000 | 1602888401ns | +--------+-----------------+------------------------+ | After | 1000000000 | 1206212658ns | +--------+-----------------+------------------------+ | Difference (decrease) | 396675743ns (24.74%) | +---------------------------------------------------+ Remove the percpu variable cpu_number as it is used only in set_smp_ipi_range() as a dummy variable to be passed to ipi_handler(). Use irq_stat in place of cpu_number here like arm32. [1] puranjaymohan/linux@77d3fdd Signed-off-by: Puranjay Mohan <[email protected]> Acked-by: Mark Rutland <[email protected]> Reviewed-by: Stephen Boyd <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>

profile->parent->dents[AAFS_PROF_DIR] could be NULL only if its parent is made from __create_missing_ancestors(..) and 'ent->old' is NULL in aa_replace_profiles(..). In that case, it must return an error code and the code, -ENOENT represents its state that the path of its parent is not existed yet. BUG: kernel NULL pointer dereference, address: 0000000000000030 PGD 0 P4D 0 PREEMPT SMP PTI CPU: 4 PID: 3362 Comm: apparmor_parser Not tainted 6.8.0-24-generic #24 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 RIP: 0010:aafs_create.constprop.0+0x7f/0x130 Code: 4c 63 e0 48 83 c4 18 4c 89 e0 5b 41 5c 41 5d 41 5e 41 5f 5d 31 d2 31 c9 31 f6 31 ff 45 31 c0 45 31 c9 45 31 d2 c3 cc cc cc cc <4d> 8b 55 30 4d 8d ba a0 00 00 00 4c 89 55 c0 4c 89 ff e8 7a 6a ae RSP: 0018:ffffc9000b2c7c98 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000000041ed RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffc9000b2c7cd8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff82baac10 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007be9f22cf740(0000) GS:ffff88817bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000030 CR3: 0000000134b08000 CR4: 00000000000006f0 Call Trace: <TASK> ? show_regs+0x6d/0x80 ? __die+0x24/0x80 ? page_fault_oops+0x99/0x1b0 ? kernelmode_fixup_or_oops+0xb2/0x140 ? __bad_area_nosemaphore+0x1a5/0x2c0 ? find_vma+0x34/0x60 ? bad_area_nosemaphore+0x16/0x30 ? do_user_addr_fault+0x2a2/0x6b0 ? exc_page_fault+0x83/0x1b0 ? asm_exc_page_fault+0x27/0x30 ? aafs_create.constprop.0+0x7f/0x130 ? aafs_create.constprop.0+0x51/0x130 __aafs_profile_mkdir+0x3d6/0x480 aa_replace_profiles+0x83f/0x1270 policy_update+0xe3/0x180 profile_load+0xbc/0x150 ? rw_verify_area+0x47/0x140 vfs_write+0x100/0x480 ? __x64_sys_openat+0x55/0xa0 ? syscall_exit_to_user_mode+0x86/0x260 ksys_write+0x73/0x100 __x64_sys_write+0x19/0x30 x64_sys_call+0x7e/0x25c0 do_syscall_64+0x7f/0x180 entry_SYSCALL_64_after_hwframe+0x78/0x80 RIP: 0033:0x7be9f211c574 Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d d5 ea 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89 RSP: 002b:00007ffd26f2b8c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00005d504415e200 RCX: 00007be9f211c574 RDX: 0000000000001fc1 RSI: 00005d504418bc80 RDI: 0000000000000004 RBP: 0000000000001fc1 R08: 0000000000001fc1 R09: 0000000080000000 R10: 0000000000000000 R11: 0000000000000202 R12: 00005d504418bc80 R13: 0000000000000004 R14: 00007ffd26f2b9b0 R15: 00007ffd26f2ba30 </TASK> Modules linked in: snd_seq_dummy snd_hrtimer qrtr snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device i2c_i801 snd_timer i2c_smbus qxl snd soundcore drm_ttm_helper lpc_ich ttm joydev input_leds serio_raw mac_hid binfmt_misc msr parport_pc ppdev lp parport efi_pstore nfnetlink dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 hid_generic usbhid hid ahci libahci psmouse virtio_rng xhci_pci xhci_pci_renesas CR2: 0000000000000030 ---[ end trace 0000000000000000 ]--- RIP: 0010:aafs_create.constprop.0+0x7f/0x130 Code: 4c 63 e0 48 83 c4 18 4c 89 e0 5b 41 5c 41 5d 41 5e 41 5f 5d 31 d2 31 c9 31 f6 31 ff 45 31 c0 45 31 c9 45 31 d2 c3 cc cc cc cc <4d> 8b 55 30 4d 8d ba a0 00 00 00 4c 89 55 c0 4c 89 ff e8 7a 6a ae RSP: 0018:ffffc9000b2c7c98 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000000041ed RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffc9000b2c7cd8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff82baac10 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007be9f22cf740(0000) GS:ffff88817bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000030 CR3: 0000000134b08000 CR4: 00000000000006f0 Signed-off-by: Leesoo Ahn <[email protected]> Signed-off-by: John Johansen <[email protected]>

scx_ops_bypass() can currently race on the ops enable / disable path as follows: 1. scx_ops_bypass(true) called on enable path, bypass depth is set to 1 2. An op on the init path exits, which schedules scx_ops_disable_workfn() 3. scx_ops_bypass(false) is called on the disable path, and bypass depth is decremented to 0 4. kthread is scheduled to execute scx_ops_disable_workfn() 5. scx_ops_bypass(true) called, bypass depth set to 1 6. scx_ops_bypass() races when iterating over CPUs While it's not safe to take any blocking locks on the bypass path, it is safe to take a raw spinlock which cannot be preempted. This patch therefore updates scx_ops_bypass() to use a raw spinlock to synchronize, and changes scx_ops_bypass_depth to be a regular int. Without this change, we observe the following warnings when running the 'exit' sched_ext selftest (sometimes requires a couple of runs): .[root@virtme-ng sched_ext]# ./runner -t exit ===== START ===== TEST: exit ... [ 14.935078] WARNING: CPU: 2 PID: 360 at kernel/sched/ext.c:4332 scx_ops_bypass+0x1ca/0x280 [ 14.935126] Modules linked in: [ 14.935150] CPU: 2 UID: 0 PID: 360 Comm: sched_ext_ops_h Not tainted 6.11.0-virtme #24 [ 14.935192] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 14.935242] Sched_ext: exit (enabling+all) [ 14.935244] RIP: 0010:scx_ops_bypass+0x1ca/0x280 [ 14.935300] Code: ff ff ff e8 48 96 10 00 fb e9 08 ff ff ff c6 05 7b 34 e8 01 01 90 48 c7 c7 89 86 88 87 e8 be 1d f8 ff 90 0f 0b 90 90 eb 95 90 <0f> 0b 90 41 8b 84 24 24 0a 00 00 eb 97 90 0f 0b 90 41 8b 84 24 24 [ 14.935394] RSP: 0018:ffffb706c0957ce0 EFLAGS: 00010002 [ 14.935424] RAX: 0000000000000009 RBX: 0000000000000001 RCX: 00000000e3fb8b2a [ 14.935465] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffffff88a4c080 [ 14.935512] RBP: 0000000000009b56 R08: 0000000000000004 R09: 00000003f12e520a [ 14.935555] R10: ffffffff863a9795 R11: 0000000000000000 R12: ffff8fc5fec31300 [ 14.935598] R13: ffff8fc5fec31318 R14: 0000000000000286 R15: 0000000000000018 [ 14.935642] FS: 0000000000000000(0000) GS:ffff8fc5fe680000(0000) knlGS:0000000000000000 [ 14.935684] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 14.935721] CR2: 0000557d92890b88 CR3: 000000002464a000 CR4: 0000000000750ef0 [ 14.935765] PKRU: 55555554 [ 14.935782] Call Trace: [ 14.935802] <TASK> [ 14.935823] ? __warn+0xce/0x220 [ 14.935850] ? scx_ops_bypass+0x1ca/0x280 [ 14.935881] ? report_bug+0xc1/0x160 [ 14.935909] ? handle_bug+0x61/0x90 [ 14.935934] ? exc_invalid_op+0x1a/0x50 [ 14.935959] ? asm_exc_invalid_op+0x1a/0x20 [ 14.935984] ? raw_spin_rq_lock_nested+0x15/0x30 [ 14.936019] ? scx_ops_bypass+0x1ca/0x280 [ 14.936046] ? srso_alias_return_thunk+0x5/0xfbef5 [ 14.936081] ? __pfx_scx_ops_disable_workfn+0x10/0x10 [ 14.936111] scx_ops_disable_workfn+0x146/0xac0 [ 14.936142] ? finish_task_switch+0xa9/0x2c0 [ 14.936172] ? srso_alias_return_thunk+0x5/0xfbef5 [ 14.936211] ? __pfx_scx_ops_disable_workfn+0x10/0x10 [ 14.936244] kthread_worker_fn+0x101/0x2c0 [ 14.936268] ? __pfx_kthread_worker_fn+0x10/0x10 [ 14.936299] kthread+0xec/0x110 [ 14.936327] ? __pfx_kthread+0x10/0x10 [ 14.936351] ret_from_fork+0x37/0x50 [ 14.936374] ? __pfx_kthread+0x10/0x10 [ 14.936400] ret_from_fork_asm+0x1a/0x30 [ 14.936427] </TASK> [ 14.936443] irq event stamp: 21002 [ 14.936467] hardirqs last enabled at (21001): [<ffffffff863aa35f>] resched_cpu+0x9f/0xd0 [ 14.936521] hardirqs last disabled at (21002): [<ffffffff863dd0ba>] scx_ops_bypass+0x11a/0x280 [ 14.936571] softirqs last enabled at (20642): [<ffffffff863683d7>] __irq_exit_rcu+0x67/0xd0 [ 14.936622] softirqs last disabled at (20637): [<ffffffff863683d7>] __irq_exit_rcu+0x67/0xd0 [ 14.936672] ---[ end trace 0000000000000000 ]--- [ 14.953282] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF) [ 14.953352] ------------[ cut here ]------------ [ 14.953383] WARNING: CPU: 2 PID: 360 at kernel/sched/ext.c:4335 scx_ops_bypass+0x1d8/0x280 [ 14.953428] Modules linked in: [ 14.953453] CPU: 2 UID: 0 PID: 360 Comm: sched_ext_ops_h Tainted: G W 6.11.0-virtme #24 [ 14.953505] Tainted: [W]=WARN [ 14.953527] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 14.953574] RIP: 0010:scx_ops_bypass+0x1d8/0x280 [ 14.953603] Code: c6 05 7b 34 e8 01 01 90 48 c7 c7 89 86 88 87 e8 be 1d f8 ff 90 0f 0b 90 90 eb 95 90 0f 0b 90 41 8b 84 24 24 0a 00 00 eb 97 90 <0f> 0b 90 41 8b 84 24 24 0a 00 00 eb 92 f3 0f 1e fa 49 8d 84 24 f0 [ 14.953693] RSP: 0018:ffffb706c0957ce0 EFLAGS: 00010046 [ 14.953722] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000001 [ 14.953763] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8fc5fec31318 [ 14.953804] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 [ 14.953845] R10: ffffffff863a9795 R11: 0000000000000000 R12: ffff8fc5fec31300 [ 14.953888] R13: ffff8fc5fec31318 R14: 0000000000000286 R15: 0000000000000018 [ 14.953934] FS: 0000000000000000(0000) GS:ffff8fc5fe680000(0000) knlGS:0000000000000000 [ 14.953974] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 14.954009] CR2: 0000557d92890b88 CR3: 000000002464a000 CR4: 0000000000750ef0 [ 14.954052] PKRU: 55555554 [ 14.954068] Call Trace: [ 14.954085] <TASK> [ 14.954102] ? __warn+0xce/0x220 [ 14.954126] ? scx_ops_bypass+0x1d8/0x280 [ 14.954150] ? report_bug+0xc1/0x160 [ 14.954178] ? handle_bug+0x61/0x90 [ 14.954203] ? exc_invalid_op+0x1a/0x50 [ 14.954226] ? asm_exc_invalid_op+0x1a/0x20 [ 14.954250] ? raw_spin_rq_lock_nested+0x15/0x30 [ 14.954285] ? scx_ops_bypass+0x1d8/0x280 [ 14.954311] ? __mutex_unlock_slowpath+0x3a/0x260 [ 14.954343] scx_ops_disable_workfn+0xa3e/0xac0 [ 14.954381] ? __pfx_scx_ops_disable_workfn+0x10/0x10 [ 14.954413] kthread_worker_fn+0x101/0x2c0 [ 14.954442] ? __pfx_kthread_worker_fn+0x10/0x10 [ 14.954479] kthread+0xec/0x110 [ 14.954507] ? __pfx_kthread+0x10/0x10 [ 14.954530] ret_from_fork+0x37/0x50 [ 14.954553] ? __pfx_kthread+0x10/0x10 [ 14.954576] ret_from_fork_asm+0x1a/0x30 [ 14.954603] </TASK> [ 14.954621] irq event stamp: 21002 [ 14.954644] hardirqs last enabled at (21001): [<ffffffff863aa35f>] resched_cpu+0x9f/0xd0 [ 14.954686] hardirqs last disabled at (21002): [<ffffffff863dd0ba>] scx_ops_bypass+0x11a/0x280 [ 14.954735] softirqs last enabled at (20642): [<ffffffff863683d7>] __irq_exit_rcu+0x67/0xd0 [ 14.954782] softirqs last disabled at (20637): [<ffffffff863683d7>] __irq_exit_rcu+0x67/0xd0 [ 14.954829] ---[ end trace 0000000000000000 ]--- [ 15.022283] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF) [ 15.092282] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF) [ 15.149282] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF) ok 1 exit # ===== END ===== And with it, the test passes without issue after 1000s of runs: .[root@virtme-ng sched_ext]# ./runner -t exit ===== START ===== TEST: exit DESCRIPTION: Verify we can cleanly exit a scheduler in multiple places OUTPUT: [ 7.412856] sched_ext: BPF scheduler "exit" enabled [ 7.427924] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF) [ 7.466677] sched_ext: BPF scheduler "exit" enabled [ 7.475923] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF) [ 7.512803] sched_ext: BPF scheduler "exit" enabled [ 7.532924] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF) [ 7.586809] sched_ext: BPF scheduler "exit" enabled [ 7.595926] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF) [ 7.661923] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF) [ 7.723923] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF) ok 1 exit # ===== END ===== ============================= RESULTS: PASSED: 1 SKIPPED: 0 FAILED: 0 Fixes: f0e1a06 ("sched_ext: Implement BPF extensible scheduler class") Signed-off-by: David Vernet <[email protected]> Signed-off-by: Tejun Heo <[email protected]>

lafoucriere and others added 30 commits November 1, 2013 17:32

Jinshan Xiong and others added 2 commits November 5, 2013 16:10

bergwolf closed this Jun 24, 2015

bergwolf deleted the master-sync-round-3 branch June 24, 2015 02:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

port patches from Lustre tree to kernel tree #24

port patches from Lustre tree to kernel tree #24

bergwolf commented Nov 5, 2013

port patches from Lustre tree to kernel tree #24

port patches from Lustre tree to kernel tree #24

Conversation

bergwolf commented Nov 5, 2013