Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linux 6.7.2 #295

Closed
wants to merge 10,000 commits into from
Closed

Linux 6.7.2 #295

wants to merge 10,000 commits into from

Conversation

mmstick
Copy link
Member

@mmstick mmstick commented Feb 1, 2024

No description provided.

tiwai and others added 30 commits January 25, 2024 15:45
commit a03cfad upstream.

There was a typo in oxygen mixer code that didn't update the right
channel value properly for the capture volume.  Let's fix it.

This trivial fix was originally reported on Bugzilla.

Fixes: a360156 ("[ALSA] oxygen: add front panel controls")
Cc: <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=156561
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit bc7863d upstream.

This HP Laptop uses ALC236 codec with COEF 0x07 idx 1 controlling
the mute LED. This patch enables the already existing quirk for
this device.

Signed-off-by: Çağhan Demir <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
… ZBook

commit b018cee upstream.

On some HP ZBooks, the audio LEDs can be enabled by
ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF. So use it accordingly.

Signed-off-by: Yo-Jung Lin <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit fb3c007 upstream.

Lenovo M70 Gen5 is equipped with ALC623, and it needs
ALC283_FIXUP_HEADSET_MIC quirk to make its headset mic work.

Signed-off-by: Bin Li <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 92e4701 upstream.

If client send invalid mech token in session setup request, ksmbd
validate and make the error if it is invalid.

Cc: [email protected]
Reported-by: [email protected] # ZDI-CAN-22890
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 38d20c6 upstream.

The race is between the handling of a new TCP connection and
its disconnection. It leads to UAF on `struct tcp_transport` in
ksmbd_tcp_new_connection() function.

Cc: [email protected]
Reported-by: [email protected] # ZDI-CAN-22991
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 77bebd1 upstream.

When smb2 leases is disable, ksmbd can send oplock break notification
and cause wait oplock break ack timeout. It may appear like hang when
accessing a directory. This patch make only v2 leases handle the
directory.

Cc: [email protected]
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 9c896d6 upstream.

The kconfig options for filesystems that support FS_ENCRYPTION are
supposed to select FS_ENCRYPTION_ALGS.  This is needed to ensure that
required crypto algorithms get enabled as loadable modules or builtin as
is appropriate for the set of enabled filesystems.  Do this for CEPH_FS
so that there aren't any missing algorithms if someone happens to have
CEPH_FS as their only enabled filesystem that supports encryption.

Cc: [email protected]
Fixes: f061fed ("ceph: add fscrypt ioctls and ceph.fscrypt.auth vxattr")
Signed-off-by: Eric Biggers <[email protected]>
Reviewed-by: Xiubo Li <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit c239665 upstream.

There has been a lingering bug in LoongArch Linux systems causing some
GCC tests to intermittently fail (see Closes link).  I've made a minimal
reproducer:

    zsh% cat measure.s
    .align 4
    .globl _start
    _start:
        movfcsr2gr  $a0, $fcsr0
        bstrpick.w  $a0, $a0, 16, 16
        beqz        $a0, .ok
        break       0
    .ok:
        li.w        $a7, 93
        syscall     0
    zsh% cc mesaure.s -o measure -nostdlib
    zsh% echo $((1.0/3))
    0.33333333333333331
    zsh% while ./measure; do ; done

This while loop should not stop as POSIX is clear that execve must set
fenv to the default, where FCSR should be zero.  But in fact it will
just stop after running for a while (normally less than 30 seconds).
Note that "$((1.0/3))" is needed to reproduce this issue because it
raises FE_INVALID and makes fcsr0 non-zero.

The problem is we are currently relying on SET_PERSONALITY2() to reset
current->thread.fpu.fcsr.  But SET_PERSONALITY2() is executed before
start_thread which calls lose_fpu(0).  We can see if kernel preempt is
enabled, we may switch to another thread after SET_PERSONALITY2() but
before lose_fpu(0).  Then bad thing happens: during the thread switch
the value of the fcsr0 register is stored into current->thread.fpu.fcsr,
making it dirty again.

The issue can be fixed by setting current->thread.fpu.fcsr after
lose_fpu(0) because lose_fpu() clears TIF_USEDFPU, then the thread
switch won't touch current->thread.fpu.fcsr.

The only other architecture setting FCSR in SET_PERSONALITY2() is MIPS.
I've ran a similar test on MIPS with mainline kernel and it turns out
MIPS is buggy, too.  Anyway MIPS do this for supporting different FP
flavors (NaN encodings, etc.) which do not exist on LoongArch.  So for
LoongArch, we can simply remove the current->thread.fpu.fcsr setting
from SET_PERSONALITY2() and do it in start_thread(), after lose_fpu(0).

The while loop failing with the mainline kernel has survived one hour
after this change on LoongArch.

Fixes: 803b0fc ("LoongArch: Add process management")
Closes: loongson-community/discussions#7
Link: https://lore.kernel.org/linux-mips/[email protected]/
Cc: [email protected]
Signed-off-by: Xi Ruoyao <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 9b43ef3 upstream.

IOPOLL request should never return IOU_OK, so the following iopoll
queueing check in io_issue_sqe() after getting IOU_OK doesn't make any
sense as would never turn true. Let's optimise on that and return a bit
earlier. It's also much more resilient to potential bugs from
mischieving iopoll implementations.

Cc:  <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/2f8690e2fa5213a2ff292fac29a7143c036cdd60.1701390926.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 0a535ed upstream.

If IOSQE_ASYNC is set and we fail importing an iovec for a readv or
writev request, then we leave ->bytes_done uninitialized and hence the
eventual failure CQE posted can potentially have a random res value
rather than the expected -EINVAL.

Setup ->bytes_done before potentially failing, so we have a consistent
value if we fail the request early.

Cc: [email protected]
Reported-by: xingwei lee <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 6ff1407 upstream.

A previous commit added an earlier break condition here, which is fine if
we're using non-local task_work as it'll be run on return to userspace.
However, if DEFER_TASKRUN is used, then we could be leaving local
task_work that is ready to process in the ctx list until next time that
we enter the kernel to wait for events.

Move the break condition to _after_ we have run task_work.

Cc: [email protected]
Fixes: 846072f ("io_uring: mimimise io_cqring_wait_schedule")
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit b488077 upstream.

Fix build by using the correct name for the initializer macro
for struct fb_ops.

Signed-off-by: Thomas Zimmermann <[email protected]>
Fixes: 9037afd ("fbdev/acornfb: Use fbdev I/O helpers")
Cc: Thomas Zimmermann <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Javier Martinez Canillas <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: <[email protected]> # v6.6+
Reviewed-by: Javier Martinez Canillas <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 15e4c1f upstream.

The driver's fsync() is supposed to flush any pending operation to
hardware. It is implemented in this driver by cancelling the queued
deferred IO first, then schedule it for "immediate execution" by calling
schedule_delayed_work() again with delay=0. However, setting delay=0
only means the work is scheduled immediately, it does not mean the work
is executed immediately. There is no guarantee that the work is finished
after schedule_delayed_work() returns. After this driver's fsync()
returns, there can still be pending work. Furthermore, if close() is
called by users immediately after fsync(), the pending work gets
cancelled and fsync() may do nothing.

To ensure that the deferred IO completes, use flush_delayed_work()
instead. Write operations to this driver either write to the device
directly, or invoke schedule_delayed_work(); so by flushing the
workqueue, it can be guaranteed that all previous writes make it to the
device.

Fixes: 5e841b8 ("fb: fsync() method for deferred I/O flush.")
Cc: [email protected]
Signed-off-by: Nam Cao <[email protected]>
Reviewed-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 33cd6ea upstream.

When framebuffer gets closed, the queued deferred IO gets cancelled. This
can cause some last display data to vanish. This is problematic for users
who send a still image to the framebuffer, then close the file: the image
may never appear.

To ensure none of display data get lost, flush the queued deferred IO
first before closing.

Another possible solution is to delete the cancel_delayed_work_sync()
instead. The difference is that the display may appear some time after
closing. However, the clearing of page mapping after this needs to be
removed too, because the page mapping is used by the deferred work. It is
not completely obvious whether it is okay to not clear the page mapping.
For a patch intended for stable trees, go with the simple and obvious
solution.

Fixes: 60b59be ("fbdev: mm: Deferred IO support")
Cc: [email protected]
Signed-off-by: Nam Cao <[email protected]>
Reviewed-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit daf7795 upstream.

ufshcd_init() calls pm_runtime_get_sync() before it calls
async_schedule(). ufshcd_async_scan() calls pm_runtime_put_sync() directly
or indirectly from ufshcd_add_lus(). Simplify ufshcd_async_scan() by always
calling pm_runtime_put_sync() from ufshcd_async_scan().

Cc: <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Can Guo <[email protected]>
Reviewed-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 0db1d53 upstream.

The callers of vfs_iter_write() are required to hold file_start_write().
file_start_write() is a no-op for the S_ISBLK() case, but it is really
needed when the backing file is a regular file.

We are going to move file_{start,end}_write() into vfs_iter_write(), but
we need to fix this first, so that the fix could be backported to stable
kernels.

Suggested-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]/
Cc:  <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Martin K. Petersen <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit e5aab84 upstream.

After a controller reset, the firmware may modify the device queue depth.
Therefore, update the device queue depth accordingly.

Cc: <[email protected]> # v5.15+
Co-developed-by: Sathya Prakash <[email protected]>
Signed-off-by: Sathya Prakash <[email protected]>
Signed-off-by: Chandrakanth patil <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit c01d515 upstream.

After a controller reset, if the firmware changes the state of devices to
"hide", then remove those devices from the OS.

Cc: <[email protected]> # v6.6+
Co-developed-by: Sathya Prakash <[email protected]>
Signed-off-by: Sathya Prakash <[email protected]>
Signed-off-by: Chandrakanth patil <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
…verable State

commit f8fb3f3 upstream.

If a controller reset is underway or the controller is in an unrecoverable
state, the PEL enable management command will be returned as EAGAIN or
EFAULT.

Cc: <[email protected]> # v6.1+
Co-developed-by: Sathya Prakash <[email protected]>
Signed-off-by: Sathya Prakash <[email protected]>
Signed-off-by: Chandrakanth patil <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit f9cfe7e upstream.

Commit cf1b6d4 ("md: simplify md_seq_ops") introduce following
regressions:

1) If list all_mddevs is emptly, personalities and unused devices won't
   be showed to user anymore.
2) If seq_file buffer overflowed from md_seq_show(), then md_seq_start()
   will be called again, hence personalities will be showed to user
   again.
3) If seq_file buffer overflowed from md_seq_stop(), seq_read_iter()
   doesn't handle this, hence unused devices won't be showed to user.

Fix above problems by printing personalities and unused devices in
md_seq_show().

Fixes: cf1b6d4 ("md: simplify md_seq_ops")
Cc: [email protected] # v6.7+
Signed-off-by: Yu Kuai <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit bd1f6a3 upstream.

When dGPU is put into BOCO it may be in D3cold but still able send
PME on display hotplug event. For this to work it must be enabled
as wake source from D3.

When runpm is enabled use pci_wake_from_d3() to mark wakeup as
enabled by default.

Cc: [email protected] # 6.1+
Signed-off-by: Mario Limonciello <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
… size

commit 6f64f86 upstream.

Before calling add partition or resize partition, there is no check
on whether the length is aligned with the logical block size.
If the logical block size of the disk is larger than 512 bytes,
then the partition size maybe not the multiple of the logical block size,
and when the last sector is read, bio_truncate() will adjust the bio size,
resulting in an IO error if the size of the read command is smaller than
the logical block size.If integrity data is supported, this will also
result in a null pointer dereference when calling bio_integrity_free.

Cc:  <[email protected]>
Signed-off-by: Min Li <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 1b151e2 upstream.

The special casing was originally added in pre-git history; reproducing
the commit log here:

> commit a318a92567d77
> Author: Andrew Morton <[email protected]>
> Date:   Sun Sep 21 01:42:22 2003 -0700
>
>     [PATCH] Speed up direct-io hugetlbpage handling
>
>     This patch short-circuits all the direct-io page dirtying logic for
>     higher-order pages.  Without this, we pointlessly bounce BIOs up to
>     keventd all the time.

In the last twenty years, compound pages have become used for more than
just hugetlb.  Rewrite these functions to operate on folios instead
of pages and remove the special case for hugetlbfs; I don't think
it's needed any more (and if it is, we can put it back in as a call
to folio_test_hugetlb()).

This was found by inspection; as far as I can tell, this bug can lead
to pages used as the destination of a direct I/O read not being marked
as dirty.  If those pages are then reclaimed by the MM without being
dirtied for some other reason, they won't be written out.  Then when
they're faulted back in, they will not contain the data they should.
It'll take a pretty unusual setup to produce this problem with several
races all going the wrong way.

This problem predates the folio work; it could for example have been
triggered by mmaping a THP in tmpfs and using that as the target of an
O_DIRECT read.

Fixes: 800d8c6 ("shmem: add huge pages support")
Cc:  <[email protected]>
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 7bed6f3 upstream.

If the bio contains no data, bio_first_folio() calls page_folio() on a
NULL pointer and oopses.  Move the test that we've reached the end of
the bio from bio_next_folio() to bio_first_folio().

Reported-by: [email protected]
Reported-by: [email protected]
Fixes: 640d193 ("block: Add bio_for_each_folio_all()")
Cc: [email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[axboe: add unlikely() to error case]
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
…t generation

commit b1db244 upstream.

When deactivating the catch-all set element, check the state in the next
generation that represents this transaction.

This bug uncovered after the recent removal of the element busy mark
a2dd023 ("netfilter: nf_tables: remove busy mark and gc batch API").

Fixes: aaa3104 ("netfilter: nftables: add catch-all set element support")
Cc: [email protected]
Reported-by: lonial con <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 9320fc5 upstream.

dev_err_probe() is only supposed to be used in probe functions. While it
probably doesn't hurt, both the EPROBE_DEFER handling and calling
device_set_deferred_probe_reason() are conceptually wrong in the request
callback. So replace the call by dev_err() and a separate return
statement.

This effectively reverts commit c0bfe96 ("pwm: jz4740: Simplify
with dev_err_probe()").

Reviewed-by: Krzysztof Kozlowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Fixes: c0bfe96 ("pwm: jz4740: Simplify with dev_err_probe()")
Cc: [email protected]
Signed-off-by: Uwe Kleine-König <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit a297d07 upstream.

With args->args_count == 2 args->args[2] is not defined. Actually the
flags are contained in args->args[1].

Fixes: 3ab7b6a ("pwm: Introduce single-PWM of_xlate function")
Cc: [email protected]
Link: https://lore.kernel.org/r/243908750d306e018a3d4bf2eb745d53ab50f663.1704835845.git.u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 7dab245 upstream.

Use the type blk_opf_t for read and write operations instead of int. This
patch does not affect the generated code but fixes the following sparse
warning:

drivers/md/raid1.c:1993:60: sparse: sparse: incorrect type in argument 5 (different base types)
     expected restricted blk_opf_t [usertype] opf
     got int rw

Cc: Song Liu <[email protected]>
Cc: Jens Axboe <[email protected]>
Fixes: 3c5e514 ("md/raid1: Use the new blk_opf_t type")
Cc: [email protected] # v6.0+
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 21528c6 upstream.

Documentation/filesystems/ramfs-rootfs-initramfs.rst states:

  If CONFIG_TMPFS is enabled, rootfs will use tmpfs instead of ramfs by
  default.  To force ramfs, add "rootfstype=ramfs" to the kernel command
  line.

This currently does not work when root= is provided since then
saved_root_name contains a string and rootfstype= is ignored. Therefore,
ramfs is currently always chosen when root= is provided.

The current behavior for rootfs's filesystem is:

   root=       | rootfstype= | chosen rootfs filesystem
   ------------+-------------+--------------------------
   unspecified | unspecified | tmpfs
   unspecified | tmpfs       | tmpfs
   unspecified | ramfs       | ramfs
    provided   | ignored     | ramfs

rootfstype= should be respected regardless whether root= is given,
as shown below:

   root=       | rootfstype= | chosen rootfs filesystem
   ------------+-------------+--------------------------
   unspecified | unspecified | tmpfs  (as before)
   unspecified | tmpfs       | tmpfs  (as before)
   unspecified | ramfs       | ramfs  (as before)
    provided   | unspecified | ramfs  (compatibility with before)
    provided   | tmpfs       | tmpfs  (new)
    provided   | ramfs       | ramfs  (new)

This table represents the new behavior.

Fixes: 6e19ede ("initmpfs: use initramfs if rootfstype= or root= specified")
Cc: <[email protected]>
Signed-off-by: Rob Landley <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/
Reviewed-and-Tested-by: Mimi Zohar <[email protected]>
Signed-off-by: Stefan Berger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
@mmstick mmstick requested review from a team February 1, 2024 12:59
@mmstick mmstick closed this Feb 1, 2024
@mmstick mmstick deleted the linux-6.7.2 branch February 1, 2024 13:23
@mmstick mmstick restored the linux-6.7.2 branch February 1, 2024 13:25
@mmstick mmstick reopened this Feb 1, 2024
jackpot51
jackpot51 previously approved these changes Feb 2, 2024
@leviport
Copy link
Member

leviport commented Feb 8, 2024

Here's where I'm at currently. Please check off anything I haven't gotten to yet.

Kernel Releases Tests

  • Updating to new kernel works with apt update && apt upgrade
    • No new dependencies are required for kernel update
  • system76-power still operates as expected (across Intel/NVIDIA/switchable machines)
  • Mic in
  • Audio out:
    • Laptop's built-in speakers
    • Headphones
    • DisplayPort
    • HDMI
  • Video out via:
    • DisplayPort
    • HDMI
  • Thunderbolt docking station:
    • HDMI and DisplayPort
    • External storage device
    • Networking
  • Suspend and resume:
    • Suspend and resume works with a bluetooth device paired
    • Switchable graphics laptops
      • In Hybrid graphics mode
      • In Nvidia graphics mode
      • In Integrated graphics mode
    • Nvidia desktop
      • Current Nvidia driver works (i.e. 510)
      • Legacy Nvidia driver works (i.e. 470)
    • Discrete AMD desktop
    • Integrated Intel desktop
    • Integrated AMD desktop
    • 150 suspend/resume cycles (fwts s3 --s3-multiple 150)
  • Graphics drivers included in the kernel (Intel/AMD)
    • Switchable/hybrid graphics and graphics switching
  • Steam
    • Steam installs via the Pop!_OS .deb in Pop!_Shop
    • Steam launches from launcher
    • Linux native game installs and runs
    • Proton game installs and runs
  • VirtualBox installs and works as expected
  • Broadcom wireless driver (bcmwl-kernel-source) installs without DKMS errors

@leviport
Copy link
Member

leviport commented Feb 8, 2024

All tests on Dev One are also complete. My Dev One is happy with 6.7.2

@XV-02
Copy link

XV-02 commented Feb 8, 2024

I am seeing an failures to resume from suspend on Pang12 with the 6.7.2 kernel. Journalctl logs report the system entering an s2idle suspend as the last message. I want to check with other drives, as I was using a WD550 to test, and those have had known issues before which have impacted suspend.

They aren't consistent, either, so I want to look into it a bit more.

@XV-02
Copy link

XV-02 commented Feb 8, 2024

Okay, so, things I have seen :

Doesn't appear to be drive specific, which is kinda a plus, I suppose.

I am yet to have the Pang12 fail on the first suspend/resume. It has reliably failed to resume after no more than the third attempted suspend-resume cycle. It doesn't appear to matter how suspend is triggered. fwts, systemctl suspend, using autosuspend, changing power-button behaviour, or closing the lid all had the same results.

The other aspect is that I am sometimes seeing logs cut off in the middle of entering suspend. That makes me think that it's less of a resume issue than a suspend issue.

I'm going to try a slightly newer kernel (6.7.4) at this junction and see if the issue persists.

@XV-02
Copy link

XV-02 commented Feb 9, 2024

Testing the Ubuntu builds of 6.7.3 and 6.8.rc3 and building and checking 6.7.4 - all of them exhibit the same issue of failure to suspend/resume within 3 attempts.

I'm not certain where in the process the failure is occurring. journalctl logs are inconclusive, usually ending after reporting that they have reached the s2idle sleep state. However, some of them have cut out before that point - which makes me think that it may be tied more to the suspend side of the process than the resume part. dmesg is not recoverable, and the Pangolin is not an open ec/firmware system, so I can't leverage those tools as far as I am aware.

Most often, I'm seeing a failure during my second attempt to suspend and resume on a given boot, though I have had failure which appear the same on both my first and third attempts. I have never had three successful consecutive attempts to suspend and resume on a single boot.

The current 6.6 series kernel doesn't exhibit these issues.

I'm going to look at Ubuntu's pre-6.7.2 builds in the 6.7 series, and see if any of those exhibit the issue to try and narrow it down. I'll look at post 6.6.10 builds in the 6.6 series after that, and try and make the footprint for finding which change introduced this issue smaller.

@XV-02
Copy link

XV-02 commented Feb 9, 2024

Hosanna!

It looks like whatever is causing our Pang12 pains was introduced between 6.7.rc5 and 6.7.rc6

RC5 has no issues entering into suspend and resuming many times in a row. RC6 however, fails after no more than 2 or 3 cycles.
Hopefully this will help to narrow down the root cause for this.

@leviport
Copy link
Member

leviport commented Feb 12, 2024

I think this also made larger initramfs files? My Dev One now has a full ESP (it still has the original install with the 500mb ESP) even though I switched the initramfs compression method to xz instead of zstd. That's something we should probably be careful of, as I'm sure plenty of users still have 500mb ESP's.

@leviport
Copy link
Member

Pang13 also appears to be affected by the suspend breaking bug.

Pang14 is unaffected.

@leviport
Copy link
Member

On pang13, kernel 6.8.0-rc1 restores correct suspend behavior.

@jackpot51
Copy link
Member

This will need to pick #298

Fixes headset detection on Clevo V350SNEQ.

Signed-off-by: Tim Crawford <[email protected]>
@mmstick mmstick closed this Mar 11, 2024
@mmstick mmstick deleted the linux-6.7.2 branch March 11, 2024 18:16
mmstick pushed a commit that referenced this pull request Oct 16, 2024
[ Upstream commit 18ec12c ]

Inject fault while probing of-fpga-region, if kasprintf() fails in
module_add_driver(), the second sysfs_remove_link() in exit path will cause
null-ptr-deref as below because kernfs_name_hash() will call strlen() with
NULL driver_name.

Fix it by releasing resources based on the exit path sequence.

	 KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
	 Mem abort info:
	   ESR = 0x0000000096000005
	   EC = 0x25: DABT (current EL), IL = 32 bits
	   SET = 0, FnV = 0
	   EA = 0, S1PTW = 0
	   FSC = 0x05: level 1 translation fault
	 Data abort info:
	   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
	   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
	   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
	 [dfffffc000000000] address between user and kernel address ranges
	 Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
	 Dumping ftrace buffer:
	    (ftrace buffer empty)
	 Modules linked in: of_fpga_region(+) fpga_region fpga_bridge cfg80211 rfkill 8021q garp mrp stp llc ipv6 [last unloaded: of_fpga_region]
	 CPU: 2 UID: 0 PID: 2036 Comm: modprobe Not tainted 6.11.0-rc2-g6a0e38264012 #295
	 Hardware name: linux,dummy-virt (DT)
	 pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
	 pc : strlen+0x24/0xb0
	 lr : kernfs_name_hash+0x1c/0xc4
	 sp : ffffffc081f97380
	 x29: ffffffc081f97380 x28: ffffffc081f97b90 x27: ffffff80c821c2a0
	 x26: ffffffedac0be418 x25: 0000000000000000 x24: ffffff80c09d2000
	 x23: 0000000000000000 x22: 0000000000000000 x21: 0000000000000000
	 x20: 0000000000000000 x19: 0000000000000000 x18: 0000000000001840
	 x17: 0000000000000000 x16: 0000000000000000 x15: 1ffffff8103f2e42
	 x14: 00000000f1f1f1f1 x13: 0000000000000004 x12: ffffffb01812d61d
	 x11: 1ffffff01812d61c x10: ffffffb01812d61c x9 : dfffffc000000000
	 x8 : 0000004fe7ed29e4 x7 : ffffff80c096b0e7 x6 : 0000000000000001
	 x5 : ffffff80c096b0e0 x4 : 1ffffffdb990efa2 x3 : 0000000000000000
	 x2 : 0000000000000000 x1 : dfffffc000000000 x0 : 0000000000000000
	 Call trace:
	  strlen+0x24/0xb0
	  kernfs_name_hash+0x1c/0xc4
	  kernfs_find_ns+0x118/0x2e8
	  kernfs_remove_by_name_ns+0x80/0x100
	  sysfs_remove_link+0x74/0xa8
	  module_add_driver+0x278/0x394
	  bus_add_driver+0x1f0/0x43c
	  driver_register+0xf4/0x3c0
	  __platform_driver_register+0x60/0x88
	  of_fpga_region_init+0x20/0x1000 [of_fpga_region]
	  do_one_initcall+0x110/0x788
	  do_init_module+0x1dc/0x5c8
	  load_module+0x3c38/0x4cac
	  init_module_from_file+0xd4/0x128
	  idempotent_init_module+0x2cc/0x528
	  __arm64_sys_finit_module+0xac/0x100
	  invoke_syscall+0x6c/0x258
	  el0_svc_common.constprop.0+0x160/0x22c
	  do_el0_svc+0x44/0x5c
	  el0_svc+0x48/0xb8
	  el0t_64_sync_handler+0x13c/0x158
	  el0t_64_sync+0x190/0x194
	 Code: f2fbffe1 a90157f4 12000802 aa0003f5 (38e16861)
	 ---[ end trace 0000000000000000 ]---
	 Kernel panic - not syncing: Oops: Fatal exception

Fixes: 85d2b0a ("module: don't ignore sysfs_create_link() failures")
Signed-off-by: Jinjie Ruan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment