GW2 w/ Futex2 on 5.13.1: CoherentUIHost going nuts on CPU usage (~11Cores) #217

Closed
Atemu opened this issue Jul 15, 2021 · 7 comments
@Atemu

Atemu commented Jul 15, 2021

I don't have the knowledge or ability to debug this, but it did not happen on 5.12, nor does it happen on 5.13.1 when using esync instead.

This makes the game nearly unplayable (low 20s FPS) and loads the whole CPU for no good reason.
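
For anyone wanting to put a number on a report like "~11 cores", here is a minimal sketch of one way to do it on Linux: sample the process's cumulative CPU ticks from `/proc/<pid>/stat` twice and divide the difference by the elapsed time. The function names are hypothetical, and this is not necessarily how the figure in this report was measured.

```python
import os
import time


def parse_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The comm field (2nd) may contain spaces or parentheses, so split
    # after the LAST ')' and index the remaining fields from there.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime is field 14, stime is field 15.
    return int(rest[11]) + int(rest[12])


def cores_used(ticks_before: int, ticks_after: int,
               elapsed_s: float, clk_tck: int = 100) -> float:
    """Average number of fully busy cores over the sampling interval."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (ticks_after - ticks_before) / clk_tck / elapsed_s


def sample_cores(pid: int, interval_s: float = 1.0) -> float:
    """Sample a live process twice (Linux only) and return cores used."""
    clk_tck = os.sysconf("SC_CLK_TCK")  # ticks per second on this system
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = parse_cpu_ticks(f.read())
    start = time.monotonic()
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_cpu_ticks(f.read())
    return cores_used(before, after, time.monotonic() - start, clk_tck)
```

Calling `sample_cores(pid)` with the PID of the CoherentUIHost process would return a value near 11 if the process really keeps about eleven cores busy. Comparing that value between esync and futex2 runs gives a reproducible data point for the report.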

Atemu changed the title from "GW2 w/ Futex2: CoherentUIHost going nuts on CPU usage (~11Cores)" to "GW2 w/ Futex2 on 5.13.1: CoherentUIHost going nuts on CPU usage (~11Cores)" on Jul 15, 2021
@damentz
Member

damentz commented Jul 17, 2021

@Atemu this should be fixed now in 5.13/master. We'll need to wait for @heftig to spin another build of linux-zen.

Here are the details of what I did. I'm not entirely sure what was in 5.13/futex2 before, but I believe there were porting errors and missing patches. To fix it, I took the v4 patch set [1] (based on some 5.13 release candidate) and added some Proton and basic fixes on top. I did a basic build test and everything's hunky-dory.

[1] http://patchwork.sourceware.org/project/glibc/cover/[email protected]/

@Atemu
Author

Atemu commented Jul 18, 2021

Hi @damentz, thanks for looking into this. Unfortunately, it's still happening on 18cfd38.

If you want to reproduce this for yourself, GW2's launcher also exhibits this behaviour and can easily be installed via Lutris without downloading the whole 50G game.

@damentz
Member

damentz commented Jul 19, 2021

@Atemu thanks for giving the updated master branch a shot. I sent André an email asking if he could provide a port of his futex2-proton working branch for v5.13 and pointed him to the problems you found in this issue. Hopefully we can come out of this with a better futex2 than what we have on 5.12 🤞

damentz added a commit that referenced this issue Jul 20, 2021
The v4 patch set doesn't work well in 5.13 per GitHub issue [1].  André
responded and recommends using the patch from linux-tkg [2].  Even
though the v4 patch set should be better, it's an in-progress patch set,
and he recommends falling back to an older version for now if the
current one doesn't work out.

[1] #217 (comment)
[2] https://github.com/Frogging-Family/linux-tkg/blob/master/linux-tkg-patches/5.13/0007-v5.13-futex2_interface.patch

This reverts commit 67c07b3, reversing
changes made to f1a8de1.
damentz added a commit that referenced this issue Jul 20, 2021
@damentz
Member

damentz commented Jul 20, 2021

@Atemu can you try 5.13/master again? André recommended the patch by the linux-tkg folks at https://github.com/Frogging-Family/linux-tkg/blob/master/linux-tkg-patches/5.13/0007-v5.13-futex2_interface.patch.

André pointed out that even though it's based on a much older futex2 patch set, futex2 is a work in progress, so regressions are to be expected as new drafts come out. TL;DR: this should work now 🤞

@damentz
Member

damentz commented Jul 22, 2021

@Atemu Does the latest Zen Kernel resolve the CPU usage issues with GW2?

@damentz
Member

damentz commented Jul 24, 2021

I'll mark this issue as resolved. Zen Kernel is now using the same patch set as TKG and XanMod, so if there were a problem with this patch, it would affect all of those kernels. Let me know if the issue persists.

damentz closed this as completed on Jul 24, 2021
@Atemu
Author

Atemu commented Jul 26, 2021

Yup, it's fixed now. Thanks!

damentz added a commit that referenced this issue Aug 7, 2021
Turns out the reason this version of futex2 resolved #217 is that it
doesn't actually work.  Launching a game through proton shows that esync
is used instead of fsync.

Issue #228 was opened later showing this exact issue, and I reproduced
it on my local system.  Will try using Andre's latest patch set to LKML
from August 5th, 2021.

This reverts commit 1efbf88, reversing
changes made to 3136f92.
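
The check described in the commit message above (whether a Proton run actually uses fsync or silently falls back to esync) can be sketched as a small log scan. This is only an illustration under stated assumptions: the marker strings `fsync: up and running` and `esync: up and running` are what esync/fsync-patched Wine builds are commonly reported to print, and the function name is hypothetical. Verify the exact wording against your own log (for example one produced with `PROTON_LOG=1`) before relying on it.

```python
import re

# ASSUMPTION: these marker strings match what the esync/fsync-patched
# Wine in your Proton build prints at startup. Adjust if yours differ.
_MARKERS = {
    "fsync": re.compile(r"\bfsync: up and running"),
    "esync": re.compile(r"\besync: up and running"),
}


def detect_sync_backend(log_text: str) -> str:
    """Return "fsync", "esync", or "unknown" for the given log text."""
    # fsync is checked first, so it wins if both markers appear.
    for name, pattern in _MARKERS.items():
        if pattern.search(log_text):
            return name
    return "unknown"
```

If a kernel that is supposed to provide futex2 yields `"esync"` here, the futex2 interface is probably not being picked up, which matches the fallback described in the commit message.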
damentz pushed a commit that referenced this issue Nov 26, 2022
…kprobe_event_gen_test_exit()

commit e0d7526 upstream.

When trace_get_event_file() fails, gen_kretprobe_test is assigned the
error code. If the kprobe_event_gen_test module is then removed, a
null pointer dereference happens in kprobe_event_gen_test_exit().
Check whether gen_kprobe_test or gen_kretprobe_test is an error code or
NULL before dereferencing them.

BUG: kernel NULL pointer dereference, address: 0000000000000012
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 3 PID: 2210 Comm: modprobe Not tainted
6.1.0-rc1-00171-g2159299a3b74-dirty #217
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
RIP: 0010:kprobe_event_gen_test_exit+0x1c/0xb5 [kprobe_event_gen_test]
Code: Unable to access opcode bytes at 0xffffffff9ffffff2.
RSP: 0018:ffffc900015bfeb8 EFLAGS: 00010246
RAX: ffffffffffffffea RBX: ffffffffa0002080 RCX: 0000000000000000
RDX: ffffffffa0001054 RSI: ffffffffa0001064 RDI: ffffffffdfc6349c
RBP: ffffffffa0000000 R08: 0000000000000004 R09: 00000000001e95c0
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000800
R13: ffffffffa0002420 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f56b75be540(0000) GS:ffff88813bc00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff9ffffff2 CR3: 000000010874a006 CR4: 0000000000330ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 __x64_sys_delete_module+0x206/0x380
 ? lockdep_hardirqs_on_prepare+0xd8/0x190
 ? syscall_enter_from_user_mode+0x1c/0x50
 do_syscall_64+0x3f/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Link: https://lore.kernel.org/all/[email protected]/

Fixes: 6483624 ("tracing: Add kprobe event command generation test module")
Signed-off-by: Shang XiaoJing <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Cc: [email protected]
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
damentz pushed a commit that referenced this issue Dec 2, 2022
commit e428e96 upstream.

If a kernel thread is created by a user thread, it may carry FPU/SIMD
thread info flags (TIF_USEDFPU, TIF_USEDSIMD, etc.). It will then be
treated as an FPU owner, and the kernel will try to save its FPU/SIMD
context, causing errors such as:

[   41.518931] do_fpu invoked from kernel context![#1]:
[   41.523933] CPU: 1 PID: 395 Comm: iou-wrk-394 Not tainted 6.1.0-rc5+ #217
[   41.530757] Hardware name: Loongson Loongson-3A5000-7A1000-1w-CRB/Loongson-LS3A5000-7A1000-1w-CRB, BIOS vUDK2018-LoongArch-V2.0.pre-beta8 08/18/2022
[   41.544064] $ 0   : 0000000000000000 90000000011e9468 9000000106c7c000 9000000106c7fcf0
[   41.552101] $ 4   : 9000000106305d40 9000000106689800 9000000106c7fd08 0000000003995818
[   41.560138] $ 8   : 0000000000000001 90000000009a72e4 0000000000000020 fffffffffffffffc
[   41.568174] $12   : 0000000000000000 0000000000000000 0000000000000020 00000009aab7e130
[   41.576211] $16   : 00000000000001ff 0000000000000407 0000000000000001 0000000000000000
[   41.584247] $20   : 0000000000000000 0000000000000001 9000000106c7fd70 90000001002f0400
[   41.592284] $24   : 0000000000000000 900000000178f740 90000000011e9834 90000001063057c0
[   41.600320] $28   : 0000000000000000 0000000000000001 9000000006826b40 9000000106305140
[   41.608356] era   : 9000000000228848 _save_fp+0x0/0xd8
[   41.613542] ra    : 90000000011e9468 __schedule+0x568/0x8d0
[   41.619160] CSR crmd: 000000b0
[   41.619163] CSR prmd: 00000000
[   41.622359] CSR euen: 00000000
[   41.625558] CSR ecfg: 00071c1c
[   41.628756] CSR estat: 000f0000
[   41.635239] ExcCode : f (SubCode 0)
[   41.638783] PrId  : 0014c010 (Loongson-64bit)
[   41.643191] Modules linked in: acpi_ipmi vfat fat ipmi_si ipmi_devintf cfg80211 ipmi_msghandler rfkill fuse efivarfs
[   41.653734] Process iou-wrk-394 (pid: 395, threadinfo=0000000004ebe913, task=00000000636fa1be)
[   41.662375] Stack : 00000000ffff0875 9000000006800ec0 9000000006800ec0 90000000002d57e0
[   41.670412]         0000000000000001 0000000000000000 9000000106535880 0000000000000001
[   41.678450]         9000000105291800 0000000000000000 9000000105291838 900000000178e000
[   41.686487]         9000000106c7fd90 9000000106305140 0000000000000001 90000000011e9834
[   41.694523]         00000000ffff0875 90000000011f034c 9000000105291838 9000000105291830
[   41.702561]         0000000000000000 9000000006801440 00000000ffff0875 90000000002d48c0
[   41.710597]         9000000128800001 9000000106305140 9000000105291838 9000000105291838
[   41.718634]         9000000105291830 9000000107811740 9000000105291848 90000000009bf1e0
[   41.726672]         9000000105291830 9000000107811748 2d6b72772d756f69 0000000000343933
[   41.734708]         0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   41.742745]         ...
[   41.745252] Call Trace:
[   42.197868] [<9000000000228848>] _save_fp+0x0/0xd8
[   42.205214] [<90000000011ed468>] __schedule+0x568/0x8d0
[   42.210485] [<90000000011ed834>] schedule+0x64/0xd4
[   42.215411] [<90000000011f434c>] schedule_timeout+0x88/0x188
[   42.221115] [<90000000009c36d0>] io_wqe_worker+0x184/0x350
[   42.226645] [<9000000000221cf0>] ret_from_kernel_thread+0xc/0x9c

This can be easily triggered by ltp testcase syscalls/io_uring02 and it
can also be easily fixed by clearing the FPU/SIMD thread info flags for
kernel threads in copy_thread().

Cc: [email protected]
Reported-by: Qi Hu <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>