[5.15] Track Steam performance patches #16

Closed
wants to merge 45 commits into from


kakra
Owner

@kakra kakra commented Nov 13, 2021

@orbea

orbea commented Mar 29, 2022

@kakra Is there any information on usage for these patches? I enabled the winesync kernel module, but I am not sure what else needs to be done?

@kakra
Owner Author

kakra commented Mar 30, 2022

The winesync kernel module itself does nothing unless you also use the appropriate wine patches. There's very little activity around these patches; last time I looked, some commits still contained TODO and FIXME lines. It seems the branch has since been rebased. The patches should be here: https://repo.or.cz/wine/zf.git/shortlog/refs/heads/fastsync3

This is the commit that actually opens the winesync device: https://repo.or.cz/wine/zf.git/commitdiff/85481c0a11baabc529c252fd36e58ee9e626860d (search for /dev/winesync), just to confirm we are looking at the correct fastsync branch. The patches following it will be needed too, as well as the patch preceding it.

I don't think any of the major protonized wine distributions currently include these patches, so you'd need to rebase the patchset yourself. They'll conflict with esync and fsync, so you'd need to either remove those or integrate winesync properly. Those conflicts should be easy to resolve: just take care to enable winesync with the highest priority when it is detected, and rebuild the wineserver protocol headers after resolving the conflicts.

It looks like Tkg has limited support for it but it's not enabled by default: https://github.com/Frogging-Family/wine-tkg-git/tree/master/wine-tkg-git/wine-tkg-patches/misc/fastsync

So with the kernel module enabled, you may receive better support from Tkg on how to actually make use of the module in wine.

Since winesync is still in a very early (and quiet) stage, I'm not currently tracking updates to the kernel module, so it might be out of date. But I don't think there have been any critical updates to it. If you find evidence of a new kernel patch revision, let me know, and I'll happily update this patchset.

@orbea

orbea commented Mar 30, 2022

Thank you, that is useful information. The xanmod project also seems to have kernel patches for winesync, but I haven't compared the patches with what is here.

https://xanmod.org/

If none of the major proton wine builds are using these patches how does this PR relate to steam? Maybe there is something else I missed?

@kakra
Owner Author

kakra commented Mar 30, 2022

I just collect patches somehow related to my Steam installation here... It'll also improve non-Steam gaming probably. It's just a personal preference, and I had to give it a name. OTOH, we have to consider that these changes were mostly pushed by Valve activities (directly and indirectly) - so this gives "Steam" some credit for it. ;-)

@kakra
Owner Author

kakra commented Mar 30, 2022

I'm using sets of patches for different systems, this is one of the kernel patchsets I'm using for the system that has Steam installed - maybe take it that way... ;-)

@orbea

orbea commented Mar 30, 2022

That makes sense, thanks for explaining! I'm not sure how much time I will spend getting winesync to work right now, but I'll update here if I have any more information. :)

@kakra kakra force-pushed the rebase-5.15/steam-patches branch from d3b6690 to cde56e4 Compare June 2, 2022 07:11
kakra pushed a commit that referenced this pull request Aug 1, 2022
[ Upstream commit e4a41c2 ]

The following error is reported when running "./test_progs -t for_each"
under arm64:

  bpf_jit: multi-func JIT bug 58 != 56
  [...]
  JIT doesn't support bpf-to-bpf calls

The root cause is that the size of the BPF_PSEUDO_FUNC instruction
sequence increases from 2 to 3 instructions after the address of the
called bpf-function is settled, and there are two bpf-to-bpf calls in
test_pkt_access. The generated instructions are shown below:

  0x48:  21 00 C0 D2    movz x1, #0x1, lsl #32
  0x4c:  21 00 80 F2    movk x1, #0x1

  0x48:  E1 3F C0 92    movn x1, #0x1ff, lsl #32
  0x4c:  41 FE A2 F2    movk x1, #0x17f2, lsl #16
  0x50:  81 70 9F F2    movk x1, #0xfb84

Fix it by using emit_addr_mov_i64() for BPF_PSEUDO_FUNC, so that
the size of the jited image does not change.
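
The size change can be illustrated with a small userspace model. This is
only a sketch and not the kernel's emit_a64_mov_i64(): it assumes a
variable-length encoder may skip every 16-bit chunk that already equals
the fill value (zeros after movz, ones after movn), and it models the
fixed-length address encoder as a constant three instructions. The
function names are hypothetical; the two immediates are decoded from the
listings above.

```c
#include <stdint.h>

/* Simplified model of a variable-length immediate load: count the
 * 16-bit chunks that differ from the fill value on the movz path
 * (fill 0x0000) and on the movn path (fill 0xffff), then take the
 * cheaper of the two. */
static int demo_variable_mov_count(uint64_t val)
{
    int nonzero = 0, nonones = 0;

    for (int shift = 0; shift < 64; shift += 16) {
        uint16_t chunk = (uint16_t)(val >> shift);

        if (chunk != 0x0000)
            nonzero++;          /* chunk needs an instruction after movz */
        if (chunk != 0xffff)
            nonones++;          /* chunk needs an instruction after movn */
    }

    int best = nonzero < nonones ? nonzero : nonones;
    return best > 0 ? best : 1; /* at least one instruction is emitted */
}

/* Assumed fixed-length model: one movn plus two movk, whatever the
 * address turns out to be. */
static int demo_fixed_addr_mov_count(uint64_t addr)
{
    (void)addr;
    return 3;
}
```

Under this model the placeholder value 0x0000000100000001 costs two
instructions and the settled address 0xfffffe0017f2fb84 costs three,
while the fixed-length count is identical for both passes.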

Fixes: 69c087b ("bpf: Add bpf_for_each_map_elem() helper")
Signed-off-by: Hou Tao <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
@kakra kakra force-pushed the rebase-5.15/steam-patches branch from cde56e4 to 4d58cd7 Compare August 1, 2022 22:03
@kakra kakra added the done To be superseded by next LTS label Dec 27, 2022
andrealmeid and others added 18 commits January 10, 2023 10:24
Add support to wait on multiple futexes. This is the interface
implemented by this syscall:

futex_waitv(struct futex_waitv *waiters, unsigned int nr_futexes,
	    unsigned int flags, struct timespec *timeout, clockid_t clockid)

struct futex_waitv {
	__u64 val;
	__u64 uaddr;
	__u32 flags;
	__u32 __reserved;
};

Given an array of struct futex_waitv, wait on each uaddr. The thread
wakes if a futex_wake() is performed at any uaddr. The syscall returns
immediately if any waiter has *uaddr != val. *timeout is an optional
absolute timeout value for the operation. This syscall supports only
64bit sized timeout structs. The flags argument of the syscall should be
empty, but it can be used for future extensions. Flags for shared
futexes, sizes, etc. should be used on the individual flags of each
waiter.

__reserved is used for explicit padding and should be 0, but it might be
used for future extensions. If the userspace uses 32-bit pointers, it
should make sure to explicitly cast it when assigning to waitv::uaddr.

Returns the array index of one of the woken futexes. No information is
given about how many futexes were woken, or about any particular
attribute of the returned one (whether it was the first woken, whether
it has the smallest index...).
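
A minimal userspace sketch of calling this interface through syscall(2)
might look as follows. Several things here are assumptions rather than
part of the patch: the syscall number 449 (used only if the libc headers
do not define SYS_futex_waitv), the local struct and DEMO_FUTEX_32
constant that stand in for the uapi definitions, and the helper names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#ifndef SYS_futex_waitv
#define SYS_futex_waitv 449     /* assumption: mainline syscall number */
#endif

#define DEMO_FUTEX_32 2u        /* local stand-in for the 32-bit size flag */

/* Local mirror of the struct layout described above. */
struct demo_waitv {
    uint64_t val;
    uint64_t uaddr;
    uint32_t flags;
    uint32_t reserved;          /* explicit padding, must be 0 */
};

/* Thin wrapper: the syscall-level flags argument is passed as 0. */
static long demo_futex_waitv(struct demo_waitv *waiters, unsigned int nr,
                             struct timespec *timeout, clockid_t clockid)
{
    return syscall(SYS_futex_waitv, waiters, nr, 0u, timeout, clockid);
}

/* Wait on two 32-bit futex words without a timeout. Returns the index
 * of a woken futex, or -1 with errno set. */
static long demo_wait_two(uint32_t *a, uint32_t expect_a,
                          uint32_t *b, uint32_t expect_b)
{
    struct demo_waitv w[2] = {
        /* Cast through uintptr_t so 32-bit pointers widen cleanly. */
        { .val = expect_a, .uaddr = (uintptr_t)a,
          .flags = DEMO_FUTEX_32, .reserved = 0 },
        { .val = expect_b, .uaddr = (uintptr_t)b,
          .flags = DEMO_FUTEX_32, .reserved = 0 },
    };

    return demo_futex_waitv(w, 2, NULL, CLOCK_MONOTONIC);
}
```

If one of the words already differs from its expected value, the call is
expected to fail immediately instead of blocking; on kernels without the
syscall it fails with ENOSYS.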

Signed-off-by: André Almeida <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Wire up syscall entry point for x86 arch, for both i386 and x86_64.

Signed-off-by: André Almeida <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Zebediah Figura and others added 27 commits January 10, 2023 10:24
Currently, the table that stores information about the connected hidraw
devices has a mutex that prevents concurrent hidraw users from
manipulating the table (e.g. deleting an entry) while someone else is
using it (e.g. issuing an ioctl to the device), which keeps the kernel
from dereferencing a NULL pointer. However, since every user that
accesses the table must take this mutex, whether to manipulate the table
or only to read its content, it also prevents concurrent read-only
access for different devices or for the same device (e.g. two hidraw
ioctls can't happen at the same time, even if they are completely
unrelated).

This proves to be a bottleneck and causes performance issues when using
multiple HID devices at the same time, like VR kits where one can have
two controllers, the headset and some tracking sensors.

To improve the performance, replace the table mutex with a read-write
semaphore, enabling multiple threads to issue parallel syscalls to
multiple devices at the same time while protecting the table against
concurrent modifications.

Signed-off-by: André Almeida <[email protected]>
Use [defer+madvise] as default khugepaged defrag strategy:

For some reason, the default strategy to respond to THP fault fallbacks
is still just madvise, meaning stall if the program wants transparent
hugepages, but don't trigger a background reclaim / compaction if THP
begins to fail allocations.  This creates a snowball effect where we
still use the THP code paths, but we almost always fail once a system
has been active and busy for a while.

The option "defer" was created for interactive systems where THP can
still improve performance.  If we have to fall back to a regular page due
to an allocation failure or anything else, we will trigger a background
reclaim and compaction so future THP attempts succeed and previous
attempts eventually have their smaller pages combined without stalling
running applications.

We still want madvise to stall applications that explicitly want THP,
so defer+madvise _does_ make a ton of sense.  Make it the default for
interactive systems, especially if the kernel maintainer left
transparent hugepages on "always".

Reasoning and details in the original patch: https://lwn.net/Articles/711248/
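
To check which strategy is active on a running system, one can read
/sys/kernel/mm/transparent_hugepage/defrag, where the kernel marks the
selected option with square brackets. The helper below is a small
sketch with a hypothetical name; it only parses a line of that form and
leaves reading the file to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Copy the bracketed (active) option from a sysfs-style line such as
 * "always defer [defer+madvise] madvise never" into out.
 * Returns 0 on success, -1 if there is no bracketed option or if it
 * does not fit into the buffer. */
static int demo_active_option(const char *line, char *out, size_t outlen)
{
    const char *open = strchr(line, '[');
    const char *close = open ? strchr(open, ']') : NULL;

    if (!open || !close)
        return -1;

    size_t len = (size_t)(close - open - 1);
    if (len + 1 > outlen)
        return -1;              /* no room for the terminating NUL */

    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}
```

On a kernel carrying this patch, the helper should report defer+madvise
unless the setting was changed at runtime.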

Signed-off-by: Kai Krakow <[email protected]>
Also add ifdefs so that elevator_get_default() remains unchanged with
respect to upstream if CONFIG_IOSCHED_BFQ is disabled.

Signed-off-by: Juuso Alasuutari <[email protected]>
@kakra kakra closed this Mar 11, 2023
kakra pushed a commit that referenced this pull request Sep 9, 2024
[ Upstream commit a699781 ]

A sysfs reader can race with a device reset or removal, attempting to
read device state when the device is not actually present, e.g.:

     [exception RIP: qed_get_current_link+17]
  #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
  #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
 #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
 #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
 #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb

 crash> struct net_device.state ffff9a9d21336000
    state = 5,

state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100).
The device is not present, note lack of __LINK_STATE_PRESENT (0b10).
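
The decoding of that state value can be reproduced with a few lines of
C. The bit positions are taken from the values quoted above; the DEMO_
names and the helper are hypothetical stand-ins, not the kernel's own
definitions.

```c
#include <stdbool.h>

/* Bit positions as quoted above: START is 0b1, PRESENT is 0b10,
 * NOCARRIER is 0b100. */
enum {
    DEMO_LINK_STATE_START = 0,
    DEMO_LINK_STATE_PRESENT = 1,
    DEMO_LINK_STATE_NOCARRIER = 2,
};

/* Test a single bit of a net_device.state style bitmask. */
static bool demo_state_has(unsigned long state, int bit)
{
    return (state >> bit) & 1UL;
}
```

For state = 5 this reports START and NOCARRIER as set and PRESENT as
clear, matching the crash output.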

This is the same sort of panic as observed in commit 4224cfd
("net-sysfs: add check for netdevice being present to speed_show").

There are many other callers of __ethtool_get_link_ksettings() which
don't have a device presence check.

Move this check into ethtool to protect all callers.

Fixes: d519e17 ("net: export device speed and duplex via sysfs")
Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show")
Signed-off-by: Jamie Bainbridge <[email protected]>
Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Labels
done To be superseded by next LTS
8 participants