
[6.1] Track Steam performance patches #23

Closed
wants to merge 48 commits into from

Conversation

kakra
Owner

@kakra kakra commented Mar 11, 2023

Export patch series: https://github.com/kakra/linux/pull/23.patch

  • winesync: experimental winesync device driver which can be used by some Proton versions
  • hugepages background reclaim: patch cherry-picked from ZEN
  • threaded IRQs by default: cherry-picked from CK
  • always use the bfq IO scheduler by default: although it may benchmark lower in raw throughput, it almost always delivers more consistent desktop IO latency during high IO write loads
  • memory soft-dirty flag: used by Proton to support Windows memory write monitoring with better performance
  • ACS patch: for whoever may need it, patch may be dropped at any time
  • futex backward compatibility patch: properly supports older Proton versions using the latest futex kernel functions
  • lower latency scheduling: CFS patches cherry-picked and combined from TKG, Pop!_OS and PF
  • memory management: improved scheduling for huge memory pages and memory zones, cherry-picked from ZEN
  • readahead patches: IO readahead raised to 2 MB to match huge pages, cherry-picked from XANMOD
  • raised vm.max_map_count: as suggested by Valve (in Steam Deck) and TKG, cherry-picked from TKG

Many patches are enabled unconditionally; e.g., there's no config flag to toggle the ZEN patches as in their original patchset. Otherwise, this patchset would be pointless for me.
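For reference, the defaults this patchset changes can be inspected on a running kernel. A minimal sketch using the standard procfs/sysfs paths; on an unpatched kernel the values will differ from what the patchset sets:

```shell
#!/bin/sh
# vm.max_map_count: stock kernels default to 65530; the patchset raises it
cat /proc/sys/vm/max_map_count

# khugepaged defrag strategy: the patchset defaults this to defer+madvise
# (knob only exists when the kernel is built with THP support)
cat /sys/kernel/mm/transparent_hugepage/defrag 2>/dev/null || echo "THP not available"

# readahead per block device: the patchset raises this to 2 MB (2048 kB)
cat /sys/block/*/queue/read_ahead_kb 2>/dev/null | head -n 1
```

None of these reads require root, so they are safe to run before and after booting the patched kernel to confirm the new defaults took effect.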

@kakra kakra marked this pull request as draft March 11, 2023 16:19
@orbea

orbea commented Mar 12, 2023

@kakra This doesn't apply against a 6.1.18 kernel from kernel.org, am I missing something?

@kakra
Owner Author

kakra commented Mar 12, 2023

@orbea Maybe; my distribution doesn't have kernel 6.1.18 yet, so I didn't try. I'm on 6.1.17. I'll bump the base-6.1 branch when I see conflicts.

But I just checked: 6.1.18 is available since today, so stay tuned.

@kakra
Owner Author

kakra commented Mar 12, 2023

Yep, confirmed: the conflict is in the ACS patch because a new PCIe quirk has been added. Easy fix; the new patchset will be available once I've rebooted with the patched kernel.

@kakra
Owner Author

kakra commented Mar 12, 2023

@orbea Bumped to 6.1.18

@kakra
Owner Author

kakra commented May 20, 2023

New patches added for better scheduling, memory, and IO latency. This also improves compatibility with some demanding games like Detroit: Become Human by raising vm.max_map_count by default, similar to what Valve does on the Steam Deck.

Zebediah Figura added 23 commits October 11, 2023 21:53
Zebediah Figura and others added 25 commits October 11, 2023 21:53
Use [defer+madvise] as default khugepaged defrag strategy:

For some reason, the default strategy to respond to THP fault fallbacks
is still just madvise, meaning stall if the program wants transparent
hugepages, but don't trigger a background reclaim / compaction if THP
begins to fail allocations.  This creates a snowball effect where we
still use the THP code paths, but we almost always fail once a system
has been active and busy for a while.

The option "defer" was created for interactive systems where THP can
still improve performance.  If we have to fall back to a regular page due
to an allocation failure or anything else, we will trigger a background
reclaim and compaction so future THP attempts succeed and previous
attempts eventually have their smaller pages combined without stalling
running applications.

We still want madvise to stall applications that explicitly want THP,
so defer+madvise _does_ make a ton of sense.  Make it the default for
interactive systems, especially if the kernel maintainer left
transparent hugepages on "always".

Reasoning and details in the original patch: https://lwn.net/Articles/711248/

Signed-off-by: Kai Krakow <[email protected]>
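The strategy described above can also be tried at runtime without the patch, since the defrag mode is a writable sysfs knob on any THP-enabled kernel. A sketch, assuming the standard sysfs path; the write requires root and persists only until reboot:

```shell
#!/bin/sh
# Show the available defrag strategies; the active one is shown in brackets,
# e.g.: always defer defer+madvise [madvise] never
cat /sys/kernel/mm/transparent_hugepage/defrag

# Switch to defer+madvise at runtime (needs root); this is what the patch
# makes the compiled-in default
echo defer+madvise > /sys/kernel/mm/transparent_hugepage/defrag
```

The patch only changes the default, so distributions that already set this via a boot script get the same behavior either way.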
Also add ifdefs so that elevator_get_default() remains unchanged with
respect to upstream if CONFIG_IOSCHED_BFQ is disabled.

Signed-off-by: Juuso Alasuutari <[email protected]>
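As with the defrag strategy, the active IO scheduler can be inspected and switched per device at runtime. A sketch assuming a block device named sda (adjust the device name to your hardware); the write needs root:

```shell
#!/bin/sh
# The active scheduler is bracketed, e.g.: [bfq] mq-deadline kyber none
cat /sys/block/sda/queue/scheduler

# Switch to bfq at runtime (needs root); the patch makes bfq the
# compiled-in default so no such udev rule or boot script is needed
echo bfq > /sys/block/sda/queue/scheduler
```

This is handy for comparing latency under write load with and without bfq before committing to the patched default.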
This is an updated version of Alex Williamson's patch from:
https://lkml.org/lkml/2013/5/30/513

Original commit message follows:

PCIe ACS (Access Control Services) is the PCIe 2.0+ feature that
allows us to control whether transactions are allowed to be redirected
in various subnodes of a PCIe topology.  For instance, if two
endpoints are below a root port or downstream switch port, the
downstream port may optionally redirect transactions between the
devices, bypassing upstream devices.  The same can happen internally
on multifunction devices.  The transaction may never be visible to the
upstream devices.

One upstream device that we particularly care about is the IOMMU.  If
a redirection occurs in the topology below the IOMMU, then the IOMMU
cannot provide isolation between devices.  This is why the PCIe spec
encourages topologies to include ACS support.  Without it, we have to
assume peer-to-peer DMA within a hierarchy can bypass IOMMU isolation.

Unfortunately, far too many topologies lack ACS support to make this
a steadfast requirement.  Even the latest chipsets from Intel support
ACS only sporadically.  We have trouble getting interconnect
vendors to include the PCIe spec required PCIe capability, let alone
suggested features.

Therefore, we need to add some flexibility.  The pcie_acs_override=
boot option lets users opt-in specific devices or sets of devices to
assume ACS support.  The "downstream" option assumes full ACS support
on root ports and downstream switch ports.  The "multifunction"
option assumes the subset of ACS features available on multifunction
endpoints and upstream switch ports are supported.  The "id:nnnn:nnnn"
option enables ACS support on devices matching the provided vendor
and device IDs, allowing more strategic ACS overrides.  These options
may be combined in any order.  A maximum of 16 id specific overrides
are available.  It's suggested to use the most limited set of options
necessary to avoid completely disabling ACS across the topology.
Note to hardware vendors, we have facilities to permanently quirk
specific devices which enforce isolation but not provide an ACS
capability.  Please contact me to have your devices added and save
your customers the hassle of this boot option.

Signed-off-by: Mark Weiman <[email protected]>
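For illustration, the boot option described above goes on the kernel command line, typically via the bootloader config. A sketch for GRUB; the vendor:device IDs shown are hypothetical placeholders, and per the commit message the narrowest option that works should be preferred:

```shell
# /etc/default/grub -- append to the existing kernel command line,
# then regenerate the grub config (e.g. grub-mkconfig/update-grub)
GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_acs_override=downstream,multifunction"

# Or target specific devices by vendor:device ID (hypothetical IDs),
# up to 16 id: entries, combinable with the other options:
#   pcie_acs_override=id:8086:1234,id:10de:abcd
```

Remember that overriding ACS asserts isolation the hardware has not declared; it is a trade-off, not a fix.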
Add an option to wait on multiple futexes using the old interface, which
uses opcode 31 of the futex() syscall. Do that by simply translating the
old interface to the new code. This allows old and stable versions
of Proton to still use fsync in new kernel releases.

Signed-off-by: André Almeida <[email protected]>
…g delays

The page allocator processes free pages in groups of pageblocks, where
the size of a pageblock is typically quite large (1024 pages without
hugetlbpage support). Pageblocks are processed atomically with the zone
lock held, which can cause severe scheduling delays on both the CPU
going through the pageblock and any other CPUs waiting to acquire the
zone lock. A frequent offender is move_freepages_block(), which is used
by rmqueue() for page allocation.

As it turns out, there's no requirement for pageblocks to be so large,
so the pageblock order can simply be reduced to ease the scheduling
delays and zone lock contention. PAGE_ALLOC_COSTLY_ORDER is used as a
reasonable setting to ensure non-costly page allocation requests can
still be serviced without always needing to free up more than one
pageblock's worth of pages at a time.

This has a noticeable effect on overall system latency when memory
pressure is elevated. The various mm functions which operate on
pageblocks no longer appear in the preemptoff tracer, where previously
they would spend up to 100 ms on a mobile arm64 CPU processing a
pageblock with preemption disabled and the zone lock held.

Signed-off-by: Sultan Alsawaf <[email protected]>
There is noticeable scheduling latency and heavy zone lock contention
stemming from rmqueue_bulk's single hold of the zone lock while doing
its work, as seen with the preemptoff tracer. There's no actual need for
rmqueue_bulk() to hold the zone lock the entire time; it only does so
for supposed efficiency. As such, we can relax the zone lock and even
reschedule when IRQs are enabled in order to keep the scheduling delays
and zone lock contention at bay. Forward progress is still guaranteed,
as the zone lock can only be relaxed after page removal.

With this change, rmqueue_bulk() no longer appears as a serious offender
in the preemptoff tracer, and system latency is noticeably improved.

Signed-off-by: Sultan Alsawaf <[email protected]>
The value is still pretty low, and AMD64-ABI and ELF extended numbering
supports that, so we should be fine on modern x86 systems.

This fixes crashes in some applications using more than 65535 VMAs (this
also affects some Windows games running in Wine, such as Star Citizen).

Signed-off-by: Kai Krakow <[email protected]>
Some games such as Detroit: Become Human tend to be very crash prone with
lower values.

Signed-off-by: Kai Krakow <[email protected]>
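The limit the two commits above raise can be observed directly: when a process's mapping count reaches vm.max_map_count, further mmap() calls fail and mapping-hungry games crash. A minimal sketch using standard procfs:

```shell
#!/bin/sh
# Number of VMAs (memory mappings) currently used by this shell process
wc -l < /proc/self/maps

# System-wide per-process cap on mappings; a process crashes (mmap fails
# with ENOMEM) once it hits this many VMAs
cat /proc/sys/vm/max_map_count
```

Comparing the two numbers while a game such as Star Citizen or Detroit: Become Human runs shows how close the stock 65530 default gets to exhaustion.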
Tejun reported that when he targets workqueues towards a specific LLC
on his Zen2 machine with 3 cores / LLC and 4 LLCs in total, he gets
significant idle time.

This is, of course, because of how select_idle_sibling() will not
consider anything outside of the local LLC, and since all these tasks
are short running the periodic idle load balancer is ineffective.

And while it is good to keep work cache local, it is better to not
have significant idle time. Therefore, have select_idle_sibling() try
other LLCs inside the same node when the local one comes up empty.

Reported-by: Tejun Heo <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
@kakra kakra added the done To be superseded by next LTS label Nov 26, 2023
@kakra
Owner Author

kakra commented Nov 26, 2023

Rebased to #30

@kakra kakra closed this Nov 26, 2023