proton: Force disable AMD switchable graphics layer #4931


misyltoad
Contributor

This crap layer is still horribly broken and enabled by default.

Signed-off-by: Joshua Ashton <[email protected]>

rbernon and others added 12 commits June 25, 2021 09:32
Add https://github.com/jp7677/dxvk-nvapi as a submodule. dxvk-nvapi will
not be copied into Proton prefixes by default, but instead will be
controlled via the environment variable PROTON_ENABLE_NVAPI. This is
done to avoid any potential adverse effects of the nvapi DLL existing
in cases where an application may require a function that is not
implemented by dxvk-nvapi.

This new functionality can be enabled by setting the following environment
variable to a value of `1`:
    `PROTON_ENABLE_NVAPI`

This functionality is needed in order to support DLSS within Proton.

Reviewed-by: Adam Moss <[email protected]>
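
The opt-in described in the commit message above could be sketched roughly as follows. This is only an illustration: `nvapi_enabled` and `dlls_to_install` are hypothetical helpers, not functions from the Proton script, and the DLL lists are assumptions chosen for the example.

```python
import os


def nvapi_enabled(env=None):
    """Return True only when PROTON_ENABLE_NVAPI is set to the value "1"."""
    env = os.environ if env is None else env
    return env.get("PROTON_ENABLE_NVAPI") == "1"


def dlls_to_install(env=None):
    """Build the list of DLLs to copy into the prefix (illustrative names only)."""
    # Assumed default set for the example; the real list lives in the Proton script.
    dlls = ["d3d11.dll", "dxgi.dll"]
    # The nvapi DLL is only added when the user explicitly opts in.
    if nvapi_enabled(env):
        dlls.append("nvapi64.dll")
    return dlls
```

Leaving the DLL out unless the variable is set matches the stated goal: applications that need an nvapi function dxvk-nvapi does not implement never see the DLL by default.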
The upcoming NVIDIA 470 driver series will introduce a DLL (nvngx.dll)
for the support of NVIDIA DLSS in Proton. This change adds logic for
discovering the location of DLL files provided by the NVIDIA driver, and
copies them to C:\Windows\System32\

Reviewed-by: Adam Moss <[email protected]>
@misyltoad misyltoad force-pushed the disable-stupid-amd-layer branch from 3f7872f to 0674071 Compare June 30, 2021 02:14
This crap layer is still horribly broken and enabled by default.

Signed-off-by: Joshua Ashton <[email protected]>
@misyltoad misyltoad force-pushed the disable-stupid-amd-layer branch from 0674071 to 1d06d84 Compare June 30, 2021 02:15
@aeikum
Collaborator

aeikum commented Jul 6, 2021

I'm missing some context here. What is this layer and what problems is it causing? Does it affect non-Proton applications? If so, why is disabling it in Proton the best option instead of wherever it's coming from?

@misyltoad
Contributor Author

@aeikum This layer comes from the AMDVLK driver. It removes RADV from being enumerated entirely, which is problematic in itself, but it also tends to break applications outright, e.g. it makes vulkaninfo crash, or apps fail to launch or crash on startup. It's broken and has been the source of several bug reports in DXVK/Proton.

Ideally something like this would be disabled by the Steam Client, but Proton is the best option for now.

I was bitten by this recently when trying to test something on AMDVLK. The worst part is that when the layer is active, it doesn't just remove RADV; it tends to break things entirely and stop anything from launching.

@aeikum
Collaborator

aeikum commented Jul 30, 2021

Has someone reported the problem to them? I found this: GPUOpen-Drivers/AMDVLK#196 but couldn't find anything else relevant in their tracker.

@aeikum
Collaborator

aeikum commented Jul 30, 2021

And this: GPUOpen-Drivers/AMDVLK#195. They seem to think it should be working now; someone should let them know it isn't.

@jinjianrong

@Joshua-Ashton Are you still seeing the issue with the AMD switchable graphics layer on the latest AMDVLK release? Could you provide a test case to reproduce it? If you don't want to use the layer, it's better to disable it in your AMDVLK installation instead of doing this in Proton, which will cause other problems when both AMDVLK and RADV are installed.

@misyltoad
Contributor Author

misyltoad commented Aug 6, 2021

@jinjianrong Installing the latest AMDVLK (amdvlk-2021.Q3.2-1) still breaks vulkaninfo and every app for me.

The selected gpu (0) is not a valid GPU index. The available GPUs are in the range of 0 to 18446744073709551615.

then it crashes.
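
One plausible reading of the range in that message is that the tool saw zero usable GPUs and computed the upper bound as `count - 1` in an unsigned 64-bit type. That is an assumption about the tool, not something confirmed from its source, but the arithmetic itself is easy to check:

```python
UINT64_MAX = 2**64 - 1


def max_valid_index(gpu_count):
    """Mimic `gpu_count - 1` evaluated in unsigned 64-bit arithmetic."""
    return (gpu_count - 1) & UINT64_MAX


# With zero GPUs the subtraction wraps around to the largest 64-bit value.
print(max_valid_index(0))  # 18446744073709551615
```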

I tested the Mesa device selection layer for fun and it works fine with vulkaninfo.

Disabling this layer by default is preferable because it's still broken and doesn't need to exist at all. Quite often, users accidentally install AMDVLK, or have it pulled in by an upgrade after testing it earlier, and automatically get this layer, which forces them to magically switch to AMDVLK.

I get bug reports and messages asking why stuff crashes on startup because of the layer, from users who have RADV installed and are confused about why it isn't being used, etc.

We desperately need something to solve the device selection and device ordering problems in the Vulkan ecosystem, but your layer is making this problem so much worse.

@jinjianrong

@Joshua-Ashton Could you provide your system information? The layer works well in our internal testing. We need to check whether anything special is required to reproduce the issue so that we can get the layer working properly.
If "users will accidentally install AMDVLK" on a system with RADV installed, I expect Vulkan will be broken even without the AMD switchable graphics layer.

@misyltoad
Contributor Author

AFAIK, there is nothing special about my system or setup. I even removed all my ICDs in /usr/local, as well as any unused ICDs, just to be sure, and it is still broken.

Here's my vulkaninfo output without AMDVLK, plus system info.

vulkaninfo_no_amdvlk.txt

System:    Host: lilypad Kernel: 5.13.8-187-tkg-cacule x86_64 bits: 64 compiler: gcc v: 11.1.0 Desktop: KDE Plasma 5.22.80
           tk: Qt 5.15.2 wm: kwin_wayland dm: SDDM Distro: Arch Linux
CPU:       Info: 12-Core model: AMD Ryzen 9 3900X bits: 64 type: MT MCP arch: Zen 2 rev: 0 cache: L2: 6 MiB
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 182044
           Speed: 3604 MHz min/max: 2200/3800 MHz boost: enabled Core speeds (MHz): 1: 3604 2: 3572 3: 3520 4: 4191 5: 4035
           6: 3567 7: 3590 8: 3479 9: 4033 10: 4041 11: 4070 12: 3601 13: 3583 14: 3473 15: 3446 16: 3587 17: 3592 18: 3595
           19: 3443 20: 4102 21: 4012 22: 3539 23: 3591 24: 3595
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Vega 20 [Radeon VII] driver: amdgpu v: kernel bus-ID: 0b:00.0
           chip-ID: 1002:66af
           Device-2: NVIDIA TU106 [GeForce RTX 2060 SUPER] driver: vfio-pci v: 0.2 bus-ID: 0c:00.0 chip-ID: 10de:1f06
           Display: wayland server: X.Org 1.21.1.2 compositor: kwin_wayland driver: loaded: amdgpu unloaded: modesetting
           alternate: ati,fbdev,vesa resolution: 1: 3840x2160~60Hz 2: 2560x1440~60Hz s-dpi: 96
           OpenGL: renderer: AMD Radeon VII (VEGA20 DRM 3.41.0 5.13.8-187-tkg-cacule LLVM 12.0.1)
           v: 4.6 Mesa 21.3.0-devel (git-7055282231) direct render: Yes

@misyltoad
Contributor Author

Same happens on my laptop.

@misyltoad
Contributor Author

extra/vulkan-icd-loader 1.2.185-1 (109.4 KiB 393.5 KiB) (Installed)

@misyltoad
Contributor Author

I have decided to just fix it myself in GPUOpen-Drivers/xgl#125, which should at least stop people from filing issues and emailing me about "why won't X start, DXVK is broken".

@jinjianrong

@Joshua-Ashton thanks for the fix!

@misyltoad
Contributor Author

Np, it would be nice to have CTS to test this to avoid it in future with loader integration.

@soararing

@Joshua-Ashton, we tried to reproduce the issue today, but we can't reproduce the crash on an A+N (AMD + NVIDIA) config. vulkaninfo works well locally. Both the AMD and NV drivers are the latest public releases, and we also tried the latest Vulkan loader, 1.2.187; still no crash.

I believe something is different in your test environment. Do you enable many layers when running Vulkan apps?

@soararing

I see there are 15 Vulkan layers enabled in your test environment. We will try enabling some of them to reproduce the issue.

@misyltoad
Contributor Author

misyltoad commented Aug 9, 2021

No, my layers don't matter. They're all explicit layers.

As I stated in my commit description that I mentioned on the PR when you asked,
the problem occurs if RADV is naturally ordered above AMDVLK.
This is because you are passing the caller's physicalDeviceCount down the chain.

So you're doing:

First call, pPhysicalDevices = NULL (count query). The chain below you has:

  • RADV
  • Some other driver
  • AMDVLK
    -> you return a count of 2 to the user because you filter RADV

Second call, pPhysicalDeviceCount = 2, and you pass that to the next layer, so you only get back:

  • RADV
  • Some other driver
    Then you filter out RADV from that incomplete list, which leaves the user with one valid device and one garbage/uninitialized physical device

As stated in my commit, you MUST query ALL physical devices and THEN filter. You can't pass this down the chain.
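
The ordering problem described above can be modelled in a few lines. This is a toy simulation, not the layer's real code: the device names come from the comment, while `next_in_chain`, `buggy_layer` and `fixed_layer` are hypothetical names, and an unwritten output slot is modelled as `None`.

```python
# Natural enumeration order from the comment: RADV sorts above AMDVLK.
DEVICES = ["RADV", "Some other driver", "AMDVLK"]


def next_in_chain(capacity):
    """Stand-in for the call down the chain: returns at most `capacity` devices."""
    return DEVICES[:capacity]


def filtered(devices):
    """The layer's filtering step: hide RADV from the application."""
    return [d for d in devices if d != "RADV"]


def buggy_layer(capacity):
    # Passes the caller's capacity straight down, then filters the short list.
    got = filtered(next_in_chain(capacity))
    # Slots the caller expects but that never get written are modelled as None.
    return got + [None] * (capacity - len(got))


def fixed_layer(capacity):
    # Queries ALL devices first, filters, and only then truncates for the caller.
    return filtered(next_in_chain(len(DEVICES)))[:capacity]


# The count query reports 2 devices, so the application asks for 2.
reported_count = len(filtered(DEVICES))
print(buggy_layer(reported_count))  # ['Some other driver', None]
print(fixed_layer(reported_count))  # ['Some other driver', 'AMDVLK']
```

In the buggy variant the second slot is never filled, which corresponds to the garbage physical device handed back to the application.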

@TimisRobert

Hello, I'm having much the same issue: if I don't set DISABLE_LAYER_AMD_SWITCHABLE_GRAPHICS_1=1, no Vulkan application will work.
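
For anyone launching applications from a script, the workaround above amounts to setting that variable in the child process environment. A minimal sketch, assuming a hypothetical helper named `env_with_layer_disabled` (this is not the patch from this PR):

```python
import os

LAYER_DISABLE_VAR = "DISABLE_LAYER_AMD_SWITCHABLE_GRAPHICS_1"


def env_with_layer_disabled(base_env=None):
    """Return a copy of the environment with the layer's disable variable set to "1"."""
    env = dict(os.environ if base_env is None else base_env)
    # Overrides any existing value, i.e. the layer is force-disabled.
    env[LAYER_DISABLE_VAR] = "1"
    return env
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when starting a Vulkan application.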

@jinjianrong

Please try the new amdvlk release https://github.com/GPUOpen-Drivers/AMDVLK/releases/tag/v-2021.Q3.4 which includes a hotfix from @Joshua-Ashton

@TimisRobert

Hello, I currently have amdvlk 2021.Q3.4-1 installed, problem still persists.

@kisak-valve
Member

Hello @Mistooo, I think it would make sense to open a new issue report over at https://github.com/GPUOpen-Drivers/AMDVLK so that the video driver devs can track your issue properly.

@misyltoad
Contributor Author

Out of curiosity, what GPU are you using?

@TimisRobert

I'm using a Radeon RX 580. I don't mind setting the env variable, but since I spent a fair bit of time trying to make it work maybe someone else will find it useful.

@aeikum
Collaborator

aeikum commented Aug 23, 2021

Does it make sense to close this?

@aeikum aeikum force-pushed the experimental_6.3 branch 2 times, most recently from 4586e99 to 790ec2c Compare August 23, 2021 20:29
@aeikum
Collaborator

aeikum commented Sep 14, 2021

Closing. Please file new issues in AMD's tracker.

@aeikum aeikum closed this Sep 14, 2021