NVIDIA GPU is not supported since 4.8 #810

Open
Thaodan opened this issue Oct 18, 2016 · 79 comments

Comments

@Thaodan

Thaodan commented Oct 18, 2016

After updating to Linux 4.8, the nvidia driver reports that the GPU isn't supported when I try to access it with primus:

[Okt18 12:54] bbswitch: enabling discrete graphics
[  +0,926684] nvidia: module license 'NVIDIA' taints kernel.
[  +0,000001] Disabling lock debugging due to kernel taint
[  +0,529426] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=none:owns=none
[  +0,000037] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:13b0)
              NVRM: installed in this system is not supported by the 370.28
              NVRM: NVIDIA Linux driver release.  Please see 'Appendix
              NVRM: A - Supported NVIDIA GPU Products' in this release's
              NVRM: README, available on the Linux driver download page
              NVRM: at www.nvidia.com.
[  +0,000014] nvidia: probe of 0000:01:00.0 failed with error -1
[  +0,000053] nvidia-nvlink: Nvlink Core is being initialized, major device number 241
[  +0,000033] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  +0,000002] NVRM: None of the NVIDIA graphics adapters were initialized!
[  +0,000002] nvidia-nvlink: Unregistered the Nvlink Core, major device number 241

uname:
uname -a Linux hellion 4.8.2-pf #1 SMP PREEMPT Tue Oct 18 10:19:55 CEST 2016 x86_64 GNU/Linux

@bluca
Member

bluca commented Oct 18, 2016

What GPU is that?

@Thaodan
Author

Thaodan commented Oct 18, 2016

Nvidia Quadro M2000M (in Dell M7510)

Users of Manjaro Linux are reporting similar issues [1]

[1] https://forum.manjaro.org/t/bumblebee-not-working-under-4-8-kernel/10676/8

@olifre

olifre commented Oct 20, 2016

That seems to be the same issue I described here:
https://devtalk.nvidia.com/default/topic/971733/linux/nvidia-gtx-960m-not-supported-anymore-by-370-28-/?offset=5#4999717
In short: Until nvidia changes their driver, add pcie_port_pm=off to your kernel commandline (you will then not get everything from the new energy savings introduced by 4.8).

Note that this affects everybody who is part of the following group (see my original linked post):

Anybody using Kernel >=4.8, with a system for which the nvidia card is not the primary output (otherwise it could never enter D3), and a BIOS released >=2015. 

The ugliest part of this story is that the kernel uses pcie_port_pm=off in any case if your BIOS / UEFI release date is <=2014.

EDIT: This also means it's not really a bumblebee issue, but a problem with kernel 4.8 and nvidia, which just happens to show up on laptops for which nvidia is not the primary card. Apparently I was the first to report this in their forums, but good to know I am not alone with this issue.

@Thaodan
Author

Thaodan commented Oct 20, 2016

As far as I understand the first commit reverted in the forum post:
If the card is advertised as hotplug, runtime PM won't power it down to D3cold,
and as bumblebee/optimus is some kind of hotplug mechanism, shouldn't this apply
to the card too?
Is it possible to set this flag in bbswitch or somewhere else?
Or would it be better to place this flag in the gpu driver?

@olifre

olifre commented Oct 20, 2016

Or would it be better to place this flag in the gpu driver?

Definitely that. That's because this also affects machines even without bumblebee.
If you just boot without starting bumblebeed and never loading bbswitch, the nvidia driver will still fail to detect the card - so the driver claiming the device (nvidia) should take care.
I guess it is even possible to do that in a better way than claiming to be hotpluggable (which the pcie interface to which the card is connected is not really...): The nvidia driver could check whether the card is in d3cold and power it up if necessary.

@Thaodan
Author

Thaodan commented Oct 20, 2016

So it's a "bug" that needs to be fixed in nvidia/nouveau.
But the problem is that nvidia is slow.
Would it be possible to implement one of the solutions in either bumblebee or
bbswitch?

@olifre

olifre commented Oct 20, 2016

But the problem is that nvidia is slow.

AFAIK they do not even claim to support kernel 4.8 yet, so if we're lucky, it could already be in the next driver release.

Would it be possible to do one of the solutions either in bumblebee or bbswitch?

I'm not sure, but the bumblebee maintainers will know ;).

For an immediate "fix", the trick to add pcie_port_pm=off to kernel commandline is sufficient.

@Thaodan
Author

Thaodan commented Oct 20, 2016

Ah, I didn't notice that you're not a maintainer (:

Does the dirty fix change anything for the other devices in a "typical"
notebook?

@olifre

olifre commented Oct 20, 2016

Should the dirty fix change something for the other devices in the "typical" notebook?

It could lead to slightly higher power consumption since unused PCI bridges can not enter the deepest sleep state anymore.
However, since this (pcie_port_pm=off) was in any case what was done in kernel 4.7 and before, and it's also what the kernel 4.8 still does on any machine with a BIOS released before 2015, I don't expect this is significant on a "typical" machine.
But I'm not sure whether this is also true for the most modern Skylake / Broadwell machines, nowadays things start becoming more and more sensitive and power consumption stays high unless everything on the busses is in the deepest sleep states.

At least I can promise you things won't be worse over kernel 4.7 ;).

@Lekensteyn
Member

Lekensteyn commented Oct 21, 2016

At least I can promise you things won't be worse over kernel 4.7 ;).

Very true, the boot option reverts to the pre-4.8 behavior.

Have you (via udev rules or some other "laptop mode tools") enabled runtime PM? You can check that by reading /sys/bus/pci/devices/0000:01:00.0/power/control. If it says "auto", then it is enabled. If it is "on", then I would expect it to have the same behavior as adding the boot option.
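
The check described above can be sketched as a small shell helper. This is only an illustration: the function name `runtime_pm_policy` and its optional second argument are made up here, and 0000:00:01.0 is an assumed address for the port above the GPU, which may differ on your machine.

```shell
#!/bin/sh
# Print the runtime PM policy ("auto" or "on") of one PCI device.
# $1: PCI address, e.g. 0000:01:00.0
# $2: sysfs directory holding the PCI devices (optional; hypothetical
#     parameter that exists only so the helper can be pointed elsewhere)
runtime_pm_policy() {
    pci_dev=$1
    pci_root=${2:-/sys/bus/pci/devices}
    control_file="$pci_root/$pci_dev/power/control"
    if [ -r "$control_file" ]; then
        cat "$control_file"
    else
        # Device not present (or sysfs not readable).
        echo "unknown"
        return 1
    fi
}

# The GPU address is taken from the log in this thread; the port
# address 0000:00:01.0 is an assumption.
for addr in 0000:01:00.0 0000:00:01.0; do
    printf '%s: %s\n' "$addr" "$(runtime_pm_policy "$addr")"
done
```

A result of "auto" means runtime PM is enabled for that device, and "on" means it is kept powered.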

Btw, some laptops require the new 4.8 method or else may experience memory corruption (see the commit message of https://git.kernel.org/linus/692a17dcc2922a91c6bcf11b3321503a3377b1b1).

@olifre

olifre commented Oct 21, 2016

Have you (via udev rules or some other "laptop mode tools") enabled runtime PM? You can check that by reading /sys/bus/pci/devices/0000:01:00.0/power/control. If it says "auto", then it is enabled. If it is "on", then I would expect it to have the same behavior as adding the boot option.

Yes, runtime PM is active on that machine: I'm using laptop-mode-tools.
In addition, I have enabled PCIe-ASPM for all ports in the UEFI (it's one with unlocked features) and I'm also using "pcie_aspm=force".
Only after all this, I could achieve maximum battery runtime almost comparable to Windows on that laptop (sadly ASPM for the NIC and card reader is not working, even in Windows, so the machine saves significantly more if I turn off the full corresponding PCI port).

Btw, some laptops require the new 4.8 method or else may experience memory corruption

Thanks a lot for the link!

@Espionage724

This doesn't seem to affect me with kernel 4.8.3 on Solus, but I don't use Bumblebee (I just pass everything to the NV GPU with xrandr):

Linux spinesnap 4.8.3 #1 SMP Thu Oct 20 11:50:13 UTC 2016 x86_64 GNU/Linux

[    9.786656] nvidia: loading out-of-tree module taints kernel.
[    9.786660] nvidia: module license 'NVIDIA' taints kernel.
[    9.793430] nvidia 0000:01:00.0: enabling device (0006 -> 0007)
[    9.793562] nvidia-nvlink: Nvlink Core is being initialized, major device number 247
[    9.793574] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  370.28  Thu Sep  1 19:45:04 PDT 2016
[    9.806149] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  370.28  Thu Sep  1 19:18:48 PDT 2016
[    9.814270] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[   11.790229] nvidia-modeset: Allocated GPU:0 (GPU-4d140bcd-71e6-3bb9-ed1c-033e9cf5bec2) @ PCI:0000:01:00.0
[   11.790322] nvidia-modeset: Freed GPU:0 (GPU-4d140bcd-71e6-3bb9-ed1c-033e9cf5bec2) @ PCI:0000:01:00.0

I have a GTX 960M in a Acer Aspire V Nitro VN7-792G laptop.

@Thaodan
Author

Thaodan commented Oct 21, 2016

Do you use NVIDIA PRIME? If so, you won't hit the issue, as your discrete GPU isn't disabled before it is used.

@Lekensteyn
Member

Even if you use nvidia, you might still run into issues if you enable runtime PM for all devices using a udev rule or using "laptop mode tools" before the nvidia driver is loaded (and have kernel 4.8+ without the pcie_port_pm=off parameter and a new enough laptop).

@Thaodan
Author

Thaodan commented Oct 24, 2016

Using the pm-rework branch, which enables runtime PM in bbswitch,
works around the issue that the nvidia driver doesn't handle runtime PM.

Could someone with an older laptop that doesn't use runtime PM, and someone
else with a newer laptop, test whether the newer version works/fixes the issue?

@Lekensteyn
Member

That branch is unfinished, last time I was working on it there was still an Oops somewhere. If you have no NVIDIA HDMI audio device, then it might be safe to use though (revert Bumblebee-Project/bbswitch@e0c6859 just to be sure).

@Thaodan
Author

Thaodan commented Oct 25, 2016

Using the pm-rework branch, which enables runtime PM in bbswitch,
works around the issue that the nvidia driver doesn't handle runtime PM.

Could someone with an older laptop that doesn't use runtime PM, and someone
else with a newer laptop, test whether the newer version works/fixes the issue?

@olifre

olifre commented Oct 25, 2016 via email

@kmare

kmare commented Nov 1, 2016

Fedora 24 just updated to the 4.8.4 kernel. I'm using the bumblebee fedora repo, updated as normal and everything seems to be fine. What exactly is NOT supposed to be working with the 4.8 kernel?

@olifre

olifre commented Nov 1, 2016

What exactly is NOT supposed to be working with the 4.8 kernel?

To trigger the issue, you have to have:

  • A BIOS released >=2015 (the "year" discoverable with dmidecode is what counts). That's because the kernel only activates PCI port power management in this case. Alternatively, use pcie_port_pm=force.
  • You likely need PCI runtime power management activated, e.g. by using laptop-mode-tools (never tested without).
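
The first condition can be checked without dmidecode by reading the BIOS date from sysfs. The following is a rough sketch, assuming the date is exposed in MM/DD/YYYY form under /sys/class/dmi/id/bios_date; the function names are made up for illustration.

```shell
#!/bin/sh
# Extract the year from a BIOS date in MM/DD/YYYY form.
bios_year() {
    bios_date_str=$1
    # Drop everything up to and including the last "/".
    echo "${bios_date_str##*/}"
}

# Succeed if a BIOS with this date falls in the ">=2015" group
# described above, i.e. port power management is on by default.
port_pm_on_by_default() {
    [ "$(bios_year "$1")" -ge 2015 ] 2>/dev/null
}

date_file=/sys/class/dmi/id/bios_date
if [ -r "$date_file" ]; then
    bios_date=$(cat "$date_file")
    if port_pm_on_by_default "$bios_date"; then
        echo "BIOS date $bios_date: PCIe port PM is enabled by default"
    else
        echo "BIOS date $bios_date: kernel behaves as with pcie_port_pm=off"
    fi
fi
```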

@seekermoc

seekermoc commented Nov 2, 2016

I'm having the same problem on my laptop with a GTX 970M. After updating the kernel from 4.7.9 to kernel 4.8.4 on Fedora 24, bumblebee's proprietary driver wouldn't install. I didn't think anything of it, as the same thing happened when I upgraded from 4.6 to 4.7, but it was then fixed about a week later with an updated bumblebee-nvidia package.

Today, about a week after Fedora 24 moved to the 4.8 kernel, a new bumblebee-nvidia package was released. I expected it to fix my problem, but the nvidia module still won't install. Running it with the --debug flag, it output this:


-> Kernel module compilation complete.
ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: No such device
-> Kernel messages:
[ 2203.500272] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:13d8)
               NVRM: installed in this system is not supported by the 367.57
               NVRM: NVIDIA Linux driver release.  Please see 'Appendix
               NVRM: A - Supported NVIDIA GPU Products' in this release's
               NVRM: README, available on the Linux driver download page
               NVRM: at www.nvidia.com.
[ 2203.500285] nvidia: probe of 0000:01:00.0 failed with error -1
[ 2203.500319] nvidia-nvlink: Nvlink Core is being initialized, major device number 241
[ 2203.500341] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 2203.500341] NVRM: None of the NVIDIA graphics adapters were initialized!
[ 2203.500342] nvidia-nvlink: Unregistered the Nvlink Core, major device number 241
[ 2203.500477] NVRM: NVIDIA init module failed!

@olifre

olifre commented Nov 2, 2016

Running it with the --debug flag, it output this:

I expect your issue will vanish if you add (as discussed previously) pcie_port_pm=off to your kernel commandline which reverts PCIe port power saving behaviour to the pre-4.8 state.
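
One way to make the parameter persistent is to append it to the GRUB defaults file. The helper below is a sketch under assumptions: it expects a GRUB_CMDLINE_LINUX="..." line as found in /etc/default/grub on Fedora (other distributions may use GRUB_CMDLINE_LINUX_DEFAULT instead), the name `add_kernel_param` is invented here, and it prints the result to stdout instead of editing the file in place.

```shell
#!/bin/sh
# Print the content of a GRUB defaults file with one parameter appended
# to GRUB_CMDLINE_LINUX. If the parameter is already there, the file is
# printed unchanged.
# $1: path to the defaults file, $2: parameter such as pcie_port_pm=off
add_kernel_param() {
    grub_file=$1
    kparam=$2
    if grep -q "^GRUB_CMDLINE_LINUX=.*$kparam" "$grub_file"; then
        cat "$grub_file"
    else
        # Insert " <param>" just before the closing double quote.
        sed "s/^\(GRUB_CMDLINE_LINUX=\"[^\"]*\)\"/\1 $kparam\"/" "$grub_file"
    fi
}

# Possible usage (review the output before replacing the real file):
#   add_kernel_param /etc/default/grub pcie_port_pm=off > /tmp/grub.new
```

After replacing the file, the GRUB configuration still has to be regenerated and the machine rebooted before the parameter takes effect.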

A better fix could be done by nvidia, or a different workaround could be implemented in bumblebee (but since the issue can also be reproduced without bumblebee, I rather think the correct fix should enter the nvidia binary blob).

@seekermoc

seekermoc commented Nov 3, 2016

Ok, I added the kernel parameter to /etc/default/grub and ran "grub2-mkconfig -o /boot/grub2/grub.cfg" rebooted, and tried again, but I'm still getting the same error. Is there a way to verify whether the command line edit took effect?

Edit: Nevermind, I must have messed something up. I manually added the pcie_port_pm=off to the GRUB command line during boot and now it works fine. Thanks for the help.
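
For the verification question above: the running kernel exposes its command line in /proc/cmdline, so a check can be sketched as follows. The function name `cmdline_has_param` is made up for illustration.

```shell
#!/bin/sh
# Succeed if the parameter appears as a whole word in the given
# kernel command line string.
# $1: command line text, $2: parameter such as pcie_port_pm=off
cmdline_has_param() {
    cmdline_text=$1
    wanted=$2
    # Pad with spaces so that only whole-word matches count.
    case " $cmdline_text " in
        *" $wanted "*) return 0 ;;
        *) return 1 ;;
    esac
}

if cmdline_has_param "$(cat /proc/cmdline 2>/dev/null)" pcie_port_pm=off; then
    echo "pcie_port_pm=off is active for the running kernel"
else
    echo "pcie_port_pm=off is NOT on the running kernel's command line"
fi
```

If the second message appears after a reboot, the bootloader configuration was not actually updated.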

@Thaodan
Author

Thaodan commented Nov 3, 2016 via email

@seekermoc

Thanks. It's showing that for whatever reason grub2-mkconfig isn't actually editing my commandline, but that it works when I manually add it during boot.

That said, even though that will allow me to load the nvidia module, and running bumblebee-nvidia --check shows that everything is working, I can't actually open anything using optirun or primusrun. When I do I get the following error (with or without pcie_port_pm=off in the commandline):

[seeker@ ~]$ primusrun glxgears
primus: fatal: Bumblebee daemon reported: error: [XORG] (EE) /dev/dri/card0: failed to set DRM interface version 1.4: Permission denied

[seeker@ ~]$ optirun glxgears
[   52.986505] [ERROR]Cannot access secondary GPU - error: [XORG] (EE) /dev/dri/card0: failed to set DRM interface version 1.4: Permission denied

[   52.986543] [ERROR]Aborting because fallback start is disabled.

@seekermoc

seekermoc commented Nov 4, 2016

Regardless of whether I add the pcie bit to the kernel command line, running cat /sys/bus/pci/devices/0000\:01\:00.0/power/control returns "auto" either way.

Also, when I add the pcie line, and try to primusrun or optirun, the Nvidia GPU will turn on (even though I receive the error and the glxgears won't open) and won't shut back off again. I can see when the Nvidia GPU is on or off from a LED on my laptop. Without the extra pcie line, the Nvidia GPU shuts back off after I receive the error (as it should).

Edit: Thaodan mentioned using a newer bbswitch. Does such a thing exist somewhere that I can try, or was that a theoretical comment that a future version may fix the issue?

@seekermoc

dmesg gives the following output:

[   15.258289] nvidia: loading out-of-tree module taints kernel.
[   15.258294] nvidia: module license 'NVIDIA' taints kernel.
[   15.261356] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[   15.266352] nvidia 0000:01:00.0: enabling device (0006 -> 0007)
[   15.266522] nvidia-nvlink: Nvlink Core is being initialized, major device number 242
[   15.266533] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  367.57  Mon Oct  3 20:37:01 PDT 2016
[   15.285279] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  367.57  Mon Oct  3 20:32:57 PDT 2016
[   15.303408] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[   15.547166] bbswitch: version 0.8
[   15.547171] bbswitch: Found integrated VGA device 0000:00:02.0: \_SB_.PCI0.GFX0
[   15.547176] bbswitch: Found discrete VGA device 0000:01:00.0: \_SB_.PCI0.PEG0.PEGP
[   15.547274] bbswitch: detected an Optimus _DSM function
[   15.547283] bbswitch: Succesfully loaded. Discrete card 0000:01:00.0 is on
[   15.550748] [drm] [nvidia-drm] [GPU ID 0x00000100] Unloading driver
[   15.578716] nvidia-modeset: Unloading
[   15.633877] nvidia-nvlink: Unregistered the Nvlink Core, major device number 242
[   15.646869] bbswitch: disabling discrete graphics
[ 6704.175466] bbswitch: enabling discrete graphics
[ 6720.390241] bbswitch: enabling discrete graphics
[ 6728.181651] bbswitch: enabling discrete graphics

@Lekensteyn
Member

@seekermoc The power/control state is not affected by pcie_port_pm=off. When the latter option is given, enabling runtime PM for the port (something like 00:01.0, not 01:00.0) will have no observable effect.

There is a bbswitch branch (pm-rework, for example), but it is not suitable for use; it could cause an Oops the last time I was working on it. At that time I shifted priority to nouveau because that was easier to fix.
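
To find which port a GPU actually sits behind, the sysfs symlink for the device can be resolved and its parent directory inspected. This is a sketch only; `parent_port` is an invented name, and the GPU address is the one from the logs in this thread.

```shell
#!/bin/sh
# Given the resolved sysfs path of a PCI device, print the address of
# the bridge/port directly above it.
parent_port() {
    resolved_path=$1
    basename "$(dirname "$resolved_path")"
}

gpu_link=/sys/bus/pci/devices/0000:01:00.0
if [ -e "$gpu_link" ]; then
    # readlink -f turns the symlink into the full device hierarchy path.
    port_addr=$(parent_port "$(readlink -f "$gpu_link")")
    echo "GPU sits behind port $port_addr"
    cat "/sys/bus/pci/devices/$port_addr/power/control"
fi
```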

@Thaodan
Author

Thaodan commented Nov 5, 2016

I'm using the branch without the commit mentioned and it works fine without the
Oops.
My device is a dell precision m7510.

I have bumblebeed set to turn the GPU on at exit, and I stop and start it after
every suspend.

@arcivanov

Bump, same issue here Dell Precision 7510, Quadro M2000M, BIOS 1.8.3 (Oct 2016), Fedora 24, kernel 4.8.6-201.

The driver doesn't even install properly since the kernel module doesn't load.

   make[2]: Leaving directory '/usr/src/kernels/4.8.6-201.fc24.x86_64'
   make[1]: Leaving directory '/usr/src/kernels/4.8.6-201.fc24.x86_64'
-> done.
-> Kernel module compilation complete.
ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: No such device
-> Kernel messages:
[   73.757880] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:13b0)
               NVRM: installed in this system is not supported by the 367.57
               NVRM: NVIDIA Linux driver release.  Please see 'Appendix
               NVRM: A - Supported NVIDIA GPU Products' in this release's
               NVRM: README, available on the Linux driver download page
               NVRM: at www.nvidia.com.
[   73.757885] nvidia: probe of 0000:01:00.0 failed with error -1
[   73.757897] nvidia-nvlink: Nvlink Core is being initialized, major device number 238
[   73.757906] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   73.757906] NVRM: None of the NVIDIA graphics adapters were initialized!
[   73.757907] nvidia-nvlink: Unregistered the Nvlink Core, major device number 238
[   73.757964] NVRM: NVIDIA init module failed!
[  180.055104] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=none,decodes=none:owns=none
[  180.055158] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:13b0)
               NVRM: installed in this system is not supported by the 367.57
               NVRM: NVIDIA Linux driver release.  Please see 'Appendix
               NVRM: A - Supported NVIDIA GPU Products' in this release's
               NVRM: README, available on the Linux driver download page
               NVRM: at www.nvidia.com.
[  180.055166] nvidia: probe of 0000:01:00.0 failed with error -1
[  180.055194] nvidia-nvlink: Nvlink Core is being initialized, major device number 238
[  180.055211] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  180.055212] NVRM: None of the NVIDIA graphics adapters were initialized!
[  180.055213] nvidia-nvlink: Unregistered the Nvlink Core, major device number 238
[  180.055325] NVRM: NVIDIA init module failed!
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

@gsgatlin

Ok. Yeah. So you know your custom version is getting loaded. I guess you can try

systemctl restart bumblebeed.service

to see if it made anything better.

@seekermoc

That at least turns the dGPU back off (and enables me to manually switch it on and off again), but I still can't use primus/optirun.
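
For reference, the manual on/off state mentioned here can be read from bbswitch's proc interface, which reports a line of the form "0000:01:00.0 ON". A minimal sketch, with `bbswitch_state` as an invented helper name:

```shell
#!/bin/sh
# Extract ON or OFF from a bbswitch status line such as
# "0000:01:00.0 OFF".
bbswitch_state() {
    status_line=$1
    # Keep only the text after the last space.
    echo "${status_line##* }"
}

if [ -r /proc/acpi/bbswitch ]; then
    echo "Discrete GPU is $(bbswitch_state "$(cat /proc/acpi/bbswitch)")"
fi

# Switching manually requires root and an unloaded nvidia module, e.g.:
#   echo OFF > /proc/acpi/bbswitch
```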

@edoantonioco

edoantonioco commented Nov 14, 2016

The latest kernel (4.8.7-1 on Manjaro) fixed this issue in my case.
Edit: it did not; I'm now using the workaround provided previously.

@olifre

olifre commented Nov 14, 2016

Latest kernel (4.8.7-1 on Manjaro) fixed this issue on my case.

I can confirm that for me, the problem persists with 4.8.7 on Gentoo Linux.
There's also nothing in the (vanilla) kernel logs about any power management related changes, so I don't see how a kernel update could change the issue (the only "fix" would be if the newer kernel reverted the change, or for some reason power management got broken in the update).
pcie_port_pm=off again works around the issue.

@kmare

kmare commented Nov 18, 2016

Nvidia just released a new set of drivers (375.20). Can anyone report back whether it solved the issue?
http://www.nvidia.com/download/driverResults.aspx/111596/en-us

@Thaodan
Author

Thaodan commented Nov 18, 2016 via email

@olifre

olifre commented Nov 18, 2016

For the bumblebee way of using the gpu, bbswitch needs to be extended (see pm rewrite for a good start).

Agreed - but I still think nvidia needs to fix something, too, since even if bumblebee is disabled and bbswitch is blacklisted, if the nvidia card is not actively used on boot and thus the kernel disables the PCI-port, the nvidia driver itself will not reactivate the port if modprobing it (as I have described in the nvidia forums). Depending on the timing during boot (activating nvidia-persistenced of course might help...), I believe this also prevents prime etc.

I haven't tested 375.20 yet, though.

@putterson

I have just tested with nvidia 375.20 and the issue is certainly fixed for me.
I am running arch with kernel 4.8.8-2-ARCH on a Dell XPS 9550.
I have bbswitch loaded and I am not passing 'pcie_port_pm=off' to my kernel (which I used before to work around this issue.)

I'm not sure if the fix is with kernel 4.8.8 or with the nvidia drivers but if anybody would like me to test anything I'd be happy to oblige.

@seekermoc

seekermoc commented Nov 20, 2016

I can confirm that this issue is fixed with me as well with the 375.20 drivers. I switched from the "managed" to "unmanaged" Fedora repo and downloaded the 375.20 drivers. They installed perfectly on kernel 4.8.8 with the default bbswitch version from the repo (not the pm_rework version). Everything works normally now, including optirun and primusrun.

Edit: I was mistaken, I did still have pcie_port_pm=off in my cmdline. I tried removing it, and primus/optirun stopped working, so you do still need the workaround, but at least bumblebee works again.

@seekermoc

For fun, I tried using the pm_rework version of bbswitch to see if it would work without the pcie_port_pm=off workaround, but it did not work.

For me, bottom line is that drivers 375.20, default repo bbswitch, and pcie_port_pm=off now works in full.

Due to this, I think this may have been two separate problems that occurred concurrently. First, kernel 4.8 requires the pcie pm workaround. Second, for primus/optirun to work it requires the 375.20 drivers (possibly because 375.20 adds support for xorg 1.19, and Fedora updated us to xorg 1.19 around the same time as kernel 4.8).

@gsgatlin

@seekermoc Thanks a lot for the info. I will try to update the managed version tomorrow. Sorry for the delay. Sometimes I miss these nvidia updates. I'm still on fedora 23.

@kmare

kmare commented Nov 20, 2016

@gsgatlin thank you for your work! do you think you'll have the repo updated for fedora 25 when it comes out in a few days?

@seekermoc

@gsgatlin No problem, thanks for all your help.

@gsgatlin

@kmare Yes. I will update everything (centos 6,7.fedora 23,24,25,26) at the same time. I still need to test fedora 25 though.

@olifre

olifre commented Nov 21, 2016

Just to confirm the general picture: With 375.20, I can still reproduce the original problem (unless I add pcie_port_pm=off). After all, 375.20 claims only to have fixed the (independent) issue of incompatibility with Xorg 1.19 (which I don't use yet in any case).

@kmare

kmare commented Dec 16, 2016

While it's not explicitly listed in the changelog, could the new driver update 375.26 have fixed the problem mentioned here? Has anyone tried it?

https://devtalk.nvidia.com/default/topic/981831

@edoantonioco

Just to confirm that this also happens with the latest stable nvidia 375.26 on kernel 4.9. Once I start the PC I can use the dedicated card without any problem. The way to reproduce this bug on my laptop is just to close the lid (sending it to sleep) and then start using the PC again. After that, bumblebee can't use the nvidia card.

@anolting

anolting commented Jan 2, 2017

Hi all,

I'm still having this problem with 4.9 and 375.26 on openSUSE Tumbleweed. If I force the kernel to switch back to the old PM method, the laptop boots into runlevel 3 and freezes before I can log in.

The Laptop is a DELL Inspiron 15 Series 7000 (7559) with a GTX960M.

Thanks
Alex

@jramapuram

jramapuram commented Jan 7, 2017

On kernel 4.9, using pcie_port_pm=off allows proper usage of bumblebee; however, my external display is not detected. I am powering it via a Thunderbolt 3 --> Thunderbolt 2 adaptor. It generally lists as DP1 when using pcie_port_pm=on, but is not listed at all otherwise. I have tried using intel-virtual-output as stated here, to no avail.

Edit: realized this was due to another issue and not bumblebee; tb needs to be set to legacy mode to work properly in linux

@JohnOShock

I'm using a laptop with an Nvidia 940M optimus... was using Bumblebee to switch but after upgrading the kernel to 4.8 and later 4.9 I experienced crashes .... poor start up and shutdown times... all of which stopped when I revert back to using the intel card only with Nouveau.. I am on openSUSE Tumbleweed. Someone suggested I use the proprietary driver only 375.26 with PRIME but it did not really solve anything. Plasma desktop wouldn't start at all

@jrupinski

I still get this error:

ERROR: Unable to find the kernel source tree for the currently running kernel. Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on Red Hat Linux systems, for example, be sure you have the 'kernel-source' or 'kernel-devel' RPM installed. If you know the correct kernel source files are installed, you may specify the kernel source path with the '--kernel-source-path' command line option.

When I run sudo bumblebee-nvidia --debug on Fedora 25.
I installed managed repo using this guide:
https://fedoraproject.org/wiki/Bumblebee#Using_bumblebee_software

Kernel:
4.9.11-200.fc25.x86_64

Hardware:
Lenovo y510p
CPU: i5 4200m
GPU: Nvidia GT 755M

@gsgatlin

@rupek1995 Do you have the kernel-devel package installed and is it the same version as your running kernel? (uname -r)
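
A quick way to check this is to look for a source tree matching the running kernel. The sketch below assumes the Fedora layout seen in the build logs earlier in this thread (/usr/src/kernels/<release>); the function name and its optional second argument are made up for illustration.

```shell
#!/bin/sh
# Succeed if a kernel source/devel tree exists for the given release.
# $1: kernel release string as printed by uname -r
# $2: directory holding the trees (optional; hypothetical parameter,
#     defaults to the Fedora location)
kernel_sources_present() {
    krelease=$1
    ksrc_root=${2:-/usr/src/kernels}
    [ -d "$ksrc_root/$krelease" ]
}

running=$(uname -r)
if kernel_sources_present "$running"; then
    echo "kernel-devel tree found for $running"
else
    echo "no kernel-devel tree for $running; install the matching package"
fi
```

A mismatch typically appears right after a kernel update when the machine has not yet been rebooted into the new kernel.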

@seekermoc

seekermoc commented Feb 28, 2017 via email

@jrupinski

jrupinski commented Mar 1, 2017

EDIT: After rebooting for the second time system froze for about 30 seconds, and after I logged in there were SELinux errors about systemd, bbswitch, nvidia.ko and gnomeshell. Will removing SELinux fix this?

There was a kernel update after my post, I updated it, managed to successfully download the kernel-devel for my kernel, deleted kernel-debug-devel just in case. Nvidia driver unpacks... but now installation prints out an error like this:

-> Kernel module compilation complete.

ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: Permission denied
-> Kernel messages:
[ 68.542964] iwlwifi 0000:08:00.0: Radio type=0x2-0x0-0x0
[ 68.595688] IPv6: ADDRCONF(NETDEV_UP): wlp8s0: link is not ready
[ 69.121635] wlp8s0: authenticate with 18:a6:f7:65:30:b4
[ 69.125025] wlp8s0: send auth to 18:a6:f7:65:30:b4 (try 1/3)
[ 69.127213] wlp8s0: authenticated
[ 69.128834] wlp8s0: associate with 18:a6:f7:65:30:b4 (try 1/3)
[ 69.133463] wlp8s0: RX AssocResp from 18:a6:f7:65:30:b4 (capab=0x431 status=0 aid=2)
[ 69.158037] wlp8s0: associated
[ 69.158084] IPv6: ADDRCONF(NETDEV_CHANGE): wlp8s0: link becomes ready
[ 71.708158] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[ 75.815808] Netfilter messages via NETLINK v0.30.
[ 75.846240] ip_set: protocol 6
[ 95.459594] tun: Universal TUN/TAP device driver, 1.6
[ 95.459595] tun: (C) 1999-2004 Max Krasnyansky [email protected]
[ 95.499851] virbr0: port 1(virbr0-nic) entered blocking state
[ 95.499854] virbr0: port 1(virbr0-nic) entered disabled state
[ 95.499942] device virbr0-nic entered promiscuous mode
[ 96.510578] virbr0: port 1(virbr0-nic) entered blocking state
[ 96.510580] virbr0: port 1(virbr0-nic) entered listening state
[ 98.320456] virbr0: port 1(virbr0-nic) entered disabled state
[ 332.816405] mce: [Hardware Error]: Machine check events logged
[ 794.335093] fuse init (API version 7.26)
[ 796.061920] Bluetooth: RFCOMM TTY layer initialized
[ 796.061928] Bluetooth: RFCOMM socket layer initialized
[ 796.061981] Bluetooth: RFCOMM ver 1.11
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

@xen0f0n

xen0f0n commented Mar 22, 2017

pcie_port_pm=off works for me:

Dell 3542, Fedora 25, kernel 4.9, GeForce 840M

@jrupinski

@xen0f0n Thanks for the tip! I already managed to get it to work by changing the SELinux to permissive mode.

If anyone has this problem and pcie_port_pm=off doesn't work for you, you can try the SELinux method:

  • Update the kernel to the newest version
  • Set SELinux to permissive mode (I used the tool installed by sudo dnf install /usr/bin/system-config-selinux*)
  • Reboot Fedora twice - on the second reboot bumblebee should install during login (which is why it might freeze for a minute or two)
  • Check that it works with bumblebee-nvidia --check
  • ???
  • PROFIT
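
The permissive-mode step above can also be done from a terminal. This is a rough sketch rather than the exact procedure used here: getenforce and setenforce are the standard SELinux tools, while selinux_config_mode is a hypothetical helper that only reads the persistent setting from a config file (assumed to live at /etc/selinux/config).

```shell
#!/bin/sh
# Sketch only: selinux_config_mode is a hypothetical helper.
# Prints the value of the SELINUX= line (enforcing, permissive or disabled)
# from the given config file, defaulting to /etc/selinux/config.
selinux_config_mode() {
    conf="${1:-/etc/selinux/config}"
    [ -r "$conf" ] || return 1
    # Matches "SELINUX=..." only; "SELINUXTYPE=..." does not match the pattern.
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "$conf" | head -n 1
}

# Runtime state (needs the SELinux tools installed):
#   getenforce            # prints Enforcing, Permissive or Disabled
#   sudo setenforce 0     # permissive until the next reboot
# To keep it across reboots, set SELINUX=permissive in /etc/selinux/config.

selinux_config_mode || echo "no readable SELinux config found"
```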

@mcku

mcku commented Feb 27, 2018

Hi, instead of disabling runtime PCI power management, would it be OK to selectively enable PCI runtime power management through udev? The following works fine so far:

First, get the device IDs using lspci -k.
By trial and error, I found that disabling power management for

00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 05)
	Subsystem: ASUSTeK Computer Inc. Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 05)
	Kernel driver in use: pcieport
	Kernel modules: shpchp
01:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] (rev a1)
	Subsystem: ASUSTeK Computer Inc. GP107M [GeForce GTX 1050 Mobile]
	Kernel driver in use: nvidia
	Kernel modules: nouveau, nvidia_drm, nvidia

was sufficient.
This was possible by the following workaround:

/etc/udev/rules.d/pci_pm.rules
# use lspci -k to get bus ids

#ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:00.0", ATTR{power/control}="auto"
#ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:01.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:02.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:08.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:14.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:14.2", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:15.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:16.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:17.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:1c.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:1c.3", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:1c.6", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:1d.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:1f.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:1f.2", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:1f.3", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:00:1f.4", ATTR{power/control}="auto"
#NVIDIA 
#ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:01:00.0", ATTR{power/control}="off"

ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:02:00.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:03:00.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:04:00.0", ATTR{power/control}="auto"
ACTION=="add", SUBSYSTEM=="pci", KERNELS=="0000:05:00.0", ATTR{power/control}="auto"

Using powertop, I could verify that the other PCI devices appear to be power managed, and laptop battery usage is almost as good as if PM were fully enabled.

If there is an easier way to do this, without going through the bus ids etc, life would be easier. But now I can use the laptop for coding and gaming, without rebooting in between, and with power management enabled.
Please advise if anything is missing or wrong. Thanks.
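
As a cross-check besides powertop, the setting applied by rules like the ones above can be read back from sysfs. A minimal sketch, assuming the usual /sys/bus/pci/devices layout; list_pci_pm is a hypothetical helper name. As far as I know the kernel only accepts "auto" (runtime PM allowed) and "on" (device kept powered) for power/control, so a device that should be left alone would show "on".

```shell
#!/bin/sh
# Sketch only: list_pci_pm is a hypothetical helper.
# Prints "<bus id> <power/control value>" for every PCI device found under
# the given directory (defaults to the real sysfs location).
list_pci_pm() {
    root="${1:-/sys/bus/pci/devices}"
    for dev in "$root"/*; do
        ctl="$dev/power/control"
        # Skip entries without a readable power/control attribute
        [ -r "$ctl" ] || continue
        printf '%s %s\n' "$(basename "$dev")" "$(cat "$ctl")"
    done
}

list_pci_pm
```

Running it before and after adding the rules file should show which devices switched from "on" to "auto".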

@arcivanov

Actually, I can tell you that this works for me.

Run sudo ~/nvidia_reset.sh, and the driver compiles and installs.

bash-4.4$ cat ~/nvidia_reset.sh 
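#!/bin/bash -eEu

# Stop the Bumblebee services
systemctl stop bumblebeed
systemctl stop bumblebee-nvidia
# Unload the NVIDIA driver and bbswitch
rmmod nvidia
rmmod bbswitch
# Remove the GPU from the PCI bus, then rescan its parent bridge to re-detect it
echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
echo 1 > /sys/bus/pci/devices/0000:00:01.0/rescan
# Reload bbswitch and start the services again
modprobe bbswitch
systemctl start bumblebeed
systemctl start bumblebee-nvidia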

@azat

azat commented Nov 20, 2019

If there is an easier way to do this, without going through the bus ids etc, life would be easier

SUBSYSTEM!="pci", GOTO="pci_end"
ACTION!="add", GOTO="pci_end"
# Disable PM for NVIDIA to overcome "issue" in the nvidia driver
KERNELS=="0000:01:00.0", GOTO="pci_end"
TEST=="power/control", ATTR{power/control}="auto"
LABEL="pci_end"
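
To try rules like these without rebooting, udev can be asked to re-read its rules and replay the PCI "add" events. A hedged sketch, assuming the rules were saved under /etc/udev/rules.d/ in a file ending in .rules; run and apply_pci_rules are hypothetical helper names. By default the script only prints the commands, so nothing changes until APPLY=1 is set and it is run as root.

```shell
#!/bin/sh
# Sketch only: run and apply_pci_rules are hypothetical helpers.
# With APPLY=1 the commands are executed; otherwise they are just printed.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

apply_pci_rules() {
    # Re-read the rules files
    run udevadm control --reload-rules
    # Replay "add" events for PCI devices so rules matching ACTION "add" fire again
    run udevadm trigger --action=add --subsystem-match=pci
}

apply_pci_rules
```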
