Latest kernel shows problems on some hardware #590
Comments
In order to increase the metapackage version without changing the required kernel image, based on https://www.debian.org/doc/manuals/maint-guide/first.en.html#namever, it looks like we should update the version in https://github.com/freedomofpress/securedrop-debian-packaging/blob/e7d5bea3f2eb6bbbc7ad76772ec42b4610830916/securedrop-workstation-grsec/debian/changelog-buster#L1 from
|
Test package is live on apt-test: https://apt-test.freedom.press/pool/main/s/securedrop-workstation-grsec/ . Tried to QA on local hardware, but mistakenly used "prod" environment, so the problem persists—because the packages didn't change. To proceed with testing, I will
That's not a perfect test of the updater scenario, since the |
Tested after converting to staging. The end result is that my VMs are working again, although I had to run the updater twice in order to get full coverage. See screenshots below.
Screenshot: After running --apply (to convert to staging repos)
Screenshot: After running the updater manually
Screenshot: After running the updater manually a second time
At no point did I reboot the host machine. So it looks like we have a resolution, but it didn't resolve for me entirely on the first pass. I recommend proceeding with release, and preparing support language for pilot participants that recommends 1) re-running the updater manually (with |
I've updated my workstation with the latest metapackage served by apt-test ( I was unable to reproduce the underlying issue, but did not observe any screen artifacts, nor any regressions after doing some quick basic client testing (login, export, open-in-dvm, reply) |
Same on T480:
So can confirm no regression from new metapackage, cannot confirm whether it resolves the original issue, since this laptop never had it, and the one that I have which did (X1) has already been fixed via dpkg-reconfigure. |
Thanks, folks. I'm going to proceed with preparing a prod artifact and submit for review. After doing so, I'll work on reverting my test hardware to prod, in an attempt to re-break it, so I can dig a bit more deeply on the resolution behavior. |
I reinstalled my X1, which did exhibit the issue previously, on latest prod (it finished when the new package was already up). (I did |
This appears to have been resolved via freedomofpress/securedrop-builder#179 and the associated updated package https://github.com/freedomofpress/securedrop-debian-packages-lfs/pull/30 . We've agreed to do more structured kernel testing next time around, at which time we may also want to investigate this paxctld logic further. |
We've seen this problem crop up again. Made a quick script in an attempt to repro it locally: https://gist.github.com/conorsch/9c5f4e69798200d069fe43f4d5ab4e76 That script is very naive: it just repeatedly installs the old kernel and the new one, back and forth, checking for module errors in syslog every time. After 1000 iterations, no repro. Given the naive approach, that's not terribly surprising; the next step was to test with startup/shutdowns of the VM each time, to mimic more closely how updates land in prod VMs. After rebooting the test VM in which the loop had been running, however, I discovered that I did indeed have a repro:
Note that's the older kernel, not 4.14.186 when we first observed this problem. Will back up the VM image with the failure in it and investigate further. |
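For anyone who wants to build on the repro loop described above, here is a minimal Python sketch of the same idea. It is not the contents of the linked gist. The `install` and `read_log` callables, the function names, and the two log patterns are all assumptions made for illustration; the real syslog wording on an affected VM may differ. Note also that the loop alone did not reproduce the failure in the test above; the failure only showed up after a reboot.

```python
import re

# Hypothetical patterns for "module missing" messages; adjust them to
# whatever actually appears in syslog on an affected VM.
ERROR_PATTERNS = [
    re.compile(r"modprobe: FATAL: Module (\S+) not found"),
    re.compile(r"Failed to find module '([^']+)'"),
]


def find_module_errors(log_text):
    """Return the sorted names of modules reported missing in log_text."""
    missing = set()
    for line in log_text.splitlines():
        for pattern in ERROR_PATTERNS:
            match = pattern.search(line)
            if match:
                missing.add(match.group(1))
    return sorted(missing)


def alternate_installs(versions, install, read_log, iterations):
    """Cycle through kernel versions, checking the log after each install.

    install(version) and read_log() are supplied by the caller, so this
    function makes no assumption about apt, dpkg, or log file locations.
    Returns (iteration, version, missing_modules) for the first failure,
    or None if no failure was seen.
    """
    for i in range(iterations):
        version = versions[i % len(versions)]
        install(version)
        errors = find_module_errors(read_log())
        if errors:
            return i, version, errors
    return None
```

Passing the install and log-reading steps in as callables keeps the loop easy to extend with a VM shutdown/startup between iterations, which is the step that seems to matter here.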
We've received more reports of this issue in the wild. We're certain the problem correlates with a missing u2mfn kernel module. The variable nature of the failure strongly implies a race condition. After exploring in a test VM, it appears that we're inadvertently calling So, if updates were only recently applied for the folks reporting the error, then we have a reasonable explanation for why the problem is appearing again. More testing required to determine whether a single run of dkms autoinstall is sufficient for reliable configuration of the modules. |
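Since the failure correlates with a missing u2mfn module, a quick check for the module file under a given kernel's module tree might help when triaging reports. The sketch below is an assumption-laden helper, not part of the updater: the function name is made up, and it searches the whole per-kernel directory rather than assuming where DKMS places its output.

```python
from pathlib import Path


def has_module(modules_root, kernel_version, module_name):
    """Return True if a module file exists for the given kernel version.

    modules_root is typically /lib/modules, but is a parameter so the
    check can be pointed at a backed-up VM image or a test directory.
    """
    kernel_dir = Path(modules_root) / kernel_version
    if not kernel_dir.is_dir():
        return False
    # Modules may be compressed, so match u2mfn.ko, u2mfn.ko.xz, and so on.
    return any(kernel_dir.rglob(f"{module_name}.ko*"))
```

For example, `has_module("/lib/modules", "4.14.186", "u2mfn")` would report whether the module was built for that kernel; the exact version string on a real system will likely carry a suffix, so substitute the directory name actually present.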
Ran through an updater scenario on test hardware, after intentionally breaking the GUI in There was a small surprise in my testing because I'd inadvertently used a prod setup on hardware, and the metapackage is only available on staging. After a full update run did not resolve the problem, and indeed had not even resulted in the new metapackage being installed, I switched the
Screenshot: Before
Screenshot: After |
After we shipped 4.14.186 kernels as part of #546, we received a report of distorted graphics after upgrading. The behavior described was quite similar to that documented in #308 (comment)
While we didn't catch the issue during QA, I was able to reproduce on test-only hardware. After reviewing logs, the problem seems to correlate with this event in syslog:
It appears that the `dkms autoinstall` line in the postinst for `securedrop-workstation-grsec` (https://github.com/freedomofpress/securedrop-debian-packaging/blob/e7d5bea3f2eb6bbbc7ad76772ec42b4610830916/securedrop-workstation-grsec/debian/postinst#L42) is failing, but still exiting zero, so apt/dpkg didn't consider it an error. As a workaround, I was able to rebuild the dkms projects after bouncing paxctld and the situation was resolved. Let's try updating the postinst logic to restart the paxctld service before running `dkms autoinstall`.
Detailed dom0 logs
No hypothesis yet on why this change seems to affect only certain hardware. On the test laptop where I reproduced it, all SDW-based templates were affected:
Additionally, we should investigate whether it's possible to cause `dkms autoinstall` to fail loudly, which would have notified the user about problems during the updater run.
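To make the two proposals above concrete (restart paxctld before `dkms autoinstall`, and don't trust a zero exit status on its own), here is a rough Python sketch of the control flow. The real postinst is a shell script, so this only illustrates the logic. The function names are hypothetical, `systemctl restart paxctld` assumes paxctld runs as a systemd unit with that name, and `verify` stands in for whatever post-build check turns out to be reliable.

```python
import subprocess
from typing import Callable, Sequence


def run_command(cmd: Sequence[str]) -> int:
    """Run cmd and return its exit status without raising."""
    return subprocess.run(list(cmd), check=False).returncode


def rebuild_modules(
    verify: Callable[[], bool],
    run: Callable[[Sequence[str]], int] = run_command,
) -> None:
    """Restart paxctld, rebuild DKMS modules, then verify the result.

    Raises RuntimeError if any step exits non-zero, or if verify()
    reports that the expected module is still missing afterwards.
    """
    steps = [
        # Assumption: paxctld is managed as a systemd unit of this name.
        ["systemctl", "restart", "paxctld"],
        ["dkms", "autoinstall"],
    ]
    for cmd in steps:
        status = run(cmd)
        if status != 0:
            raise RuntimeError(f"{' '.join(cmd)} exited with status {status}")
    # dkms autoinstall was observed exiting zero despite failing, so an
    # explicit check is what actually makes the failure loud.
    if not verify():
        raise RuntimeError(
            "dkms autoinstall exited 0 but the expected module is missing"
        )
```

The explicit `verify` step is the important part: in a maintainer script, the equivalent would be exiting non-zero when the check fails, so that apt/dpkg reports the error during the updater run.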