-
Notifications
You must be signed in to change notification settings - Fork 687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try harder to attach to an existing tmux session #4231
Conversation
e6d1b7e
to
830411c
Compare
I could reliably reproduce the tmux bug using prod VMs: setting the host to sleep or disconnecting the network would very reliably trigger the bug. I can confirm that for instances provisioned on this branch, this appears to fix the problem! 🎉 However, the fix only gets applied for new installs, or installs on which the installer was run, as the bashrc additions are added to the servers as part of the Our documentation currently does not instruct users to run [0] : https://docs.securedrop.org/en/latest/upgrade/xenial_upgrade_in_place.html |
Agreed with @emkll's concerns: in order for this change to be useful to Admins performing the Trusty -> Xenial upgrade, we want the helper functions in place before the upgrade is started—ideally without requesting yet another run of @rmol Take a stab at porting this new logic to the Those customizations, with the improvements you're presenting here, should be taken out of the Ansible logic and placed in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting changes to convert solution from Ansible (which requires an install
run prior to changes taking effect) to securedrop-config
packaging logic (which will be applied automatically as part of the 0.12.1 release).
Moved Also confirmed that having the existing file in |
15ac1df
to
8502ca9
Compare
Beginning local test. Given that the change is implemented via debian package modifications, we must use staging to test. Intend to use the following test plan:
Will report back when testing is done. |
Finished test plan. Can confirm I'm able to reconnect to tmux after a service bounce on the App Server. Encountered a few test failures that are quick fixes, so I'll comment in-line with some pointers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Packaging logic refactor looks great. Requesting minor changes from @rmol:
- a few config tests are failing—see comments in-line
Also requesting additional review, specifically via the upgrade
scenario. Given the plan to have these changes land in 0.12.1 in support of Xenial upgrades, we should also make sure to run upgrade tests from Trusty 0.12.0 -> Trusty 0.12.1~rc1. As written, the packaging logic should simply clobber the target file (that was previously configured via Ansible), but we must confirm that.
(Additional context: in modern debian package logic, any file in /etc/
will automatically be marked a conffile
, which would not only not clobber, but would break unattended upgrades. Due to the legacy packaging logic we're using for securedrop-config
, this not relevant. In support of guarding against regressions, it's worth considering adding a packaging config test to validate that the securedrop-config
file has 0 conffiles
.)
@@ -4,7 +4,7 @@ Priority: optional | |||
Maintainer: SecureDrop Team <[email protected]> | |||
Homepage: https://securedrop.org | |||
Package: securedrop-config | |||
Version: 0.1.2+0.13.0~rc1 | |||
Version: 0.1.3+0.13.0~rc1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The version bump here is absolutely correct. It requires matching changes in other places, though:
- group_vars for staging (already handled)
- config tests vars (specifically, see
molecule/builder-trusty/tests/vars.yml
)
The rationale here is that we want to ensure we're installing exactly a certain version, and also running the package checks on exactly that same version. By default, multiple package versions can pile up in build/
. We could automatically run rm -rf build/*
to avoid the pile-up, but that seems a bit heavy-handed for the dev env.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed config_version
in vars.yml.
|
||
assert 'test -z "$TMUX" && (tmux attach || tmux new-session)' in f.content | ||
assert f.contains(tmux_check) | ||
assert host_file.sha256sum == h.hexdigest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your motivation to DRY out the config tests here is understandable, but most of the config tests are intentionally duplicative, to enable us to refactor the provision logic confidently. For example, if you modify install_files/securedrop-config/etc/profile.d/securedrop_additions.sh
locally, then build packages and install them, you should reasonably expect the corresponding config tests to fail. These changes would cause them to pass.
A broader discussion is warranted here about the pattern used for hardcoding values in the config tests. For now, simply for the sake of consistency, I'd prefer to keep them. Even a multiline string asserting an exact match on host_file.contents
or host_file.content_string
would be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted to comparing the exact current content.
if test -z "$TMUX" | ||
then | ||
(tmux attach || tmux_attach_via_proc || tmux new-session) | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not requesting additional comments in this file because the logic is cleanly presented and the corresponding commit message is top-notch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was a good not-request, though. No one should have to go find my top-notch commit message when trying to understand this. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make staging-trusty
failed with the following error.
TASK [reboot-if-first-install : Wait for server to come back.] *****************
fatal: [mon-staging -> localhost]: FAILED! => {"changed": false, "elapsed": 301, "msg": "Timeout when waiting for search string OpenSSH in 192.168.121.57:22"}
fatal: [app-staging -> localhost]: FAILED! => {"changed": false, "elapsed": 301, "msg": "Timeout when waiting for search string OpenSSH in 192.168.121.163:22"}
@kushaldas The wait timeout you describe should only occur if you've overridden |
If in the course of an ssh/tmux session the tmux package is upgraded and the ssh connection broken, the next ssh attempt will encounter an error because of the tmux protocol version mismatch. The old tmux can be used to reattach by searching for the tmux process under /proc and using its reference to the old tmux executable. This change attempts to do that automatically if a first "tmux attach" attempt fails, but we can see that a previous session still exists.
Comment the tmux_attach_via_proc function in securedrop_additions.sh. Fix version check in builder-trusty/tests/vars.yml. Fix test_sudoers_config so that it requires explicit content in securedrop_additions.sh, instead of just checking that the installed file exactly matches whatever's in the current revision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixes and the rebase @rmol ! This looks good to merge from me 👍
Since I've tested and confirmed the functionality in #4231 (comment), I focused on testing the refactoring into the deb packages using the following test plan:
- Upgrade testing (xenial) confirm that
/etc/profile.d/securedrop_additions.sh
has been correctly updated - Upgrade testing (trusty) confirm that
/etc/profile.d/securedrop_additions.sh
has been correctly updated - Ensure CI passes, specifically
test_sudoers_config
- Inspect securedrop-config package and ensure securedrop_additions.sh is marked as conffile
None of the files in /etc/ are marked as conffiles per the deb package, so should be squashed for each upgrade, though this may change once the compat level is changed.
$ dpkg-deb -I build/trusty/securedrop-config-0.1.3+0.13.0~rc1-amd64.deb
new debian package, version 2.0.
size 3194 bytes: control archive=2325 bytes.
311 bytes, 10 lines control
5341 bytes, 135 lines * postinst #!/bin/sh
Source: securedrop
Section: web
Priority: optional
Maintainer: SecureDrop Team <[email protected]>
Homepage: https://securedrop.org
Package: securedrop-config
Version: 0.1.3+0.13.0~rc1
Architecture: all
Description: Establishes baseline system state for running SecureDrop.
Configures apt repositories.
I've also appended 690f0e3 to this branch to add some build-time tests.
At build time, let's ensure: - no conffiles are present so that files in /etc are properly squashed - securedrop-config contains the expected files
New test report from @emkll looks great, and the additional config tests are a mighty fine addition. Running the build logic locally to confirm all tests passing, then will mark approved and merge. |
Sufficient review provided by other maintainers, problem noted was artifact of test plan
Status
Ready for review
Description of Changes
If in the course of an ssh/tmux session the tmux package is upgraded and the ssh connection broken, the next ssh attempt will encounter an error because of the tmux protocol version mismatch.
The old tmux can be used to reattach by searching for the tmux process under /proc and using its reference to the old tmux executable. This change attempts to do that automatically if a first "tmux attach" attempt fails, but we can see that a previous session still exists.
Fixes #4221.
Testing
This is not deterministically reproducible, but I've had it happen on at least one of the servers during each upgrade from Trusty to Xenial, provided they are set up to use SSH over Tor. It seems more likely to occur on the app server, so try running
securedrop-admin install
with this branch, then executesudo do-release-upgrade
on the app server first. When you reconnect after libssl et al. are upgraded and Tor restarted, you should automatically resume the tmux session. If instead you get an error about the protocol mismatch, the fix hasn't worked.Checklist
If you made changes to the system configuration:
If you made non-trivial code changes: