Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[xenial] gpg 2.1+ compliance + fix decryption of journalist replies #4094

Merged
merged 10 commits into from
Feb 7, 2019

Conversation

redshiftzero
Copy link
Contributor

@redshiftzero redshiftzero commented Feb 1, 2019

Status

Ready for review

Description of Changes

Fixes #3622
Fixes #4013

Changes proposed in this pull request:

  • This PR makes SecureDrop work with gpg 2.1, while preserving functionality for earlier versions of gpg used on trusty (the gpg config options passed to the gpg binary will now vary based on the version of gpg that is used on the server). gpg 2.1 introduced some other significant changes, the most important of which is now requiring the use of gpg-agent.

thoughts welcome on this as I had to sidestep some API weirdness / bugs on the gpg side which I note in the commit messages

Testing

  • All xenial app test should now pass (and after this PR is merged we will make that a required CI job)
  • All trusty app tests should also still pass
  • In the xenial staging environment, a source should now be able to decrypt journalist replies
  • In the xenial dev environment (BASE_OS=xenial make dev), a source should now be able to decrypt journalist replies
  • In the trusty dev environment (make dev), a source should still be able to decrypt journalist replies
  • In the trusty staging environment, a source should still be able to decrypt journalist replies

Deployment

New installs and upgrades both will have the now required gpg-agent conf (for gpg 2.1+) written to disk via postinst of securedrop-app-code. A reboot will then occur.

Checklist

If you made changes to the server application code:

  • Linting (make ci-lint) and tests (make -C securedrop test) pass in the development container

If you made changes to the system configuration:

If you made non-trivial code changes:

  • I have written a test plan and validated it for this PR

kushaldas
kushaldas previously approved these changes Feb 4, 2019
Copy link
Contributor

@kushaldas kushaldas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • All xenial app test should now pass (and after this PR is merged we will make that a required CI job)
  • All trusty app tests should also still pass
  • In the xenial staging environment, a source should now be able to decrypt journalist replies
  • In the xenial dev environment (BASE_OS=xenial make dev), a source should now be able to decrypt journalist replies
  • In the trusty dev environment (make dev), a source should still be able to decrypt journalist replies
  • In the trusty staging environment, a source should still be able to decrypt journalist replies

Finally 👍 💯 🥇

@redshiftzero Though this will need a rebase.

install_files/securedrop-app-code/DEBIAN/postinst Outdated Show resolved Hide resolved
Copy link
Contributor

@heartsucker heartsucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possible nit. Otherwise 💯 :D

@emkll
Copy link
Contributor

emkll commented Feb 5, 2019

Thanks @redshiftzero, great to see all xenial tests passing! Visual diff looks good to me. I've also done a functional upgrade test as follows:

  1. provision staging VMs on this branch
  2. upgrade to xenial (limited to app server)
  3. build xenial debs on this branch
  4. install xenial debs on app server

I was able to decrypt the message ! However, I do get this in /var/log/apache2/source-error.log (I have not observed any breakage in functionality) :

[Tue Feb 05 21:18:53.367308 2019] [wsgi:error] [pid 4478:tid 3513516791552] Exception in thread Thread-10:
[Tue Feb 05 21:18:53.367335 2019] [wsgi:error] [pid 4478:tid 3513516791552] Traceback (most recent call last):
[Tue Feb 05 21:18:53.367340 2019] [wsgi:error] [pid 4478:tid 3513516791552]   File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
[Tue Feb 05 21:18:53.367351 2019] [wsgi:error] [pid 4478:tid 3513516791552]     self.run()
[Tue Feb 05 21:18:53.367355 2019] [wsgi:error] [pid 4478:tid 3513516791552]   File "/usr/lib/python2.7/threading.py", line 754, in run
[Tue Feb 05 21:18:53.367358 2019] [wsgi:error] [pid 4478:tid 3513516791552]     self.__target(*self.__args, **self.__kwargs)
[Tue Feb 05 21:18:53.367361 2019] [wsgi:error] [pid 4478:tid 3513516791552]   File "/usr/local/lib/python2.7/dist-packages/pretty_bad_protocol/_meta.py", line 670, in _read_response
[Tue Feb 05 21:18:53.367365 2019] [wsgi:error] [pid 4478:tid 3513516791552]     result._handle_status(keyword, value)
[Tue Feb 05 21:18:53.367368 2019] [wsgi:error] [pid 4478:tid 3513516791552]   File "/usr/local/lib/python2.7/dist-packages/pretty_bad_protocol/_parsers.py", line 995, in _handle_status
[Tue Feb 05 21:18:53.367371 2019] [wsgi:error] [pid 4478:tid 3513516791552]     raise ValueError("Unknown status message: %r" % key)
[Tue Feb 05 21:18:53.367374 2019] [wsgi:error] [pid 4478:tid 3513516791552] ValueError: Unknown status message: u'IMPORT_OK'
[Tue Feb 05 21:18:53.367379 2019] [wsgi:error] [pid 4478:tid 3513516791552] 

Did anyone see this error in their testing under xenial?

@redshiftzero
Copy link
Contributor Author

Oh interesting, I have seen that, check out #4045. Did you only see this error once during upgrade?

@emkll
Copy link
Contributor

emkll commented Feb 5, 2019

Yes, it only happens once: the first time a source decrypts a reply once the server was upgraded. I cannot reproduce by creating a new source (and replying to this new source), nor by rebooting.

@heartsucker
Copy link
Contributor

Does the upgrade do something like symlink gpg2 to gpg or anything weird like that?

@redshiftzero
Copy link
Contributor Author

Ah gotcha, I expect this is happening due to the private key migration done (once) when running gpg2.1 for the first time

--pinentry-mode does not exist for the default version of gpg2
on trusty (it also won't use gpg-agent, which is fine)
Issues on the gnupg mailing list seems to indicate that the pinentry
option is the culprit

[0] https://dev.gnupg.org/T3465
[1] https://lists.gnupg.org/pipermail/gnupg-users/2016-May/055965.html
we can just use mock to set the platform for test purposes
there are some gpg 2.1 compatibility issues with how this was
written previously: namely that exporting secret keys needs to be
apparently requires a passphrase [0], which isis's python-gnupg
does not support [1].

this commit modifies the test to instead just assert that the file
decrypts with a key in the keyring, not the specific key.

[0] https://bitbucket.org/vinay.sajip/python-gnupg/commits/9e90361bfc2fc853953a64aca6a0b6bcb64981c8?at=default#Lgnupg.pyT1142
[1] https://github.com/isislovecruft/python-gnupg/blob/master/pretty_bad_protocol/gnupg.py#L423
since in the trusty dev env, there's no gpg-agent, let's make
sure we don't bail out this function (pkill returns 1 if the
process was not found)
@emkll
Copy link
Contributor

emkll commented Feb 5, 2019

On xenial, gpg still points to gpg (v1), no symlink. I think you are right @redshiftzero . Also I did have a source set up on that server, so a migration was necessary. It doesn't seem to generate an ossec alert, so I think we should keep a close eye on the apache logs during the QA period.

@redshiftzero
Copy link
Contributor Author

redshiftzero commented Feb 5, 2019

ok so one idea that should handle this situation (as well as validate the theory above re: this ValueError being due to the private key migration) is to run the migration manually, e.g. gpg2 --homedir=/var/lib/securedrop/keys --import < /var/lib/securedrop/keys/secring.gpg during postinst of securedrop-app-code, let me give that a try

@@ -97,6 +97,12 @@ case "$1" in
echo allow-loopback-pinentry > /var/lib/securedrop/keys/gpg-agent.conf
fi

# Migrate private keyring to gpg2.1 if needed
if [ ! -d "/var/lib/securedrop/keys/private-keys-v1.d" ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good approach for the upgrade story, in future we can remove these postinst steps (after a few releases)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably never can. We had instances that were pre 0.4 recently, and that release was a looooooong time ago.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So while we have instances that had not ran their Ansible playbooks since before 0.4, the server side was up to latest. Since we can monitor what version the securedrop-app-code package is on via the source interface metadata endpoint (and soon we'll be able to monitor also the base OS via the source interface metadata endpoint), we should be able to remove this from postinst once we have everyone on xenial and post-0.12.0.

@redshiftzero redshiftzero changed the title [xenial] gpg 2.1+ compliance + fix decryption of journalist replies [wip] [xenial] gpg 2.1+ compliance + fix decryption of journalist replies Feb 6, 2019
@redshiftzero
Copy link
Contributor Author

OK so I tested 9796519 via the following procedure:

  1. Provision staging VMs
  2. sudo do-release-upgrade to upgrade to xenial
  3. Upgrade securedrop-app-code package

But 9796519 did not work: I still saw the exception, because the migration failed to work due to the fact that importing secret keys appeared to require the secret key passphrase, which we don't have (a behavior change in gpg2.1, similar to why 826aa96 was necessary). Furthermore, thinking about this a bit more, even if this did work, postinst would not be the right place for trying to run the manual migration because organizations will be updating to 0.12.0 before they do the upgrade to xenial, and the a manual migration will only work if organizations are already using gpg 2.1. So we need to go back to the drawing board on this one.

Since the note @emkll found won't produce an OSSEC alert (and recall source-error.log does not exist in prod), and this high-priority bug is resolved by this PR, I suggest we should merge this in as is. That said, it is still an unhandled exception, so I'm going to file a followup, and we should try another method to resolve if we have time before 0.12.0 (since the fix will be smaller than the diff in this PR, it's something we could do during QA depending on how things go).

Note to reviewers: I've made no other changes since the testing was done above (I force pushed to drop 9796519), so an additional round of testing is not necessary.

@redshiftzero redshiftzero changed the title [wip] [xenial] gpg 2.1+ compliance + fix decryption of journalist replies [xenial] gpg 2.1+ compliance + fix decryption of journalist replies Feb 7, 2019
Copy link
Contributor

@kushaldas kushaldas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on comments from @redshiftzero in #4094 (comment) 👍

@redshiftzero redshiftzero merged commit ff4e0db into develop Feb 7, 2019
@redshiftzero redshiftzero deleted the xenial-pinentry branch February 7, 2019 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants