Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework Ansible restore role to detect Onion service version mismatch #4801

Merged
merged 1 commit into from
Sep 13, 2019

Conversation

rmol
Copy link
Contributor

@rmol rmol commented Sep 11, 2019

Status

Ready for review

Description of Changes

Make the restore playbook extract the Tor configuration from the backup, retrieve the app server's Tor configuration, and compare them. If the service versions differ, abort the restore with an explanation.

Fixes #4786.

Testing

  • Install the latest 1.0.0 release candidate on hardware or VMs, specifying only v2_onion_services with sdconfig/site-specific.
  • Check out this branch on your admin workstation.
  • Create a marker file under /var/lib/securedrop/store to test the restore. Send a submission through the source interface, or simply create a file manually. If you've got the time to try a largish file, great.
  • Run securedrop-admin backup to backup the v2-only system.
  • Delete the marker file.
  • Run securedrop-admin restore to restore that backup. Confirm that the restore worked by verifying the presence of the marker file.
  • Add v3 with sdconfig, then securedrop-admin install, then securedrop-admin tailsconfig.
  • Run securedrop-admin restore with the backup file created earlier. The restore should abort with this message:
The Tor configuration on the app server offers version 2 and 3 services.
The Tor configuration in this backup offers version 2 services.

Restoring a backup with a different Tor configuration than the server 
is currently unsupported. If you require technical assistance, please 
contact the SecureDrop team via the support portal or at 
[email protected].

Deployment

This reworks the restore playbook, replacing the restore.py script previously copied to the server with equivalent Ansible tasks, preventing hangs when Tor is restarted on the server. Those changes are straightforward but deserve scrutiny.

Checklist

If you made changes to securedrop-admin:

  • Linting and tests (make -C admin test) pass in the admin development container

If you made non-trivial code changes:

  • I have written a test plan and validated it for this PR

Copy link
Contributor

@conorsch conorsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't run these changes yet, only performed a visual review. Thoughts on implementation:

Would simply diffing the files be sufficient check to trigger the mismatch? The logic to inspect the HiddenServiceDirs specifically is sound, that's definitely going to cover the intended use case. If we call out to diff, we have the option of displaying the differences to Admins (and no secrets are stored in torrc, so fine to write to stdout). Curious whether @zenmonkeykstop feels it'd be useful to have diff output readily available to Admins, whether for support comms or solo troubleshooting. If diff is available by default in Tails, then moot point.

Other than that, these changes look solid. I don't have strong opinions about Python-vs-Ansible for handling the tarball extraction and service bounces. The change to Ansible appears to make the individual steps more obvious, and should also help with identifying specific errors.

@@ -736,6 +736,10 @@ def restore_securedrop(args):
# but the securedrop-admin
# script will be invoked from the repo root, so preceding dirs are likely.
restore_file_basename = os.path.basename(args.restore_file)

# Would like readable output if there's a problem
os.environ["ANSIBLE_STDOUT_CALLBACK"] = "debug"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the "debug" setting left over from development? If we want to ship the change, we should be sure it doesn't clutter up e.g. the graphical updater output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that was intentional, to present the output of compare_torrc.py more legibly. This should only affect the environment in the restore function.

when: torrc_check_dir.path is defined

- name: Copy backup to application server
synchronize:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: copy module is sufficient here, since we're just copying a single file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was that with partial: yes an interrupted upload of the backup could be resumed.

become: no
fetch:
src: /etc/tor/torrc
dest: "{{ torrc_check_dir.path }}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First thought was that we needn't copy the torrc back to the Admin Workstation, but doing so allows us to compare the files before we copy the potentially-quite-large tarball to the remote. Great choice. 👍

Make the restore playbook extract the Tor configuration from the
backup, retrieve the app server's Tor configuration, and compare
them. If the service versions differ, abort the restore with an
explanation.
@rmol rmol force-pushed the robuster-restorer branch from d90c02d to 530c1c4 Compare September 12, 2019 03:56
@rmol
Copy link
Contributor Author

rmol commented Sep 12, 2019

Re: simply diffing -- I was originally just using Ansible's stat.checksum to detect differences, but it's useful to determine how they differ, to be able to say which side has v2, v3, or both. It was going to be more useful when I was hoping to provide guidance for reconfiguring to enable the restore, but I ran out of time.

@conorsch
Copy link
Contributor

conorsch commented Sep 12, 2019

Started full review, documenting progress below.

Status: In progress

  • Install the latest 1.0.0 release candidate on hardware or VMs, specifying only v2_onion_services with sdconfig/site-specific.
  • Check out this branch on your admin workstation.
  • Create a marker file under /var/lib/securedrop/store to test the restore. Send a submission through the source interface, or simply create a file manually. If you've got the time to try a largish file, great.
  • Run securedrop-admin backup to backup the v2-only system.
  • Delete the marker file.
  • Run securedrop-admin restore to restore that backup. Confirm that the restore worked by verifying the presence of the marker file.
  • Add v3 with sdconfig, then securedrop-admin install, then securedrop-admin tailsconfig.
  • Run securedrop-admin restore with the backup file created earlier. The restore should abort with this message:

@conorsch conorsch self-requested a review September 12, 2019 21:27
Copy link
Contributor

@conorsch conorsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does what it says on the tin! All steps successful, up to and including the expected failure during restore attempt of v2 tarball against v3-enabled remote. Thanks for the super clear test plan here, @rmol. No requested changes.

@eloquence eloquence added this to the 1.0.0 milestone Sep 13, 2019
@kushaldas kushaldas merged commit b35f85b into freedomofpress:develop Sep 13, 2019
emkll added a commit that referenced this pull request Sep 13, 2019
[1.0.0] backport #4801 Rework Ansible restore role to detect Onion service version mismatch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Restoring from a v2 backup on an instance with v3 enabled breaks v3 onion services
4 participants