Replace custom Ansible config logic with community roles #2360

conorsch · 2017-09-22T21:36:02Z

Feature request

Description

SecureDrop uses Ansible for first-run provisioning during install time, and also for applying major configuration updates over time (for upgrades that require Admin intervention). To date all of the Ansible logic is custom, written specifically for SecureDrop, which increases the maintenance burden for the project. We'd do better to position ourselves downstream from community-maintained solutions for managing basic services (e.g. apache, apt, ssh, tor), and take advantage of more robust hardening solutions, rather than porting changes to our custom config.

Rationale

If we lean heavily on configuration logic already maintained by the community, then the SecureDrop configuration becomes a declarative collection of YAML vars files, and the rest of the logic is maintained upstream. Here are a few strong candidates for inclusion into this project:

FPF maintains a number of roles that provide the desired functionality externally, and that work cannot be easily included in into the development, testing, QA, and release workflow for SecureDrop. Examples include the grsecurity kernel build process (see #2031), the OSSEC build logic (recently added to this repo in #1668), OSSEC install logic, and general building Debian packages.

By reusing community work we'll likely begin upstreaming improvements and thereby expand the knock-on effects of SecureDrop development, as well as potentially pulling more would-be contributors into the fold.

Technical details

Any implementation for including external contributions for server hardening should meet the following criteria:

Maintains trust with GPG-signed release tags.
Easy for maintainers to update into the future.
Simple for Admins to use (no complex git operations required).

Usage of Ansible Galaxy directly is ruled out by 1), since ansible-galaxy install <role> makes HTTPS calls and performs no additional verification of the downloaded tarball. Using git submodules is a common approach for managing remote dependencies in complex projects, and may satisfy 2), but would not meet muster on regarding 3).

Therefore I propose using git subtree --squash to commit the external roles directly to the repository, complete with a summary of commit messages. We can use this pattern both to add the external roles, and to update them over time, if necessary. PR reviews would should the entire diff of changed files, rather than just a commit hash change, as would be the case with a submodule.

Given the upcoming Xenial migration (#1530), we'd do well to lean on community solutions, otherwise we'd have to manually add support for Xenial to all the roles we maintain. It's common practice for Ansible role authors to support multiple distributions in a single role, via vars overrides. Given the recent push toward config testing (#1616), we can confidently evaluate the new role behavior by running the same suite of config tests to validate system state.

User Stories

As a SecureDrop Admin, I want my SecureDrop instance to use best practices for hardening the configuration.

As a SecureDrop maintainer, I want to utilize existing community work to configure and harden SecureDrop instances, rather than duplicating work by porting solutions that could instead be directly imported.

As a free software developer, I want to call attention to successful projects closely related to use cases I care about, and engage those maintainers so our various projects can mature over time.

The text was updated successfully, but these errors were encountered:

ghost · 2017-09-22T23:13:04Z

Relying on community maintained roles is a great move. Choosing a role is difficult because we have to assert the quality of each role. It would be great if we had a set of roles consistent with each other and maintained with a common QA process but... not yet ;-)

Here are my two cents regarding the criteria when evaluating a role (or any other Free Software project we depend on really).

Easy for maintainers to update into the future.
- Active developer, responding to bugs within less than a week, agreeable to changes that are in the scope of the role, likely to last for at least two years
- Integration tests (i.e. starts a service in a VM and verify the service is running)
Simple for Admins to use
- Complete reference documentation (i.e. no need to dig into the sources to figure out what a variable does)
- Is used by at least 100 people
- Has been around for at least two years
- Is standalone (i.e. does not depend on other roles providing some generic abstractions or something similar)

I don't know of any role matching these requirements so I'm not suggesting it should be an absolute rule. But I believe each criterion significantly increases the maintainability/usability of the role. And in the end it needs to be compared to the burden of manually maintaining a role.

conorsch · 2017-09-22T23:35:26Z

@dachary Great feedback, thanks for sharing. All of the roles I mentioned above pass the criteria you specify. The considerations you list should be folded into the contributor guidelines for this project when/if we start pulling in external roles.

As for the testing and CI story, I do not expect to run the test suites for the individual roles themselves as part of testing SecureDrop, but rather rely on the SecureDrop-specific app and config tests that validate end state based on the changes provided by the roles. Happy to discuss further, but that seems sanest to me.

I'll note further that many of the roles mentioned above are already packaged as part of base images for general use in backend services at FPF, so we ourselves have been using these roles regularly for several years already. The dev-sec roles in particular have had a lot of research go into them, and I'd like to take advantage of the interest there to improve the baseline security posture of SecureDrop.

Also, I've started breaking up the grsecurity-related roles that FPF maintains (freedomofpress/ansible-role-grsecurity#109), so they can be included into this repository, which should make it more straightforward to test and release new kernel versions for use with SecureDrop.

ghost · 2017-09-22T23:39:12Z

Regarding https://github.com/geerlingguy/ansible-role-apache as a candidate to replace install_files/ansible-base/roles/app/tasks/install_and_harden_apache.yml, I think it is more trouble than it is worth because:

it does not implement everything (removal of /var/www/html, does not support removal of multiple site-enabled files, no support for securedrop_worker.conf ...) so we end up keeping a good part of the existing install_and_harden_apache.yml, the associated variables and templates
it is not integration tested (i.e service is started but no check it actually answers)
most of the maintenance effort is on the cross platform aspects which is of no interest to us (i.e. we would be better off with an ubuntu specific role with more features)

ghost · 2017-09-22T23:43:07Z

I'll note further that many of the roles mentioned above are already packaged as part of base images for general use in backend services at FPF

that's a very powerful argument in favor of using these roles. I strongly believe the best software in the world is the one you're using right now. I'm only half joking, distorting the saying "the best beer in the world is the one you have in your hand" ;-)

conorsch · 2017-09-22T23:46:08Z

Or, put another way:

the best dogfood in the world is in the dish you're eating from

I'm only half joking. 😜

heartsucker · 2017-12-13T10:30:30Z

I'm actually against using these roles directly. Last I checked, and I can't find proof otherwise at the moment, there can be collisions between namespaces. Also, often the logic is convoluted because they are meant to work on every distro and not just Ubuntu like we're targeting. Additionally, we have lots of interdependencies between things like our IP tables rules, DNS servers, app/mon/firewall IP addresses, and so on. I also think SD is more paranoid than the average ansible package, and we would want additional changes on top.

I think it would be better to use them as references and upstream tweaks where we can. My gut says that abstracting everything away would be more trouble than it's worth.

conorsch · 2017-12-14T01:29:51Z

Last I checked, and I can't find proof otherwise at the moment, there can be collisions between namespaces.

We're actually more likely to run into collisions given our SD-specific role logic, since we haven't been diligent about sticking to the best practice of prefixing all role vars with the role name, precisely to avoid namespace collisions. All the roles linked to in the OP adhere to that practice.

Also, often the logic is convoluted because they are meant to work on every distro and not just Ubuntu like we're targeting.

Some are, true, but as #1530 shows, we do need to support more than just Ubuntu Trusty, and rewriting roles to expand platform support when we could simply depend on upstream seems like wasted effort.

Additionally, we have lots of interdependencies between things like our IP tables rules, DNS servers, app/mon/firewall IP addresses, and so on.

The spaghetti-like interdependencies you mention are precisely what I'd like to eliminate. As far as making sure that a single value propagates down to several roles in separate namespaces, yet stays DRY, we fortunately have the ability to do that now via group_vars for the Application and Monitor Servers, which landed in #1196. As of 0.4 we're already using those group_vars/all to set sane defaults for the servers.

I think it would be better to use them as references and upstream tweaks where we can. My gut says that abstracting everything away would be more trouble than it's worth.

Fair, but continuing to maintain internal roles means more changes like #2730—whereas if we'd pulled in the dev-sec.ssh-hardening role, we'd get that change and much more, plus be able to delete a sizable chunk of our SD-specific Ansible logic.

Final point: the install-time vulnerability patched in 0.4.4 would almost certainly have been mitigated by the use of external roles. The tor management role linked above does not have that bug, and never did. As for the ntp package, a number of community-maintained apt management roles could be reviewed and confirmed to pass muster. External roles often have more eyes on them than our roles do. For better or for worse, that's the case, and we'd do well to respond accordingly by leveraging the community's attention and review history.

heartsucker · 2017-12-14T10:44:54Z

Ok, those are fair points. I have worked at a couple of places that used ansible, and it was always spaghetti, so I kinda assumed it's not quite possible to keep things clean as I've seen with Chef. Also, this isn't exactly my area of significant contribution, so whatever y'all decide is probably for the best. Was just tossing my thoughts out there.

ghost · 2017-12-14T11:19:16Z

it was always spaghetti,

I believe using molecule fixes that. It greatly contributes to clarity in the securedrop-club ansible repository https://lab.securedrop.club/main/securedrop-club/tree/master/molecule

…r ossec_alerts

eloquence · 2021-01-08T21:40:42Z

Leaving this open per discussion with @conorsch at backlog review:

It may still make sense to replace some custom logic, but our general bias these days is to reduce the Ansible footprint overall as much as possible (e.g., by handling configuration logic in the packages; e.g., Move application configuration logic away from Ansible and into the .deb package #2641).
For changes we make here, we need to do a careful analysis as to whether or not they really reduce our own maintenance burden.

Let's revisit in the context of overall improvements to the provisioning story.

conorsch added Ansible ops/deployment labels Sep 22, 2017

ghost added goals: Improve Ansible logic / smoother install and removed goals: Improve Ansible logic / smoother install Ansible labels Dec 4, 2017

conorsch mentioned this issue Dec 12, 2017

remove hmac-sha1 from sshd_config #2730

Merged

1 task

emkll referenced this issue Jan 16, 2018

Initial proof-of-concept for using signal cli on monitoring server fo…

bd621a8

…r ossec_alerts

redshiftzero mentioned this issue Mar 9, 2018

Periodically enforce state on running servers with ansible pull-like logic #3136

Open

eloquence added the needs/discussion queued up for discussion at future team meeting. Use judiciously. label Jan 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace custom Ansible config logic with community roles #2360

Replace custom Ansible config logic with community roles #2360

conorsch commented Sep 22, 2017

ghost commented Sep 22, 2017 •

edited by ghost

Loading

conorsch commented Sep 22, 2017

ghost commented Sep 22, 2017

ghost commented Sep 22, 2017 •

edited by ghost

Loading

conorsch commented Sep 22, 2017

heartsucker commented Dec 13, 2017

conorsch commented Dec 14, 2017

heartsucker commented Dec 14, 2017

ghost commented Dec 14, 2017

eloquence commented Jan 8, 2021 •

edited

Loading

Replace custom Ansible config logic with community roles #2360

Replace custom Ansible config logic with community roles #2360

Comments

conorsch commented Sep 22, 2017

Feature request

Description

Rationale

Technical details

User Stories

ghost commented Sep 22, 2017 • edited by ghost Loading

conorsch commented Sep 22, 2017

ghost commented Sep 22, 2017

ghost commented Sep 22, 2017 • edited by ghost Loading

conorsch commented Sep 22, 2017

heartsucker commented Dec 13, 2017

conorsch commented Dec 14, 2017

heartsucker commented Dec 14, 2017

ghost commented Dec 14, 2017

eloquence commented Jan 8, 2021 • edited Loading

ghost commented Sep 22, 2017 •

edited by ghost

Loading

ghost commented Sep 22, 2017 •

edited by ghost

Loading

eloquence commented Jan 8, 2021 •

edited

Loading