Test upgrade path in CI #1689

Open
2 of 5 tasks
redshiftzero opened this issue May 6, 2017 · 5 comments
Assignees
Labels
epic Meta issue tracking child issues goals: sick CI ops/deployment

Comments

@redshiftzero
Contributor

redshiftzero commented May 6, 2017

Related to #1681: it would be really great (at a future point) for CI to catch changes that do not break new installs but do break upgrades on existing instances.


Discussed on 2018-05-27 between @msheiny, @conorsch, @redshiftzero and @eloquence. Agreed upon initial scoping for this epic is as follows:

To be scoped further depending on the above:

  • Write logic for provisioning test image requirements, e.g. base box files via cache (a Packer build, as with the AWS CI nodes for staging, seems reasonable) - estimate 25 hr
  • Wire up CI to run once daily. I suspect we'll have to use Jenkins for this at this time. - estimate 4 hr
  • Write logic for alerting team members (Slack? Gitter?) of a failing test - estimate 1 hr
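
A minimal sketch of the alerting item above, assuming a Slack-style incoming webhook that accepts a JSON body with a `text` field. The function names, job name, and URLs are hypothetical and not part of any existing SecureDrop tooling; a Gitter integration would need a different payload.

```python
import json
import urllib.request


def build_alert(job_name: str, build_url: str) -> bytes:
    """Build the JSON body for a failing-test notification."""
    text = f"Upgrade test failed: {job_name} ({build_url})"
    return json.dumps({"text": text}).encode("utf-8")


def send_alert(webhook_url: str, payload: bytes) -> int:
    """POST the payload to the webhook (URL is an assumption) and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping payload construction separate from the network call means the message format can be checked in CI without contacting the chat service.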
@msheiny
Contributor

msheiny commented Jun 2, 2017

So I'm really thinking we need this as part of the 0.4 release - we can time-box it, but I think it's worth trying to bring in. Testing upgrades in particular is a huge time-drain, and it's going to hurt more when we try to test the nuances of the jump between Ansible versions and Tails versions.

(taking some of this convo from chat)
I see testing strategy for this issue as a two part problem:

  1. ensure users are able to upgrade Tails from 2 -> 3 without breaking their ability to reach their servers and perform upgrades.
  2. ensure we do not break the server when running a new playbook against a 0.3.12 server

Re: 1 - Client-side testing for this scenario is going to be a pain in the ass to automate, at least as the docs currently describe it (create a backup Tails stick, upgrade from 2 -> 3, test that web/ssh/ansible access still exists). We can do particular pieces of it, but overall I think this is best left as a manual check in QA. :(

Re: 2 -- This should be much easier to do but still has a few hairy moving pieces. High-level workflow I see:

  • Create an image of a running 0.3.12 server (staging) - remove Tor auth keys (so there isn't a conflict)
  • Clone off the image during the CI run
  • Resprinkle the Tor auth pieces
  • Spin up an ephemeral apt server and point the SecureDrop CI server at it
  • Perform upgrade and record results

Need to figure out how much time this takes to run and, depending on that answer, the best place to run it in the pipeline. The more I type this out, the more I realize it's probably going to be a 2 week debug + implementation process.... maybe that isn't a good fit for pre-0.4 after all.... anyways.. I need more tea.

tl;dr - I'm confused about whether adding this to 0.4 is a good idea or not. Glad I typed it out in a bunch of coherent sentences 🌮 🎉 🚲

@msheiny
Contributor

msheiny commented Feb 13, 2018

index

@msheiny
Contributor

msheiny commented Feb 14, 2018

Sooo @dachary brought up some really good points out of band in chat that I need to paste here. The gist was that he requested that what we run in CI should match completely what can be run by developers.

This is a valid request, and in the past I've always tried to make CI as close as possible to what's run locally by developers. The challenge with SD in particular, though, is that SD is designed for physical hardware: our kernels explicitly do not enable certain VM guest features, which makes it break on clouds like AWS, and more importantly the cloud providers we work with so far do not offer nested virtualization. The last part is the biggest issue - having nested virtualization would allow us to use the same workflow and spin-up logic as developers.
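
As a rough sketch of how a CI node could check for nested virtualization before attempting the developer workflow, the snippet below reads the `nested` parameter that the Linux `kvm_intel` and `kvm_amd` modules expose under `/sys/module`. The function names are hypothetical, and this only reports whether the KVM module on that host has nesting switched on; it says nothing about what a given cloud provider allows.

```python
from pathlib import Path
from typing import Iterable

# Module parameter files exposed by the Linux KVM modules when loaded.
DEFAULT_PARAMS = (
    Path("/sys/module/kvm_intel/parameters/nested"),
    Path("/sys/module/kvm_amd/parameters/nested"),
)


def parse_nested_flag(value: str) -> bool:
    """Interpret the parameter contents; kernels report 'Y'/'N' or '1'/'0'."""
    return value.strip() in ("Y", "1")


def nested_virt_enabled(param_files: Iterable[Path] = DEFAULT_PARAMS) -> bool:
    """Return True if any readable parameter file reports nesting as enabled."""
    for param in param_files:
        try:
            value = param.read_text()
        except OSError:
            # Module not loaded, or not a Linux host: try the next candidate.
            continue
        if parse_nested_flag(value):
            return True
    return False
```

A job could call `nested_virt_enabled()` up front and fail fast with a clear message instead of hitting an obscure error when the inner VM fails to boot.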

So @dachary and I specifically started talking about the ability to run nested virtualization in public clouds. I've done a little research on the big three (note, I've stricken DigitalOcean from this since they do not have the ability to tightly scope API creds as far as I know):

We have credits for Azure + AWS. Most of our code experience is with AWS, though I've started to play with Azure a little. Folks on the team have some GCloud experience.. and there are enough Ansible modules... So it would probably be fine. There's definitely a spin-up cost though.

Anyways... I'm fine with putting in this effort and I think it's worthwhile (it would probably be a lot more stable to provision to be honest) BUT it should be understood there is a time cost from adding this support. So I'm not sure with those changes if I can hit the release date target for 0.6 with this added scope.

@msheiny
Contributor

msheiny commented Feb 14, 2018

Okay so it sounds like we are going to pivot this ticket slightly and aim for running under nested virtualization. Going to break this into two tickets:

@redshiftzero redshiftzero modified the milestones: 0.6, 0.7 Feb 17, 2018
@msheiny
Contributor

msheiny commented May 16, 2018

As a new direction for this ticket and to scope the work, I propose the following high-level goals:

[omitted]

(We discussed this on 2018-05-17. Agreed upon scoping moved to top of ticket for visibility. -- @eloquence)


3 participants