Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve Travis concurrent jobs or consider Azure pipelines #118

Open
myii opened this issue May 16, 2019 · 36 comments
Open

Resolve Travis concurrent jobs or consider Azure pipelines #118

myii opened this issue May 16, 2019 · 36 comments

Comments

@myii
Copy link
Member

myii commented May 16, 2019

https://freenode.logbot.info/saltstack-formulas/20190516#c2180979:

- - -
09:59 myii @​gtmanfred A problem is beginning to manifest: as we convert more formulas to semantic-release (and generally get inspec testing working on formulas), our Travis queuing is slowing things down. It only appears to handle 3 jobs at a time; the most I've ever seen is 5, but that may have been on my own forks. So it's not uncommon to see 3/50 active jobs and when things get bad, it can take up to an hour to get results.I'm usually manually cancelling and rerunning (my own) jobs to minimise loads but contributors won't be able to do that. This load will only increase in time, as more formulas get converted. We could start limiting the number of jobs per formula but is there anything else that can be done?
10:06 slack1872 except paying for more concurrent builds, I'm not sure...
10:06 myii I reckon we're going to have to start limiting testing per formula.
10:07 myii Unless a formula has specific tests that need to be run on specific combinations of platform/salt-version.
10:08 slack1872 Same problem always pops: you had tests (which is great) and then your tests take too much time...We have the same problem @ work, with 8 kitchen VM created and highstate taking near 15 min. But on Jenkins at least we can run VM in parallel
10:09 myii What do you say to this idea:
10:09 myii 5 instances max, one from each os.
10:10 myii 2019.2 x2, 2018.3 x2 and 2017.7 x1.
10:12 myii We've got a nice set to choose from but now this is about how to select which ones to use per formula.
10:12 myii Is 5 sufficient for testing purposes?

https://freenode.logbot.info/saltstack-formulas/20190516#c2181226:

- - -
12:39 gtmanfred no, limit concurrent jobs
12:39 gtmanfred in travis
12:39 gtmanfred you can have as many runs as you want, but it will only run one per repo at a time

https://freenode.logbot.info/saltstack-formulas/20190516#c2181237:

- - -
12:41 gtmanfred if that doesn’t work, we can look into moving to azure pipelines
12:41 gtmanfred i have been playing with that recently and it is pretty good
12:41 myii Excellent.
12:41 gtmanfred not as easy to build a matrix automatically for, but still pretty easy
12:42 myii Look forward to it.
12:43 gtmanfred Here is a version that builds the matrix https://github.com/saltstack/pepper/blob/develop/azure-pipelines.yml
12:43 myii Looks decent.
12:43 gtmanfred and here is a version where you just define it https://github.com/gtmanfred/pepper/blob/f15ce586a02b377ef1f8bdb48e378c5505179074/azure-pipelines.yml
12:44 myii The loop is nicer.
12:44 gtmanfred agreed
12:44 gtmanfred one thing that it does not have is allowed_failures
12:44 gtmanfred and the yaml can be gross
12:45 myii I don't think we're really using that.
12:45 myii So it shouldn't be a problem.
12:45 gtmanfred There is also this limitation with the yaml https://twitter.com/resticbackup/status/1117327446456655873
12:45 gtmanfred it does not parse whole elements in dot net, it parses one line at a time
12:46 myii OK, good to be aware of that.
12:47 myii If this comes to fruition, we'll put these notes in a little doc.
@aboe76
Copy link
Member

aboe76 commented May 16, 2019

please also test cirrus-ci: https://cirrus-ci.org/#comparison-with-popular-ciaas

@myii
Copy link
Member Author

myii commented May 16, 2019

@aboe76 Would you mind bringing that up in the Slack room? More eyes will see it there. In the meantime, what do you think about limiting the number of runs per formula? Should it be one for each os; is that sufficient?

@aboe76
Copy link
Member

aboe76 commented May 16, 2019

most formula's only needs these different os's, :

  • debian (this covers ubuntu/debian derivites)
  • centos (this covers redhat/fedora derivites)
  • opensuse (because sles/susemanager stuff)
  • bsd

when switching to circus ci the concurency limit isn't a factor any more:
https://cirrus-ci.org/#key-highlights

  • free for opensource
  • no concurency limit
  • flexible execution
  • matrix builds

@javierbertoli
Copy link
Member

most formula's only needs these different os's, :
* debian (this covers ubuntu/debian derivites)
* centos (this covers redhat/fedora derivites)
* opensuse (because sles/susemanager stuff)
* bsd

I agree with @aboe76: having a 20+ images available does not mean that we need to test with all of them. I think that to test that a formula works, the rationale should be something like:

  1. Latest Salt + py3 on
  • debian
  • centos
  • opensuse
  • bsd
  1. If something is known to fail or behaves different with
  • a particular distro (ie, fedora),
  • distro release (ie, centos-6)
  • salt version (ie, 2018)
  • python version (ie, py )

tests for the required combination of the above should be added to the test matrix.

I think that part of the review of PRs should check the testing matrix to be enough but not an excess. Testing is easy, knowing what and when to test is not 😋

Also, cirrus-ci sounds interesting 👍

My 2c

@myii
Copy link
Member Author

myii commented May 17, 2019

So after some discussion on Slack prompted by @aboe76, we've got Cirrus CI enabled for the whole org. Some specific notes about BSD:

- - -
21:07 gtmanfred you would need to use vagrant for bsd
21:08 gtmanfred realistically though, you could use kitchen passthrough, to run the commands on the bsd instances, instead of creating a new docker container to run the tests on
21:10 gtmanfred when you get to it, see how we using the proxy for windows on appveyor https://github.com/saltstack/salt-jenkins/blob/develop/.kitchen.appveyor.yml
21:10 gtmanfred it should translate to bsd probably

I've started work on testing Cirrus CI with cron-formula but I've not yet had success getting it to play nicely with kitchen. There'll be some special invocation but there aren't any examples out there in the wild that I could use as references.

@n-rodriguez
Copy link
Member

I've started work on testing Cirrus CI with cron-formula but I've not yet had success getting it to play nicely with kitchen. There'll be some special invocation but there aren't any examples out there in the wild that I could use as references.

https://cirrus-ci.org/guide/writing-tasks/#additional-containers

@myii
Copy link
Member Author

myii commented May 18, 2019

So a first example of the difference between 15 and 5, thanks to @daks:

Instances Running time Total time
15 16 min 24 sec 37 min 20 sec
5 7 min 17 sec 10 min 47 sec

@aboe76 @javierbertoli Are these the 5 images you're suggesting or did you mean stick with the os_family images and only use the os variants when necessary?

By the way, also note that we're leaving all of the platforms available in kitchen.yml, so that any/all can be tested locally.

@aboe76
Copy link
Member

aboe76 commented May 18, 2019

@myii I would prefer to stick with the os_family variants, and then a little bit conservative,
so:
'RedHat' => use centos images
'Debian' => use debian images
'Suse' => use opensuse leap images if possible.
'Bsd' => ??

Having to wait 16 minutes to have a test run for a PR is a bit much...

@myii
Copy link
Member Author

myii commented May 18, 2019

@aboe76 Thanks for that. What do you think of my comment I've just made after you suggested that: saltstack-formulas/ufw-formula#7 (comment)?

@aboe76
Copy link
Member

aboe76 commented May 19, 2019

Yes @myii, I think that is a good comment, like with the testing of the the formula works on OS, the same thing can be applied to salt versions.

The testing should be able to spot regressions and issues earlier, but not in a way that is hindering or slowing everything down. I know about edge cases that we might miss, but I don't think that most of the community will be better served if we stick to the broad road instead of narrow paths.

myii added a commit to myii/template-formula that referenced this issue May 19, 2019
…s#118)

* The selected 5 in a "T"-shaped matrix:
  - 2019.2 (py3): debian-9   fedora-29   opensuse-leap-15
  - 2018.3 (py2):           ubuntu-1604
  - 2017.7 (py2):             centos6
* Covers each `os`, each Salt version and both Python versions
@myii
Copy link
Member Author

myii commented May 19, 2019

So #121 is a suggested workaround to use for the time being. Please have a look so that this selection can be propagated throughout all of the formulas using 10+ instances (i.e. all semantic-release formulas to date).

@myii
Copy link
Member Author

myii commented May 19, 2019

I've just noticed that Limit concurrent jobs has been set to 1 in Travis. I'm not sure that's such a good idea now, since it reduces the benefit of having these run in parallel, to get releases done quickly. Very busy periods aren't that common, so at quiet times, it really slows things down. I've unset this again for the time being but feel free to discuss this if it is an issue.

myii added a commit to myii/template-formula that referenced this issue May 22, 2019
…s#118)

* The selected 6 in a "tree"-shaped matrix:
  - 2019.2 (py3): debian-9   centos7   opensuse-leap-15
  - 2018.3 (py2):     fedora-29   ubuntu-1604
  - 2017.7 (py2):            centos6
* Covers each `os`, each Salt version and both Python versions
javierbertoli added a commit that referenced this issue May 22, 2019
ci(travis): reduce matrix down to 5/6 instances (ref: #118)
saltstack-formulas-travis pushed a commit that referenced this issue May 24, 2019
## [2.1.12](v2.1.11...v2.1.12) (2019-05-24)

### Continuous Integration

* **travis:** improve recommended matrix usage comment ([b08a0fd](b08a0fd))
* **travis:** reduce matrix down to 6 instances (ref: [#118](#118)) ([a8834e2](a8834e2))

### Documentation

* **contributing:** add `bind-formula` to `semantic-release` formulas ([3da78b0](3da78b0))
@myii
Copy link
Member Author

myii commented May 28, 2019

So a little update. With some help from the Cirrus folk and working together with @javierbertoli, we've started making progress with Cirrus CI. Some parts of the developments are mentioned from this point forward, for anyone interested in more of the details. Or to get straight to the output, you can see it here: https://cirrus-ci.com/build/5647302937542656.

I'd like to pull in @alxwr here: Cirrus CI allows testing on *BSD as well. Is this something you'd be interested in helping out with? Or for whoever fancies taking this on, this comment above is pertinent to that.

We're also seriously considering keeping both Travis and Cirrus. Travis for the the release process (i.e. commitlint and semantic-release), with Cirrus for the actual testing.

To gauge whether this is really going to make a significant difference, we need some "lambs to the slaughter" -- suggestions of formulas that can be converted to this hybrid system. Preferably formulas that involve *BSD, MacOS and Windows as well, so that we can test those features as well. It's not really much of a risk since we can always re-enable the matrix in Travis again if it doesn't work out.

@n-rodriguez
Copy link
Member

n-rodriguez commented May 28, 2019

We're also seriously considering keeping both Travis and Cirrus. Travis for the the release process (i.e. commitlint and semantic-release), with Cirrus for the actual testing.

Why not... :) I didn't reply until now because I was thinking about the CI config file : it's cool to have a performant CI but it's cool to have nice config files too ;) And IMHO, Azure's config file is not very sexy, so not very easy to understand and to maintain/evolve. On the other hand, Cirrus' config file is as easy to understand than Travis one so it's a good point for Cirrus.

@myii
Copy link
Member Author

myii commented May 28, 2019

@n-rodriguez You spoke too soon... wait until we start adding *BSD, MacOS and Windows!

@myii
Copy link
Member Author

myii commented May 28, 2019

Just noting another benefit of keeping the release process in Travis: all of the GitHub tokens that have already been set across the repos. Having to do that again for Cirrus isn't appealing.

@myii
Copy link
Member Author

myii commented May 28, 2019

https://cirrus-ci.org/faq/#are-there-any-limits:

Are there any limits?

There are no limits on how many VMs or Containers you can run in parallel if you bring your own compute services or use Compute Credits for either private or public repositories.

Cirrus CI has following limitations on how many VMs or Containers a single user can run for free for public repositories:

  • 8 Linux Containers or VMs
  • 2 Windows Containers or VMs
  • 2 FreeBSD VMs
  • 1 macOS VM

Which means that a single user can run at most 13 simultaneous tasks for free.

No per repository limits

Cirrus CI doesn't enforce any limits on repository or organization levels. All the limits are on per user basis. For example, if you have 10 active contributors to a repository then you can end up with 130 tasks running in parallel for the repository.

@myii
Copy link
Member Author

myii commented May 28, 2019

In terms of ensuring all checks are passing before allowing merges, it's probably best to leave that to the GitHub UI rather than cross-checking from Travis and Cirrus:

If there really was a need to check, there's probably a way via. the GitHub API:

@myii
Copy link
Member Author

myii commented May 28, 2019

Here's the first PR merged using Cirrus CI alongside Travis CI:

Note, we enabled all of the instances in the matrix for initial testing. We'll reduce that eventually, if all goes well.

@myii
Copy link
Member Author

myii commented May 28, 2019

@myii
Copy link
Member Author

myii commented May 28, 2019

First negative points to collect:

  1. There's no (easy?) way to cancel and re-run jobs.
  2. Concurrency per user can end up quite painful -- I'm still waiting for the merge of the first PR above to complete before the checks on the second PR can begin... it might delay me by 30 minutes (this is an extreme case).
  3. Each instance runs slower than the Travis equivalent, maybe even half the speed (1.5 minutes vs. 3 minutes).
  4. Instances can take a long time to start up; we're talking about sometimes even 10 minutes before any start up.
  5. The instances sometimes end up restarting without any clear reason why. This was the answer I got in response to that:

Cirrus CI uses preemptible instances for cost optimization for free OSS builds so instances sometimes got preempted but Cirrus then automatically reschedules it on a regular instances. This is a rare event.

Unfortunately, I've already seen it happen a few times.

@myii
Copy link
Member Author

myii commented May 29, 2019

Some more bad news, unfortunately.

Reduced the test matrix back to 6 (like our reduced Travis runs) but the first stage of the PR (before merge) hasn't completed yet: https://cirrus-ci.com/build/5778854061277184 -- over an hour.

Update: I went away from this comment for a few moments and it's just finished. Now to try the PR merge to see how long that takes...

myii added a commit to myii/ufw-formula that referenced this issue Jun 28, 2019
* Use balanced matrix based on `template-formula` guidelines
* Initial ref: saltstack-formulas/template-formula#118
myii added a commit to myii/syslog-ng-formula that referenced this issue Jun 28, 2019
* Use balanced matrix based on `template-formula` guidelines
* Initial ref: saltstack-formulas/template-formula#118
myii added a commit to myii/ufw-formula that referenced this issue Jun 28, 2019
* Use balanced matrix based on `template-formula` guidelines
* Initial ref: saltstack-formulas/template-formula#118
myii added a commit to myii/ufw-formula that referenced this issue Jun 28, 2019
* Use balanced matrix based on `template-formula` guidelines
* Initial ref: saltstack-formulas/template-formula#118
myii added a commit to myii/ufw-formula that referenced this issue Jun 28, 2019
* Use balanced matrix based on `template-formula` guidelines
* Initial ref: saltstack-formulas/template-formula#118
myii added a commit to myii/syslog-ng-formula that referenced this issue Jun 28, 2019
* Use balanced matrix based on `template-formula` guidelines
* Initial ref: saltstack-formulas/template-formula#118
myii added a commit to myii/syslog-ng-formula that referenced this issue Jun 28, 2019
* Use balanced matrix based on `template-formula` guidelines
* Initial ref: saltstack-formulas/template-formula#118
myii added a commit to myii/syslog-ng-formula that referenced this issue Jun 28, 2019
* Use balanced matrix based on `template-formula` guidelines
* Initial ref: saltstack-formulas/template-formula#118
myii added a commit to myii/syslog-ng-formula that referenced this issue Jun 28, 2019
* Use balanced matrix based on `template-formula` guidelines
* Initial ref: saltstack-formulas/template-formula#118
myii added a commit to myii/syslog-ng-formula that referenced this issue Jun 28, 2019
* Use balanced matrix based on `template-formula` guidelines
* Initial ref: saltstack-formulas/template-formula#118
myii added a commit to myii/fail2ban-formula that referenced this issue Jun 28, 2019
* Use balanced matrix based on `template-formula` guidelines
* Initial ref: saltstack-formulas/template-formula#118
myii added a commit to myii/locale-formula that referenced this issue Jun 28, 2019
* Use balanced matrix based on `template-formula` guidelines
* Initial ref: saltstack-formulas/template-formula#118
myii added a commit to myii/sudoers-formula that referenced this issue Jun 28, 2019
* Use balanced matrix based on `template-formula` guidelines
* Initial ref: saltstack-formulas/template-formula#118
myii added a commit to myii/postfix-formula that referenced this issue Jun 28, 2019
* Use balanced matrix based on `template-formula` guidelines
* Initial ref: saltstack-formulas/template-formula#118
myii added a commit to myii/prometheus-formula that referenced this issue Jun 28, 2019
* Use balanced matrix based on `template-formula` guidelines
* Initial ref: saltstack-formulas/template-formula#118
myii added a commit to myii/prometheus-formula that referenced this issue Jun 28, 2019
* Use balanced matrix based on `template-formula` guidelines
* Initial ref: saltstack-formulas/template-formula#118
myii added a commit to myii/grafana-formula that referenced this issue Jun 28, 2019
* Use balanced matrix based on `template-formula` guidelines
* Initial ref: saltstack-formulas/template-formula#118
myii added a commit to myii/grafana-formula that referenced this issue Jun 28, 2019
* Use balanced matrix based on `template-formula` guidelines
* Initial ref: saltstack-formulas/template-formula#118
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants