Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move CI to GitHub Actions #321

Open
wants to merge 35 commits into
base: main
Choose a base branch
from
Open

Conversation

ajbozarth
Copy link
Member

@ajbozarth ajbozarth commented Nov 20, 2024

PR for moving our CI from CircleCI to GitHub Actions. This work includes decoupling build and test steps from the push step as well as support multi-platform builds.

Implementation notes:

  • Each demo has it's own workflow
  • A workflow will only run when code specific to that demo is changed
  • ghcr.io is used for build and test steps to remove reliance on DockerHub credentials in forks
    • DockerHub auth will occur only when push is run
  • Build and test steps can also run against liboqs and oqs-provider main branches by manually triggering a workflow and setting the build_main flag to true.
  • A Run All workflow is included that triggers all the other workflows, this will be useful for times when all the CI need to be manually triggered, such a for a release or to test against lliboqs main.
    • I will be adding workflows to liboqs and oqs-provider to trigger this as followup work
  • The latest image will be pushed to DockerHub and ghcr.io when a workflow is run on the main branch, when not a fork, not a PR, and not building against liboqs main branch
    • The latest image is actually a manifest that points to new platform-based images latest-x86_64 and latest-arm64
    • If a build is manually triggered, the input value release_tag can be set that will replace latest in the above naming convention.
  • linux.yml has been deleted and quic.yml has been update to follow the above push rules, but otherwise current workflows have not been changed

Fixes #294 Fixes #213 Fixes #284

@ajbozarth ajbozarth self-assigned this Nov 20, 2024
@ajbozarth
Copy link
Member Author

ajbozarth commented Nov 20, 2024

Note that I have only opened this draft to allow for iterative feedback. It is far from complete and the current workflow is just my first working example (or MVP for agile folks).

I decided having an open draft would be ok once I discovered I could trigger workflows from local yaml files against branches on my fork via VSCode without needing to push commits. (edit: not actually true, so I'm pushing to a test branch while testing, then making a commit here once I know a workflow is ready)

@ajbozarth ajbozarth mentioned this pull request Nov 20, 2024
also update openssl3 workflow to only use the available 4 cores

Signed-off-by: Alex Bozarth <[email protected]>
Signed-off-by: Alex Bozarth <[email protected]>
move the build against liboqs/oqsprovider matrix builds
to a triggerable option not run automatically

Signed-off-by: Alex Bozarth <[email protected]>
@ajbozarth
Copy link
Member Author

Final update before I go on vacation til Monday. I added workflows for curl and httpd and moved the build against liboqs main from a matrix build to a trigger option. I've updated the description and previous commits to clarify my latest changes.

@Hawazyn
Copy link
Contributor

Hawazyn commented Nov 25, 2024

@ajbozarth I would like to add a linter for Docker files, such as Hadolint or should we rely on MegaLinter instead? What do you think?

@Hawazyn
Copy link
Contributor

Hawazyn commented Nov 25, 2024

I have noticed that some GitHub Actions are not using the latest versions, and I also observed the use of a fixed value with the -j flag. Wouldn’t it be better to rely on -j$(nproc) instead? What do you think?

@ajbozarth
Copy link
Member Author

I would like to add a linter for Docker files, such as Hadolint or should we rely on MegaLinter instead? What do you think?

This is a great idea, I'm not sure on which linter to use, I've never linted Dockerfiles before. Creating a linter GitHub Action would be a great task you can do in parallel to this PR. You would most likely need to do a lot of small updates to get the linter to pass initially though.

I have noticed that some GitHub Actions are not using the latest versions

How so? The actions I added check out and run the latest code locally, there's currently no versioning at all.

Wouldn’t it be better to rely on -j$(nproc) instead

This is something I'm aware of, on CircleCI it was being hard-coded to -j 18 and had a comment on the number of available cores in circle CI being 35. I tried to set it to -j$(nproc) in my workflows but it cause a string escape issue so I instead set it to -j4 since GitHub Actions only has 4 available cores. I plan to go back and see if I can get -j$(nproc) to work, but I didn't want that small issue to block my overall progress too long, so I set it aside for now.

@ajbozarth
Copy link
Member Author

A note here on my most recent update:

I had previously enabled triggering each workflow with a flag to use liboqs/oqsprovider main branches, I have moved this to a callable action (meaning it can be triggered by another workflow but not manually triggered). Instead I have created build.yml which is both trigger able and callable and runs all the workflows with the given input flag. This will enable the most common use case: letting liboqs or oqsprovider to remotely trigger a build and test of the demos using the latest code in their own workflows.

Signed-off-by: Alex Bozarth <[email protected]>
Signed-off-by: Alex Bozarth <[email protected]>
note this workflow does not include tests

Signed-off-by: Alex Bozarth <[email protected]>
note this workflow does not include tests

Signed-off-by: Alex Bozarth <[email protected]>
@ajbozarth
Copy link
Member Author

I just pushed a few more workflows, included in them are locust and wireshark, both of which are just build and have no test step:

  • In the case of wireshark this was how it was already in circleCI since it's a UI demo and doesn't have a way to run without X11. @Hayyaaf if you disagree with this and have an idea on a test command feel free to chime in as the new maintainer, and I can add it.

  • For locust though I am unsure if we want to include a test step, I read through the README and USAGE files and all the "check your work" instructions seem to rely on a UI. @davidgca as the maintainer of locust is there a command or set of commands you recommend using for the test step? Feel free to look at the other workflows I added in this PR for examples of how we test other demos. Including CI tests is left to your discretion as the maintainer.

Signed-off-by: Alex Bozarth <[email protected]>
Signed-off-by: Alex Bozarth <[email protected]>
Signed-off-by: Alex Bozarth <[email protected]>
Signed-off-by: Alex Bozarth <[email protected]>
Signed-off-by: Alex Bozarth <[email protected]>
@ajbozarth
Copy link
Member Author

I'm moving this out of draft since it is now in a "complete" state, that is it is fully functional for building and testing all our current demos. I still have quite a bit of work left to do as detailed in the following list, but this could be merge as is and further work done in a follow up PR if desired.

To Do's:

  • Add a step (or job) to each workflow that will publish the docker images to docker hub and ghcr.io
    • I have a few ideas on how to handle this and am still brainstorming the best way to add "push" to the CI, ideas I'm thinking about include:
      • should images be pushed automatically on every commit to main, this would occur only if the specific demo is edited. (Note this step would fail on main on forks due to perms, or if the user has configured their perms would cause an attempted push -> this could be addressed in the workflow, but I would need to find out how to distinguish a fork)
      • should we do "releases" using the Run All workflow? If so should the release tag be set as a workflow input? similarly should we enable releases for specific demos? (this would come automatically if we do so via the run all workflow)
      • should we ever release the images based on liboqs/oqsprovider main? if not should we specifically prevent this in the release workflow step?
  • Clean up current CI, I need to take a look at the current CI files and decide what to do with them. Some current thoughts on each workflow:
    • linux.yml this was a great launching off point by Michael, but I believe once I'm done we will just delete this file
    • quic.yml I'm torn on how to handle this workflow, it is really well written, but it handles the build/test/push for the two quic demos differently than how I handled the rest. No matter how it's handled I will need to update how it deals with push credentials.
    • docker-scan.yml This was a great add and I don't believe it will need to be touched, but we may want to consider updating it to run against PRs (which would require dealing with credential issues)
  • Follow up work potentially outside this PR:
    • Turn off CircleCI and remove related files
    • Add workflows to liboqs and oqsprovider that trigger the Run All workflow against their main branches. This would enable leveraging our demos as "tests" for those repos
    • Clean up old (and possibly broken) images remaining on DockerHub

My planned next step is to work on the adding push to the workflows, but with US Thanksgiving this week I might not get to it until next week

@ajbozarth ajbozarth marked this pull request as ready for review November 25, 2024 23:32
@ajbozarth ajbozarth added the enhancement New feature or request label Nov 25, 2024
Signed-off-by: Alex Bozarth <[email protected]>
@SWilson4
Copy link
Member

SWilson4 commented Dec 5, 2024

A couple of months back I wrote a workflow to push multiarch images for our CI containers using the ARM runner (so no emulation required). Maybe a similar approach would be helpful here? See this workflow.

@ajbozarth
Copy link
Member Author

Maybe a similar approach would be helpful here? See this workflow.

I knew there was a way to do this with docker, but I hadn't thought of a way to leverage it in GitHub workflows until looking at yours. Having separate jobs like that would be overkill, but I can add a dependent job (ie a job that waits for the matrix builds to finish) that just runs the docker manifest command to connect the separately tagged images. This would still require deciding how we want to separately tag the platform images though.

@baentsch
Copy link
Member

baentsch commented Dec 7, 2024

A couple of months back I wrote a workflow to push multiarch images for our CI containers using the ARM runner (so no emulation required). Maybe a similar approach would be helpful here? See this workflow.

Much better: There's hardly any performance/runtime differential between x64 and arm64. Thanks for that pointer @SWilson4 !

The alternate solution would be to tag each platform image differently, and choose one to also tag latest. This could be done with a few conditionals in the workflow. (we would need to determine the tag to use and how to derive it from the matrix value in the workflow)

That would be sensible.

There's also the option to only partially address #213 and build/test both platforms but only push the amd64 image. This would require the least edits to the workflows.

That would be less sensible: If we (spend the cycles to) build it, we should also push it.

If not addressed, the current workflows would push the arm64 image second (since it takes longer to build using the emulator), overwriting the amd64 image

That also would not be sensible (nor necessarily always be the case if we'd be able to use fast arm64 runners): Given the hint by @SWilson4 above, I'm very much against keeping using an emulator (see open-quantum-safe/tsc#5) to begin with.

The current build-in-parallel-and-push-after-jobs-merge workflow in linux.yml already successfully does what we're discussing here, no? See for example https://github.com/open-quantum-safe/oqs-demos/actions/runs/12034148163

--> Could that approach be used in all jobs (after switching from emulation to real HW)?

@baentsch
Copy link
Member

baentsch commented Dec 7, 2024

PS: Works like a charm: See https://github.com/open-quantum-safe/oqs-demos/actions/runs/12212192366. But does require CI code duplication (edit/add: Different "runs-on" jobs). Question: Can one specify different runners for different parts of a matrix build? Edit/add: Simple reference to "oqs-arm64" doesn't seem to cut it:

platform:
- linux/amd64
# - linux/arm/v6
# - linux/arm/v7
- oqs-arm64
runs-on: ${{ matrix.platform }}
Other proposals welcome.

@ajbozarth
Copy link
Member Author

Based on @baentsch comments I realized I missed the real takeaway from @SWilson4 example: the runners. After a bit of research I found why I had not previously considered that solution: GitHub does not currently provide linux/arm64 runners.

After some more research and testing I was able to update the workflows to leverage runners instead of QEMU. This was only possible due to the existence of the oqs-arm64 self-hosted runner.

I also update the push to tag the images latest-x86_64 or latest-arm64 and then created a followup job to create and push a manifest connecting the two images to latest (or whatever tag was inputed by a manual call). In this case I was actually able to split out the job into a separate workflow, something that I was unable to do with earlier steps due to minor differences between demos.

Implementation Note:

For the new push job I wanted the if check at the top level so it would skip the job when not pushing rather than "running" the job and not doing anything. The downside is that at the top level the env is not accessible, so I had to duplicate the value of env.push

Potential security risks:

While researching runners I ran into the issue that forks do not have access to self-hosted runners, meaning that when the workflows run on a forks main branch they will never start the oqs-arm64 runner job and should timeout after 6 hours. It will still run on a PR though, but the docs warn of security issues depending on how the self-hosted runner is set up. Since I have no access to the runner (I only know it exists because other workflows reference it) I can't tell if this is a concern.
It is worth noting that GitHub plans to make arm64 runners available in Q1, which would replace the self-hosted runner and any potential security issues it has.

Remaining ToDo

  • Add documentation

Remaining Open Discussion:

@ajbozarth
Copy link
Member Author

Small follow up to the above comment:

  • Looks like the openvpn test script is a bit finicky and sometime fails, this should be address in a follow up PR since it would need to be fixed in the script not the workflow.
  • This PR nearly fixes Automate and streamline docker image generation #284 by adding automatic releases when updates are pushed to main and enabling triggering of version releases, but it does not add automated releases of versioned or latest images upon releases of upstream projects (liboqs/oqsprovider). I would suggest that be handled upstream, or manually triggered. I can add Automate and streamline docker image generation #284 to the "Fixes" list if everyone thinks this dresses it enough.

@Hawazyn
Copy link
Contributor

Hawazyn commented Dec 9, 2024

Hi @ajbozarth,

Thank you for your patience, and I apologize for the delay. I've just finished the implementation, and you can find the docker-lint.yml file here. Please review it and let me know your feedback or suggestions.

@ajbozarth
Copy link
Member Author

I've just finished the implementation, and you can find the docker-lint.yml file here. Please review it and let me know your feedback or suggestions.

@Hawazyn I would suggest opening a PR for review

@baentsch
Copy link
Member

I would suggest that be handled upstream [...] add #284 to the "Fixes" list if everyone thinks this dresses it enough.

I'd be OK with the latter if you'd be willing to add code run at/to the upstreams to facilitate that, either by way of adding snippets to the documentation in the upstream release Wikis or by creating CI code triggered by upstream releases: OK for you @ajbozarth (may be part of the documentation ToDo?)

Remaining Open Discussion:

Is the definition of env.push correct? Comment detailing the issue: 

#321 (comment)

I'd think this would also be resolved if the corresponding trigger is done in the upstream CI at release time, no?

Signed-off-by: Alex Bozarth <[email protected]>
@ajbozarth
Copy link
Member Author

I'd be OK with the latter if you'd be willing to add code run at/to the upstreams to facilitate that

Once this PR is merged to main I can open PRs on liboqs and oqsprovider adding a small workflow to tigger the build.yml workflow.

Based on this I will need to update env.push to also not run when input.build_main is true (which it will be when called from liboqs and oqsprovider.

@ajbozarth
Copy link
Member Author

Once this PR is merged to main I can open PRs on liboqs and oqsprovider adding a small workflow to tigger the build.yml workflow.

I opened open-quantum-safe/oqs-provider#587 as a draft, but for some reason the new workflow didn't run, so I'll have to wait to debug it until this is merged. Once I get that working I will add the same workflow to liboqs

Copy link
Member

@baentsch baentsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for agreeing to add release triggers, @ajbozarth . Also very good this is now using the native arm64 runner. The only thing open then from my side is documentation: What's triggered when and how and for which demos? What's a blueprint ("bare minimum CI") for adding new demos? Where and when are (which versions of) the images pushed to?

@ajbozarth
Copy link
Member Author

The only thing open then from my side is documentation: What's triggered when and how and for which demos? What's a blueprint ("bare minimum CI") for adding new demos? Where and when are (which versions of) the images pushed to?

Where do you suggest I add this? Should it go in a new section on the README (if so where in the README) or should I create a new md file?

@ajbozarth
Copy link
Member Author

Where do you suggest I add this? Should it go in a new section on the README (if so where in the README) or should I create a new md file?

@baentsch further followup on my question: I noticed that our contributor docs are in the GitHub wiki and not a markdown file. Is there a motivation for this? The wiki can only be edited with write access and can't take PRs. I think that doc would be the best place to add the CI docs, but I can only contribute it if it's in a markdown file instead of the wiki. Would it be appropriate to move that content into a new CONTRIBUTING.md file?

Signed-off-by: Alex Bozarth <[email protected]>
@ajbozarth
Copy link
Member Author

@baentsch So I decided to go through my suggestion above about the CONTRIBUTING.md and move the wiki content to the new file and added a CI section to the bottom. This means we could get rid of the wiki page if you're ok with this change.

I also updated the README in a few places and added workflow badges to the table for each demo (I left the openssl3 one at the top since it's not in the table?).

Lastly I added a workflow template file for future use. Though I put it in the templates dir I did not add the metadata file to propagate it to the GitHub Action UI. It's just there as a starting point for new demos.

@ajbozarth
Copy link
Member Author

Update on the liboqs/oqs-provider upstream CI integration

This is much more complicated than I expected. In short there is no way to do what I wanted to, that is to add a workflow to liboqs/oqs-provider that runs the build.yml here and then shows success/failure in the CI for those repos. The method I intended to use try to run the workflow against the repo that runs it, meaning it will run the workflows against liboqs repo contents instead of oqs-demos repo.

The proper way of triggering a workflow from another repo is to make a repository_dispatch call. This is done by making a curl POST call from the liboqs workflow that will then trigger a workflow here, but won't track its results or show any helpful feedback in liboqs's CI. In addition repository_dispatch works completely different than workflow_call or workflow_dispatch and has incompatible inputs, meaning I would need to make a lot of edits to build.yml. Also repository_dispatch would require setting up a Personal Access Token for a maintainer that has write access to both repos to use in the workflow.

In Summary

Based on my research and understanding of what we want from this feature, it would actually be less work and hassle to either just run it manually after updates are made to liboqs/oqs-provider, or set up a cron job to run it (this would still require the edits to build.yml) once a week or something. @baentsch ?

@ajbozarth
Copy link
Member Author

ajbozarth commented Dec 10, 2024

Per my above comment I have pushed an update adding a cron job that will run all the workflows against liboqs and oqs-provider main once a week (Mon 1am). If we would rather use repository_dispatch instead I can revert the last commit and add that instead, but I believe this solution is cleaner and requires less work.

As for displaying the results, IIUC the badges I added to the README will only look at the latest run, so unless a new commit was push in the last week they will show the results of the weekly build against liboqs main. I believe this is an okay behavior (consistent with many other repos). This also means that we can add a badge for build.yml on liboqs and oqs-provider to display that week's results. If this is approved I will open PRs to add those badges after this is merged

Implementation Note: The explanation for the weird !contains() line can be found here: https://stackoverflow.com/a/73495922

@baentsch
Copy link
Member

Where do you suggest I add this? Should it go in a new section on the README (if so where in the README) or should I create a new md file?

@baentsch further followup on my question: I noticed that our contributor docs are in the GitHub wiki and not a markdown file. Is there a motivation for this? The wiki can only be edited with write access and can't take PRs. I think that doc would be the best place to add the CI docs, but I can only contribute it if it's in a markdown file instead of the wiki. Would it be appropriate to move that content into a new CONTRIBUTING.md file?

This is precisely the same issue/suggestion made #329 (comment). The reason for not doing it earlier was the simple lack of interest by more people to contribute. This has now changed with you and @Hawazyn being active, so the switch to a file is goodness and follows the same switch we did in liboqs and oqsprovider (for the same reasons :-). So, Thanks for doing it.

@baentsch
Copy link
Member

Based on my research and understanding of what we want from this feature, it would actually be less work and hassle to either just run it manually after updates are made to liboqs/oqs-provider, or set up a cron job to run it (this would still require the edits to build.yml) once a week or something. @baentsch ?

oqsprovider has a script to run for validating releases. It may be a good thing to add the above-mentioned "trigger-curl" to that (and add such script to liboqs too: @SWilson4 ?). Doing it by "manual-edit-then-cron" (if I understand the approach) sounds not quite right.

@Hawazyn
Copy link
Contributor

Hawazyn commented Dec 11, 2024

@ajbozarth This PR is quite extensive, and I really appreciate the time and effort you’ve put into it—it’s clear how much thought and care has gone into your work. While I’m eager to contribute, I feel a bit overwhelmed. If it’s feasible and doesn’t disrupt progress, could you kindly assign me specific tasks along with brief guidance? This would help me support your effort more effectively.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
5 participants