CI/CD to build and sanity check docker container and release executables #231

CMCDragonkai · 2021-08-29T10:46:51Z

The final built executables coming out of vercel/pkg are not tested when uploaded as pre-release executables.

We should run a final test on those, maybe a simplified test that runs just the --version or --help page to ensure that the executable actually runs on the target environment. This can help us know whether the final packaged executables are actually working mostly.

This requires our CI/CD to expand to include all environments:

Linux
Windows - https://about.gitlab.com/blog/2020/01/21/windows-shared-runner-beta/
Mac - @joshuakarp knows the details about this in Secret Vault Schema #4

We have these stages:

  stages:
    - check
    - build
    - release

This might work as an extra stage: qa to mean "quality assurance". And it needs to use raw environments (not based on our Nix image that we use to build things).

Tasks

- Investigate adding an extra stage qa that enables Linux, Windows, and Mac environment runners, that means this stage runs on an entirely different runner compared to check, build and release.
- Download the executables from the release point. This tests whether our release point is up and running. If the release point is down, the test should retry after some time, as this depends on network availability.
- Run a sanity check over the executables such as --help and --version to get the help page and the version page respectively.
- Ensure that this job is interruptible.

Regarding docker:

- Create the docker image with nix-build ./release.nix -A docker
- test that this container works with docker run -it name-version:hash.
- example: docker run -it matrixai_typescript-demo-lib-1.0.0:f3wf1wxvkdbbmrhxx9rhr5qw2skbrah6
- see TypeScript-Demo-Live README.md for more details
- the TMPDIR may be used by PK, if so, use this https://gist.github.com/CMCDragonkai/94e6550302aa7ebc175c06819927605a
- inspect the image contents for correctness by using this https://gist.github.com/CMCDragonkai/d7a652ada27289ab59e28bc914c942f6
- do we need a cacert and tzdata?
  - cacert may be needed in case we want to contact external websites, but this is not a thing for the PK agents hosted on testnet.polykey.io
  - tzdata shouldn't be necessary since we don't operate on any timezones, we assume OS time is UTC time

The text was updated successfully, but these errors were encountered:

CMCDragonkai · 2021-08-29T10:48:00Z

Remember Gitlab provides many CI/CD runners. But I suspect this will require running a separate runner on the last stage. I'm not sure if the configuration allows this. But we must investigate this.

Also consider what additional costs there is, and whether it is free for Open Source, and we have replicated this to the Open Source group.

CMCDragonkai · 2021-08-29T10:56:49Z

Should also do a hash check to ensure that the expected hashes of the released executables are in fact what we expect. That means during the release stage, we should be calculating the hashes from nix-build and then publishing those as well on the release page. Note that the hashes used in the naming currently are nix hashes which are hashes of the input derivation expression, not the build outputs.

CMCDragonkai · 2021-08-29T11:03:01Z

This maybe relevant too https://nixos.org/manual/nix/unstable/command-ref/new-cli/nix3-store-make-content-addressable.html

Checking the hash of the release is just about ensuring integrity but it doesn't ensure authenticity. A signature would ensure authenticity. However combining the hash with an "authentic" trusted source is sufficient to ensure authenticity and integrity. Of course end-to-end authenticity is enhanced with a proper signature applied.

To dog-food Polykey, it would make sense to do a code-sign from the Polykey organisation gestalt rather than GPG since we stopped using GPG.

CMCDragonkai · 2021-11-01T05:34:48Z

The CI/CD check here should actually be done on the container image build. We should also do a check against the release executable builds while we're at it.

Note that the release.nix is what has all the specifications. The current docker attribute bundles up the cacert package. For the testnet and mainnet. I believe the cacert won't be required for NAT functionality as contacting external websites is only needed in the identities domain.

If however we are going to join all of the testnet nodes as part of the Matrix AI gestalt and mainnet could be in a different gestalt, it may in fact be required. We can keep the dependency in just for now.

CMCDragonkai · 2021-11-01T06:08:46Z

Useful gist notes about using docker and nix:

CMCDragonkai · 2021-11-05T01:06:30Z

Pushing this Nov 11 finish from Nov 10 as an additional last sanity check should be done once #269 is done. But this can be started without #269.

CMCDragonkai · 2021-11-05T06:25:58Z

Make sure to check that bundled files JSON and protobuf files are being put into the ./dist directory properly. Except of course the ./package.json.

See: MatrixAI/TypeScript-Demo-Lib#34 (comment) for discussion and fix in TypeScript-Demo-Lib.

If there are future files that need to be copied over, it's also important to consider the npm postbuild hook to copy over those non-transpiled files. These stages of "building" should be documented.

CMCDragonkai · 2021-11-07T03:49:09Z

Checking the build executables can be done in a number of ways automatically.

Application Emulation

The idea here is to directly execute the built artifacts in the same CI/CD container environment which currently is a "nix" container running on Linux. The advantage here is probably less resources being used, and all of this could be tested on our local development machines which are NixOS (or linux with Nix).

Disadvantage is that it's not a full test, we don't know for sure what might happen on the other platforms.

Linux - linux executables will be compiled for non-NixOS, this means we would need to be able to run the executables and simulate a non-NixOS environment: https://unix.stackexchange.com/questions/522822/different-methods-to-run-a-non-nixos-executable-on-nixos
Windows - WINE is the proper of doing this, it can do application emulation and we should be able to run the pk CLI via wine
MacOS - this is the toughest one, but there's a project called https://github.com/darlinghq/darling which aims to do WINE but for mac, we could try this and it might work
iOS/Android - not sure but the CLI isn't intended for these locked down platforms, although android does have an emulation ecosystem

Example: https://jake-shadle.github.io/xwin/#2-specify-runner This guy uses WINE to test a cross-compiled Rust executable.

System Emulation/Virtualisation

This relies on virtualisation, such as using QEMU to run a full system. This is closer to the real deal, but more complex and expensive.

Atm, I don't believe the containers inside Gitlab CI/CD are allowed to execute QEMU so we don't automate it that way. Instead this will rely on the CI/CD system itself to provide these extra environments. Gitlab has MacOSX and Windows guests available to be used, and these can then run tests against our built executables. Note that this means the tests are not running inside the Nix container anymore. Which leads us to our next problem.

What kind of tests?

These tests that run against the final built executable are not like the tests we have currently inside our Jest testing. Jest itself is just a test runner, it uses Node executes JS at the end of the day. The JS itself right now is importing code directly from src and testing this. Even now our tests/bin tests don't actually do a child process execution. They are lighterweight by calling main function directly and mocking the STDOUT/STDERR... etc. And even if they were to actually do a proper process exec, it's again done against the src/ code, and not against the final built executables.

So for testing the final executables, the tests have to shell out out and execute the commands and then testing what is the expected side-effects. This can be done in jest by using child process exec, and all we are doing is using the JS to run child processes.

If we use a library like https://github.com/raingerber/jest-shell-matchers, this allows us to create expectations and matchers that can execute external executables.

However there are 2 challenges with this:

We may need to copy some or alot of the tests in tests/bin to instead test external executables. Some of the tests can work without any change since all we do is swap out the usage of main with executing an external executable.
Using this means all the platforms we're doing tests in will need to be able to run jest and node. That is the test utilities has to work in that environment.

If abandon jest entirely, then we would need a new test harness, and then platform-specificity is an issue. For example if we were to write tests in a shell language like bash, this may work easily on Linux and MacOS, but not as easily in Windows which doesn't have bash. We could create platform specific ways for every platform, Linux and Mac can use bash, while Windows uses powershell, but this just ends up duplicating our tests/bin 2-times.

We know that node and jest can work on Windows, Mac and Linux. But we won't be able to use Nix in most of these extra environments, since these are meant to be non-Nix platforms. So it's just a matter of installing npm and node on those platforms and proceeding from there. I'm sure the CI/CD scripts have some suggested way of installing Node on those platforms. Windows may have chocolatey, and Mac may have homebrew. See the listed methods here: https://nodejs.org/fa/download/package-manager/

Also Gitlab docs may have some suggested ways of doing things too.

Portability of `tests/bin`

So does this mean we have to write some new tests like tests/bin-external? Or could we re-use some of the tests in tests/bin?

Surely we can swap out execution of main with direct execution of the real thing right?

That may be the case, but if we are using jest-mock-process, then that won't work, since the real process is not using the Node process mocks.

However if all of the mocking used by https://github.com/EpicEric/jest-mock-process is done within a wrapper function like pkWithStdio, then it may be possible to swap that out with the real implementation in which case the side-effects are real, and STDOUT, STDERR and exit code can be captured as well. Then a parameter can be passed into tests/bin to run the same tests on the external executable.

I'm sure some tests may not work though, and this requires a careful review.

tegefaulkes · 2021-11-08T07:06:50Z

I've done a little research into gitlabs CI/CD. It might be possible to test the built executables in native shell enviroments using a combination of tags, artefacts and dependencies.

We can us tags to specify a type of runner to run the job in. It seems like we can specify a runner that is a Linux bash shell or a windows Powershell this way.

Using job artefacts and dependencies we can ensure that the required built executable is copied between jobs.

I need to do some more research and testing for this but my impression is that it might be possible.

CMCDragonkai · 2021-11-08T07:28:15Z

It would be best to first try it in the TS-Demo-Lib. You can create new branches and run the CI/CD manually there.

Also beware of Mac:

First thing would be non-NixOS executables and Windows which will the easiest to test.

joshuakarp · 2021-11-09T05:05:06Z

The most recent CI/CD pipeline (after merging #284) has shown that the nix build job is failing, notably with similar errors to the following:

src/agent/GRPCClientAgent.ts:75:10 - error TS2742: The inferred type of 'vaultsGitInfoGet' cannot be named without a reference to '@matrixai/polykey/node_modules/@grpc/grpc-js'. This is likely not portable. A type annotation is necessary.
75   public vaultsGitInfoGet(...args) {
            ~~~~~~~~~~~~~~~~

This also relates to #200, where we hope to provide an explicit argument list on the GRPCClient function signatures (replacing ...args).

joshuakarp · 2021-11-10T07:01:16Z

Changed from starting on 8th Nov to 12th Nov, based on arising complexity from async-init and CLI MR on Gitlab !213.

tegefaulkes · 2021-11-12T02:14:41Z

We can add a report from jest in the CI/CD. Might be usefull.
registry.gitlab.com/matrixai/engineering/maintenance/

tegefaulkes · 2021-11-12T02:49:29Z

Other things we can add to improve the speed of the pipeline is caching the nix/store and node_modules so they don't need to be downloaded for each job.

CMCDragonkai · 2021-11-12T03:20:33Z

The `/nix/store` is actually cached. Just in S3 so it's downloaded from S3 to here. As for node modules, I believe that is actually cached as well, since it is in the `/nix/store`. At least for the commands doing `nix-build`. The `nix-shell` ones is indeed doing `node_modules` stuff. However the CI/CD is a bit complicated right now, best not to change the CI/CD infrastructure just yet until we worked out all the requirements.

…

On 11/12/21 1:49 PM, Brian Botha wrote: Other things we can add to improve the speed of the pipeline is caching the |nix/store| and |node_modules| so they don't need to be downloaded for each job. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#231 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAE4OHMSFTSTZQ37EII2XP3ULR6EJANCNFSM5DADQM5Q>.

tegefaulkes · 2021-11-15T05:53:54Z

I have most of the ci/cd config worked out now.
Currently I have a problem with js-polykey building with pkg.it seems that the fd-lock package is causing the issue with node-gyp-build. with the error.

> [email protected] install /nix/store/m7kvh5qjjgm7yfw9n6cccr7b7bc763wm-node-dependencies-_at_matrixai_slash_polykey-1.0.0/@matrixai/polykey/node_modules/fd-lock
> node-gyp-build

sh: /nix/store/m7kvh5qjjgm7yfw9n6cccr7b7bc763wm-node-dependencies-_at_matrixai_slash_polykey-1.0.0/@matrixai/polykey/node_modules/.bin/node-gyp-build: /usr/bin/env: bad interpreter: No such file or directory

We've seen this problem before and AFAIK it should be fixed. so I don't know why it's happening with fd-lock. I dont remember how it was fixed however.

I've tried including fd-lock inside of the Typescript-demo-lib and added a test for it. there it seems to build and run just fine. I've compared the nix build process between the two repos and I can't find any differences.

Do you have any ideas @CMCDragonkai

CMCDragonkai · 2021-11-15T06:08:34Z

I had a look through and couldn't find any issues where we ever talked about this. But it does seem familiar. Is it workable inside a nix-shell?

tegefaulkes · 2021-11-15T06:24:21Z

Running pkg inside the shell seems to work. trying to do a nix-build of just the application gives the same error.
This seems to suggest that the problem is not with the pkg but when installing the modules during the build process.

I think this has something to do with the installation process trying to build the prebuilds instead of downloading them.

tegefaulkes · 2021-11-15T07:16:29Z

In fact wasn't this fixed by preventing node-gyp-build from building anything in the first place?
I think since node2nix is providing everything then there is no need to actually build the modules?
Shouldn't dontNpmInstall = true be fixinging this since it's preventing the install/build with NPM?

CMCDragonkai · 2021-11-15T07:21:34Z

It is definitely possible for node-gyp-build to really build the C++ code. I reckon that's leveldown and such is done right? Does this nix-build inside TS-Demo-Lib?

tegefaulkes · 2021-11-16T04:02:48Z

If I fix it to 4.2.3 it works. 4.3.0 doesn't work.

tegefaulkes · 2021-11-16T06:22:07Z

But for now it seems nix-build runs fine in the CI/CD environments so I'm looking at what progress I can make there.

tegefaulkes · 2021-11-16T06:42:27Z

GRPCCientClient and GRPCClientAgent may need explicit return types for streams. Currently the compiler is complaining about this with the error

src/client/GRPCClientClient.ts:272:10 - error TS2742: The inferred type of 'vaultsLog' cannot be named without a reference to '@matrixai/polykey/node_modules/@grpc/grpc-js'. This is likely not portable. A type annotation is necessary.
272   public vaultsLog(...args) {
             ~~~~~~~~~
src/client/GRPCClientClient.ts:352:10 - error TS2742: The inferred type of 'keysCertsChainGet' cannot be named without a reference to '@matrixai/polykey/node_modules/@grpc/grpc-js'. This is likely not portable. A type annotation is necessary.

tegefaulkes · 2021-11-16T07:28:32Z

js-polykey is now building in ci/cd

joshuakarp · 2021-11-16T08:32:00Z

That's awesome news

tegefaulkes · 2021-11-16T08:56:12Z

It is building and copying over the artifacts but the tests are failing now. It's just missing some of the required files so that's an easy fix. Right now its relating to this issue MatrixAI/TypeScript-Demo-Lib#34 where the .json files are missing.

CMCDragonkai · 2021-11-16T10:41:15Z

Nice!!!

CMCDragonkai · 2021-11-16T10:41:26Z

What was the problem?

tegefaulkes · 2021-11-17T00:42:24Z

The problem just seems to be when running nix-build locally. it just seems to work in the runners. I don't know why yet.

CMCDragonkai · 2021-11-17T01:35:11Z

Try cleaning your Nix cache and see what happens. Details here: https://nixos.org/manual/nix/unstable/package-management/garbage-collection.html

Note it will have to reinstall when you do nix-shell.

You should do it when you're not in any nix-shells.

tegefaulkes · 2021-11-17T01:47:32Z

Good news, the js-polykey tests are succeeding now. Only the docker and macos tests needs to be fixed now.

tegefaulkes · 2021-11-17T02:01:24Z

Try cleaning your Nix cache and see what happens. Details here: https://nixos.org/manual/nix/unstable/package-management/garbage-collection.html

Note it will have to reinstall when you do nix-shell.

You should do it when you're not in any nix-shells.

Nope, that doesn't fix it.

tegefaulkes · 2021-11-17T03:16:38Z

Docker test is working now.

tegefaulkes · 2021-11-17T03:47:30Z

Macos test is working now. just need to work out a small fix for it.

tegefaulkes · 2021-11-17T04:09:11Z

All of the CI/CD tests are passing now!
The tests are just running Polykey without any arguments and expecting it to run with an exit code of 0. as it stands now we're just testing that commander runs. We want to do at minimum a test that starts a polykey agent and stops it.

tegefaulkes · 2021-11-17T08:29:56Z

I added a flag to pk agent start to do a test start and stop of polykey. I'm using this in the tests now.

tegefaulkes · 2021-11-18T03:51:10Z

Ok, right now the ground work has been laid out an we have basic tests running for the following enviroments.

running with npm run polykey in node
running the application build in node
running the docker image in docker.
running the packaged linux build in a linux environment .
running the packaged windows build in a windows environment.
running the packaged macos build in a mac environment.

The tests so far just run a quick test I made in agent start that starts and stops polykey with full logging. if it exits with a code 0 we assume the test succeeded.

The docker test is just running the docker image with no flags for now so it's just printing out the help page.

I also expanded the test jobs out into multiple runners, so testing should complete quicker now.

tegefaulkes · 2021-11-18T04:00:59Z

Note: It's possible to take the output from the build stage and use it in the publish stage. this should be addressed at some point.

tegefaulkes · 2021-11-18T05:30:53Z

Also note, Nix-build fails when building locally but runs fine in the CI/CD. so far as I can tell it's failing due to the version of node-gyp-build. when node-gyp-build is 4.2.3 nix build patches the shebang and it runs fine. but if it's version 4.3.0 then nix doesn't patch the shebang and the build fails. As to why this is happening I am not sure and I'm running out if ideas to test.

There are comments above where I am troubleshooting it.

joshuakarp · 2021-11-29T02:24:40Z

Adjusting end date to Wednesday December 1 (from November 19 - delayed from refactoring work in #283), to give time for review and merge.

CMCDragonkai · 2021-12-13T00:30:11Z

Should incorporate the cached compiler path: #296 (comment)

CMCDragonkai · 2021-12-14T06:41:30Z

The networking tests has been changed since async start was integrated. I'm removing these changes so that I can see what the original code was meant to do and fix these problems. This is why it's not a good idea to use be using beforeEach and afterEach for things that is the unit under test. @tegefaulkes.

CMCDragonkai · 2021-12-14T06:42:13Z

@tegefaulkes all other tests which has this behaviour needs to be reverted back to the way it was before with the unit under test actually being in each test function.

CMCDragonkai · 2021-12-14T06:51:20Z

I remember there was incorrect usage of getters in ForwardProxy as well. Getters should be used only for mostly static properties that are readable but mutable by the inside. Otherwise getProxyHost() is the correct way of doing this to indicate that this involved side-effects.

CMCDragonkai · 2022-01-09T07:31:50Z

This is now all working. Successful build and QA sanity check here: https://gitlab.com/MatrixAI/open-source/js-polykey/-/pipelines/444133744

Note that due to staging, and expected test failures from nodes and related functionality, right now master is expected to have a failed pipeline. However in qa-testing when skipping testing, all of this worked.

Furthermore, QA sanity check is very basic. All it is checking is whether the program itself can run. By running the help command. That's it. This is a good first step.

Subsequently we can reuse our test-suite that we are using in tests/bin to also apply it in the other platforms. This will require some abstractions to build later. But for now this is done. In production we'll just do manual testing in the other platforms later.

CMCDragonkai · 2022-01-09T07:32:32Z

In the process of working on this, we discovered issues regarding licenses and --public vs --public-packages option used by pkg. Worth understanding for future: #292 (comment)

CMCDragonkai added the procedure Action that must be executed label Aug 29, 2021

CMCDragonkai assigned tegefaulkes Aug 30, 2021

CMCDragonkai added the epic Big issue with multiple subissues label Nov 1, 2021

CMCDragonkai mentioned this issue Nov 1, 2021

Testnet Node Deployment (testnet.polykey.io) #194

Closed

4 tasks

CMCDragonkai changed the title ~~CI/CD to sanity check on final built executables in all environments~~ CI/CD to sanity check on docker container and release executables Nov 1, 2021

CMCDragonkai changed the title ~~CI/CD to sanity check on docker container and release executables~~ CI/CD to build and sanity check docker container and release executables Nov 1, 2021

CMCDragonkai mentioned this issue Nov 5, 2021

Pkg cannot resolve JSON modules MatrixAI/TypeScript-Demo-Lib#34

Closed

This was referenced Nov 10, 2021

Testnet securely maintain a pool of recovery codes #285

Closed

Update testnet.polykey.io to point to the list of IPs running seed keynodes #177

Closed

NAT-Traversal Testing with testnet.polykey.io #159

Closed

tegefaulkes mentioned this issue Nov 18, 2021

Adding QA jobs and expanding test jobs for CI/CD. #292

Merged

14 tasks

CMCDragonkai mentioned this issue Dec 12, 2021

Session Bin Test Fixes #296

Merged

12 tasks

CMCDragonkai mentioned this issue Dec 20, 2021

Supporting trusted seed nodes (nodes domain refactoring, static sources for seed nodes) #289

Merged

26 tasks

CMCDragonkai closed this as completed in #292 Jan 9, 2022

CI/CD to build and sanity check docker container and release executables #231

CI/CD to build and sanity check docker container and release executables #231

Comments

CMCDragonkai commented Aug 29, 2021 • edited Loading

Tasks

CMCDragonkai commented Aug 29, 2021

CMCDragonkai commented Aug 29, 2021 • edited Loading

CMCDragonkai commented Aug 29, 2021

CMCDragonkai commented Nov 1, 2021

CMCDragonkai commented Nov 1, 2021

CMCDragonkai commented Nov 5, 2021

CMCDragonkai commented Nov 5, 2021 • edited Loading

CMCDragonkai commented Nov 7, 2021

Application Emulation

System Emulation/Virtualisation

What kind of tests?

Portability of tests/bin

tegefaulkes commented Nov 8, 2021

CMCDragonkai commented Nov 8, 2021

joshuakarp commented Nov 9, 2021

joshuakarp commented Nov 10, 2021

tegefaulkes commented Nov 12, 2021

tegefaulkes commented Nov 12, 2021

CMCDragonkai commented Nov 12, 2021 via email

tegefaulkes commented Nov 15, 2021

CMCDragonkai commented Nov 15, 2021

tegefaulkes commented Nov 15, 2021

tegefaulkes commented Nov 15, 2021

CMCDragonkai commented Nov 15, 2021

tegefaulkes commented Nov 16, 2021

tegefaulkes commented Nov 16, 2021

tegefaulkes commented Nov 16, 2021

tegefaulkes commented Nov 16, 2021

joshuakarp commented Nov 16, 2021

tegefaulkes commented Nov 16, 2021

CMCDragonkai commented Nov 16, 2021

CMCDragonkai commented Nov 16, 2021

tegefaulkes commented Nov 17, 2021

CMCDragonkai commented Nov 17, 2021

tegefaulkes commented Nov 17, 2021

tegefaulkes commented Nov 17, 2021

tegefaulkes commented Nov 17, 2021

tegefaulkes commented Nov 17, 2021

tegefaulkes commented Nov 17, 2021

tegefaulkes commented Nov 17, 2021

tegefaulkes commented Nov 18, 2021

tegefaulkes commented Nov 18, 2021

tegefaulkes commented Nov 18, 2021 • edited Loading

joshuakarp commented Nov 29, 2021 • edited Loading

CMCDragonkai commented Dec 13, 2021

CMCDragonkai commented Dec 14, 2021

CMCDragonkai commented Dec 14, 2021

CMCDragonkai commented Dec 14, 2021

CMCDragonkai commented Jan 9, 2022

CMCDragonkai commented Jan 9, 2022

CMCDragonkai commented Aug 29, 2021 •

edited

Loading

CMCDragonkai commented Aug 29, 2021 •

edited

Loading

CMCDragonkai commented Nov 5, 2021 •

edited

Loading

Portability of `tests/bin`

tegefaulkes commented Nov 18, 2021 •

edited

Loading

joshuakarp commented Nov 29, 2021 •

edited

Loading