
[WIP] Adding Cloud Pipelines #90

Merged

Conversation

vsoch
Contributor

@vsoch commented Nov 5, 2019

Do not merge - this is a draft pull request!

This PR will encompass early work to add Google Cloud Pipelines to Snakemake. Specifically this means that:

  • the user selects --google-life-sciences provider (boolean) when running snakemake
  • GOOGLE_APPLICATION_CREDENTIALS and GOOGLE_PROJECT should be exported
  • a tarball of the working directory, named by its content hash, is uploaded to Google Storage (see the sketch after this list)
  • the worker uses a custom image to download and extract the working directory, and then run the script for the step in question.
  • the storage tarballs are deleted after the workflow completes, only if the user hasn't specified a flag to keep the cache.
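
For illustration, here is a minimal sketch (not the PR's actual code) of the packaging step described in the third bullet; the bucket name and object path are placeholders, and it assumes the google-cloud-storage client library:

    import hashlib
    import tarfile

    from google.cloud import storage


    def package_and_upload_workdir(workdir, bucket_name="my-snakemake-bucket"):
        """Tar the working directory, name the archive by its content hash,
        and upload it to Google Storage so the worker VM can fetch it."""
        archive = "/tmp/workdir.tar.gz"
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(workdir, arcname=".")

        # Name the archive by the hash of its bytes, as described above.
        sha256 = hashlib.sha256()
        with open(archive, "rb") as fh:
            for chunk in iter(lambda: fh.read(8192), b""):
                sha256.update(chunk)
        blob_name = "source/cache/%s.tar.gz" % sha256.hexdigest()

        client = storage.Client()  # picks up GOOGLE_APPLICATION_CREDENTIALS
        bucket = client.bucket(bucket_name)
        blob = bucket.blob(blob_name)
        if not blob.exists():
            blob.upload_from_filename(archive)
        return blob_name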

We have several technical questions that still need answering, which I'll summarize here:

  • upload of environment variables vs. secrets
  • another method to discover application credentials and project (without exporting the envar)
  • how to provide an entrypoint + command with && (we've tried almost all variations of this, and either the second command seems to run outside of the container, or the entire thing is parsed together and the result is "command not found"); see the sketch after this list
  • privileged containers OR custom addition of permissions (to allow running Singularity)
  • difference between action and resource environment
  • it would be cool if Google provided a function to choose a machine type based on these criteria, seems like it would be a common thing to need!
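
On the entrypoint/&& question above, one common workaround (a sketch, not necessarily what this PR settles on) is to make the shell the entrypoint and pass the whole chained command as a single argument to -c, so the && is interpreted inside the container rather than by the API layer; the image URI, storage URL, and commands here are placeholders:

    # Hypothetical Life Sciences v2beta action: the container image, storage
    # URL, and chained command are placeholders.
    action = {
        "imageUri": "snakemake/snakemake:latest",
        "entrypoint": "/bin/bash",
        "commands": [
            "-c",
            "wget -O /tmp/workdir.tar.gz $STORAGE_URL && "
            "tar -xzf /tmp/workdir.tar.gz -C /workdir && "
            "cd /workdir && snakemake",
        ],
    }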

A few notes for some choices I made for the implementation:

  • As discussed in our meeting, I was able to use the Google Discovery Build function against the manifest endpoint to create a client.
  • regions is another variable that could be derived from some call to the API (to get a default region), but most likely the requester for storage objects would want to choose a region that is closest to the objects, which might not be where they are. The default regions are all zones in the US, which I chose on the guess that the majority of Snakemake / GCP users are in the US. This might be entirely wrong, or a bad default.
  • the machine type is selected by comparing the requested memory and cores to those available, first eliminating the ones that are too small, and then choosing the smallest of those remaining. If the user provides a machine type prefix, we further filter the choices down to that (a sketch follows this list).
  • Google Storage (for public objects) often returns a 503 error (it popped up several times for me in testing yesterday); if this happens again, it might be a separate issue for the GS remote to look into handling - for most Google Cloud API calls, retrying is essential.
  • speaking of retrying - the functions here are run through a client function I wrote that retries three times and only then errors, because BrokenPipeError (and other random failures) are common enough to warrant it (also sketched below).
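
A rough sketch of the machine type selection just described (simplified; the function name is illustrative and the field names follow the Compute Engine machineTypes resource):

    def select_machine_type(machine_types, mem_mb, cores, prefix=None):
        """Pick the smallest machine type that satisfies the requested
        memory and cores, optionally restricted to a name prefix."""
        candidates = [
            m for m in machine_types
            if m["memoryMb"] >= mem_mb and m["guestCpus"] >= cores
        ]
        if prefix:
            candidates = [m for m in candidates if m["name"].startswith(prefix)]
        if not candidates:
            raise ValueError("No machine type satisfies the requested resources.")
        # Smallest remaining candidate: sort by cores, then by memory.
        return min(candidates, key=lambda m: (m["guestCpus"], m["memoryMb"]))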

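And a minimal version of the retry wrapper mentioned in the last bullet (a sketch, assuming a short, growing pause between attempts):

    import time


    def retry_request(func, *args, retries=3, wait=5, **kwargs):
        """Run an API request, retrying on transient failures such as
        BrokenPipeError, and only raise after the final attempt."""
        for attempt in range(1, retries + 1):
            try:
                return func(*args, **kwargs)
            except (BrokenPipeError, ConnectionError, OSError):
                if attempt == retries:
                    raise
                time.sleep(wait * attempt)
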
I suspect this will fail linting and I'll push again with fixes :)

Signed-off-by: Vanessa Sochat <[email protected]>
…eractive code bits to figure out where the entrypoint for an executor run is

Signed-off-by: Vanessa Sochat <[email protected]>
Signed-off-by: Vanessa Sochat <[email protected]>
…alk through and test for basic example

Signed-off-by: Vanessa Sochat <[email protected]>
…dir to storage cache, and then the instance can retrieve it.

Signed-off-by: Vanessa Sochat <[email protected]>
…rom pip and install from this branch

Signed-off-by: Vanessa Sochat <[email protected]>
Signed-off-by: Vanessa Sochat <[email protected]>
…achine type prefix, ready for draft PR

Signed-off-by: Vanessa Sochat <[email protected]>
@vsoch
Contributor Author

vsoch commented Nov 5, 2019

@johanneskoester this is a little confusing with black, because I installed the exact version that is run in the test suites, and there seems to be a difference in result. Here is what I'm doing locally:

$ black --check snakemake tests/*.py
All done! ✨ 🍰 ✨
51 files would be left unchanged.
$ black --version
black, version 19.3b0

Note that the version is the same as the one installed there. I also tried installing with conda in case it made a difference (it didn't). What am I missing?

…oogle cloud project on command line (derived from storage client or optional environment to override that)

Signed-off-by: Vanessa Sochat <[email protected]>
@vsoch
Contributor Author

vsoch commented Nov 5, 2019

That's so weird! The black formatting passed this time. I don't think I made any changes to that file. Must be ghosties.

@johanneskoester
Contributor

@vsoch did you select a draft PR? I still have a merge button here and the icon is green instead of grey in the list.

@vsoch
Contributor Author

vsoch commented Nov 5, 2019

oops I just totally forgot :/ Merging is still blocked so is it okay? I'll add WIP to the heading so others know.

@vsoch changed the title from "Adding Cloud Pipelines" to "[WIP] Adding Cloud Pipelines" Nov 5, 2019
@johanneskoester
Contributor

Yeah, no problem ;-).

@johanneskoester
Contributor

xref #91

@vsoch
Contributor Author

vsoch commented Apr 30, 2020

@johanneskoester I'll want to ask for your suggestions for how to reduce "cognitive complexity" of functions - I made them modular at a level I thought made sense, and most of the length for the ones flagged is due to really long WorkflowError messages. I'm happy to refactor in any way you think is better.

@vsoch
Contributor Author

vsoch commented May 8, 2020

Not sure what is going on with the build container image step, but it's been going for over 2 hours.


@johanneskoester
Contributor

That issue is resolved now.

@johanneskoester
Contributor

@johanneskoester I'll want to ask for your suggestions for how to reduce "cognitive complexity" of functions - I made them modular at a level I thought made sense, and most of the length for the ones flagged is due to really long WorkflowError messages. I'm happy to refactor in any way you think is better.

I have looked at them again, and I think they are indeed ok to remain like this. Sorry for the noise.

@johanneskoester
Contributor

GCloud credentials are now in place. I think you should be able to activate your test case here. In line with the kubernetes and tibanna tests, please move your test case into a separate file, tests/test_google_life_sciences.py, which shall be called in a separate job step.
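
For reference, a minimal sketch of what such a test file might look like, following the pattern of the existing kubernetes test; the run/dpath helpers come from tests/common.py, while the keyword arguments and remote prefix here are assumptions:

    import os
    import sys

    # Make the shared test helpers importable.
    sys.path.insert(0, os.path.dirname(__file__))
    from common import dpath, run


    def test_google_lifesciences():
        # Keyword names and the remote prefix are placeholders for this sketch.
        workdir = dpath("test_google_lifesciences")
        run(
            workdir,
            default_remote_prefix="snakemake-testing/test",
            google_lifesciences=True,
            google_lifesciences_cache=False,
        )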

@vsoch
Contributor Author

vsoch commented May 8, 2020

@johanneskoester what is the name of the envar secret - did you just call it GOOGLE_APPLICATION_CREDENTIALS? I'm adding the step in main.yml now, should be able to push to test as soon as I know that name!

@vsoch
Contributor Author

vsoch commented May 8, 2020

E.g., this is what I'd guess, let me know if I'm off

    env:
      AWS_AVAILABLE: ${{ secrets.AWS_ACCESS_KEY_ID }}
      GCP_AVAILABLE: ${{ secrets.GCP_SA_KEY }}
      GOOGLE_APPLICATION_CREDENTIALS: ${{ secrets.GOOGLE_APPLICATION_CREDENTIALS }}

@vsoch
Contributor Author

vsoch commented May 8, 2020

Dearest sonarcloud - you've had a change of heart! Because before you said my code was smelly. Maybe it's good smelly now :)

.github/workflows/main.yml - two review comments (outdated, resolved)
vsoch and others added 2 commits May 10, 2020 14:58
Reported to already be set without needing to be specified here.

Co-authored-by: Johannes Köster <[email protected]>
@sonarcloud

sonarcloud bot commented May 10, 2020

Kudos, SonarCloud Quality Gate passed!

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A), plus 2 Security Hotspots to review
  • Code Smells: 0 (rating A)
  • Coverage: no coverage information
  • Duplication: 0.1%

@vsoch
Contributor Author

vsoch commented May 11, 2020

@johanneskoester the test still did not run - I can't see settings/secrets, so I'll need your guidance about what needs to be checked.

@johanneskoester changed the base branch from master to google-life-sciences May 11, 2020 08:19
@johanneskoester
Contributor

You are on a fork. GitHub does not expose secrets to forks.

@johanneskoester merged commit 8fc06be into snakemake:google-life-sciences May 11, 2020
@johanneskoester
Contributor

Let's continue in PR #384.

johanneskoester added a commit that referenced this pull request May 11, 2020
* starting figuring out how to not remove wrappers

Signed-off-by: Vanessa Sochat <[email protected]>

* early work to add gcloud executor

Signed-off-by: Vanessa Sochat <[email protected]>

* getting basic bucket creation and test upload working, and adding interactive code bits to figure out where the entrypoint for an executor run is

Signed-off-by: Vanessa Sochat <[email protected]>

* updating to show debugging

Signed-off-by: Vanessa Sochat <[email protected]>

* moving change of working directory to after parsing config files

Signed-off-by: Vanessa Sochat <[email protected]>

* removing old cleanup wrappers logic

Signed-off-by: Vanessa Sochat <[email protected]>

* removing old cleanup wrappers logic

Signed-off-by: Vanessa Sochat <[email protected]>

* work on base skeleton - pipeline requests are created and I need to walk through and test for basic example

Signed-off-by: Vanessa Sochat <[email protected]>

* lots of work on executor for Google Life Sciences - we upload the workdir to storage cache, and then the instance can retrieve it.

Signed-off-by: Vanessa Sochat <[email protected]>

* release 5.7.4 has a bug looking for a workflow to have a workflow attribute

Signed-off-by: Vanessa Sochat <[email protected]>

* more bug fixes to google science pipeline, also want to use version from pip and install from this branch

Signed-off-by: Vanessa Sochat <[email protected]>

* adding docker base example to google cloud tests

Signed-off-by: Vanessa Sochat <[email protected]>

* omg successful run! ✨

Signed-off-by: Vanessa Sochat <[email protected]>

* restoring to original (working) execution with custom image, adding machine type prefix, ready for draft PR

Signed-off-by: Vanessa Sochat <[email protected]>

* fixing formatting errors with black

Signed-off-by: Vanessa Sochat <[email protected]>

* using same version of black as the testing

Signed-off-by: Vanessa Sochat <[email protected]>

* testing again after reformatting, removing need for user to specify google cloud project on command line (derived from storage client or optional environment to override that)

Signed-off-by: Vanessa Sochat <[email protected]>

* adding warning about environment not being secret for google life sciences

Signed-off-by: Vanessa Sochat <[email protected]>

* updating Google Life Sciences executor to programmatically select machineType and add verbose comments about envars not being secrets and selection of machine types

Signed-off-by: Vanessa Sochat <[email protected]>

* Update snakemake/executors.py

Co-Authored-By: Johannes Köster <[email protected]>

* changing default arguments for regions to be included in arg parser

Signed-off-by: Vanessa Sochat <[email protected]>

* adding hashlib function to snakemake/common.py, tweaks to default arguments and function names

Signed-off-by: Vanessa Sochat <[email protected]>

* renaming google_life_sciences to google_lifesciences and using job resources disk_mb with padding of 10gb

Signed-off-by: Vanessa Sochat <[email protected]>

* custom entrypoint / cmd is working for snakemake base!

Signed-off-by: Vanessa Sochat <[email protected]>

* imports went away?

Signed-off-by: Vanessa Sochat <[email protected]>

* updating common and executors to fix some SonarCloud linting issues

Signed-off-by: Vanessa Sochat <[email protected]>

* trivial use of exec_job to fix SonarCloud "bug"

Signed-off-by: Vanessa Sochat <[email protected]>

* updating machine type selection to take first in list before filtering to smallest

Signed-off-by: Vanessa Sochat <[email protected]>

* why was stylesheet changed?

Signed-off-by: Vanessa Sochat <[email protected]>

* why was stylesheet changed also in utils?

Signed-off-by: Vanessa Sochat <[email protected]>

* updating life science executor to only include source files and working directory .snakemake scripts, also accidentally black formatted python scripts in tests, should not be an issue

Signed-off-by: Vanessa Sochat <[email protected]>

* not sure why WorkflowError isn't defined for a file I didn't edit, how did it pass other CI?

Signed-off-by: Vanessa Sochat <[email protected]>

* fixing bug with merge with master - an extra few lines were kept with a function (and should not have been)

Signed-off-by: Vanessa Sochat <[email protected]>

* refactoring to not require additional scripts (passing exec_job directly to entrypoint) and then uploading only one archive

Signed-off-by: Vanessa Sochat <[email protected]>

* e2 prefix machines don't seem to work for the google life sciences api

Signed-off-by: Vanessa Sochat <[email protected]>

* gcp life sciences cannot currently support m1 or e2 instance types, adding m1 to be filtered out for machine selection

Signed-off-by: Vanessa Sochat <[email protected]>

* import of time should then use time.sleep

Signed-off-by: Vanessa Sochat <[email protected]>

* fixing bug where archive files were silently not being added, and adding a more robust function to derive the relative snakefile path

Signed-off-by: Vanessa Sochat <[email protected]>

* bump current container image to v5.10.0 and add debug statements to show

Signed-off-by: Vanessa Sochat <[email protected]>

* updating Google Life Sciences to use get_container_image for latest container version

Signed-off-by: vsoch <[email protected]>

* adding default-resources setting for google-lifesciences and better message when resources exceeded.

Signed-off-by: vsoch <[email protected]>

* be more specific about resource limits for LHS

Signed-off-by: vsoch <[email protected]>

* locations api now needs to be used since there is more than one location; '*' no longer works

Signed-off-by: vsoch <[email protected]>

* adding parameter for location, and default to region or prefix of region

Signed-off-by: vsoch <[email protected]>

* google import should not be at top of file, won't work locally

Signed-off-by: vsoch <[email protected]>

* import google for wrong function

Signed-off-by: vsoch <[email protected]>

* must use lowercase and not camel case for variables

Signed-off-by: vsoch <[email protected]>

* if resources are requested, update default resources

Signed-off-by: vsoch <[email protected]>

* updating event to use debug instead of info

Signed-off-by: vsoch <[email protected]>

* updating stderr event to use debug instead of error

Signed-off-by: vsoch <[email protected]>

* missing adding the --skip-script-cleanup argument to the base job, need to test

Signed-off-by: vsoch <[email protected]>

* adding cleaner implementation for --skip-script-cleanup to be shared amongst executors

Signed-off-by: vsoch <[email protected]>

* google_lifesciences needs to be treated as a non-local exec

Signed-off-by: vsoch <[email protected]>

* adding install of crc32c library for remote

Signed-off-by: vsoch <[email protected]>

* adding support for google-lifesciences gpu

To add support for GPUs, the user is limited to n1 (general purpose) machine
types, and we do this by setting self._machine_type_prefix to n1 (if it does
not already start with n1). We can then filter to accelerator types that
support at least as many GPUs as the user has requested, and choose from that
set (a sketch of this selection follows below). With the current implementation
the P100 type will be the first chosen, which makes sense as (I think) it is
the cheapest GPU option, something likely wanted by researchers running these
at scale. In the future (when someone asks for it) we can add a flag that
allows the user to further specify a GPU accelerator type, but for now this
seems reasonable. Once the accelerator is added to the VM, we can confirm
that it creates an instance with GPU support.

Signed-off-by: vsoch <[email protected]>
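
A simplified sketch of the accelerator selection described in the commit message above (illustrative only; the field name follows the Compute Engine acceleratorTypes resource):

    def select_accelerator(accelerator_types, gpu_count):
        """Choose the first accelerator type whose maximum card count covers
        the requested number of GPUs (e.g. nvidia-tesla-p100)."""
        usable = [
            a for a in accelerator_types
            if a["maximumCardsPerInstance"] >= gpu_count
        ]
        if not usable:
            raise ValueError("No accelerator type supports %s GPUs." % gpu_count)
        return usable[0]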

* snakefile in tests/common.py should switch between tempdir and original

currently, some tests use the snakefile in the temporary working
directory, and others use the original. There is a flag, no_tmpdir,
that can distinguish this for the working directory specification,
and the same needs to be done for the snakefile, as different
tests have different expectations for its location.

Signed-off-by: vsoch <[email protected]>

* fixing test to not expect file to be in same directory

Signed-off-by: vsoch <[email protected]>

* Google Life Sciences executor must use relative paths for build package

Currently, if the user provides a path for a snakefile that is not relative to
the working directory where the build package is generated, the replace function
(which replaces the workdir in each file path) will not work, and the build
package will be generated with a hard-coded path from the user's system. Since
our options are to muck around with moving files or to hackily change paths for
the included files (not ideal), a much cleaner solution is to do something
similar to what a docker build does and require the context to be at, or within
some subfolder of, the working directory (a sketch of the check follows below).
Most runs that I've seen will be for a Snakefile in a GitHub repo, meaning the
working directory includes the snakefile, so I think this will be a rare use
case. I still want to catch it, in case it does occur!

Signed-off-by: vsoch <[email protected]>
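
As an illustration of the constraint described above, the check could look roughly like this (a sketch with an illustrative function name, not the PR's exact code):

    import os

    from snakemake.exceptions import WorkflowError


    def check_workdir_context(snakefile, workdir):
        """Require the Snakefile to live at or below the working directory,
        much like a docker build requires files to be inside its context."""
        snakefile = os.path.realpath(snakefile)
        workdir = os.path.realpath(workdir)
        if not snakefile.startswith(workdir + os.sep):
            raise WorkflowError(
                "The Snakefile must be in the working directory (or a "
                "subfolder of it) to build the upload package."
            )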

* remove unneeded line

Signed-off-by: vsoch <[email protected]>

* adding machine and gpu_model job resources

this will add two optional job resources to the Google Life Sciences executor:
a machine (to mirror --machine-type-prefix) and a gpu_model (for the user to
select from the cloud or vendor-specific namespace of GPU models, e.g.,
nvidia-tesla-p4). If the user specifies a gpu_model without gpu or nvidia_gpu,
the count will default to 1. The --machine-type-prefix command line arg also
overrides the job resource, since this could be desired. The docs are also
updated with these changes (an example rule follows below).

Signed-off-by: vsoch <[email protected]>
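
For example, a rule using these resources might look like the following (a hypothetical rule; note that a later commit in this listing renames machine to machine_type, and per the note below the string values must be quoted):

    rule train_model:
        input:
            "data/training.csv",
        output:
            "results/model.bin",
        resources:
            machine="n1-standard-8",
            gpu_model="nvidia-tesla-p4",
            nvidia_gpu=1,
        shell:
            "python train.py {input} {output}"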

* typo model -> gpu_model

Signed-off-by: vsoch <[email protected]>

* string resources should be in quotes

Per #329, a string-defined resource like machine or gpu_model should include
quotes around the value

Signed-off-by: vsoch <[email protected]>

* gcloud beta life sciences client change in behavior

we used to be able to do gcloud beta lifesciences describe <id>
but now a full project string is required for this to work.

Signed-off-by: vsoch <[email protected]>

* updating GoogleLifeSciences Executor to look for machine_type instead of machine definition in the job resources

Signed-off-by: vsoch <[email protected]>

* initial changes after review from Johannes

Signed-off-by: vsoch <[email protected]>

* fixing reasonable code smells, the cognitive complexity ones I need suggestions for

Signed-off-by: vsoch <[email protected]>

* Adding persistence.aux_path to be used for workflow tar.gz

Co-authored-by: Johannes Köster <[email protected]>

* adding tests for google life sciences, taking a shot that I know the correct envar name

Signed-off-by: vsoch <[email protected]>

* Removing secrets.GOOGLE_APPLICATION_CREDENTIALS

Reported to already be set without needing to be specified here.

Co-authored-by: Johannes Köster <[email protected]>

* changing env.GOOGLE_ to just use GCP_AVAILABLE.

Co-authored-by: Johannes Köster <[email protected]>

Co-authored-by: Johannes Köster <[email protected]>

Co-authored-by: Vanessasaurus <[email protected]>