[WIP] Adding Cloud Pipelines #90

Merged
Changes from 89 commits
91 commits
3fc7832
starting figuring out how to not remove wrappers
vsoch Oct 13, 2019
91aa899
early work to add gcloud executor
vsoch Oct 28, 2019
6f5f6b4
merge conflicts
vsoch Oct 31, 2019
b085ab9
getting basic bucket creation and test upload working, and adding int…
vsoch Oct 31, 2019
ab831cf
updating to show debugging
vsoch Nov 1, 2019
855788e
moving change of working directory to after parsing config files
vsoch Nov 1, 2019
330b566
removing old cleanup wrappers logic
vsoch Nov 1, 2019
f5a55af
removing old cleanup wrappers logic
vsoch Nov 1, 2019
526457d
Merge branch 'add/google-cloud-pipelines' of github.com:vsoch/snakema…
vsoch Nov 1, 2019
3172538
work on base skeleton - pipeline requests are created and I need to w…
vsoch Nov 1, 2019
bfce212
lots of work on exector for Google Life Sciences - we upload the work…
vsoch Nov 2, 2019
7a44eea
release 5.7.4 has a bug looking for a workflow to have a workflow att…
vsoch Nov 2, 2019
d90966a
more bug fixes to google science pipeline, also want to use version f…
vsoch Nov 3, 2019
6a1b571
adding docker base example to google cloud tests
vsoch Nov 3, 2019
7ea195f
omg successful run! :sparkles:
vsoch Nov 3, 2019
7adfc11
restoring to original (working) execution with custom image, adding m…
vsoch Nov 4, 2019
5c6f549
fixing formatting errors with black
vsoch Nov 5, 2019
c44e916
using same version of black as the testing
vsoch Nov 5, 2019
1bbb0e1
testing again after reformatting, removing need for user to specify g…
vsoch Nov 5, 2019
f36adbb
Merge branch 'master' into add/google-cloud-pipelines
vsoch Nov 13, 2019
acf5867
adding warning about environment not being secret for google life sci…
vsoch Nov 13, 2019
dc98494
Merge branch 'add/google-cloud-pipelines' of github.com:vsoch/snakema…
vsoch Nov 13, 2019
53aefa2
updating Google Life Sciences executor to programatically select mach…
vsoch Nov 13, 2019
0e30a02
Update snakemake/executors.py
vsoch Nov 14, 2019
6620a9c
changing default arguments for regions to be included in arg parser
vsoch Nov 14, 2019
5d809d8
Merge branch 'add/google-cloud-pipelines' of github.com:vsoch/snakema…
vsoch Nov 14, 2019
6a5422b
adding hashlib function to snakemake/common.py, tweaks to default arg…
vsoch Nov 14, 2019
d3da75a
renaming google_life_sciences to google_lifesciences and using job re…
vsoch Nov 14, 2019
23202f9
custom entrypoint / cmd is working for snakemake base!
vsoch Nov 15, 2019
cd6225e
Merge branch 'master' of github.com:snakemake/snakemake into add/goog…
vsoch Nov 26, 2019
657e658
imports went away?
vsoch Nov 26, 2019
73451d9
updating common and executors to fix some sonarcube linting issues
vsoch Nov 26, 2019
b005fb2
trivial use of exec_job to fix SonarCloud "bug"
vsoch Nov 26, 2019
67d6f84
updating machine type selection to take first in list before filterin…
vsoch Nov 30, 2019
0cf394b
why was stylesheet changed?
vsoch Nov 30, 2019
eb0b767
why was stylesheet changed also in utils?
vsoch Nov 30, 2019
581ac00
Merge branch 'master' into add/google-cloud-pipelines
johanneskoester Dec 10, 2019
6fa9d1f
updating life science executor to only include source files and worki…
vsoch Dec 11, 2019
05fe55e
not sure why WorkflowError isnt defined for a file I didnt edit, how …
vsoch Dec 11, 2019
363c5fd
Merge branch 'master' into add/google-cloud-pipelines
vsoch Jan 6, 2020
e2a4bbe
fixing bug with merge with master - an extra few lines were kept with…
vsoch Jan 6, 2020
a6b59d3
refactoring to not require additional scripts (passing exec_job direc…
vsoch Jan 6, 2020
fdf0a8a
e2 prefix machines dont seem to work for google life sciences api
vsoch Jan 11, 2020
68c3850
Merge branch 'master' into add/google-cloud-pipelines
vsoch Jan 11, 2020
bf7d451
gcp life sciences cannot currently support m1 or e2 instance types, a…
vsoch Jan 14, 2020
5e1e413
Merge branch 'master' into add/google-cloud-pipelines
vsoch Jan 29, 2020
2f65ab2
import of time should then use time.sleep
vsoch Jan 29, 2020
483e20b
Merge branch 'master' of github.com:snakemake/snakemake into add/goog…
vsoch Jan 29, 2020
b7a7523
bug that archive files silently not being added, and adding more robu…
vsoch Feb 1, 2020
80299e3
Merge branch 'master' of github.com:snakemake/snakemake into add/goog…
vsoch Feb 1, 2020
e4fae2a
bump current container image to v5.10.0 and add debug statements to show
vsoch Feb 1, 2020
8659643
updating Google Life Sciences to use get_container_image for latest c…
vsoch Feb 5, 2020
ae34d8d
adding default-resources setting for google-lifesciences and better m…
vsoch Feb 7, 2020
3aa9ae9
be more specific about resource limits for LHS
vsoch Feb 9, 2020
530418c
locations api now needs to be used since there is more than one locat…
vsoch Feb 13, 2020
2fd57fb
adding parameter for location, and default to region or prefix of region
vsoch Feb 13, 2020
babfe12
google import should not be at top of file, wont work locally
vsoch Feb 14, 2020
928baba
import google for wrong function
vsoch Feb 14, 2020
33aa4be
must use lowercase and not camel case for variables
vsoch Feb 18, 2020
a62d8dd
if resources are requested, update default resources
vsoch Feb 23, 2020
c740cf3
updating event to use debug instead of info
vsoch Feb 25, 2020
a00dde7
updating stderr event to use debug instead of error
vsoch Feb 25, 2020
2e84fcc
missing adding argument to --skip-script-cleanup to base job, need ne…
vsoch Mar 10, 2020
8d7ee28
adding cleaner implementataion for --skip-script-cleanup to be shared…
vsoch Mar 10, 2020
51b0a3f
Merge branch 'master' into add/google-cloud-pipelines
vsoch Mar 11, 2020
f7041f9
Merge branch 'master' into add/google-cloud-pipelines
vsoch Mar 16, 2020
3b9deab
google_lifesciences needs to be treated as a non local exec
vsoch Mar 16, 2020
9bb71bb
Merge branch 'master' of github.com:snakemake/snakemake into add/goog…
vsoch Mar 18, 2020
5b19a96
adding install of crc32c library for remote
vsoch Mar 18, 2020
7744ecc
adding support for google-lifesciences gpu
vsoch Apr 8, 2020
c9bc8f4
Merge branch 'master' of github.com:snakemake/snakemake into add/goog…
vsoch Apr 8, 2020
ae895dc
snakefile in tests/common.py should switch between tempdir and original
vsoch Apr 9, 2020
24dfd9e
fixing test to not expect file to be in same directory
vsoch Apr 13, 2020
b945be0
Google Life Sciences executor must use relative paths for build package
vsoch Apr 13, 2020
0827b4e
remove unneeded line
vsoch Apr 13, 2020
d12c438
adding machine and gpu_model job resources
vsoch Apr 15, 2020
935f10e
typo model -> gpu_model
vsoch Apr 15, 2020
14148b5
string resources should be in quotes
vsoch Apr 16, 2020
1260133
gcloud beta life sciences client change in behavior
vsoch Apr 16, 2020
81a5b25
updating GoogleLifeSciences Executor to look for machine_type instead…
vsoch Apr 17, 2020
5153ac5
merging with master
vsoch Apr 30, 2020
2ccbe94
initial changes after review from Johannes
vsoch Apr 30, 2020
251e020
fixing reasonable code smells, the cognitive complexity ones I need s…
vsoch Apr 30, 2020
8bbfe55
Adding persistence.aux_path to be used for workflow tar.gz
vsoch May 8, 2020
2403bdd
merging with snakemake upstream master and refactoring workflow packa…
vsoch May 8, 2020
5df399e
Merge branch 'master' into add/google-cloud-pipelines
johanneskoester May 8, 2020
3f910d4
Merge branch 'master' into add/google-cloud-pipelines
johanneskoester May 8, 2020
9cf766c
adding tests for google life sciences, taking shot that I know the co…
vsoch May 8, 2020
fb85fef
adding changes from main.yml
vsoch May 8, 2020
599bf8b
Removing secrets.GOOGLE_APPLICATION_CREDENTIALS
vsoch May 10, 2020
75190cf
changing env.GOOGLE_ to just use GCP_AVAILABLE.
vsoch May 10, 2020
9 changes: 9 additions & 0 deletions .github/workflows/main.yml
@@ -35,6 +35,7 @@ jobs:
env:
AWS_AVAILABLE: ${{ secrets.AWS_ACCESS_KEY_ID }}
GCP_AVAILABLE: ${{ secrets.GCP_SA_KEY }}
GOOGLE_APPLICATION_CREDENTIALS: ${{ secrets.GOOGLE_APPLICATION_CREDENTIALS }}
steps:
- uses: actions/checkout@v1

@@ -69,6 +70,14 @@ jobs:
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: us-east-1

- name: Test Google Life Sciences Executor
if: env.GOOGLE_APPLICATION_CREDENTIALS
run: |
# activate conda env
export PATH="/usr/share/miniconda/bin:$PATH"
source activate snakemake
pytest -s -v -x tests/test_google_lifesciences.py

- name: Run tests
env:
CI: true
166 changes: 166 additions & 0 deletions docs/executing/cluster-cloud.rst
@@ -65,6 +65,7 @@ in order to avoid unnecessary charges.

.. _kubernetes:


Executing a Snakemake workflow via kubernetes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -103,6 +104,171 @@ a job intends to use, such that kubernetes can allocate it to the correct cloud
computing node.


Executing a Snakemake workflow via Google Cloud Life Sciences
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`Google Cloud Life Sciences <https://cloud.google.com/life-sciences/docs/>`_
provides a rich application programming interface (API) for designing pipelines.
You'll first need to `follow the instructions here <https://cloud.google.com/life-sciences/docs/quickstart>`_ to
create a Google Cloud Project and enable the Life Sciences, Storage, and Compute Engine APIs,
and continue with the prompts to create credentials. Create
a service account for your host (it's easiest to give it project Owner permissions)
and save the JSON credentials. Then export the full path to this file as `GOOGLE_APPLICATION_CREDENTIALS`:

.. code-block:: console

$ export GOOGLE_APPLICATION_CREDENTIALS=$HOME/path/snakemake-credentials.json

If you lose the link to the credentials interface, you can `find it here <https://console.cloud.google.com/apis/credentials>`_.

Optionally, you can export `GOOGLE_CLOUD_PROJECT` as the name of your Google
Cloud Project. By default, the project associated with your application
credentials will be used.

.. code-block:: console

$ export GOOGLE_CLOUD_PROJECT=my-project-name
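
As a rough illustration of the default described above, the following Python sketch
resolves a project name in the same order: it prefers `GOOGLE_CLOUD_PROJECT` and
otherwise reads the `project_id` field that service account JSON key files contain.
The function name `resolve_project` is hypothetical, and this is not the executor's
actual implementation; it uses only the standard library.

```python
import json
import os


def resolve_project(environ=None):
    """Return the Google Cloud project name, or None if it cannot be found.

    Hypothetical helper for illustration only, not part of Snakemake.
    """
    environ = os.environ if environ is None else environ

    # An explicit GOOGLE_CLOUD_PROJECT always wins.
    project = environ.get("GOOGLE_CLOUD_PROJECT")
    if project:
        return project

    # Otherwise fall back to the project recorded in the credentials file.
    credentials_path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not credentials_path or not os.path.exists(credentials_path):
        return None
    with open(credentials_path) as handle:
        return json.load(handle).get("project_id")
```

Calling `resolve_project()` without arguments inspects the current process environment,
which can be a quick check before launching a workflow.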

The dependencies that you'll need for snakemake are:

- gcc
- python dev
- google cloud python client libraries
- oauth2client


Data in Google Storage
::::::::::::::::::::::

Using this executor typically requires you to start with large data files
already in Google Storage, and then interact with them via the Google Storage
remote provider. An easy way to get them there is to use the
`gsutil <https://cloud.google.com/storage/docs/uploading-objects>`_
command line client. For example, here is how we might upload a file
to storage with it:


.. code-block:: console

$ gsutil -m cp mydata.txt gs://snakemake-bucket/1/mydata.txt

The `-m` option tells gsutil to run the copy in parallel, which helps when
you transfer many files; you can omit it when uploading a single small file.
You'll need to replace the file and bucket names with your own.
You can also use the Google Cloud Console interface, if
a graphical interface is preferable to you.
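
If you would rather script the upload, the command above can be assembled from Python.
This is only a sketch: `gsutil_cp_command` is a hypothetical helper name, and it merely
builds the argument list, which you could then pass to `subprocess.run` on a machine
where gsutil is installed.

```python
def gsutil_cp_command(local_path, bucket, object_name, use_m=True):
    """Build the argument list for copying one local file into a bucket.

    Hypothetical helper for illustration; it does not run gsutil itself.
    """
    command = ["gsutil"]
    if use_m:
        # -m is a top-level gsutil option, so it goes before the "cp" subcommand.
        command.append("-m")
    command += ["cp", local_path, "gs://{}/{}".format(bucket, object_name)]
    return command


# Prints: gsutil -m cp mydata.txt gs://snakemake-bucket/1/mydata.txt
print(" ".join(gsutil_cp_command("mydata.txt", "snakemake-bucket", "1/mydata.txt")))
```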

Environment Variables
:::::::::::::::::::::

**Important:** Google Cloud Life Sciences uses Google Compute, and does
**not** encrypt environment variables. If you specify environment
variables with the `envvars` directive or `--envvars`, they will **not** be kept secret.


Container Bases
:::::::::::::::

By default, Google Life Sciences uses the latest stable version of
`snakemake/snakemake <https://hub.docker.com/r/snakemake/snakemake/tags>`_
on Docker Hub. You can change the container base with
the `--container-image` flag (or `container_image` from within Python).
If you do so, your container must meet the following requirements:

- have an entrypoint that can execute a `/bin/bash` command
- have snakemake installed, either via `source activate snakemake` or already on the path
- also include snakemake Python dependencies for google.cloud

If you use any Snakemake container as a base, you should be good to go. If you'd
like to get a reference for requirements, it's helpful to look at the
`Dockerfile <https://github.com/snakemake/snakemake/blob/master/Dockerfile>`_
for Snakemake.

Requesting GPU
::::::::::::::

The Google Life Sciences API currently supports
`NVIDIA <https://cloud.google.com/compute/docs/gpus#restrictions>`_
GPUs, so you can request one explicitly by adding `nvidia_gpu`
to the resources of a rule in your Snakefile:


.. code-block:: python

    rule a:
        output:
            "test.txt"
        resources:
            nvidia_gpu=1
        shell:
            "somecommand ..."



Alternatively, you can set a general `gpu` count requirement, and an NVIDIA GPU will be used.


.. code-block:: python

    rule a:
        output:
            "test.txt"
        resources:
            gpu=1
        shell:
            "somecommand ..."


If you want to request a specific `gpu model <https://cloud.google.com/compute/docs/gpus#introduction>`_
(by name), you can add `gpu_model` to your resources:


.. code-block:: python

    rule a:
        output:
            "test.txt"
        resources:
            gpu_model="nvidia-tesla-p100"
        shell:
            "somecommand ..."

You should use the lowercase identifiers like `nvidia-tesla-p100` and `nvidia-tesla-p4`
for this variable. If you don't specify a `gpu` or `nvidia_gpu` (with a count) but you do
specify a `gpu_model`, the count will default to 1.
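
The interplay between these three resources can be summarized with a short Python
sketch. It illustrates the rules described above under the assumption that `nvidia_gpu`
takes precedence over `gpu` when both are given (the text does not say which wins);
`resolve_gpu` is a hypothetical name and not the executor's code.

```python
def resolve_gpu(resources):
    """Return (count, model) for a rule's resources; a count of 0 means no GPU.

    Hypothetical helper, for illustration only.
    """
    # Assumption: an explicit nvidia_gpu count wins over the general gpu count.
    count = resources.get("nvidia_gpu") or resources.get("gpu") or 0
    model = resources.get("gpu_model")

    # Naming a model without giving a count implies a single GPU.
    if model and not count:
        count = 1
    return count, model
```

For example, `resolve_gpu({"gpu_model": "nvidia-tesla-p100"})` yields a count of 1
together with the requested model.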


Machine Types
:::::::::::::

To specify an exact `machine type <https://cloud.google.com/compute/docs/machine-types>`_,
or a prefix that narrows the candidates before one is selected based on other resource needs,
you can set a default resource on the command line, either for a prefix or
a full machine type:

.. code-block:: console

--default-resources machine_type="n1-standard"


If you want to specify the machine type as a resource, you can do that too:

.. code-block:: python

    rule a:
        output:
            "test.txt"
        resources:
            machine_type="n1-standard-8"
        shell:
            "somecommand ..."


If you request a GPU, the machine type must have the "n1" prefix, so any machine type
preference from the Snakefile or command line will be overridden. Note that the default resources
for Google Life Sciences (memory and disk) are the same as for Tibanna.
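
To make the selection rule concrete, here is a minimal Python sketch of prefix
filtering followed by picking a machine that satisfies the remaining needs. It is not
the executor's actual algorithm: the `select_machine_type` name, the "smallest machine
that fits" choice, and the short hard-coded list of machine types are assumptions made
for illustration.

```python
# A small sample of Compute Engine machine types: (name, vCPUs, memory in GB).
SAMPLE_MACHINE_TYPES = [
    ("n1-standard-1", 1, 3.75),
    ("n1-standard-2", 2, 7.5),
    ("n1-standard-4", 4, 15),
    ("n1-standard-8", 8, 30),
    ("n2-standard-2", 2, 8),
    ("n2-standard-4", 4, 16),
]


def select_machine_type(cores, mem_gb, machine_type=None, gpu=0,
                        candidates=SAMPLE_MACHINE_TYPES):
    """Pick the smallest candidate that fits; return its name or None.

    Hypothetical helper, for illustration only.
    """
    # A GPU request forces the n1 family and overrides any user preference.
    prefix = "n1" if gpu else (machine_type or "")
    fitting = [
        (cpus, memory, name)
        for name, cpus, memory in candidates
        if name.startswith(prefix) and cpus >= cores and memory >= mem_gb
    ]
    # Comparing (cpus, memory, name) tuples puts the smallest fitting machine first.
    return min(fitting)[2] if fitting else None
```

Because a full machine type name is also a prefix of itself, the same `machine_type`
argument covers both the exact and the prefix case in this sketch.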


Executing a Snakemake workflow via Tibanna on Amazon Web Services
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
