Make nextstrain build scripts consistent #868
Conversation
@@ -217,6 +217,8 @@ services:
     genepinet:
       aliases:
         - nextstrain.genepinet.localdev
+    volumes:
+      - ./src/backend:/usr/src/app
This means we can run a nextstrain container locally with our local code changes in it.
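For context, a sketch of how the bind mount sits in the compose file (the service name is an assumption; only the keys shown in the diff are reproduced):

```yaml
services:
  nextstrain:          # hypothetical service name for illustration
    networks:
      genepinet:
        aliases:
          - nextstrain.genepinet.localdev
    volumes:
      # mount the local backend source over the image's code so local
      # edits are visible inside the container without rebuilding
      - ./src/backend:/usr/src/app
```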
@@ -39,6 +39,8 @@ RUN mkdir /ncov && \
    git remote add origin https://github.com/nextstrain/ncov.git && \
    git fetch origin master && \
    git reset --hard FETCH_HEAD
+RUN mkdir -p /ncov/auspice
+RUN mkdir -p /ncov/logs
Make sure these directories always exist when we do a phylo run, so we don't get failures when trying to copy debug logs to S3.
    export aws="aws --endpoint-url ${BOTO_ENDPOINT_URL}"
else
    export aws="aws"
fi
This makes it so we can run this script in local-dev (which needs the --endpoint-url flag) more easily.
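The pattern can be sketched in isolation like so. This is a minimal sketch, not the full script; the endpoint URL value is only an example:

```shell
#!/bin/bash
# Wrap the aws CLI in a shell variable so one script works both in local
# dev (localstack needs --endpoint-url) and in deployed environments.
if [ ! -z "${BOTO_ENDPOINT_URL}" ]; then
    aws="aws --endpoint-url ${BOTO_ENDPOINT_URL}"
else
    aws="aws"
fi
# Callers expand the variable unquoted so the flag splits into words:
#   $aws s3 ls
```

Note the variable must be expanded unquoted (`$aws`, not `"$aws"`) at call sites, otherwise the whole string would be treated as a single command name.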
# fetch aspen config
-genepi_config="$(aws secretsmanager get-secret-value --secret-id $GENEPI_CONFIG_SECRET_NAME --query SecretString --output text)"
+genepi_config="$($aws secretsmanager get-secret-value --secret-id $GENEPI_CONFIG_SECRET_NAME --query SecretString --output text)"
We're using the $aws variable we set above (with the optional --endpoint-url flag) to run aws cli commands.
aspen_s3_db_bucket="$(jq -r .S3_db_bucket <<< "$genepi_config")"
key_prefix="phylo_run/${S3_FILESTEM}/${WORKFLOW_ID}"
s3_prefix="s3://${aspen_s3_db_bucket}/${key_prefix}"
Set our prefixes once so we don't accidentally get them wrong later in this script.
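With stand-in values, the resulting layout looks like this (the bucket name and filestem below are hypothetical examples, not values from the real config):

```shell
#!/bin/bash
# Illustrate the S3 key layout produced by the shared prefix variables.
aspen_s3_db_bucket="genepi-db-data"     # would come from jq -r .S3_db_bucket
S3_FILESTEM="Tuolomne Contextual"       # example build name
WORKFLOW_ID="12345"                     # example workflow id
key_prefix="phylo_run/${S3_FILESTEM}/${WORKFLOW_ID}"
s3_prefix="s3://${aspen_s3_db_bucket}/${key_prefix}"
echo "${s3_prefix}/ncov_aspen.json"
# s3://genepi-db-data/phylo_run/Tuolomne Contextual/12345/ncov_aspen.json
```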
aws configure set region $AWS_REGION

if [ ! -z "${BOTO_ENDPOINT_URL}" ]; then
The changes to this file are ~identical to the changes to the ondemand script; they just make the two scripts more similar.
@@ -34,7 +34,8 @@ ${local_aws} secretsmanager update-secret --secret-id genepi-config --secret-str
     "DB_rw_username": "user_rw",
     "DB_rw_password": "password_rw",
     "DB_address": "database.genepinet.localdev",
-    "S3_external_auspice_bucket": "genepi-external-auspice-data"
+    "S3_external_auspice_bucket": "genepi-external-auspice-data",
+    "S3_db_bucket": "genepi-db-data"
Add some config info in local dev that we can use in the nextstrain run scripts
@@ -94,6 +95,8 @@ ${local_aws} ssm put-parameter --name /genepi/local/localstack/pangolin-ondemand

echo "Creating s3 buckets"
${local_aws} s3api head-bucket --bucket genepi-external-auspice-data || ${local_aws} s3 mb s3://genepi-external-auspice-data
${local_aws} s3api head-bucket --bucket genepi-db-data || ${local_aws} s3 mb s3://genepi-db-data
${local_aws} s3api head-bucket --bucket genepi-gisaid-data || ${local_aws} s3 mb s3://genepi-gisaid-data
Create the buckets we'll need so we can run builds in local dev.
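The create-if-missing idiom above can be sketched as a reusable function. `ensure_bucket` and `LOCAL_AWS` are hypothetical names, and the localstack endpoint is an assumed default:

```shell
#!/bin/bash
# `head-bucket` exits non-zero when the bucket doesn't exist, so `||`
# falls through to `mb` and creates it; re-running is a no-op.
LOCAL_AWS="${LOCAL_AWS:-aws --endpoint-url http://localhost:4566}"

ensure_bucket() {  # hypothetical helper, not from the PR
    ${LOCAL_AWS} s3api head-bucket --bucket "$1" 2>/dev/null \
        || ${LOCAL_AWS} s3 mb "s3://$1"
}

# Usage: ensure_bucket genepi-db-data
```

This keeps the local-dev setup script idempotent: it can be re-run safely without failing on buckets that already exist.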
s3_resource.Bucket(gisaid_s3_bucket).Object(processed_sequences_s3_key).put(Body="")
s3_resource.Bucket(gisaid_s3_bucket).Object(processed_metadata_s3_key).put(Body="")
s3_resource.Bucket(gisaid_s3_bucket).Object(aligned_sequences_s3_key).put(Body="")
s3_resource.Bucket(gisaid_s3_bucket).Object(aligned_metadata_s3_key).put(Body="")
Actually write some gisaid files to s3 so our nextstrain scripts can download them in local dev.
This looks great! I love that you added support for local testing, awesome job!
Yeah, this doesn't get us to 100% local-testability yet, but it's a first step... 🤷‍♀️
# upload the tree to S3
-key="phylo_run/${build_date}/${S3_FILESTEM}/${WORKFLOW_ID}/ncov.json"
-aws s3 cp /ncov/auspice/ncov_aspen.json "s3://${aspen_s3_db_bucket}/${key}"
+key="${key_prefix}/ncov_aspen.json"
this key is used in the save.py code below, not sure what it is for. just checking.
it tells save.py which path to write to the DB, so we can find the tree later and make it available to auspice
-key="phylo_run/${build_date}/${S3_FILESTEM}/${WORKFLOW_ID}/ncov.json"
-aws s3 cp /ncov/auspice/ncov_aspen.json "s3://${aspen_s3_db_bucket}/${key}"
+key="${key_prefix}/ncov_aspen.json"
+$aws s3 cp /ncov/auspice/ncov_aspen.json "s3://${aspen_s3_db_bucket}/${key}"
sometimes we get > 1 JSON in 1 folder, so in the older code JSONs are renamed in this step (builds.yaml needs updating too). In the current setup, will each tree run have its own folder so this won't happen anymore?
The new path will create a folder prefix for every phylo tree:
https://github.com/chanzuckerberg/aspen/pull/868/files#diff-510b082ad0018f47841b06cb3ebcc9910c1e3487976975fcb68ff759f89b0dbbR29
So the files will be like /phylo_run/Tuolomne Contextual/12345/ncov_aspen.json
i'll add this to my tree wiki!
We can also change the way we format these paths; if there's anything that you think is easier to understand, let me know!
it's great as you have it here! this is going to make tree debugging a lot easier. i spend a lot of time clicking through the 1234 folders every time TnT
if [ ! -e /ncov/data/sequences_aspen.fasta ]; then
    cp /ncov/data/references_sequences.fasta /ncov/data/sequences_aspen.fasta;
    cp /ncov/data/references_metadata.tsv /ncov/data/metadata_aspen.tsv;
fi;
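The fallback can be sketched as a small function (`seed_with_reference_data` is a hypothetical name; the real script inlines this logic against /ncov/data):

```shell
#!/bin/bash
seed_with_reference_data() {
    local data_dir="$1"
    if [ ! -e "${data_dir}/sequences_aspen.fasta" ]; then
        # No aspen sequences were downloaded, so seed the build with the
        # reference sequence and metadata, giving the pipeline valid input.
        cp "${data_dir}/references_sequences.fasta" "${data_dir}/sequences_aspen.fasta"
        cp "${data_dir}/references_metadata.tsv" "${data_dir}/metadata_aspen.tsv"
    fi
}
```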
👍
Hahaha I was reviewing a merged PR? shhhhhh nobody tells anyone
Summary:
<rdev link>
Demos:
We were writing debug information to different S3 destinations for ondemand vs scheduled runs, and also using a different set of variables to generate these paths in two scripts that are (mostly) identical. This updates the scripts to use the same S3 paths and variables wherever possible, so that they're more alike than different.
I've also updated some of our local dev configuration to make more of these scripts work properly (and be more testable) in the local dev environment.
Notes:
Checklist:
<base branch>