Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use iac config for scaledown cron #347

Merged
merged 21 commits into from
Nov 21, 2023
Merged

Conversation

alexvpickering
Copy link
Contributor

@alexvpickering alexvpickering commented Oct 14, 2023

Description

Stage #347.

Also need to enable the cron task manually for biomage deployments.

Details

URL to issue

N/A

Link to staging deployment URL (or set N/A)

N/A

Links to any PRs or resources related to this PR

Integration test branch

master

Merge checklist

Your changes will be ready for merging after all of the steps below have been completed.

Code updates

Have best practices and ongoing refactors being observed in this PR

  • Migrated any selector / reducer used to the new format.

Manual/unit testing

  • Tested changes using InfraMock locally or no tests required for change, e.g. Kubernetes chart updates.
  • Validated that current unit tests for code work as expected and are sufficient for code coverage or no unit tests required for change, e.g. documentation update.
  • Unit tests written or no unit tests required for change, e.g. documentation update.

Integration testing

You must check the box below to run integration tests on the latest commit on your PR branch.
Integration tests have to pass before the PR can be merged. Without checking the box, your PR
will not pass the required status checks for merging.

  • Started end-to-end tests on the latest commit.

Documentation updates

  • Relevant Github READMEs updated or no GitHub README updates required.
  • Relevant Wiki pages created/updated or no Wiki updates required.

Optional

  • Staging environment is unstaged before merging.
  • Photo of a cute animal attached to this PR.

@alexvpickering alexvpickering added the safe to run safe to run Marks the PR as safe to run checks on. IMPORTANT only add if from a trusted source. label Oct 14, 2023
@codecov
Copy link

codecov bot commented Oct 14, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (94812c4) 98.23% compared to head (b5f14cf) 98.19%.

❗ Current head b5f14cf differs from pull request most recent head ea0bbfe. Consider uploading reports for the commit ea0bbfe to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #347      +/-   ##
==========================================
- Coverage   98.23%   98.19%   -0.04%     
==========================================
  Files          22       22              
  Lines         906      943      +37     
==========================================
+ Hits          890      926      +36     
- Misses         16       17       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Alex Pickering <[email protected]>
…worker into use-scaledown-config

Signed-off-by: Alex Pickering <[email protected]>
Signed-off-by: Alex Pickering <[email protected]>
Signed-off-by: Alex Pickering <[email protected]>
Signed-off-by: Alex Pickering <[email protected]>
Signed-off-by: Alex Pickering <[email protected]>
@@ -1,4 +1,5 @@
{{- if (eq .Values.kubernetes.env "staging") }}
# workerScaleToZero isn't picked up - needs to be in master?
{{- if or (eq .Values.myAccount.workerScaleToZero "true") (eq .Values.kubernetes.env "staging") }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.Values.myAccount.workerScaleToZero doesn't seem to be transmitted from iac correctly. Current guess is that iac changes need to be in master. May also be configured incorrectly

Comment on lines 13 to 15
default_timeout = 60 * 10
timeout = os.getenv("WORKER_TIMEOUT", default = str(default_timeout))
timeout = int(timeout) if timeout else default_timeout
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

necessary for if WORKER_TIMOUT is empty string. os.getenv(..., default=) only uses default if variable is None

Signed-off-by: Alex Pickering <[email protected]>
Signed-off-by: Alex Pickering <[email protected]>
Signed-off-by: Alex Pickering <[email protected]>
Signed-off-by: Alex Pickering <[email protected]>
Signed-off-by: Alex Pickering <[email protected]>
@@ -1,13 +1,12 @@
# pull official base image
FROM python:3.8-buster AS builder
FROM python:3.8-slim-buster AS builder
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

smaller base image


# set working directory
WORKDIR /src

# install app dependencies
COPY requirements.txt ./
RUN pip3 install -r requirements.txt

RUN pip3 install --no-cache-dir -r requirements.txt
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

smaller layer

overall, compressed image for python container in ECR goes from ~630mb --> 100mb

@@ -16,7 +16,6 @@ PyYAML==5.4
requests==2.24.0
responses==0.12.0
sinfo==0.3.1
umap-learn==0.4.6
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed on python image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just saw this dependency in an unrelated context and was confused by it. I suppose it's a vestigial remnant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah there are probably other that may not be needed either but doesn't seem so important since image size will be ~100MB now

Signed-off-by: Alex Pickering <[email protected]>
@@ -1,4 +1,4 @@
{{- if (eq .Values.kubernetes.env "staging") }}
{{- if or (eq .Values.kubernetes.env "staging") (eq .Values.myAccount.accountId "160782110667") }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enabling cron for production in HMS

Comment on lines 1 to 26
import os

ACCOUNT_ID = {
'BIOMAGE': '242905224710',
'HMS': '160782110667',
}

domain_specific = {
'HMS': {'timeout': 40 * 60},
'BIOMAGE': {},
'BIOMAGE_PRIVATE': {}
}


def get_domain_specific():
if os.environ.get('NODE_ENV') == 'test':
return domain_specific['TEST']

aws_account_id = os.getenv('AWS_ACCOUNT_ID')

if aws_account_id == ACCOUNT_ID['HMS']:
return domain_specific['HMS']
elif aws_account_id == ACCOUNT_ID['BIOMAGE']:
return domain_specific['BIOMAGE']
else:
return domain_specific['BIOMAGE_PRIVATE']
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modeled off api getDomainSpecificContent.js

Comment on lines +73 to +76
# install umap-learn for RunUMAP
ENV WORKON_HOME=/src/.virtualenvs
RUN R -q -e "reticulate::virtualenv_create('r-reticulate')" && \
R -q -e "reticulate::virtualenv_install('r-reticulate', 'umap-learn==0.5.3', pip_options='--no-cache-dir')"
Copy link
Contributor Author

@alexvpickering alexvpickering Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

motivation is reducing image size (before: ~1010 MB, after: 884MB) which speeds up worker pod being ready. also a bit simpler

Together with python image size reduction (~530MB) worker ready time is down from ~2:30 --> 1:50

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finally removing conda! I tried something similar months ago but got stuck. We can port this to the pipeline as well

Signed-off-by: Alex Pickering <[email protected]>
Signed-off-by: Alex Pickering <[email protected]>
Comment on lines +1 to +23
import os

ACCOUNT_ID = {
'BIOMAGE': '242905224710',
'HMS': '160782110667',
}

domain_specific = {
'HMS': {'production': {'timeout': 40 * 60}},
'BIOMAGE': {},
'BIOMAGE_PRIVATE': {}
}


def get_domain_specific():
aws_account_id = os.getenv('AWS_ACCOUNT_ID')

if aws_account_id == ACCOUNT_ID['HMS']:
return domain_specific['HMS']
elif aws_account_id == ACCOUNT_ID['BIOMAGE']:
return domain_specific['BIOMAGE']
else:
return domain_specific['BIOMAGE_PRIVATE']
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modeled after getDomainSpecificContent.js in api

Signed-off-by: Alex Pickering <[email protected]>
Signed-off-by: Alex Pickering <[email protected]>
Comment on lines -81 to -83
# link system libstdc++ to conda installed version
RUN rm /usr/lib/x86_64-linux-gnu/libstdc++.so.6 && \
ln -s /src/r-miniconda/envs/r-reticulate/lib/libstdc++.so.6 /usr/lib/x86_64-linux-gnu/libstdc++.so.6
Copy link
Contributor Author

@alexvpickering alexvpickering Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't think any equivalent of this is necessary anymore as libstdc++.so.6 isn't in the environment folder.

can't remember exactly what error under what set of conditions this fixed but RunUMAP works fine.

# cleanup setup files
RUN echo "python3-pip" >> sysdeps_run.txt && \
RUN echo "python3-pip python3-venv python3-dev" >> sysdeps_run.txt && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not declare these in the sysdeps_build_debian.txt instead of the dynamically generated sysdeps_run.txt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@alexvpickering alexvpickering Oct 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sysdeps_build_debian.txt are dependencies for building the R-packages. Once they have been built, the run-time system dependencies for R packages are determined by get_sysdeps_run.R. This strategy reduces the overall image size. There are however some additional dependencies that are necessary for non R-package things:

  • python3-pip for the development build (not needed for prod but installed here for code simplicity)
  • python3-venv to setup the venv for umap-learn
  • python3-dev to have a copy of python that reticulate can use

Copy link
Member

@gerbeldo gerbeldo Oct 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it in the pipeline PR and it does not work (a pip install fails due to not finding pip), for some reason I don't understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a look in the pipeline and see if I can get it set up as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not tested but builds for me: hms-dbmi-cellenics/pipeline#340

Copy link
Member

@gerbeldo gerbeldo Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll test the pipeline PR.

Regarding build time dependencies, are they removed downstream, after building the R packages? That relates to my initial question, why do we need python3-pip installed twice?

Copy link
Contributor Author

@alexvpickering alexvpickering Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the common build doesn't start from the end of the builder, but rather from a fresh rocker/r-ver. It transfers over already built packages etc (COPY --from=builder /src /src) and then installs only the system dependencies that are needed to run packages (less hefty than packages needed to build all the R/python packages). The approach is a common one to reduce the size of Docker images.

@alexvpickering alexvpickering merged commit 640a671 into master Nov 21, 2023
11 checks passed
@cosa65 cosa65 deleted the use-scaledown-config branch November 22, 2023 13:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
safe to run safe to run Marks the PR as safe to run checks on. IMPORTANT only add if from a trusted source.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants