Docker-compose improvements #92
@sandertan - changes look great!
Good stuff consolidating all the config into a single place and removing unnecessary settings. The default docker-compose.yml is supposed to use the latest stable release image with default settings. If you have time you can update that compose file also, otherwise I'll hopefully get to it sometime this week.
For the nice to haves - let's get these current changes in and clean up the compose files, and address each of these nice to haves via GitHub issues - we can then split the work. Thanks!
@tomolopolis I added the changes to
I've rebuilt locally on your branch, looks fine - thanks!
@tomolopolis I moved some nice to haves to separate issues. I'm not sure using multi-stage docker builds would be a significant improvement (https://docs.docker.com/build/building/multi-stage/, https://www.snopoke.com/2020/11/10/multi-stage-docker-build/ and https://stackoverflow.com/q/54246600/4141535), so I did not add that.
@sandertan - thanks! a multi-stage build should reduce the size a bit, I might experiment and see if it's worthwhile.
@tomolopolis @sandertan, I have been experimenting with optimising. One of the first things I have been looking at is https://github.com/CogStack/MedCATtrainer/blob/master/webapp/Dockerfile. So far I have managed to drop the uncompressed size of the image from 7.71GB to 6.13GB, and there are more optimisations to be had. I have yet to test whether everything still works with these modifications, and will do that in the coming days/weeks. The following findings may help with your experiments as well.

Multi-stage build
The frontend build lends itself nicely to a separate stage. This saves ~450MB by leaving out everything we don't need:

    FROM node:16-slim AS node_image
    WORKDIR /home
    COPY ./frontend /home/frontend
    RUN cd /home/frontend \
        && npm install -g npm@latest \
        && npm install \
        && npm run build

and then later on just copy the build output from that stage:

    FROM python:3.10
    # do other stuff here
    COPY --from=node_image /home/frontend/dist /home/frontend/dist
    # continue more layers

Omit pip cache
The following lines retain ~900MB of downloaded pip packages in cache.

    # Install requirements for backend
    WORKDIR /home/
    RUN pip install pip --upgrade
    RUN pip install -r requirements.txt
    ARG SPACY_MODELS="en_core_web_md"
    RUN for SPACY_MODEL in ${SPACY_MODELS}; do python -m spacy download ${SPACY_MODEL}; done

This can be improved as follows:

    ARG SPACY_MODELS="en_core_web_md"
    RUN pip install pip --upgrade --no-cache-dir \
        && pip install -r requirements.txt --no-cache-dir \
        && for SPACY_MODEL in ${SPACY_MODELS}; do python -m spacy download ${SPACY_MODEL}; done

This reduces the size of this layer by 900MB, from 6.1GB to 5.2GB.

Other improvements
I guess each of these could be raised as a separate issue, but in keeping with the title of this PR, I will list the proposals here and we can spin them off into separate issues if necessary.

Run as non-root
It looks like everything in the container gets installed as root.

    RUN groupadd -g 999 appuser \
        && useradd -r -u 999 -g appuser appuser
    USER appuser
    # Copy project
    WORKDIR /home
    # See below for why this line needs modifying
    COPY --chown=appuser:appuser . .
    COPY --from=node_image --chown=appuser:appuser /home/frontend/dist /home/frontend/dist

Django Admin user creation
Currently the Django admin user gets created with credentials in plain text.

Copy only required folders/files
Currently all contents under

Environment variables
Currently environment variables in docker-compose require the presence of a
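One way the multi-stage and non-root suggestions above might fit together is sketched below. This is an illustration under assumptions, not the project's actual Dockerfile: the placement of the root-level installs is indicated only by a comment, and switching to `appuser` at the very end (rather than before the `COPY` lines) is a choice made here so that anything needing root can run first.

```dockerfile
# Stage 1: build the frontend. Only /home/frontend/dist is carried over.
FROM node:16-slim AS node_image
WORKDIR /home/frontend
COPY ./frontend /home/frontend
RUN npm install && npm run build

# Stage 2: backend image.
FROM python:3.10
WORKDIR /home

# Assumption: system packages and pip requirements are installed here,
# while the build is still running as root.

# Create an unprivileged user (IDs taken from the snippet above).
RUN groupadd -g 999 appuser \
    && useradd -r -u 999 -g appuser appuser

# Copy the project and the built frontend, owned by the unprivileged user.
COPY --chown=appuser:appuser . .
COPY --from=node_image --chown=appuser:appuser /home/frontend/dist /home/frontend/dist

# Drop privileges for everything that runs in the container.
USER appuser
```

Because `--chown` sets ownership at copy time, no extra `chown -R` layer is needed, which avoids duplicating the copied files in a second layer.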
@vvcb Thanks for the extensive list of suggestions! I'm currently less involved in this project at our institute, so I'll let Tom respond to your PR.
docker-compose files other than docker-compose-dev.yml
Hi @tomolopolis, thanks a lot for all the recent updates to MedCAT Trainer. Especially the addition of Solr for the concept lookup looks nice!
We're starting a new annotation project for HPO concepts in Dutch text, and I had to make some changes to make the deployment fit our use case. It would be nice to have our changes, which add configuration options, in the master branch; this'll make it easier for us to update MedCAT trainer in the future.
Included in this PR
- There are several docker-compose files in this repository, some of which don't work anymore with current master, I think. The only compose file that builds a new image from the Dockerfile is docker-compose-dev.yml, so I made my changes only there. What is your plan with the other compose files? Having multiple compose files is difficult to maintain. I can add these changes to other compose files if you want.
- An .env-example file and documentation on how it should be used with Docker Compose. This makes it relatively easy to pass environment variables on to docker-compose, similar to what I added for CogStack-Nifi. For maintainers it would be nice to have as few places as possible where configuration can be changed, but I think this file is needed to configure variables at compose level.
- Default spaCy model: en_core_web_md. Alternatively, we can set the default to all 3 (sm, md & lg) English language models.
- By first copying requirements.txt and then copying the other backend files, Docker can use the cache during building. When nothing has changed in the requirements file, the cached step is used, which greatly reduces the time it takes to build the image.
- MCTRAINER_PORT.
- Removed expose:, as this does not do anything anymore (see last edit at https://stackoverflow.com/a/40801773/4141535).
- Removed container_name for Solr. Within the Docker Compose network, the containers can find each other using the service name.

Not included in this PR
Nice to have in future
- MedCATtrainer/webapp/run.sh, line 19 in 296d4d7
- envs/env file and environment: in the compose file. There's also some discrepancy in how different compose files do this. We could move the environment variables from envs/env to the proposed .env-example file and load them in the container with something like:
- (configs/base.txt). With the recent versions of MedCAT containing the config within the CDB file, do you agree it would be easier to use this as the default? This is important for using non-English models, because it can set the spaCy model.
- solr:8983 within the Docker network.

Let me know what you like/dislike. I can make separate GitHub issues for the nice to have items if you prefer.
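
The requirements.txt ordering listed under "Included in this PR" relies on Docker's layer cache. A minimal sketch of the pattern follows, assuming requirements.txt sits at the root of the build context and the backend lives under /home; both are illustrative assumptions and may not match the repository layout.

```dockerfile
FROM python:3.10
WORKDIR /home

# Copy only the dependency manifest first. This layer, and the pip layer
# below, are rebuilt only when requirements.txt itself changes.
COPY requirements.txt /home/requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the backend afterwards. Editing source files invalidates
# only this layer, so the dependency install above is served from cache.
COPY . /home
```

If the two `COPY` steps were merged into a single `COPY . /home` before the `pip install`, any source change would invalidate the install layer and force a full reinstall of the requirements.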