
This issue was moved to a discussion.

You can continue the conversation there.


provide docker image(s) #2774

Closed
casperdcl opened this issue Nov 10, 2019 · 33 comments
Labels
build Issues/features related to building dvc install packages. discussion requires active participation to reach a conclusion enhancement Enhances DVC feature request Requesting a new feature p3-nice-to-have It should be done this or next sprint

Comments

@casperdcl
Contributor

casperdcl commented Nov 10, 2019

Provide docker images

@triage-new-issues triage-new-issues bot added the triage Needs to be triaged label Nov 10, 2019
@casperdcl casperdcl added build Issues/features related to building dvc install packages. enhancement Enhances DVC feature request Requesting a new feature labels Nov 10, 2019
@triage-new-issues triage-new-issues bot removed the triage Needs to be triaged label Nov 10, 2019
@ghost

ghost commented Nov 10, 2019

@casperdcl, I'm not sure how much benefit this would bring, given that it is not that hard to create a Docker image for DVC: docker run --name dvc -ti python pip install dvc && docker container commit --change 'CMD ["dvc"]' dvc dvc
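For comparison, the same image can be described in a Dockerfile, which makes it rebuildable instead of committed from a running container. This is a minimal sketch: the `python:3.7` tag is an assumption, and it deliberately uses `ENTRYPOINT` instead of the one-liner's `CMD` so that arguments passed to `docker run` go straight to `dvc`.

```dockerfile
# Sketch: Dockerfile equivalent of `docker run ... pip install dvc && docker container commit`.
# The python:3.7 tag is an assumption; pick a Python version your DVC release supports.
FROM python:3.7

# Install DVC without keeping pip's download cache in the image layer.
RUN pip install --no-cache-dir dvc

# ENTRYPOINT (rather than CMD) makes `docker run <image> <args>` run `dvc <args>`;
# CMD only supplies the default arguments when none are given.
ENTRYPOINT ["dvc"]
CMD ["--help"]
```

After `docker build -t dvc .`, running `docker run --rm dvc version` should print the installed DVC version.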

Also, testing this would be the same as testing DVC on a Linux machine (or at least, it should be).

I doubt that users want to use it as a base image, since it doesn't provide anything other than DVC.

I'd prefer not to maintain this one, to be honest.

@pared
Contributor

pared commented Nov 11, 2019

I would also argue that when doing some data science stuff in Docker, it's probably easier to install DVC in your own image rather than try to adjust a DVC image to your requirements (like installing TF/PyTorch/...).
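To illustrate the point, adding DVC to an image a project already uses typically takes only a few extra lines. In this sketch the `tensorflow/tensorflow` base image is a stand-in (an assumption) for whatever framework image the project is built on; it has not been checked against every TensorFlow release.

```dockerfile
# Sketch: extend an existing framework image with DVC instead of starting from a DVC image.
# tensorflow/tensorflow:latest is only an example base (Ubuntu-based); substitute your own image.
FROM tensorflow/tensorflow:latest

# DVC operates inside a Git repository, so make sure git is installed.
RUN apt-get update \
    && apt-get install -y --no-install-recommends git \
    && rm -rf /var/lib/apt/lists/*

# This is the part that can be copy-pasted into any Python-based Dockerfile.
RUN python3 -m pip install --no-cache-dir dvc
```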

@casperdcl
Contributor Author

@pared that was my point about

  • also would make it easier for end-users to incorporate into their own docker images

as in people could copy-paste from our docker file into theirs...

I assumed it's more complex than just a pip install to get all the features (at least some apt-get installs as well?)

@pared
Contributor

pared commented Nov 11, 2019

@casperdcl ok, I didn't quite get it.

as in people could copy-paste from our docker file into theirs...

That would surely help to build custom image.

@efiop
Contributor

efiop commented Nov 11, 2019

@casperdcl Hm, apt-gets would depend on the base image that you are using. If it is absolutely bare then yeah, you will need to install python and whatnot. But for regular things like ubuntu you don't have to install anything special, except, I guess, git. We have a docker image that we are using for testing here https://github.com/iterative/dvc-test/blob/master/docker/ubuntu/16.04/Dockerfile . Not sure if we still need to install libffi explicitly 🤔
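A sketch of what that looks like on a plain Ubuntu base, where Python, pip and git are the only packages added. The `ubuntu:22.04` tag is an assumption (the thread discusses 16.04 and 18.04); older releases ship older Python versions, so pip may resolve to an older DVC release there, and a dependency without a prebuilt wheel for the platform would also need build tools.

```dockerfile
# Sketch: DVC on a plain Ubuntu base; besides Python and pip, git is the only extra package.
# The ubuntu:22.04 tag is an assumption.
FROM ubuntu:22.04

# Keep apt from prompting during the build.
ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update \
    && apt-get install -y --no-install-recommends git python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

RUN python3 -m pip install --no-cache-dir dvc
```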

@casperdcl
Contributor Author

@efiop the setup.py file seems to imply more deps are required - I was referring to a full pip install dvc[all,ssh_gssapi,tests]

@casperdcl
Contributor Author

@efiop can confirm libffi not required on ubuntu:18.04

@efiop
Contributor

efiop commented Nov 11, 2019

@casperdcl Ah, got it. yeah, gssapi requires some dev tools to compile stuff as well as tests, so the dockerfile for that would be more complicated. We could provide it in the docs maybe? I'm not quite sure about building and testing it. When used for regular tests, it would always run linux inside, instead of running on the host system, which is a problem, as we already have a lack of native-testing on windows and osx, and that might prevent us from discovering bugs when developing. Though a nice thing about it is that you don't have to setup a test env on your machine, that is true 🙂
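As a rough sketch of the more complicated case: the extras need a compiler toolchain and the Kerberos development headers to build the gssapi bindings. The package names assume a Debian-based image, the extras names are taken from the thread (they may have changed in later DVC releases), and the tests extra is left out.

```dockerfile
# Sketch: a "full" install whose extras need compiling (e.g. the gssapi bindings).
# Extras names follow the thread and may differ in later DVC releases.
FROM python:3.7-slim

# build-essential provides the compiler; libkrb5-dev provides the Kerberos headers and
# krb5-config that gssapi builds against. git is needed because DVC works inside a Git repo.
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential libkrb5-dev git \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --no-cache-dir "dvc[all,ssh_gssapi]"
```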

@casperdcl
Contributor Author

Yes I did start making a few flavours of docker images for testing (alpine, ubuntu LTS, 2.7, 3.6 etc) ages ago which are probably sitting in a git stash on one of any number of machines. maybe.

Now I just use a conda env for dvc testing.

@efiop efiop added the p3-nice-to-have It should be done this or next sprint label Nov 11, 2019
@casperdcl
Contributor Author

btw totally fine with this issue being closed - don't actually have any strong opinions about it.

@efiop
Contributor

efiop commented Nov 11, 2019

@casperdcl No reason to close it. Docker images(or at least dockerfiles) would be nice to have, for sure. 🙂

@nettoyoussef

Provide docker images

I just came here looking exactly for that.
Every time I see a comment like:
"It's highly recommended using virtual environment or pipx (on Python 3.6+) to encapsulate your local environment."

I instantly go looking for a docker file for me to easily test the software.

That is because I use different software (mostly R) and don't use Python outside Docker, because Python environment tools change a lot and nobody seems to agree on which one is best (conda, virtualenv, pyenv, pipenv, etc.), which, to complicate things further, have different functionality.

@efiop
Contributor

efiop commented Nov 18, 2019

@nettoyoussef Thanks for the feedback! Do you need pre-built images, or a Dockerfile in our docs would do?

@dmpetrov
Member

@casperdcl thank you for this idea!

I agree with @pared and @MrOutis that it is easy to create your own Docker image, and that providing one might create additional support overhead for us.

However, a prebuilt Docker image gives value to users, as @nettoyoussef's example shows. It can attract users' attention and improve usability despite its simple implementation.

But then documentation plays the major role. Can we make a good documentation page, or even a small blog post, that explains the motivation for using a Docker image instead of an installed tool, and when it is needed? Why don't we start with the doc/blog post and then implement the Docker image?

@nettoyoussef

nettoyoussef commented Nov 19, 2019

@nettoyoussef Thanks for the feedback! Do you need pre-built images, or a Dockerfile in our docs would do?

Thank you for being so helpful.

Personally, a Dockerfile would suffice. The community, however, might benefit more from a pre-built image.

Instead of building one from Ubuntu, you could make your life easier and, e.g., start from a miniconda image. I think this can be easy to automate, and maybe you can even delegate this to other teams - a partnership with rocker for example.

Since, from what I read, DVC is not tied to any particular library/language, it also makes sense to separate concerns, i.e., you can use the same DVC image with any project instead of installing it into each project's particular environment.

It also makes it easier to adopt in existing projects, since you don't have to rebuild the images just to try it.
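A minimal sketch of the miniconda starting point suggested above, assuming the `continuumio/miniconda3` image and the `dvc` package from the conda-forge channel; because the image only contains DVC, it can be reused alongside any project.

```dockerfile
# Sketch: DVC on top of a Miniconda base, installed from the conda-forge channel.
FROM continuumio/miniconda3

# Install DVC, then drop conda's package caches to keep the image small.
RUN conda install -y -c conda-forge dvc \
    && conda clean -afy

# Run dvc by default, e.g. `docker run --rm <image> version`.
ENTRYPOINT ["dvc"]
```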

@ghost

ghost commented Nov 22, 2019

Here's an initial draft to add to DVC's docs a section about using Docker: iterative/dvc.org#811

Feel free to edit it accordingly 🙂

@ghost ghost removed their assignment Nov 26, 2019
@ghost ghost added the discussion requires active participation to reach a conclusion label Nov 26, 2019
@ghost

ghost commented Nov 26, 2019

After several iterations trying to provide useful information for Docker users, there's no agreement on what that info should be or how we should present it (e.g. provide an image? are docs enough?)

Let's keep the discussion open for now :)
Thanks a lot, @casperdcl , @shcheklein , @jorgeorpinel , @efiop for reviewing the previous efforts.
If you could dump your opinion on this one it would help a lot to reach a conclusion.

@jorgeorpinel
Contributor

Well, like you said in iterative/dvc.org#811 (comment)

The only essential parts are FROM python and RUN pip install dvc

So I don't see the point of providing such a simple Dockerfile that anyone familiar with Docker can easily create. Maybe just a small section in the installation guide to provide Docker tips such as using python:3.7 and dvc[all].
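Those two tips might be combined as below. `DVC_VERSION` is a hypothetical build argument (not something DVC provides) that pins the release, so the image can be tagged with, and match, the version used in local environments.

```dockerfile
# Sketch: python:3.7 base plus the dvc[all] extras, pinned to one release.
FROM python:3.7

# Hypothetical build argument selecting the DVC release, e.g. --build-arg DVC_VERSION=<version>.
ARG DVC_VERSION

# Fail the build early if no version was supplied, otherwise install exactly that release.
RUN test -n "$DVC_VERSION" \
    && pip install --no-cache-dir "dvc[all]==${DVC_VERSION}"

ENTRYPOINT ["dvc"]
```

For example: `docker build --build-arg DVC_VERSION=<version> -t dvc:<version> .`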

@dmpetrov
Member

dmpetrov commented Mar 29, 2020

There are two DVC Docker images for the CI/CD project which are going to be maintained:

  1. https://hub.docker.com/repository/docker/dvcorg/dvc-cml
  2. https://hub.docker.com/repository/docker/dvcorg/dvc-cml-gpu

The Dockerfiles are here (the GPU PR is not merged yet): https://github.com/iterative/dvc-cml

Does it make sense to extend these images to cover the needs of this issue? What needs to be added or changed in the images?

@casperdcl
Contributor Author

casperdcl commented Mar 29, 2020

maybe I'm missing something but it looks like they're using index.js (node) rather than the Dockerfile (docker). Seems like the GH Action should really use the docker image (e.g. https://github.com/casperdcl/covid-19-box).

On a related note I like where this is heading https://github.com/iterative/dvc-cml/wiki/Tensorflow-Mnist-for-Github-Actions

@dmpetrov
Member

@casperdcl it is using Dockerfiles. index.js is there just to support GH users who don’t want to or cannot use Docker. You can see this in the workflow files.

@casperdcl
Contributor Author

Right. Seems a bit odd to provide a nodejs action for public use via the standard uses: syntax, but in our workflow use our own docker version. Unrelated to this issue (#2774) though.

@dmpetrov
Member

@casperdcl what would be your suggestions for that project? How to organize it in the right way?

@casperdcl
Contributor Author

casperdcl commented Mar 31, 2020

  1. In action.yml, use Dockerfile
  2. Move as much of the other non-essential root clutter to subdirs
  3. The Dockerfile's entrypoint itself can use node and/or any other software that users demand support for
  4. In our workflow,
    • use node/black/flake8/etc directly to run linting/tests (as currently)
    • uses: ./ to run/test the (docker) action

Advantages:

  • a single CI run of our provided action installs and uses/tests everything (docker wraps node, etc.)
  • local installs and our workflow tests can continue to manually install prerequisites and use just node (as currently)
  • both cloud (other people's workflows as well as our own) and local installs can use the convenient docker wrapper/action (with all of its advantages such as dependency management & reproducibility)

Surely we should discuss this in an issue on that repo, though?

@shcheklein
Member

@casperdcl

Move as much of the other non-essential root clutter to subdirs

you mean prettier configs, etc, etc?

The Dockerfile's entrypoint itself can use node and/or any other software that users demand support for

could you elaborate?

uses: ./ to run/test the (docker) action

same here, could you elaborate?

@casperdcl
Contributor Author

you mean prettier configs, etc, etc?

er, just a general principle of removing as much as possible. Some tools expect files to be in the root, so we're mostly stuck there, ofc.

The Dockerfile's entrypoint itself can use node and/or any other software that users demand support for

It's cumbersome for us to maintain multiple, well, entrypoints to our actual code. If we want to support both docker and directly running in node, it's best to have docker be a thin wrapper around node (i.e. in the Dockerfile, use ENTRYPOINT npm, CMD run or similar).

This way we can use the docker wrapper for the action. Thus running the action will test our docker wrapper as well as the underlying entrypoint. The additional advantage is that all deps are guaranteed installed in the docker container.
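A generic sketch of the thin-wrapper idea, assuming a Node.js project with a package-lock.json and a hypothetical `start` script; the `node:20` tag is also an assumption, and whether this works unchanged as a GitHub Action has not been checked.

```dockerfile
# Sketch: Docker as a thin wrapper around an existing Node.js entrypoint.
# node:20 and the "start" script name are assumptions.
FROM node:20

WORKDIR /app

# Install dependencies from the lockfile first so this layer is cached between code changes.
COPY package.json package-lock.json ./
RUN npm ci

# Copy the rest of the source.
COPY . .

# With this ENTRYPOINT/CMD pair, `docker run <image>` executes `npm run start`.
ENTRYPOINT ["npm"]
CMD ["run", "start"]
```

Because arguments given to `docker run` replace `CMD`, `docker run <image> run lint` would run `npm run lint` instead (assuming such a script exists), so the same image serves every entrypoint the package defines.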

@shcheklein
Member

I feel that I'm still missing something :)

Docker entrypoint for the image we provide already does this, right? It already runs Node. And image itself has JS bundle pre-installed.

There is no very strong reason to support a direct, Docker-less action, but that's a separate topic.

@casperdcl
Contributor Author

yes I was making several minor points, I think we're all missing small things but nothing major :)

@DavidGOrtega

DavidGOrtega commented Apr 6, 2020

Hi! we are pushing to use the code through Docker, the main reasons are:

  • Complexity abstraction (The container acts as a wrapper of the whole stack)
  • User experience is the same (or very close) in GitLab and GitHub
  • Stack is preinstalled, so it's faster to execute
  • Less friction and ready to go for GPU
  • The image will also act as a self-hosted runner, again with the same user experience in GitLab and GitHub.

The MAIN reason the JS action is maintained is that there is no way for macOS or Windows to run specific native tools in Docker. So if a user is using e.g. Core ML with Xcode, the only way to make this work for them is through the pure GitHub JS action, and only on GitHub.

@hsharrison

Hi, we're using the dvcorg/cml image, but it's creating a pain point that there are no version tags, only latest. It would be great if images were tagged by version so we could pin it and enforce the same version as in local dev environments.

@skshetry
Member

skshetry commented Jan 4, 2021

Hi, @hsharrison. Could you please mention your issue in the CML's repo, specifically on iterative/cml#217? This way, it'll be easier and faster for you. Thanks.

@DavidGOrtega

👋 @hsharrison feel free to open a ticket there. Could you please also describe your pain point?
We decided not to tag it (even though there is already a ticket for that, which I created) since we always try to keep CML backwards compatible.

@hsharrison

My mistake, sorry for the noise.

@efiop efiop closed this as completed May 3, 2021
@iterative iterative locked and limited conversation to collaborators May 3, 2021



10 participants