Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] install: add docker usage documentation #811

Closed
wants to merge 6 commits into from
Closed

Conversation

ghost
Copy link

@ghost ghost commented Nov 22, 2019

IMPORTANT NOTES (please read, then delete):

  • Have you followed the guidelines in
    Contributing Documentation?

  • Please use the title to provide a clear one-line present-tense summary of the
    changes introduced in the PR. For example: "Introduce the first version of the
    collection editor.".

  • Please make sure to mention "Fix #bugnum" (if applicable) in the description
    of the PR. This enables GitHub to link the PR to the corresponding bug.

Related to iterative/dvc#2774

@ghost ghost requested a review from jorgeorpinel November 22, 2019 18:22
@shcheklein shcheklein temporarily deployed to dvc-org-pr-811 November 22, 2019 18:22 Inactive
@ghost ghost requested a review from efiop November 22, 2019 18:22
@ghost ghost mentioned this pull request Nov 22, 2019
Copy link
Member

@shcheklein shcheklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks 🎉 !

First comments:

  • please, check the contribution guide and install the pre-commit to pass style checks
  • most likely section should be under the install section
  • please do a branch on the main repo so that it can be auto-deployed to review it
  • don't forget to update sidebar.json to enable it

Will read it an leave more comments a bit later :)

&& echo "zstyle ':completion:*' verbose yes" >> /root/.zshrc

# Install DVC with s3 enabled
RUN pip install "dvc[s3]"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why s3 only?

probably not the best idea to install into the global space - we advertise a pretty bad practice with this

why don't we use conda, like @casperdcl suggested in one of the tickets?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically $CONDA_PREFIX/etc/conda/activate.d/ and deactivate.d/

Copy link
Author

@ghost ghost Nov 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably not the best idea to install into the global space - we advertise a pretty bad practice with this

@shcheklein , why do you consider it a bad idea?
I don't see any problem at all, to be honest; a docker container should only be dealing with one stuff, anyways.

why s3 only?

@shcheklein , I was about to use dvc[all] but pyarrow requires CPython and it is not supporten on Python 3.8 yet; I decided to leave it as an exercise to the reader, the idea is not to provide the best Dockerfile, but just an example to display the bigger picture.

why don't we use conda, like @casperdcl suggested in one of the tickets?

There's a note already about installing DVC through conda, an example with two Dvcfiles seems excessive. I'm trying to cover just the basics.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think the best Dockerfile is exactly what we need to advocate. Otherwise we're saying "oh yeah we know of the existence of this thing you call docker and you can probably use it with our stuff; we've made a page about it on our website but couldn't be bothered to make it fully correct. Here haz some vaguely not-quite-correct musing of how it might sort of look. k's thx"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python 3.7 instead of 3.8? Don't see any need to use bleeding edge...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@casperdcl , I see your point; but I don't think there's a correct image 🙁 . It may vary from user to user, this is sort of like a baseline.
Also, this Dockerfile is for development only, "production" is such a hard thing to get right.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@casperdcl , don't get me wrong, I'd be happy to have a more sound version of this image)

open to discussion and commits, as always)!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MrOutis so why should it be Python 3.8 then, let's install 3.7.

why do you consider it a bad idea?

Probably not a big deal for Docker - since it's already an isolated environment. It's just publishing it this way in docs can be considered as an instruction to do the same outside Docker, so was thinking if we should use apt-get for example, or conda or some other sane way to deliver the tool. May be it's just my bias - "never ever do sudo pip install something" :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shcheklein , we are running with root 😅 , we would need to start by creating a specific user for running the application, installing sudo and then add the user to the sudo group. Also, allowing to sudo without password, so editing the sudoers list 😬

Not sure if this is the best practice, would need to research

Copy link
Contributor

@jorgeorpinel jorgeorpinel Nov 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Ivan meant to NOT ever use sudo before pip (on global env). And suggested installing DVC under a virtualenv, even if its a Docker image.

I don't think that would be useful though, because you kind of expect your entire container to have DVC available everywhere if you're installing it that way.

@@ -0,0 +1,108 @@
# Using Docker

Currently we don't provide any official Docker image, however you can easily
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have the Docker file ready, why don't we create an Image?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shcheklein , being honest, I think that having an official image would bring more harm than benefits (maintenance burden, releases, etc.).

This is only an example for the ones who are completely lost at the topic. It is not intended to serve as a reference for best practices on writing Docker images using dvc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's much point documenting anything other than a reference for best practices... 😕

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

publishing a complete Dockerfile like this is also a burden - we'll need to keep it updated, right?

if we see some value in Docker I would create that image (it's quite straightforward to make it part of CI) or put a Dockerfile under Git and make a doc on how to start using it. It's quite strange to publish a complete Dockerfile this way.

@casperdcl not sure I got what way your argument is going :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes trying to explain my ethos here: #811 (comment)

you make sure your files are still writtable. For more information:
https://docs.docker.com/engine/security/userns-remap/

### Dockerfile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be beneficial to just provide a link to the file and put it into Git so that we can iterate with a regular process?

the section should be more focused on philosophy (like above ^^) that explains certain decision, how to start using it, proving links, etc

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be beneficial to just provide a link to the file and put it into Git so that we can iterate with a regular process?

for the user I don't think it would bring significant value, and for us, well, I don't mind keeping the Dockerfile example on the docs)

the section should be more focused on philosophy (like above ^^) that explains certain decision, how to start using it, proving links, etc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the user I don't think it would bring significant value

At least it's just easier to get that file, instead of copying it. It's the way code files are usually distributed, right? :)

Copy link
Author

@ghost ghost Nov 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shcheklein , all right, let's do this)

where should I put the example Dockerfile?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DVC core? At least it's a better place than the website and docs :) Let's imagine we decide to package an image via CI - we will be doing it probably in DVC core, or in a separate repo. Probably not in dvc.org.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not sure that we want to do this, but let's try, anyways 😅

static/docs/user-guide/index.md Outdated Show resolved Hide resolved
@casperdcl casperdcl changed the title user guide: add docker usage documentatino user guide: add docker usage documentation Nov 22, 2019
@shcheklein shcheklein temporarily deployed to dvc-org-pr-811 November 22, 2019 22:08 Inactive
@shcheklein shcheklein temporarily deployed to dvc-org-pr-811 November 22, 2019 22:08 Inactive
@casperdcl

This comment has been minimized.

@ghost

This comment has been minimized.

@casperdcl

This comment has been minimized.

@ghost
Copy link
Author

ghost commented Nov 22, 2019

@casperdcl, here's whats stopping me from submitting an official image:

  • Should we create a user specifically to run dvc ?
  • How to map the user's id and its group id to the user running the container?
  • How and who will manage the release process? Should it be automated?
  • What base image should we use and why? (python, conda, alpine, flavor-of-the-month, etc.)
  • Is it going to be used interactively (like an SSH session) or running a container for each invocation (ENTRYPOINT ["dvc"])?
  • Is this image going to be used on Development or Production? Does it matter?

🙂

@casperdcl
Copy link
Contributor

casperdcl commented Nov 22, 2019

I'm not fussed about an image atm, but would like to see some code for an image. Priorities:

  1. install dvc includine deps
  2. install dvc[all] including deps
  3. install optional extras (zsh syntax highlighting, inputrc tweaks, etc)

lower priority:

  1. mess around with UIDs
  2. provide an image (can be automated on travis, including building a second image with dvc[tests] before deploy etc)

@ghost ghost changed the title user guide: add docker usage documentation [WIP] install: add docker usage documentation Nov 23, 2019
@shcheklein
Copy link
Member

@MrOutis

💯 percent. I would expect even that the value we provide with this Docker/image (they are the same for me more or less since we'll have to support them) be the first and most important part of this document and should determine exact place and the structure.

It seems also to me that Django is not a correct example. I know that people package command line tools with Docket to run them from the host machine w/o dealing with setup and stuff. In case of DVC it's not clear if it's needed.

Comment on lines +52 to +53
```dockerfile
FROM python:3.7
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe explain why 3.7, since the first example is just python.

Comment on lines +75 to +76
# Set Zsh as the default shell
SHELL ["/usr/bin/zsh", "-c"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change the default? (Bash)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume to accommodate the custom zsh syntax config... Speaking of which, it would be nice to provide equivalent bash config if possible

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which goes to my point. Probably an unnecessary complication for a Dockerfile sample.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@casperdcl , bash is not as customizable as zsh, completion is really basic 😬

Copy link
Contributor

@jorgeorpinel jorgeorpinel Nov 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree given the argument about providing a better experience, but I think most people are used to Bash, so not sure. It's not like we recommend Zsh over Bash in our docs. This one may be unnecessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +100 to +101

# vim: ft=dockerfile
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not familiar with vim. What does this do? Just curious.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forces Dockerfile syntax highlighting for the whole file when editing in vim

Copy link
Contributor

@jorgeorpinel jorgeorpinel Nov 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks!

In that case I would remove this. Why provide something editor-specific?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorgeorpinel , it is nice to have it)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it doesn't hurt but still, this is not related to the shell experience, or DVC, or Docker, so I would remove it. Should we vote @shcheklein @casparjespersen? So far it seems like it's Ramon for, myself against.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +107 to +108
$ TAG="dvc:development"
$ docker build -t $TAG .
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need for tag?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 yep, building a container with a tag is a good practice

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK! But how about just using development or dev? dvc will probably be in the image name. (I read https://cloud.google.com/solutions/best-practices-for-building-containers#properly_tag_your_images.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p.s. Maybe use an example image name somewhere in this doc?

@casperdcl
Copy link
Contributor

casperdcl commented Nov 23, 2019

I feel a little more clarification is required on my part.

Personally I do not envision actually running a dvc docker image in order to dvc init/add/status etc as an alternative to installing within a python env (e.g. iterative/dvc#2760 (comment)). Perhaps it may be useful in a larger image as a tool to manage automated pushing/pulling data in the same way as git can be used as a replacement for wget/curl, but that's not my main concern right now.

Personally I would find a docker image useful (and would like it maintained) to be confident of how to install things. At the moment the first thing I tend to look at to figure out how to install something properly (regardless of how good a project's install documentation thinks it is) is .travis.yml/.circleci/config.yml. If available I definitely look at Dockerfiles instead. @tensorflow/dockerfiles/gpu.Dockerfile for example is small and IMHO can't be beaten by any amount of installation documentation -- and oh my goodness there is a lot of it -- even if I never use that file. I even made a minor change to that file which I think makes a world of a difference to ease native system installation.

Even if in our case dvc would have a tiny Dockerfile, it's still useful to potential end users to demonstrate how easy it is to install. And it turns out it's not completely trivial -- there are runtime non-python dependencies on at least libc, libffi, and git. There are a bunch of build dependencies. There are python deps and dvc != dvc[all], both of which exclude both ssh_gssapi and hdfs. There are bash completion scripts and zsh syntax highlighting. inputrc tweaks. Probably also more things unlisted... At the very least a maintained Dockerfile is a good way for devs to concisely document major new features/dependencies in a more readable way than CI config files.

It's possible that a maintained snapcraft.yml (iterative/dvc#2778) satisfies what I'm looking for, but I still think a Dockerfile presents cleaner installation documentation.

Copy link
Contributor

@casperdcl casperdcl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems odd that the whole file seems to be duplicated (install/docker and user-guide/using-docker) - isn't there a template or redirection mechanism which can be used instead?


Sorry that some of my comments appear in the relevant threads above as well as duplicated below... Silly GitHub UI design where Ctrl+Enter on a PR comment "starts a review" rather than appends to the selected conversation thread.

Comment on lines +75 to +76
# Set Zsh as the default shell
SHELL ["/usr/bin/zsh", "-c"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume to accommodate the custom zsh syntax config... Speaking of which, it would be nice to provide equivalent bash config if possible

Comment on lines +100 to +101

# vim: ft=dockerfile
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forces Dockerfile syntax highlighting for the whole file when editing in vim

@@ -0,0 +1,108 @@
# Using Docker

Currently we don't provide any official Docker image, however you can easily
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes trying to explain my ethos here: #811 (comment)

@ghost
Copy link
Author

ghost commented Nov 25, 2019

Thank you all for your input and interest!

@shcheklein
Copy link
Member

... but I still think a Dockerfile presents cleaner installation documentation.

@casperdcl we have installers on Windows, Mac OS (including brew) that take care of everything. Do they solve the problem? Linux is a bit more complicated (due to variety of systems as far as I understand), but apt and rpm packages should solve most of the problems as well, right?

I'm just trying understand better the potential value the Dockerfile/image would create. Also, @MrOutis I still have a concern that if we don't integrate it into CI it would cost a lot of confusion to users pretty soon since they will pretty soon see an outdated version of it.

@ghost
Copy link
Author

ghost commented Nov 25, 2019

Even if in our case dvc would have a tiny Dockerfile, it's still useful to potential end users to demonstrate how easy it is to install. And it turns out it's not completely trivial -- there are runtime non-python dependencies on at least libc, libffi, and git. There are a bunch of build dependencies. There are python deps and dvc != dvc[all], both of which exclude both ssh_gssapi and hdfs. There are bash completion scripts and zsh syntax highlighting. inputrc tweaks. Probably also more things unlisted... At the very least a maintained Dockerfile is a good way for devs to concisely document major new features/dependencies in a more readable way than CI config files.

@casperdcl:

  • inputrc is usually like that on a desktop environment (i.e. ubuntu), we need to tweak it because the Docker doesn't provide it
  • libc, libffi, git are dependencies that are super common (already provided in some operative systems), installation is easy.
  • We already provide instructions for autocompletion, both Bash and Zsh: https://dvc.org/doc/install/completion
  • AFAIK, you don't need hdfs or ssh installed to use dvc[all]

but I still think a Dockerfile presents cleaner installation documentation.

Maybe we can start by improving the current installation docs? 🙂

@ghost
Copy link
Author

ghost commented Nov 25, 2019

I'm just trying understand better the potential value the Dockerfile/image would create.

@shcheklein , it doesn't bring any value to me, and if it is only for documentation purposes, I'd say that is better to pinpoint what could be improved in our current installation docs.

Comment on lines +106 to +107
```console
$ TAG="dvc:development"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May need to use ```dvc here given our docs engine automatic highlighting styles. Please check how each looks running the docs locally.

@shcheklein
Copy link
Member

it doesn't bring any value to me, and if it is only for documentation purposes, I'd say that is better to pinpoint what could be improved in our current installation docs.

so, why are we doing this PR then? :)

I can imagine that if we had a Dockerfile (with comments) in Git that we run as part of CI and just put a link to it in the relevant (install/linux) doc it would be better (in terms of support) and more useful than creating a separate document with an embedded file that we would have to support in a peculiar way.

@ghost
Copy link
Author

ghost commented Nov 25, 2019

so, why are we doing this PR then? :)

I thought I was the only one that who didn't see value on this one

@casperdcl
Copy link
Contributor

@shcheklein yes, if we have a recommended way to install dvc for each OS that is a single installer file or command (such as apt get install dvc) which is guaranteed to install every single possible optional subfeature and dependencies then I won't care about installation instructions or Dockerfiles.

If any amount of user setup is required (pre-install deps, config, auto completion etc) then I only want to look at 1 code file. This could be a single shell script that sets up absolutely everything, or a Dockerfile. If you don't think either is a good idea I will continue to look at .travis.yml no matter how great anyone thinks the installation instructions on the website are now or at any point in future :)

@shcheklein
Copy link
Member

@casperdcl thanks! kk, got your point. So it's a nice addition for the installation docs. And probably even apt-get does not cover everything. How do we decide what image do we provide it for? Ubuntu or something else? It also takes care of Linux only, right?

I still think that dvc.org is not the right place to host the Dockerfile. It's the right place to have a link to an image or a Git-controlled Dockerfile that is being regularly tested if we go this way.

@casperdcl
Copy link
Contributor

libc, libffi, git are dependencies that are super common

That is in no way the same as saying they are guaranteed to exist. apt get install dvc would definitely have to include these as deps. This should be the same here.

I'll repeat my earlier sentiment that I'm not sure we need a Dockerfile at all (whether embedded in a webpage or otherwise). But if we do provide one it should be complete and well-written (which is not the same as saying correct).

@casperdcl
Copy link
Contributor

casperdcl commented Nov 25, 2019

@shcheklein I agree in the sense I'd recommend a Dockerfile in the main repo (possibly in a subfolder) and have it added to CI tests (both of which I can easily help with); but I think the meta-discussion of "do we really need docker anything" needs to be resolved first.

@shcheklein
Copy link
Member

@casperdcl @MrOutis so, considering that users asked this, that apt-get does not cover everything (yet?), I'm in favor of adding this. Not in this way though, Ramon. Let's do this in the core repo. Otherwise it just breaks the point and becomes yet another not-ever-reproducible piece of Markdown. Also, we should clearly communicate what it's created for when we put a link to it - "end-to-end reproducible instruction to install everything you need to start with DVC" or may be @casperdcl can advise on a better message.

@ghost
Copy link
Author

ghost commented Nov 26, 2019

@ghost
Copy link
Author

ghost commented Nov 26, 2019

I'll remove myself from the assignment and continue to work on other stuff.
The idea was to ignite the discussion with a more hands-on approach.
I'll close this PR in favor if this one: iterative/dvc#2844
@casperdcl , do you want to continue the work on this? 🙂

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants