-
Notifications
You must be signed in to change notification settings - Fork 394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] install: add docker usage documentation #811
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks 🎉 !
First comments:
- please, check the contribution guide and install the pre-commit to pass style checks
- most likely section should be under the install section
- please do a branch on the main repo so that it can be auto-deployed to review it
- don't forget to update sidebar.json to enable it
Will read it an leave more comments a bit later :)
&& echo "zstyle ':completion:*' verbose yes" >> /root/.zshrc | ||
|
||
# Install DVC with s3 enabled | ||
RUN pip install "dvc[s3]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why s3 only?
probably not the best idea to install into the global space - we advertise a pretty bad practice with this
why don't we use conda, like @casperdcl suggested in one of the tickets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically $CONDA_PREFIX/etc/conda/activate.d/
and deactivate.d/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably not the best idea to install into the global space - we advertise a pretty bad practice with this
@shcheklein , why do you consider it a bad idea?
I don't see any problem at all, to be honest; a docker container should only be dealing with one stuff, anyways.
why s3 only?
@shcheklein , I was about to use dvc[all]
but pyarrow
requires CPython
and it is not supporten on Python 3.8 yet; I decided to leave it as an exercise to the reader, the idea is not to provide the best Dockerfile, but just an example to display the bigger picture.
why don't we use conda, like @casperdcl suggested in one of the tickets?
There's a note already about installing DVC through conda, an example with two Dvcfiles seems excessive. I'm trying to cover just the basics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think the best Dockerfile
is exactly what we need to advocate. Otherwise we're saying "oh yeah we know of the existence of this thing you call docker and you can probably use it with our stuff; we've made a page about it on our website but couldn't be bothered to make it fully correct. Here haz some vaguely not-quite-correct musing of how it might sort of look. k's thx"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python 3.7 instead of 3.8? Don't see any need to use bleeding edge...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@casperdcl , I see your point; but I don't think there's a correct image 🙁 . It may vary from user to user, this is sort of like a baseline.
Also, this Dockerfile is for development only, "production" is such a hard thing to get right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@casperdcl , don't get me wrong, I'd be happy to have a more sound version of this image)
open to discussion and commits, as always)!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MrOutis so why should it be Python 3.8 then, let's install 3.7.
why do you consider it a bad idea?
Probably not a big deal for Docker - since it's already an isolated environment. It's just publishing it this way in docs can be considered as an instruction to do the same outside Docker, so was thinking if we should use apt-get for example, or conda
or some other sane way to deliver the tool. May be it's just my bias - "never ever do sudo pip install something" :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shcheklein , we are running with root
😅 , we would need to start by creating a specific user for running the application, installing sudo
and then add the user to the sudo
group. Also, allowing to sudo without password, so editing the sudoers list 😬
Not sure if this is the best practice, would need to research
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Ivan meant to NOT ever use sudo
before pip
(on global env). And suggested installing DVC under a virtualenv
, even if its a Docker image.
I don't think that would be useful though, because you kind of expect your entire container to have DVC available everywhere if you're installing it that way.
@@ -0,0 +1,108 @@ | |||
# Using Docker | |||
|
|||
Currently we don't provide any official Docker image, however you can easily |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have the Docker file ready, why don't we create an Image?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shcheklein , being honest, I think that having an official image would bring more harm than benefits (maintenance burden, releases, etc.).
This is only an example for the ones who are completely lost at the topic. It is not intended to serve as a reference for best practices on writing Docker images using dvc
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's much point documenting anything other than a reference for best practices... 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
publishing a complete Dockerfile like this is also a burden - we'll need to keep it updated, right?
if we see some value in Docker I would create that image (it's quite straightforward to make it part of CI) or put a Dockerfile under Git and make a doc on how to start using it. It's quite strange to publish a complete Dockerfile this way.
@casperdcl not sure I got what way your argument is going :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes trying to explain my ethos here: #811 (comment)
you make sure your files are still writtable. For more information: | ||
https://docs.docker.com/engine/security/userns-remap/ | ||
|
||
### Dockerfile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be beneficial to just provide a link to the file and put it into Git so that we can iterate with a regular process?
the section should be more focused on philosophy (like above ^^) that explains certain decision, how to start using it, proving links, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be beneficial to just provide a link to the file and put it into Git so that we can iterate with a regular process?
for the user I don't think it would bring significant value, and for us, well, I don't mind keeping the Dockerfile example on the docs)
the section should be more focused on philosophy (like above ^^) that explains certain decision, how to start using it, proving links, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the user I don't think it would bring significant value
At least it's just easier to get that file, instead of copying it. It's the way code files are usually distributed, right? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shcheklein , all right, let's do this)
where should I put the example Dockerfile?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DVC core? At least it's a better place than the website and docs :) Let's imagine we decide to package an image via CI - we will be doing it probably in DVC core, or in a separate repo. Probably not in dvc.org.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still not sure that we want to do this, but let's try, anyways 😅
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@casperdcl, here's whats stopping me from submitting an official image:
🙂 |
I'm not fussed about an image atm, but would like to see some code for an image. Priorities:
lower priority:
|
💯 percent. I would expect even that the value we provide with this Docker/image (they are the same for me more or less since we'll have to support them) be the first and most important part of this document and should determine exact place and the structure. It seems also to me that Django is not a correct example. I know that people package command line tools with Docket to run them from the host machine w/o dealing with setup and stuff. In case of DVC it's not clear if it's needed. |
```dockerfile | ||
FROM python:3.7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe explain why 3.7, since the first example is just python
.
# Set Zsh as the default shell | ||
SHELL ["/usr/bin/zsh", "-c"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why change the default? (Bash)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume to accommodate the custom zsh
syntax config... Speaking of which, it would be nice to provide equivalent bash
config if possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which goes to my point. Probably an unnecessary complication for a Dockerfile sample.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@casperdcl , bash
is not as customizable as zsh
, completion is really basic 😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would agree given the argument about providing a better experience, but I think most people are used to Bash, so not sure. It's not like we recommend Zsh over Bash in our docs. This one may be unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to iterative/dvc#2844 (review)
|
||
# vim: ft=dockerfile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not familiar with vim
. What does this do? Just curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forces Dockerfile syntax highlighting for the whole file when editing in vim
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks!
In that case I would remove this. Why provide something editor-specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorgeorpinel , it is nice to have it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it doesn't hurt but still, this is not related to the shell experience, or DVC, or Docker, so I would remove it. Should we vote @shcheklein @casparjespersen? So far it seems like it's Ramon for, myself against.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to iterative/dvc#2844 (review)
$ TAG="dvc:development" | ||
$ docker build -t $TAG . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need for tag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 yep, building a container with a tag is a good practice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK! But how about just using development
or dev
? dvc
will probably be in the image name. (I read https://cloud.google.com/solutions/best-practices-for-building-containers#properly_tag_your_images.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p.s. Maybe use an example image name somewhere in this doc?
I feel a little more clarification is required on my part. Personally I do not envision actually running a Personally I would find a docker image useful (and would like it maintained) to be confident of how to install things. At the moment the first thing I tend to look at to figure out how to install something properly (regardless of how good a project's install documentation thinks it is) is Even if in our case It's possible that a maintained |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems odd that the whole file seems to be duplicated (install/docker
and user-guide/using-docker
) - isn't there a template or redirection mechanism which can be used instead?
Sorry that some of my comments appear in the relevant threads above as well as duplicated below... Silly GitHub UI design where Ctrl+Enter
on a PR comment "starts a review" rather than appends to the selected conversation thread.
# Set Zsh as the default shell | ||
SHELL ["/usr/bin/zsh", "-c"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume to accommodate the custom zsh
syntax config... Speaking of which, it would be nice to provide equivalent bash
config if possible
|
||
# vim: ft=dockerfile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forces Dockerfile syntax highlighting for the whole file when editing in vim
@@ -0,0 +1,108 @@ | |||
# Using Docker | |||
|
|||
Currently we don't provide any official Docker image, however you can easily |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes trying to explain my ethos here: #811 (comment)
Thank you all for your input and interest! |
@casperdcl we have installers on Windows, Mac OS (including brew) that take care of everything. Do they solve the problem? Linux is a bit more complicated (due to variety of systems as far as I understand), but I'm just trying understand better the potential value the Dockerfile/image would create. Also, @MrOutis I still have a concern that if we don't integrate it into CI it would cost a lot of confusion to users pretty soon since they will pretty soon see an outdated version of it. |
Maybe we can start by improving the current installation docs? 🙂 |
@shcheklein , it doesn't bring any value to me, and if it is only for documentation purposes, I'd say that is better to pinpoint what could be improved in our current installation docs. |
```console | ||
$ TAG="dvc:development" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May need to use ```dvc here given our docs engine automatic highlighting styles. Please check how each looks running the docs locally.
so, why are we doing this PR then? :) I can imagine that if we had a |
I thought I was the only one that who didn't see value on this one |
@shcheklein yes, if we have a recommended way to install dvc for each OS that is a single installer file or command (such as If any amount of user setup is required (pre-install deps, config, auto completion etc) then I only want to look at 1 code file. This could be a single shell script that sets up absolutely everything, or a Dockerfile. If you don't think either is a good idea I will continue to look at |
@casperdcl thanks! kk, got your point. So it's a nice addition for the installation docs. And probably even apt-get does not cover everything. How do we decide what image do we provide it for? Ubuntu or something else? It also takes care of Linux only, right? I still think that dvc.org is not the right place to host the Dockerfile. It's the right place to have a link to an image or a Git-controlled Dockerfile that is being regularly tested if we go this way. |
That is in no way the same as saying they are guaranteed to exist. I'll repeat my earlier sentiment that I'm not sure we need a Dockerfile at all (whether embedded in a webpage or otherwise). But if we do provide one it should be complete and well-written (which is not the same as saying correct). |
@shcheklein I agree in the sense I'd recommend a Dockerfile in the main repo (possibly in a subfolder) and have it added to CI tests (both of which I can easily help with); but I think the meta-discussion of "do we really need docker anything" needs to be resolved first. |
@casperdcl @MrOutis so, considering that users asked this, that apt-get does not cover everything (yet?), I'm in favor of adding this. Not in this way though, Ramon. Let's do this in the core repo. Otherwise it just breaks the point and becomes yet another not-ever-reproducible piece of Markdown. Also, we should clearly communicate what it's created for when we put a link to it - "end-to-end reproducible instruction to install everything you need to start with DVC" or may be @casperdcl can advise on a better message. |
I'll remove myself from the assignment and continue to work on other stuff. |
IMPORTANT NOTES (please read, then delete):
Have you followed the guidelines in
Contributing Documentation?
Please use the title to provide a clear one-line present-tense summary of the
changes introduced in the PR. For example: "Introduce the first version of the
collection editor.".
Please make sure to mention "Fix #bugnum" (if applicable) in the description
of the PR. This enables GitHub to link the PR to the corresponding bug.
Related to iterative/dvc#2774