-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve the design doc of Docker build #1627
Conversation
paddle/scripts/docker/README.md
Outdated
|
||
We'd like to give users choices of GPU and no-GPU, because the GPU version image is much larger than then the no-GPU version. | ||
|
||
We'd like to give users choices of AVX and no-AVX, because some cloud providers don't provide AVX-enabled VMs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Runtime check SIMD design docs for containers #1607
paddle/scripts/docker/README.md
Outdated
``` | ||
|
||
其中 paddlectl 应该是我们自己写的一个脚本,调用kubectl来在Kubernetes机群上启动一个job的。 | ||
1. Run the Paddle Book |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The production image better to be smaller, and not all trainning jobs require paddle book to run. May be a good choice to put paddle book into "develop" docker image? For the book may only be useful when developing or learning to develop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use book
tag as well. Feels like book is more for users trying to learn deep learning than developers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point!
I think we should have a new image paddlepaddle/book:<version>
, not just a new tag to paddlepaddle/paddle:<version>
.
However, we currently have a git submodule in paddle
repo that points to book
. Indeed, its that book
depends on paddle
. So we should have a git submodule in book
repo that points to paddle
. After we change this, we could have a Dockerfile in book
repo. This Dockerfile builds Docker image paddlepaddle/book:<version>
basing on paddlepaddle/paddle:<version>
built from paddle
repo. @gangliao
paddle/scripts/docker/README.md
Outdated
|
||
```bash | ||
docker run -it paddle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Paddle production image default command can be paddle version
which prints out the version and exit. Users can mount a volume contains their real trainning job or build their own docker image based on the production image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point.
I am adding a section describing how to write a PP program using the development package and how to release it using the production image.
paddle/scripts/docker/README.md
Outdated
|
||
1. Run on Kubernetes | ||
|
||
We can push the production image to a DockerHub server, so developers can run distributed training jobs on the Kuberentes cluster: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"a DockerHub server" -> DockerHub. DockerHub is know as a service, push to a server may confuse? Not sure about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"a DockerHub server" => Docker Registry such as DockerHub?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Rename it to Docker registry service.
paddle/scripts/docker/README.md
Outdated
docker build -t paddle:dev . # Suppose that the Dockerfile is in the root source directory. | ||
``` | ||
|
||
1. Build the source code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we build paddle when building the develop docker image? So developers will get to work immediately before they make some modifications and rebuild paddle again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a good idea!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is almost impossible. We build the development image using
docker build -t paddle:dev .
Because docker build
doesn't allow the mounting of source code directory in host filesystem to the Docker image temporarily, we cannot build PaddlePaddle with possible local changes into the development image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Since developers has to follow the build instructions at README.md, this will be clear enough for users.
- nvcc | ||
- Python | ||
- sphinx | ||
- woboq |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
woboq看起来不是必须的开发工具。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to have it when I am facing the currently messy C++ codebase. I agree that we might remove it after our re-implementation work.
- Python | ||
- sphinx | ||
- woboq | ||
- sshd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the sshd
service is not necessary, docker exec
can also allow user get into the container with multiple terminals, and in kubernetes cluster, it is inconvenient to allocate a port for PaddlePaddle pod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to have it. I noticed that many people are developing using a Linux box with GPUs. If we don't have sshd in the container, developers would have to do two steps: (1) ssh to the box, and (2) run docker exec.
paddle/scripts/docker/README.md
Outdated
|
||
1. Run on Kubernetes | ||
|
||
We can push the production image to a DockerHub server, so developers can run distributed training jobs on the Kuberentes cluster: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"a DockerHub server" => Docker Registry such as DockerHub?
不是很理解为什么我们的production image一定要是用编译时生成的Dockerfile来进一步生成? production image在我看来应该非常简单。
|
@reyoung 因为我连安装 deb 这个包的Dockerfile都不想放在 git repo里。 我们应该会有一个build.sh,它通过执行一个Dockerfile template来生成对应不同tag的Dockerfiles。我理解最好这个模板本身就在build.sh里,而不是多个tags对应的Dockerfiles在git repo里。 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Fixes #1625