Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the design doc of Docker build #1627

Merged
merged 2 commits into from
Mar 17, 2017

Conversation

wangkuiyi
Copy link
Collaborator

@wangkuiyi wangkuiyi commented Mar 16, 2017

Fixes #1625


We'd like to give users choices of GPU and no-GPU, because the GPU version image is much larger than then the no-GPU version.

We'd like to give users choices of AVX and no-AVX, because some cloud providers don't provide AVX-enabled VMs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runtime check SIMD design docs for containers #1607

```

其中 paddlectl 应该是我们自己写的一个脚本,调用kubectl来在Kubernetes机群上启动一个job的。
1. Run the Paddle Book
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The production image better to be smaller, and not all trainning jobs require paddle book to run. May be a good choice to put paddle book into "develop" docker image? For the book may only be useful when developing or learning to develop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use book tag as well. Feels like book is more for users trying to learn deep learning than developers.

Copy link
Collaborator Author

@wangkuiyi wangkuiyi Mar 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

I think we should have a new image paddlepaddle/book:<version>, not just a new tag to paddlepaddle/paddle:<version>.

However, we currently have a git submodule in paddle repo that points to book. Indeed, its that book depends on paddle. So we should have a git submodule in book repo that points to paddle. After we change this, we could have a Dockerfile in book repo. This Dockerfile builds Docker image paddlepaddle/book:<version> basing on paddlepaddle/paddle:<version> built from paddle repo. @gangliao


```bash
docker run -it paddle
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Paddle production image default command can be paddle version which prints out the version and exit. Users can mount a volume contains their real trainning job or build their own docker image based on the production image.

Copy link
Collaborator Author

@wangkuiyi wangkuiyi Mar 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

I am adding a section describing how to write a PP program using the development package and how to release it using the production image.


1. Run on Kubernetes

We can push the production image to a DockerHub server, so developers can run distributed training jobs on the Kuberentes cluster:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"a DockerHub server" -> DockerHub. DockerHub is know as a service, push to a server may confuse? Not sure about that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"a DockerHub server" => Docker Registry such as DockerHub?

Copy link
Collaborator Author

@wangkuiyi wangkuiyi Mar 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Rename it to Docker registry service.

docker build -t paddle:dev . # Suppose that the Dockerfile is in the root source directory.
```

1. Build the source code
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we build paddle when building the develop docker image? So developers will get to work immediately before they make some modifications and rebuild paddle again?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good idea!

Copy link
Collaborator Author

@wangkuiyi wangkuiyi Mar 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is almost impossible. We build the development image using

docker build -t paddle:dev .

Because docker build doesn't allow the mounting of source code directory in host filesystem to the Docker image temporarily, we cannot build PaddlePaddle with possible local changes into the development image.

Copy link
Contributor

@typhoonzero typhoonzero Mar 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Since developers has to follow the build instructions at README.md, this will be clear enough for users.

- nvcc
- Python
- sphinx
- woboq
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woboq看起来不是必须的开发工具。

Copy link
Collaborator Author

@wangkuiyi wangkuiyi Mar 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to have it when I am facing the currently messy C++ codebase. I agree that we might remove it after our re-implementation work.

- Python
- sphinx
- woboq
- sshd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the sshd service is not necessary, docker exec can also allow user get into the container with multiple terminals, and in kubernetes cluster, it is inconvenient to allocate a port for PaddlePaddle pod.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to have it. I noticed that many people are developing using a Linux box with GPUs. If we don't have sshd in the container, developers would have to do two steps: (1) ssh to the box, and (2) run docker exec.


1. Run on Kubernetes

We can push the production image to a DockerHub server, so developers can run distributed training jobs on the Kuberentes cluster:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"a DockerHub server" => Docker Registry such as DockerHub?

@reyoung
Copy link
Collaborator

reyoung commented Mar 16, 2017

不是很理解为什么我们的production image一定要是用编译时生成的Dockerfile来进一步生成?

production image在我看来应该非常简单。

  1. Develop image可以编译出deb安装包。
  2. production image只是FROM ubuntu:14.04然后apt install这个deb安装包即可。不应该再有过多的逻辑。

@wangkuiyi
Copy link
Collaborator Author

wangkuiyi commented Mar 16, 2017

@reyoung 因为我连安装 deb 这个包的Dockerfile都不想放在 git repo里。

我们应该会有一个build.sh,它通过执行一个Dockerfile template来生成对应不同tag的Dockerfiles。我理解最好这个模板本身就在build.sh里,而不是多个tags对应的Dockerfiles在git repo里。

Copy link
Contributor

@typhoonzero typhoonzero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gangliao gangliao mentioned this pull request Mar 17, 2017
5 tasks
Copy link
Contributor

@gangliao gangliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wangkuiyi wangkuiyi merged commit 2f3097b into PaddlePaddle:develop Mar 17, 2017
@wangkuiyi wangkuiyi deleted the docker-redesign branch March 19, 2017 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants