Updated Docker images #89
Hi, thanks for updating these. We've just pushed a fix for #83 - can you try building the gpu backend based image off of the latest master? Now that we're trying to infer whether an appropriate GPU is present as part of the basic build (falling back to CPU), I'm wondering if separate cpu/gpu Docker images are still needed?
The CPU image just builds neon on Ubuntu Core 14.04, but the CUDA images also include the CUDA SDK (with v6.5, v7.0 and v7.5 available as separate image tags). Included below is a stack trace from trying to run the examples on the GPU image.
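For reference, running the GPU image needs the host's NVIDIA devices exposed to the container; a rough sketch, where the image name/tag and device nodes are placeholders and will vary by host:

```bash
# Pass the NVIDIA device nodes through to the container so the GPU backend can
# see the card (device list and image name are illustrative, not the real tags).
docker run -it \
  --device /dev/nvidiactl --device /dev/nvidia-uvm --device /dev/nvidia0 \
  example/neon-cuda:7.5 \
  neon examples/mnist_mlp.yaml
```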
I've had the same issue when updating to 1.0. Could it be that your cuda_path/lib64 is not in your LD_LIBRARY_PATH?
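A quick way to verify the environment, assuming the toolkit lives under /usr/local/cuda (adjust to your actual install path):

```bash
# Make sure the CUDA toolkit binaries and libraries are on the search paths
# (the /usr/local/cuda prefix is an assumption - use your install location).
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

# Sanity checks: nvcc should resolve and the runtime library should be present.
which nvcc
ls /usr/local/cuda/lib64/libcudart*
```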
Both PATH and LD_LIBRARY_PATH have been set up. I also confirmed with a manual check right now.
We've uncovered an issue with the GPU build procedure on some machines that we're currently looking into. There's a good chance your kernels didn't get built (check whether the compiled kernel files are actually present).
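A rough way to check, assuming the compiled kernels end up as .cubin/.ptx files inside the installed package (the exact layout is an assumption):

```bash
# Locate the installed neon package and look for any compiled GPU kernels.
NEON_DIR=$(python -c 'import os, neon; print(os.path.dirname(neon.__file__))')
find "$NEON_DIR" \( -name '*.cubin' -o -name '*.ptx' \)
```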
It is missing.
Does Docker Hub build on AWS? I suspect the reason you don't end up with kernels is that your build machine doesn't have a Maxwell-capable GPU. Until we build in this support (see #80) we'll need to do a better job of detecting and warning the user that this is the case.
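One way to check for a Maxwell-capable device (compute capability 5.x) is the deviceQuery sample shipped with the CUDA SDK; the sample path below is the usual default and may differ on your install:

```bash
# Build and run deviceQuery, then look at the reported capability:
# Maxwell parts report CUDA Capability 5.0 or 5.2.
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
make
./deviceQuery | grep "CUDA Capability"
```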
Probably, if not something similar - which is why I changed it. Some kind of flag/manual make option to force the install, with a warning, would work.
FWIW I have the same issue and error message about missing kernels on our Maxwell GPU with CUDA 7.0.
We just pushed some changes that should address the kernel build issue that was introduced with the fix for #83. We've also modified things to build the kernels even on machines that don't have a Maxwell GPU, which should help with building the GPU-based Docker images. You still won't be able to run the GPU backend without a Maxwell GPU, though. To remedy this we'll likely end up backporting the nervanagpu kernels; we don't plan on resurrecting the cudanet backend.
The builds have been failing for a while now, and after some investigation it looks like this is the result of the Automated Build limits. Judging from the timings between creating builds and the exceptions being thrown, the builds are hitting the 2-hour limit. Any ideas? Perhaps something in my Dockerfile can be changed? That said, the builds were already failing before my most recent changes.
Wow, 2 hours is pretty crazy - any idea what part of the build dominates? I tried to log in and view the build details page for a recent run, but the logs panel was empty for me. I just ran a clean GPU-based build of neon and that took about 7 minutes start to finish.
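To see which part dominates, timing a clean local build of the same Dockerfile is probably the quickest check (the tag name is arbitrary):

```bash
# Time a full, uncached local build; watching the step-by-step output shows
# which layer the build spends most of its time in.
time docker build --no-cache -t neon-build-test .
```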
Docker Support sent me an email about it: "We are seeing your build is failing due to too much memory consumption and crashing again and again". So it appears the build could fail earlier, but the exception is only thrown after 2 hours. To go over the limits of the Automated Builds: builds are capped at 2 hours, and the build machines only have a limited amount of memory, which lines up with the failure described above.
The suggested solution is to break this into several Automated Builds, but I'm not sure that would even solve the problem. There are basically 2 steps - installing a few Ubuntu packages, and doing the actual make. The first step has never been an issue for software with similar requirements.
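For reference, the two steps boil down to roughly the following; the package list and install target are an approximation of what the Dockerfile does, not an exact copy:

```bash
# Step 1: a handful of Ubuntu packages (approximate list).
apt-get update && apt-get install -y git python python-dev python-pip \
    libhdf5-dev libyaml-dev pkg-config

# Step 2: fetch neon and run the actual make - this is where the heavy lifting happens.
git clone https://github.com/NervanaSystems/neon.git /neon
cd /neon && make sysinstall
```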
Speed ups should at least partially address #89
We've just pushed an update to neon which should remove the unnecessary virtualenv python dependency install as part of a system-wide install. As to the memory consumption issue, what we think may be going on here is that the kernel build step launches a large number of compiler processes concurrently, and together they exhaust the memory available on the build machine.
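To illustrate the idea (this is not neon's actual build code, just a sketch with a made-up kernel directory): instead of launching one compiler per kernel source all at once, the number of concurrent compiles gets capped:

```bash
# Compile kernels with at most 4 nvcc processes in flight at any one time.
ls kernels/*.cu | xargs -P 4 -I{} \
  sh -c 'nvcc -arch=sm_50 -cubin "$1" -o "${1%.cu}.cubin"' _ {}
```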
On the suggestion from Docker Support I created a virtual machine with the same resource limits as their machines, but my builds were succeeding there. I contacted them and they tried to see what the issue was, but yes, it seems that you've identified what's causing the memory consumption issue. Let me know once the throttling is in place so that I can try again.
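As an aside, instead of a full VM one could also approximate a constrained builder by capping a container's memory; the 2 GB figure here is only a guess at the Hub's limit:

```bash
# Run an interactive Ubuntu container limited to 2 GB of RAM (no extra swap)
# and perform the build steps inside it to see whether they get killed.
docker run -it -m 2g --memory-swap 2g ubuntu:14.04 /bin/bash
```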
Defaults to 25, can be adjusted via --max_concurrent. Should fix the remainder of #89
OK, the latest push throttles the default number of concurrent kernel build processes to 10 (there were upwards of ~50 launching at the same time without this limit). Hopefully this should be sufficient for the Docker Hub environment, but if not you can further refine it from the top level via the --max_concurrent option. Try playing around with that and let us know if you're still seeing issues.
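As a hedged sketch of what that top-level invocation might look like (only the --max_concurrent flag comes from the commit referenced above; the make variable name is an assumption):

```bash
# Hypothetical plumbing: forward a lower concurrency cap to the kernel builder.
make sysinstall KERNEL_BUILDER_ARGS="--max_concurrent 4"
```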
Great - thanks to that commit the automated builds are now succeeding on Docker Hub! I've set up weekly builds for both the CPU and CUDA versions, so you can add them to the docs if you want. FYI, the following error gets thrown for both of them.
I've updated my Docker builds for version 1.0 - one for the cpu backend, and a new one for the gpu backend. The GPU images referenced in the 0.9 docs are still available, but with a note about deprecation.
I've tested the cpu version with `neon examples/mnist_mlp.yaml` and `python examples/mnist_mlp.py`, and it appears fine. However, the gpu version builds the cpu version because of #83. When building the code to check for the GPU capabilities, please keep #19 in mind.
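A minimal smoke test of the cpu image looks something like this; the image name is a placeholder, and it assumes the image's working directory is the neon source tree:

```bash
# Run both example entry points inside the CPU image.
docker run --rm example/neon neon examples/mnist_mlp.yaml
docker run --rm example/neon python examples/mnist_mlp.py
```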