GPU CI Setup #1045
Signed-off-by: Michael Dolan <[email protected]>
Signed-off-by: Michael Dolan <[email protected]>
If I understand this correctly, there's a pre-created CodeBuild project called The |
Yes, that describes the intended current approach.
Good point. @tykeal, any thoughts on @jfpanisset's suggested workflow?
buildspec.yml
Outdated
-DOCIO_BUILD_DOCS=OFF \
-DOCIO_BUILD_TESTS=ON \
-DOCIO_BUILD_GPU_TESTS=ON \
-DOCIO_BUILD_PYTHON=ON \
This file is dedicated to the AWS GPU build, so the Python build and tests could be disabled.
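As a rough sketch of that suggestion, the configure options from the diff could look like the following with the Python bindings turned off. The source directory (`..`) is an assumption, and the command is only printed here, not executed.

```shell
# Sketch only: same options as the diff above, with the Python bindings
# switched off for the GPU-only build.
CMAKE_ARGS="-DOCIO_BUILD_DOCS=OFF -DOCIO_BUILD_TESTS=ON -DOCIO_BUILD_GPU_TESTS=ON -DOCIO_BUILD_PYTHON=OFF"

# Dry run: print the configure command instead of running it.
# The ".." source directory is an assumption about the build layout.
echo "cmake .. $CMAKE_ARGS"
```

In buildspec.yml the printed command would replace the existing configure step.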
That's great.
We haven't created the CodeBuild project yet. We'll need several different parameters while defining it. As for using
Take a look at https://docs.aws.amazon.com/codebuild/latest/userguide/create-project.html#create-project-cli to see all the configuration in the CLI template for setting one of these up! If y'all feel that we should allow creation and destruction of the CodeBuild projects on the fly, I can ask that the IAM be updated to include the needed rights. The issue I see with that is that if multiple PRs end up triggering the build at the same time while we're doing a create/destroy, things are rather likely to fail somewhere.
I'm OK with using one pre-built CodeBuild project to start. We can always make it more dynamic in the future if needed. @tykeal, do you know if CodeBuild supports a dynamic docker container path? If we could specify the docker tag, I don't anticipate there being a lot else to configure. If that's not possible for now, it could be something to investigate in the future. What else do you need to get the CodeBuild project set up? Happy to provide that info.
Signed-off-by: Michael Dolan <[email protected]>
@michdolan I need to know what container image you need to use as it needs to be specified. I can use any container in the Amazon ECR or a different registry. I'm assuming you want to use the images created by @aloysbaillet but I need to know the container coordinates to be able to get the basic project up. |
Sure thing. Here is the image to use: Thanks @tykeal ! |
Do you know if there is a way to pass the container to CodeBuild via the buildspec.yml file? That would be an ideal setup if it were supported, long term at least.
I'm not aware of a way to do it in the buildspec. I'm not using the sha itself, just aswf/ci-ocio:2020. From my reading of the docs, that should be all I need. It should do the standard container thing and follow the label as it moves.
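One option outside the buildspec, offered only as a sketch: the AWS CLI `start-build` command accepts an `--image-override` option, so the image could be chosen per build by whatever triggers it. The project name `ocio-gpu-ci` is hypothetical, and whether the project's IAM setup permits overrides is an assumption. The command is printed, not executed.

```shell
# Hypothetical project name; the image tag is the one quoted in the thread.
PROJECT_NAME="ocio-gpu-ci"
IMAGE="aswf/ci-ocio:2020"

# Dry run: assemble the command and print it rather than calling AWS.
START_BUILD_CMD="aws codebuild start-build --project-name $PROJECT_NAME --image-override $IMAGE"
echo "$START_BUILD_CMD"
```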
Signed-off-by: Michael Dolan <[email protected]>
Signed-off-by: Michael Dolan <[email protected]>
As per other discussions, adding -DOCIO_USE_HEADLESS to the build options in buildspec.yml and merging the fixes for CMake GLEW detection from PR #1112 should allow the GPU code to build and run on CodeBuild.
…orkflow Signed-off-by: Michael Dolan <[email protected]>
Signed-off-by: Michael Dolan <[email protected]>
@jfpanisset I ran a build on GPU tests error with: |
That's disappointing. A couple of differences I can see with the tests I ran:
whereas if I'm looking at the right version: it seems you may be setting DISPLAY=:0
https://www.khronos.org/registry/EGL/extensions/EXT/EGL_EXT_platform_x11.txt says:
My guess would be that removing the DISPLAY environment variable should prevent EGL from trying to "obtain an EGLDisplay backed by an X11 screen" and should hopefully allow EGL to work without an X11 server present. |
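A minimal sketch of that idea, assuming the tests are launched from a shell step: clear `DISPLAY` before the GPU tests start and confirm it is really unset (not just empty). The `export` line only simulates the leftover value discussed here.

```shell
# Simulate the leftover setting from earlier experimentation.
export DISPLAY=:0

# Remove it so EGL is not asked for an X11-backed display.
unset DISPLAY

# ${DISPLAY+set} expands to "set" only if the variable exists at all,
# so this distinguishes "unset" from "set to an empty string".
if [ -z "${DISPLAY+set}" ]; then
  echo "DISPLAY is unset"
fi
```

The GPU test step would then run in this same shell so that it inherits the cleaned environment.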
I can definitively state that the CodeBuild environment type for OCIO is LINUX_GPU_CONTAINER. It's the only option for GPU when setting up a CodeBuild environment.
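For reference, the `environment` section of a CodeBuild project definition with this type might look roughly like the fragment below. The `type` and `image` values come from this thread. The `computeType` value is an assumption and should be checked against the CodeBuild documentation.

```shell
# Sketch of the "environment" section of a CodeBuild project definition.
# "type" and "image" come from the thread; "computeType" is an assumption.
ENVIRONMENT_JSON='{
  "type": "LINUX_GPU_CONTAINER",
  "image": "aswf/ci-ocio:2020",
  "computeType": "BUILD_GENERAL1_LARGE"
}'

# Print the fragment so it can be pasted into a create-project input file.
printf '%s\n' "$ENVIRONMENT_JSON"
```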
Good catch on DISPLAY, @jfpanisset. I left that in by mistake after some earlier experimentation. I'll try removing it to restore the NULL value.
It worked! Removing GPU CI will need to be run as part of a nightly build due to permissions, but the tests ran successfully and passed. |
That's great news indeed. It turns out that the DISPLAY environment variable and its interaction with EGL were discussed in the original PR #1047 that added EGL support. It would probably be worth documenting what's necessary to get a working GPU CI setup for OCIO. I'll try to capture some of this for the ASWF Sample Project.
Signed-off-by: Michael Dolan <[email protected]>
Signed-off-by: Michael Dolan <[email protected]>
Instead of doing a nightly build I have the GPU CI job running on commit to any OpenColorIO (v2) branch (this does not work with pull request CI jobs that originate from forks). That should keep our AWS usage minimal while getting continuous validation following merge. Hopefully we can find a solution to get this working in PRs in the future. |
Signed-off-by: Michael Dolan <[email protected]>
Signed-off-by: Michael Dolan <[email protected]>
Initial work for getting GPU CI up and running via AWS CodeBuild. The CodeBuild project doesn't exist yet, so this will fail currently, but the PR can be used for setup testing purposes.
I also updated all the GH Actions jobs to detect available threads from the system when running the cmake build (which will be even easier in CMake 3.12 with the new parallel support). The AWS GPU instances have 32 CPU threads (if I'm reading the spec correctly), so we can also leverage that to build much more quickly. I use a 24-thread machine at home and can build OCIO in around 1 minute. Our GH Actions VMs all have 2 threads and build in around 10 minutes on Linux.
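The thread detection can be sketched roughly as follows. This is not the exact step from the workflow files: the fallback chain and the Makefile-style `-j` flag are assumptions. The build command is printed, not run.

```shell
# Detect available threads: nproc (GNU coreutils) first, then getconf,
# then a fallback of 2, which matches the GH Actions VMs mentioned above.
THREADS="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)"

# Dry run: pass the job count through to the underlying build tool.
# The "-- -jN" form assumes a Makefile-style generator.
echo "cmake --build . -- -j$THREADS"

# With CMake 3.12 or newer, the built-in option can be used instead.
echo "cmake --build . --parallel $THREADS"
```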