-
Notifications
You must be signed in to change notification settings - Fork 461
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Headless rendering #1047
Headless rendering #1047
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work.
The CI build currently demonstrates that the pull request does not break the compilation but it does not demonstrate that the EGL & GPU unit tests work on headless machines (yes, you did it on your machine but the CI must also validate it).
To do so, you have to change a little bit the script controlling the CI build:
- .github/workflows/ci_workflow.yml, line 69 adds
build-gpu: 'ON'
, and addbuild-gpu: 'OFF'
for all others. - .github/workflows/ci_workflow.yml, line 205 replaces
-DOCIO_BUILD_GPU_TESTS=OFF
by-DOCIO_BUILD_GPU_TESTS=${{ matrix.build-gpu }}
Hoping it will successfully build the first Linux build using EGL, and GPU unit tests will all succeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You were able to have a successful build on your local machine but the two 'headless' builds failed on the GPU unit tests!
The Linux crashed right from the start:
Start 4: test_gpu
4: Test command: /__w/OpenColorIO/OpenColorIO/_build/tests/gpu/test_gpu_exec
4: Test timeout computed to be: 10000000
4:
4: EGL could not be initialized.
4/5 Test #4: test_gpu .........................***Failed 0.01 sec
And the macOS failed with the usual errors with 'emulation'
test 4
Start 4: test_gpu
4: Test command: /Users/runner/work/OpenColorIO/OpenColorIO/_build/tests/gpu/test_gpu_exec
4: Test timeout computed to be: 10000000
4:
4: GL Vendor: Apple Inc.
4: GL Renderer: Apple Software Renderer
4: GL Version: 2.1 APPLE-17.10.22
4: GLSL Version: 1.20
4:
4: OpenColorIO_Core_GPU_Unit_Tests
4:
4: [ 1/147] [CDLOp / clamp_fwd_v1_legacy_shader ] - FAILED -
4: Maximum error: 0.06166547909 at pixel: 3 on component 0 larger than epsilon.
4: scr = {0, 0, 0, inf}
4: cpu = {0.06166547909, 0, 0.06061341241, inf}
4: gpu = {0, 0, 0.06061341241, inf}
4: absolute tolerance=9.999999975e-07
Thank you for the feedback. |
src/libutils/oglapphelpers/oglapp.h
Outdated
@@ -113,6 +118,10 @@ class OglApp | |||
// Helper to print GL info. | |||
void virtual printGLInfo() const noexcept; | |||
|
|||
// Return a pointer of either ScreenApp or HeadlessApp depending on the | |||
// OCIO_HEADLESS_ENABLED preprocessor | |||
static OglAppRcPtr createOglAppPtr(const char * winTitle, int winWidth, int winHeight); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to mention that any static method starts with an upper case and that the returned type does not add anything to understand the method purpose. So, the method name should be CreateOglApp()
Hi @ChinYing-Li, The Linux & macOs build break is (here):
Note: Do not forget that you can use Docker to build on different Linux flavors & compilers when investigations are needed or for build fix validations. |
Hi, Thanks for the feedback. |
Hi,
Yes it is.
But Docker with Mesa could help to 'finalize' the pull request. For short-term:
Is there another way (i.e. not CUDA) to detect Nvidia graphic card? |
Hi @ChinYing-Li Could you list all what you did to make it work (i.e. package installed, etc.)? |
@ChinYing-Li , thanks for all the great work on this! I notice that your print out of the GPU unit test run in issue #1039 show all tests passing but the test results that Patrick pasted above show failures. Both his test and yours seem to have been done on a Mac. The difference seems to be that Patrick was not running the same GL/EGL version as you. I also have a Mac and could try running your tests. Could you provide some info about what you did to install EGL/Mesa on your Mac? Thanks in advance! |
@hodoulp Currently, the CI Linux CentOS build fails to initialize EGL, and I suspect that its Nvidia GPU and the graphics driver may the culprit. Several possibilities:
I will post my steps of installing Mesa (EGL backend) shortly. |
@doug-walker
I just made an OSX build of OCIO, and the result of OSX build's screen-rendering GPU unit tests is below:
|
@ChinYing-Li , thanks for clarifying that your earlier results were actually Ubuntu, even though they were obtained on Mac hardware. Your new results on MacOS seem similar to Patrick's results and are similar to our earlier failed attempts at getting the GPU tests to succeed on the Mac via a software emulated GPU. The timing of your work on this is great! @michdolan has been working on trying to get the GPU tests running on our CI infrastructure with help from @jfpanisset and the Linux Foundation. JF was the one that originally suggested that EGL may help with that effort. So I think once you and Patrick are happy with your work that we should merge it, since it might be helpful to get the GPU CI working on Linux. As discussed before, we will need to leave the alternate mechanism in place for Mac until an EGL solution is possible. |
The CI virtual CentOS machine can’t initialize glut in the case of ScreenApp or egl in the case of HeadlessApp.
However, the Questions:
Referencing my previous comment: I wonder if it’s possible to add |
Hi @ChinYing-Li Thanks a lot for the effort and your excellent work on that difficult problematic i.e. running GPU tests on headless machines. The start was to have a way (i.e. using EGL & Nvidia 'special' drivers as suggested by @jfpanisset) to run GPU unit tests on headless machine (i.e. the current CI infrastructure is only using 'virtual' machines). That's the challenge. However, we previously made several attempts to run the GPU unit tests without any code modification, always facing the X11 server issue as you discovered. An 'implicit' mandatory requirement of the 'challenge' is to guaranty that running on headless machines still produce valid results i.e. any code change in the core library could jeopardize the trust in CI build results. So, we put our hope on the EGL & Nvidia proposal. As you now struggle with the container used by the CI infrastructure, that's the right time to open the discussion to others. At the next TSC meeting (this Monday), we will talk to @michdolan who is part of the effort to have a physical machine on the CI infrastructure, about your effort and try to coordinate with his work. We will then have the complete status of the two efforts and be in a good position to discuss/define the next step(s). Note: As you put so much effort of that task I think that you should try to be at the meeting (https://zoom.us/j/924729729 at 12:30pm Eastern time). Note that the TSC meetings are open to anyone. But we always have meeting minutes few days later if you cannot be present. As @doug-walker mentioned, we still think that the pull request improves the GPU unit test infrastructure. So, I will review it to merge it when ready. |
@hodoulp Thank you for explaining the future steps; would love to help. As I am also new to EGL and headless rendering, my trajectory of investigation isn't as coherent as I would like it to be; Instead of the X server issue I suspected previously, now I think it could be the GPU-in-container or GPU-in-VM problem. I will share my findings in the meeting! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ChinYing-Li ,
You are now close to have the pull request ready to merge. There are still some cleanup to do i.e. nothing major.
.github/workflows/ci_workflow.yml
Outdated
@@ -65,6 +65,8 @@ jobs: | |||
build-type: Release | |||
build-shared: 'ON' | |||
build-docs: 'ON' | |||
build-gpu: 'ON' | |||
use-headless: 'ON' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the new approach, you must now have all build-gpu
& use-headless
be disabled.
CMakeLists.txt
Outdated
set(OCIO_GL_ENABLED OFF) | ||
endif() | ||
# OpenGL_egl_Library is defined iff GLVND is supported (CMake 10+) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two distinct blocks of code here so you should add an empty line between line 95 and 96.
CMakeLists.txt
Outdated
set(OCIO_EGL_HEADLESS OFF) | ||
else() | ||
add_compile_definitions(OCIO_HEADLESS_ENABLED) | ||
add_compile_definitions(GLEW_EGL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GLEW_EGL
is never used so it must be removed.
CMakeLists.txt
Outdated
@@ -221,3 +259,4 @@ if(OCIO_BUILD_DOCS) | |||
add_subdirectory(docs) | |||
endif() | |||
add_subdirectory(vendor) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This extra line does not add anything please remove it.
} | ||
|
||
EGLint eglMajor, eglMinor; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this empty line to have the parameters close to where there are used.
Thank you for the review, and sorry that I did not carefully abide with the coding style; I will avoid this kind of mistake in the future. Some commits are squashed as appropriate; please let me know if more squashing is needed. |
Hi @ChinYing-Li , Your are really close. But there are still few things to do:
|
Signed-off-by: ChinYing-Li <[email protected]>
Signed-off-by: ChinYing-Li <[email protected]>
Made OglApp's destructor virtual Signed-off-by: ChinYing-Li <[email protected]>
Signed-off-by: ChinYing-Li <[email protected]>
Signed-off-by: ChinYing-Li <[email protected]>
Change the CI workflow to build with headless option Signed-off-by: ChinYing-Li <[email protected]>
Remove unused variables (#1039) Include glext.h and debug print (#1039) Check GLEW initialization (#1039) Signed-off-by: ChinYing-Li <[email protected]>
Signed-off-by: ChinYing-Li <[email protected]>
Remove unused variables (#1039) Define GLEW_EGL preprocessor for NVidia implementation (#1039) Signed-off-by: ChinYing-Li <[email protected]>
Add the factory method for creating OglAppRcPtr Modify CMakeLists (#1039) Signed-off-by: ChinYing-Li <[email protected]>
Signed-off-by: ChinYing-Li <[email protected]>
Signed-off-by: ChinYing-Li <[email protected]>
Signed-off-by: ChinYing-Li <[email protected]>
Signed-off-by: ChinYing-Li <[email protected]>
Signed-off-by: ChinYing-Li <[email protected]>
Thanks for the great work on this @ChinYing-Li ! |
* Modified CMakeLists.txt Signed-off-by: ChinYing-Li <[email protected]> * Support headless rendering in Linux build with EGL (AcademySoftwareFoundation#1039) Signed-off-by: ChinYing-Li <[email protected]> * Fix CMakeLists bugs Made OglApp's destructor virtual Signed-off-by: ChinYing-Li <[email protected]> * Remove bugs in app's CMakeLists (AcademySoftwareFoundation#1039) Signed-off-by: ChinYing-Li <[email protected]> * Modified CMakeLists and add factory function for OglAppRcPtr (AcademySoftwareFoundation#1039) Signed-off-by: ChinYing-Li <[email protected]> * Use imported target to find EGL (AcademySoftwareFoundation#1039) Change the CI workflow to build with headless option Signed-off-by: ChinYing-Li <[email protected]> * Modify CMakeLists to properly link EGL (AcademySoftwareFoundation#1039) Remove unused variables (AcademySoftwareFoundation#1039) Include glext.h and debug print (AcademySoftwareFoundation#1039) Check GLEW initialization (AcademySoftwareFoundation#1039) Signed-off-by: ChinYing-Li <[email protected]> * Add debug print for HeadlessApp initialization (AcademySoftwareFoundation#1039) Signed-off-by: ChinYing-Li <[email protected]> * Modify CMakeLists to accomodate system that support GLVND (AcademySoftwareFoundation#1039) Remove unused variables (AcademySoftwareFoundation#1039) Define GLEW_EGL preprocessor for NVidia implementation (AcademySoftwareFoundation#1039) Signed-off-by: ChinYing-Li <[email protected]> * Fix CMakeLists (AcademySoftwareFoundation#1039) Add the factory method for creating OglAppRcPtr Modify CMakeLists (AcademySoftwareFoundation#1039) Signed-off-by: ChinYing-Li <[email protected]> * Rename the factory method OglApp::CreateOglApp (AcademySoftwareFoundation#1039) Signed-off-by: ChinYing-Li <[email protected]> * Change workflow to check the GL vendor of CI Linux build (AcademySoftwareFoundation#1039) Signed-off-by: ChinYing-Li <[email protected]> * Add proper mechanism to detect GLVND support in CmakeLists (AcademySoftwareFoundation#1039) Signed-off-by: ChinYing-Li <[email protected]> * Reformat the code (AcademySoftwareFoundation#1039) Signed-off-by: ChinYing-Li <[email protected]> * Turn off GPU unit test in CI (AcademySoftwareFoundation#1039) Signed-off-by: ChinYing-Li <[email protected]>
Clean PR for #1039 .