Headless rendering #1047

Merged
merged 16 commits into from
Jul 9, 2020

ChinYing-Li
Contributor

Clean PR for #1039.

Member

@hodoulp hodoulp left a comment

Great work.

The CI build currently demonstrates that the pull request does not break the compilation but it does not demonstrate that the EGL & GPU unit tests work on headless machines (yes, you did it on your machine but the CI must also validate it).

To do so, you need to slightly change the script controlling the CI build:

Hopefully it will produce the first successful Linux build using EGL, with all GPU unit tests passing.

Member

@hodoulp hodoulp left a comment

You were able to have a successful build on your local machine but the two 'headless' builds failed on the GPU unit tests!

The Linux build crashed right from the start:

    Start 4: test_gpu

4: Test command: /__w/OpenColorIO/OpenColorIO/_build/tests/gpu/test_gpu_exec
4: Test timeout computed to be: 10000000
4: 
4: EGL could not be initialized.
4/5 Test #4: test_gpu .........................***Failed    0.01 sec

And the macOS build failed with the usual 'emulation' errors:

test 4
    Start 4: test_gpu

4: Test command: /Users/runner/work/OpenColorIO/OpenColorIO/_build/tests/gpu/test_gpu_exec
4: Test timeout computed to be: 10000000
4: 
4: GL Vendor:    Apple Inc.
4: GL Renderer:  Apple Software Renderer
4: GL Version:   2.1 APPLE-17.10.22
4: GLSL Version: 1.20
4: 
4:  OpenColorIO_Core_GPU_Unit_Tests
4: 
4: [  1/147] [CDLOp / clamp_fwd_v1_legacy_shader                ] - FAILED - 
4: Maximum error: 0.06166547909 at pixel: 3 on component 0 larger than epsilon.
4: scr = {0, 0, 0, inf}
4: cpu = {0.06166547909, 0, 0.06061341241, inf}
4: gpu = {0, 0, 0.06061341241, inf}
4: absolute tolerance=9.999999975e-07

@ChinYing-Li
Contributor Author

Thank you for the feedback.
The "EGL could not be initialized." error could potentially be solved by linking to libOpenGL.so in the if(${OCIO_IS_NVIDIA}) conditional of the root CMakeLists.txt. However, CMake is not able to pick up the GLU that comes with NVidia's GPU, and I am looking into this.

@@ -113,6 +118,10 @@ class OglApp
// Helper to print GL info.
void virtual printGLInfo() const noexcept;

// Return a pointer of either ScreenApp or HeadlessApp depending on the
// OCIO_HEADLESS_ENABLED preprocessor
static OglAppRcPtr createOglAppPtr(const char * winTitle, int winWidth, int winHeight);
Member

I forgot to mention that the name of any static method starts with an upper-case letter, and that putting the returned type in the name does not help to understand the method's purpose. So, the method name should be CreateOglApp().
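
A minimal sketch of such a factory method under that naming rule. It assumes simplified stand-ins for OglApp, ScreenApp and HeadlessApp (the real classes own GL/EGL contexts), and backendName() is a hypothetical helper added only for illustration. The OCIO_HEADLESS_ENABLED macro is the one named in the diff comment above.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for the OglApp hierarchy discussed above; they only
// record which backend was chosen and do not create any GL context.
class OglApp;
using OglAppRcPtr = std::shared_ptr<OglApp>;

class OglApp
{
public:
    OglApp(const char * winTitle, int winWidth, int winHeight)
        : m_title(winTitle), m_width(winWidth), m_height(winHeight) {}

    // Virtual destructor so derived objects are destroyed correctly
    // when deleted through a base-class pointer.
    virtual ~OglApp() = default;

    // Static factory: upper-case first letter, no return type in the name.
    static OglAppRcPtr CreateOglApp(const char * winTitle, int winWidth, int winHeight);

    // Illustration-only helper reporting which concrete class was built.
    virtual std::string backendName() const = 0;

    const std::string & title() const { return m_title; }
    int width() const { return m_width; }
    int height() const { return m_height; }

private:
    std::string m_title;
    int m_width;
    int m_height;
};

class ScreenApp : public OglApp
{
public:
    using OglApp::OglApp;
    std::string backendName() const override { return "ScreenApp"; }
};

class HeadlessApp : public OglApp
{
public:
    using OglApp::OglApp;
    std::string backendName() const override { return "HeadlessApp"; }
};

OglAppRcPtr OglApp::CreateOglApp(const char * winTitle, int winWidth, int winHeight)
{
    assert(winTitle != nullptr && winWidth > 0 && winHeight > 0);
#ifdef OCIO_HEADLESS_ENABLED
    // Headless build: render off-screen (EGL in the actual pull request).
    return std::make_shared<HeadlessApp>(winTitle, winWidth, winHeight);
#else
    // Default build: render through a regular window.
    return std::make_shared<ScreenApp>(winTitle, winWidth, winHeight);
#endif
}
```

Callers would then write OglApp::CreateOglApp("title", 256, 256) and never name the concrete class, so the choice between screen and headless rendering stays a build-time decision.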

@hodoulp
Member

hodoulp commented Jun 30, 2020

Hi @ChinYing-Li,

The Linux & macOS build break is (here):

[ 73%] Linking CXX executable ociochecklut
Undefined symbols for architecture x86_64:
  "OpenColorIO_v2_0dev::getOglAppPtr(char const*, int, int)", referenced from:
      (anonymous namespace)::ProcessorWrapper::setGPU(std::__1::shared_ptr<OpenColorIO_v2_0dev::GPUProcessor const>, bool) in main.cpp.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [src/apps/ociochecklut/ociochecklut] Error 1
make[1]: *** [src/apps/ociochecklut/CMakeFiles/ociochecklut.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

Note: Do not forget that you can use Docker to build on different Linux flavors & compilers when investigations are needed or for build fix validations.

@ChinYing-Li
Contributor Author

Hi, Thanks for the feedback.
After some investigation, I believe most problems are associated with the EGL implementation, which has more to do with the GPU vendor. The local build on my machine works since I am using Mesa's implementation, while the CI Linux CentOS build has an NVidia graphics card. Unfortunately I don't have a Nvidia GPU, so Docker doesn't seem to be a solution for testing (please let me know if I am wrong).
I am assuming that using the native EGL implementation from NVidia is more preferable (though Mesa can also be installed on a machine with an Nvidia GPU).

@hodoulp
Member

hodoulp commented Jun 30, 2020

Hi,
That's the challenge when working on a task with an important part of investigation. However, it works on your machine (i.e. the GPU unit tests succeeded on top of a Mesa implementation) so there is a path to success somewhere.

I am assuming that using the native EGL implementation from NVidia is more preferable

Yes it is.

Unfortunately I don't have a Nvidia GPU, so Docker doesn't seem to be a solution for testing (please let me know if I am wrong).

But Docker with Mesa could help to 'finalize' the pull request.

For the short term:

  • The build break fix is only to remove the 'former' getOglAppPtr() declaration from oglapp.h
  • The Nvidia detection using CUDA does not work on my Windows machine. The graphics card is:
GL Vendor:    NVIDIA Corporation
GL Renderer:  Quadro RTX 6000/PCIe/SSE2
GL Version:   4.6.0 NVIDIA 417.71
GLSL Version: 4.60 NVIDIA

Is there another way (i.e. not CUDA) to detect an Nvidia graphics card?

@hodoulp
Member

hodoulp commented Jun 30, 2020

Hi @ChinYing-Li
As you were able to successfully run the unit tests on your machine, you could check for differences between your machine and the CI build machine. Note that the GPU unit test framework prints some GPU information before running, which could potentially help.

Could you list everything you did to make it work (i.e. packages installed, etc.)?

@doug-walker
Collaborator

@ChinYing-Li , thanks for all the great work on this!

I notice that your printout of the GPU unit test run in issue #1039 shows all tests passing but the test results that Patrick pasted above show failures. Both his test and yours seem to have been done on a Mac. The difference seems to be that Patrick was not running the same GL/EGL version as you. I also have a Mac and could try running your tests. Could you provide some info about what you did to install EGL/Mesa on your Mac? Thanks in advance!

@ChinYing-Li
Contributor Author

ChinYing-Li commented Jul 1, 2020

@hodoulp
By "finalize", do you mean creating a working Docker build with successful GPU unit tests?
And I agree that FindCUDA isn't a good way to check for an Nvidia GPU. The reason for checking the vendor is to see whether GLVND is supported. Some Linux systems also support GLVND, and, as stated for some versions of CMake, "GLVND is currently the only way to get OpenGL 3+ functionality via EGL in a manner portable across vendors."

Currently, the CI Linux CentOS build fails to initialize EGL, and I suspect that its Nvidia GPU and the graphics driver may be the culprit. Several possibilities:

  1. DISPLAY needs to be unset temporarily, as suggested by some posts: https://github.com/vispy/vispy/pull/1464/files

  2. The NVidia-provided driver simply has bugs (their driver does have a record of EGL bugs). I will do more research on this.

I will post my steps of installing Mesa (EGL backend) shortly.

@ChinYing-Li
Contributor Author

ChinYing-Li commented Jul 1, 2020

@doug-walker
I am using a MacBook Air with dual boot: OSX Catalina and Ubuntu 20.04.
Right now I don't think Mesa can be installed in an OSX environment.
The result of the headless GPU unit tests on Ubuntu is as below:

cyli@cyli-MacBookAir:~/repo/OCIO_exp_build/tests/gpu$ ./test_gpu_exec
GPU unit tests used N19OpenColorIO_v2_0dev11HeadlessAppE

GL Vendor:    Intel Open Source Technology Center
GL Renderer:  Mesa DRI Intel(R) HD Graphics 5000 (HSW GT3)
GL Version:   3.0 Mesa 20.0.4
GLSL Version: 1.30

EGL Vendor:    Mesa Project
EGL Version:   1.4

OpenColorIO_Core_GPU_Unit_Tests

# test results omitted
#    .
#    .
#    .

[145/147] [RangeOp / arbitrary_1_no_clamp                    ] - PASSED - (MaxDiff: 1.19209e-07 at pix[43709][0])
[146/147] [RangeOp / arbitrary_2                             ] - PASSED - (MaxDiff: 1.19209e-07 at pix[29207][1])
[147/147] [RangeOp / arbitrary_2_no_clamp                    ] - PASSED - (MaxDiff: 4.76837e-07 at pix[58016][2])

0 tests failed
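
The name printed on the first line of the output above looks like a mangled C++ type name, as typically returned by typeid(...).name() with GCC or Clang (this is an assumption about how the test prints it). A small sketch of how such a name could be demangled with abi::__cxa_demangle from <cxxabi.h>; Demangle is a hypothetical helper, not part of OCIO:

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC and Clang; not available with MSVC)
#include <cstdlib>
#include <memory>
#include <string>

// Demangle an Itanium-ABI name such as the one returned by typeid(x).name().
// Returns the input unchanged if demangling fails.
std::string Demangle(const char * mangled)
{
    int status = 0;

    // __cxa_demangle allocates the result with malloc, so release it with free.
    std::unique_ptr<char, void (*)(void *)> demangled(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status), std::free);

    if (status == 0 && demangled)
    {
        return std::string(demangled.get());
    }
    return std::string(mangled);
}
```

Under this sketch, Demangle("N19OpenColorIO_v2_0dev11HeadlessAppE") should yield OpenColorIO_v2_0dev::HeadlessApp, which matches the expectation that the headless path was used for this run.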

I just made an OSX build of OCIO, and the result of OSX build's screen-rendering GPU unit tests is below:

(base) liqinyingde-MacBook-Air:gpu liqinying$ ./test_gpu_exec

GL Vendor:    Intel Inc.
GL Renderer:  Intel HD Graphics 5000 OpenGL Engine
GL Version:   2.1 INTEL-14.6.18
GLSL Version: 1.20

 OpenColorIO_Core_GPU_Unit_Tests

[  1/147] [CDLOp / clamp_fwd_v1_legacy_shader                ] - PASSED - (MaxDiff: 4.76837e-07 at pix[55224][0])
[  2/147] [CDLOp / clamp_fwd_v1                              ] - PASSED - (MaxDiff: 4.76837e-07 at pix[55224][0])
[  3/147] [CDLOp / clamp_fwd_v2                              ] - PASSED - (MaxDiff: 9.53674e-06 at pix[45324][2])
[  4/147] [CDLOp / clamp_fwd_no_clamp_v2                     ] - PASSED - (MaxDiff: 2.26498e-05 at pix[61405][0])
[  5/147] [CDLOp / clamp_inv_v2                              ] - PASSED - (MaxDiff: 1.04904e-05 at pix[37886][2])
[  6/147] [CDLOp / clamp_inv_no_clamp_v2                     ] - FAILED - 
Maximum error: 2.74181366e-05 at pixel: 65518 on component 1
Large number error: 2.74181366e-05 at pixel: 10 on component 2.
scr = {0, 0, -inf, 0}
cpu = {-0.03703703731, 0.2090909034, -0.1549295783, 0}
gpu = {-0.03703703731, 0.2090909034, -inf, 0}

[  7/147] [CDLOp / clamp_fwd_v1_legacy_shader_Data_2         ] - PASSED - (MaxDiff: 9.53674e-07 at pix[61067][0])
[  8/147] [CDLOp / clamp_fwd_v1_Data_2                       ] - PASSED - (MaxDiff: 9.53674e-07 at pix[61067][0])
[  9/147] [CDLOp / clamp_fwd_v2_Data_2                       ] - PASSED - (MaxDiff: 1.31726e-05 at pix[42094][0])
[ 10/147] [CDLOp / clamp_inv_v2_Data_2                       ] - PASSED - (MaxDiff: 1.09673e-05 at pix[5][1])
[ 11/147] [CDLOp / clamp_fwd_no_clamp_v2_Data_2              ] - PASSED - (MaxDiff: 2.43187e-05 at pix[57356][0])
[ 12/147] [CDLOp / clamp_inv_no_clamp_v2_Data_2              ] - PASSED - (MaxDiff: 2.21729e-05 at pix[65535][2])
[ 13/147] [CDLOp / clamp_fwd_v2_Data_3                       ] - PASSED - (MaxDiff: 1.06096e-05 at pix[28596][0])
[ 14/147] [CDLOp / clamp_fwd_no_clamp_v2_Data_3              ] - PASSED - (MaxDiff: 4.22001e-05 at pix[59867][0])
[ 15/147] [CDLOp / clamp_inv_no_clamp_v2_Data_3              ] - PASSED - (MaxDiff: 1.18017e-05 at pix[65535][2])
[ 16/147] [Config / several_1D_luts_legacy_shader            ] - PASSED - (MaxDiff: 0.000164688 at pix[43629][0])
[ 17/147] [Config / several_1D_luts_generic_shader           ] - PASSED - (MaxDiff: 1.19209e-07 at pix[40756][0])
[ 18/147] [Config / arbitrary_generic_shader                 ] - PASSED - (MaxDiff: 5.72205e-06 at pix[21868][0])
[ 19/147] [Config / several_luts_generic_shader              ] - PASSED - (MaxDiff: 2.98023e-08 at pix[21885][1])
[ 20/147] [Config / with_underscores                         ] - PASSED - (MaxDiff: 2.98023e-08 at pix[51918][1])
[ 21/147] [ExposureContrast / style_linear_fwd               ] - PASSED - (MaxDiff: 9.56893e-06 at pix[57714][2])
[ 22/147] [ExposureContrast / style_linear_rev               ] - PASSED - (MaxDiff: 1.66554e-05 at pix[54748][2])
[ 23/147] [ExposureContrast / style_video_fwd                ] - PASSED - (MaxDiff: 9.57367e-06 at pix[43692][0])
[ 24/147] [ExposureContrast / style_video_rev                ] - PASSED - (MaxDiff: 1.74816e-05 at pix[55074][2])
[ 25/147] [ExposureContrast / style_log_fwd                  ] - PASSED - (MaxDiff: 1.97091e-07 at pix[52512][0])
[ 26/147] [ExposureContrast / style_log_rev                  ] - PASSED - (MaxDiff: 0 at pix[0][0])
[ 27/147] [ExposureContrast / style_linear_dynamic_parameter ] - PASSED - (MaxDiff: 1.11099e-05 at pix[63081][1])
[ 28/147] [ExposureContrast / dp_several_one_dynamic         ] - PASSED - (MaxDiff: 2.37304e-07 at pix[46549][2])
[ 29/147] [ExposureContrast / dp_several_both_dynamic        ] - PASSED - (MaxDiff: 2.38197e-07 at pix[43904][1])
[ 30/147] [FixedFunction / style_aces_redmod03_fwd           ] - FAILED - 
Maximum error: 1.490116119e-06 at pixel: 0 on component 0 larger than epsilon.
scr = {0.8999999762, 0.05000000075, 0.2199999988, 0.5}
cpu = {0.7967003584, 0.05000000075, 0.1993400753, 0.5}
gpu = {0.7966988683, 0.05000000075, 0.1993397623, 0.5}
absolute tolerance=9.999999975e-07
[ 31/147] [FixedFunction / style_aces_redmod03_inv           ] - FAILED - 
Maximum error: 1.907348633e-06 at pixel: 0 on component 0 larger than epsilon.
scr = {0.8999999762, 0.05000000075, 0.2199999988, 0.5}
cpu = {1.018126011, 0.05000000075, 0.2436252087, 0.5}
gpu = {1.018127918, 0.05000000075, 0.2436255813, 0.5}
absolute tolerance=9.999999975e-07
[ 32/147] [FixedFunction / style_aces_redmod10_fwd           ] - FAILED - 
Maximum error: 1.549720764e-06 at pixel: 0 on component 0 larger than epsilon.
scr = {0.8999999762, 0.05000000075, 0.2199999988, 0.5}
cpu = {0.77148211, 0.05000000075, 0.2199999988, 0.5}
gpu = {0.7714805603, 0.05000000075, 0.2199999988, 0.5}
absolute tolerance=9.999999975e-07
[ 33/147] [FixedFunction / style_aces_redmod10_inv           ] - FAILED - 
Maximum error: 2.264976501e-06 at pixel: 0 on component 0 larger than epsilon.
scr = {0.8999999762, 0.05000000075, 0.2199999988, 0.5}
cpu = {1.052301764, 0.05000000075, 0.2199999988, 0.5}
gpu = {1.052304029, 0.05000000075, 0.2199999988, 0.5}
absolute tolerance=9.999999975e-07
[ 34/147] [FixedFunction / style_aces_glow03_fwd             ] - PASSED - (MaxDiff: 0 at pix[0][0])
[ 35/147] [FixedFunction / style_aces_glow03_inv             ] - PASSED - (MaxDiff: 0 at pix[0][0])

#.  Omitting successful tests
# 
# 
# 

[130/147] [MatrixOps / matrix                                ] - FAILED - 
Maximum error: 9.536743164e-07 at pixel: 65322 on component 1 larger than epsilon.
scr = {1.990751266, 1.990762711, 1.990774155, 1.990785837}
cpu = {3.98152566, 7.16676712, 3.185235262, 5.574174881}
gpu = {3.98152566, 7.166766167, 3.185235262, 5.574174881}
absolute tolerance=4.999999987e-07
[131/147] [MatrixOps / scale                                 ] - PASSED - (MaxDiff: 0 at pix[0][0])
[132/147] [MatrixOps / offset                                ] - PASSED - (MaxDiff: 0 at pix[0][0])
[133/147] [MatrixOps / matrix_offset                         ] - FAILED - 
Maximum error: 9.536743164e-07 at pixel: 65322 on component 1 larger than epsilon.
scr = {1.990751266, 1.990762711, 1.990774155, 1.990785837}
cpu = {3.48152566, 6.91676712, 3.435235262, 5.574174881}
gpu = {3.48152566, 6.916766167, 3.435235262, 5.574174881}
absolute tolerance=4.999999987e-07
[134/147] [MatrixOps / matrix_inverse                        ] - PASSED - (MaxDiff: 2.38419e-07 at pix[57885][0])
[135/147] [MatrixOps / scale_inverse                         ] - PASSED - (MaxDiff: 0 at pix[0][0])
[136/147] [MatrixOps / offset_inverse                        ] - PASSED - (MaxDiff: 0 at pix[0][0])
[137/147] [MatrixOps / matrix_offset_inverse                 ] - PASSED - (MaxDiff: 2.38419e-07 at pix[41222][0])
[138/147] [MatrixOps / matrix_offset_generic_shader          ] - FAILED - 
Maximum error: 9.536743164e-07 at pixel: 65322 on component 1 larger than epsilon.
scr = {1.990751266, 1.990762711, 1.990774155, 1.990785837}
cpu = {3.98152566, 6.91676712, 3.435235262, 5.574174881}
gpu = {3.98152566, 6.916766167, 3.435235262, 5.574174881}
absolute tolerance=4.999999987e-07
[139/147] [MatrixOps / matrix_offset_inverse_generic_shader  ] - PASSED - (MaxDiff: 2.38419e-07 at pix[41222][0])
[140/147] [RangeOp / scale_with_low_and_high_clippings       ] - PASSED - (MaxDiff: 0 at pix[0][0])
[141/147] [RangeOp / scale_with_low_clipping                 ] - PASSED - (MaxDiff: 0 at pix[0][0])
[142/147] [RangeOp / scale_with_high_clipping                ] - PASSED - (MaxDiff: 0 at pix[0][0])
[143/147] [RangeOp / scale_with_low_and_high_clippings_2     ] - PASSED - (MaxDiff: 0 at pix[0][0])
[144/147] [RangeOp / arbitrary_1                             ] - PASSED - (MaxDiff: 5.96046e-08 at pix[32774][1])
[145/147] [RangeOp / arbitrary_1_no_clamp                    ] - PASSED - (MaxDiff: 1.19209e-07 at pix[43709][0])
[146/147] [RangeOp / arbitrary_2                             ] - PASSED - (MaxDiff: 1.19209e-07 at pix[29207][1])
[147/147] [RangeOp / arbitrary_2_no_clamp                    ] - PASSED - (MaxDiff: 4.76837e-07 at pix[58016][2])

8 tests failed
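
To illustrate what the "larger than epsilon" failures above mean, here is a minimal sketch of an absolute-tolerance comparison between CPU and GPU results. It is not the OCIO test framework's actual code: DiffResult and CompareAbsolute are hypothetical names, and the handling of infinities and NaN is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result record; not part of the OCIO test framework.
struct DiffResult
{
    float       maxError = 0.0f;  // largest absolute CPU/GPU difference found
    std::size_t index    = 0;     // flat index of the component with that difference
    bool        passed   = true;  // true when maxError <= tolerance
};

// Compare two buffers component by component using an absolute tolerance.
// Assumption: identical values (including matching infinities) count as equal,
// while a NaN on either side is reported as a failure.
DiffResult CompareAbsolute(const std::vector<float> & cpu,
                           const std::vector<float> & gpu,
                           float tolerance)
{
    assert(cpu.size() == gpu.size());

    DiffResult res;
    for (std::size_t i = 0; i < cpu.size(); ++i)
    {
        // Equal values contribute no error; this also avoids inf - inf = NaN.
        if (cpu[i] == gpu[i])
        {
            continue;
        }

        const float err = std::fabs(cpu[i] - gpu[i]);
        if (std::isnan(err) || err > res.maxError)
        {
            res.maxError = err;
            res.index    = i;
        }
    }

    // A NaN maxError compares false here, so it is reported as a failure.
    res.passed = (res.maxError <= tolerance);
    return res;
}
```

With the values from test 30 (cpu[0] = 0.7967003584, gpu[0] = 0.7966988683) and the tolerance 9.999999975e-07 from the log, the difference of about 1.49e-06 exceeds the tolerance, so this comparison fails on component 0, as reported.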

@ChinYing-Li ChinYing-Li mentioned this pull request Jul 1, 2020
@doug-walker
Collaborator

@ChinYing-Li , thanks for clarifying that your earlier results were actually Ubuntu, even though they were obtained on Mac hardware. Your new results on MacOS seem similar to Patrick's results and are similar to our earlier failed attempts at getting the GPU tests to succeed on the Mac via a software emulated GPU.

The timing of your work on this is great! @michdolan has been working on trying to get the GPU tests running on our CI infrastructure with help from @jfpanisset and the Linux Foundation. JF was the one that originally suggested that EGL may help with that effort. So I think once you and Patrick are happy with your work that we should merge it, since it might be helpful to get the GPU CI working on Linux. As discussed before, we will need to leave the alternate mechanism in place for Mac until an EGL solution is possible.

@ChinYing-Li
Contributor Author

The CI virtual CentOS machine can’t initialize GLUT in the case of ScreenApp or EGL in the case of HeadlessApp.
The output from setup suggests that the Docker container is an nvidia-docker one that has OpenGL and CUDA.

CUDA_VERSION=10.2.89
CUDA_PKG_VERSION=10-2-10.2.89-1
LD_LIBRARY_PATH=/usr/local/lib:/usr/local/lib64:/opt/rh/devtoolset-6/root/usr/lib64:/opt/rh/devtoolset-6/root/usr/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics
NVIDIA_REQUIRE_CUDA=cuda>=10.2 brand=tesla,driver>=384,driver<385 brand=tesla,driver>=396,driver<397 brand=tesla,driver>=410,driver<411 brand=tesla,driver>=418,driver<419

However, the nvidia-smi executable is not found in the container.

Questions:

  1. Have OCIO devs run the GPU unit tests successfully on the CI CentOS build before under the current Docker configuration? Right now even ScreenApp can’t be initialized.
  2. @hodoulp I’ve updated the root CMakeLists to properly determine the existence of GLVND support, can you please try this and run headless GPU unit tests on your machine?

Referencing my previous comment:
unset DISPLAY did not solve the problem.

I wonder if it’s possible to add display to NVIDIA_DRIVER_CAPABILITIES. https://github.com/NVIDIA/nvidia-container-runtime
It has been mentioned that both libEGL.so and libOpenGL.so do not have any X11 dependencies: NVIDIA/libglvnd#175
But adding display to NVIDIA_DRIVER_CAPABILITIES might at least get ScreenApp to work.

@hodoulp
Member

hodoulp commented Jul 3, 2020

Hi @ChinYing-Li

Thanks a lot for the effort and your excellent work on this difficult problem, i.e. running GPU tests on headless machines. The starting point was to find a way (i.e. using EGL & Nvidia 'special' drivers as suggested by @jfpanisset) to run GPU unit tests on a headless machine (i.e. the current CI infrastructure only uses 'virtual' machines). That's the challenge.

However, we previously made several attempts to run the GPU unit tests without any code modification, always facing the X11 server issue as you discovered. An 'implicit' mandatory requirement of the 'challenge' is to guarantee that running on headless machines still produces valid results, i.e. any code change in the core library could jeopardize the trust in CI build results. So, we put our hope in the EGL & Nvidia proposal.

As you are now struggling with the container used by the CI infrastructure, this is the right time to open the discussion to others. At the next TSC meeting (this Monday), we will talk to @michdolan, who is part of the effort to have a physical machine on the CI infrastructure, about your effort and try to coordinate with his work. We will then have the complete status of the two efforts and be in a good position to discuss/define the next step(s).

Note: As you have put so much effort into that task, I think that you should try to be at the meeting (https://zoom.us/j/924729729 at 12:30pm Eastern time). Note that the TSC meetings are open to anyone. But we always publish meeting minutes a few days later if you cannot be present.

As @doug-walker mentioned, we still think that the pull request improves the GPU unit test infrastructure. So, I will review it to merge it when ready.

@ChinYing-Li
Contributor Author

@hodoulp Thank you for explaining the future steps; would love to help.
EGL, as described in Nvidia's blog post, should be able to render to a headless context without starting an X session.
I did some further investigation and added code referencing this post in the headless-nvidia-test branch to explicitly pick a device (that is, a GPU) for rendering.
After calling eglQueryDevicesEXT(MAX_DEVICES, eglDevs, &numDevices), numDevices is 0: EGL isn't seeing any GPU or virtual GPU. I would guess this is the reason EGL isn't able to get a display.

As I am also new to EGL and headless rendering, my trajectory of investigation isn't as coherent as I would like it to be; instead of the X server issue I suspected previously, I now think it could be a GPU-in-container or GPU-in-VM problem. I will share my findings in the meeting!

Member

@hodoulp hodoulp left a comment

Hi @ChinYing-Li ,
You are now close to having the pull request ready to merge. There is still some cleanup to do, i.e. nothing major.

@@ -65,6 +65,8 @@ jobs:
build-type: Release
build-shared: 'ON'
build-docs: 'ON'
build-gpu: 'ON'
use-headless: 'ON'
Member

Based on the new approach, all build-gpu & use-headless options must now be disabled.

CMakeLists.txt Outdated
set(OCIO_GL_ENABLED OFF)
endif()
# OpenGL_egl_Library is defined iff GLVND is supported (CMake 10+)
Member

There are two distinct blocks of code here so you should add an empty line between line 95 and 96.

CMakeLists.txt Outdated
set(OCIO_EGL_HEADLESS OFF)
else()
add_compile_definitions(OCIO_HEADLESS_ENABLED)
add_compile_definitions(GLEW_EGL)
Member

GLEW_EGL is never used so it must be removed.

CMakeLists.txt Outdated
@@ -221,3 +259,4 @@ if(OCIO_BUILD_DOCS)
add_subdirectory(docs)
endif()
add_subdirectory(vendor)

Member

This extra line does not add anything; please remove it.

}

EGLint eglMajor, eglMinor;

Member

@hodoulp hodoulp Jul 7, 2020

Remove this empty line to keep the parameters close to where they are used.

@ChinYing-Li
Contributor Author

Thank you for the review, and sorry that I did not carefully abide by the coding style; I will avoid this kind of mistake in the future. Some commits have been squashed as appropriate; please let me know if more squashing is needed.

@hodoulp
Member

hodoulp commented Jul 8, 2020

Hi @ChinYing-Li ,

You are really close, but there are still a few things to do:

  1. Update your branch using GitHub
  2. In your branch:
    a) git pull
    b) Perform the DCO changes https://github.com/AcademySoftwareFoundation/OpenColorIO/pull/1047/checks?check_run_id=849966276
    c) In the file .github/workflows/ci_workflow.yml, you forgot to disable some build-gpu & use-headless settings
    d) git commit -sam "blabla"
    e) git push

@ChinYing-Li ChinYing-Li changed the base branch from master to RB-1.1 July 8, 2020 20:41
@ChinYing-Li ChinYing-Li changed the base branch from RB-1.1 to master July 8, 2020 20:41
Signed-off-by: ChinYing-Li <[email protected]>
Made OglApp's destructor virtual

Signed-off-by: ChinYing-Li <[email protected]>
Change the CI workflow to build with headless option

Signed-off-by: ChinYing-Li <[email protected]>
Remove unused variables (#1039)

Include glext.h and debug print (#1039)

Check GLEW initialization (#1039)

Signed-off-by: ChinYing-Li <[email protected]>
@doug-walker
Collaborator

Thanks for the great work on this @ChinYing-Li !

@hodoulp hodoulp merged commit fca6cf0 into AcademySoftwareFoundation:master Jul 9, 2020
michdolan pushed a commit to michdolan/OpenColorIO that referenced this pull request Jul 13, 2020
* Modified CMakeLists.txt

Signed-off-by: ChinYing-Li <[email protected]>

* Support headless rendering in Linux build with EGL (AcademySoftwareFoundation#1039)

Signed-off-by: ChinYing-Li <[email protected]>

* Fix CMakeLists bugs
Made OglApp's destructor virtual

Signed-off-by: ChinYing-Li <[email protected]>

* Remove bugs in app's CMakeLists (AcademySoftwareFoundation#1039)

Signed-off-by: ChinYing-Li <[email protected]>

* Modified CMakeLists and add factory function for OglAppRcPtr (AcademySoftwareFoundation#1039)

Signed-off-by: ChinYing-Li <[email protected]>

* Use imported target to find EGL (AcademySoftwareFoundation#1039)
Change the CI workflow to build with headless option

Signed-off-by: ChinYing-Li <[email protected]>

* Modify CMakeLists to properly link EGL (AcademySoftwareFoundation#1039)

Remove unused variables (AcademySoftwareFoundation#1039)

Include glext.h and debug print (AcademySoftwareFoundation#1039)

Check GLEW initialization (AcademySoftwareFoundation#1039)

Signed-off-by: ChinYing-Li <[email protected]>

* Add debug print for HeadlessApp initialization (AcademySoftwareFoundation#1039)

Signed-off-by: ChinYing-Li <[email protected]>

* Modify CMakeLists to accomodate system that support GLVND (AcademySoftwareFoundation#1039)

Remove unused variables (AcademySoftwareFoundation#1039)

Define GLEW_EGL preprocessor for NVidia implementation (AcademySoftwareFoundation#1039)

Signed-off-by: ChinYing-Li <[email protected]>

* Fix CMakeLists (AcademySoftwareFoundation#1039)
Add the factory method for creating OglAppRcPtr

Modify CMakeLists (AcademySoftwareFoundation#1039)

Signed-off-by: ChinYing-Li <[email protected]>

* Rename the factory method OglApp::CreateOglApp (AcademySoftwareFoundation#1039)

Signed-off-by: ChinYing-Li <[email protected]>

* Change workflow to check the GL vendor of CI Linux build (AcademySoftwareFoundation#1039)

Signed-off-by: ChinYing-Li <[email protected]>

* Add proper mechanism to detect GLVND support in CmakeLists (AcademySoftwareFoundation#1039)

Signed-off-by: ChinYing-Li <[email protected]>

* Reformat the code (AcademySoftwareFoundation#1039)

Signed-off-by: ChinYing-Li <[email protected]>

* Turn off GPU unit test in CI (AcademySoftwareFoundation#1039)

Signed-off-by: ChinYing-Li <[email protected]>
@jfpanisset jfpanisset mentioned this pull request Aug 25, 2020
@doug-walker doug-walker mentioned this pull request Apr 6, 2022