Missing mount of nvoptix.bin from libnvidia-gl-535 #127

Open
agirault opened this issue Oct 23, 2023 · 14 comments

Comments

@agirault

agirault commented Oct 23, 2023

Enabling the OptiX denoiser requires the /usr/share/nvidia/nvoptix.bin file, which is installed as part of the libnvidia-gl-<ver> package but is not present in containers started with the NVIDIA Container Toolkit runtime.

Workaround for Holoscan: https://github.com/nvidia-holoscan/holohub/pull/112/files

Content of libnvidia-gl-535

dpkg -L libnvidia-gl-535 | xargs -I % sh -c '[ -f "%" ] && echo "%"'
  • x86_64:
/usr/bin/nvidia-ngx-updater
/usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.535.86.05
/usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.535.86.05
/usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.535.86.05
/usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.535.86.05
/usr/lib/x86_64-linux-gnu/libnvidia-api.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-egl-gbm.so.1.1.0
/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.86.05
/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.535.86.05
/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.535.86.05
/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.535.86.05
/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.535.86.05
/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.535.86.05
/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.535.86.05
/usr/lib/x86_64-linux-gnu/libnvidia-vulkan-producer.so.535.86.05
/usr/lib/x86_64-linux-gnu/libnvoptix.so.535.86.05
/usr/lib/x86_64-linux-gnu/nvidia/wine/_nvngx.dll
/usr/lib/x86_64-linux-gnu/nvidia/wine/nvngx.dll
/usr/lib/x86_64-linux-gnu/nvidia/xorg/libglxserver_nvidia.so.535.86.05
/usr/share/doc/libnvidia-gl-535/changelog.Debian.gz
/usr/share/doc/libnvidia-gl-535/copyright
/usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json
/usr/share/glvnd/egl_vendor.d/10_nvidia.json
/usr/share/lintian/overrides/libnvidia-gl-535
/usr/share/nvidia/nvoptix.bin
/usr/share/vulkan/icd.d/nvidia_icd.json
/usr/share/vulkan/implicit_layer.d/nvidia_layers.json
/usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0
/usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.1
/usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.2
/usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0
/usr/lib/x86_64-linux-gnu/libnvoptix.so.1
/usr/lib/x86_64-linux-gnu/nvidia/xorg/libglxserver_nvidia.so
  • aarch64:
/usr/lib/aarch64-linux-gnu/libEGL_nvidia.so.535.86.10
/usr/lib/aarch64-linux-gnu/libGLESv1_CM_nvidia.so.535.86.10
/usr/lib/aarch64-linux-gnu/libGLESv2_nvidia.so.535.86.10
/usr/lib/aarch64-linux-gnu/libGLX_nvidia.so.535.86.10
/usr/lib/aarch64-linux-gnu/libnvidia-egl-gbm.so.1.1.0
/usr/lib/aarch64-linux-gnu/libnvidia-eglcore.so.535.86.10
/usr/lib/aarch64-linux-gnu/libnvidia-glcore.so.535.86.10
/usr/lib/aarch64-linux-gnu/libnvidia-glsi.so.535.86.10
/usr/lib/aarch64-linux-gnu/libnvidia-glvkspirv.so.535.86.10
/usr/lib/aarch64-linux-gnu/libnvidia-ngx.so.535.86.10
/usr/lib/aarch64-linux-gnu/libnvidia-rtcore.so.535.86.10
/usr/lib/aarch64-linux-gnu/libnvidia-tls.so.535.86.10
/usr/lib/aarch64-linux-gnu/libnvoptix.so.535.86.10
/usr/lib/aarch64-linux-gnu/nvidia/xorg/libglxserver_nvidia.so.535.86.10
/usr/share/doc/libnvidia-gl-535/changelog.Debian.gz
/usr/share/doc/libnvidia-gl-535/copyright
/usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json
/usr/share/glvnd/egl_vendor.d/10_nvidia.json
/usr/share/nvidia/nvoptix.bin
/usr/share/vulkan/icd.d/nvidia_icd.json
/usr/share/vulkan/icd.d/nvidia_layers.json
/usr/lib/aarch64-linux-gnu/libEGL_nvidia.so.0
/usr/lib/aarch64-linux-gnu/libGLESv1_CM_nvidia.so.1
/usr/lib/aarch64-linux-gnu/libGLESv2_nvidia.so.2
/usr/lib/aarch64-linux-gnu/libGLX_nvidia.so.0
/usr/lib/aarch64-linux-gnu/libnvidia-ngx.so.1
/usr/lib/aarch64-linux-gnu/libnvoptix.so.1
/usr/lib/aarch64-linux-gnu/nvidia/xorg/libglxserver_nvidia.so

Files not mounted with nvidia runtime

Run this command to test:

nv_gl_files=$(dpkg -L libnvidia-gl-535 | xargs -I % sh -c '[ -f "%" ] && echo "%"')
docker run -it --rm \
  --runtime=nvidia -e NVIDIA_DRIVER_CAPABILITIES=all --gpus=all \
  -e FILES="$nv_gl_files" \
  ubuntu:22.04 \
  bash -c '
    for file in $FILES; do
      [ ! -f "$file" ] && echo "Missing: $file"
    done
'
  • x86_64:
Missing: /usr/bin/nvidia-ngx-updater
Missing: /usr/lib/x86_64-linux-gnu/libnvidia-api.so.1
Missing: /usr/lib/x86_64-linux-gnu/libnvidia-vulkan-producer.so.535.86.05
Missing: /usr/lib/x86_64-linux-gnu/nvidia/wine/_nvngx.dll
Missing: /usr/lib/x86_64-linux-gnu/nvidia/wine/nvngx.dll
Missing: /usr/share/doc/libnvidia-gl-535/changelog.Debian.gz
Missing: /usr/share/doc/libnvidia-gl-535/copyright
Missing: /usr/share/lintian/overrides/libnvidia-gl-535
Missing: /usr/share/nvidia/nvoptix.bin
  • aarch64:
Missing: /usr/lib/aarch64-linux-gnu/libnvidia-egl-gbm.so.1.1.0
Missing: /usr/share/doc/libnvidia-gl-535/changelog.Debian.gz
Missing: /usr/share/doc/libnvidia-gl-535/copyright
Missing: /usr/share/nvidia/nvoptix.bin
Missing: /usr/share/vulkan/icd.d/nvidia_layers.json
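Several of the entries reported above are packaging metadata (changelog, copyright, lintian overrides). A minimal sketch for narrowing the report to runtime files, assuming that paths under /usr/share/doc/ and /usr/share/lintian/ can be ignored inside a container (that is an assumption, not something established in this issue); `filter_runtime_missing` is a hypothetical helper name:

```shell
# Hypothetical helper: reads "Missing: <path>" lines on stdin and drops
# packaging metadata, which is ASSUMED to be irrelevant inside a container.
filter_runtime_missing() {
  grep -v \
    -e '^Missing: /usr/share/doc/' \
    -e '^Missing: /usr/share/lintian/'
}

# Usage: pipe the output of the docker test command into the filter, e.g.
#   docker run ... | filter_runtime_missing
```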

Observations

  1. Why are there DLL files on x86_64 (nvidia/wine/nvngx.dll)? Interestingly, there is no libnvidia-ngx.so.1 on x86_64, unlike on aarch64.
  2. The missing nvidia-ngx-updater, libnvidia-api.so.1 and libnvidia-vulkan-producer.so.535 only exist on x86_64. Is that expected? Do they need to be mounted?
  3. libnvidia-egl-gbm.so exists on both x86_64 and aarch64, but is missing only in aarch64 containers.
  4. nvidia_layers.json is in icd.d on aarch64 but in implicit_layer.d on x86_64. The former isn't mounted, while the latter is.
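
Cross-architecture comparisons like the ones above can be reproduced mechanically. The following is only a sketch: it assumes the two `dpkg -L` listings have been saved to files, and the helper names `normalize_paths` and `only_in_first` are hypothetical. It replaces the architecture triplet and any three-part version suffix with placeholders before diffing:

```shell
# Replace the architecture triplet and any three-part ".so.X.Y.Z" suffix
# with placeholders so lists from different machines compare line by line.
normalize_paths() {
  sed -E \
    -e 's#/(x86_64|aarch64)-linux-gnu/#/<triplet>/#' \
    -e 's#\.so\.[0-9]+\.[0-9]+\.[0-9]+$#.so.<ver>#' | LC_ALL=C sort -u
}

# Print normalized paths present in the first list file ($1) but not in
# the second ($2).
only_in_first() {
  first_tmp=$(mktemp)
  second_tmp=$(mktemp)
  normalize_paths < "$1" > "$first_tmp"
  normalize_paths < "$2" > "$second_tmp"
  LC_ALL=C comm -23 "$first_tmp" "$second_tmp"
  rm -f "$first_tmp" "$second_tmp"
}

# Usage (file names are placeholders):
#   only_in_first x86_64-files.txt aarch64-files.txt
```

Note that the version placeholder also swallows non-driver versions such as `.so.1.1.0`, which is acceptable for a quick diff but hides genuine version differences.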
@agirault
Author

cc @AndreasHeumann @jjomier

@elezar
Member

elezar commented Nov 22, 2023

@agirault thanks for reporting this. Looking at the list of files, I think adding the following is relatively straightforward:

  • /usr/share/nvidia/nvoptix.bin

The following (for aarch64) is also not really a problem:

  • /usr/share/vulkan/icd.d/nvidia_layers.json
    but it would be good to confirm that there is no conflicting file at this location for x86_64 systems. Handling such a conflict is possible though, we just need an indication as to whether the additional effort is required there.

With regard to the libnvidia-egl-gbm.so file: since the file actually included in the driver installation is libnvidia-egl-gbm.so.1.1.0, it would be good to understand which symlinks on the host (in either case) point to this file.

The same is required for libnvidia-api.so.1. Here it's key to know what this points to on the host -- since it's expected to be a symbolic link.
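
One way to gather the symlink information requested above is sketched here. It assumes `readlink -f` is available on the host (it is part of GNU coreutils); `links_to` is a hypothetical helper name, and the example paths are taken from the package listing earlier in this issue:

```shell
# List every symlink directly inside directory $1 that resolves to file $2.
# `readlink -f` follows chains of links to the final target.
links_to() {
  resolved_target=$(readlink -f "$2") || return 1
  for candidate in "$1"/*; do
    [ -L "$candidate" ] || continue
    if [ "$(readlink -f "$candidate")" = "$resolved_target" ]; then
      echo "$candidate"
    fi
  done
}

# Usage on the host (output depends on the driver installation):
#   links_to /usr/lib/aarch64-linux-gnu \
#            /usr/lib/aarch64-linux-gnu/libnvidia-egl-gbm.so.1.1.0
#   readlink -f /usr/lib/x86_64-linux-gnu/libnvidia-api.so.1
```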

@elezar
Member

elezar commented Nov 22, 2023

I have created https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/merge_requests/501 to add the processing of these files. If we can settle on a final list of missing ones that should be included, we can get that into an upcoming release candidate.

@elezar
Member

elezar commented Dec 4, 2023

We have just released https://github.com/NVIDIA/nvidia-container-toolkit/releases/tag/v1.15.0-rc.1 that includes the injection of the nvoptix.bin file. The packages are available from our public experimental repositories.

Assuming these have been configured, running:

sudo apt-get install -y \
    nvidia-container-toolkit=1.15.0~rc.1-1 \
    nvidia-container-toolkit-base=1.15.0~rc.1-1  \
    libnvidia-container-tools=1.15.0~rc.1-1 \
    libnvidia-container1=1.15.0~rc.1-1

should install the required packages.

@elezar
Member

elezar commented Jan 23, 2024

Note that we have backported these changes to the release-1.14 branch and they are included in the v1.14.4 release.

@agirault if you get a chance to validate what is still missing that would be great.

@turowicz

turowicz commented Apr 24, 2024

I think the issue is back with the latest release:

apt list --installed | grep nvidia-container-toolkit

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

nvidia-container-toolkit-base/unknown,now 1.15.0-1 amd64 [installed,automatic]
nvidia-container-toolkit/unknown,now 1.15.0-1 amd64 [installed,automatic]

I recently started getting this error in the Omniverse Docker container for Isaac Sim:

Could not open optix denoiser weights file "/usr/share/nvidia/nvoptix.bin"

@elezar
Member

elezar commented Apr 24, 2024

@turowicz first, could you confirm that the file exists on your host?

Then, which docker command are you running? Could you confirm that you are using the nvidia runtime and that the image has NVIDIA_DRIVER_CAPABILITIES=all set (alternatively, add -e NVIDIA_DRIVER_CAPABILITIES=all to your docker command line)?

The nvoptix.bin file is only injected if NVIDIA_DRIVER_CAPABILITIES includes graphics or display.
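
The condition described here can be checked before starting a container. This is a rough sketch rather than the toolkit's own logic: it assumes the variable holds a comma-separated list and that `all` implies both capabilities, as the suggestion to set `NVIDIA_DRIVER_CAPABILITIES=all` implies. `caps_include_graphics` is a hypothetical helper name:

```shell
# Succeeds if the comma-separated capability list in $1 contains "all",
# "graphics" or "display"; fails otherwise (including for an empty list).
caps_include_graphics() {
  case ",$1," in
    *,all,*|*,graphics,*|*,display,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   caps_include_graphics "$NVIDIA_DRIVER_CAPABILITIES" \
#     || echo "nvoptix.bin will probably not be injected"
```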

@turowicz

turowicz commented Apr 24, 2024

Yes, the file exists on my host. To fix the error I had to add -v /usr/share/nvidia/nvoptix.bin:/usr/share/nvidia/nvoptix.bin
I am using the nvidia runtime through --gpus all.
I don't use -e NVIDIA_DRIVER_CAPABILITIES=all and have never used it. It used to work fine without it.
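
The bind-mount workaround can be made conditional on the file actually existing on the host. A small sketch, assuming the host path matches the container path used in this thread; `nvoptix_mount_flag` is a hypothetical helper name:

```shell
# Hypothetical helper: prints a bind-mount flag for nvoptix.bin if the file
# exists at the given host path (default: the path from this issue), and
# prints nothing otherwise.
nvoptix_mount_flag() {
  host_path=${1:-/usr/share/nvidia/nvoptix.bin}
  if [ -f "$host_path" ]; then
    printf '%s\n' "-v $host_path:/usr/share/nvidia/nvoptix.bin"
  fi
}

# Usage (the unquoted substitution is intentional so the flag splits into
# two arguments; <image> is a placeholder):
#   docker run --rm --gpus all $(nvoptix_mount_flag) <image>
```

The unquoted substitution breaks on host paths containing spaces, which does not apply to the default path.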

@turowicz

updated my answer above

@turowicz

addon: I am using nvcr.io/nvidia/isaac-sim:2023.1.1 and it used to work fine.

@turowicz

turowicz commented Apr 24, 2024

I confirm the container nvcr.io/nvidia/isaac-sim:2023.1.1 has NVIDIA_DRIVER_CAPABILITIES=all

@elezar
Member

elezar commented Apr 24, 2024

@turowicz could you provide the full docker command you run?

@turowicz

turowicz commented Apr 25, 2024

Here's the .devcontainer file:

// See https://aka.ms/vscode-remote/containers for the
// documentation about the devcontainer.json format
{
	"name": "surveily.omniverse",
	"build": {
		"dockerfile": "dockerfile"
	},
	"runArgs": [
		"--name",
		"surveily.omniverse",
		"-v",
		"${env:HOME}${env:USERPROFILE}/.ssh:/root/.ssh-localhost:ro",
		"-v",
		"/var/run/docker.sock:/var/run/docker.sock",
		"-v",
		"/usr/share/nvidia/nvoptix.bin:/usr/share/nvidia/nvoptix.bin",
		"--network",
		"host",
		"--gpus",
		"all",
		"-e",
		"ACCEPT_EULA=Y",
		"-e",
		"PRIVACY_CONSENT=N"
	],
	"postCreateCommand": "mkdir -p ~/.ssh && cp -r ~/.ssh-localhost/* ~/.ssh && chmod 700 ~/.ssh && chmod 600 ~/.ssh/*",
	"appPort": [
		"5003:5003"
	],
	"extensions": [
		"kosunix.guid",
		"redhat.vscode-yaml",
		"rogalmic.bash-debug",
		"mikeburgh.xml-format",
		"donjayamanne.githistory",
		"ms-azuretools.vscode-docker",
		"ms-azure-devops.azure-pipelines"
	],
	"settings": {
		"extensions.autoUpdate": false,
		"files.exclude": {
			"**/CVS": true,
			"**/bin": true,
			"**/obj": true,
			"**/.hg": true,
			"**/.svn": true,
			"**/.git": true,
			"**/.DS_Store": true,
			"**/BenchmarkDotNet.Artifacts": true
		}
	},
	"shutdownAction": "stopContainer"
}

and the dockerfile:

FROM nvcr.io/nvidia/isaac-sim:2023.1.1

# Install tools
RUN apt update && apt install git vim -y

# Remove ROS/2 Bridge
RUN sed -i 's/ros_bridge_extension = "omni.isaac.ros2_bridge"/ros_bridge_extension = ""/g' /isaac-sim/apps/omni.isaac.sim.base.kit

# Toggle Grid Off
RUN sed -i '17i import omni.kit.viewport' /isaac-sim/extscache/omni.replicator.replicator_yaml-2.0.4+lx64/omni/replicator/replicator_yaml/scripts/replicator_yaml_extension.py
RUN sed -i '100i \ \ \ \ \ \ \ \ omni.kit.viewport.actions.actions.toggle_global_visibility(visible=False)' /isaac-sim/extscache/omni.replicator.replicator_yaml-2.0.4+lx64/omni/replicator/replicator_yaml/scripts/replicator_yaml_extension.py

@turowicz

my workaround works but you guys may want to fix the problem
