Updates tiled rendering API with full RTX rendering and additional annotators (#97)

This change updates the current tiled rendering APIs to use the full RTX tiled rendering feature, allowing for higher-quality RGB renders and support for additional annotators, including semantic segmentation, instance segmentation, normals, and motion vectors.

This change also aligns output dimensions across the TiledCamera, Camera, and RayCasterCamera classes. All single-channel outputs now have dimension (H, W, 1). The Camera class now outputs RGB data with shape (H, W, 3).

- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing functionality to not work as expected)
- This change requires a documentation update

Fixes issue #775

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

---------

Co-authored-by: Alexander
Co-authored-by: Toni-SM
isaac-sim · Sep 20, 2024 · 02b0d76
1 parent 52422b6
commit 02b0d76
Showing 18 changed files with 1,225 additions and 196 deletions.
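
The output-shape alignment described in the commit message can be summarized as a small lookup. The following is a minimal sketch based only on the channel counts documented in this commit; `CHANNELS`, `SEGMENTATION_TYPES`, and `expected_shape` are hypothetical names used for illustration and are not part of the Isaac Lab API.

```python
# Hypothetical helper (not part of Isaac Lab): channel counts per data type,
# taken from the documentation added in this commit.
CHANNELS = {
    "rgb": 3,
    "rgba": 4,
    "distance_to_camera": 1,
    "distance_to_image_plane": 1,
    "depth": 1,
    "normals": 3,
    "motion_vectors": 2,
}

# Segmentation outputs have 4 channels (RGBA) when colorized, 1 otherwise.
SEGMENTATION_TYPES = {
    "semantic_segmentation",
    "instance_segmentation_fast",
    "instance_id_segmentation_fast",
}


def expected_shape(data_type, batch, height, width, colorize=False):
    """Return the documented (B, H, W, C) shape for a batched camera output."""
    if data_type in SEGMENTATION_TYPES:
        channels = 4 if colorize else 1
    elif data_type in CHANNELS:
        channels = CHANNELS[data_type]
    else:
        raise ValueError(f"Unknown data type: {data_type!r}")
    return (batch, height, width, channels)
```

A check such as `expected_shape("depth", 4, 80, 100) == (4, 80, 100, 1)` could be compared against the shape of the actual buffer when migrating code that assumed the older 2D single-channel outputs.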
diff --git a/docs/source/features/tiled_rendering.rst b/docs/source/features/tiled_rendering.rst
@@ -9,9 +9,9 @@ Tiled Rendering
 
 .. note::
 
-    This feature is only available from Isaac Sim version 4.0.0 onwards.
+    This feature is only available from Isaac Sim version 4.2.0 onwards.
 
-    Tiled rendering requires heavy memory resources. We recommend running at most 256 cameras in the scene.
+    Tiled rendering in combination with image processing networks requires heavy memory resources, especially at larger resolutions. We recommend running at most 256 cameras in the scene on RTX 4090 GPUs or similar.
 
 Tiled rendering APIs provide a vectorized interface for collecting data from camera sensors.
 This is useful for reinforcement learning environments requiring vision in the loop.
@@ -20,7 +20,7 @@ one single large image instead of multiple smaller images that would have been p
 by each individual camera. This reduces the amount of time required for rendering and
 provides a more efficient API for working with vision data.
 
-Isaac Lab provides tiled rendering APIs for RGB and depth data through the :class:`~sensors.TiledCamera`
+Isaac Lab provides tiled rendering APIs for RGB and depth data, along with other annotators, through the :class:`~sensors.TiledCamera`
 class. Configurations for the tiled rendering APIs can be defined through the :class:`~sensors.TiledCameraCfg`
 class, specifying parameters such as the regex expression for all camera paths, the transform
 for the cameras, the desired data type, the type of cameras to add to the scene, and the camera
@@ -59,6 +59,80 @@ environment. For example:
     python source/standalone/workflows/rl_games/train.py --task=Isaac-Cartpole-RGB-Camera-Direct-v0 --headless --enable_cameras
 
 
+Annotators and Data Types
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Both :class:`~sensors.TiledCamera` and :class:`~sensors.Camera` classes provide APIs for retrieving various types of annotator data from Replicator:
+
+* ``"rgb"``: A 3-channel rendered color image.
+* ``"rgba"``: A 4-channel rendered color image with alpha channel.
+* ``"distance_to_camera"``: An image containing the distance to the camera optical center.
+* ``"distance_to_image_plane"``: An image containing distances of 3D points from the camera plane along the camera's Z-axis.
+* ``"depth"``: The same as ``"distance_to_image_plane"``.
+* ``"normals"``: An image containing the local surface normal vectors at each pixel.
+* ``"motion_vectors"``: An image containing the motion vector data at each pixel.
+* ``"semantic_segmentation"``: The semantic segmentation data.
+* ``"instance_segmentation_fast"``: The instance segmentation data.
+* ``"instance_id_segmentation_fast"``: The instance id segmentation data.
+
+RGB and RGBA
+""""""""""""
+
+The ``rgb`` data type returns a 3-channel RGB color image of type ``torch.uint8``, with dimension (B, H, W, 3).
+
+The ``rgba`` data type returns a 4-channel RGBA color image of type ``torch.uint8``, with dimension (B, H, W, 4).
+
+To convert the ``torch.uint8`` data to ``torch.float32``, divide the buffer by 255.0. The result is a ``torch.float32`` buffer with values from 0 to 1.
+
+Depth and Distances
+"""""""""""""""""""
+
+``distance_to_camera`` returns a single-channel depth image with the distance to the camera optical center. The output has dimension (B, H, W, 1) and type ``torch.float32``.
+
+``distance_to_image_plane`` returns a single-channel depth image with distances of 3D points from the camera plane along the camera's Z-axis. The output has dimension (B, H, W, 1) and type ``torch.float32``.
+
+``depth`` is provided as an alias for ``distance_to_image_plane`` and will return the same data as the ``distance_to_image_plane`` annotator, with dimension (B, H, W, 1) and type ``torch.float32``.
+
+Normals
+"""""""
+
+``normals`` returns an image containing the local surface normal vectors at each pixel. The buffer has dimension (B, H, W, 3), containing the (x, y, z) information for each vector, and has data type ``torch.float32``.
+
+Motion Vectors
+""""""""""""""
+
+``motion_vectors`` returns the per-pixel motion vectors in image space: a 2D array of vectors representing the relative motion of each pixel in the camera’s viewport between frames. The buffer has dimension (B, H, W, 2) and data type ``torch.float32``. The two channels are x, the motion distance along the horizontal axis (image width), where movement to the left of the image is positive and movement to the right is negative, and y, the motion distance along the vertical axis (image height), where movement towards the top of the image is positive and movement towards the bottom is negative.
+
+Semantic Segmentation
+"""""""""""""""""""""
+
+``semantic_segmentation`` outputs semantic segmentation of each entity in the camera’s viewport that has semantic labels. In addition to the image buffer, an ``info`` dictionary can be retrieved with ``tiled_camera.data.info['semantic_segmentation']`` containing ID to labels information.
+
+If ``colorize_semantic_segmentation=True`` in the camera config, a 4-channel RGBA image will be returned with dimension (B, H, W, 4) and type ``torch.uint8``. The info ``idToLabels`` dictionary will be the mapping from color to semantic labels.
+
+If ``colorize_semantic_segmentation=False``, a buffer of dimension (B, H, W, 1) of type ``torch.int32`` will be returned, containing the semantic ID of each pixel. The info ``idToLabels`` dictionary will be the mapping from semantic ID to semantic labels.
+
+Instance ID Segmentation
+""""""""""""""""""""""""
+
+``instance_id_segmentation_fast`` outputs instance ID segmentation of each entity in the camera’s viewport. Each prim in the scene with a distinct path has a unique instance ID. In addition to the image buffer, an ``info`` dictionary can be retrieved with ``tiled_camera.data.info['instance_id_segmentation_fast']`` containing ID to labels information.
+
+The main difference between ``instance_id_segmentation_fast`` and ``instance_segmentation_fast`` is that the instance segmentation annotator goes down the hierarchy to the lowest-level prim that has semantic labels, whereas instance ID segmentation always goes down to the leaf prim.
+
+If ``colorize_instance_id_segmentation=True`` in the camera config, a 4-channel RGBA image will be returned with dimension (B, H, W, 4) and type ``torch.uint8``. The info ``idToLabels`` dictionary will be the mapping from color to USD prim path of that entity.
+
+If ``colorize_instance_id_segmentation=False``, a buffer of dimension (B, H, W, 1) of type ``torch.int32`` will be returned, containing the instance ID of each pixel. The info ``idToLabels`` dictionary will be the mapping from instance ID to USD prim path of that entity.
+
+Instance Segmentation
+"""""""""""""""""""""
+
+``instance_segmentation_fast`` outputs instance segmentation of each entity in the camera’s viewport. In addition to the image buffer, an ``info`` dictionary can be retrieved with ``tiled_camera.data.info['instance_segmentation_fast']`` containing ID to labels and ID to semantic information.
+
+If ``colorize_instance_segmentation=True`` in the camera config, a 4-channel RGBA image will be returned with dimension (B, H, W, 4) and type ``torch.uint8``. The info ``idToLabels`` dictionary will be the mapping from color to USD prim path of that semantic entity. The info ``idToSemantics`` dictionary will be the mapping from color to semantic labels of that semantic entity.
+
+If ``colorize_instance_segmentation=False``, a buffer of dimension (B, H, W, 1) of type ``torch.int32`` will be returned, containing the instance ID of each pixel. The info ``idToLabels`` dictionary will be the mapping from instance ID to USD prim path of that semantic entity. The info ``idToSemantics`` dictionary will be the mapping from instance ID to semantic labels of that semantic entity.
+
+
 Recording during training
 -------------------------
 

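The `info` dictionaries described in the documentation diff above can be turned into a plain lookup table. The following is a minimal sketch for the non-colorized case, assuming that `idToLabels` uses stringified integer IDs as keys and dictionaries with a `"class"` entry as values. That layout is an assumption and is not spelled out in this commit, so inspect the real `tiled_camera.data.info['semantic_segmentation']` before relying on it. `build_label_lookup` and the sample data are hypothetical.

```python
def build_label_lookup(info):
    """Map integer semantic IDs to label strings from an ``info`` dictionary."""
    lookup = {}
    for key, value in info["idToLabels"].items():
        # ASSUMPTION: values are dicts such as {"class": "cube"}; fall back to
        # the string form for any other layout.
        if isinstance(value, dict):
            label = value.get("class", str(value))
        else:
            label = str(value)
        # ASSUMPTION: keys are stringified integers in the non-colorized case.
        lookup[int(key)] = label
    return lookup


# Hypothetical sample shaped like the assumed layout (not real simulator output).
sample_info = {
    "idToLabels": {
        "0": {"class": "BACKGROUND"},
        "1": {"class": "UNLABELLED"},
        "2": {"class": "cube"},
    }
}

labels = build_label_lookup(sample_info)
print(labels[2])  # cube
```

With such a lookup, each value in the (B, H, W, 1) ``torch.int32`` buffer can be translated to its label. In the colorized case the keys are colors rather than IDs, so this integer conversion would not apply.
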
diff --git a/source/apps/isaaclab.python.headless.kit b/source/apps/isaaclab.python.headless.kit
@@ -43,6 +43,9 @@ exts."omni.kit.window.viewport".blockingGetViewportDrawable = false
 # Fix PlayButtonGroup error
 exts."omni.kit.widget.toolbar".PlayButton.enabled = false
 
+# disable replicator orchestrator for better runtime perf
+exts."omni.replicator.core".Orchestrator.enabled = false
+
 [settings.app.settings]
 persistent = true
 dev_build = false

diff --git a/source/apps/isaaclab.python.headless.rendering.kit b/source/apps/isaaclab.python.headless.rendering.kit
@@ -37,18 +37,24 @@ app.version = "4.1.0"
 # set the default ros bridge to disable on startup
 isaac.startup.ros_bridge_extension = ""
 
-# Increase available descriptors to support more simultaneous cameras
-rtx.descriptorSets=30000
+# Flags for better rendering performance
+rtx.translucency.enabled = false
+rtx.reflections.enabled = false
+rtx.indirectDiffuse.enabled = false
+rtx.transient.dlssg.enabled = false
+rtx.directLighting.sampledLighting.enabled = true
+rtx.directLighting.sampledLighting.samplesPerPixel = 1
+rtx.sceneDb.ambientLightIntensity = 1.0
+# rtx.shadows.enabled = false
 
-# Enable new denoiser to reduce motion blur artifacts
-rtx.newDenoiser.enabled=true
+# Avoids replicator warning
+rtx.pathtracing.maxSamplesPerLaunch = 1000000
 
 # Disable present thread to improve performance
 exts."omni.renderer.core".present.enabled=false
 
 # Disabling these settings reduces renderer VRAM usage and improves rendering performance, but at some quality cost
 rtx.raytracing.cached.enabled = false
-rtx.raytracing.lightcache.spatialCache.enabled = false
 rtx.ambientOcclusion.enabled = false
 rtx-transient.dlssg.enabled = false
 
@@ -61,6 +67,8 @@ renderer.multiGpu.maxGpuCount=1
 # Force synchronous rendering to improve training results
 omni.replicator.asyncRendering = false
 
+# Avoids frame offset issue
+app.updateOrder.checkForHydraRenderComplete = 1000
 app.renderer.waitIdle=true
 app.hydraEngine.waitIdle=true
 
@@ -69,6 +77,9 @@ app.audio.enabled = false
 # Enable Vulkan - avoids torch+cu12 error on windows
 app.vulkan = true
 
+# disable replicator orchestrator for better runtime perf
+exts."omni.replicator.core".Orchestrator.enabled = false
+
 [settings.exts."omni.kit.registry.nucleus"]
 registries = [
     { name = "kit/default", url = "https://ovextensionsprod.blob.core.windows.net/exts/kit/prod/shared" },

diff --git a/source/apps/isaaclab.python.kit b/source/apps/isaaclab.python.kit
@@ -216,6 +216,9 @@ app.audio.enabled = false
 # Enable Vulkan - avoids torch+cu12 error on windows
 app.vulkan = true
 
+# disable replicator orchestrator for better runtime perf
+exts."omni.replicator.core".Orchestrator.enabled = false
+
 # Basic Kit App
 ################################
 app.versionFile = "${exe-path}/VERSION"

diff --git a/source/apps/isaaclab.python.rendering.kit b/source/apps/isaaclab.python.rendering.kit
@@ -37,18 +37,24 @@ app.version = "4.1.0"
 # set the default ros bridge to disable on startup
 isaac.startup.ros_bridge_extension = ""
 
-# Increase available descriptors to support more simultaneous cameras
-rtx.descriptorSets=30000
+# Flags for better rendering performance
+rtx.translucency.enabled = false
+rtx.reflections.enabled = false
+rtx.indirectDiffuse.enabled = false
+rtx.transient.dlssg.enabled = false
+rtx.directLighting.sampledLighting.enabled = true
+rtx.directLighting.sampledLighting.samplesPerPixel = 1
+rtx.sceneDb.ambientLightIntensity = 1.0
+# rtx.shadows.enabled = false
 
-# Enable new denoiser to reduce motion blur artifacts
-rtx.newDenoiser.enabled=true
+# Avoids replicator warning
+rtx.pathtracing.maxSamplesPerLaunch = 1000000
 
 # Disable present thread to improve performance
 exts."omni.renderer.core".present.enabled=false
 
 # Disabling these settings reduces renderer VRAM usage and improves rendering performance, but at some quality cost
 rtx.raytracing.cached.enabled = false
-rtx.raytracing.lightcache.spatialCache.enabled = false
 rtx.ambientOcclusion.enabled = false
 rtx-transient.dlssg.enabled = false
 
@@ -61,11 +67,16 @@ renderer.multiGpu.maxGpuCount=1
 # Force synchronous rendering to improve training results
 omni.replicator.asyncRendering = false
 
+# Avoids frame offset issue
+app.updateOrder.checkForHydraRenderComplete = 1000
 app.renderer.waitIdle=true
 app.hydraEngine.waitIdle=true
 
 app.audio.enabled = false
 
+# disable replicator orchestrator for better runtime perf
+exts."omni.replicator.core".Orchestrator.enabled = false
+
 [settings.physics]
 updateToUsd = false
 updateParticlesToUsd = false

diff --git a/source/extensions/omni.isaac.lab/config/extension.toml b/source/extensions/omni.isaac.lab/config/extension.toml
@@ -1,7 +1,7 @@
 [package]
 
 # Note: Semantic Versioning is used: https://semver.org/
-version = "0.23.10"
+version = "0.24.10"
 
 # Description
 title = "Isaac Lab framework for Robot Learning"

diff --git a/source/extensions/omni.isaac.lab/docs/CHANGELOG.rst b/source/extensions/omni.isaac.lab/docs/CHANGELOG.rst
@@ -1,7 +1,7 @@
 Changelog
 ---------
 
-0.23.10 (2024-09-10)
+0.24.10 (2024-09-10)
 ~~~~~~~~~~~~~~~~~~~~
 
 Added
@@ -10,7 +10,7 @@ Added
 * Added config class, support, and tests for MJCF conversion via standalone python scripts.
 
 
-0.23.9 (2024-09-09)
+0.24.9 (2024-09-09)
 ~~~~~~~~~~~~~~~~~~~~
 
 Added
@@ -22,7 +22,7 @@ Added
   file or the command line argument. This ensures that the simulation results are reproducible across different runs.
 
 
-0.23.8 (2024-09-08)
+0.24.8 (2024-09-08)
 ~~~~~~~~~~~~~~~~~~~
 
 Changed
@@ -32,7 +32,7 @@ Changed
   for faster processing of high dimensional input tensors.
 
 
-0.23.7 (2024-09-06)
+0.24.7 (2024-09-06)
 ~~~~~~~~~~~~~~~~~~~
 
 Added
@@ -43,7 +43,7 @@ Added
   instance variables instead.
 
 
-0.23.6 (2024-09-05)
+0.24.6 (2024-09-05)
 ~~~~~~~~~~~~~~~~~~~
 
 Fixed
@@ -53,7 +53,7 @@ Fixed
   more-intuitive to control the y-axis motion based on the right-hand rule.
 
 
-0.23.5 (2024-08-29)
+0.24.5 (2024-08-29)
 ~~~~~~~~~~~~~~~~~~~
 
 Added
@@ -63,7 +63,7 @@ Added
   consistent with all other cameras (equal to type "depth").
 
 
-0.23.4 (2024-09-02)
+0.24.4 (2024-09-02)
 ~~~~~~~~~~~~~~~~~~~
 
 Fixed
@@ -74,7 +74,7 @@ Fixed
 * Added test to check :attr:`omni.isaac.lab.sensors.RayCasterCamera.set_intrinsic_matrices`
 
 
-0.23.3 (2024-08-29)
+0.24.3 (2024-08-29)
 ~~~~~~~~~~~~~~~~~~~
 
 Fixed
@@ -85,7 +85,7 @@ Fixed
   which required initialization of the class to call the class-methods.
 
 
-0.23.2 (2024-08-28)
+0.24.2 (2024-08-28)
 ~~~~~~~~~~~~~~~~~~~
 
 Added
@@ -106,7 +106,7 @@ Fixed
   the behavior equal to the USD Camera.
 
 
-0.23.1 (2024-08-21)
+0.24.1 (2024-08-21)
 ~~~~~~~~~~~~~~~~~~~
 
 Changed
@@ -115,6 +115,23 @@ Changed
 * Disabled default viewport in certain headless scenarios for better performance.
 
 
+0.24.0 (2024-08-17)
+~~~~~~~~~~~~~~~~~~~
+
+Added
+^^^^^
+
+* Added additional annotators for :class:`omni.isaac.lab.sensors.camera.TiledCamera` class.
+
+Changed
+^^^^^^^
+
+* Updated :class:`omni.isaac.lab.sensors.TiledCamera` to latest RTX tiled rendering API.
+* Single-channel outputs for :class:`omni.isaac.lab.sensors.TiledCamera`, :class:`omni.isaac.lab.sensors.Camera` and :class:`omni.isaac.lab.sensors.RayCasterCamera` now have shape (H, W, 1).
+* Data type for RGB output for :class:`omni.isaac.lab.sensors.TiledCamera` changed from ``torch.float`` to ``torch.uint8``.
+* Dimension of RGB output for :class:`omni.isaac.lab.sensors.Camera` changed from (H, W, 4) to (H, W, 3). Use type ``rgba`` to retrieve the previous dimension.
+
+
 0.23.1 (2024-08-17)
 ~~~~~~~~~~~~~~~~~~~