Ogre 2.3 preparations #101

darksylinc · 2020-06-15T18:00:37Z

Since commit 95c09d7 there are three changes or upcoming changes that warrant bumping the minor version:

Mesh::importV1 deprecated over createByImportingV1
Unlit shaders unification can cause a few minor headaches if the old
shader templates mix with the new ones
Upcoming Normal Offset Bias

These are minimum changes, but when accumulated it is not reasonable that someone porting from 2.2.x to consider this "100% hassle free".

Nonetheless we should be careful not to scare anyone, since these changes are not major at all. Porting should still be easy.

Tasks remaining:

Porting:

Switch importV1 to createByImportingV1

In 2.2.2 and earlier we had a function called Mesh::importV1 which would populate a v2 mesh by filling it with data from a v1 mesh, effectively importing it.

In 2.2.3 users should use MeshManager::createByImportingV1 instead. This function 'remembers' which meshes have been created through a conversion process, which allows device lost handling to repeat this import process and recreate the resources.

Aside from this little difference, there are no major functionality changes and the function arguments are the same.

Shadow's Normal Offset Bias

We've had a couple complaints, but it wasn't until user SolarPortal made a more exhaustive research where we realized we were not using state of the art shadow mapping techniques.

We were relying on hlmsManager->setShadowMappingUseBackFaces( true ) to hide most self-occlussion errors, but this caused other visual errors.

Normal Offset Bias is a technique from 2011 (yes, it's old!) which drastically improves self occlussion and shadow acne while improving overall shadow quality; and is much more robust than using inverted-culling during the caster pass.

Therefore this technique replaced the old one and the function HlmsManager::setShadowMappingUseBackFaces() has been removed.

Users can globally control normal-offset and constant biases per cascade by tweaking ShadowTextureDefinition::normalOffsetBias and ShadowTextureDefinition::constantBiasScale respectively.

You can also control them via compositors scripts in the shadow node declaration, using the new keywords constant_bias_scale and normal_offset_bias

Users porting from 2.2.x may notice their shadows are a bit different (for the better!), but may encounter some self shadowing artifacts. Thus they may have to adjust these two biases if they need to.

Unlit vertex and pixel shaders unified

Unlit shaders were still duplicating its code 3 times (one for each RenderSystem) and all of its vertex & pixel shader code has been unified into a single .any file.

Although this shouldn't impact you at all, users porting from 2.2.x need to make sure old Hlms shader templates from Unlit don't linger and get mixed with the new files.

Pay special attention the files from Samples/Media/Hlms/Unlit match 1:1 the ones in your project and there aren't stray .glsl/.hlsl/.metal files from an older version.

If you have customized the Unlit implementation, you may find your customizations to be broken. But they're easy to fix. For reference look at Colibri's two commits which ported its Unlit customizations from 2.2.x to 2.3.0

Added HlmsMacroblock::mDepthClamp

It is now possible to toggle Depth Clamp on/off. Check if it's supported via RSC_DEPTH_CLAMP. All desktop GPU should support it unless you're using extremely old OpenGL drivers.
iOS supports it since A11 chip (iPhone 8 or newer)

Users upgrading from older Ogre versions should be careful their libraries and headers don't get out of sync. A full rebuild is recommended.

The reason being is that HlmsMacroblock (which is used almost anywhere in Ogre) added a new member variable. And if a DLL or header gets out of sync, it likely won't crash but the artifacts will be very funny (most likely depth buffer will be disabled).

Added shadow pancaking

With the addition of depth clamp, we are now able to push the near plane of directional shadow maps in PSSM (non-stable variant). This greatly enhances depth buffer precision and reduces self-occlusion and acne bugs.

This improvement may make it possible for users to try using PFG_D16_UNORM instead of PFG_D32_FLOAT for shadow mapping, halving memory consumption.

Shadow pancaking is automatically disabled when depth clamp is not supported.

Other relevant information when porting

HlmsListener::hlmsTypeChanged added an argument. Beware of it if you are overloading this function
HlmsListener::propertiesMergedPreGenerationStep changed its arguments. Beware of it if you are overloading this function
Terra's compositor does not need to expose terrain shadows' texture anymore. This is done automatically.
Root() constructor now takes one last string parameter to set the app's name. Use this string to tell vendors your app's name so they can create custom driver profiles.
SMAA was using PFG_D24_UNORM_S8_UINT, now it uses PFG_D32_FLOAT_S8X24_UINT
SMAA shaders were updated. If you were using SMAA, remember to grab the latest ones
RenderSystem::_setTexture added an argument which almost always should be set to false (should only be set to true if rendering to a depth buffer without writing to it while also binding that same depth buffer as a texture for sampling).
TextureGpu now has getInternalWidth and getInternalHeight. This happens because Vulkan on Android may require us to rotate the window ourselves to avoid performance degradation instead of letting the OS or HW do it (see TextureGpu::setOrientationMode). If orientation mode is 90° or 270°, then getInternalWidth returns the height and getInternalHeight returns the width). It is only relevant for Vulkan on Android. This is important if you need to perform copy operations or use AsyncTextureTickets on oriented textures.
CompositorManager2::addWorkspace removed the last parameters. ResourceLayoutMap and ResourceAccessMap are no longer needed, they're automatic. addWorkspace now accepts a ResourceStatusMap in case the workspace needs to assume the texture is in a specific initial layout (very unlikely)
Terra's compositor setup has changed again, but this time it's been simplified. It is no longer needed to expose the shadow map texture.
If you see 'Transitioning texture <Texture Name> from Undefined to a read-only layout. Perhaps you didn't want to set TextureFlags::DiscardableContent / aka keep_content in compositor?' messages, then you need to add 'keep_content' to the compositor's declaration. e.g.

Old:
texture prevFrameDepthBuffer target_width target_height PFG_R32_FLOAT uav

New:
texture prevFrameDepthBuffer target_width target_height PFG_R32_FLOAT uav keep_content

This is normal for textures whose contents are meant to be carried over from the previous frame.

keep_content is for RenderTextures and/or UAVs. The question is whether they can start fresh clean in a new frame; or we must preserve their changes between frames.

Is keep_content for UAV textures or normal textures? And what about a UAV buffer whose content have to be preserved among multiple frames?

Normal textures: No, their contents are always kept
UAV textures: Yes, if you need to preserve their contents
UAV buffers: This flag is only for textures, thus there is no need

Another way to look at it is to think of Multi-GPU rendering:

If GPU0 works on frame A and GPU1 works on frame B... can we assume each GPU can have its own independent version? or must GPU1 wait for GPU0 to finish and transfer the contents before GPU1 can start working with that texture?

This only applies to textures (for the time being), so UAV buffers don't have this flag. However they might in the future should we support multi-GPU rendering.

We discard by default in the compositor because the vast majority of textures are either temporary (i.e. used for ping-pong FXs) or reconstructed from scratch every frame.

However textureManager->createOrRetrieve disables this flag by default (the C++ flag is TextureFlags::DiscardableContent) to avoid breaking more complex algorithms.

keep_content isn't just for multi-GPU rendering (i.e. it allows the HW to reset all compression flags, start working on a resource before other components within the GPU are finished), but it is the easiest way to understand it.

Dynamic shadow maps don't need this flag.

But fully or partially static shadow maps (like shown in StaticShadowMaps sample, see StaticShadowMaps.compositor) need this flag, since contents are kept between frames until manually updated.

Do not call notifyDataIsReady more than needed

In Ogre 2.2 you could call notifyDataIsReady as many times as you want. In fact we gave the following example which is now possibly wrong:

stagingTexture->startMapRegion();
TextureBox box0 = stagingTexture->mapRegion( width, height, depth, slices, pixelFormat );
... write to box0.data ...
TextureBox box1 = stagingTexture->mapRegion( width, height, depth, slices, pixelFormat );
... write to box1.data ...
stagingTexture->stopMapRegion();
stagingTexture->upload( box0, texture0 );
stagingTexture->upload( box1, texture1 );
textureManager.removeStagingTexture( stagingTexture );
texture0->notifyDataIsReady();
texture1->notifyDataIsReady();

However in Ogre 2.3 every notifyDataIsReady must previously have had a call to scheduleTransitionTo (Resident) or scheduleReupload, assuming you didn't cancel the load e.g. assuming you didn't do this:

tex->scheduleTransitionTo( GpuResidency::Resident );
tex->scheduleTransitionTo( GpuResidency::OnStorage );
// Must NOT call tex->notifyDataIsReady();

Additionally for ManualTexture textures, Ogre automatically calls notifyDataIsReady as soon as the texture becomes resident. Thus notifyDataIsReady shouldn not be called by the user if TextureFlags::ManualTexture flag is set.

The solution is simple: Remove the call to notifyDataIsReady, since it wasn't previous needed, and now it must not be there.

If the assert is triggering inside Ogre, then it means you previously called notifyDataIsReady on that texture and now Ogre is doing it again. Find where you're calling notifyDataIsReady and remove that line.

Terra, SSAO, Postprocessing samples and v1 Overlays were updated to reflect this change.

Global changes for Vulkan compatibility:

GraphicsSystem::initialize has changed slightly. Look for SDL2x11 and VSync Method
The last compositor pass to render to the screen must set RenderPassDescriptor::mReadyWindowForPresent (Vulkan requirement). This is handled automatically in CompositorManager2::prepareRenderWindowsForPresent whenever the compositor chain changes. However if the workspace that presents to screen is disabled (i.e. you call CompositorWorkspace::_update manually) then you'll have to set this out yourself. The same happens if the last pass is a custom pass calls initialize( rtv ) but never calls setRenderPassDescToCurrent.
EmptyProject's CMake scripts have been updated to account Vulkan plugin

ReadOnlyBuffers

This affects all RS and is a performance optimization (also we needed this to support Android). To quote documentation:

Represents the best way to access read-only data in shaders.
But how it is implemented depends largely on which HW/API it is running on:

Buffer<> aka texture buffer in D3D11 on D3D10+ HW
samplerBuffer aka texture buffer in GL3 on GL3/D3D10 HW
SSBO aka UAV buffer in GL4 on GL4/D3D11 HW
SSBO aka UAV buffer in Vulkan
Metal doesn't matter, there are no differences between TexBuffer & UavBuffer, however we consume the slots reserved for tex buffers.

In short, ReadOnlyBufferPacked either behaves as a TexBufferPacked or as an UavBufferPacked` (but read only) depending on HW and API being used.

Some existing code has changed to use ReadOnlyBufferPacked where it originally used TexBufferPacked.
This means in shader code, these buffers should be accessed via the readOnlyFetch() macro.

i.e.

// GLSL
// old code:
float4 value = texelFetch( matrixBuf, idx );
// new code
float4 value = readOnlyFetch( matrixBuf, idx );

// Note that readOnlyFetch macro may expand to either
float4 value = matrixBuf[idx]; // Modern GPUs
float4 value = texelFetch( matrixBuf, idx ); // Old GPUs

In HLSL the difference is less noticeable, but variables are now declared as StructuredBuffer<floatN> instead of Buffer<floatN>

Additionally, Matrix functions in Matrix_piece_all.glsl/hlsl have been modified to only work with matrices stored in ReadOnly buffers instead of Tex buffers.

If your custom modifications to Hlms access buffers that are now ReadOnlyBufferPacked, you need to change your shaders to use readOnlyFetch. This may sound scary but it's just routinely changing code that refuses to compile with the new syntax.

Is Vulkan RenderSystem stable?

Yes. It's not beta. It has been tested on multiple platforms and GPUs (NVIDIA, AMD Mesa, AMD Propietary on Linux, AMD Propietary on Windows, Intel Mesa, Intel Windows, Qualcomm, ARM) and it works very well, sometimes even better than GL3+.

But given that it is the newest RenderSystem, please report any issue you encounter.

However please note:

Use the latest drivers. Some vendors (e.g. NVIDIA and AMD) have fixed bugs that caused some issues with our implementation.
It works on Android very well, but many users may be stuck on older implementations and could have issues. We workaround multiple driver bugs but in general the older the Android version, the worse it gets. Android 7.0 will work but it is not recommended as drivers on that version have lots of bugs. Android 10 drivers work extremely well.
Our default settings are memory-greedy. On older GPUs (e.g. 2GB models), you could run out of VRAM and run at reduced performance. Make sure to tweak your memory settings to achieve best results. See Samples/2.0/Tutorials/Tutorial_Memory on how to monitor and tweak memory. Also see Samples/2.0/Tests/MemoryCleanup

See known issues.

Known issues

VulkanVaoManager::cleanupEmptyPools is not implemented and will raise an exception when called. This may be implemented at a later date.
- Vulkan memory manager had to be tweaked and recycles memory much better than the GL3+, D3D11 and Metal RenderSystems, hence there may not be a need to call this function at all
Vulkan: Taking a screenshot from the Render Window (i.e. downloading its contents) is not implemented.

The text was updated successfully, but these errors were encountered:

TaaTT4 · 2020-09-22T10:27:43Z

Must keep_content being used just for UAV textures or for normal textures either? And what about a UAV buffer whose content have to be preserved among multiple frames?

darksylinc · 2020-09-22T14:54:35Z

keep_content is for RenderTextures and/or UAVs. The question is whether they can start fresh clean in a new frame; or we must preserve their changes between frames.

Another way to look at it is to think of Multi-GPU rendering:

If GPU0 works on frame A and GPU1 works on frame B; can we assume each GPU can have its own independent version; or must GPU1 wait for GPU0 to finish and transfer the contents before GPU1 can start working with that texture?

This only applies to textures (for the time being), so UAV buffers don't have this flag. However they might in the future should we support multi-GPU rendering.

We discard by default in the compositor because the vast majority of textures are either temporary (i.e. used for ping-pong FXs) or reconstructed from scratch every frame.

However textureManager->createOrRetrieve disables this flag by default (the C++ flag is TextureFlags::DiscardableContent) to avoid breaking more complex algorithms.

keep_content isn't just for multi-GPU rendering (i.e. it allows the HW to reset all compression flags, start working on a resource before other components within the GPU are finished), but it is the easiest way to understand it.

TaaTT4 · 2020-09-22T17:21:16Z

keep_content is for RenderTextures and/or UAVs. The question is whether they can start fresh clean in a new frame; or we must preserve their changes between frames.

Why does this isn't valid for shadow textures? They need keep_content flag, but don't they start fresh clean in a new frame (I'm just asking for curiosity)?

TaaTT4 · 2020-09-22T17:27:24Z

The last compositor pass to render to the screen must set RenderPassDescriptor::mReadyWindowForPresent (Vulkan requirement). This is handled automatically in CompositorManager2::prepareRenderWindowsForPresent whenever the compositor chain changes. However if you the workspace that presents to screen is disabled (i.e. calls CompositorWorkspace::_update manually) then you'll have to set this out yourself. The same happens if the last pass is a custom pass calls initialize( rtv ) but never calls setRenderPassDescToCurrent.

Could you please detail a bit more this point? The utility I use to take screenshots do exactly what you mention on a keypress (creates a separate workspace that renders in a rtt instead of a window and manually invokes _update). What I am supposed to do?

darksylinc · 2020-09-22T17:52:06Z

keep_content is for RenderTextures and/or UAVs. The question is whether they can start fresh clean in a new frame; or we must preserve their changes between frames.

Why does this isn't valid for shadow textures? They need keep_content flag, but don't they start fresh clean in a new frame (I'm just asking for curiosity)?

It should be valid for shadow textures. If it's being rejected, it's a parser bug.

Dynamic shadow maps don't need this flag
But fully or partially static shadow maps (like shown in StaticShadowMaps sample, see StaticShadowMaps.compositor) need this flag, since contents are kept between frames until manually updated.

Could you please detail a bit more this point? The utility I use to take screenshots do exactly what you mention on a keypress (creates a separate workspace that renders in a rtt instead of a window and manually invokes _update). What I am supposed to do?

If you don't touch the render window, then nothing. This flag is specifically for the last pass that touches the render window. The last pass in a frame is responsible for preparing the render window for presentation to the screen, attempting to use it afterwards (whether for reading or writing) is undefined behaviour.

Thus if pass N prepares the window for presentation but you manually execute pass N+1 afterwards which needs the render window, then pass N must set mReadyWindowForPresent = false and call entriesModified( RenderPassDescriptor::Colour ), while pass N+1 must do the same but with true.

This is a bit user hostile, thus I'm still thinking of better ways to solve it.

Also because this flag is very specific to Vulkan, and taking screenshots from the Render Window is still WIP, exactly what needs to be done is subject to change.

But if you render to a temporary RTT and take a screenshot of that RTT, then nothing needs to be done. The render window is not involved at all.

Behavior of TextureGpu::notifyDataIsReady changed. Calling it excessively is now considered incorrect. Documented in #101 Added mDataPreparationsPending which is required. Otherwise calling scheduleReupload (>=1x times) then destroying the texture will immediately destroy the pointer while it may actually be still used in the worker thread. Previously this was not needed because if a texture was in the worker thread, then it meant notifyDataIsReady had never been called yet since the last time we transitioned to Resident; thus destroyTexture would automatically delay the destruction.

darksylinc added the Task label Jun 15, 2020

darksylinc self-assigned this Jun 15, 2020

darksylinc closed this as completed Dec 25, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ogre 2.3 preparations #101

Ogre 2.3 preparations #101

darksylinc commented Jun 15, 2020 •

edited

Loading

TaaTT4 commented Sep 22, 2020

darksylinc commented Sep 22, 2020 •

edited

Loading

TaaTT4 commented Sep 22, 2020

TaaTT4 commented Sep 22, 2020

darksylinc commented Sep 22, 2020

Ogre 2.3 preparations #101

Ogre 2.3 preparations #101

Comments

darksylinc commented Jun 15, 2020 • edited Loading

Porting:

Switch importV1 to createByImportingV1

Shadow's Normal Offset Bias

Unlit vertex and pixel shaders unified

Added HlmsMacroblock::mDepthClamp

Added shadow pancaking

Other relevant information when porting

Is keep_content for UAV textures or normal textures? And what about a UAV buffer whose content have to be preserved among multiple frames?

Do not call notifyDataIsReady more than needed

Global changes for Vulkan compatibility:

ReadOnlyBuffers

Is Vulkan RenderSystem stable?

Known issues

TaaTT4 commented Sep 22, 2020

darksylinc commented Sep 22, 2020 • edited Loading

TaaTT4 commented Sep 22, 2020

TaaTT4 commented Sep 22, 2020

darksylinc commented Sep 22, 2020

darksylinc commented Jun 15, 2020 •

edited

Loading

darksylinc commented Sep 22, 2020 •

edited

Loading