Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ogre 2.3 preparations #101

Closed
10 of 12 tasks
darksylinc opened this issue Jun 15, 2020 · 5 comments
Closed
10 of 12 tasks

Ogre 2.3 preparations #101

darksylinc opened this issue Jun 15, 2020 · 5 comments
Assignees
Labels

Comments

@darksylinc
Copy link
Member

darksylinc commented Jun 15, 2020

Since commit 95c09d7 there are three changes or upcoming changes that warrant bumping the minor version:

  • Mesh::importV1 deprecated over createByImportingV1
  • Unlit shaders unification can cause a few minor headaches if the old
    shader templates mix with the new ones
  • Upcoming Normal Offset Bias

These are minimum changes, but when accumulated it is not reasonable that someone porting from 2.2.x to consider this "100% hassle free".

Nonetheless we should be careful not to scare anyone, since these changes are not major at all. Porting should still be easy.

Tasks remaining:

Porting:

Switch importV1 to createByImportingV1

In 2.2.2 and earlier we had a function called Mesh::importV1 which would populate a v2 mesh by filling it with data from a v1 mesh, effectively importing it.

In 2.2.3 users should use MeshManager::createByImportingV1 instead. This function 'remembers' which meshes have been created through a conversion process, which allows device lost handling to repeat this import process and recreate the resources.

Aside from this little difference, there are no major functionality changes and the function arguments are the same.

Shadow's Normal Offset Bias

We've had a couple complaints, but it wasn't until user SolarPortal made a more exhaustive research where we realized we were not using state of the art shadow mapping techniques.

We were relying on hlmsManager->setShadowMappingUseBackFaces( true ) to hide most self-occlussion errors, but this caused other visual errors.

Normal Offset Bias is a technique from 2011 (yes, it's old!) which drastically improves self occlussion and shadow acne while improving overall shadow quality; and is much more robust than using inverted-culling during the caster pass.

Therefore this technique replaced the old one and the function HlmsManager::setShadowMappingUseBackFaces() has been removed.

Users can globally control normal-offset and constant biases per cascade by tweaking ShadowTextureDefinition::normalOffsetBias and ShadowTextureDefinition::constantBiasScale respectively.

You can also control them via compositors scripts in the shadow node declaration, using the new keywords constant_bias_scale and normal_offset_bias

Users porting from 2.2.x may notice their shadows are a bit different (for the better!), but may encounter some self shadowing artifacts. Thus they may have to adjust these two biases if they need to.

Unlit vertex and pixel shaders unified

Unlit shaders were still duplicating its code 3 times (one for each RenderSystem) and all of its vertex & pixel shader code has been unified into a single .any file.

Although this shouldn't impact you at all, users porting from 2.2.x need to make sure old Hlms shader templates from Unlit don't linger and get mixed with the new files.

Pay special attention the files from Samples/Media/Hlms/Unlit match 1:1 the ones in your project and there aren't stray .glsl/.hlsl/.metal files from an older version.

If you have customized the Unlit implementation, you may find your customizations to be broken. But they're easy to fix. For reference look at Colibri's two commits which ported its Unlit customizations from 2.2.x to 2.3.0

Added HlmsMacroblock::mDepthClamp

It is now possible to toggle Depth Clamp on/off. Check if it's supported via RSC_DEPTH_CLAMP. All desktop GPU should support it unless you're using extremely old OpenGL drivers.
iOS supports it since A11 chip (iPhone 8 or newer)

Users upgrading from older Ogre versions should be careful their libraries and headers don't get out of sync. A full rebuild is recommended.

The reason being is that HlmsMacroblock (which is used almost anywhere in Ogre) added a new member variable. And if a DLL or header gets out of sync, it likely won't crash but the artifacts will be very funny (most likely depth buffer will be disabled).

Added shadow pancaking

With the addition of depth clamp, we are now able to push the near plane of directional shadow maps in PSSM (non-stable variant). This greatly enhances depth buffer precision and reduces self-occlusion and acne bugs.

This improvement may make it possible for users to try using PFG_D16_UNORM instead of PFG_D32_FLOAT for shadow mapping, halving memory consumption.

Shadow pancaking is automatically disabled when depth clamp is not supported.

Other relevant information when porting

  • HlmsListener::hlmsTypeChanged added an argument. Beware of it if you are overloading this function
  • HlmsListener::propertiesMergedPreGenerationStep changed its arguments. Beware of it if you are overloading this function
  • Terra's compositor does not need to expose terrain shadows' texture anymore. This is done automatically.
  • Root() constructor now takes one last string parameter to set the app's name. Use this string to tell vendors your app's name so they can create custom driver profiles.
  • SMAA was using PFG_D24_UNORM_S8_UINT, now it uses PFG_D32_FLOAT_S8X24_UINT
  • SMAA shaders were updated. If you were using SMAA, remember to grab the latest ones
  • RenderSystem::_setTexture added an argument which almost always should be set to false (should only be set to true if rendering to a depth buffer without writing to it while also binding that same depth buffer as a texture for sampling).
  • TextureGpu now has getInternalWidth and getInternalHeight. This happens because Vulkan on Android may require us to rotate the window ourselves to avoid performance degradation instead of letting the OS or HW do it (see TextureGpu::setOrientationMode). If orientation mode is 90° or 270°, then getInternalWidth returns the height and getInternalHeight returns the width). It is only relevant for Vulkan on Android. This is important if you need to perform copy operations or use AsyncTextureTickets on oriented textures.
  • CompositorManager2::addWorkspace removed the last parameters. ResourceLayoutMap and ResourceAccessMap are no longer needed, they're automatic. addWorkspace now accepts a ResourceStatusMap in case the workspace needs to assume the texture is in a specific initial layout (very unlikely)
  • Terra's compositor setup has changed again, but this time it's been simplified. It is no longer needed to expose the shadow map texture.
  • If you see 'Transitioning texture <Texture Name> from Undefined to a read-only layout. Perhaps you didn't want to set TextureFlags::DiscardableContent / aka keep_content in compositor?' messages, then you need to add 'keep_content' to the compositor's declaration. e.g.

Old:
texture prevFrameDepthBuffer target_width target_height PFG_R32_FLOAT uav

New:
texture prevFrameDepthBuffer target_width target_height PFG_R32_FLOAT uav keep_content

This is normal for textures whose contents are meant to be carried over from the previous frame.

keep_content is for RenderTextures and/or UAVs. The question is whether they can start fresh clean in a new frame; or we must preserve their changes between frames.

Is keep_content for UAV textures or normal textures? And what about a UAV buffer whose content have to be preserved among multiple frames?

Normal textures: No, their contents are always kept
UAV textures: Yes, if you need to preserve their contents
UAV buffers: This flag is only for textures, thus there is no need

Another way to look at it is to think of Multi-GPU rendering:

If GPU0 works on frame A and GPU1 works on frame B... can we assume each GPU can have its own independent version? or must GPU1 wait for GPU0 to finish and transfer the contents before GPU1 can start working with that texture?

This only applies to textures (for the time being), so UAV buffers don't have this flag. However they might in the future should we support multi-GPU rendering.

We discard by default in the compositor because the vast majority of textures are either temporary (i.e. used for ping-pong FXs) or reconstructed from scratch every frame.

However textureManager->createOrRetrieve disables this flag by default (the C++ flag is TextureFlags::DiscardableContent) to avoid breaking more complex algorithms.

keep_content isn't just for multi-GPU rendering (i.e. it allows the HW to reset all compression flags, start working on a resource before other components within the GPU are finished), but it is the easiest way to understand it.

Dynamic shadow maps don't need this flag.

But fully or partially static shadow maps (like shown in StaticShadowMaps sample, see StaticShadowMaps.compositor) need this flag, since contents are kept between frames until manually updated.

Do not call notifyDataIsReady more than needed

In Ogre 2.2 you could call notifyDataIsReady as many times as you want. In fact we gave the following example which is now possibly wrong:

stagingTexture->startMapRegion();
TextureBox box0 = stagingTexture->mapRegion( width, height, depth, slices, pixelFormat );
... write to box0.data ...
TextureBox box1 = stagingTexture->mapRegion( width, height, depth, slices, pixelFormat );
... write to box1.data ...
stagingTexture->stopMapRegion();
stagingTexture->upload( box0, texture0 );
stagingTexture->upload( box1, texture1 );
textureManager.removeStagingTexture( stagingTexture );
texture0->notifyDataIsReady();
texture1->notifyDataIsReady();

However in Ogre 2.3 every notifyDataIsReady must previously have had a call to scheduleTransitionTo (Resident) or scheduleReupload, assuming you didn't cancel the load e.g. assuming you didn't do this:

tex->scheduleTransitionTo( GpuResidency::Resident );
tex->scheduleTransitionTo( GpuResidency::OnStorage );
// Must NOT call tex->notifyDataIsReady();

Additionally for ManualTexture textures, Ogre automatically calls notifyDataIsReady as soon as the texture becomes resident. Thus notifyDataIsReady shouldn not be called by the user if TextureFlags::ManualTexture flag is set.

The solution is simple: Remove the call to notifyDataIsReady, since it wasn't previous needed, and now it must not be there.

If the assert is triggering inside Ogre, then it means you previously called notifyDataIsReady on that texture and now Ogre is doing it again. Find where you're calling notifyDataIsReady and remove that line.

Terra, SSAO, Postprocessing samples and v1 Overlays were updated to reflect this change.

Global changes for Vulkan compatibility:

  • GraphicsSystem::initialize has changed slightly. Look for SDL2x11 and VSync Method
  • The last compositor pass to render to the screen must set RenderPassDescriptor::mReadyWindowForPresent (Vulkan requirement). This is handled automatically in CompositorManager2::prepareRenderWindowsForPresent whenever the compositor chain changes. However if the workspace that presents to screen is disabled (i.e. you call CompositorWorkspace::_update manually) then you'll have to set this out yourself. The same happens if the last pass is a custom pass calls initialize( rtv ) but never calls setRenderPassDescToCurrent.
  • EmptyProject's CMake scripts have been updated to account Vulkan plugin

ReadOnlyBuffers

This affects all RS and is a performance optimization (also we needed this to support Android). To quote documentation:

Represents the best way to access read-only data in shaders.
But how it is implemented depends largely on which HW/API it is running on:

  • Buffer<> aka texture buffer in D3D11 on D3D10+ HW
  • samplerBuffer aka texture buffer in GL3 on GL3/D3D10 HW
  • SSBO aka UAV buffer in GL4 on GL4/D3D11 HW
  • SSBO aka UAV buffer in Vulkan
  • Metal doesn't matter, there are no differences between TexBuffer & UavBuffer, however we consume the slots reserved for tex buffers.

In short, ReadOnlyBufferPacked either behaves as a TexBufferPacked or as an UavBufferPacked` (but read only) depending on HW and API being used.

Some existing code has changed to use ReadOnlyBufferPacked where it originally used TexBufferPacked.
This means in shader code, these buffers should be accessed via the readOnlyFetch() macro.

i.e.

// GLSL
// old code:
float4 value = texelFetch( matrixBuf, idx );
// new code
float4 value = readOnlyFetch( matrixBuf, idx );

// Note that readOnlyFetch macro may expand to either
float4 value = matrixBuf[idx]; // Modern GPUs
float4 value = texelFetch( matrixBuf, idx ); // Old GPUs

In HLSL the difference is less noticeable, but variables are now declared as StructuredBuffer<floatN> instead of Buffer<floatN>

Additionally, Matrix functions in Matrix_piece_all.glsl/hlsl have been modified to only work with matrices stored in ReadOnly buffers instead of Tex buffers.

If your custom modifications to Hlms access buffers that are now ReadOnlyBufferPacked, you need to change your shaders to use readOnlyFetch. This may sound scary but it's just routinely changing code that refuses to compile with the new syntax.

Is Vulkan RenderSystem stable?

Yes. It's not beta. It has been tested on multiple platforms and GPUs (NVIDIA, AMD Mesa, AMD Propietary on Linux, AMD Propietary on Windows, Intel Mesa, Intel Windows, Qualcomm, ARM) and it works very well, sometimes even better than GL3+.

But given that it is the newest RenderSystem, please report any issue you encounter.

However please note:

  • Use the latest drivers. Some vendors (e.g. NVIDIA and AMD) have fixed bugs that caused some issues with our implementation.
  • It works on Android very well, but many users may be stuck on older implementations and could have issues. We workaround multiple driver bugs but in general the older the Android version, the worse it gets. Android 7.0 will work but it is not recommended as drivers on that version have lots of bugs. Android 10 drivers work extremely well.
  • Our default settings are memory-greedy. On older GPUs (e.g. 2GB models), you could run out of VRAM and run at reduced performance. Make sure to tweak your memory settings to achieve best results. See Samples/2.0/Tutorials/Tutorial_Memory on how to monitor and tweak memory. Also see Samples/2.0/Tests/MemoryCleanup

See known issues.

Known issues

  • VulkanVaoManager::cleanupEmptyPools is not implemented and will raise an exception when called. This may be implemented at a later date.
    • Vulkan memory manager had to be tweaked and recycles memory much better than the GL3+, D3D11 and Metal RenderSystems, hence there may not be a need to call this function at all
  • Vulkan: Taking a screenshot from the Render Window (i.e. downloading its contents) is not implemented.
@darksylinc darksylinc self-assigned this Jun 15, 2020
@TaaTT4
Copy link
Contributor

TaaTT4 commented Sep 22, 2020

Must keep_content being used just for UAV textures or for normal textures either? And what about a UAV buffer whose content have to be preserved among multiple frames?

@darksylinc
Copy link
Member Author

darksylinc commented Sep 22, 2020

keep_content is for RenderTextures and/or UAVs. The question is whether they can start fresh clean in a new frame; or we must preserve their changes between frames.

Another way to look at it is to think of Multi-GPU rendering:

If GPU0 works on frame A and GPU1 works on frame B; can we assume each GPU can have its own independent version; or must GPU1 wait for GPU0 to finish and transfer the contents before GPU1 can start working with that texture?

This only applies to textures (for the time being), so UAV buffers don't have this flag. However they might in the future should we support multi-GPU rendering.

We discard by default in the compositor because the vast majority of textures are either temporary (i.e. used for ping-pong FXs) or reconstructed from scratch every frame.

However textureManager->createOrRetrieve disables this flag by default (the C++ flag is TextureFlags::DiscardableContent) to avoid breaking more complex algorithms.

keep_content isn't just for multi-GPU rendering (i.e. it allows the HW to reset all compression flags, start working on a resource before other components within the GPU are finished), but it is the easiest way to understand it.

@TaaTT4
Copy link
Contributor

TaaTT4 commented Sep 22, 2020

keep_content is for RenderTextures and/or UAVs. The question is whether they can start fresh clean in a new frame; or we must preserve their changes between frames.

Why does this isn't valid for shadow textures? They need keep_content flag, but don't they start fresh clean in a new frame (I'm just asking for curiosity)?

@TaaTT4
Copy link
Contributor

TaaTT4 commented Sep 22, 2020

The last compositor pass to render to the screen must set RenderPassDescriptor::mReadyWindowForPresent (Vulkan requirement). This is handled automatically in CompositorManager2::prepareRenderWindowsForPresent whenever the compositor chain changes. However if you the workspace that presents to screen is disabled (i.e. calls CompositorWorkspace::_update manually) then you'll have to set this out yourself. The same happens if the last pass is a custom pass calls initialize( rtv ) but never calls setRenderPassDescToCurrent.

Could you please detail a bit more this point? The utility I use to take screenshots do exactly what you mention on a keypress (creates a separate workspace that renders in a rtt instead of a window and manually invokes _update). What I am supposed to do?

@darksylinc
Copy link
Member Author

keep_content is for RenderTextures and/or UAVs. The question is whether they can start fresh clean in a new frame; or we must preserve their changes between frames.

Why does this isn't valid for shadow textures? They need keep_content flag, but don't they start fresh clean in a new frame (I'm just asking for curiosity)?

It should be valid for shadow textures. If it's being rejected, it's a parser bug.

Dynamic shadow maps don't need this flag
But fully or partially static shadow maps (like shown in StaticShadowMaps sample, see StaticShadowMaps.compositor) need this flag, since contents are kept between frames until manually updated.

Could you please detail a bit more this point? The utility I use to take screenshots do exactly what you mention on a keypress (creates a separate workspace that renders in a rtt instead of a window and manually invokes _update). What I am supposed to do?

If you don't touch the render window, then nothing. This flag is specifically for the last pass that touches the render window. The last pass in a frame is responsible for preparing the render window for presentation to the screen, attempting to use it afterwards (whether for reading or writing) is undefined behaviour.

Thus if pass N prepares the window for presentation but you manually execute pass N+1 afterwards which needs the render window, then pass N must set mReadyWindowForPresent = false and call entriesModified( RenderPassDescriptor::Colour ), while pass N+1 must do the same but with true.

This is a bit user hostile, thus I'm still thinking of better ways to solve it.

Also because this flag is very specific to Vulkan, and taking screenshots from the Render Window is still WIP, exactly what needs to be done is subject to change.

But if you render to a temporary RTT and take a screenshot of that RTT, then nothing needs to be done. The render window is not involved at all.

darksylinc added a commit that referenced this issue May 15, 2021
Behavior of TextureGpu::notifyDataIsReady changed. Calling it
excessively is now considered incorrect.

Documented in #101

Added mDataPreparationsPending which is required. Otherwise calling
scheduleReupload (>=1x times) then destroying the texture will
immediately destroy the pointer while it may actually be still used in
the worker thread.

Previously this was not needed because if a texture was in the worker
thread, then it meant notifyDataIsReady had never been called yet since
the last time we transitioned to Resident; thus destroyTexture would
automatically delay the destruction.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants