This repository contains an MIT-licensed demo of DirectX12 Sampler Feedback Streaming, a technique that uses DirectX12 Sampler Feedback to guide continuous loading and eviction of small portions (tiles) of assets. By making better use of GPU memory capacity, it allows for much higher visual quality than previously possible. Sampler Feedback Streaming allows scenes consisting of hundreds of gigabytes of resources to be drawn on GPUs containing much less physical memory. The scene below uses just ~200MB of a 1GB heap, despite over 350GB of total texture resources.
The demo requires Windows 10 20H1 (aka May 2020 Update, build 19041) or later and a GPU with Sampler Feedback Support, such as Intel Iris Xe Graphics as found in 11th Generation Intel® Core™ processors and discrete GPUs (driver version 30.0.100.9667 or later).
This repository will be updated when DirectStorage for Windows® becomes available.
See also:
- GDC 2021 video (alternate link), which provides an overview of Sampler Feedback and discusses this sample starting at about 15:30.
- GDC 2021 presentation in PDF form
Textures derived from Hubble Images, see the Hubble Copyright
Note that the textures shown above, which total over 13GB, are not part of the repo. A few 16k x 16k textures are available as a release.
Test textures are provided, as is a mechanism to convert from BCx format DDS files into the custom .XET format.
Download the source. Build the solution file SamplerFeedbackStreaming.sln (tested with Visual Studio 2019).
All executables, scripts, configurations, and media files will be found in the x64/Release or x64/Debug directories.
To run within Visual Studio, change the working directory to $(TargetDir) under Properties/Debugging:
Or cd to the build directory (x64/Release or x64/Debug) and run from the command line:
c:\SamplerFeedbackStreaming\x64\Release> expanse.exe
On NVIDIA drivers prior to 496.13, it is recommended to add -config nvidia.json to the command line. See the description of json files and configurations below.
c:\SamplerFeedbackStreaming\x64\Release> expanse.exe -config nvidia.json
By default (no command line options) there will be a single object, "terrain", which allows for exploring sampler feedback streaming. In the top right find 2 windows: on the left is the raw GPU min mip feedback, on the right is the min mip map "residency map" generated by the application. Across the bottom are the mips of the texture, with mip 0 in the bottom left. Left-click drag the terrain to see sampler feedback streaming in action.
The batch file demo.bat starts in a more interesting state. Note keyboard controls are inactive while Camera animation is non-zero.
c:\SamplerFeedbackStreaming\x64\Release> demo.bat
The high-resolution textures in the first "release" package, hubble-16k.zip, work with "demo-hubble.bat", including a sky and earth. Make sure the mediadir in the batch file is set properly, or override it on the command line as follows:
c:\SamplerFeedbackStreaming\x64\Release> demo-hubble.bat -mediadir c:\hubble-16k
- qwe / asd: strafe left, forward, strafe right / rotate left, back, rotate right
- z c: levitate up and down
- x: toggles "up lock". When hovering over the "terrain" object, locking the up direction "feels right" with mouse navigation. Otherwise, it should be turned off.
- v b: rotate around the look direction (z axis)
- arrow keys: rotate left/right, pitch down/up
- shift: move faster
- mouse left-click drag: rotate view
- page up: toggle the min mip map viewer for the "terrain" geometry in the center of the universe
- page down: while camera animation is non-zero, toggles fly-through "rollercoaster" vs. fly-around "orbit"
- space: toggles camera animation on/off.
- home: toggles UI. Hold "shift" while UI is enabled to toggle mini UI mode.
- end: toggle overlay of min mip map onto every object
- insert: toggles frustum. This behaves a little wonky.
- esc: while windowed, exit. while full-screen, return to windowed mode
For a full list of command line options, pass the command line "?", e.g.
c:> expanse.exe ?
Most of the detailed settings for the system can be found in the default configuration file config.json.
The options in the json have corresponding command lines, e.g.:
json:
"mediaDir" : "media"
equivalent command line:
-mediadir media
On NVIDIA devices using drivers prior to 496.13, it is recommended to add -config nvidia.json to the command line, e.g.:
c:\SamplerFeedbackStreaming\x64\Release> demo.bat -config nvidia.json
c:\SamplerFeedbackStreaming\x64\Release> stress.bat -mediadir c:\hubble-16k -config nvidia.json
This config works around an issue where performance decays over time as the tiles in the heap become fragmented relative to resources. Specifically, the CPU time for UpdateTileMappings limits the system throughput. The workaround distributes textures across many small heaps, which can result in artifacts if the small heaps fill.
The executable DdsToXet.exe converts BCn DDS textures to the custom XET format. Only BC1 and BC7 textures have been tested. Usage:
c:> ddstoxet.exe -in myfile.dds -out myfile.xet
The batch file convert.bat will read all the DDS files in one directory and write XET files to a second directory. The output directory must exist.
c:> convert c:\myDdsFiles c:\myXetFiles
The sample includes a library, TileUpdateManager, with a minimal set of APIs defined in SamplerFeedbackStreaming.h. The central object, TileUpdateManager, allows for the creation of streaming textures and heaps to contain them. These objects handle all the feedback resource creation, readback, processing, and file I/O.
The application creates a TileUpdateManager and 1 or more heaps in Scene.cpp:
m_pTileUpdateManager = std::make_unique<TileUpdateManager>(m_device.Get(), m_commandQueue.Get(), tumDesc);
// create 1 or more heaps to contain our StreamingResources
for (UINT i = 0; i < m_args.m_numHeaps; i++)
{
    m_sharedHeaps.push_back(m_pTileUpdateManager->CreateStreamingHeap(m_args.m_streamingHeapSize));
}
Each SceneObject creates its own StreamingResource. Note a StreamingResource can be used by multiple objects, but this sample was designed to emphasize the ability to manage many resources and so objects are 1:1 with StreamingResources.
m_pStreamingResource = std::unique_ptr<StreamingResource>(in_pTileUpdateManager->CreateStreamingResource(in_filename, in_pStreamingHeap));
The demo exhibits texture cracks due to the way feedback is used. Feedback is always read after drawing, resulting in loads and evictions corresponding to that frame only becoming available for a future frame. That means we never have exactly the texture data we need when we draw (unless no new data is needed). Most of the time this isn't perceptible, but sometimes a fast-moving object enters the view resulting in visible artifacts.
The following image shows an exaggerated version of the problem, created by disabling streaming completely then moving the camera:
In this case, the hardware sampler is reaching across tile boundaries to perform anisotropic sampling, but encounters tiles that are not physically mapped. D3D12 Reserved Resource tiles that are not physically mapped return black to the sampler. This could be mitigated by dilating or eroding the min mip map such that there is no more than 1 mip level difference between neighboring tiles. That visual optimization is TBD.
There are also a few known bugs:
- entering full screen in a multi-gpu system moves the window to a monitor attached to the GPU by design. However, if the window starts on a different monitor, it "disappears" on the first maximization. Hit escape then maximize again, and it should work fine.
- full-screen mode is not borderless when running over remote desktop.
This implementation of Sampler Feedback Streaming uses DX12 Sampler Feedback in combination with DX12 Reserved Resources, aka Tiled Resources. A multi-threaded CPU library processes feedback from the GPU, makes decisions about which tiles to load and evict, loads data from disk storage, and submits mapping and uploading requests via GPU copy queues. There is no explicit GPU-side synchronization between the queues, so rendering frame rate is not dependent on completion of copy commands (on GPUs that support concurrent multi-queue operation). In this sample, GPU time is mostly a function of the Sampler Feedback Resolve() operations described below. The CPU threads run continuously and asynchronously from the GPU (pausing when there's no work to do), polling fence completion states to determine when feedback is ready to process or when copies and memory mappings have completed.
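The fence-polling pattern described above can be sketched on the CPU side without any D3D12 types. This is a minimal sketch, not the library's code: `InFlightBatch`, `InFlightTracker`, `Submit`, and `Poll` are hypothetical names, and the `completedFenceValue` argument stands in for whatever the application reads from its fence (for example via `ID3D12Fence::GetCompletedValue()`).

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical record of one batch of copies/mappings submitted to a copy queue.
struct InFlightBatch
{
    uint64_t fenceValue; // value the queue signals when the batch has finished
    uint32_t batchId;    // illustrative payload identifying the batch
};

// Hypothetical tracker; a worker thread would call Poll() repeatedly.
class InFlightTracker
{
public:
    void Submit(uint32_t batchId, uint64_t fenceValue)
    {
        // assumption: fence values increase monotonically for a given queue
        assert(m_batches.empty() || fenceValue > m_batches.back().fenceValue);
        m_batches.push_back({ fenceValue, batchId });
    }

    // Retire every batch whose fence value has been reached. Never blocks.
    std::vector<uint32_t> Poll(uint64_t completedFenceValue)
    {
        std::vector<uint32_t> done;
        while (!m_batches.empty() && m_batches.front().fenceValue <= completedFenceValue)
        {
            done.push_back(m_batches.front().batchId);
            m_batches.pop_front();
        }
        return done;
    }

    // True when there is nothing left to wait for (the thread could pause).
    bool Idle() const { return m_batches.empty(); }

private:
    std::deque<InFlightBatch> m_batches;
};
```

Because `Poll` only compares values and never waits, the rendering thread is not stalled by outstanding copies, which is the property the paragraph above relies on.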
All the magic can be found in the TileUpdateManager library (see the internal file TileUpdateManager.h - applications should include SamplerFeedbackStreaming.h), which abstracts the creation of StreamingResources and heaps while internally managing feedback resources, file I/O, and GPU memory mapping.
The technique works as follows:
The streaming textures are allocated as DX12 Reserved Resources, which behave like VirtualAlloc in C. Each resource takes no physical GPU memory until 64KB regions of the resource are committed in 1 or more GPU heaps. The x/y dimensions of a reserved resource tile are a function of the texture format, such that it fills a 64KB GPU memory page. For example, BC7 textures have 256x256 tiles, while BC1 textures have 512x256 tiles.
In Expanse, each tiled resource corresponds to a single .xet file on a hard drive (though multiple resources can point to the same file). The file contains dimensions and format, but also information about how to access the tiles within the file.
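The relationship between format and tile dimensions can be checked with a little arithmetic. The sketch below is an illustration only: `TileShape` and `StandardTileShape` are hypothetical names, and the function assumes the tile is a power-of-two rectangle that is either square or twice as wide as it is tall. In a real application the tile shape would be queried from the device (see GetResourceTiling) rather than computed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: tile dimensions in texels.
struct TileShape
{
    uint32_t width;
    uint32_t height;
};

constexpr uint32_t kTileSizeBytes = 64 * 1024; // one reserved-resource tile

// Assumption: a tile holds exactly 64KB of texels and is either square or
// 2:1 (wider than tall). bitsPerTexel is 4 for BC1 and 8 for BC7.
inline TileShape StandardTileShape(uint32_t bitsPerTexel)
{
    assert(bitsPerTexel != 0 && (bitsPerTexel & (bitsPerTexel - 1)) == 0);
    const uint32_t texels = (kTileSizeBytes * 8) / bitsPerTexel;

    // integer log2 of the texel count
    uint32_t log2Texels = 0;
    while ((1u << (log2Texels + 1)) <= texels)
    {
        ++log2Texels;
    }

    // split the exponent between width and height, giving width the extra bit
    const uint32_t log2Width = (log2Texels + 1) / 2;
    return { 1u << log2Width, 1u << (log2Texels - log2Width) };
}
```

With 8 bits per texel (BC7) this yields 256x256, and with 4 bits per texel (BC1) it yields 512x256, matching the figures above.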
To use sampler feedback, we create a feedback resource corresponding to each streaming resource, with identical dimensions to record information about which texels were sampled.
For this streaming usage, we use the min mip feedback feature by creating the resource with the format DXGI_FORMAT_SAMPLER_FEEDBACK_MIN_MIP_OPAQUE. We set the region size of the feedback to match the tile dimensions of the tiled resource (streaming resource) through the SamplerFeedbackMipRegion member of D3D12_RESOURCE_DESC1.
For the feedback to be written by GPU shaders (in this case, pixel shaders) the texture and feedback resources must be paired through a view created with CreateSamplerFeedbackUnorderedAccessView.
For Expanse, there is a "normal" non-feedback shader named terrainPS.hlsl and a "feedback-enabled" version of the same shader, terrainPS-FB.hlsl. The latter simply writes feedback using the WriteSamplerFeedback HLSL intrinsic, using the same sampler and texture coordinates, then calls the prior shader. Compare the WriteSamplerFeedback() call below to the Sample() call in terrainPS.hlsl.
To add feedback to an existing shader:
- include the original shader hlsl
- add binding for the paired feedback resource
- call the WriteSamplerFeedback intrinsic with the resource and sampler defined in the original shader
- call the original shader
#include "terrainPS.hlsl"
FeedbackTexture2D<SAMPLER_FEEDBACK_MIN_MIP> g_feedback : register(u0);
float4 psFB(VS_OUT input) : SV_TARGET0
{
    g_feedback.WriteSamplerFeedback(g_streamingTexture, g_sampler, input.tex.xy);
    return ps(input);
}
Sampler Feedback resources are opaque, and must be resolved before they can be interpreted on the CPU.
Resolving feedback for one resource is inexpensive, but adds up when there are 1000 objects. Expanse has a configurable time limit for the amount of feedback resolved each frame. The "FB" shaders are only used for a subset of resources such that the amount of feedback produced can be resolved within the time limit. The time limit is managed by the application, not by the TileUpdateManager library, by keeping a running average of resolve time as reported by GPU timers.
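One way to turn a running average of resolve time into a per-frame limit is sketched below. This is not the code in Scene.cpp: `ResolveBudget`, `AddSample`, and `MaxResolves` are hypothetical names, the exponential moving average is an assumed smoothing method, and always allowing at least one resolve is an assumed policy so that feedback is never starved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: estimates how many feedback resolves fit in a time limit.
class ResolveBudget
{
public:
    ResolveBudget(float timeLimitMs, float smoothing)
        : m_timeLimitMs(timeLimitMs), m_smoothing(smoothing)
    {
        assert(timeLimitMs > 0.0f && smoothing > 0.0f && smoothing <= 1.0f);
    }

    // Record one frame: total GPU time spent resolving and how many resolves ran.
    void AddSample(float gpuResolveMs, uint32_t numResolves)
    {
        if (numResolves == 0)
        {
            return; // nothing measured this frame
        }
        const float perResolveMs = gpuResolveMs / float(numResolves);
        // exponential moving average; the first sample initializes it
        m_avgMs = m_hasSample ? m_avgMs + m_smoothing * (perResolveMs - m_avgMs)
                              : perResolveMs;
        m_hasSample = true;
    }

    // Number of resources to gather feedback for next frame (at least 1).
    uint32_t MaxResolves() const
    {
        if (!m_hasSample || m_avgMs <= 0.0f)
        {
            return 1;
        }
        return std::max<uint32_t>(1, uint32_t(m_timeLimitMs / m_avgMs));
    }

private:
    float m_timeLimitMs;
    float m_smoothing;
    float m_avgMs = 0.0f;
    bool m_hasSample = false;
};
```

For example, with a 2 ms limit and a measured 4 ms for 16 resolves (0.25 ms each), the budget is 8 resolves for the next frame.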
As an optimization, Expanse tells streaming resources to evict all tiles if they are behind the camera. This could potentially be improved to include any object not in the view frustum.
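A behind-the-camera test of this kind can be as simple as a dot product against the view direction. The sketch below is an assumption about how such a test might look, not the sample's implementation: `Vec3`, `IsBehindCamera`, and the bounding-sphere representation are all hypothetical, and `cameraForward` is assumed to be normalized.

```cpp
#include <cassert>

// Hypothetical minimal vector type for the sketch.
struct Vec3
{
    float x;
    float y;
    float z;
};

inline float Dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Returns true when the object's bounding sphere lies entirely behind the
// plane through the camera position that is perpendicular to the view direction.
inline bool IsBehindCamera(const Vec3& cameraPos, const Vec3& cameraForward,
                           const Vec3& center, float radius)
{
    assert(radius >= 0.0f);
    const Vec3 toObject{ center.x - cameraPos.x, center.y - cameraPos.y, center.z - cameraPos.z };
    // signed distance of the sphere's front-most point along the view direction
    return Dot(toObject, cameraForward) + radius < 0.0f;
}
```

An object for which this returns true would be a candidate for QueueEviction; extending the test to the full view frustum would mean repeating the same signed-distance check against each frustum plane.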
You can find the time limit estimation, the eviction optimization, and the request to gather sampler feedback by searching Scene.cpp for the following:
- DetermineMaxNumFeedbackResolves determines how many resources to gather feedback for
- QueueEviction tells the runtime to evict tiles for this resource (as soon as possible)
- SetFeedbackEnabled results in 2 actions:
  - tells the runtime to collect feedback for this object via TileUpdateManager::QueueFeedback(), which results in clearing and resolving the feedback resource for this resource for this frame
  - uses the feedback-enabled pixel shader for this object
The resolved Min mip feedback tells us the minimum mip tile that should be loaded. The min mip feedback is traversed, updating an internal reference count for each tile. If a tile previously was unused (ref count = 0), it is queued for loading from the bottom (highest mip) up. If a tile is not needed for a particular region, its ref count is decreased (from the top down). When its ref count reaches 0, it might be ready to evict.
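The reference-counting walk can be sketched as follows. This is a simplified model, not the code in StreamingResource.cpp: `TileCoord`, `TileRefCounts`, and `SetDesiredMip` are hypothetical names, feedback regions are assumed to match mip 0 tiles one-to-one, and a value of `numMips` means "only the packed mips are needed".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct TileCoord { uint32_t x, y, mip; };

// Hypothetical per-resource reference counts, one layer per unpacked mip.
class TileRefCounts
{
public:
    TileRefCounts(uint32_t widthTiles, uint32_t heightTiles, uint32_t numMips)
        : m_width(widthTiles), m_height(heightTiles), m_numMips(numMips)
        , m_minMip(std::size_t(widthTiles) * heightTiles, numMips) // nothing requested yet
    {
        assert(widthTiles > 0 && heightTiles > 0 && numMips > 0 && numMips < 32);
        for (uint32_t s = 0; s < numMips; ++s)
        {
            m_refs.emplace_back(std::size_t(MipDim(m_width, s)) * MipDim(m_height, s), 0u);
        }
    }

    // Apply one resolved feedback value: region (x, y) now wants mip `desired`.
    void SetDesiredMip(uint32_t x, uint32_t y, uint32_t desired)
    {
        assert(x < m_width && y < m_height);
        if (desired > m_numMips) { desired = m_numMips; }
        uint32_t& current = m_minMip[std::size_t(y) * m_width + x];

        // More detail needed: add references from the bottom (highest mip) up.
        while (current > desired)
        {
            --current;
            if (Ref(x, y, current)++ == 0)
            {
                m_pendingLoads.push_back({ x >> current, y >> current, current });
            }
        }
        // Less detail needed: release references from the top (mip 0) down.
        while (current < desired)
        {
            assert(Ref(x, y, current) > 0);
            if (--Ref(x, y, current) == 0)
            {
                m_pendingEvictions.push_back({ x >> current, y >> current, current });
            }
            ++current;
        }
    }

    const std::vector<TileCoord>& PendingLoads() const { return m_pendingLoads; }
    const std::vector<TileCoord>& PendingEvictions() const { return m_pendingEvictions; }

private:
    // size of mip s in tiles, rounding up
    static uint32_t MipDim(uint32_t dim, uint32_t s) { return (dim + (1u << s) - 1) >> s; }

    uint32_t& Ref(uint32_t x, uint32_t y, uint32_t s)
    {
        return m_refs[s][std::size_t(y >> s) * MipDim(m_width, s) + (x >> s)];
    }

    uint32_t m_width, m_height, m_numMips;
    std::vector<uint32_t> m_minMip;            // current min mip per feedback region
    std::vector<std::vector<uint32_t>> m_refs; // ref counts per mip layer
    std::vector<TileCoord> m_pendingLoads;
    std::vector<TileCoord> m_pendingEvictions;
};
```

In this model a coarse tile shared by several regions stays referenced until every region that needs it has backed off, so it only becomes an eviction candidate after the finer tiles that depend on it have been released.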
Data structures for tracking reference count, residency state, and heap usage can be found in StreamingResource.cpp and StreamingResource.h; look for TileMappingState. This class also has methods for interpreting the feedback buffer (ProcessFeedback) and updating the residency map (UpdateMinMipMap), which execute concurrently in separate CPU threads.
class TileMappingState
{
public:
    // see file for method declarations
private:
    TileLayer<BYTE> m_resident;
    TileLayer<UINT32> m_refcounts;
    TileLayer<UINT32> m_heapIndices;
};
TileMappingState m_tileMappingState;
Tiles can only be evicted if there are no lower-mip-level tiles that depend on them, e.g. a mip 1 tile may have four mip 0 tiles "above" it in the mip hierarchy, and may only be evicted if all 4 of those tiles have also been evicted. The ref count helps us determine this dependency.
A tile also cannot be evicted if it is being used by an outstanding draw command. We prevent this by delaying evictions a frame or two depending on swap chain buffer count (i.e. double or triple buffering). If a tile is needed before the eviction delay completes, the tile is simply rescued from the pending eviction data structure instead of being re-loaded.
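The delayed-eviction idea, including the "rescue" path, can be modeled with a small queue of per-frame lists. This is a sketch under stated assumptions rather than the library's data structure: `EvictionDelay`, `Queue`, `Rescue`, and `NextFrame` are hypothetical names, tiles are identified by a plain integer, and `delayFrames` would be chosen from the swap chain buffer count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

using TileId = uint32_t; // hypothetical tile identifier

class EvictionDelay
{
public:
    explicit EvictionDelay(uint32_t delayFrames) : m_frames(delayFrames)
    {
        assert(delayFrames > 0);
    }

    // A tile's ref count reached 0 this frame: schedule it for eviction.
    void Queue(TileId tile) { m_frames.back().push_back(tile); }

    // The tile is needed again before it was evicted: cancel the eviction.
    // Returns true if the tile was pending (so no re-load is required).
    bool Rescue(TileId tile)
    {
        for (std::vector<TileId>& frame : m_frames)
        {
            auto it = std::find(frame.begin(), frame.end(), tile);
            if (it != frame.end())
            {
                frame.erase(it);
                return true;
            }
        }
        return false;
    }

    // Call once per frame. Returns the tiles that have waited delayFrames
    // frames and can no longer be referenced by an outstanding draw.
    std::vector<TileId> NextFrame()
    {
        std::vector<TileId> ready = std::move(m_frames.front());
        m_frames.pop_front();
        m_frames.emplace_back();
        return ready;
    }

private:
    std::deque<std::vector<TileId>> m_frames; // oldest at front, newest at back
};
```

With `delayFrames` set to 2, a tile queued this frame is returned by the second call to `NextFrame`, unless `Rescue` removed it in the meantime.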
The mechanics of loading, mapping, and unmapping tiles is all contained within the DataUploader class, which depends on a FileStreamer class to do the actual tile loads. The latter implementation (FileStreamerReference) can easily be exchanged with DirectStorage for Windows.
Because textures are only partially resident, we only want the pixel shader to sample resident portions. Sampling texels that are not physically mapped returns 0s, resulting in undesirable visual artifacts. To prevent this, we clamp all sampling operations based on a residency map. The residency map is relatively tiny: for a 16k x 16k BC7 texture, which would take 350MB of GPU memory, we only need a 4KB residency map. Note that the lowest-resolution "packed" mips are loaded for all objects, so there is always something available to sample. See also GetResourceTiling.
When a texture tile has been loaded or evicted by TileUpdateManager, it updates the corresponding residency map. The residency map is an application-generated representation of the minimum mip available for each region in the texture, and is described in the Sampler Feedback spec as follows:
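The sizes quoted above can be reproduced with a short calculation. The helper names below are hypothetical, and the residency map is assumed to store one byte per tile-sized region (an assumption inferred from the 4KB figure). BC7 stores 16 bytes per 4x4 block of texels.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumption: one byte of residency map per tile-sized region of mip 0.
inline uint64_t ResidencyMapBytes(uint32_t texWidth, uint32_t texHeight,
                                  uint32_t tileWidth, uint32_t tileHeight)
{
    assert(tileWidth > 0 && tileHeight > 0);
    const uint64_t regionsX = (texWidth + tileWidth - 1) / tileWidth;
    const uint64_t regionsY = (texHeight + tileHeight - 1) / tileHeight;
    return regionsX * regionsY;
}

// Total bytes of a full BC7 mip chain (16 bytes per 4x4 block), down to 1x1.
inline uint64_t Bc7MipChainBytes(uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0);
    uint64_t total = 0;
    for (;;)
    {
        const uint64_t blocksX = (width + 3) / 4;
        const uint64_t blocksY = (height + 3) / 4;
        total += blocksX * blocksY * 16;
        if (width == 1 && height == 1)
        {
            break;
        }
        width = std::max<uint32_t>(1, width / 2);
        height = std::max<uint32_t>(1, height / 2);
    }
    return total;
}
```

For a 16384 x 16384 BC7 texture with 256x256 tiles, `ResidencyMapBytes` gives 64 x 64 = 4096 bytes, while `Bc7MipChainBytes` gives roughly 358 million bytes (about 341 MiB), in line with the approximate 350MB figure.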
The MinMip map represents per-region mip level clamping values for the tiled texture; it represents what is actually loaded.
Below, the Visualization mode was set to "Color = Mip" and labels were added. TileUpdateManager processes the Min Mip Feedback (left window in top right), uploads and evicts tiles to form a Residency map, which is a proper min-mip-map (right window in top right). The contents of memory can be seen in the partially resident mips along the bottom (black is not resident). The last 3 mip levels are never evicted because they are packed mips (all fit within a 64KB tile). In this visualization mode, the colors of the texture on the bottom correspond to the colors of the visualization windows in the top right. Notice how the resident tiles do not exactly match what feedback says is required.
To reduce GPU memory, a single combined buffer contains all the residency maps for all the resources. The pixel shader samples the corresponding residency map to clamp the sampling function to the minimum mip level that is resident, thereby avoiding sampling tiles that have not been mapped.
We can see the lookup into the residency map in the pixel shader terrainPS.hlsl. Resources are defined at the top of the shader, including the reserved (tiled) resource g_streamingTexture, the residency map g_minmipmap, and the sampler:
Texture2D g_streamingTexture : register(t0);
Buffer<uint> g_minmipmap: register(t1);
SamplerState g_sampler : register(s0);
The shader offsets into its region of the residency map (g_minmipmapOffset) and loads the minimum mip value for the region to be sampled.
int2 uv = input.tex * g_minmipmapDim;
uint index = g_minmipmapOffset + uv.x + (uv.y * g_minmipmapDim.x);
uint mipLevel = g_minmipmap.Load(index);
The sampling operation is clamped to the minimum mip resident (mipLevel).
float3 color = g_streamingTexture.Sample(g_sampler, input.tex, 0, mipLevel).rgb;
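For reference, the same offset-and-index arithmetic can be mirrored on the CPU. The sketch below is a hypothetical model of a combined residency buffer, not the library's implementation: `ResidencyRegion`, `CombinedResidencyBuffer`, `Allocate`, `Set`, and `Lookup` are illustrative names, one byte per region is assumed, and unlike the shader the lookup clamps coordinates so that u or v equal to 1.0 stays in range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Where one resource's residency map lives inside the shared buffer.
struct ResidencyRegion
{
    uint32_t offset; // plays the role of g_minmipmapOffset
    uint32_t width;  // plays the role of g_minmipmapDim.x
    uint32_t height; // plays the role of g_minmipmapDim.y
};

class CombinedResidencyBuffer
{
public:
    // Append a width x height map for a new resource, filled with initialMip.
    ResidencyRegion Allocate(uint32_t widthTiles, uint32_t heightTiles, uint8_t initialMip)
    {
        assert(widthTiles > 0 && heightTiles > 0);
        const ResidencyRegion region{ uint32_t(m_data.size()), widthTiles, heightTiles };
        m_data.resize(m_data.size() + std::size_t(widthTiles) * heightTiles, initialMip);
        return region;
    }

    // Record the minimum resident mip for one region after a load or eviction.
    void Set(const ResidencyRegion& r, uint32_t x, uint32_t y, uint8_t mip)
    {
        assert(x < r.width && y < r.height);
        m_data[r.offset + x + y * r.width] = mip;
    }

    // Same indexing as the shader: offset + uv.x + uv.y * dim.x
    uint8_t Lookup(const ResidencyRegion& r, float u, float v) const
    {
        assert(u >= 0.0f && u <= 1.0f && v >= 0.0f && v <= 1.0f);
        const uint32_t x = std::min(uint32_t(u * float(r.width)), r.width - 1);
        const uint32_t y = std::min(uint32_t(v * float(r.height)), r.height - 1);
        return m_data[r.offset + x + y * r.width];
    }

private:
    std::vector<uint8_t> m_data; // all residency maps, back to back
};
```

Each resource only needs to carry its offset and dimensions as shader constants, which is what allows a single buffer binding to serve every object in the scene.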
There is some work that needs to be done before drawing objects that use feedback (clearing feedback resources), and some work that needs to be done after (resolving feedback resources). TileUpdateManager creates these commands, but does not execute them. Each frame, these command lists must be built and submitted with application draw commands, which you can find just before the call to Present() in Scene.cpp as follows:
auto commandLists = m_pTileUpdateManager->EndFrame();
ID3D12CommandList* pCommandLists[] = { commandLists.m_beforeDrawCommands, m_commandList.Get(), commandLists.m_afterDrawCommands };
m_commandQueue->ExecuteCommandLists(_countof(pCommandLists), pCommandLists);
The sample and its code are provided under the MIT license; please see LICENSE. All third-party source code is provided under its own respective, MIT-compatible open source license.
Copyright (C) 2021, Intel Corporation