This is a demo of one possible way to use the Slang shader compiler together with WebGPU in C++ (in both native and Web contexts), using CMake.
Key Features
- The CMake setup fetches the Slang compiler and a WebGPU implementation.
- Slang shaders are compiled into WGSL at build time, with CMake taking care of dependencies.
- The Slang reflection API is used to auto-generate boilerplate binding code on the C++ side.
- Examples of the auto-differentiation features of Slang are given.
- Build instructions for native and Web targets.
- Provides add_slang_shader and target_link_slang_shader to help manage Slang shader targets (in cmake/SlangUtils.cmake).
Warning
The WebGPU API is still a Work in Progress at the time I write these lines. To make sure this setup works, use the very webgpu
directory provided in this repository, which fetches the very version of the API for which it was developed (which is Dawn-specific, BTW). When using emscripten, use version 3.1.72.
Note
This example relies on the webgpu.hpp and webgpu-raii.hpp shallow wrappers of webgpu.h provided by WebGPU-distribution. If this is a deal-breaker for your use case, you are welcome to open an issue or a pull request that addresses it, as there should be no particular blocker to getting rid of them.
🤔 Ergl, I don't want codegen, but I'm interested in the Slang to WGSL part...
Sure, have a look at examples/00_no_codegen
. All it needs is cmake/FetchSlang.cmake
and cmake/SlangUtils.cmake
, but you can strip down the whole codegen part if you don't like it.
Slang is a great modern shader programming language, but of course it does not natively run in WebGPU. One possible workaround is to ship the Slang compiler to the client, but this is heavy both in bandwidth and client computation time, so this setup rather transpiles shaders into WGSL (WebGPU shading language) at compilation time.
Compiling shaders is one thing, but when working on a shader-intensive application, one easily spends too much time writing kernel binding boilerplate rather than actual business logic (i.e., the nice things we actually want to code). Among Slang's nice features is a reflection API, which this project uses to automatically generate kernel binding code.
Let us consider the following example:
// hello-world.slang
StructuredBuffer<float> buffer0;
StructuredBuffer<float> buffer1;
RWStructuredBuffer<float> result;
[shader("compute")]
[numthreads(8,1,1)]
void computeMain(uint3 threadId : SV_DispatchThreadID)
{
uint index = threadId.x;
result[index] = buffer0[index] + buffer1[index];
}
Our automatic code generation will create HelloWorldKernel.h and HelloWorldKernel.cpp files that can be used as follows:
// main.cpp
#include "generated/HelloWorldKernel.h"
// (assuming 'device' is a wgpu::Device object)
generated::HelloWorldKernel kernel(device);
// (assuming 'buffer0', 'buffer1' and 'result' are wgpu::Buffer objects)
wgpu::BindGroup bindGroup = kernel.createBindGroup(buffer0, buffer1, result);
// First argument can be ThreadCount{ ... } or WorkgroupCount{ ... }
kernel.dispatch(ThreadCount{ 10 }, bindGroup);
Note for instance how the signature of createBindGroup
is automatically adapted to the resources declared in the shader.
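On the ThreadCount versus WorkgroupCount choice mentioned in the snippet above: a thread count presumably has to be converted into a number of workgroups before the dispatch, using the workgroup size declared by [numthreads(...)]. The following is a minimal sketch of that conversion by ceiling division. The struct definitions, divideAndCeil and toWorkgroupCount are hypothetical names written for this illustration, not the code that the generator actually emits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the dispatch arguments; the generated types
// in this repository may be defined differently.
struct ThreadCount {
    std::uint32_t x, y, z;
    explicit ThreadCount(std::uint32_t x_, std::uint32_t y_ = 1, std::uint32_t z_ = 1)
        : x(x_), y(y_), z(z_) {}
};

struct WorkgroupCount {
    std::uint32_t x, y, z;
    explicit WorkgroupCount(std::uint32_t x_, std::uint32_t y_ = 1, std::uint32_t z_ = 1)
        : x(x_), y(y_), z(z_) {}
};

// Workgroup size, as declared by [numthreads(x,y,z)] in the shader.
struct WorkgroupSize {
    std::uint32_t x, y, z;
    explicit WorkgroupSize(std::uint32_t x_, std::uint32_t y_ = 1, std::uint32_t z_ = 1)
        : x(x_), y(y_), z(z_) {}
};

// Smallest number of groups of `size` threads that covers `count` threads.
inline std::uint32_t divideAndCeil(std::uint32_t count, std::uint32_t size) {
    assert(size > 0 && "workgroup size must be positive");
    return (count + size - 1) / size;
}

// Convert a thread count into a workgroup count, axis by axis.
inline WorkgroupCount toWorkgroupCount(const ThreadCount& threads, const WorkgroupSize& size) {
    return WorkgroupCount(
        divideAndCeil(threads.x, size.x),
        divideAndCeil(threads.y, size.y),
        divideAndCeil(threads.z, size.z));
}
```

Under this assumption, ThreadCount{ 10 } with [numthreads(8,1,1)] maps to 2 workgroups, that is 16 invocations for 10 elements, which is why a shader dispatched this way typically guards against out-of-range indices.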
All it takes to use the generator is to use the custom add_slang_webgpu_kernel
command that we define in cmake/SlangUtils.cmake
:
add_slang_webgpu_kernel(
generate_hello_world_kernel
NAME HelloWorld
SOURCE shaders/hello-world.slang
ENTRY computeMain
)
The target generate_hello_world_kernel
is a static library target that builds HelloWorldKernel
.
Lastly, this repository provides a basic setup to fetch a precompiled Slang library in a CMake project (see cmake/FetchSlang.cmake) that is compatible with cross-compilation.
This project can be used either in a fully native scenario, where the target is an executable, or in a web cross-compilation scenario, where the target is a WebAssembly module (and HTML demo page).
Nothing surprising in this case:
# Configure the build
cmake -B build
# Compile everything
cmake --build build
Note
This project uses CMake and tries to bundle as many dependencies as possible. However, it will fetch at configuration time the following:
- A prebuilt version of Dawn, to get a WebGPU implementation (wgpu-native is a possible alternative, modulo some tweaks).
- A prebuilt version of Slang, both executable and library.
You may then explore build/examples
to execute the various examples.
Cross-compilation and code generation are difficult roommates, but here is how to make them get along: we create 2 build directories.
- We configure a build-native build, where we can disable the examples so that it only builds the generator. Indeed, even if the end target is a WebAssembly module, we still need to build the generator for the host system (where the compilation occurs):
# Configure a native build, to compile the code generator
# We can turn off examples (or reuse the native build done previously and skip this)
cmake -B build-native -DSLANG_WEBGPU_BUILD_EXAMPLES=OFF
Note
Setting SLANG_WEBGPU_BUILD_EXAMPLES=OFF has the nice byproduct of not fetching Dawn, because WebGPU is not needed for the generator, and the WebAssembly build has built-in support for WebGPU (so no need for Dawn).
We then build the generator target:
# Build the generator with the native build
cmake --build build-native --target slang_webgpu_generator
- We configure a build-web build with emcmake to put in place the cross-compilation to WebAssembly, and this time we do not build the generator (SLANG_WEBGPU_BUILD_GENERATOR=OFF), but rather tell CMake where to find it with the SlangWebGPU_Generator_DIR option:
# Configure a cross-compilation build
emcmake cmake -B build-web -DSLANG_WEBGPU_BUILD_GENERATOR=OFF -DSlangWebGPU_Generator_DIR=build-native
Note
The emcmake command is provided by emscripten. Make sure you activate your emscripten environment first (preferably with version 3.1.72).
We can now build the WebAssembly module, which will call the generator from build-native
whenever needed:
# Build the Web targets
cmake --build build-web
And it is now ready to be tested!
# Start a local server
python -m http.server 8000
Then browse for instance to:
- http://localhost:8000/build-web/examples/00_no_codegen/slang_webgpu_example_00_no_codegen.html
- http://localhost:8000/build-web/examples/01_simple_kernel/slang_webgpu_example_01_simple_kernel.html
- http://localhost:8000/build-web/examples/02_multiple_entrypoints/slang_webgpu_example_02_multiple_entrypoints.html
- http://localhost:8000/build-web/examples/03_module_import/slang_webgpu_example_03_module_import.html
- http://localhost:8000/build-web/examples/04_uniforms/slang_webgpu_example_04_uniforms.html
- http://localhost:8000/build-web/examples/05_autodiff/slang_webgpu_example_05_autodiff.html
This repository is only meant to be a demo. To go further, start from one of the examples and progressively turn it into something more complex. You may eventually want to move your example into src/
. You will probably be tempted to tune the generator a bit and add all sorts of fun extensions: do it!
Note
Examples in the examples/ directory are sorted from the least complex to the most complex. Each example has its own README.
This is a proof of concept more than a fully fledged framework, and it is missing a lot of features that I will probably add progressively, or that you are more than welcome to suggest through a Pull Request:
- Add support for texture bindings (shouldn't be too difficult, just a matter of adding it to the generator).
- Auto-generate the Uniform struct in each kernel that has global uniform parameters (started already, though a bit boring to write all the cases of nested structs).
- Add support for local uniform parameters (only global parameters are handled for now), which may lead to multiple createBindGroup methods for the same kernel (not sure).
- Try more complex scenarios, for instance the 2D gaussian splatting one from the Slang playground.
- Add proper CI workflow to check that everything works as expected.
Although this repo can be reused as is, it may be of interest to know how it was originally built, so here is a rough log of the steps I followed during its construction:
- Getting WebGPU: I get from WebGPU-distribution a CMake setup that fetches a precompiled version. I copied the content of the fetch branch (soon to become main), except .github, to third_party/webgpu and added a fatal error in case one tries to use wgpu-native, because I will focus on Dawn for now.
- Getting Slang: I write cmake/FetchSlang.cmake to fetch precompiled Slang from official releases, freezing to v2024.14.5 to ensure this repo doesn't break over time.
- Write cmake/SlangUtils.cmake to trigger the Slang compiler from the CMake build setup.
- Write a basic Kernel class that contains everything needed to dispatch a compute job (for instance getting inspiration from gpu.cpp):
struct Kernel {
std::string name;
std::vector<raii::BindGroupLayout> bindGroupLayouts;
raii::ComputePipeline pipeline;
};
Note
If you are interested in an example that only calls slangc
to compile Slang shaders into WGSL but does no code generation, you may have a look at the no_codegen
example.
- As we notice that a large part of the code to create a kernel (layout and bind groups) is redundant with binding information provided by the shader, we use the Slang reflection API to generate some code. We thus create src/generator, using Slang Playground to explore the reflection API. For this, we add CLI11 to the third_party directory.
- To write the generator, follow instructions from the Slang Compilation API documentation, then the Slang Reflection API documentation.
- Create a very basic template system so that the generated file can be drafted in src/generator/binding-template.tpl. Add magic_enum to the third_party directory to produce nicer error messages.
Warning
Our binding generator does not handle every single one of the many cases of bindings that Slang supports. To add extra mechanisms, go have a look at BindingGenerator in src/generator/main.cpp.
- Split the CMake build into two stages to be able to separately build the generator and the examples when cross-compiling to WebAssembly (because the generator must be built for the host system, rather than for the target system), following instructions from the CMake documentation about cross-compilation.
- Generalize to cases where we want to expose multiple entry points from the same shader, hence implementing multiple dispatch methods.
- Progressively complexify the generator to handle Slang shader file inclusion (through a Depfile), to support global uniform buffers, etc.
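To give an idea of what the "very basic template system" step of this log can look like, here is a minimal sketch of placeholder substitution that renders a createBindGroup declaration from a list of buffer names, as reflection would report them. The {{key}} placeholder syntax and the renderTemplate and makeBufferParams helpers are assumptions made for this illustration. They are not taken from src/generator, whose template format may well differ.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Replace every "{{key}}" in `tpl` by the matching value in `values`.
// The "{{...}}" syntax is an assumption made for this sketch.
inline std::string renderTemplate(std::string tpl,
                                  const std::map<std::string, std::string>& values) {
    for (const auto& entry : values) {
        const std::string placeholder = "{{" + entry.first + "}}";
        std::size_t pos = 0;
        while ((pos = tpl.find(placeholder, pos)) != std::string::npos) {
            tpl.replace(pos, placeholder.size(), entry.second);
            // Skip the inserted text so that it is never expanded again.
            pos += entry.second.size();
        }
    }
    return tpl;
}

// Build a parameter list such as "wgpu::Buffer buffer0, wgpu::Buffer buffer1"
// from the buffer names found in the shader.
inline std::string makeBufferParams(const std::vector<std::string>& names) {
    std::string params;
    for (std::size_t i = 0; i < names.size(); ++i) {
        assert(!names[i].empty() && "a parameter needs a name");
        if (i > 0) params += ", ";
        params += "wgpu::Buffer " + names[i];
    }
    return params;
}
```

Feeding it the three buffers of hello-world.slang and the kernel name HelloWorld would yield a declaration whose parameters follow the shader, which is the behavior described for createBindGroup earlier in this README. A real generator also has to deal with non-buffer bindings, which this sketch ignores.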
# Memo to look for TODOs:
grep -R TODO --exclude-dir='build*' --exclude-dir=third_party .