Swizzling story
Swizzling is the term for reordering the R, G, B, A components within the texture color data.
Originally, Gecko produced all the image data in BGRA format, which is what WebRender expected. The semantics of `ImageFormat::BGRA8` in WebRender were overloaded and depended on the context:
- if applied to the texture data, it meant BGRA order of bytes
- if applied to texture internal format, it meant RGBA8
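The overloading described above can be sketched as a lookup from (format, usage) to the channel order that was actually meant. This is only an illustration: `Usage`, `ChannelOrder`, and `historical_meaning` are hypothetical names, and the `ImageFormat` enum here is a one-variant stand-in, not WebRender's real type.

```rust
/// Simplified stand-in for WebRender's format enum (assumption: only the
/// variant discussed in the text is modelled).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ImageFormat {
    BGRA8,
}

/// Hypothetical: the context in which the format value is interpreted.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Usage {
    /// Describes the bytes handed to the upload call.
    TextureData,
    /// Describes the texture's internal (GPU-side) format.
    InternalFormat,
}

/// Hypothetical: the channel order that was actually meant.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ChannelOrder {
    Bgra,
    Rgba,
}

/// The same enum value resolves to different channel orders
/// depending on where it is used.
pub fn historical_meaning(format: ImageFormat, usage: Usage) -> ChannelOrder {
    match (format, usage) {
        (ImageFormat::BGRA8, Usage::TextureData) => ChannelOrder::Bgra,
        (ImageFormat::BGRA8, Usage::InternalFormat) => ChannelOrder::Rgba,
    }
}
```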
Thus, WebRender pretended to work with BGRA everywhere, while actually resolving the BGRA -> RGBA conversion in one of two places:
- if BGRA8 was supported as an internal texture format, we used it, and the texture sampler resolved the conversion for us (i.e. shaders always read RGBA and are unaware of the swizzling). This is the case for Angle on Windows, and we consider it to be the fast path.
- otherwise, we'd ask the GL driver to convert the data on texture uploads. This is the case on macOS and some Linux desktops. Since the driver can't hold onto our data after the call, it has to copy all of it and convert each color separately, which made our texture uploads slow on the Renderer thread on those platforms.
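To make the cost of the slow path concrete, here is a minimal sketch of the per-pixel work a BGRA -> RGBA conversion implies. It is not the driver's actual code (which is not visible to us); `bgra_to_rgba` is a hypothetical helper that assumes tightly packed 8-bit pixels with no row padding.

```rust
/// Convert tightly packed BGRA8 pixels into a freshly allocated RGBA8 buffer.
/// Every pixel is both copied and reordered, which is why this path is slow
/// for large uploads.
pub fn bgra_to_rgba(src: &[u8]) -> Vec<u8> {
    assert!(src.len() % 4 == 0, "expected whole 4-byte pixels");
    let mut dst = Vec::with_capacity(src.len());
    for px in src.chunks_exact(4) {
        // Source order is B, G, R, A; emit R, G, B, A.
        dst.extend_from_slice(&[px[2], px[1], px[0], px[3]]);
    }
    dst
}
```

The fast path avoids this loop entirely: the bytes are uploaded untouched and the reordering happens for free when the texture is sampled.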
The texture uploads can also be done in different ways. If using `glTextureStorage`, we have direct control over the mipmap allocation and the format conversions, but BGRA8 is not available on the platforms of the second case above (macOS and some Linux desktops). If using `glTexImage*`, we could have real BGRA data, but the drivers pessimistically allocate video memory for mipmaps, which isn't desirable.
The ideal solution would involve no data conversion by the driver, no memory allocated for mipmaps, and no run-time cost of using the texture data. The main problematic platform is macOS, where OpenGL capabilities are modest.
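As a rough illustration of what the pessimistic mipmap allocation costs, the sketch below compares the size of level 0 alone with the size of a full mip chain. It assumes the driver reserves every level down to 1x1 and that array layers are not reduced between levels; the function names are hypothetical.

```rust
/// Bytes needed for the base level only.
pub fn level0_bytes(width: u64, height: u64, layers: u64, bytes_per_pixel: u64) -> u64 {
    width * height * layers * bytes_per_pixel
}

/// Bytes needed if every mip level down to 1x1 is allocated.
/// Width and height halve per level (rounding down, never below 1);
/// the layer count stays constant, as in a 2D array texture.
pub fn full_mip_chain_bytes(width: u64, height: u64, layers: u64, bytes_per_pixel: u64) -> u64 {
    assert!(width > 0 && height > 0, "dimensions must be non-zero");
    let (mut w, mut h) = (width, height);
    let mut total = 0;
    loop {
        total += w * h * layers * bytes_per_pixel;
        if w == 1 && h == 1 {
            break;
        }
        w = (w / 2).max(1);
        h = (h / 2).max(1);
    }
    total
}
```

For a 2048x2048 single-layer RGBA8 texture, level 0 takes 16,777,216 bytes while the full chain takes 22,369,620 bytes, i.e. roughly one third extra memory that is never used if the texture is only sampled at level 0.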
Another important piece of the puzzle is that the WebRender texture cache is a 2D array texture.
There are several ways to address this, from multiple angles, each with its own trade-offs:
- Use `glTexImage` if the BGRA8 internal format is not supported. Pay with mipmap video memory allocation. Since there is no direct control over the internal format in this case, we still couldn't be 100% sure that no platform would do the conversion.
- Use `glTexImage` with `GL_TEXTURE_RECTANGLE` instead of the 2D array. That would waste no space on mipmaps and allow the "client storage" way of uploading the data, which is even faster. The trade-offs are:
  - could be a significant amount of work
  - could in the end allow us to allocate less space than the 2D array in total
  - unclear how this would be compatible with Angle (if we do it on all platforms), since D3D11 doesn't have an analogue. Potentially, Angle would have to always keep the size around and apply it on every texture fetch, which is not what we want.
- Pretend the image data is in RGBA format, but for shader sampling configure the GL texture unit to do the swizzling using the `GL_TEXTURE_SWIZZLE_?` parameters. This was implemented in bug 1548339. The trade-offs are:
  - whenever the swizzling configuration is changed for a texture, we have to break the current batch
  - consequently, we can't have the same texture bound with both swizzled and unswizzled configurations
  - the WR pipeline needs to be fully aware of the RGBA versus BGRA differences
  - special handling must be provided in places where we `glBlitFramebuffer` from a swizzled texture
  - more risk of driver bugs, e.g. macOS Intel SandyBridge
- Instead of configuring the texture unit, swizzle the components in the shader right after fetching the color. This would come with a largely similar set of trade-offs, minus the driver bugs, but with a slower start-up or ramp-up time due to the increased number of shader variations.
- Change Gecko's image sources (video, blobs, raw data) to produce RGBA at all times. This is being worked on in bug 1541900.
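The batch-breaking cost of the sampler-swizzle option in the list above can be sketched as follows. This is a simplified model, not WebRender's batching code: `Swizzle`, `TextureBinding`, `sample`, and `count_batches` are hypothetical names, and a "batch" is reduced here to a run of consecutive draws sharing the same texture and swizzle state.

```rust
/// Hypothetical: how the sampler reorders channels when reading a texel.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Swizzle {
    /// Channels are read as stored.
    Rgba,
    /// First and third channels are exchanged on read.
    Bgra,
}

/// Emulates what the shader sees for a stored texel under a given swizzle.
pub fn sample(stored: [u8; 4], swizzle: Swizzle) -> [u8; 4] {
    match swizzle {
        Swizzle::Rgba => stored,
        Swizzle::Bgra => [stored[2], stored[1], stored[0], stored[3]],
    }
}

/// Hypothetical batch key: the texture plus its swizzle configuration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TextureBinding {
    pub texture_id: u32,
    pub swizzle: Swizzle,
}

/// Counts batches for a sequence of draws: a new batch starts whenever
/// the texture or its swizzle configuration differs from the previous draw.
pub fn count_batches(draws: &[TextureBinding]) -> usize {
    let mut batches = 0;
    let mut current: Option<TextureBinding> = None;
    for &draw in draws {
        if current != Some(draw) {
            batches += 1;
            current = Some(draw);
        }
    }
    batches
}
```

In this model, four draws from the same texture alternating `Rgba`, `Rgba`, `Bgra`, `Rgba` produce three batches, whereas they would form a single batch if the swizzle were not part of the key. Swizzling in the shader instead would apply the same reordering as `sample`, but per shader variant rather than per texture state.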
The current state is: we made the drivers do no conversion on uploads and allocate no mipmap video memory, while still using 2D array textures. On platforms without native BGRA support, we pay with increased batch breaks. In the future, this should be addressed by fully implementing the last option above (having Gecko produce RGBA at all times).