GPU acceleration
The JavaScript version uses WebGL, where supported, to do YCbCr->RGB conversion and drawing on the GPU. This can have a significant impact on performance: colorspace conversion becomes essentially "free", and drawing gets faster for larger images.
WebGL is a fairly low-level wrapper around OpenGL ES, which is basically a funky subset of OpenGL. There's a lot of boilerplate needed just to draw a rectangle (excuse me; two triangles!).
It boils down to:
- write a "vertex shader" program to project your geometry to 2d coordinates (really easy for a 2d rectangle)
- write a "fragment shader" program to assign color for each pixel drawn
- buffer your triangles and upload your textures, and assign them to the inputs to the shader programs
- call gl.drawArrays() to draw the triangles!
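The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `drawRectangle`, `compileShader`, `aPosition`, and the flat-gray fragment shader are assumptions made up for the example, and `gl` is taken to be a WebGL 1 context obtained from `canvas.getContext('webgl')`.

```javascript
// Vertex shader: the rectangle is already in 2D clip space, so pass it through.
const VERTEX_SOURCE = `
attribute vec2 aPosition;
void main() {
  gl_Position = vec4(aPosition, 0.0, 1.0);
}`;

// Fragment shader: paint every covered pixel the same gray (placeholder coloring).
const FRAGMENT_SOURCE = `
precision mediump float;
void main() {
  gl_FragColor = vec4(0.5, 0.5, 0.5, 1.0);
}`;

// A full-viewport rectangle expressed as two triangles (six x,y pairs).
const RECTANGLE = new Float32Array([
  -1, -1,   1, -1,  -1,  1,
  -1,  1,   1, -1,   1,  1,
]);

// Compile one shader stage, surfacing the driver's error log on failure.
function compileShader(gl, type, source) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
    throw new Error('Shader compile failed: ' + gl.getShaderInfoLog(shader));
  }
  return shader;
}

// Hypothetical helper: link the program, buffer the geometry, and draw it.
function drawRectangle(gl) {
  const program = gl.createProgram();
  gl.attachShader(program, compileShader(gl, gl.VERTEX_SHADER, VERTEX_SOURCE));
  gl.attachShader(program, compileShader(gl, gl.FRAGMENT_SHADER, FRAGMENT_SOURCE));
  gl.linkProgram(program);
  if (!gl.getProgramParameter(program, gl.LINK_STATUS)) {
    throw new Error('Program link failed: ' + gl.getProgramInfoLog(program));
  }
  gl.useProgram(program);

  // Buffer the triangles and wire them to the vertex shader's input.
  const buffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
  gl.bufferData(gl.ARRAY_BUFFER, RECTANGLE, gl.STATIC_DRAW);
  const position = gl.getAttribLocation(program, 'aPosition');
  gl.enableVertexAttribArray(position);
  gl.vertexAttribPointer(position, 2, gl.FLOAT, false, 0, 0);

  // Two triangles = six vertices.
  gl.drawArrays(gl.TRIANGLES, 0, RECTANGLE.length / 2);
}
```

In a browser this would be called once per frame as `drawRectangle(canvas.getContext('webgl'))`; a real player would build the program and buffer once and reuse them.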
Drawing an RGB image is fairly "easy": upload it as an RGB or RGBA texture, and have the fragment shader just call the primitive function to sample the texture at the appropriate point.
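A sketch of the plain RGB case, assuming `gl` is a WebGL 1 context and `pixels` is a `Uint8Array` of tightly packed RGBA bytes. The names `uploadRgbaTexture`, `uTexture`, and `vTexCoord` are hypothetical and chosen only for this example.

```javascript
// Fragment shader: sample the texture at the interpolated coordinate and output it as-is.
const RGB_FRAGMENT_SOURCE = `
precision mediump float;
uniform sampler2D uTexture;
varying vec2 vTexCoord;
void main() {
  gl_FragColor = texture2D(uTexture, vTexCoord);
}`;

// Hypothetical helper: upload one RGBA frame as a 2D texture.
function uploadRgbaTexture(gl, width, height, pixels) {
  if (pixels.length !== width * height * 4) {
    throw new Error('Expected ' + (width * height * 4) + ' bytes, got ' + pixels.length);
  }
  const texture = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, texture);
  // Video frames are rarely power-of-two sized; WebGL 1 then requires
  // clamped wrapping and a filter that does not use mipmaps.
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, width, height, 0,
                gl.RGBA, gl.UNSIGNED_BYTE, pixels);
  return texture;
}
```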
Doing YCbCr conversion at the same time means uploading three textures -- one for each source color plane -- and having the fragment shader sample all three, perform the necessary arithmetic, and output an RGB triple. This math goes much faster on the GPU than on the CPU, in large part because we're able to run a lot of compute units in parallel.
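To make the arithmetic concrete, here is a sketch of such a fragment shader next to a CPU version of the same formula. It assumes BT.601 video-range coefficients and one single-channel texture per plane; the coefficients and names the project actually uses may differ, and `uTextureY`, `vLumaPosition`, `ycbcrToRgb`, and the rest are illustrative only.

```javascript
// GLSL ES 1.0 fragment shader (assumed layout: one single-channel texture per plane).
const YCBCR_FRAGMENT_SOURCE = `
precision mediump float;
uniform sampler2D uTextureY;
uniform sampler2D uTextureCb;
uniform sampler2D uTextureCr;
varying vec2 vLumaPosition;
varying vec2 vChromaPosition;
void main() {
  // Samples arrive scaled to 0..1, so the usual byte offsets are divided by 255.
  float y  = 1.164 * (texture2D(uTextureY, vLumaPosition).r - 16.0 / 255.0);
  float cb = texture2D(uTextureCb, vChromaPosition).r - 128.0 / 255.0;
  float cr = texture2D(uTextureCr, vChromaPosition).r - 128.0 / 255.0;
  gl_FragColor = vec4(
    y + 1.596 * cr,
    y - 0.392 * cb - 0.813 * cr,
    y + 2.017 * cb,
    1.0);
}`;

// CPU reference for the same math on 0..255 byte values (BT.601, video range).
function ycbcrToRgb(y, cb, cr) {
  const luma = 1.164 * (y - 16);
  // Round to the nearest integer, then clamp into the displayable byte range.
  const clamp = (v) => Math.min(255, Math.max(0, Math.round(v)));
  return [
    clamp(luma + 1.596 * (cr - 128)),
    clamp(luma - 0.392 * (cb - 128) - 0.813 * (cr - 128)),
    clamp(luma + 2.017 * (cb - 128)),
  ];
}
```

With these coefficients, video black `ycbcrToRgb(16, 128, 128)` comes out as `[0, 0, 0]` and video white `ycbcrToRgb(235, 128, 128)` as `[255, 255, 255]`. On the CPU this function runs once per pixel in a loop; the shader runs the same few multiply-adds for many pixels at once.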
To minimize copying of data, my initial attempt used 'luminance'-mode textures: taking one byte per pixel from the source buffer and translating that straight to brightness figures. However, IE 11's WebGL implementation originally supported only RGB and RGBA textures, and copying the color data out of packed 1-byte-per-pixel buffers into 4-byte-per-pixel buffers is expensive, especially on IE where copying typed arrays with 'set' or the copy constructor is slow.
The code now uploads the color plane textures straight out of the emscripten heap, labeled as RGBA textures at 1/4 their real width. The fragment shader then extracts the 1-byte pixel values out of the RGBA "channels" by combining the 4-channel sample vector with a sample from a special striped texture that contains a 1.0 value in the correct channel for each 'subpixel'.
Note that IE 11 Update 1 adds luminance-mode and alpha-mode textures, but implements them very inefficiently, so this workaround is still used on IE and Edge for performance reasons as of July 2015.
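The channel-packing trick can be emulated on the CPU to see why it works. The sketch below is an assumption-laden illustration rather than the project's code: it takes "combining" to mean a dot product of the two samples, assumes both textures are sampled with nearest-neighbor filtering, and uses made-up names (`buildStripeRow`, `unpackPixel`, `uPlane`, `uStripe`).

```javascript
// Shader-side idea (GLSL ES 1.0), shown for reference only:
//   vec4 texel  = texture2D(uPlane, vPosition);   // four neighbouring plane bytes
//   vec4 stripe = texture2D(uStripe, vPosition);  // 1.0 in exactly one channel
//   float value = dot(texel, stripe);             // the byte belonging to this pixel

// Build one row of the stripe texture: output pixel x gets 255 in channel (x % 4).
function buildStripeRow(width) {
  const stripe = new Uint8Array(width * 4);
  for (let x = 0; x < width; x++) {
    stripe[x * 4 + (x % 4)] = 255;
  }
  return stripe;
}

// Emulate the shader for output pixel x of a single row.
// planeRow holds the raw 1-byte-per-pixel plane data, read as RGBA texels.
function unpackPixel(planeRow, stripeRow, x) {
  const texelStart = Math.floor(x / 4) * 4; // the RGBA texel covering pixel x
  let value = 0;
  for (let c = 0; c < 4; c++) {
    // Stripe bytes are normalized to 0..1, as texture sampling would do.
    value += planeRow[texelStart + c] * (stripeRow[x * 4 + c] / 255);
  }
  return value;
}
```

Because the stripe sample is zero in three channels and one in the fourth, the sum simply selects one of the four packed bytes, so no integer math or texel-coordinate arithmetic is needed in the shader.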
It's kind of crazy, but it seems to work and doesn't require futzing about with texel coordinates or integer math in the shader. Can probably be made more efficient, but seems to run quite fast on everything I've tested.
Safari on Mac OS X now supports WebGL by default; on some older versions it's available as a developer option, but is not enabled by default. With it on, the WebGL drawing mode works great.
Safari on iOS 8 has WebGL enabled, and it works beautifully.
Roughly the same is doable with Stage3D, Flash's weird cousin of OpenGL ES; I did some early experimentation with it. The same channel-packing trick was needed as for old versions of IE.
Shaders were converted with https://github.com/adobe/glsl2agal
Stage3D didn't perform well, was buggy, and scaled poorly, so I dropped it.