Add non-blocking upload of large textures to the GPU - maybe via tiled uploads with copyTextureToTexture
#28101
Comments
Couldn't you represent your textures as image bitmaps instead? That would solve the blocking issue, since the texture decode no longer blocks the main thread. Another solution would be to transcode your textures into a compressed format. However, AVIF is, like JPEG or PNG, a compression format that isn't retained on the GPU, so you end up with a decode overhead. Is the usage of KTX2 not satisfying for you? KTX2 produces a GPU format on the client side, which should resolve the decode overhead since no decode happens on the CPU side. I would like to understand why you can't use one of these solutions before thinking about alternatives.
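For reference, a minimal sketch of the ImageBitmap route, assuming a hypothetical texture URL and an existing `material`: decoding happens off the main thread, and only the upload itself remains on it.

```js
import * as THREE from 'three';

// Illustrative sketch: ImageBitmapLoader decodes the image off the main
// thread; only the GPU upload (on first render) stays on the main thread.
// 'textures/diffuse_16k.avif' and `material` are placeholders.
const loader = new THREE.ImageBitmapLoader();
loader.setOptions({ imageOrientation: 'flipY' });

loader.load('textures/diffuse_16k.avif', (imageBitmap) => {
  const texture = new THREE.Texture(imageBitmap);
  texture.colorSpace = THREE.SRGBColorSpace;
  texture.needsUpdate = true; // schedules the upload for the next render
  material.map = texture;
  material.needsUpdate = true;
});
```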
As shown in the profiler, the blocking operation seems to be uploadTexture itself - the upload to the GPU performed by the main thread - rather than the image decode. I thought what was blocking was that the GPU could not render a frame during upload, rather than the CPU main thread being occupied, but that was probably a misunderstanding. AVIF is indeed akin to JPEG, PNG or WebP and requires CPU decoding. We just tried with gpu-compressed KTX2 textures, and the uploads still block the main thread. Given that it is the uploadTexture calls that block the main thread, would there be another way for the controls to stay responsive during texture upload, rather than engineering a complex workaround like the one described above (tiling the uploadTexture of large textures into chunks via texSubImage2D/copyTextureToTexture)? On another note, there is an interesting read regarding virtual textures here.
I believe uploadTexture includes both the decode time (for JPG/PNG/WebP/AVIF...) and the upload time. So if you aren't using ImageBitmap yet, it should indeed decrease the time for image formats that need to be decompressed. But since a 16k texture with mipmaps occupies about 1.5 GB of VRAM (16384² × 4 bytes ≈ 1.07 GB, plus ~1/3 for mipmaps), uploading it to the GPU is going to drop frames with or without ImageBitmap. Even with KTX2 (normally I'd expect 4-8x compression in VRAM) it's still a lot. Unfortunately, there's no way to upload textures from another thread; I'm unsure whether that's a WebGL limitation or a lower-level one.
Thanks for these clarifications - that uploadTexture timings include both decoding and uploading, and that textures cannot be uploaded from another thread. Since KTX2 does not require decoding before sending the texture to the GPU, using ImageBitmap should yield no better timings - is that correct? If so, we should either wait for all the textures to be uploaded to the GPU, or find a way to stream these textures to the GPU bit by bit while keeping the main thread and render loop at 30-60 fps. Or use other mechanisms for loading high-resolution textures, like the texture streaming/progressive loading presented in this threejs discussion or this babylon discussion, for example. Thanks again for your help!
Yes - KTX2 requires a transcoding step to a GPU-compatible compressed format, but that transcoding happens off the main thread (in WASM) before upload. ImageBitmap should improve results compared to using PNG/JPG/WebP/AVIF without ImageBitmap, but the amount of data uploaded to the GPU is still 4-8x larger than with KTX2, and I'd expect the upload time to be about 4-8x worse accordingly. If the upload time for the 16K KTX2 texture is already unacceptable for your application, ImageBitmap will be worse. I'm less familiar with streaming texture upload, or what's required to support it... but I suspect that's the only way to upload uncompressed formats with less total blocking time than ImageBitmap offers, or to reduce the (already lower) blocking time with KTX2.
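For completeness, a sketch of the KTX2 path described above; the transcoder path and file name are placeholders, and `renderer` and `material` are assumed to exist.

```js
import { KTX2Loader } from 'three/addons/loaders/KTX2Loader.js';

const ktx2Loader = new KTX2Loader()
  .setTranscoderPath('basis/') // placeholder path to the Basis Universal transcoder
  .detectSupport(renderer);    // picks a GPU compressed format supported by this device

ktx2Loader.load('textures/diffuse_16k.ktx2', (texture) => {
  // Transcoding ran in a worker (WASM); only the upload touches the main thread.
  material.map = texture;
  material.needsUpdate = true;
});
```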
The babylonjs thread discusses progressive upload of entire mipmap levels. That would get a blurry version of the texture ready to render very quickly. But the largest mipmap (level 0) still represents about 1.1 GB of data for an uncompressed 16k texture, so there's still a large chunk of blocking time before that full-res mipmap becomes available.
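As an illustration of that idea, here is a rough raw-WebGL2 sketch of mip-level streaming - this is not a three.js API, and `gl` plus the per-level `getLevelData(level)` helper are assumptions: allocate immutable storage up front, upload coarse mips first, and move TEXTURE_BASE_LEVEL down as levels arrive.

```js
// Rough sketch, raw WebGL2. Assumes `gl` is a WebGL2RenderingContext and
// getLevelData(level) asynchronously provides the RGBA pixels of one mip
// level (e.g. decoded in a worker).
const size = 16384;
const levels = Math.log2(size) + 1; // 15 levels for 16k

const tex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, tex);
gl.texStorage2D(gl.TEXTURE_2D, levels, gl.RGBA8, size, size); // immutable storage

async function streamMips() {
  for (let level = levels - 1; level >= 0; level--) {
    const dim = Math.max(1, size >> level);
    const data = await getLevelData(level); // assumption: Uint8Array of dim*dim*4
    gl.bindTexture(gl.TEXTURE_2D, tex);
    gl.texSubImage2D(gl.TEXTURE_2D, level, 0, 0, dim, dim, gl.RGBA, gl.UNSIGNED_BYTE, data);
    // The texture is complete from `level` upward, so it is already
    // renderable (blurry) before level 0 has been uploaded:
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_BASE_LEVEL, level);
  }
}
```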
@donmccurdy That's exactly what I do 😊. @jo-chemla This runs absolutely smoothly; my camera controls were just a bit stiff, and I've smoothed that out now. I load the many tiles in workers (multithreading). I now also do this with the normal map, and I can use it to display huge textures. I'm always happy about performance improvements, but loading textures dynamically without blocking the main thread works very well for me. If there had been larger textures for the spaceship, I would have used them. My site doesn't look particularly impressive, I apologize for that. Three.js can only offer what the WebGPU standard enables, but it does that very efficiently by directly using the W3C commands. The new node-based system is very impressive. I use a tiled upload technique, but it is technically quite involved. Here is the background about implementing copyTextureToTexture - that's what I wished for in r163:
What loading do you mean? The texture upload to the GPU cannot be done from a worker. Hoping to understand how your solution is implemented. |
@donmccurdy |
@jo-chemla |
It seems the project already exposes the tools required for implementing a more fine-grained texture upload. For now, let's leave it as the responsibility of the application to use these tools according to its specific requirements.
Thanks all for your feedback. @Spiri0 Thanks for the link to your demo app, that's interesting. Yes, I'm not looking for improved performance, since the data has to be uploaded to the GPU anyway - and if this has to happen on the main thread, then the only way I can think of to avoid freezes is to upload in chunks/tiles, a bit at each frame update/render. In terms of data volume, 100 tiles of 256² are equivalent to a single 2k texture upload, so roughly 64x less data than a single 16k texture. And indeed @Mugen87, this can be left as the responsibility of the application to implement, although making that easier via core or an official example would be welcome.
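A minimal sketch of what using those already-exposed tools could look like. Caveats: the `copyTextureToTexture(position, src, dst)` signature shown matches r16x-era WebGLRenderer and has changed in later releases, and the tile pixel data is assumed to come from a decoding worker.

```js
import * as THREE from 'three';

// Allocate the full-size destination texture up front, without data.
const SIZE = 16384, TILE = 256;
const dst = new THREE.DataTexture(null, SIZE, SIZE, THREE.RGBAFormat);
dst.generateMipmaps = false;
dst.minFilter = THREE.LinearFilter;
renderer.initTexture(dst); // creates the GPU-side storage immediately

// Blit one decoded tile into the destination. `pixels` is assumed to be a
// Uint8Array of TILE*TILE*4 values produced by a worker.
function uploadTile(pixels, x, y) {
  const src = new THREE.DataTexture(pixels, TILE, TILE, THREE.RGBAFormat);
  src.needsUpdate = true;
  // Note: this (position, src, dst) signature changed in later releases.
  renderer.copyTextureToTexture(new THREE.Vector2(x, y), src, dst);
  src.dispose();
}
```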
Description
We are developing a threejs application that will be deployed on very controlled hardware, which allows us to use high-resolution meshes and textures. The final scene has around 1M triangles (not that large) and a few 16k textures (which do take a few seconds to load). We first load a small proxy mesh as LOD0, which should keep the controls responsive while the heavy geometry and textures load.
However, as described in this non-blocking assets loaders thread, the primary thing blocking the main thread is uploading textures to the GPU - on the following screenshot, 8 s to load the 7x 16k and 8k textures. During these uploadTexture calls, the scene and controls are unresponsive, since the GPU cannot process frame updates and renders during texture uploads.

Solution
The above thread led to the TaskManager discussion, which yielded the WorkerPool.js implementation mid-2022.
Seeing the recent copyTextureToTexture WebGPURenderer support PR and this small stackoverflow thread, I was wondering whether the worker handling texture loading and upload to the GPU could be parametrized to do it in chunks/tiles, so that the uploadTexture step becomes non-blocking. As imagined in the SO post, we could use texSubImage2D/copyTextureToTexture to upload small chunks that fit in a 16 ms time window (if aiming for 60 fps). This way, the update/render loops would stay responsive, and the texture would load incrementally until completely uploaded to the GPU. This could be parametrized, for example, by tile count or resolution (NxM tiles or XxY pixels wide); a sketch of such a time-budgeted loop follows.
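Under stated assumptions - `tileQueue` is filled by a decoding worker, and `uploadTile` is a helper like the one sketched in the comments above - the render loop could look like this:

```js
// Hypothetical time-budgeted upload: spend at most FRAME_BUDGET_MS per frame
// on tile uploads, then render, so controls stay responsive throughout.
const FRAME_BUDGET_MS = 8; // leave the rest of a 16 ms frame for rendering

function animate() {
  requestAnimationFrame(animate);

  const deadline = performance.now() + FRAME_BUDGET_MS;
  while (tileQueue.length > 0 && performance.now() < deadline) {
    const { pixels, x, y } = tileQueue.shift(); // assumption: filled by a worker
    uploadTile(pixels, x, y);
  }

  controls.update();
  renderer.render(scene, camera);
}
requestAnimationFrame(animate);
```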
Alternatives
At the moment, we are using glb/gltf meshes with Draco mesh compression and AVIF texture compression (lower bandwidth impact). Switching to KTX2 texture compression yields textures with ~10x lower gpuSize according to gltf-transform inspect (although ~2-3x larger on disk), which should proportionally reduce the uploadTexture duration.
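Since glTF Transform also exposes a JS API, the same report can be read programmatically - a small sketch, where 'scene.glb' is a placeholder and Draco- or KTX2-compressed assets would additionally need extensions/decoders registered on the NodeIO:

```js
// Sketch reading the same stats that `gltf-transform inspect` prints on the
// CLI. Note: files using Draco or KTX2 require registering extensions and
// decoder dependencies on the NodeIO instance first.
import { NodeIO } from '@gltf-transform/core';
import { inspect } from '@gltf-transform/functions';

const io = new NodeIO();
const document = await io.read('scene.glb');

const report = inspect(document);
for (const texture of report.textures.properties) {
  // gpuSize is the estimated VRAM footprint once uploaded, in bytes.
  console.log(texture.name, texture.gpuSize);
}
```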
Additional context
No response