Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KHR_texture_basisu: Basis Universal for material textures #1751

Merged
merged 9 commits into from
Oct 1, 2020
Merged

Conversation

lexaknyazev
Copy link
Member

@lexaknyazev lexaknyazev commented Jan 30, 2020

Following up from #1612.

Now, KHR_texture_basisu extension is self-contained and should be ready for implementation.

Implementation note: Basis Universal supercompression is optimized for correlated color data, like natural images. It does not perform well on images with independent channels, like normal or some material maps. See the discussion in #1682 for more details.

Open questions before ratification

  • Can alpha values be premultiplied? (It's possible to set this flag in KTX2). Not premultiplied, per core spec.
  • How to handle normal maps? UASTC for best results now, possible future work out of scope.
  • How to handle roughness-metallic data that is expected to use green and blue channels? Out of scope.
  • Does KHR_texture_basisu allow zstd, zlib, or rely on file-level compression? Allow zstd, but not required.
  • Texture size restrictions? Require 4x4, suggest power of 2.

Ecosystem action items

See #1771.

@donmccurdy
Copy link
Contributor

Can alpha values be premultiplied? (It's possible to set this flag in KTX2).

/cc #1457

How to handle roughness-metallic data that is expected to use green and blue channels?

The only resource I've found was this one, claiming that ORM packed as RGB is plausible with DXT especially given that the metallic channel doesn't require much precision. But realistically, confirming that will require some investigation by an artist.

@lexaknyazev
Copy link
Member Author

ORM packed as RGB is plausible with DXT

Color endpoints have more degrees of freedom in DXT than in ETC1S. Even a simple case of e.g. using only metallic while keeping roughness on a single value may cause severe artifacts with ETC1S. Typical advice is to do 2x upscale before compression. It helps sometimes.

@donmccurdy donmccurdy changed the title Basis Universal for material textures KHR_texture_basisu: Basis Universal for material textures Feb 26, 2020
@polarathene
Copy link

How to handle normal maps

Unless there is comparable compressed size, some might be ok with the lower quality. Two component/channel at least seems worthwhile, and is what basisu officially advises on their github README when dealing with normal maps.

Since the last activity on this discussion, basisu also has released UASTC support trading off compression for better quality textures, which is meant to be similar to BC7.

Allowing for flexibility to deliver lower quality textures for lower bandwidth can be useful if quality doesn't take too big of a dive.

Even a simple case of e.g. using only metallic while keeping roughness on a single value may cause severe artifacts with ETC1S.

Could that be worked around the same as is advised for normalmaps with the separation to RGB and A slices option basisu offers? KTX swizzling can then provide the data to the GB format that glTF expects?

Typical advice is to do 2x upscale before compression. It helps sometimes.

On mobile devices, having a texture that is too large(unsupported by the device / ES version) can result in no texture rendered at all(ends up rendering as black iirc). Upscaling before compression would trigger that same issue at runtime?(I think in my experience it was 8k or 16k textures in sketchfab) Or does KTX/Basis have a way to avoid it?(downscales or uses mipmaps instead of failing?)

@donmccurdy
Copy link
Contributor

Whether UASTC and Basis Universal should be bundled into the same glTF extension, or not, is an important question. Some things we know:

  1. Basis Universal textures are smaller files. Decreasing the quality settings of a UASTC file does not match Basis Universal in terms of compactness.
  2. UASTC is higher quality. Increasing quality settings of Basis Universal encoding will not match UASTC in terms of quality.
  3. Given (1) and (2), there are clear benefits with both formats. However, supporting both requires larger transcoders and more client implementation complexity.

Some things we don't know:

  • How large are the transcoders for each? Does it make sense to create a combined transcoder, and if so how large would it be?
  • Early estimates on ZSTD decoder put it at ~890kb, although @MarkCallow has suggested that could likely be reduced. That decoder will be necessary for client implementations of UASTC, in addition to the transcoder.
  • Will additional universal texture formats appear over the upcoming years, and when?

My opinion on this — which I'll readily admit is skewed toward support on the Web — is that Basis Universal and UASTC should be supported by separate glTF extensions, and that we should focus on getting KHR_texture_basisu completed first. I believe that web developers will prioritize transmission size over GPU memory footprint, and that Basis Universal strikes a balance that will make it more common on the web. For cases where the quality of Basis Universal is insufficient, I don't want to predict yet when developers will choose UASTC (lower GPU memory cost & upload) vs. uncompressed PNG/JPG (lower transmission size). I'd prefer to allow time for that to become clear, rather than trying to foresee it.

Thoughts?

@zeux
Copy link
Contributor

zeux commented Apr 8, 2020

How large are the transcoders for each? Does it make sense to create a combined transcoder, and if so how large would it be?

Combined etc1s + uastc transcoder including support for BC7 et al is ~230 KB post-deflate; ETC1S only transcoder is ~180 KB post-deflate. I haven't looked closely into the size impact inside ETC1S and how much code is shared (e.g. whether a separate UASTC transcoder is ~50KB or more like ~150 KB).

Early estimates on ZSTD decoder put it at ~890kb, although @MarkCallow has suggested that could likely be reduced. That decoder will be necessary for client implementations of UASTC, in addition to the transcoder.

I really feel like this is a problem in KTX2 that should be addressed. ZSTD is stronger than deflate but it's not so much stronger that it should be required in contexts where deflate is already available. Previously the KTX2 spec suggested that only Basis support was required, but this has changed and while I understand the reasoning somewhat, it doesn't seem right to me.

@MarkCallow
Copy link
Collaborator

It is not at all clear the PNG will be smaller than UASTC/zstd. More likely it will be larger. In one experiment I did I took a PNG file, inflated it then deflated it with zstd. The result was smaller than the PNG file, despite lacking any of the scanline adjustments PNG does to improve LZ efficiency. UASTC files will certainly be smaller than the inflated PNG equivalent, giving a better starting point for zstd.

@zeux
Copy link
Contributor

zeux commented Apr 8, 2020

I really feel like this is a problem in KTX2 that should be addressed

Having said this I tried to quickly build zstd's single-file decoder a few weeks ago (available in contrib repo) and the resulting wasm binary was much smaller than 900KB. I didn't try running the resulting file so I'm now doubting my results - how was that estimate obtained?

@MarkCallow
Copy link
Collaborator

MarkCallow commented Apr 8, 2020

Early estimates on ZSTD decoder put it at ~890kb

What I said was that the size of libzstd.a (release version) is ~890kb. This will have the decoder and encoder and symbols for linking. I built with the default options except for omitting older versions. I have not yet made any attempt to minimize it.

I really feel like this is a problem in KTX2 that should be addressed.

Rich is of the opinion that Zstd will provide much better results with UASTC than Deflate. We don't want to support a large number of supercompression options so we chose zstd. We absolutely need some supercompression with UASTC.

@zeux
Copy link
Contributor

zeux commented Apr 8, 2020

This might be relevant, this is using single-slice ETC1S for all textures so naturally the quality of ETC1S is inferior. These are the results:

jrat: 12 2kx2k textures (64M pixels, including mipmaps)
etc1s: encode 67s, .glb 4.9 MB, .glb.gz 4.8 MB, .glb.zst 4.8 MB
uastc: encode 54s, .glb 67.5 MB, .glb.gz 37.0 MB, .glb.zst 33.5 MB
uastc rdo: encode 634s, .glk 67.5 MB, .glb.gz 26.6 MB, .glb.zst 24.1 MB

So maybe zstd is fine to include. The decoder I've built is 25 KB WASM (10 KB post-deflate), so it's insignificant compared to the gains.

edit Forgot to mention, the source data for this model is PNG (which doesn't contain mipmaps), the glb sizes for the asset with PNG data are ~32.2 MB regardless of the compression type. So UASTC pre-RDO is slightly larger than PNG here (but includes mipmap data), and RDO is noticeably smaller.

@lexaknyazev
Copy link
Member Author

in contexts where deflate is already available

I like this approach for UASTC. On the web, HTTP layer can provide gzip/deflate without any JS intervention. KTX2 header would indicate no supercompression in such a case. Generic clients should still support zstd though.

@MarkCallow
Copy link
Collaborator

On the web, HTTP layer can provide gzip/deflate without any JS intervention. KTX2 header would indicate no supercompression in such a case. Generic clients should still support zstd though.

This will definitely be an option for app developers. I plan to make selection of zstd compression a separate option from selection of UASTC in toktx. Before possibly mandating this approach for glTF extension(s) though we need to gain experience with UASTC/zstd vs UASTC/Deflate.

@donmccurdy
Copy link
Contributor

@zeux following up on #1751 (comment), is there a zstd decoder you would recommend for the web (js or wasm)?

@zeux
Copy link
Contributor

zeux commented Jul 2, 2020

@donmccurdy I believe this was just building the single-file decoder from https://github.com/facebook/zstd/tree/dev/contrib/single_file_libs with Emscripten to wasm using -Os and standalone .wasm mode

@MarkCallow
Copy link
Collaborator

The "single-file decoder" link is actually a script that combines the sources into a single file. I've already done that and use it in the web build of libktx. You can find it in lib/zstddeclib.c in the KTX-Software repo. On other builds we try to use the platform's libzstd.

This could easily be included in the web builds for msc_basis_transcoder. The missing piece for a JS KTX parser is a JS wrapper so the JS can call it. @zeux what is standalone .wasm mode?

@zeux
Copy link
Contributor

zeux commented Jul 2, 2020

@MarkCallow https://github.com/emscripten-core/emscripten/wiki/WebAssembly-Standalone - this effectively means not using any Emscripten-generated JS code. It requires manually implementing a JS interface though; I mentioned this in context of size comparisons, meaning that JS code used for the JS->WASM interface may have additional cost that I wasn't counting.

@donmccurdy
Copy link
Contributor

To do a full rescale you'd need to be able to decode UASTC & ETC1S into RGBA, rescale the RGBA image to fit to power of two, and reencode the resulting RGBA image to BCn / ETCn.

At least for three.js, I don't think we'd want that behavior in a transcoder. Would prefer to decode to RGBA, perhaps at lower resolution, and use that directly on these devices without re-encoding.

@lexaknyazev
Copy link
Member Author

To sum up the today's call, it seems like we've got two options here:

  • Require power-of-two dimensions for maximum compatibility.
  • Require multiples of 4 and use slow path on WebGL 1.0 (ES 2.0) clients: decode to uncompressed, rescale, regenerate mipmaps, upload as uncompressed.

@donmccurdy
Copy link
Contributor

donmccurdy commented Aug 29, 2020

To restate the reasons for each option —

(1) requiring power-of-two dimensions avoids the slow path on WebGL 1.0 (ES 2.0)
(2) requiring multiples-of-four avoids the slow path on D3D hardware

Were there any other hardware cases we are concerned about? I see PVRTC mentioned a few times above — do one or both of those restrictions also apply to any hardware where PVRTC is the only transcode target?

To hopefully strike a balance of supporting modern hardware and not overly restricting future options, I'd be inclined to require (2), and provide an implementation note — not a requirement — about (1).

@MarkCallow
Copy link
Collaborator

do one or both of those restrictions also apply to any hardware where PVRTC is the only transcode target?

No. 1 but even Imagination considers PVRTC to be deprecated.

@donmccurdy
Copy link
Contributor

Proposals from today's call:

  1. Allow zstd (not zlib) as the embedded compression option with UASTC, but do not require its use.
  2. Require multiple-of-four dimensions, but not power-of-two. Requiring power-of-two would impose size/quality tradeoffs on modern hardware. Leaving it optional means less capable devices on WebGL 1.0 will sometimes have to fully decode the texture, but this can be avoided when necessary by opting into the power-of-two dimensions.

@lexaknyazev
Copy link
Member Author

lexaknyazev commented Sep 4, 2020

Lifting the power-of-two restriction leads to an interesting question about the required number of mip levels.

GLES 2.0 recap:

  • npot textures do not support REPEAT or MIRRORED_REPEAT wrapping modes (Section 3.8.2);
  • mipmapping requires power-of-two dimensions and a full mip pyramid (Sections 3.7.10 and 3.8.2);

This means that there's no reason to require a full mip pyramid for npot textures.

DCC would have two export options:

  • Default (ES 3.0+, WebGL 2.0)
    • any non-zero dimensions, any number of mip levels
  • Legacy
    • power-of-two textures with all mip levels present

Engines running on ES 2.0 / WebGL 1.0 would need to fall back to uncompressed textures in some cases.

@donmccurdy
Copy link
Contributor

This means that there's no reason to require a full mip pyramid for npot textures.

Or WebGL 1.0 doesn't provide a reason, anyway... Are there other reasons to consider? In ES3.0+ and WebGL 2.0, we still can't generate the mip pyramid at runtime, so if engines need a full pyramid (even though their hardware does not) that may be an argument for requiring the full pyramid. At least for relevant minFilter/magFilter states.

@zeux
Copy link
Contributor

zeux commented Sep 9, 2020

Are there other reasons to consider?

In general the full pyramid should be strongly recommended for encoders; however, there are some fringe use cases where it's important to drop a few smallest mips (at a possible cost of some aliasing) to prevent blending from occurring. For example, in cases where multiple images are combined into an atlas, it may be important to remove all mip levels where different images are represented by the same pixel (or, alternatively, same block). Similarly, when packing lightmaps, to avoid bleeding you would typically need to include a 2^k-px border for k mips, so there's some tradeoff between aliasing vs wasted space in the texture.

So overall as long as ES2-class hardware isn't critical to support, it seems best to recommend the use of a full mipchain pyramid, but allow partial pyramids with an understanding that decoders will not generate one (short of ES2 where partial mipchain would be one of the reasons to fall back into slower RGBA decompression mode)

@lexaknyazev lexaknyazev marked this pull request as ready for review September 16, 2020 17:03
Copy link
Contributor

@bghgary bghgary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments. Looks good otherwise.

extensions/2.0/Khronos/KHR_texture_basisu/README.md Outdated Show resolved Hide resolved
Comment on lines +18 to +20
This extension adds the ability to specify textures using KTX v2 images with Basis Universal supercompression. An implementation of this extension can use such images as an alternative to the PNG or JPEG images available in glTF 2.0 for more efficient asset transmission and reducing GPU memory footprint. Furthermore, specifying mip map levels is possible.

When the extension is used, it's allowed to use value `image/ktx2` for the `mimeType` property of images that are referenced by the `source` property of `KHR_texture_basisu` texture extension object.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be useful to add links for Basis Universal and image/ktx2 in case people don't know what they are?

For Basis Universal, maybe the GitHub page?
For image/ktx2, maybe this link?

extensions/2.0/Khronos/KHR_texture_basisu/README.md Outdated Show resolved Hide resolved

## Resources

[KTX File Format Specification, version 2](https://github.khronos.org/KTX-Specification/)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we allowing (and normatively referencing) KTX 2.0, or KTX 2.x? The page uses "KTX v2" throughout, so if 2.0 is the intended meaning, we may want to be more specific.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be 2.x so just "v2" is fine. The detailed requirements that have just been added are enough to avoid any issues from new supercompression types or universal formats that may be added in a 2.X.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In contexts referring to 2.0, do we plan to write KTX v2.0 or KTX 2.0? We should try to be consistent here with KTX v2 or KTX 2.

Maybe it's just me, but "v2" feels ambiguous in a way that 2.x would not... it might be worth specifying? But either way.

extensions/2.0/Khronos/KHR_texture_basisu/README.md Outdated Show resolved Hide resolved
@emackey
Copy link
Member

emackey commented Oct 1, 2020

I took a stab at adding the "Contributors" section. Please let me know if I left anyone out, folks can always be added there without resetting the clock on ratification.

I'm merging this since the vote passed and it's entering ratification review.

@emackey emackey merged commit 08a156f into master Oct 1, 2020
@emackey emackey deleted the basisu branch October 1, 2020 18:46
@aaronfranke aaronfranke mentioned this pull request Jul 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants