Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle non-mandatory Empty Elements with default values correctly #72

Open
mbunkus opened this issue Feb 1, 2021 · 8 comments
Open
Labels

Comments

@mbunkus
Copy link
Contributor

mbunkus commented Feb 1, 2021

In section "6.1. Data Size Format" the EBML RFC states:

If an Empty Element has a default value declared, then the EBML Reader MUST interpret the value of the Empty Element as the default value.

At the moment this simply isn't done. For example, the EbmlUInteger::ReadData function always uses 0 if it's an Empty Element. This works for elements whose default value is 0, of course, but not for others.

BTW: the EbmlUInteger::RenderData function doesn't take default values into account either. To be honest, I prefer it that way as it will be less confusing to not fully spec-compliant EBML readers (such as libebml in its current state).

@mbunkus mbunkus added the bug label Feb 1, 2021
@mbunkus
Copy link
Contributor Author

mbunkus commented Feb 1, 2021

EbmlUInteger was just an example. All the other data type classes don't handle that case either.

@robUx4
Copy link
Contributor

robUx4 commented Feb 3, 2021

I can confirm this bug.
Luckily it is fixable without changing the ABI. There is DefaultISset() to tell if an element has a default value. And when the size is zero we should use that default value DefaultValue instead of 0/"".

It seems libebml2 (used in mkclean and mkvalidator) has the same issue with a twist, it's possible to set elements to their default value when they are created. Except it's not used...

@robUx4
Copy link
Contributor

robUx4 commented Feb 3, 2021

Matroska elements affected by this bug:

  • \Segment\Tracks\TrackEntry\Language
  • \Segment\Tracks\TrackEntry\Video\Colour\MatrixCoefficients
  • \Segment\Tracks\TrackEntry\Video\Colour\TransferCharacteristics
  • \Segment\Tracks\TrackEntry\Video\Colour\Primaries
  • \Segment\Cues\CuePoint\CueTrackPositions\CueBlockNumber
  • \Segment\Cues\CuePoint\CueTrackPositions\CueReference\CueRefNumber
  • \Segment\Tags\Tag\Targets\TargetTypeValue

Elements with a default value of 0 are not affected.

@robUx4
Copy link
Contributor

robUx4 commented Feb 7, 2021

BTW, are there files existing with a Length of 0 (to save space) ? Does mkvmerge use this feature for example ? It doesn't seem possible with libebml, even though the EbmlElement::SizeLength variable is initialized to 0. That variable is used exclusively with CodedSizeLength(), even in libmatroska.

In that function it's used as

  if (SizeLength > 0 && CodedSize < SizeLength) {
    // defined size
    CodedSize = SizeLength;
  }

Where CodedSize is always 1 and 5. So a EbmlElement::SizeLength value of 0 has no effect here. And the function never returns 0.

@mbunkus
Copy link
Contributor Author

mbunkus commented Feb 7, 2021

mkvmerge does not use that feature. I don't know if there's any implementation that does. I'm not even sure if the original specs even say what value to use in this case. My guess is that all existing implementations simply return 0 for numeric types & an empty string for string types. Just like libebml.

I haven't seen a single file using this shortcut for elements-with-default, at least not consciously. I have very vague memory of seeing files which wanted to store 0 / the empty string, though I really don't have any more details on that.

@robUx4
Copy link
Contributor

robUx4 commented Feb 7, 2021

Calling out @dericed who has a huge corpus of MKV files he can check.
@mkver do you know if libavformat uses the "zero length" feature when muxing ?

Our website never explained how it's supposed to be used, nor the dedicated libebml/EBML website.

@mkver
Copy link

mkver commented Feb 7, 2021

  1. The original specs disallow writing integers and floats on zero bytes:

The known basic types are:

  • Signed Integer - Big-endian, any size from 1 to 8 octets
  • Unsigned Integer - Big-endian, any size from 1 to 8 octets
  • Float - Big-endian, defined for 4 and 8 octets (32, 64 bits)

This is good as it means that this new feature can't break old valid files (at least for integers and floats).
2. libavformat doesn't write floats or integers on zero bytes. But it might write a zero-length element for a string/utf-8 string with strlen == 0 or for a binary element of size zero. Our demuxer btw also treats zero-length integers as having the value zero. I intend to work on this in the coming days.
3. I might be the reason for mbunkus' vague memory: I wanted to (and I still want to) create attachments with an empty (i.e. strlen == 0) filename if the source does not provide a filename because otherwise remuxing attached pictures from containers that don't have the concept of a filename for attached pictures fails. The relevant patchset is 95% finished, but for about a year I am too lazy/busy to finish it.

@robUx4
Copy link
Contributor

robUx4 commented Feb 14, 2021

So that should leave integers out of this issue. We can assume they were never written with a 0 length. It may not be the case for all muxers but they're probably not read correctly with well known demuxers.

That leaves \Segment\Tracks\TrackEntry\Language as the only element that can potentially have issues. I know in VLC we assign the default value to the variable we read it from. If the size is 0 that value wouldn't change.
In libavformat it seems to be similar. The default value is first assigned and then the actual data read.
This element being such an important element I can hardly imagine a demuxer would fail to use its default value properly when reading, especially since in many cases the value is not written. And it's highly unlikely a length of 0 would be used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants