Fix handling of broken UTF-8 data #180

Closed
robUx4 opened this issue Dec 23, 2023 · 2 comments · Fixed by #184
Labels
api-break (breaks the API, e.g. programs using it will have to adjust their source code), bug

Comments

Contributor

robUx4 commented Dec 23, 2023

Right now UTFstring neither checks nor records whether its data is valid. It silently discards the invalid_code_point and invalid_utf8 exceptions.

When reading, that means the string may not be the one that was expected, or the file is broken. We should be able to tell whether the data read/converted can be trusted.

When writing, that means the given Unicode string cannot be converted to UTF-8 for some reason, so we won't be able to store it accurately in EBML. We should at least be told so when calling UTFstring::SetUTF8().
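
A minimal sketch of the kind of check this would need, assuming a strict validator runs before the data is accepted. `is_valid_utf8` and `CheckedUtf8` are hypothetical names for illustration, not libebml API; the only point is that the setter reports failure instead of hiding it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: true only if `bytes` is well-formed UTF-8
// (no truncated, overlong, surrogate, or out-of-range sequences).
bool is_valid_utf8(std::string_view bytes) {
    // Smallest code point that legitimately needs a sequence of each length.
    static constexpr char32_t kMinCp[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80)              { len = 1; cp = lead; }
        else if ((lead >> 5) == 0x06) { len = 2; cp = lead & 0x1F; }
        else if ((lead >> 4) == 0x0E) { len = 3; cp = lead & 0x0F; }
        else if ((lead >> 3) == 0x1E) { len = 4; cp = lead & 0x07; }
        else return false;                         // stray continuation or bad lead
        assert(len >= 1 && len <= 4);
        if (bytes.size() - i < len) return false;  // sequence runs past the end
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (len > 1 && cp < kMinCp[len]) return false;   // overlong encoding
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        i += len;
    }
    return true;
}

// Hypothetical string wrapper: keeps the bytes as given and remembers
// whether they were valid, so callers can decide whether to trust them.
struct CheckedUtf8 {
    std::string bytes;
    bool valid = true;

    bool Set(std::string_view in) {
        bytes.assign(in.begin(), in.end());
        valid = is_valid_utf8(in);
        return valid;
    }
};
```

With something along these lines, a caller could test the return value of the setter when writing, and query the stored flag after reading.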

robUx4 added the bug and api-break labels Dec 23, 2023
Contributor Author

robUx4 commented Dec 23, 2023

For example, these lines crash libebml with an unhandled exception; the utf8 library throws not_enough_room:

    UTFstring utf8;
    utf8.SetUTF8( "\x1\xF6\x00" );
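
One way to read this crash, as a sketch under assumptions: if the literal reaches SetUTF8() as a std::string built from a `const char*`, the trailing `\x00` ends it, leaving the two bytes 0x01 0xF6. Judged by its high bits, 0xF6 announces a 4-byte sequence, but nothing follows it, which matches a "not enough room" condition. The helper names below are hypothetical and only mirror the usual lead-byte classification; they are not the utf8 library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of bytes a UTF-8 lead byte announces, judged by its high bits only.
// Returns 0 for a continuation byte or an impossible lead byte.
std::size_t announced_length(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead >> 5) == 0x06) return 2;
    if ((lead >> 4) == 0x0E) return 3;
    if ((lead >> 3) == 0x1E) return 4;
    return 0;
}

// True if the sequence starting at `pos` announces more bytes than remain.
bool is_truncated_at(const std::string& s, std::size_t pos) {
    assert(pos < s.size());
    const std::size_t len = announced_length(static_cast<unsigned char>(s[pos]));
    return len != 0 && s.size() - pos < len;
}
```

Checking for this condition before decoding, or catching the exception at the SetUTF8() boundary, would turn the crash into a reportable error.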

Contributor Author

robUx4 commented Dec 23, 2023

Also, the internal storage should be UTF-8, so that Unicode EBML elements are not interpreted when they are copied: bogus data in, bogus data out, with the same size.
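
A minimal sketch of that idea, assuming the element keeps the raw bytes it was given and never round-trips them through a decoder. `RawUtf8` and `copy_element` are hypothetical names for illustration, not libebml's UTFstring.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical storage type: holds the bytes exactly as read from the file.
class RawUtf8 {
public:
    RawUtf8() = default;
    explicit RawUtf8(std::string bytes) : bytes_(std::move(bytes)) {}

    const std::string& bytes() const { return bytes_; }
    std::size_t size() const { return bytes_.size(); }

private:
    std::string bytes_;  // possibly malformed UTF-8, kept untouched
};

// Copies an element byte for byte; nothing is decoded or re-encoded,
// so malformed input comes out unchanged and with the same size.
RawUtf8 copy_element(const RawUtf8& src) {
    RawUtf8 dst(src.bytes());
    assert(dst.size() == src.size());
    return dst;
}
```

Conversion to a wide string would then happen only on demand, which is also where a validity error could be reported.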

robUx4 added a commit to robUx4/libebml that referenced this issue Dec 26, 2023
If the string buffer comes from an EBML file we should be able to copy the same
buffer without interpreting it, even if it's bogus. It may be intentional.

Fixes Matroska-Org#180
robUx4 added a commit that referenced this issue Dec 27, 2023
If the string buffer comes from an EBML file we should be able to copy the same
buffer without interpreting it, even if it's bogus. It may be intentional.

Fixes #180