flb_utils_write_str: omit unicode with malformed trailing bytes #7

Open
wants to merge 1 commit into
base: 1.8
from

Conversation

matthewfala
Owner

Previously, with Unicode byte sequences such as

0xef 0xbf 0x00 ...

Fluent Bit would blindly trust the leading byte 0xef to describe
how many trailing bytes to copy, without validating them.
If a trailing byte was invalid, such as 0x00 (the null character),
the utility still copied it into the escaped string.

This commit adds checks that each trailing byte is UTF-8 compliant.
If any trailing byte is invalid, the character is omitted from the escaped string.
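
The check described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: `utf8_expected_len`, `utf8_is_trailing`, and `copy_valid_utf8` are hypothetical names, the sketch only inspects bit patterns (it does not reject overlong encodings or surrogates), and it copies bytes verbatim rather than performing the escaping that `flb_utils_write_str` does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sequence length implied by a UTF-8 leading byte,
 * or 0 if the byte cannot start a sequence. */
static size_t utf8_expected_len(uint8_t lead)
{
    if (lead < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx, e.g. 0xef */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                             /* stray trailing byte or invalid lead */
}

/* A trailing byte must have the form 10xxxxxx. */
static int utf8_is_trailing(uint8_t b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical sketch: copy src to dst, omitting any bytes that do not form
 * a complete sequence with valid trailing bytes.
 * dst must have room for at least len bytes. Returns bytes written. */
static size_t copy_valid_utf8(const uint8_t *src, size_t len, uint8_t *dst)
{
    size_t i = 0, out = 0, k;

    while (i < len) {
        size_t n = utf8_expected_len(src[i]);
        int ok = (n != 0 && i + n <= len);

        /* Do not trust the leading byte: verify every trailing byte. */
        for (k = 1; ok && k < n; k++) {
            if (!utf8_is_trailing(src[i + k])) {
                ok = 0;
            }
        }

        if (!ok) {
            i++;        /* drop this byte and rescan from the next one */
            continue;
        }

        for (k = 0; k < n; k++) {
            dst[out++] = src[i + k];
        }
        i += n;
    }
    return out;
}
```

With the input `'h' 0xef 0xbf 'i'`, the sketch drops `0xef 0xbf` because the byte that should be the second trailing byte is not of the form `10xxxxxx`, and only `h` and `i` reach the output.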

Signed-off-by: Matthew Fala [email protected]


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

Documentation

  • Documentation required for this feature

Fluent Bit is licensed under Apache 2.0. By submitting this pull request, I understand that this code will be released under the terms of that license.

@matthewfala matthewfala force-pushed the cloudwatch-serialization-exception branch 5 times, most recently from 0634391 to 184a49b Compare November 12, 2021 02:08
@matthewfala matthewfala force-pushed the cloudwatch-serialization-exception branch 2 times, most recently from f097f40 to 6e0fe88 Compare November 17, 2021 22:49
Previously, with Unicode byte sequences such as

   0xef 0xbf 0x00 ...

Fluent Bit would blindly trust the leading byte 0xef to describe
how many trailing bytes to copy, without validating them.
If a trailing byte was invalid, such as 0x00 (the null character),
the utility still copied it into the escaped string.

This commit adds checks for leading and trailing byte UTF-8 compliance.
If a character is ill-formed, each of its bytes is mapped individually
to the Private Use Area range U+E000 to U+E0FF, preserving the
ill-formed data in a compact, safe, UTF-8-friendly format.
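
As a rough illustration of the mapping described in this commit message, the sketch below maps one ill-formed byte to a code point in U+E000 to U+E0FF and encodes it as three UTF-8 bytes. The function names are hypothetical, and the offset mapping (code point = 0xE000 + byte value) is an assumption based on the stated range; the actual patch may instead write the code point as an escape sequence.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed mapping: byte value 0x00..0xFF -> code point U+E000..U+E0FF. */
static uint32_t byte_to_pua(uint8_t b)
{
    return 0xE000u + b;
}

/* Encode the mapped code point as UTF-8. Every code point in
 * U+E000..U+E0FF needs exactly three bytes. Returns bytes written. */
static size_t encode_pua_utf8(uint8_t b, uint8_t out[3])
{
    uint32_t cp = byte_to_pua(b);

    out[0] = (uint8_t)(0xE0 | (cp >> 12));          /* 1110xxxx */
    out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));  /* 10xxxxxx */
    out[2] = (uint8_t)(0x80 | (cp & 0x3F));         /* 10xxxxxx */
    return 3;
}

/* Inverse of the assumed mapping, recovering the original byte.
 * Only meaningful for cp in U+E000..U+E0FF. */
static uint8_t pua_to_byte(uint32_t cp)
{
    return (uint8_t)(cp - 0xE000u);
}
```

Under this assumed mapping, the ill-formed sequence `0xef 0xbf 0x00` would become the three code points U+E0EF, U+E0BF, and U+E000, so the original bytes remain recoverable from the output.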

Signed-off-by: Matthew Fala <[email protected]>
@matthewfala matthewfala force-pushed the cloudwatch-serialization-exception branch from da01c17 to 8eb5c8c Compare November 18, 2021 19:37
@github-actions

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the Stale label Feb 17, 2022
1 participant