flb_utils_write_str: omit unicode with malform trailing bytes #7

matthewfala · 2021-11-09T02:59:39Z

Previously, with UTF-8 byte sequences such as

0xef 0xbf 0x00 ...

Fluent Bit would blindly trust the leading byte 0xef to describe
how many trailing bytes to copy, without validating them.
If a trailing byte was invalid, such as 0x00 (the null character),
the utility still copied it into the escaped string.

This commit adds checks for trailing-byte UTF-8 compliance.
If a trailing byte is invalid, the character is omitted from the escaped string.
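
For illustration only, here is a minimal sketch of the kind of check described above. It is not the code from this PR: the names `utf8_seq_len`, `utf8_is_trailing`, and `copy_well_formed_utf8` are hypothetical, and the choice to skip one byte and resume at the next is an assumption. It only validates the `10xxxxxx` bit pattern of trailing bytes and does not reject overlong forms or surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Total sequence length implied by a leading byte, or 0 if the byte
 * cannot start a UTF-8 sequence (hypothetical helper). */
static size_t utf8_seq_len(uint8_t lead)
{
    if (lead < 0x80) return 1;            /* 0xxxxxxx: ASCII        */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx               */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx, e.g. 0xef    */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx               */
    return 0;                             /* stray trailing byte or invalid lead */
}

/* A trailing (continuation) byte must match 10xxxxxx. */
static int utf8_is_trailing(uint8_t b)
{
    return (b & 0xC0) == 0x80;
}

/* Copy src into dst, omitting any sequence whose trailing bytes are
 * missing or malformed. dst must hold at least len bytes.
 * Returns the number of bytes written. */
static size_t copy_well_formed_utf8(const uint8_t *src, size_t len, uint8_t *dst)
{
    size_t i = 0, out = 0, k;

    while (i < len) {
        size_t n = utf8_seq_len(src[i]);
        int ok = (n != 0 && i + n <= len);

        /* Do not trust the leading byte alone: verify each trailing byte. */
        for (k = 1; ok && k < n; k++) {
            if (!utf8_is_trailing(src[i + k])) ok = 0;
        }

        if (ok) {
            for (k = 0; k < n; k++) dst[out++] = src[i + k];
            i += n;
        } else {
            i += 1;  /* assumption: drop one byte, resume at the next */
        }
    }
    return out;
}
```

Under these assumptions, the input `0x41 0xef 0xbf 0x00 0x42` yields `0x41 0x00 0x42`: the incomplete sequence is dropped, and the null byte is then handled as an ordinary one-byte character, which a real escaper would go on to escape.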

Signed-off-by: Matthew Fala [email protected]

Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

Example configuration file for the change
Debug log output from testing the change

Attached Valgrind output that shows no leaks or memory corruption was found

Documentation

Documentation required for this feature

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Previously, with UTF-8 byte sequences such as

0xef 0xbf 0x00 ...

Fluent Bit would blindly trust the leading byte 0xef to describe
how many trailing bytes to copy, without validating them.
If a trailing byte was invalid, such as 0x00 (the null character),
the utility still copied it into the escaped string.

This commit adds checks for leading- and trailing-byte UTF-8 compliance.
If a character is ill-formed, its bytes are individually mapped to the
private use area [U+E000 to U+E0FF], preserving the ill-formed data
in a compact, safe, UTF-8-friendly format.

Signed-off-by: Matthew Fala <[email protected]>
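
As a rough sketch of the private use area mapping described in this commit message (not the PR's actual implementation), each offending byte `b` can be rewritten as the code point U+E000 + `b` and emitted in its three-byte UTF-8 form. The function names `lead_to_len`, `emit_pua`, and `map_ill_formed_to_pua` are hypothetical, and which bytes of an ill-formed sequence get remapped is an assumption here: the sketch remaps one byte at a time and re-examines the bytes that follow.

```c
#include <stddef.h>
#include <stdint.h>

/* Sequence length implied by a leading byte, 0 if not a valid lead
 * (hypothetical helper). */
static size_t lead_to_len(uint8_t lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Write U+E000 + b as three UTF-8 bytes. Every code point in
 * U+E000..U+E0FF starts with 0xEE; the byte's top two bits land in
 * the second output byte and its low six bits in the third. */
static size_t emit_pua(uint8_t b, uint8_t *dst)
{
    dst[0] = 0xEE;
    dst[1] = (uint8_t)(0x80 | (b >> 6));
    dst[2] = (uint8_t)(0x80 | (b & 0x3F));
    return 3;
}

/* Copy src into dst, replacing each byte that does not belong to a
 * structurally valid sequence with its private use area code point.
 * dst must hold at least 3 * len bytes. Returns bytes written. */
static size_t map_ill_formed_to_pua(const uint8_t *src, size_t len, uint8_t *dst)
{
    size_t i = 0, out = 0, k;

    while (i < len) {
        size_t n = lead_to_len(src[i]);
        int ok = (n != 0 && i + n <= len);

        for (k = 1; ok && k < n; k++) {
            if ((src[i + k] & 0xC0) != 0x80) ok = 0;  /* not 10xxxxxx */
        }

        if (ok) {
            for (k = 0; k < n; k++) dst[out++] = src[i + k];
            i += n;
        } else {
            out += emit_pua(src[i], dst + out);  /* keep the byte's value */
            i += 1;
        }
    }
    return out;
}
```

Under these assumptions, `0xef 0xbf 0x00` becomes `EE 83 AF` (U+E0EF), `EE 82 BF` (U+E0BF), then `00`. The original byte values remain recoverable from the low eight bits of each private use code point, which matches the stated goal of preserving the ill-formed data.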

github-actions · 2022-02-17T02:03:17Z

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

matthewfala force-pushed the cloudwatch-serialization-exception branch 5 times, most recently from 0634391 to 184a49b Compare November 12, 2021 02:08

matthewfala force-pushed the cloudwatch-serialization-exception branch 2 times, most recently from f097f40 to 6e0fe88 Compare November 17, 2021 22:49

matthewfala force-pushed the cloudwatch-serialization-exception branch from da01c17 to 8eb5c8c Compare November 18, 2021 19:37

github-actions bot added the Stale label Feb 17, 2022
