utils: detect and replace ill-formed utf-8 bytes (#4346)
Previously, given a byte sequence such as 0xef 0xbf 0x00 ..., Fluent Bit trusted the leading byte 0xef to say how many trailing bytes to copy, without validating them. If a trailing byte was invalid, such as 0x00 (the null character), the utility copied it into the escaped string anyway.

This commit adds UTF-8 compliance checks for leading and trailing bytes. If a character is ill-formed, each of its bytes is mapped individually to the private use area [U+E000 to U+E0FF], preserving the ill-formed data in a compact, UTF-8-safe form.

Signed-off-by: Matthew Fala <[email protected]>
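The mapping described in the message can be illustrated with a minimal sketch. This is not the code from the commit: the function names `utf8_expected_len` and `utf8_replace_ill_formed` are hypothetical, the sketch only checks the form of leading and trailing bytes (it does not reject overlong encodings or surrogates), and it resynchronizes one byte at a time, which is an assumption about how "the ill-formed character's bytes" are delimited. It also copies well-formed bytes through unchanged rather than performing any escaping.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not Fluent Bit's API): expected sequence length for a
 * leading byte, or 0 if the byte cannot start a UTF-8 sequence. */
static size_t utf8_expected_len(unsigned char b)
{
    if (b < 0x80) return 1;                 /* ASCII */
    if (b >= 0xC2 && b <= 0xDF) return 2;   /* 2-byte leader */
    if (b >= 0xE0 && b <= 0xEF) return 3;   /* 3-byte leader */
    if (b >= 0xF0 && b <= 0xF4) return 4;   /* 4-byte leader */
    return 0;                               /* stray trailing byte or invalid */
}

/* Hypothetical sketch: copy well-formed sequences as-is and map every byte
 * that is not part of one to U+E000 + byte, encoded as 3 bytes of UTF-8.
 * Returns the number of bytes written; stops early if `out` is too small.
 * A capacity of 3 * in_len is always sufficient. */
static size_t utf8_replace_ill_formed(const unsigned char *in, size_t in_len,
                                      unsigned char *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    while (i < in_len) {
        size_t len = utf8_expected_len(in[i]);
        size_t k;
        /* The leader must be valid and the sequence must not be truncated. */
        int ok = (len != 0 && len <= in_len - i);

        /* Every trailing byte must have the form 10xxxxxx. */
        for (k = 1; ok && k < len; k++) {
            if ((in[i + k] & 0xC0) != 0x80) ok = 0;
        }

        if (ok) {
            if (out_cap - o < len) break;
            memcpy(out + o, in + i, len);
            o += len;
            i += len;
        } else {
            /* Map this single byte into the private use area and resync. */
            unsigned int cp = 0xE000u + in[i];
            if (out_cap - o < 3) break;
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
            i += 1;
        }
    }
    return o;
}
```

Under these assumptions, the input 0xef 0xbf 0x00 from the message becomes 0xee 0x83 0xaf (U+E0EF), 0xee 0x82 0xbf (U+E0BF), then 0x00: the leader fails because its second trailing byte is invalid, the orphaned 0xbf fails as a leader in its own right, and the null byte passes through as ordinary ASCII. Because each original byte value is recoverable as the code point minus 0xE000, the ill-formed data is preserved rather than discarded.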
1 parent bf0f0d2, commit 861af37
Showing 2 changed files with 319 additions and 10 deletions.