I find myself in need of a function that, given the first byte of a UTF-8 encoded rune, reports how many bytes the whole encoding occupies. This is useful when iterating over a string byte by byte. Following RFC 3629, we can implement something like utf8.RuneStartLen(b byte) int.
Zig and Rust both implement such a function. Go could provide one as well.
// RuneStartLen reports the number of bytes an encoded rune will have. It
// returns a value between 1 and 4, or -1 if the byte is not a valid UTF-8
// first byte.
func RuneStartLen(b byte) int {
	if b <= 0b0111_1111 { // 0x00-0x7F
		return 1
	} else if b >= 0b1111_1000 { // 0xF8-0xFF: never a first byte
		return -1
	} else if b >= 0b1111_0000 { // 0xF0-0xF7
		return 4
	} else if b >= 0b1110_0000 { // 0xE0-0xEF
		return 3
	} else if b >= 0b1100_0000 { // 0xC0-0xDF
		return 2
	}
	return -1 // 0x80-0xBF: continuation byte
}
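To illustrate the "iterating over bytes" use case, here is a minimal sketch. It assumes a local copy of the function named runeStartLen, since utf8.RuneStartLen is only proposed and does not exist in the standard library. The local copy rejects 0xF8-0xFF explicitly so that it matches the doc comment. The helper countRunes is hypothetical and exists only for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeStartLen is a local stand-in for the proposed utf8.RuneStartLen,
// which is not part of the standard library. It classifies a first byte
// by its leading bits only, so it does not reject bytes such as 0xC0,
// 0xC1, or 0xF5-0xF7 that can never start a well-formed sequence.
func runeStartLen(b byte) int {
	switch {
	case b <= 0x7F: // 0xxxxxxx
		return 1
	case b >= 0xF8: // 11111xxx: never a first byte
		return -1
	case b >= 0xF0: // 11110xxx
		return 4
	case b >= 0xE0: // 1110xxxx
		return 3
	case b >= 0xC0: // 110xxxxx
		return 2
	}
	return -1 // 10xxxxxx: continuation byte
}

// countRunes is a hypothetical helper. It hops from one first byte to the
// next without decoding anything. A byte that cannot start a sequence is
// counted as one unit of width 1.
func countRunes(s string) int {
	n := 0
	for i := 0; i < len(s); {
		w := runeStartLen(s[i])
		if w < 0 {
			w = 1
		}
		i += w
		n++
	}
	return n
}

func main() {
	s := "héllo, 世界 🌍"
	// For well-formed UTF-8 the two counts are the same.
	fmt.Println(countRunes(s), utf8.RuneCountInString(s))
}
```

Because the sketch trusts the first byte and never inspects continuation bytes, it is only appropriate for input already known to be valid UTF-8. For untrusted input, utf8.DecodeRuneInString remains the safer choice.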
seankhliao changed the title from "proposal: utf8: given the first byte, determine how many bytes in the UTF-8 string" to "proposal: utf8: RuneStartLen to get the length of the rune from the first byte" on Aug 4, 2024
This is a reasonable function, but it is rarely needed except by clients that are doing something unusually sophisticated, and it's a trivial consequence of the four constants that appear in the compact pictorial summary of UTF-8 found in any document on the subject--especially if you simplify each else if cond1 && cond2 to else if cond2. (Each first condition is trivially true as a consequence of the control flow.)
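The point about the bit-layout table can be made concrete with a short sketch. The names firstBytePatterns, lenByMask, and lenByThreshold are illustrative assumptions, not existing or proposed API. The first function applies the four first-byte patterns directly as mask-and-compare pairs. The second tests the ranges in ascending order, so each branch needs only an upper bound, because the lower bound is already implied by the earlier branches having failed.

```go
package main

import "fmt"

// firstBytePatterns lists the four first-byte shapes from the usual UTF-8
// bit-layout table. The name and struct layout are illustrative only.
var firstBytePatterns = []struct {
	mask, bits byte
	length     int
}{
	{0b1000_0000, 0b0000_0000, 1}, // 0xxxxxxx
	{0b1110_0000, 0b1100_0000, 2}, // 110xxxxx
	{0b1111_0000, 0b1110_0000, 3}, // 1110xxxx
	{0b1111_1000, 0b1111_0000, 4}, // 11110xxx
}

// lenByMask returns the sequence length implied by first byte b, or -1 if
// b matches none of the four patterns.
func lenByMask(b byte) int {
	for _, p := range firstBytePatterns {
		if b&p.mask == p.bits {
			return p.length
		}
	}
	return -1
}

// lenByThreshold performs the same classification with one comparison per
// branch. Testing in ascending order makes every lower bound implicit.
func lenByThreshold(b byte) int {
	switch {
	case b < 0x80: // 0x00-0x7F
		return 1
	case b < 0xC0: // 0x80-0xBF: continuation byte
		return -1
	case b < 0xE0: // 0xC0-0xDF
		return 2
	case b < 0xF0: // 0xE0-0xEF
		return 3
	case b < 0xF8: // 0xF0-0xF7
		return 4
	}
	return -1 // 0xF8-0xFF
}

func main() {
	// Exhaustively compare the two versions over every byte value.
	for i := 0; i < 256; i++ {
		b := byte(i)
		if lenByMask(b) != lenByThreshold(b) {
			fmt.Printf("mismatch at %#02x\n", b)
			return
		}
	}
	fmt.Println("both versions agree on all 256 byte values")
}
```

Both versions classify by leading bits only. Neither rejects first bytes such as 0xC0, 0xC1, or 0xF5-0xF7, which RFC 3629 rules out for well-formed sequences.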