char
In Angle, a char is a Unicode character represented as char32_t ≈ uint32.
This differs from C, where char is a single byte (historically an ASCII character).
See string for character sequences.
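The difference between a 32-bit character and its byte encoding can be illustrated outside Angle. The following is a minimal Python sketch, not Angle code; the variable names are chosen purely for illustration.

```python
import struct

ch = "β"                                   # one Unicode character, code point U+03B2
code_point = ord(ch)                       # integer value of the code point
as_uint32 = struct.pack("<I", code_point)  # the value fits in one 32-bit unsigned integer
as_utf8 = ch.encode("utf-8")               # the same character needs two bytes in UTF-8

print(hex(code_point), len(as_uint32), len(as_utf8))
```

A fixed-width 32-bit value always holds exactly one code point, while the UTF-8 form varies between one and four bytes per character.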
Because the size of a character sequence can be measured in several ways, Angle offers different ways of counting and iterating:
count bytes of "abc" == 3
count chars of "aβc" == 3
count graphemes of "
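The byte and char counts above can be reproduced with a short Python sketch. The helper names `count_bytes` and `count_chars` are hypothetical and only mirror the Angle phrases; the sketch assumes strings are stored as UTF-8, and grapheme counting is left out because it needs segmentation rules beyond plain code points.

```python
def count_bytes(text: str) -> int:
    """Number of bytes in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))


def count_chars(text: str) -> int:
    """Number of Unicode code points in text."""
    return len(text)


print(count_bytes("abc"), count_chars("aβc"), count_bytes("aβc"))
```

The last value shows why the distinction matters: "aβc" has three chars but occupies four bytes.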
We intentionally boycott control codes that influence the color or appearance of characters, such as:
🫲 ≈ 🫲🏻
Of course these can appear in Angle strings; just don't use them where string indexing or manipulation is required. If you use them and rely on safe string manipulation, please use a third-party library or wait for the grapheme iterator to be implemented.
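To see why such modifiers interfere with indexing, it helps to look at the code points involved. This Python sketch (illustrative only, not Angle code) builds the tinted hand from its two parts, assuming the pair shown above is U+1FAF2 followed by the skin tone modifier U+1F3FB.

```python
hand = "\U0001FAF2"      # 🫲 leftwards hand
modifier = "\U0001F3FB"  # light skin tone modifier, changes only the appearance
tinted = hand + modifier # displays as a single symbol: 🫲🏻

print(len(hand), len(tinted), len(tinted.encode("utf-8")))
```

What is displayed as one symbol is two chars, so char-based indexing can split the hand from its modifier.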
Internally byte chars are known and used for optimization:
As a general rule, bracket indexing is close to the metal:
"abc"[1]='B' manipulates the byte sequence blindly, whereas "abc"#2='β' sets the character in a Unicode-safe way.
"abc"[1]='β' should give a strong compiler warning!
Likewise "aβc"#2 == 'β' is safe but "aβc"[3] can yield unexpected results.
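The two indexing styles can be contrasted in a small Python sketch. This is not Angle's implementation: `set_byte` and `set_char` are hypothetical helpers, and the sketch assumes bracket indexing is a 0-based byte offset into UTF-8 data and `#` is a 1-based character position, as the examples above suggest.

```python
def set_byte(text: str, index: int, ch: str) -> str:
    """Overwrite one byte at a 0-based byte offset (bracket-style indexing)."""
    raw = bytearray(text.encode("utf-8"))
    encoded = ch.encode("utf-8")
    if len(encoded) != 1:
        # A multi-byte character cannot fit in a single byte slot.
        raise ValueError(f"{ch!r} needs {len(encoded)} bytes, not 1")
    raw[index] = encoded[0]
    # Invalid sequences are shown as U+FFFD instead of raising an error.
    return raw.decode("utf-8", errors="replace")


def set_char(text: str, position: int, ch: str) -> str:
    """Replace one code point at a 1-based position ('#'-style indexing)."""
    chars = list(text)
    chars[position - 1] = ch
    return "".join(chars)


print(set_byte("abc", 1, "B"))  # byte-level write of an ASCII character
print(set_char("abc", 2, "β"))  # character-level write of a non-ASCII character
```

In this sketch, `set_byte("abc", 1, "β")` raises a `ValueError`, which corresponds to the case that should trigger a compiler warning, and a byte-level write into the middle of "aβc" leaves a stray continuation byte behind.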
The internal representation of strings as a UTF-8 sequence or a char32_t sequence should be completely hidden from users/developers, except for bare-metal indexing.
Remember:
In Angle, char is shorthand for a UTF-8 character (codepoint), as opposed to byte, an unsigned 8-bit integer (historically an ASCII char with an ill-defined 0x80-0xFF Latin ... range).