string
The exposed string type is nearly identical to the internal string type: both are lightweight classes pointing to UTF-8 arrays, together with some meta information such as length.
Strings can be queried for their length in bytes, chars, graphemes and codepoint1:
text="你好👋"
print size of text in bytes
print length of text in chars
print count codepoint1 in text
size defaults to byte length; length defaults to codepoint1 (todo: graphemes).
Otherwise size, length and count are identical, and each can be used to query any of these units.
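To make the difference between these units concrete, here is a minimal Python sketch (Python is used only for illustration; it is not the language described on this page). It measures the same sample text in UTF-8 bytes, Unicode code points and, for comparison, UTF-16 code units. Grapheme counting is left out because it needs a segmentation library outside the standard library.

```python
text = "你好👋"

# Byte length of the UTF-8 encoding: 3 + 3 + 4 bytes.
size_in_bytes = len(text.encode("utf-8"))

# Python's len() counts Unicode code points.
count_codepoints = len(text)

# UTF-16 code units, for comparison: the emoji needs a surrogate pair.
utf16_units = len(text.encode("utf-16-le")) // 2

print(size_in_bytes, count_codepoints, utf16_units)  # 10 3 4
```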
One can similarly iterate over a string's bytes, chars, graphemes and codepoints:
for x in text: ...
An untyped x defaults to chars; otherwise name the unit explicitly:
text="你好👋"
for bytes in text: print it
for chars in text: print it
for graphemes in text: print it
for codepoint1 in text: print it
Note that for extremely long texts, getting the different representations can become computationally expensive, but it is done only once internally.
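One way to get the "done only once" behavior is to cache each representation on first access. The following Python sketch assumes a hypothetical `Text` wrapper (not this page's actual string type) and uses `functools.cached_property`; the `encode_count` field exists only to show that the expensive step runs once.

```python
from functools import cached_property


class Text:
    """Hypothetical wrapper; not the actual string type described here."""

    def __init__(self, value: str):
        self.value = value
        self.encode_count = 0  # how often the expensive encoding ran

    @cached_property
    def utf8(self) -> bytes:
        # Computed on first access, then stored on the instance.
        self.encode_count += 1
        return self.value.encode("utf-8")

    @cached_property
    def codepoints(self) -> list[int]:
        return [ord(ch) for ch in self.value]


text = Text("你好👋")
for _ in range(3):
    size = len(text.utf8)  # encoded on the first iteration only

print(size, text.encode_count)  # 10 1
```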
Other built-in iterator types:
for word in text: print it
for line in text: print it
Custom iterators could be implemented, such as:
for paragraph in text: print it
for Page in text: print it
for letter in text: print it
for syllable in word: print it
for icon in text: print it
with the following dispatch:
paragraphs of text:= split text by "<p>" // heuristics;)
That's it, because for loops over any list (here returned by the plural-aware property selection 'in').
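The dispatch idea can be sketched in Python as a lookup from the singular loop variable to a plural property that returns a list. The `SPLITTERS` registry, the `items` helper and the naive "append s" pluralization are all assumptions made for this illustration, not part of the language itself.

```python
# Hypothetical registry: plural property name -> function returning a list.
SPLITTERS = {
    "words": lambda text: text.split(),
    "lines": lambda text: text.splitlines(),
    "paragraphs": lambda text: [p for p in text.split("<p>") if p],
}


def items(unit: str, text: str) -> list[str]:
    """Resolve `for <unit> in text` by looking up the plural property."""
    plural = unit + "s"  # naive pluralization, enough for this sketch
    return SPLITTERS[plural](text)


doc = "one two<p>three"
for paragraph in items("paragraph", doc):
    print(paragraph)
```

Adding a new iterator then amounts to registering one more plural property; the loop itself needs no changes.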
Unicode data is ~1 MB. It is needed to implement all the Unicode-aware string functions, like strings.EqualFold().
We may get away with some smart workarounds that cover 99.99 % of the use cases.
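One possible shape for such a workaround is an ASCII fast path with a Unicode-aware fallback. The Python sketch below only illustrates the control flow: `equal_fold` is a hypothetical name, and Python's `str.casefold()` applies full case folding, so its results may differ from other implementations on some characters. In a real runtime the ASCII branch could avoid loading the Unicode tables at all.

```python
def equal_fold(a: str, b: str) -> bool:
    """Case-insensitive comparison with a cheap path for ASCII input."""
    if a.isascii() and b.isascii():
        # ASCII only needs the 26-letter mapping, no Unicode data.
        return a.lower() == b.lower()
    # Fallback that relies on the full Unicode case-folding data.
    return a.casefold() == b.casefold()


print(equal_fold("Hello", "hELLO"))    # True
print(equal_fold("Straße", "STRASSE"))  # True
```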
Template strings are those that contain dynamic content, which can be evaluated ad hoc or later:
"Hello $friend"
"It took ${time - start}" milliseconds.
$"{name} is {now as year - born} years old"
Whether a string is a template is checked at construction, together with the length, at no additional cost. Since Unicode offers multiple kinds of quotation marks, some of them can be used for special templating behavior.
To avoid accidental interpolation, use single quotes '$100' or special quotes “no $interpolation”.
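The claim that template detection comes for free with the length can be illustrated by a single pass that does both. In this Python sketch, `scan` is a hypothetical helper, and the rule "a `$` followed by a letter, underscore or `{` marks a template" is an assumption for illustration; the prefixed `$"..."` form and the quote-based opt-outs are not modeled.

```python
def scan(text: str) -> tuple[int, bool]:
    """Return (length in code points, is_template) in one pass."""
    length = 0
    is_template = False
    previous = ""
    for ch in text:
        length += 1
        # Assumed rule: `$name` or `${...}` marks a template.
        if previous == "$" and (ch == "{" or ch == "_" or ch.isalpha()):
            is_template = True
        previous = ch
    return length, is_template


print(scan("Hello $friend"))  # (13, True)
print(scan("costs 5$"))       # (8, False)
```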