pannous edited this page Mar 12, 2024 · 11 revisions

The exposed string type is nearly identical to the internal string type: both are lightweight classes pointing to UTF-8 arrays, together with some meta information such as length.

different aspects of Unicode

Strings can be queried for their length in bytes, chars, graphemes and codepoints:

text="你好👋"
print size of text in bytes
print length of text in chars
print count codepoints in text

size defaults to byte length, length defaults to codepoints (todo: graphemes). Otherwise size, length and count are identical, and each can be used to query any of these aspects.
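For comparison, the same three measurements can be sketched in Python. This is only an analogue, not the Wasp/Angle implementation: the helper names byte_size, codepoint_count and utf16_units are hypothetical, and grapheme counting is left out because it needs Unicode segmentation data that the Python standard library does not ship.

```python
text = "你好👋"

def byte_size(s: str) -> int:
    # Number of bytes in the UTF-8 encoding (hypothetical helper).
    return len(s.encode("utf-8"))

def codepoint_count(s: str) -> int:
    # Python strings are sequences of codepoints, so len() counts them.
    return len(s)

def utf16_units(s: str) -> int:
    # Number of 16-bit code units, as seen by UTF-16 based runtimes.
    return len(s.encode("utf-16-le")) // 2

print(byte_size(text))        # 10
print(codepoint_count(text))  # 3
print(utf16_units(text))      # 4
```

The two CJK characters take three UTF-8 bytes each and the emoji takes four, which is why one short string yields three different "lengths".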

iteration

One can similarly iterate over a string's bytes, chars, graphemes and codepoints as such:

An untyped loop variable, as in for x in text: ..., defaults to chars. Otherwise name the unit explicitly:

text="你好👋"
for bytes in text: print it
for chars in text: print it
for graphemes in text: print it
for codepoints in text: print it

Note that for extremely long texts, computing the different representations can become expensive, but each one is computed only once internally.
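The "computed only once" behaviour can be sketched in Python with functools.cached_property. The Text class, its attribute names and the decodes counter are assumptions made for this illustration; they are not taken from the Wasp/Angle runtime.

```python
from functools import cached_property

class Text:
    """Hypothetical wrapper that builds each view of a string lazily, once."""

    def __init__(self, value: str):
        self.value = value
        self.decodes = 0  # counts how often the codepoint view is built

    @cached_property
    def utf8_bytes(self) -> bytes:
        # Built on first access, then stored on the instance.
        return self.value.encode("utf-8")

    @cached_property
    def codepoints(self) -> list:
        self.decodes += 1
        return [ord(c) for c in self.value]

text = Text("你好👋")
for cp in text.codepoints:   # first access builds the list
    print(hex(cp))
for cp in text.codepoints:   # second access reuses the cached list
    pass
print(text.decodes)          # 1
```

The trade-off is memory: every cached view stays alive as long as the string does.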

Other built-in iterator types:

for word in text: print it
for line in text: print it

Custom iterators could be implemented, such as:

for paragraph in text: print it 
for Page in text: print it
for letter in text: print it
for syllable in word: print it
for icon in text: print it

with the following dispatch:

paragraphs of text := split text by "<p>" // heuristics ;)

That is all that is needed, because for loops over any list (here the one returned by the plural-aware property selection 'in').
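A rough Python sketch of this dispatch idea: the singular loop variable name is mapped to a plural splitter, and the loop then runs over the resulting list. The SPLITTERS table, the items function and the naive "append s" pluralisation are all assumptions made for the example, not the language's actual mechanism.

```python
# Hypothetical registry: plural unit name -> function returning a list.
SPLITTERS = {
    "paragraphs": lambda text: [p for p in text.split("<p>") if p],
    "lines": lambda text: text.splitlines(),
    "words": lambda text: text.split(),
}

def items(unit: str, text: str) -> list:
    # Naive plural lookup: "paragraph" -> "paragraphs" (assumption).
    plural = unit if unit.endswith("s") else unit + "s"
    return SPLITTERS[plural](text)

page = "<p>first part<p>second part"
for paragraph in items("paragraph", page):
    print(paragraph)
```

Adding a new iterator type then only means registering one more splitter; the loop itself stays unchanged.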

Full unicode support ... not

Unicode data is ~1 MB. It is needed to implement all the Unicode-aware string functions, such as Go's strings.EqualFold().

We may get away with some smart workarounds that cover 99.99 % of the use cases.
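One possible shape of such a workaround, sketched in Python: handle pure-ASCII input with a cheap comparison that needs no Unicode tables, and fall back to full case folding only for the rest. The function name equal_fold is hypothetical. The fallback here leans on Python's built-in str.casefold(), which does carry the full tables; a size-constrained runtime would have to substitute a reduced table at that point.

```python
def equal_fold(a: str, b: str) -> bool:
    """Case-insensitive equality with an ASCII fast path (illustrative only)."""
    if a.isascii() and b.isascii():
        # Fast path: no Unicode case-folding data required.
        return a.lower() == b.lower()
    # Slow path: full Unicode case folding.
    return a.casefold() == b.casefold()

print(equal_fold("Hello", "HELLO"))     # True
print(equal_fold("Straße", "STRASSE"))  # True
print(equal_fold("Hello", "World"))     # False
```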

template interpolation

Template strings are those which contain dynamic content that can be evaluated ad hoc or later:

"Hello $friend"
"It took ${time - start} milliseconds."
$"{name} is {now as year - born} years old"

Whether a string is a template is checked at construction, together with the length, at no additional cost. Since Unicode offers multiple kinds of quotation marks, some of them can be used for special templating behavior.
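Python's string.Template offers a loosely comparable mechanism and shows the "evaluate later" idea: the template is built once and filled in when the values are known. It supports only $name and ${name} placeholders, not expressions such as ${time - start}, so the difference is computed beforehand in this sketch. The is_template check is a hypothetical stand-in for the construction-time test described above.

```python
from string import Template

def is_template(s: str) -> bool:
    # Hypothetical check: a single scan for the placeholder marker.
    return "$" in s

greeting = Template("Hello $friend")
timing = Template("It took ${elapsed} milliseconds.")

print(is_template("Hello $friend"))        # True
print(greeting.substitute(friend="Ada"))   # Hello Ada

start, now = 100, 112
print(timing.substitute(elapsed=now - start))  # It took 12 milliseconds.
```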

To avoid accidental interpolation, use single quotes '$100' or special quotes “no $interpolation”.

⚠️ $0 is a special keyword to access the first argument in functions (same for $1 …).

⚠️ TODO: also use in interpolation? Conflict with zero dollars!

⚠️ specification and progress are out of sync
