In this part of the series,
we add support for strings to EasyScript,
our simplified JavaScript implementation.
To make them fully useful, we also add static
(meaning, without support for object inheritance)
methods, demonstrated with the charAt() string method.
We support string literals starting with ' and "
by adding a new string_literal production to the literal non-terminal
in the ANTLR grammar for EasyScript.
We do not support literals starting with `,
as we don't want to deal with string interpolation:
since ${a} can be transformed to '' + a + '' at parse time,
string interpolation doesn't require any support in the interpreter,
but it complicates parsing significantly.
In the parser code,
we make sure to handle escape sequences with backslashes,
so that character pairs like \' are turned into just '.
We do that with the StringEscapeUtils.unescapeJson() method
from the Apache Commons Text library,
which we add as a dependency in the Gradle build.
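To make the escape handling concrete, here is a minimal hand-written sketch of what unescaping a literal involves. It is not the code used in this chapter (which calls StringEscapeUtils.unescapeJson()); the class UnescapeSketch, the method unescapeLiteral(), and the assumption that the parser hands over the literal with its quotes still attached are all hypothetical, and only a few escapes are covered.

```java
/** Hypothetical, simplified stand-in for the library call used in the chapter. */
final class UnescapeSketch {
    private UnescapeSketch() {
    }

    /** Strips the surrounding quotes and resolves a few backslash escapes. */
    static String unescapeLiteral(String literalWithQuotes) {
        // Assumption: the first and last characters are the quotes (' or ").
        String body = literalWithQuotes.substring(1, literalWithQuotes.length() - 1);
        StringBuilder result = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char current = body.charAt(i);
            if (current != '\\' || i + 1 == body.length()) {
                // An ordinary character (or a trailing backslash): copy it as-is.
                result.append(current);
                continue;
            }
            char escaped = body.charAt(++i);
            switch (escaped) {
                case 'n':
                    result.append('\n');
                    break;
                case 't':
                    result.append('\t');
                    break;
                default:
                    // \', \" and \\ map to the escaped character itself;
                    // escapes this sketch does not know are treated the same way.
                    result.append(escaped);
            }
        }
        return result.toString();
    }
}
```

With this sketch, the source text 'it\'s' becomes the four-character string it's.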
We will use the TruffleString class
provided by Truffle to represent strings at runtime.
Given that, our string literal Node
simply creates an instance of that class from a Java string that we get from the parser.
We introduce a helper class, EasyScriptTruffleStrings,
that contains static utility methods that reduce duplication when working with TruffleStrings
by centralizing things like the encoding used by the language.
Because of the introduction of strings, we need to modify a few of the existing expression nodes:
- The EasyScriptExprNode class needs changes in the executeBool() method, as empty strings are considered "falsy" in JavaScript.
- The EqualityExprNode class and the InequalityExprNode class need a new specialization for TruffleStrings, as strings can be compared with === and !== in JavaScript.
- The arithmetic comparison Nodes (GreaterExprNode class, GreaterOrEqualExprNode class, LesserExprNode class and LesserOrEqualExprNode class) all need an extra specialization to handle TruffleStrings, which can be compared with operators like > in JavaScript.
- The AdditionExprNode class can now represent string concatenation if either of the arguments is complex (complex values in JavaScript are anything other than numbers, booleans, undefined, and null). We add a "fast" specialization for when both arguments are TruffleStrings, and a generic specialization for when at least one of the arguments is a complex value. In that specialization, we first coerce the arguments to Java strings using the concatToStrings() helper in EasyScriptTruffleStrings. That helper simply uses the toString() method of the arguments passed to it, and because of that, we need to tell Graal partial evaluation to not introspect that method (as those toString()s are fully virtual calls that cannot be statically resolved), which we do with the TruffleBoundary annotation.
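
The semantics listed above can be sketched in plain Java, without any Truffle classes. This is only an illustration: java.lang.String stands in for TruffleString, and the class StringSemanticsSketch, its method names, and the Undefined enum are hypothetical. Combinations of non-string, non-integer values are outside the scope of the sketch and simply yield NaN here.

```java
/** Hypothetical plain-Java model of the string-related semantics. */
final class StringSemanticsSketch {
    private StringSemanticsSketch() {
    }

    /** Hypothetical stand-in for the language's undefined value. */
    enum Undefined {
        INSTANCE;

        @Override
        public String toString() {
            return "undefined";
        }
    }

    /** Truthiness check: the empty string is falsy, any other string is truthy. */
    static boolean isTruthy(Object value) {
        if (value instanceof String) {
            return !((String) value).isEmpty();
        }
        if (value instanceof Boolean) {
            return (Boolean) value;
        }
        if (value instanceof Integer) {
            return (Integer) value != 0;
        }
        return value != null && value != Undefined.INSTANCE;
    }

    /** Complex values are anything other than numbers, booleans, undefined and null. */
    static boolean isComplex(Object value) {
        return value != null && value != Undefined.INSTANCE
                && !(value instanceof Number) && !(value instanceof Boolean);
    }

    static Object add(Object left, Object right) {
        // "Fast" case: both operands are already strings.
        if (left instanceof String && right instanceof String) {
            return ((String) left).concat((String) right);
        }
        // Generic case: at least one complex operand, so coerce both to strings.
        if (isComplex(left) || isComplex(right)) {
            return String.valueOf(left) + String.valueOf(right);
        }
        if (left instanceof Integer && right instanceof Integer) {
            int sum = (Integer) left + (Integer) right; // overflow is ignored in this sketch
            return sum;
        }
        return Double.NaN; // simplification: everything else is out of scope here
    }

    /** Strings are ordered by their UTF-16 code units, like the < operator in JavaScript. */
    static boolean lessThan(String left, String right) {
        return left.compareTo(right) < 0;
    }
}
```

For example, add(1, "a") takes the generic path and produces the string 1a, while add("a", "b") takes the fast path.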
We add support for methods by introducing a new field in the FunctionObject class, methodTarget.
It will be null for function calls,
but non-null for method calls.
With that field in place, we have to update the FunctionDispatchNode class
to pass the methodTarget (if it's non-null)
as the first argument when invoking the call() method of either DirectCallNode or IndirectCallNode,
which we do by modifying the logic inside the existing extendArguments() method.
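The idea of prepending the method target can be sketched as follows. This is a simplified model, not the chapter's code: MethodTargetSketch, FunctionObjectSketch and dispatch() are hypothetical names, a java.util.function.Function stands in for the CallTarget, and the real extendArguments() may do more than what is shown here.

```java
import java.util.function.Function;

/** Hypothetical model of passing the method target as the first argument. */
final class MethodTargetSketch {
    private MethodTargetSketch() {
    }

    /** Simplified function object; methodTarget is null for plain function calls. */
    static final class FunctionObjectSketch {
        final Function<Object[], Object> body;
        final Object methodTarget;

        FunctionObjectSketch(Function<Object[], Object> body, Object methodTarget) {
            this.body = body;
            this.methodTarget = methodTarget;
        }
    }

    /** Prepends the method target, if there is one, to the call arguments. */
    static Object[] extendArguments(Object[] arguments, FunctionObjectSketch function) {
        if (function.methodTarget == null) {
            return arguments;
        }
        Object[] extended = new Object[arguments.length + 1];
        extended[0] = function.methodTarget;
        System.arraycopy(arguments, 0, extended, 1, arguments.length);
        return extended;
    }

    /** Stand-in for the dispatch Node: extends the arguments, then calls the body. */
    static Object dispatch(FunctionObjectSketch function, Object... arguments) {
        return function.body.apply(extendArguments(arguments, function));
    }
}
```

A body that returns the number of arguments it received would see two arguments for a method call like dispatch(method, 1), but only one for a plain function call.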
The actual implementation of the charAt() method is in the CharAtMethodBodyExprNode class.
It's very similar to the Nodes for the built-in functions;
the only difference is that it expects an extra argument,
which is the string the method was called on.
We use the Shared annotation
as a small optimization to be able to share the TruffleString operation Nodes between the two specializations,
which is possible because these Nodes are stateless.
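As a rough illustration of a method body that receives its target as the first argument, here is a plain-Java sketch of charAt(). The class CharAtSketch is hypothetical and uses java.lang.String instead of TruffleString. Treating a missing index as 0 and returning an empty string for an out-of-range index follow JavaScript's charAt(); treating every non-integer index as 0 is a simplification made only for this sketch.

```java
/** Hypothetical plain-Java model of a charAt() method body. */
final class CharAtSketch {
    private CharAtSketch() {
    }

    /** arguments[0] is the string the method was called on; arguments[1] is the index. */
    static String charAt(Object[] arguments) {
        String self = (String) arguments[0];
        // Simplification: anything that is not an integer index counts as 0.
        int index = arguments.length > 1 && arguments[1] instanceof Integer
                ? (Integer) arguments[1]
                : 0;
        if (index < 0 || index >= self.length()) {
            return ""; // out-of-range reads produce an empty string
        }
        return String.valueOf(self.charAt(index));
    }
}
```

Under these assumptions, a call like "abc".charAt(1) arrives as the argument array {"abc", 1} and returns "b".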
The CallTarget for this built-in method is created in the TruffleLanguage implementation for this chapter,
similarly to the CallTarget for the built-in functions,
and stored in a new class, StringPrototype,
which is made available to Nodes by saving it in the TruffleLanguage Context
as a new public field.
Since we're using TruffleStrings,
we don't want to wrap them in a TruffleObject,
like we did for arrays with ArrayObject in the previous part of the series,
as that would negate the performance benefits of using TruffleStrings.
Instead, we have a dedicated Node class, ReadTruffleStringPropertyNode,
that implements the logic of reading properties of a TruffleString.
The Node contains specializations for indexing into the string,
and also for reading properties whose names are strings.
For the charAt property, it creates a FunctionObject
pointing at the CallTarget stored in the StringPrototype
available through EasyScriptLanguageContext.
In order to improve performance, it tries to cache the created FunctionObject,
but that caching is only valid if the target of the property read stays the same.
If we encounter more than 3 different targets for a given read of charAt,
we abandon caching, and instead switch to always creating a new FunctionObject.
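The caching strategy described above can be modeled in plain Java. In the chapter this is expressed with Truffle DSL specializations; the class below (BoundMethodCacheSketch, with its BoundMethod stand-in for FunctionObject) is a hypothetical hand-rolled equivalent, and comparing targets with equals() is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a per-read-site cache that gives up after 3 targets. */
final class BoundMethodCacheSketch {
    static final int LIMIT = 3;

    /** Stand-in for a FunctionObject bound to the string it was read from. */
    static final class BoundMethod {
        final String target;

        BoundMethod(String target) {
            this.target = target;
        }
    }

    private final List<BoundMethod> cached = new ArrayList<>();
    private boolean cachingAbandoned = false;

    /** Models reading the charAt property of the given target at one place in the code. */
    BoundMethod read(String target) {
        if (!cachingAbandoned) {
            for (BoundMethod entry : cached) {
                if (entry.target.equals(target)) {
                    return entry; // cache hit: reuse the existing object
                }
            }
            if (cached.size() < LIMIT) {
                BoundMethod created = new BoundMethod(target);
                cached.add(created);
                return created;
            }
            // A fourth distinct target: stop caching for this read site.
            cachingAbandoned = true;
            cached.clear();
        }
        return new BoundMethod(target);
    }
}
```

Repeated reads on the same target return the same object until a fourth distinct target shows up; after that, every read allocates a new one.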
With ReadTruffleStringPropertyNode now in place,
we can use it in the existing property access Nodes.
Since introducing strings to our language now makes it possible to access an object's property in two different ways
(with "direct" access, in code like a.propName,
and with "indexed" access, in code like a['propName']),
we create a new class, CommonReadPropertyNode,
that contains the common logic of reading a property of an object.
Its first specialization covers the situation where the target of the read is a TruffleString,
in which case we simply delegate to ReadTruffleStringPropertyNode,
obtained through the @Cached annotation,
as it's a stateless Node;
the remaining 3 specializations were moved from the PropertyReadExprNode class,
as it was in the previous part of the series.
Because of this refactoring,
we can change the PropertyReadExprNode class
to simply delegate to CommonReadPropertyNode.
For indexed property access,
we also use CommonReadPropertyNode,
this time from the ArrayIndexReadExprNode class,
but with an important addition:
we introduce specializations that handle the case when the index expression evaluates to a TruffleString
(in code like "a"['length']). When that happens,
we need to convert 'length' from a TruffleString to a Java string,
which is what CommonReadPropertyNode expects.
We use the TruffleString.ToJavaStringNode class for that purpose.
We make sure to cache the Java String we create from the TruffleString,
but if a given indexed property access sees more than two different keys,
we switch to an uncached specialization instead.
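The way both kinds of access end up in the same shared logic can be sketched like this. Everything here is a hypothetical plain-Java model: CommonReadSketch is not the chapter's class, java.lang.String stands in for TruffleString targets, any CharSequence stands in for a TruffleString key (with toString() playing the role of the conversion to a Java string), and only the length property is covered.

```java
/** Hypothetical model of direct and indexed reads sharing one implementation. */
final class CommonReadSketch {
    /** Hypothetical stand-in for the language's undefined value. */
    static final Object UNDEFINED = new Object();

    private CommonReadSketch() {
    }

    /** Shared logic; it expects the property name as a Java String. */
    static Object readProperty(Object target, String propertyName) {
        if (target instanceof String && propertyName.equals("length")) {
            return ((String) target).length();
        }
        return UNDEFINED; // other targets and properties are omitted in this sketch
    }

    /** Direct access, as in a.propName: the name is known from the source code. */
    static Object readDirect(Object target, String propertyName) {
        return readProperty(target, propertyName);
    }

    /** Indexed access, as in a[index]: the index is only known at runtime. */
    static Object readIndexed(Object target, Object index) {
        if (target instanceof String && index instanceof Integer) {
            String string = (String) target;
            int position = (Integer) index;
            return position >= 0 && position < string.length()
                    ? String.valueOf(string.charAt(position))
                    : UNDEFINED;
        }
        if (index instanceof CharSequence) {
            // Convert the runtime string key to a Java String, then reuse the shared logic.
            return readProperty(target, index.toString());
        }
        return UNDEFINED;
    }
}
```

In this model, readDirect("abc", "length") and readIndexed("abc", "length") both go through readProperty() and return 3.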
We have a simple benchmark
that performs a million string operations in a loop:
in one variant, using direct access, like "abc".length,
and in the other, an indexed access, like "abc"['length'].
We also run it for the GraalVM JavaScript implementation, for comparison.
Here are the results I get on my laptop:
Benchmark                                                    Mode  Cnt       Score      Error  Units
StringLengthBenchmark.count_while_char_at_direct_prop_ezs    avgt    5     576.093 ±    5.992  us/op
StringLengthBenchmark.count_while_char_at_direct_prop_js     avgt    5     576.772 ±    3.865  us/op
StringLengthBenchmark.count_while_char_at_index_prop_ezs     avgt    5     576.813 ±    7.087  us/op
StringLengthBenchmark.count_while_char_at_index_prop_js      avgt    5  112404.250 ± 1012.309  us/op
As we can see, there is no difference in performance between indexed and direct property access in EasyScript,
mainly because of the caching we implemented in ArrayIndexReadExprNode.
However, indexed property access in the GraalVM JavaScript implementation is almost 200
times slower than direct property access.
I've opened an issue about this with the project,
and apparently it's a bug, fixed in GraalVM release 23.1.0.
In addition to the benchmark, there are some unit tests validating that the strings functionality works as expected.