In the prequel to this article (hint: make sure you've read it first), I showed one way to implement a range operator in PHP. Initial implementations, however, are rarely the best, and so it is the intention of this article to look at how the previous implementation can be improved.
Thanks once again to Nikita Popov for proofreading this article!
The initial implementation put all of the logic for the range operator into the Zend VM, which forced computation to take place purely at runtime when the ZEND_RANGE opcode was executed. This not only meant that computation could not be shifted to compile time for literal operands, but also meant that some features would simply not work.
In this implementation, we will shift the range operator logic out of the Zend VM so that computation can be done at either compile time (for literal operands) or runtime (for dynamic operands). This will not only provide a small win for Opcache users, but will more importantly allow the range operator to be used in constant expression contexts.
For example:
// as constant definitions
const AN_ARRAY = 1 |> 100;

// as initial property definitions
class A
{
    private $a = 1 |> 2;
}

// as default values for optional parameters:
function a($a = 1 |> 2)
{
    //
}
So without further ado, let's reimplement the range operator.
The lexer implementation remains exactly the same. The token is first registered in Zend/zend_language_scanner.l (line ~1200):
<ST_IN_SCRIPTING>"|>" {
    RETURN_TOKEN(T_RANGE);
}
And then declared in Zend/zend_language_parser.y (line ~220):
%token T_RANGE "|> (T_RANGE)"
The tokenizer extension must again be regenerated by going into the ext/tokenizer directory and executing the tokenizer_data_gen.sh file.
The parser implementation is partially the same as before. We again start by stating the operator's precedence and associativity by adding the T_RANGE token onto the end of the following line (line ~70):
%nonassoc T_IS_EQUAL T_IS_NOT_EQUAL T_IS_IDENTICAL T_IS_NOT_IDENTICAL T_SPACESHIP T_RANGE
We then update the expr_without_variable production rule again, though this time the semantic action (the code within the curly braces) will be slightly different. Update it with the following code (I placed it just below the T_SPACESHIP rule, line ~930):
    |   expr T_RANGE expr
            { $$ = zend_ast_create_binary_op(ZEND_RANGE, $1, $3); }
This time, we've used the zend_ast_create_binary_op function (instead of the zend_ast_create function), which creates a ZEND_AST_BINARY_OP node for us. zend_ast_create_binary_op takes an opcode name that will be used to distinguish binary operations from one another during the compilation stage.
Since we're reusing the ZEND_AST_BINARY_OP node type now, there is no need to define a new ZEND_AST_RANGE node type in the Zend/zend_ast.h file, as was done before.
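To make the idea of one shared node type concrete, here is a minimal standalone sketch in C. It is not the Zend implementation: the toy_ast struct, the toy_ prefixed functions, and the opcode values are all hypothetical names invented for this illustration. It only shows how a single "binary op" node kind can carry the specific operation in an attribute field.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical opcode values, chosen only for this sketch. */
enum toy_opcode { TOY_ADD = 1, TOY_RANGE = 182 };

/* Hypothetical node kinds: one kind serves every binary operator. */
enum toy_kind { TOY_AST_LITERAL, TOY_AST_BINARY_OP };

typedef struct toy_ast {
    enum toy_kind kind;
    uint32_t attr;            /* binary ops: which opcode this node stands for */
    long value;               /* literals: the literal value */
    struct toy_ast *child[2]; /* binary ops: left and right operands */
} toy_ast;

/* Creates a literal leaf node, or returns NULL if allocation fails. */
toy_ast *toy_ast_literal(long value)
{
    toy_ast *ast = calloc(1, sizeof(*ast));
    if (ast == NULL) {
        return NULL;
    }
    ast->kind = TOY_AST_LITERAL;
    ast->value = value;
    return ast;
}

/* Creates a binary op node; the opcode is stored in attr, not in the kind. */
toy_ast *toy_ast_create_binary_op(uint32_t opcode, toy_ast *left, toy_ast *right)
{
    toy_ast *ast = calloc(1, sizeof(*ast));
    if (ast == NULL) {
        return NULL;
    }
    ast->kind = TOY_AST_BINARY_OP;
    ast->attr = opcode;
    ast->child[0] = left;
    ast->child[1] = right;
    return ast;
}

/* Frees a node together with its children. */
void toy_ast_free(toy_ast *ast)
{
    if (ast == NULL) {
        return;
    }
    toy_ast_free(ast->child[0]);
    toy_ast_free(ast->child[1]);
    free(ast);
}
```

A later compilation step can then switch on kind once and read attr to find out which operation it is dealing with, which is why no new node type is needed for the new operator.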
This time, there is no need to update the Zend/zend_compile.c file since it already contains the necessary logic to handle binary operations. Thus, we are simply reusing this logic by making our operator a ZEND_AST_BINARY_OP node.
The following is a trimmed version of the zend_compile_binary_op function:
void zend_compile_binary_op(znode *result, zend_ast *ast) /* {{{ */
{
    zend_ast *left_ast = ast->child[0];
    zend_ast *right_ast = ast->child[1];
    uint32_t opcode = ast->attr;

    znode left_node, right_node;
    zend_compile_expr(&left_node, left_ast);
    zend_compile_expr(&right_node, right_ast);

    if (left_node.op_type == IS_CONST && right_node.op_type == IS_CONST) {
        if (zend_try_ct_eval_binary_op(&result->u.constant, opcode,
                &left_node.u.constant, &right_node.u.constant)
        ) {
            result->op_type = IS_CONST;
            zval_ptr_dtor(&left_node.u.constant);
            zval_ptr_dtor(&right_node.u.constant);
            return;
        }
    }

    do {
        // redacted code
        zend_emit_op_tmp(result, opcode, &left_node, &right_node);
    } while (0);
}
/* }}} */
As we can see, it is pretty similar to the zend_compile_range function we created last time. The two important differences are how the opcode type is acquired and what happens when both operands are literals.
The opcode type is acquired from the AST node this time (as opposed to being hardcoded, as seen last time), since the ZEND_AST_BINARY_OP node stores this value (as seen from the new production rule's semantic action) to differentiate between binary operations. When both operands are literals, the zend_try_ct_eval_binary_op function will be invoked. This function looks as follows:
static inline zend_bool zend_try_ct_eval_binary_op(zval *result, uint32_t opcode, zval *op1, zval *op2) /* {{{ */
{
    binary_op_type fn = get_binary_op(opcode);

    /* don't evaluate division by zero at compile-time */
    if ((opcode == ZEND_DIV || opcode == ZEND_MOD) &&
            zval_get_long(op2) == 0) {
        return 0;
    } else if ((opcode == ZEND_SL || opcode == ZEND_SR) &&
            zval_get_long(op2) < 0) {
        return 0;
    }

    fn(result, op1, op2);
    return 1;
}
/* }}} */
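The interplay between the two functions above (fold when both operands are literals and folding is safe, otherwise emit an instruction for the VM) can be sketched in isolation. The following is a toy model only: every toy_ prefixed type and function is hypothetical, operands are plain long values rather than zvals, and only division and modulo are modelled.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_OPS 16

enum toy_opcode { TOY_DIV, TOY_MOD };

/* An operand is either a compile-time constant or a runtime variable. */
typedef struct { bool is_const; long value; } toy_operand;

typedef struct { enum toy_opcode opcode; toy_operand op1, op2; } toy_instruction;

/* Hypothetical buffer that the toy "compiler" emits instructions into. */
typedef struct { toy_instruction ops[TOY_MAX_OPS]; size_t count; } toy_op_array;

/* Tries to evaluate at compile time; returns false to defer to runtime. */
bool toy_try_ct_eval(long *result, enum toy_opcode opcode, long op1, long op2)
{
    if (op2 == 0) {
        return false; /* division by zero must be reported at runtime */
    }
    if (op1 == LONG_MIN && op2 == -1) {
        return false; /* the quotient would overflow; leave it to runtime */
    }
    *result = (opcode == TOY_DIV) ? op1 / op2 : op1 % op2;
    return true;
}

/* Returns true if the expression was folded into a constant result. */
bool toy_compile_binary_op(toy_op_array *ops, toy_operand *result,
                           enum toy_opcode opcode,
                           toy_operand left, toy_operand right)
{
    if (left.is_const && right.is_const
            && toy_try_ct_eval(&result->value, opcode, left.value, right.value)) {
        result->is_const = true;
        return true;
    }

    /* Not foldable: emit an instruction so the work happens at runtime. */
    assert(ops->count < TOY_MAX_OPS);
    ops->ops[ops->count].opcode = opcode;
    ops->ops[ops->count].op1 = left;
    ops->ops[ops->count].op2 = right;
    ops->count++;

    result->is_const = false;
    result->value = 0;
    return false;
}
```

Compiling 10 / 3 with two constant operands leaves the instruction buffer empty, whereas 10 % 0 or any expression involving a variable operand appends one instruction for later execution.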
The function obtains a callback from the get_binary_op function in Zend/zend_opcode.c according to the opcode type. This means we will need to update this function next to cater for the ZEND_RANGE opcode. Add the following case statement to the get_binary_op function (line ~750):
        case ZEND_RANGE:
            return (binary_op_type) range_function;
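The lookup pattern used here, an opcode mapped to a function pointer with a shared signature, can be shown as a small standalone sketch. The names below (toy_binary_op, toy_get_binary_op, and the three opcodes) are hypothetical, and bitwise operations on unsigned values are used only because they cannot overflow.

```c
#include <stddef.h>

/* Hypothetical opcode numbers for this sketch. */
enum toy_opcode { TOY_BW_OR = 1, TOY_BW_AND = 2, TOY_BW_XOR = 3 };

/* Every operator function shares one signature, so callers can treat
 * them uniformly. A return value of 0 signals success. */
typedef int (*toy_binary_op)(unsigned long *result,
                             unsigned long op1, unsigned long op2);

int toy_bw_or(unsigned long *result, unsigned long op1, unsigned long op2)
{
    *result = op1 | op2;
    return 0;
}

int toy_bw_and(unsigned long *result, unsigned long op1, unsigned long op2)
{
    *result = op1 & op2;
    return 0;
}

int toy_bw_xor(unsigned long *result, unsigned long op1, unsigned long op2)
{
    *result = op1 ^ op2;
    return 0;
}

/* Maps an opcode to its operator function, or NULL if it is unknown. */
toy_binary_op toy_get_binary_op(int opcode)
{
    switch (opcode) {
        case TOY_BW_OR:
            return toy_bw_or;
        case TOY_BW_AND:
            return toy_bw_and;
        case TOY_BW_XOR:
            return toy_bw_xor;
        default:
            return NULL;
    }
}
```

Supporting a new operator in such a scheme means adding one more case that returns the new function, which is the same kind of change made to get_binary_op above.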
Now we must define the range_function function. This will be done in the Zend/zend_operators.c file alongside all of the other operators:
ZEND_API int ZEND_FASTCALL range_function(zval *result, zval *op1, zval *op2) /* {{{ */
{
    zval tmp;

    ZVAL_DEREF(op1);
    ZVAL_DEREF(op2);

    if (Z_TYPE_P(op1) == IS_LONG && Z_TYPE_P(op2) == IS_LONG) {
        zend_long min = Z_LVAL_P(op1), max = Z_LVAL_P(op2);
        zend_ulong size, i;

        if (min > max) {
            zend_throw_error(NULL, "Min should be less than (or equal to) max");
            return FAILURE;
        }

        // calculate size (one less than the total size for an inclusive range)
        // the operands are cast to unsigned so that extreme bounds cannot
        // overflow a signed subtraction
        size = (zend_ulong) max - (zend_ulong) min;

        // the size cannot be greater than or equal to HT_MAX_SIZE
        // HT_MAX_SIZE - 1 takes into account the inclusive range size
        if (size >= HT_MAX_SIZE - 1) {
            zend_throw_error(NULL, "Range size is too large");
            return FAILURE;
        }

        // increment the size to take into account the inclusive range
        ++size;

        // set the zval type to be a long
        Z_TYPE_INFO(tmp) = IS_LONG;

        // initialise the array to a given size
        array_init_size(result, size);
        zend_hash_real_init(Z_ARRVAL_P(result), 1);
        ZEND_HASH_FILL_PACKED(Z_ARRVAL_P(result)) {
            for (i = 0; i < size; ++i) {
                Z_LVAL(tmp) = min + i;
                ZEND_HASH_FILL_ADD(&tmp);
            }
        } ZEND_HASH_FILL_END();
    } else if ( // if both operands are either integers or doubles
        (Z_TYPE_P(op1) == IS_LONG || Z_TYPE_P(op1) == IS_DOUBLE)
        && (Z_TYPE_P(op2) == IS_LONG || Z_TYPE_P(op2) == IS_DOUBLE)
    ) {
        long double min, max, size, i;

        // both operands being integers was handled above, so if one
        // operand is an integer here, the other must be a double
        if (Z_TYPE_P(op1) == IS_LONG) {
            min = (long double) Z_LVAL_P(op1);
            max = (long double) Z_DVAL_P(op2);
        } else if (Z_TYPE_P(op2) == IS_LONG) {
            min = (long double) Z_DVAL_P(op1);
            max = (long double) Z_LVAL_P(op2);
        } else {
            min = (long double) Z_DVAL_P(op1);
            max = (long double) Z_DVAL_P(op2);
        }

        if (min > max) {
            zend_throw_error(NULL, "Min should be less than (or equal to) max");
            return FAILURE;
        }

        size = max - min;

        if (size >= HT_MAX_SIZE - 1) {
            zend_throw_error(NULL, "Range size is too large");
            return FAILURE;
        }

        // we cast the size to an integer to get rid of the decimal places,
        // since we only care about whole number sizes
        size = (int) size + 1;

        Z_TYPE_INFO(tmp) = IS_DOUBLE;

        array_init_size(result, size);
        zend_hash_real_init(Z_ARRVAL_P(result), 1);
        ZEND_HASH_FILL_PACKED(Z_ARRVAL_P(result)) {
            for (i = 0; i < size; ++i) {
                Z_DVAL(tmp) = min + i;
                ZEND_HASH_FILL_ADD(&tmp);
            }
        } ZEND_HASH_FILL_END();
    } else {
        zend_throw_error(NULL, "Unsupported operand types - only ints and floats are supported");
        return FAILURE;
    }

    return SUCCESS;
}
/* }}} */
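The size check in the integer branch is easy to get subtly wrong, so it may help to see it isolated from the Zend API. The sketch below is a standalone approximation: toy_range_size, its status codes, and TOY_MAX_SIZE (standing in for HT_MAX_SIZE) are hypothetical names. It computes the span in unsigned arithmetic so that extreme bounds cannot overflow a signed subtraction.

```c
#include <stdint.h>

/* Hypothetical cap standing in for HT_MAX_SIZE in this sketch. */
#define TOY_MAX_SIZE UINT32_C(0x80000000)

typedef enum {
    TOY_RANGE_OK,
    TOY_RANGE_MIN_GT_MAX,
    TOY_RANGE_TOO_LARGE
} toy_range_status;

/* Computes the number of elements in the inclusive range [min, max].
 * On success, the element count is written to *size. */
toy_range_status toy_range_size(int64_t min, int64_t max, uint32_t *size)
{
    uint64_t span;

    if (min > max) {
        return TOY_RANGE_MIN_GT_MAX;
    }

    /* One less than the element count. Subtracting as unsigned values is
     * well defined even for INT64_MIN and INT64_MAX. */
    span = (uint64_t) max - (uint64_t) min;

    /* Reserve room for the +1 below, mirroring the HT_MAX_SIZE - 1 check. */
    if (span >= TOY_MAX_SIZE - 1) {
        return TOY_RANGE_TOO_LARGE;
    }

    *size = (uint32_t) span + 1;
    return TOY_RANGE_OK;
}
```

For the bounds 1 and 100 this yields a size of 100, and for equal bounds it yields 1, since the range is inclusive at both ends.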
The function prototype contains two new macros: ZEND_API and ZEND_FASTCALL. ZEND_API is used to control the visibility of functions by making them available to extensions that are compiled as shared objects. ZEND_FASTCALL is used to ensure a more efficient calling convention is used, where the first two arguments will be passed using registers rather than the stack (more relevant to 32-bit builds than 64-bit builds on x86).
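As a rough illustration of what such macros expand to, here is a sketch of two stand-ins. TOY_API and TOY_FASTCALL are hypothetical names, and the conditions are only loosely modelled on how compilers expose these features; the real definitions in the Zend headers cover more compilers and versions.

```c
/* Hypothetical stand-in for ZEND_API: export the symbol from a shared
 * library on GCC-compatible compilers, and expand to nothing elsewhere. */
#if defined(__GNUC__) && __GNUC__ >= 4
# define TOY_API __attribute__((visibility("default")))
#else
# define TOY_API
#endif

/* Hypothetical stand-in for ZEND_FASTCALL: request a register-based
 * calling convention on 32-bit x86, and expand to nothing elsewhere. */
#if defined(__GNUC__) && defined(__i386__)
# define TOY_FASTCALL __attribute__((fastcall))
#elif defined(_MSC_VER) && defined(_M_IX86)
# define TOY_FASTCALL __fastcall
#else
# define TOY_FASTCALL
#endif

/* The prototype, as it would appear in a header file. */
TOY_API int TOY_FASTCALL toy_in_order(long min, long max);

/* The definition repeats the same macros so that both declarations agree
 * on visibility and calling convention. */
TOY_API int TOY_FASTCALL toy_in_order(long min, long max)
{
    return min <= max;
}
```

On a 64-bit build the second macro expands to nothing in this sketch, since the default 64-bit calling conventions already pass the first few arguments in registers.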
The function body is very similar to what we had in the Zend/zend_vm_def.h file in the previous article. The VM-specific stuff is no longer present: the HANDLE_EXCEPTION macro calls have been replaced with return FAILURE;, and the ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION macro calls have been removed entirely (this check and operation needs to stay in the VM, and so the macro will be invoked from the VM code later).
Another noteworthy difference is that we're applying ZVAL_DEREF to both operands to ensure that references are handled properly. This was something that was previously done inside of the VM using the pseudo-macro GET_OPn_ZVAL_PTR_DEREF, but has now been shifted into this function. This was done not because it is needed at compile time (since for compile time handling, both operands would have to be literals, and they cannot be referenced), but because it enables other places inside the codebase to safely invoke range_function without having to worry about reference handling. As such, reference handling is performed by most of the operator functions instead of in their VM opcode definition (except where performance matters).
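To illustrate why dereferencing inside the operator function makes it safe to call from anywhere, here is a minimal standalone model. The toy_val type, the TOY_DEREF macro, and toy_both_long are hypothetical and only approximate the idea behind zvals and ZVAL_DEREF: a value may be a reference wrapper, and the macro repoints the local pointer at the wrapped value.

```c
#include <stddef.h>

typedef enum { TOY_IS_LONG, TOY_IS_DOUBLE, TOY_IS_REFERENCE } toy_type;

/* A tagged value that is either a number or a reference to another value. */
typedef struct toy_val {
    toy_type type;
    union {
        long lval;
        double dval;
        struct toy_val *ref;
    } u;
} toy_val;

/* Hypothetical analogue of ZVAL_DEREF: if v points at a reference, make it
 * point at the referenced value instead. Only one level is followed. */
#define TOY_DEREF(v) do {                    \
        if ((v)->type == TOY_IS_REFERENCE) { \
            (v) = (v)->u.ref;                \
        }                                    \
    } while (0)

/* Reports whether both operands are integers, looking through references.
 * Callers do not need to dereference anything themselves. */
int toy_both_long(toy_val *op1, toy_val *op2)
{
    TOY_DEREF(op1);
    TOY_DEREF(op2);

    return op1->type == TOY_IS_LONG && op2->type == TOY_IS_LONG;
}
```

Because the macro only reassigns the function's local pointers, the caller's values are left untouched, and passing either a plain value or a reference to it gives the same answer.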
Lastly, we must add the range_function prototype to the Zend/zend_operators.h file:
ZEND_API int ZEND_FASTCALL range_function(zval *result, zval *op1, zval *op2);
Now we must once again update the Zend VM to handle the execution of the ZEND_RANGE opcode during runtime. Place the following code in Zend/zend_vm_def.h (at the bottom):
ZEND_VM_HANDLER(182, ZEND_RANGE, CONST|TMPVAR|CV, CONST|TMPVAR|CV)
{
    USE_OPLINE
    zend_free_op free_op1, free_op2;
    zval *op1, *op2;

    SAVE_OPLINE();
    op1 = GET_OP1_ZVAL_PTR(BP_VAR_R);
    op2 = GET_OP2_ZVAL_PTR(BP_VAR_R);

    range_function(EX_VAR(opline->result.var), op1, op2);

    FREE_OP1();
    FREE_OP2();
    ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION();
}
(Again, the opcode number must be one greater than the current highest opcode number, which can be seen at the bottom of the Zend/zend_vm_opcodes.h file.)
The definition this time is far shorter since all of the work is handled in range_function. We simply invoke this function, passing in the result operand of the current opline to hold the computed value. The exception checks and skipping onto the next opcode that were removed from range_function are still handled in the VM by the call to ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION at the end. Also, as mentioned previously, we avoid handling references in the VM by using the GET_OPn_ZVAL_PTR pseudo-macros (rather than GET_OPn_ZVAL_PTR_DEREF).
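The division of labour described above (shared operator logic in a plain function, with the exception check and the move to the next instruction left to the VM) can be sketched with a toy handler. Everything here is hypothetical: toy_vm, toy_opline, and the two functions are simplified stand-ins, and the operator computes only the span between two bounds rather than building an array.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    long op1, op2;        /* operands of this instruction */
    unsigned long result; /* slot that receives the computed value */
} toy_opline;

typedef struct {
    toy_opline *opline;   /* instruction currently being executed */
    bool exception;       /* set when an operator function reports failure */
} toy_vm;

/* Shared operator logic with no knowledge of the VM. It could equally be
 * called at compile time. Returns 0 on success and -1 on failure. */
int toy_range_span(unsigned long *result, long min, long max)
{
    if (min > max) {
        return -1;
    }
    *result = (unsigned long) max - (unsigned long) min;
    return 0;
}

/* VM handler: delegates the work, then performs the VM-only steps.
 * Returns true when execution may continue with the next instruction. */
bool toy_range_handler(toy_vm *vm)
{
    toy_opline *opline = vm->opline;

    if (toy_range_span(&opline->result, opline->op1, opline->op2) != 0) {
        vm->exception = true;
    }

    /* Check for a pending exception before advancing. */
    if (vm->exception) {
        return false;
    }
    vm->opline++;
    return true;
}
```

The handler stays short because it contains no operator logic of its own, which is the same reason the ZEND_RANGE definition shrank.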
Now regenerate the VM by executing the Zend/zend_vm_gen.php file.
Lastly, the pretty printer needs updating in the Zend/zend_ast.c file once again. Update the precedence table comment by specifying the new operator to have a priority of 170 (line ~520):
* 170 non-associative == != === !== |>
Then, insert a case statement into the zend_ast_export_ex function to handle the ZEND_RANGE opcode in the ZEND_AST_BINARY_OP case statement (line ~1300):
case ZEND_RANGE: BINARY_OP(" |> ", 170, 171, 171);
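The three numbers passed to BINARY_OP are the operator's own priority followed by the priorities demanded of its left and right operands. The following standalone sketch shows how such numbers can drive parenthesisation when printing an expression tree. The toy_ast struct and both functions are hypothetical, and the output buffer is assumed to be a NUL-terminated string with a capacity of at least one byte.

```c
#include <stdio.h>
#include <string.h>

typedef struct toy_ast {
    const char *op;    /* operator text, or NULL for a literal leaf */
    int p, pl, pr;     /* own priority; priority demanded of left and right */
    long value;        /* literal value (leaves only) */
    const struct toy_ast *child[2];
} toy_ast;

/* Appends text to buf without writing past cap bytes. */
static void toy_append(char *buf, size_t cap, const char *text)
{
    size_t len = strlen(buf);

    if (len < cap) {
        snprintf(buf + len, cap - len, "%s", text);
    }
}

/* Prints ast into buf. priority is what the surrounding context demands:
 * if it exceeds the node's own priority, parentheses are required. */
void toy_export(char *buf, size_t cap, const toy_ast *ast, int priority)
{
    char num[32];

    if (ast->op == NULL) {
        snprintf(num, sizeof(num), "%ld", ast->value);
        toy_append(buf, cap, num);
        return;
    }

    if (priority > ast->p) {
        toy_append(buf, cap, "(");
    }
    toy_export(buf, cap, ast->child[0], ast->pl);
    toy_append(buf, cap, ast->op);
    toy_export(buf, cap, ast->child[1], ast->pr);
    if (priority > ast->p) {
        toy_append(buf, cap, ")");
    }
}
```

With the values 170, 171, 171, a range node nested directly inside another range node is always wrapped in parentheses on either side, which is consistent with the operator being non-associative.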
This article has shown an alternative way to implement the range operator, where the computation logic was shifted out of the VM. This has the advantage of allowing the range operator to be used in constant expression contexts.
The third part of this article series will build upon this implementation by covering how we can overload this operator. This will enable objects to be used as operands (such as those from the GMP library or those that implement a __toString method). It will also show how we can add proper support for strings (not like the support seen with PHP's current range function). But for now, I hope this has served as a nice demonstration of some of ZE's further aspects when implementing operators into PHP.