Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include the source location in Closure names #13550

Merged
merged 14 commits into from
Apr 12, 2024
Merged

Conversation

TimWolla
Copy link
Member

This change makes stack traces involving Closures, especially multiple
different Closures, much more useful, because it's more easily visible which
closure was called for a given stack frame.

The implementation is similar to that of anonymous classes which already
include the file name and line number within their generated classname.


/cc @derickr, as Xdebug includes a similar functionality and he might've opinions.

This change makes stack traces involving Closures, especially multiple
different Closures, much more useful, because it's more easily visible *which*
closure was called for a given stack frame.

The implementation is similar to that of anonymous classes which already
include the file name and line number within their generated classname.
if (op_array->fn_flags & ZEND_ACC_CLOSURE) {
zend_string *filename = op_array->filename;
uint32_t start_lineno = decl->start_lineno;
unqualified_name = zend_strpprintf(0, "{closure:%s:%" PRIu32 "}", ZSTR_VAL(filename), start_lineno);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something that may just be emitted for stack traces, instead of changing the closure name? Otherwise, this breaks de-duplication of file names for interned strings, storing the file name many many times. They can be very long too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the information readily available can also be useful for debuggers and tracers as outlined by my mention of Xdebug in the PR description and the adjustments to the phpdbg and observer tests.

How would this be different from including the filename for anonymous classes which is already done? Would truncating the filename to just the basename help? This would only slightly reduce the usefulness within the name, but could greatly reduce the memory consumption.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anonymous classes are arguably many factors less common than closures. The default interned string buffer is only 8MB, so this change may potentially fill a significant amount of this memory with repeating strings.

If Xdebug already solves this problem, is there an aspect of it that does not suffice? Similarly, it seems to me observers and phpdbg can access this information from C, and output it to the user.

Note that I'm not questioning the usefulness of displaying the information in general. This is very clearly a win in that regard.

Copy link
Member Author

@TimWolla TimWolla Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default interned string buffer is only 8MB, so this change may potentially fill a significant amount of this memory with repeating strings.

I don't know how it works, but is this not an issue with regular function and method names as well? I expect regular functions and methods to exceed the number of closures and to also have mostly distinct names. Or is this more about the length of the resulting function name where truncating the path to the basename would help reduce the impact?

If Xdebug already solves this problem, is there an aspect of it that does not suffice?

That's probably where the “readily available” bit comes into play. By natively storing the information it does not need to be reimplemented in every tool and it also avoids the redundant computation of the extended closure name in hot loops. It's not particularly expensive, but at will likely require repeated allocations of a zend_string (unless somehow cached, but then we're back to memory usage).

Copy link
Member

@iluuu1994 iluuu1994 Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how it works, but is this not an issue with regular function and method names as well? I expect regular functions and methods to exceed the number of closures and to also have mostly distinct names. Or is this more about the length of the resulting function name where truncating the path to the basename would help reduce the impact?

Right, the base name for all the file names is not deduplicated in the interned string buffer, nor are namespaces in class/function names, so there's already some redundancy. References to the same class/function are deduplicated. If a file contains hundreds of closures, the same name will be stored hundreds of time for just this file, one time for each closure.

Maybe this is a non-issue, it's just something to think about. Maybe we just need to bump the default interned string buffer size. 8MB is a fairly low amount. It would be good to know much this increases memory consumption for something like Symfony Demo, or Wordpress.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for checking! That's roughly +0.8%, which indeed doesn't sound too bad. Of course, it's hard to say whether this is representative, but at least we have a rough idea of the increase now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not sure, why do we need to keep the names for closures.
We may always reconstruct them, concatenating "filename" and "start_lineno" when necessary (in zend_fetch_debug_backtrace).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may always reconstruct them, concatenating "filename" and "start_lineno" when necessary (in zend_fetch_debug_backtrace).

The “full” name of the function is not just useful for backtraces. This PR also adds the name to the __debugInfo() output of closures, allowing users to easily see the closure origin when dumping a variable containing it. A reusable function for closure naming could be provided in php-src, but that would still require everyone to use it to achieve consistent naming results for the developer. By including it in the function name field there would be no inconsistencies. The additional function calls in various locations would also increase machine code size.

See also the second paragraph of my previous comment: #13550 (comment)


The question is whether the time-memory-tradeoff is worth it – both from a CPU time (for the repeated recomputation) and from a developer time (for the improved consistency) perspective.

Copy link
Member

@dstogov dstogov Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In production use we should rarely have calls to debug_backtrace and __debugInfo(), so that closure names are going to be never used (just kept in the memory). Both debug_backtrace and __debugInfo() already construct arrays and the additional name reconstruction won't make a big difference.

I think moving this name reconstruction code from zend_compile.c into zend_closures.c will only make the things more clear. This will also remove the new zend_string_copy()/zend_string_release() calls that you introduced there for each op_array.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've realized that adding the filename to the function name made the namespace effectively redundant. As Closures previously had a static “unqualified name” there should be no risk of collision either. Dropping the namespace reduced the total memory usage with the Symfony demo to

  ["interned_strings_usage"]=>
  array(4) {
    ["buffer_size"]=>
    int(8388608)
    ["used_memory"]=>
    int(5067336)
    ["free_memory"]=>
    int(3321272)
    ["number_of_strings"]=>
    int(48739)
  }

which is 209 additional interned strings and 33816 bytes of additional memory usage. Truncating the filename to just the basename would result in 13840 bytes of additional memory usage (but just the basename without the namespace might be too unspecific).

This change also allowed me to simplify the implementation in zend_compile.c to just an additional if (op_array->fn_flags & ZEND_ACC_CLOSURE) {, without touching the logic for regular functions at all.

What do you think?

This is redundant with the included filename.
@dstogov
Copy link
Member

dstogov commented Mar 11, 2024

Finally I'm not sure if this feature is necessary at all.

The description of the patch notices only the changes of the Closure names in stack traces. But stack traces already include file names and lines numbers. So it's npt a big problem to locate the closure. The compound closure names may make sense but really not necessary, especially if they eat memory.

Introduction of name in closure debug info (var_dump output) wasn't mentioned in comments and wasn't discussed. I think changes like this shouldn't be introduced silently. I'm not sure if RFC is required, but at least they should be checked carefully.

How will this new name filed will work in case a subclass of Closure had its own name property?

@Girgias
Copy link
Member

Girgias commented Mar 11, 2024

How will this new name filed will work in case a subclass of Closure had its own name property?

Closure is a final class, so this last point shouldn't be a concern.

@dstogov
Copy link
Member

dstogov commented Mar 11, 2024

Closure is a final class, so this last point shouldn't be a concern.

agree.

@TimWolla
Copy link
Member Author

So it's npt a big problem to locate the closure.

From experience, I disagree. While the information is technically there, one needs to look into the callee stack frame to get the line number of the closure and even then it's not easy to see if two closures within a single stack trace are the same or different, because multiple different lines might refer to the same closure if the closure somehow calls itself recursively.

Consider the following example:

<?php

function error() {
	throw new \Exception();
}

function cleanup(array $array) {
	return array_map(function ($item) {
		if (is_array($item)) {
			return cleanup($item);
		}
		if ($item === 'error') {
			error();
		}
		return $item;
	}, $array);
}

cleanup([['asd', 'error']]);

It results in this stack trace:

PHP Fatal error:  Uncaught Exception in /tmp/stack-trace/test2.php:4
Stack trace:
#0 /tmp/stack-trace/test2.php(13): error()
#1 [internal function]: {closure}()
#2 /tmp/stack-trace/test2.php(8): array_map()
#3 /tmp/stack-trace/test2.php(10): cleanup()
#4 [internal function]: {closure}()
#5 /tmp/stack-trace/test2.php(8): array_map()
#6 /tmp/stack-trace/test2.php(19): cleanup()
#7 {main}
  thrown in /tmp/stack-trace/test2.php on line 4

I see that the closures perform a call in lines 10 and 13 respectively. But I can't easily see whether that's the same closure or not. With this PR it looks like this:

Fatal error: Uncaught Exception in /tmp/stack-trace/test2.php:4
Stack trace:
#0 /tmp/stack-trace/test2.php(13): error()
#1 [internal function]: {closure:/tmp/stack-trace/test2.php:8}('error')
#2 /tmp/stack-trace/test2.php(8): array_map(Object(Closure), Array)
#3 /tmp/stack-trace/test2.php(10): cleanup(Array)
#4 [internal function]: {closure:/tmp/stack-trace/test2.php:8}(Array)
#5 /tmp/stack-trace/test2.php(8): array_map(Object(Closure), Array)
#6 /tmp/stack-trace/test2.php(19): cleanup(Array)
#7 {main}
  thrown in /tmp/stack-trace/test2.php on line 4

It's clearly visible that both closures are the closure defined in test2.php on line 8, despite appearing twice in the stack trace with different line numbers of the callee frame.

How will this new name filed will work in case a subclass of Closure had its own name property?

Besides Gina's correct statement that \Closure is final, it's also not a property. It's just an additional field in the __debugInfo() output.

@TimWolla
Copy link
Member Author

I'm not sure if RFC is required, but at least they should be checked carefully.

I'm happy to write an RFC for this, if you feel that this is necessary. Of course I'd prefer to avoid the work if possible.

@dstogov
Copy link
Member

dstogov commented Mar 11, 2024

I don't care about this change a lot and I don't need RFC myself.
I don't support this (in the current state), but I won't object if other reviewers accept this.

@iluuu1994
Copy link
Member

iluuu1994 commented Mar 14, 2024

Two questions:

  • For var_dump, would it be more useful to show the actual file and line as keys, or location as file:line if you'd like to avoid two properties? That avoids the {closure:...} clutter.
  • Iff the answer to the first question is "no": Fake closures already show the function on var_dump as 'function', should the key be consistent? https://3v4l.org/ZgWNF#v8.3.4

I'd still prefer avoiding changing the name and manually "fixing" this as part of zend_fetch_debug_backtrace, especially since zend_closure_get_debug_info required changes anyway. But I don't feel strongly enough about it to fight it. If you insist on this approach, I'm ok with it.

staabm added a commit to staabm/silverstripe-framework that referenced this pull request May 25, 2024
staabm added a commit to staabm/PocketMine-MP that referenced this pull request May 25, 2024
staabm added a commit to staabm/scheduler that referenced this pull request May 25, 2024
staabm added a commit to staabm/kahlan that referenced this pull request May 25, 2024
staabm added a commit to staabm/injector that referenced this pull request May 25, 2024
staabm added a commit to staabm/kahlan that referenced this pull request May 25, 2024
samsonasik pushed a commit to kahlan/kahlan that referenced this pull request May 25, 2024
* Adjust pretty name of closures on PHP 8.4

related to php/php-src#13550

see analog symfony change: symfony/symfony#54614

* php php_cs to 3.7.2 temporarily
cmb69 added a commit to krakjoe/uopz that referenced this pull request Aug 3, 2024
andypost added a commit to andypost/opentelemetry-php-instrumentation that referenced this pull request Sep 5, 2024
andypost added a commit to andypost/opentelemetry-php-instrumentation that referenced this pull request Sep 6, 2024
brettmc pushed a commit to open-telemetry/opentelemetry-php-instrumentation that referenced this pull request Sep 6, 2024
otel-php-bot pushed a commit to opentelemetry-php/ext-opentelemetry that referenced this pull request Sep 6, 2024
dktapps pushed a commit to pmmp/PocketMine-MP that referenced this pull request Nov 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants