Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip uninitialized typed properties when serializing objects #5396

Closed
wants to merge 1 commit into from

Conversation

nicolas-grekas
Copy link
Contributor

@nicolas-grekas nicolas-grekas commented Apr 15, 2020

Partially reverts 846b647: instead of throwing, this skips uninitialized typed properties when serializing objects.

This preserves the unserialize(serialize($obj)) == $obj identity.

Fixes bug https://bugs.php.net/79447

@nikic
Copy link
Member

nikic commented Apr 16, 2020

cc @TysonAndre

@nikic
Copy link
Member

nikic commented Apr 16, 2020

As before, I'm not really sure this is the right behavior, but I'm willing to accept it in the interest of pragmatism.

@nicolas-grekas
Copy link
Contributor Author

I'm willing to accept it in the interest of pragmatism.

that's all I need :)

I was wondering: can/should we narrow this case even further and restrict to properties that have no default value only? If yes, how do we achieve this?

@nicolas-grekas
Copy link
Contributor Author

I added a second commit to serialize uninitialized nullable typed properties with a default value to null. This means that such uninitialized properties will unserialize to null instead of the corresponding default value. This is a better approximation of the original state. WDYT?

ext/standard/var.c Outdated Show resolved Hide resolved
@nikic
Copy link
Member

nikic commented Apr 16, 2020

I added a second commit to serialize uninitialized nullable typed properties with a default value to null. This means that such uninitialized properties will unserialize to null instead of the corresponding default value. This is a better approximation of the original state. WDYT?

This would be inconsistent with normal serialization behavior though. I also don't like making the behavior dependent on whether the type is nullable or not.

ext/standard/var.c Outdated Show resolved Hide resolved
ext/standard/var.c Outdated Show resolved Hide resolved
prop = &prop_info->ce->default_properties_table[OBJ_PROP_TO_NUM(prop_info->offset)];
}
if (Z_ISUNDEF_P(prop) || !ZEND_TYPE_ALLOW_NULL(prop_info->type)) {
return SUCCESS;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is going to return success, it might make some sense to add a placeholder such as IS_UNDEF zend_hash_add(ht, name, undef_zval) when building this, then check for IS_UNDEF when using the resulting table

Otherwise, __sleep will inconsistently fail to warn about return ['prop', 'prop']; when prop is an uninitialized typed property

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would that not break on unserialize(), since that's the point of this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think trying to preserve that warning is worth the complications (using IS_UNDEF for this is not possible -- IS_UNDEF in hashtable means the element doesn't exist).

@nicolas-grekas
Copy link
Contributor Author

nicolas-grekas commented Apr 16, 2020

This would be inconsistent with normal serialization behavior though.

How that? The normal serialization behavior is to serialize a null when __sleep() returns the name of a missing property. That's consistent to me, no?

I also don't like making the behavior dependent on whether the type is nullable or not.

Isn't this the best we can do with the current serialization format?
The alternative (the 1st commit) is to skip these properties, which means unserializing them to they default value (when the originally serialized property was uninitialized). null is closer to the original to me, thus this proposal.

--FILE--
<?php

class Test {
public ?int $w = 123;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it will do something I wouldn't expect for an unset public array $notNullable = ['someValue'];

class Example {
    public array $notNullable = ['someValue'];
    public function __sleep() { return ['notNullable']; }  // also ['notNullable', 'notNullable'] should emit a notice.
}
$x = new Example();
unset($x->notNullable);
serialize($x);  // I'd expect this to throw because unserialize(serialize($x)) != $x, but this code looks like it won't
			if (Z_ISUNDEF_P(prop) || !ZEND_TYPE_ALLOW_NULL(prop_info->type)) {
				return SUCCESS;

Also, further behaviors would probably be more readable in individual test files.

} else {
prop = &prop_info->ce->default_properties_table[OBJ_PROP_TO_NUM(prop_info->offset)];
}
if (Z_ISUNDEF_P(prop) || !ZEND_TYPE_ALLOW_NULL(prop_info->type)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably document that in the case that ZEND_TYPE_ALLOW_NULL(prop_info->type), the serializer will later convert IS_UNDEF to null in the serialized value as the closest available equivalent.

@nikic
Copy link
Member

nikic commented Apr 16, 2020

This would be inconsistent with normal serialization behavior though.

How that? The normal serialization behavior is to serialize a null when __sleep() returns the name of a missing property. That's consistent to me, no?

I mean the normal, non-sleep behavior. The premise of this change, to me, is to align the behavior of sleep and non-sleep serialization, in that both would completely ignore uninitialized properties.

I also don't like making the behavior dependent on whether the type is nullable or not.

Isn't this the best we can do with the current serialization format?
The alternative (the 1st commit) is to skip these properties, which means unserializing them to they default value (when the originally serialized property was uninitialized). null is closer to the original to me, thus this proposal.

I don't think it's really closer. I guess the angle you're going for here is that "null" also makes "isset" return false. On the other hand, the default is what you get if you don't initialize the property, one could reasonably argue that is closest to the "uninitialized" state.

If you take into account that you can only run into this special case if a) the property is nullable b) has a specified default value and c) has been explicitly unset, the special case does not seem worthwhile.

@nicolas-grekas
Copy link
Contributor Author

nicolas-grekas commented Apr 16, 2020

Oops, bad push, reopening.

I don't think it's really closer. I guess the angle you're going for here is that "null" also makes "isset" return false. On the other hand, the default is what you get if you don't initialize the property, one could reasonably argue that is closest to the "uninitialized" state.

If you take into account that you can only run into this special case if a) the property is nullable b) has a specified default value and c) has been explicitly unset, the special case does not seem worthwhile.

OK, works for me, I removed the second commit, back to simpler behavior that doesn't depend on defaults nor nullable types.

@TysonAndre
Copy link
Contributor

As an example of what I was referring to for non-nullable properties - this change makes sense for some use cases, but not others.

The intent of the earlier change was to make sure that unserialize(serialize()) continued to be as close as possible to the original data (and start throwing when the data would be unrepresentable)

<?php
class A {
    public int $x;
    public function __sleep() {
        return ['x'];
    }
}
class B {
    public int $x = 2;
    public function __sleep() {
        return ['x'];
    }
}
$a = new A();
$a->x = 2;
unset($a->x);
serialize($a);  // should not throw

$b = new B();
unset($b->x);
serialize($b);  // should throw instead of serializing a representation of the properties that's different from the actual properties when unserializing

@nicolas-grekas
Copy link
Contributor Author

nicolas-grekas commented Apr 16, 2020

I see what you mean @TysonAndre. I'm going to let @nikic arbitrate this one.
Please, just one request if we go with throwing in some cases: that Error should be turned into an Exception. See my previous comment on the topic.

@TysonAndre
Copy link
Contributor

My personal preference would be to only do this in the case where the property default is IS_UNDEF (i.e. public int $x;, public ?stdClass $y;, etc).

That was definitely an oversight in my previous change.

For the other change (for everything in __sleep), I'd understand it, but I think this might introduce an inconsistency.
I haven't run the code, and I'll get back to this tonight, but:

  • public ?int $x = 1; would be serialized as NULL if there was no __sleep(), and with the latest revision of this change, would not be serialized at all if it was mentioned in __sleep(), which is inconsistent in a different way?

@TysonAndre
Copy link
Contributor

TysonAndre commented Apr 16, 2020

that Error should be turned into an Exception. See my previous comment on the topic.

I'm fine with that
Any specific exception subclass in mind? https://www.php.net/manual/en/spl.exceptions.php

@nicolas-grekas
Copy link
Contributor Author

Not any specific in mind. RuntimeException maybe?

@nikic
Copy link
Member

nikic commented Apr 16, 2020

Please, just one request if we go with throwing in some cases: that Error should be turned into an Exception. See my previous comment on the topic.

I think Error is already the correct exception type here: It's the same as you get when accessing an uninitialized property directly.

The issue is that you're trying to fix a Doctrine bug ... inside Symfony. If we decide that this should continue throwing, then this issue really needs to be addressed directly inside Doctrine, by fixing their __sleep implementation. If they have a throwing __sleep, that affects more than just Symfony.


The rest of this discussion seems to have circled back around to all the issue we already discussed in #5027 :)

@nikic
Copy link
Member

nikic commented Apr 16, 2020

The intent of the earlier change was to make sure that unserialize(serialize()) continued to be as close as possible to the original data (and start throwing when the data would be unrepresentable)

I think the core problem here is that this is already not true for plain serialization, no __sleep involved:

<?php
class A {
    public int $x = 2;
}
$a = new A;
unset($a->x);
var_dump(unserialize(serialize($a)));

This will return A { $a = 2 } rather than A { $a = undef }. As we discussed before, the only way to fix this problem is to actually change the serialization format to introduce a new U; type.

Given that this is already the behavior we have for non-sleep serialization, I feel like it's not overly awful to also adopt the same behavior when it comes to __sleep.

@nicolas-grekas
Copy link
Contributor Author

nicolas-grekas commented Apr 16, 2020

I think Error is already the correct exception type here: It's the same as you get when accessing an uninitialized property directly.

Accessing an uninitialized property directly is quite different from accessing it indirectly via unserialize(). To me that's not an Error anymore, but a failure to serialize. The function could just return false actually, that'd be fine for what it does.

The place where we serialize is barely related to Doctrine: that's a quite common code that computes a hash for an arbitrary value. If we want to write this logic in a failsafe and generic way, we should not have to skip any throwables...

@nikic
Copy link
Member

nikic commented Apr 16, 2020

Accessing an uninitialized property directly is quite different from accessing it indirectly via unserialize(). To me that's not an Error anymore, but a failure to serialize. The function could just return false actually, that'd be fine for what it does.

The place where we serialize is barely related to Doctrine: that's a quite common code that computes a hash for an arbitrary value. If we want to write this logic in a failsafe and generic way, we should not have to skip any throwables...

Okay, I can see both sides of the argument here. It depends on whether one wants to view this as "your __sleep implementation is broken, go fix it" or as "this object does not support serialization at this time".

@TysonAndre
Copy link
Contributor

As we discussed before, the only way to fix this problem is to actually change the serialization format to introduce a new U; type.

Yeah, I was thinking about that as serialize($data, ['serialize_undefined' => true]) (config setting name to be determined)

TysonAndre#14 starts working on the bare minimum to serialize and unserialize U; as a property just to see if it was reasonably doable, and is extremely broken and incomplete. I probably won't get around to working on it this month.

An alternative would be serialize($data, ['minimum_php_version_id' => '8.0']), but

  • Some developers might expect apps to work exactly the same way as before for serialize(unserialize()). If an option such as ['binary_format' => true] gets added to serialize(), they might want to use that without changing the way undeclared properties are serialized.
    (Not necessarily a great argument, but still an argument)

@cmb69 cmb69 added the Bug label Apr 17, 2020
Copy link
Member

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing this in conjunction with #5413, I think we should go forward with this change.

prop = &prop_info->ce->default_properties_table[OBJ_PROP_TO_NUM(prop_info->offset)];
}
if (Z_ISUNDEF_P(prop) || !ZEND_TYPE_ALLOW_NULL(prop_info->type)) {
return SUCCESS;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think trying to preserve that warning is worth the complications (using IS_UNDEF for this is not possible -- IS_UNDEF in hashtable means the element doesn't exist).

ext/standard/var.c Outdated Show resolved Hide resolved
Partially reverts 846b647: instead of
throwing, this skips uninitialized typed properties when serializing objects.

This preserves the `unserialize(serialize($obj)) == $obj` identity.

Fixes bug https://bugs.php.net/79447
Copy link
Member

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG from my side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants