Modification during iteration #17
I'm leaning towards 1, as a modcount can also be used to make other shortcuts, such as having … The overhead is similar between 1 and 2: in the former you increment during updates and compare during validation, and in the latter you increment during iterator init but compare during updates.
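Option 1 can be sketched in plain PHP as follows. This is a minimal illustration, not the ext-ds implementation: the class ModCountList and its methods are hypothetical. Every mutation increments $modCount; the iterator records the value when iteration starts and compares it before yielding each element, throwing on a mismatch, in the spirit of Java's fail-fast iterators.

```php
<?php
// Hypothetical list with a modification counter (option 1); not the ext-ds API.
final class ModCountList implements IteratorAggregate, Countable
{
    /** @var array<int, mixed> */
    private $items = [];
    /** @var int Incremented on every mutation. */
    private $modCount = 0;

    public function push($value): void
    {
        $this->items[] = $value;
        $this->modCount++;
    }

    public function count(): int
    {
        return count($this->items);
    }

    public function getIterator(): Generator
    {
        // Captured when iteration starts (the generator body runs on rewind).
        $expected = $this->modCount;
        for ($i = 0; $i < count($this->items); $i++) {
            // Validate before handing out each element.
            if ($this->modCount !== $expected) {
                throw new RuntimeException('Collection modified during iteration');
            }
            yield $i => $this->items[$i];
        }
    }
}
```

Swapping the throw for trigger_error(..., E_USER_WARNING) would give a warning-only variant, though iteration would then continue over changed data.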
However, shouldn't sorting also be forbidden while iterating? If you use the same modcount for both purposes, this will not be properly enforced, right?
Sorting wouldn't be allowed, for obvious reasons. You could therefore add an element, then sort, and the iterator wouldn't know about it. So it's a flawed idea, unless sort uses a 'last sorted modcount' rather than updating the modcount. That was just an example, though: a potential bonus.
I think modifying values during iteration should be fine. The only concern here is modifying the actual collection, i.e., changing its size. I feel like it would be fairly straightforward to model the behaviour on Java's in this instance.
Going with the "please be sensible" option here: it's on the user for now, until this causes a problem that can't be avoided or is a big risk. Maybe 2.0 can see a modification guard that is incremented and decremented by the iterator, with every function that modifies the collection checking the guard first.
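A minimal sketch of such a guard (option 2), assuming a userland class; GuardedList and $activeIterators are hypothetical names. The iterator increments the counter when it starts and decrements it in a finally block, which PHP also runs when a suspended generator is destroyed (for example after a break), and every mutator checks the counter before changing anything.

```php
<?php
// Hypothetical list guarded by an active-iterator counter (option 2).
final class GuardedList implements IteratorAggregate
{
    /** @var array<int, mixed> */
    private $items;
    /** @var int Number of iterations currently in progress. */
    private $activeIterators = 0;

    public function __construct(array $items = [])
    {
        $this->items = array_values($items);
    }

    public function push($value): void
    {
        $this->assertNotIterating();
        $this->items[] = $value;
    }

    public function toArray(): array
    {
        return $this->items;
    }

    private function assertNotIterating(): void
    {
        if ($this->activeIterators > 0) {
            throw new LogicException('Cannot modify a collection while it is being iterated');
        }
    }

    public function getIterator(): Generator
    {
        $this->activeIterators++;
        try {
            foreach ($this->items as $i => $value) {
                yield $i => $value;
            }
        } finally {
            // Runs on completion, on exceptions, and when the generator is destroyed early.
            $this->activeIterators--;
        }
    }
}
```

One trade-off of this approach: an iterator that is started, kept alive and never finished keeps the collection locked.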
I think this does need to be addressed with explicit exceptions. I'm currently in the process of migrating to …

```php
$m = new \Ds\Map();
for ($i = 0; $i < 100000; ++$i) {
    $m[$i] = $i;
}
foreach ($m as $k => $v) {
    unset($m[$k]);
}
var_dump(count($m));
```

outputs …
I agree and will work to implement this ASAP. 👍
I'm sure I replied to your question outside of GitHub, but in case I didn't: 1) ...
@krakjoe I was actually thinking 2. We can assume that there will be more modifications than iterator instances, so I was thinking an …
Modcount:
Iterator count:
There's also the possibility of the modcount overflowing, though it would always be non-zero in that case, right?
Completely disallowing modifications during iteration sounds more desirable, because it makes it easier to find where a bug is coming from. If one part of user code adds a new hashtable key during iteration (without realizing it wasn't just overwriting) and the code only crashes later on when the next iterator …
That said, it's also possible to disallow only size-changing modifications if …
I agree with that, @dktapps.
So long as this is true:
It's all good...
I have a draft implementation here: 76eec53. This follows @dktapps' suggestion of disallowing only size-changing modifications, which makes sense to me. I don't think we should disallow all modifications, only those that would corrupt the iterator. The list I came up with and applied is:
Map:
Set:
Stack/Queue:
This allows us to still do stuff like:
I will leave this open for discussion, but will clean it up and release it at some stage if no changes are suggested. I will also need to update the tests and the polyfill, but those are easy to do.
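The size-changing rule can be sketched as below, assuming the iterator-count guard suggested earlier in the thread as the mechanism (the draft's actual mechanism and rule list are not reproduced here). SizeGuardedMap is hypothetical and only handles scalar keys: overwriting an existing key is allowed mid-iteration, while adding a new key or removing one throws. The PHP array backing is only for brevity; it already iterates a snapshot, so in this sketch the guard enforces the rule rather than protecting memory.

```php
<?php
// Hypothetical map: value updates are allowed during iteration,
// size-changing operations (new key, removal) are rejected.
final class SizeGuardedMap implements IteratorAggregate, ArrayAccess, Countable
{
    /** @var array<int|string, mixed> */
    private $data = [];
    /** @var int Number of iterations currently in progress. */
    private $activeIterators = 0;

    public function offsetExists($key): bool
    {
        return array_key_exists($key, $this->data);
    }

    #[\ReturnTypeWillChange]
    public function offsetGet($key)
    {
        return $this->data[$key];
    }

    public function offsetSet($key, $value): void
    {
        // Overwriting an existing key keeps the size unchanged, so it is always allowed.
        if (!array_key_exists($key, $this->data)) {
            $this->assertSizeMayChange();
        }
        $this->data[$key] = $value;
    }

    public function offsetUnset($key): void
    {
        $this->assertSizeMayChange();
        unset($this->data[$key]);
    }

    public function count(): int
    {
        return count($this->data);
    }

    private function assertSizeMayChange(): void
    {
        if ($this->activeIterators > 0) {
            throw new LogicException('Cannot change the size of a map during iteration');
        }
    }

    public function getIterator(): Generator
    {
        $this->activeIterators++;
        try {
            foreach ($this->data as $key => $value) {
                yield $key => $value;
            }
        } finally {
            $this->activeIterators--;
        }
    }
}
```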
This will be implemented in 2.0 using copy-on-write for internal buffers.
If I understand you correctly, @rtheunissen, that means the iteration behaviour will be consistent with arrays (iterating on a copy, modification of the original permitted). Is that right?
That is correct.
@dktapps it also means that a …
Still valid for 1.x.
Is it safe to use IteratorIterator($collection) while modifying $collection? As far as I know, ArrayIterator can do this, so why can't a Collection? Doesn't IteratorIterator create its own iterator, or does it use the collection's?
For example, if I want:
My keys implement Hashable, so if Collection cannot do this, I'll have to use ArrayIterator and, instead of … I just don't get why Collection can't simply do what ArrayIterator can.
This is because arrays are copy-on-write/persistent and ext-ds structures do not support this yet. When you iterate an array it increases its reference count, so that when you modify the array it actually creates a copy of the entire array and the modification is done on the copy. The iteration will still continue on the previous version of the array.
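A plain-PHP illustration of that behaviour, with no ext-ds involved: a by-value foreach keeps walking the version of the array it started with, while appends inside the loop go to a separated copy.

```php
<?php
$items = [1, 2, 3];
$seen = [];

foreach ($items as $value) {
    $seen[] = $value;
    // Appending separates $items from the array the loop is walking.
    $items[] = $value * 10;
}

echo implode(', ', $seen), "\n";  // 1, 2, 3
echo implode(', ', $items), "\n"; // 1, 2, 3, 10, 20, 30
```

A by-reference foreach (foreach ($items as &$value)) gets no such snapshot: it walks the live array and would also visit the appended elements.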
I think it behaves a bit like this:

```php
<?php
// Collection
$collection = [1, 2, 3];

// IteratorIterator
$generator = function (iterable $i) {
    yield from $i;
};

// new IteratorIterator
$iterator = $generator($collection);
var_dump($iterator->current());
```
I would strongly suggest against this pattern. The …
Because arrays are copy-on-write/persistent. ext-ds 2.0 will focus on persistence, but there are currently no plans to backport that to the 1.x branch. If you must modify a collection that may be iterated somewhere, you should …
I don't know if I'm doing this right, but following DDD I have an Entity class and an EntityId class, something like this:
In this case, …
@rtheunissen thank you for your replies, it was interesting and informative to read them 👍
FYI, I've just found out that in PHP 7.3 ArrayIterator has a bug and can't do this either, e.g.: https://3v4l.org/Ft0Dh
After returning an undefined value on …
@githubeing Please create a report on bugs.php.net. That UNKNOWN:0 is definitely a bug.
In your specific case, it seems like the hash is unique, so it could be used as a key, sure. It is not recommended when you cannot guarantee that a hash will be unique. @githubeing if you use a generator rather than an …
Done: https://bugs.php.net/bug.php?id=77903. You may vote for it if you want, btw.
Not sure I understand what you mean. In my specific case, I use the ArrayIterator as a cache to convert a given Generator to an Iterator which will also have the Map's …
The idea was to make a class that wraps a generator, lazily caches it, and fetches from the database only when the data is actually used. And then it prints the following (when all classes are loaded):
i.e. the entity isn't fetched until it is used directly.
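A minimal sketch of such a wrapper, assuming the goal is to pull each item from the generator only once, on demand, and to replay already-fetched items on later passes. LazyCachedIterable is a hypothetical name, a plain array stands in for the ArrayIterator cache mentioned above, and the database fetch is represented by an ordinary generator.

```php
<?php
// Hypothetical wrapper: consumes a generator lazily and caches what it has seen,
// so the sequence can be iterated more than once.
final class LazyCachedIterable implements IteratorAggregate
{
    /** @var Generator */
    private $source;
    /** @var array<int, array> Cached [key, value] pairs in source order. */
    private $cache = [];
    /** @var bool Whether the source generator has been started. */
    private $started = false;

    public function __construct(Generator $source)
    {
        $this->source = $source;
    }

    // Pulls exactly one more item from the source into the cache.
    private function fetch(): bool
    {
        if ($this->started) {
            $this->source->next();
        } else {
            $this->started = true; // valid() below runs the generator to its first yield
        }
        if (!$this->source->valid()) {
            return false;
        }
        $this->cache[] = [$this->source->key(), $this->source->current()];
        return true;
    }

    public function getIterator(): Generator
    {
        // Replay cached items first; fetch from the source only past the end of the cache.
        for ($i = 0; $i < count($this->cache) || $this->fetch(); $i++) {
            [$key, $value] = $this->cache[$i];
            yield $key => $value;
        }
    }
}
```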
@nikic, I've only just seen that you're the same person who fixed the bug 😂 cool 👍
I encountered this issue when I needed to conditionally remove some keys while iterating over a Map. The iteration using …
Until this is fixed, this should really be in the documentation! (I was looking at this extension's documentation on php.net. I don't know whether it's generated from something in this repo or whether such a note would have to be added elsewhere.) I was aware of the potential problem, but without a note in the documentation I assumed the foreach would either be unaffected or just get the new object. Instead, the foreach ended up encountering an int, even though no int had ever been in the Set in question. Truly undefined behavior indeed.
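Until then, one workaround is to keep structural changes out of the loop: collect the keys to remove first, then remove them in a second pass. removeWhere() below is a hypothetical helper, not part of the ext-ds API; it accepts any object that is both Traversable and ArrayAccess, such as ArrayObject or Ds\Map.

```php
<?php
/**
 * Hypothetical helper: removes every entry for which $shouldRemove($value, $key)
 * returns true. Keys are collected first and removed afterwards, so the
 * collection is never structurally modified while it is being iterated.
 *
 * @param ArrayAccess&Traversable $collection
 */
function removeWhere($collection, callable $shouldRemove): void
{
    $doomed = [];
    foreach ($collection as $key => $value) {
        if ($shouldRemove($value, $key)) {
            $doomed[] = $key; // a plain list, so object keys work too
        }
    }
    foreach ($doomed as $key) {
        unset($collection[$key]);
    }
}

$data = new ArrayObject(['a' => 1, 'b' => 2, 'c' => 3, 'd' => 4]);
removeWhere($data, function ($value) {
    return $value % 2 === 0;
});
echo implode(',', array_keys($data->getArrayCopy())), "\n"; // a,c
```

For a Ds\Map specifically, filter() returns a new map and avoids mutating during iteration altogether.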
@rtheunissen out of curiosity, how are you planning to implement COW? The only solution I can think of is to make the modification APIs fluent (return a modified copy, or self if refcount == 1) and get rid of in-place modification (except for structures that specifically want to be modified in place).
@dktapps internally we track the refcount of the internal structure attached to each object, and we copy that structure as necessary on write.
@rtheunissen My question was aimed at the PHP reference side of things. As far as I know, we can't easily mutate variable referencing … It's possible my understanding of PHP internals is flawed, so this may not apply at all.
In code:

```php
$vec = new Vector();
$vec2 = $vec;
$vec->push(1);   // here $vec would need to be changed to a new Vector
var_dump($vec2); // should be empty
```
Each of those objects will hold a reference to an internal vector, and only that internal vector will be copied. The values aren't stored in or alongside the PHP object itself. Two vectors can share the same internal vector.
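A rough userland model of that description, with hypothetical names: each CowVector object is a thin handle pointing at a shared Buffer, whose $owners field stands in for the internal refcount. Cloning a handle only shares the buffer, and the first write through a shared buffer detaches a private copy. The second handle is created with clone here.

```php
<?php
// Hypothetical model of copy-on-write: PHP-level objects are thin handles,
// the data lives in a shared Buffer that is copied only when a shared one is written.
final class Buffer
{
    /** @var array<int, mixed> */
    public $items;
    /** @var int Number of CowVector handles pointing at this buffer. */
    public $owners = 1;

    public function __construct(array $items)
    {
        $this->items = $items;
    }
}

final class CowVector
{
    /** @var Buffer */
    private $buffer;

    public function __construct(array $items = [])
    {
        $this->buffer = new Buffer(array_values($items));
    }

    public function __clone()
    {
        // The clone points at the same Buffer; record the extra owner.
        $this->buffer->owners++;
    }

    public function __destruct()
    {
        $this->buffer->owners--;
    }

    public function push($value): void
    {
        if ($this->buffer->owners > 1) {
            // Shared: detach a private copy before writing.
            $this->buffer->owners--;
            $this->buffer = new Buffer($this->buffer->items);
        }
        $this->buffer->items[] = $value;
    }

    public function toArray(): array
    {
        return $this->buffer->items;
    }

    public function sharesBufferWith(CowVector $other): bool
    {
        return $this->buffer === $other->buffer;
    }
}
```

The same idea extends to iteration: if an iterator also held a counted reference to the buffer, writes made during the loop would detach a fresh copy while the iterator kept walking the old one.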
For v1: what about implementing … This issue is the biggest thing stopping me from adopting ds fully, and it doesn't look like v2 is coming anytime soon, so my attention is focused on v1. (I notice the polyfill already does this using generators. For example, …)
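The generator approach can be sketched for a userland collection as below. SnapshotVector is hypothetical, and this is not necessarily how the polyfill implements it: getIterator() assigns the backing array to a local variable before yielding, so the generator keeps walking that snapshot (PHP arrays are copy-on-write, so nothing is copied until a write happens) and the collection can be modified inside the loop.

```php
<?php
// Hypothetical collection whose iterator walks a snapshot of its contents.
final class SnapshotVector implements IteratorAggregate, Countable
{
    /** @var array<int, mixed> */
    private $items;

    public function __construct(array $items = [])
    {
        $this->items = array_values($items);
    }

    public function push($value): void
    {
        $this->items[] = $value;
    }

    public function count(): int
    {
        return count($this->items);
    }

    public function getIterator(): Generator
    {
        // Assigning shares the array; a later push() separates $this->items,
        // leaving $snapshot unchanged for the rest of this iteration.
        $snapshot = $this->items;
        foreach ($snapshot as $index => $value) {
            yield $index => $value;
        }
    }
}

$vector = new SnapshotVector([1, 2, 3]);
foreach ($vector as $value) {
    if ($value === 1) {
        $vector->push(4); // allowed: the loop does not see the new element
    }
}
echo count($vector), "\n"; // 4
```

The trade-off matches arrays: changes made during the loop are not seen by the loop itself.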
I agree. It would be nice. See #166.
It's currently possible to modify or update a collection during iteration. This leads to undefined behaviour, because the internal iterator is not aware of these modifications. There are three possible solutions here:
Could also just raise a warning instead of throwing an exception.
@krakjoe, curious to hear your thoughts. 💡