Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge deleteat() into delete!() and deprecate it #20531

Closed
wants to merge 1 commit into from

Conversation

nalimilan
Copy link
Member

The generic definition of delete!() suits deleteat!() very well, and
the signatures do not conflict.

I have kept the internal _deleteat! function in base/array.jl since it mirrors jl_array_del_at from src/array.c, which itself is part of a consistent C API, with jl_array_grow_at, jl_array_grow_at_end, jl_array_grow_at_beg, jl_array_del_at_beg and jl_array_del_at_end.

I wasn't sure where to put the generic docstring, suggestions welcome. In particular, it would be great to find a way to have it appear before the more specific ones (which wasn't the case either before moving it out of helpdb).

@nalimilan nalimilan force-pushed the nl/deleteat2 branch 2 times, most recently from 8d65983 to a6d4207 Compare February 8, 2017 21:53
@nalimilan
Copy link
Member Author

(Damn, I thought I could guess the PR # before opening it, but Steven opened a new one a few seconds before I pushed the button...)

@StefanKarpinski
Copy link
Member

I'm left wondering what the original reason for having deleteat! and delete! as separate functions was... The litmus test I use for this sort of thing is to see if we can describe what this operation does in a generic but sensible way that applies to all relevant types.

@nalimilan
Copy link
Member Author

Yes, I've used the same test and it works pretty well (see docstring).

The PR which introduced deleteat! (in 0.3) was #5176.

@ararslan ararslan added the deprecation This change introduces or involves a deprecation label Feb 8, 2017
@StefanKarpinski
Copy link
Member

@tkelman points out that deleteat! shortens the integer-indexed collection, whereas delete! does not. I believe that's why these were made separate functions. cc @kmsquire, @JeffBezanson – do either of you remember the details of this?

@StefanKarpinski
Copy link
Member

To quote myself from over here:

The thinking behind deleteat! being separate from delete! is coming back to me and IIRC, it was precisely that delete! didn't make sense in an indexed collection because it has the effect of change all subsequent "keys" – i.e. indices. The delete! function, on the other hand only affects the key you ask to delete, which applies to sparse matrices/vectors and dicts, but not arrays.

@kmsquire
Copy link
Member

kmsquire commented Feb 9, 2017

In #5118 and #3023, it was argued that Vectors sometimes implement set operations, and therefore should support both delete! and deleteat!. Looking back now, I don't find that reasoning particularly compelling (Vectors aren't sets, even if they sometimes act like them), but @mlubin was the one who suggested this in #3023, so maybe he can justify it more.

@nalimilan
Copy link
Member Author

I don't find this argument convincing either. I don't see what real-world code would accept an argument without paying attention to wether it's an array or a dict, call delete! on it, and expect all other indices to remain valid. This combination of steps is very unlikely IMHO.

Regarding the specific case of sets mentioned at #3023, @mlubin said himself that the issue would be fixed by having union return sets when passing sets. More fundamentally, it seems to me that the abuse is to be allowed to write e.g. delete!(Set([1]), 1) to remove value 1 from sets. Indeed, delete!(c, k) is documented to remove entry with key k, not entry with value k. Yet, one cannot index a set with one of its values, nor is keys or indices defined for sets. The behavior of delete! on sets is completely undocumented.

So, if anything, delete!(::Set, k) should be deprecated in favor of a separate function like deleteval! or deleteeq! (for consistency with findeq from the Find & Search Julep). That function could be supported on vectors and would be equivalent to deleteeq!(x, vals) = delete!(x, findin(x, vals))

@StefanKarpinski
Copy link
Member

I believe the relevant view on sets is that they are like an associative mapping of each value to itself.

@mlubin
Copy link
Member

mlubin commented Feb 9, 2017

I don't have a strong opinion on this at the moment. I'll support the consensus.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Feb 9, 2017

FWIW, this is precisely one of the aspects of standard library APIs that I think needs review for 1.0.

@nalimilan
Copy link
Member Author

I believe the relevant view on sets is that they are like an associative mapping of each value to itself.

That makes sense, but then keys(::Set) and getindex(::Set, k) should be implemented. That would mean delete! can perfectly work for dicts, sets and arrays with a consistent definition (remove entry with key/index k).

@quinnj
Copy link
Member

quinnj commented Feb 9, 2017

FWIW, I don't think it'd be all that surprising as a user if delete! removed the index and the rest of the elements shifted. I feel like that's like an "interview 101" type of question, right? ("given this for-loop that modifies the collection it's iterating through...."). +1 to this change from me. If you're really concerned w/ preserving keys, you're using the wrong datastructure.

Are there performance costs for deleting and shifting though?

The generic definition of delete!() suits deleteat!() very well, and
the signatures do not conflict.
@StefanKarpinski
Copy link
Member

StefanKarpinski commented Feb 9, 2017

Even if it's unsurprising in some contexts, there's a data structure where these two generic actions conflict: sparse vectors. The delete! operation for sparse arrays is currently called dropstored!; it leaves all indices the same but deletes the values associated with the given keys, restoring the default value (zero). This is the same as for Dict (no default) and DefaultDict (chosen default). No other keys are affected. The deleteat! operation for sparse vectors should, by analogy to deleteat! for normal vectors, remove the given indices, shifting the rest of the indices down accordingly. There are two reasons this conflict isn't immediately apparent:

  1. delete! for sparse arrays has been called dropstored!
  2. deleteat! for sparse vectors hasn't been implemented.

@nalimilan
Copy link
Member Author

Are there performance costs for deleting and shifting though?

You can have a look at code touched by #20465: for vectors, deleteat! just copies entries one by one with a lag corresponding to the number of entries deleted up to the current position. So that's indeed much more work than for a dict AFAICT, except if the deleted entry is near the end of the vector.

@nalimilan
Copy link
Member Author

@StefanKarpinski I don't really buy that line of thinking. Sparse vectors differ from dicts in that their keys (indices) must be contiguous. So it's natural that removing an entry in the middle of the index range would shift subsequent entries down by one. Do you really suggest making dropstored! a delete!(::SparseVector, ...) method? IMHO that would be confusing.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Feb 9, 2017

So what's your proposal for the generic meaning of delete! then? I.e. one that makes sense for both associative collections and vectors.

@tkelman
Copy link
Contributor

tkelman commented Feb 9, 2017

Do you really suggest making dropstored! a delete!(::SparseVector, ...) method?

Yes, see discussion starting at #20516 (comment) - if delete! is considered primarily an Associative operation where you are removing the stored value at a single key and returning it to its default, without modifying anything else about the container (any other keys or values, size, etc) then dropstored! does fit pretty closely in the same abstraction as delete!.

deleteat! is more about linear queues, does mostly the same thing as splice! but without the replacement option and returning the collection instead of the deleted items. We don't define splice! on Associatives right now.

@nalimilan
Copy link
Member Author

So what's your proposal for the generic meaning of delete! then? I.e. one that makes sense for both associative collections and vectors.

I made a proposal in this PR, which could be improved to something like "Delete item(s) at the given key(s) or index(es) in a collection, and return the collection." Then we would specify what "delete item" means for AbstractVector in specific docstrings. I think that would be pretty natural (as expressed by several people in the previous discussions).

Though I see your point about merging dropstored! into delete! instead of exporting a new dedicated function.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Feb 9, 2017

IMO, that's just sweeping under the carpet whether it affects other keys or not, which is the crux of the question of whether these two functions can be semantically merged or not. And the sparse vector case is right in the middle of the semantic ambiguity: some people think it makes sense for delete! to shift the remain indices down, some people think it should leave them alone. You're basically saying that you get to make it up on a per-data structure basis, which seems like an inadequate definition to me.

@JeffBezanson
Copy link
Member

Also comes up in IndexedTables, where people want a delete! method for removing rows. It makes sense to convert among Arrays, IndexedTables, and Dicts, and have everything either work the same or not be defined. I could imagine delete! similarly removing single items for something like a NullableArray.

@nalimilan
Copy link
Member Author

OK, let's close this then.

@nalimilan nalimilan closed this Feb 9, 2017
@nalimilan nalimilan deleted the nl/deleteat2 branch February 9, 2017 20:49
@StefanKarpinski
Copy link
Member

FWIW, I don't love that we have two such similar functions as delete! and deleteat! – I'd love to figure out a point of view where these can be collapsed, but the sparse vector examples seems to preclude that.

@pabloferz
Copy link
Contributor

pabloferz commented Feb 9, 2017

If there is no foreseeable way to merge both functions, we might at least consider renaming deleteat! to shrink! or collapse!

@nalimilan
Copy link
Member Author

As @tkelman noted on the other issue, it looks like we need 2 of these three functions: deleteat!, delete!, dropstored!. Or we could (ab)use multiple dispatch by having delete(::SparseVector, ::Int) equivalent to deleteat!, and delete!(::SparseVector, ::Tuple{Int}) equivalent to dropstored!, considering that cartesian indices are keys for AbstractArray. That definition makes a lot of sense to me, but the change in behaviour is probably to radical for such a subtle difference.

@StefanKarpinski
Copy link
Member

Now that I look at it, the only difference between deleteat!(a, inds) and splice!(a, inds) is that the former returns a and the latter returns the deleted values. Of course constructing that collection of values to return is sometimes useful but expensive when not needed.

@ChristianKurz
Copy link
Contributor

ChristianKurz commented Sep 19, 2017

@pabloferez i do like renaming deleteat! to shrink!, but i think this moves the function closer to pop! and shift!.
In that case, I would expect it to return the deleted element, instead of the mutated array.

Edit: nevermind, i kind of missed Stefans' post.
So shouldn't we deprecate deleteat! in favor of splice! ?

Oh, and it would be nice to have some pointers in the docstrings of pop! and shift! to each other and to splice! - I could not guess the latter.

@StefanKarpinski
Copy link
Member

Oh, and it would be nice to have some pointers in the docstrings of pop! and shift! to each other and to splice! - I could not guess the latter.

That would be great, if you wouldn't mind making a PR adding them :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deprecation This change introduces or involves a deprecation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants