-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Remove find type assertion to allow other iterables #16110
Conversation
nnzA = countnz(A) | ||
I = similar(A, Int, nnzA) | ||
function find(A) | ||
cA = typeof(A) <: AbstractArray ? A : collect(A) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think calling collect
is a good idea. For example, find(1:10000000)
is going to kill your computer, while the current code works perfectly fine. People will have to call collect
manually if they want to work on iterables on which one can only pass over once.
So just remove the ::AbstractArray
assertion and replace similar(Int, nnzA)
with Vector{Int}(nnzA)
, and you should be fine.
(BTW: it's always better to avoid branches on types inside functions, and create a separate method which calls the more specific one after calling collect
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I had conditioned on the type to avoid unnecessarily collect
ing. (Side note: in your example, UnitRange <: AbstractArray
, so it wouldn't get collect
ed. But I get what you mean.) Great idea regarding Vector
, I'll do that. Thanks so much for the feedback! 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Side note: in your example, UnitRange <: AbstractArray, so it wouldn't get collected. But I get what you mean.)
Good catch. Anyway, you get the idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to get in my daily dose of pedantry somehow 😄
Huh, it errored on OS X but it looks like it just timed out. Will close and reopen the PR. |
nnzA = countnz(A) | ||
I = similar(A, Int, nnzA) | ||
I = Vector{Int}(nnzA) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this change the dimensionality of the output when the input is an AbstractArray
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think find
always returns a vector: this is what similar
did as a single dimension argument was passed. Anyway it doesn't make sense to keep the size of the input array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah these methods with size arguments to similar
always make me scratch my head in terms of what they're for. I guess similar(::SparseVector, T, n)
could return something other than Vector
, but wouldn't make too much sense to use that for find
.
LGTM. OK to merge? |
Thanks! |
@@ -362,6 +362,14 @@ a = [0,1,2,3,0,1,2,3] | |||
@test findprev(isodd, [2,4,5,3,9,2,0], 7) == 5 | |||
@test findprev(isodd, [2,4,5,3,9,2,0], 2) == 0 | |||
|
|||
# find with general iterables | |||
s = "julia" | |||
@test find(s) == [1,2,3,4,5] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test conflicts directly with #16024. I also think that this is bizarre behavior:
julia> find("foo\0bar")
6-element Array{Int64,1}:
1
2
3
5
6
7
Sure, that's what falls out of the definition, but how is this a useful or meaningful operation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential solutions:
- change
a != 0
to!isequal(a, 0)
; won't find -0.0 - change
a != 0
to!(isequal(a, 0) | isequal(a, -0.0))
; weird and maybe inefficient - change
a != 0
toa != convert(eltype(A), 0)
The last retains the bizarre behavior for strings of finding the indices of all characters but '\0'
; the former two would at least lose that weird behavior by definition, which is nice, I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah crap, I didn't realize this was getting the non-null indices; I thought it was getting all indices. (Though in retrospect it should have been obvious that it would do this. 😕) I definitely should have added a test for that. This behavior isn't what I had intended.
Would it make sense to add a separate method for strings that just does something like collect(1:endof(s))
? For things like Base.UTF8proc.GraphemeIterator
it would still do the string-to-int comparison and return all indices. That would only become an issue if the NotComparableError
thing happens.
What do you think about that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vote for either keeping the current behaviour, or adding a method just to raise an error. Do we have a use case for this (it didn't work until now...)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't have a use case in mind for find("string")
; I was primarily interested in opening up find(testf::Function, A)
for strings. Returning all indices for a string makes sense to me though, at least more so than returning indices of non-null bytes. I suppose that isn't so bad, just kind of weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but returning all valid indices isn't the definition of find
, so I don't think it's a good idea to use it for that. If needed, we could imagine using the new function from #16260 for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, find
finds the indices of all numerically nonzero values; the only reason why it currently gets non-null indices for strings is that 0 == '\0'
is true. If/when integer-character comparison goes away, it will fail for strings anyway. I'd be fine making find("string")
a MethodError
. What do you think, @StefanKarpinski?
This implements and tests #16022.