Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Remove find type assertion to allow other iterables #16110

Merged
merged 3 commits into from
May 4, 2016

Conversation

ararslan
Copy link
Member

This implements and tests #16022.

nnzA = countnz(A)
I = similar(A, Int, nnzA)
function find(A)
cA = typeof(A) <: AbstractArray ? A : collect(A)
Copy link
Member

@nalimilan nalimilan Apr 29, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think calling collect is a good idea. For example, find(1:10000000) is going to kill your computer, while the current code works perfectly fine. People will have to call collect manually if they want to work on iterables on which one can only pass over once.

So just remove the ::AbstractArray assertion and replace similar(Int, nnzA) with Vector{Int}(nnzA), and you should be fine.

(BTW: it's always better to avoid branches on types inside functions, and create a separate method which calls the more specific one after calling collect.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had conditioned on the type to avoid unnecessarily collecting. (Side note: in your example, UnitRange <: AbstractArray, so it wouldn't get collected. But I get what you mean.) Great idea regarding Vector, I'll do that. Thanks so much for the feedback! 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Side note: in your example, UnitRange <: AbstractArray, so it wouldn't get collected. But I get what you mean.)

Good catch. Anyway, you get the idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to get in my daily dose of pedantry somehow 😄

@ararslan
Copy link
Member Author

ararslan commented Apr 30, 2016

Huh, it errored on OS X but it looks like it just timed out. Will close and reopen the PR.

@ararslan ararslan closed this Apr 30, 2016
@ararslan ararslan reopened this Apr 30, 2016
nnzA = countnz(A)
I = similar(A, Int, nnzA)
I = Vector{Int}(nnzA)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this change the dimensionality of the output when the input is an AbstractArray ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think find always returns a vector: this is what similar did as a single dimension argument was passed. Anyway it doesn't make sense to keep the size of the input array.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah these methods with size arguments to similar always make me scratch my head in terms of what they're for. I guess similar(::SparseVector, T, n) could return something other than Vector, but wouldn't make too much sense to use that for find.

@nalimilan
Copy link
Member

LGTM. OK to merge?

@Keno Keno merged commit 7a4d48a into JuliaLang:master May 4, 2016
@ararslan ararslan deleted the find_iterable branch May 4, 2016 22:47
@nalimilan
Copy link
Member

Thanks!

@@ -362,6 +362,14 @@ a = [0,1,2,3,0,1,2,3]
@test findprev(isodd, [2,4,5,3,9,2,0], 7) == 5
@test findprev(isodd, [2,4,5,3,9,2,0], 2) == 0

# find with general iterables
s = "julia"
@test find(s) == [1,2,3,4,5]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test conflicts directly with #16024. I also think that this is bizarre behavior:

julia> find("foo\0bar")
6-element Array{Int64,1}:
 1
 2
 3
 5
 6
 7

Sure, that's what falls out of the definition, but how is this a useful or meaningful operation?

Copy link
Member

@StefanKarpinski StefanKarpinski May 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential solutions:

  • change a != 0 to !isequal(a, 0); won't find -0.0
  • change a != 0 to !(isequal(a, 0) | isequal(a, -0.0)); weird and maybe inefficient
  • change a != 0 to a != convert(eltype(A), 0)

The last retains the bizarre behavior for strings of finding the indices of all characters but '\0'; the former two would at least lose that weird behavior by definition, which is nice, I guess.

Copy link
Member Author

@ararslan ararslan May 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah crap, I didn't realize this was getting the non-null indices; I thought it was getting all indices. (Though in retrospect it should have been obvious that it would do this. 😕) I definitely should have added a test for that. This behavior isn't what I had intended.

Would it make sense to add a separate method for strings that just does something like collect(1:endof(s))? For things like Base.UTF8proc.GraphemeIterator it would still do the string-to-int comparison and return all indices. That would only become an issue if the NotComparableError thing happens.

What do you think about that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote for either keeping the current behaviour, or adding a method just to raise an error. Do we have a use case for this (it didn't work until now...)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't have a use case in mind for find("string"); I was primarily interested in opening up find(testf::Function, A) for strings. Returning all indices for a string makes sense to me though, at least more so than returning indices of non-null bytes. I suppose that isn't so bad, just kind of weird.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but returning all valid indices isn't the definition of find, so I don't think it's a good idea to use it for that. If needed, we could imagine using the new function from #16260 for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, find finds the indices of all numerically nonzero values; the only reason why it currently gets non-null indices for strings is that 0 == '\0' is true. If/when integer-character comparison goes away, it will fail for strings anyway. I'd be fine making find("string") a MethodError. What do you think, @StefanKarpinski?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants