adds the `nth` function for iterables #56580
base: master
""" | ||
nth(itr, n::Integer) | ||
Get the `n`th element of an iterable collection. Return `nothing` if not existing. |
Returning `nothing` makes it impossible to distinguish between "the `n`th element was `nothing`" and "there was no `n`th element". Perhaps return `Union{Nothing, Some}`?
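To make the `Union{Nothing, Some}` suggestion concrete, here is a minimal sketch under stated assumptions: `nth_some` is a hypothetical name (not the PR's API), it counts iterations starting from 1, and it wraps a found element in `Some` so that an element that *is* `nothing` stays distinguishable from a missing element.

```julia
# Hypothetical helper (not the PR's `nth`): return `Some(x)` for the n-th
# element produced by iterating `itr`, or `nothing` if there is no such element.
function nth_some(itr, n::Integer)
    n < 1 && return nothing
    y = iterate(itr)
    for _ in 2:n
        y === nothing && return nothing   # iterator ran out before reaching n
        y = iterate(itr, y[2])
    end
    return y === nothing ? nothing : Some(y[1])
end

xs = [1, nothing, 3]
nth_some(xs, 2)             # Some(nothing): the 2nd element exists and is `nothing`
nth_some(xs, 4)             # nothing: there is no 4th element
something(nth_some(xs, 3))  # 3, after unwrapping the Some
```

The trade-off is that every caller has to unwrap the result, e.g. with `something`.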
Fair point.
Should it be `Union{Nothing, Some}` even in cases where we know there can't be a `nothing` value in the iterator (for the sake of a uniform API)? E.g. the `Count` iterator, `Repeated` (with an element other than `nothing`), or `AbstractRange`s.
I think it should, otherwise it would be too confusing.
I would just throw an error if there is no `n`th element. There could also be a `default` argument as in `get`, where a user can pass a value that should be returned if no `n`th element exists.
I don't really follow the logic that the spirit of iterators is to return `nothing` in such cases?
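A rough sketch of that suggestion, assuming two hypothetical names: `nth_or`, which takes a `get`-style `default`, and `nth_strict`, which throws a `BoundsError`. The strict variant reuses `nth_or` with a private sentinel type, so an ordinary element is never mistaken for "not found".

```julia
# Hypothetical: return the n-th iterated element, or `default` if there is none.
function nth_or(itr, n::Integer, default)
    n < 1 && return default
    y = iterate(itr)
    for _ in 2:n
        y === nothing && return default
        y = iterate(itr, y[2])
    end
    return y === nothing ? default : y[1]
end

# Private sentinel type used only to signal "no n-th element".
struct NthNotFound end

# Hypothetical: like `nth_or`, but throw when the n-th element does not exist.
function nth_strict(itr, n::Integer)
    x = nth_or(itr, n, NthNotFound())
    x isa NthNotFound && throw(BoundsError(itr, n))
    return x
end

nth_or((c for c in "abc"), 5, '?')   # '?'
nth_strict([10, 20, 30], 2)          # 20
```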
Agreed, `nothing` is weird; your iterator can produce that. `Some` seems a bit technical and unfriendly? An error seems fine, and matches what `first([])` does.
I suppose it can't literally be a method of `get`, since it goes by enumeration, not keys:
```julia
julia> first(Dict('a':'z' .=> 'A':'Z'), 3)
3-element Vector{Pair{Char, Char}}:
 'n' => 'N'
 'f' => 'F'
 'w' => 'W'

julia> nth(Dict('a':'z' .=> 'A':'Z'), 3)
'w' => 'W'
```
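The "enumeration, not keys" point can be checked without the PR. This sketch assumes the naive `first(Iterators.drop(itr, n - 1))` definition suggested later in the thread: it picks the same pair as `first(d, 3)[3]`, whereas indexing the `Dict` is a key lookup.

```julia
d = Dict('a':'z' .=> 'A':'Z')

# Third pair in iteration order (the order itself is a Dict implementation detail).
third = first(Iterators.drop(d, 2))

third == first(d, 3)[3]   # true: both count iterations
d['c']                    # 'C': a key lookup, unrelated to iteration order
```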
base/iterators.jl
````julia
```
"""
nth(itr, n::Integer) = _nth(IteratorSize(itr), itr, n)
nth(itr::AbstractArray, n::Integer) = n > length(itr) ? nothing : itr[n]
````
This assumes one-based indexing. Perhaps do `itr[begin + n - 1]`.
You are absolutely correct.
Would something like `getindex(itr, nth(eachindex(IndexLinear(), itr), n))` be overkill?
And what about adding a specialization `nth(itr::AbstractRange, n::Integer) = getindex(itr, n)`?
I went with the probably-overkill approach; if it's too much, I'll revert to your suggestion.
`AbstractRange`s are not always one-based either, so that approach runs into the same issue.
From what I could gather, that is already handled by `getindex`, since it ends up calling
`unsafe_getindex(v::AbstractRange{T}, i::Integer) where T = convert(T, first(v) + (i - oneunit(i))*step_hp(v))`
which should be pretty much the same as `[begin + n - 1]`, unless I'm missing the point completely?
The line `nth(itr::AbstractRange, n) = getindex(itr, n)` will for sure fail on the axes of an `OffsetArray`. (In fact, it will first be ambiguous, as `n::Any` is less specific.)
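A small way to see the difference without OffsetArrays.jl: this sketch assumes `Base.IdentityUnitRange` (an internal, non-exported Base type whose indices equal its values) as a stand-in for an offset axis, and uses a hypothetical helper `nth_by_offset`. `begin + n - 1` finds the first element, while `getindex(r, 1)` is out of bounds.

```julia
# Stand-in for the axis of an offset array: indices and values are both 3:5.
r = Base.IdentityUnitRange(3:5)

# Hypothetical helper: the n-th element counted from the first index.
nth_by_offset(a::AbstractArray, n::Integer) = a[begin + n - 1]

firstindex(r)         # 3
nth_by_offset(r, 1)   # 3
nth_by_offset(r, 3)   # 5
# getindex(r, 1) throws a BoundsError, since 1 is not a valid index of r.
```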
I was overthinking it; I'll just stick with `[begin + n - 1]`. Sorry.
How would this compare to a more naive implementation like …?
add docs explaining interaction with `Stateful` iterators
change test to be `Any` vectors instead of tuples (actually way faster as well)
Seems like a lot of code. I reproduced the above benchmark here:

No strong position on whether this needs a name or not, but perhaps this first PR can focus on that, and let the implementation be just:
```julia
nth(itr, n::Integer) = first(Iterators.drop(itr, n-1))
nth(itr::AbstractArray, n::Integer) = itr[begin-1+n]
```
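For reference, the two-line version above runs as-is. Here it is under the hypothetical name `nth_naive`, with a check of its out-of-range behaviour: under this sketch the generic method throws an `ArgumentError` (from `first` on an empty iterator), while the array method throws a `BoundsError`, which matters for the error-handling discussion.

```julia
# The two-line implementation suggested above, renamed to avoid clashing with a real `nth`.
nth_naive(itr, n::Integer) = first(Iterators.drop(itr, n - 1))
nth_naive(itr::AbstractArray, n::Integer) = itr[begin - 1 + n]

nth_naive((x^2 for x in 1:5), 3)   # 9, via iteration
nth_naive([10, 20, 30], 2)         # 20, via indexing

# Out of range: the two methods fail with different exception types.
err(f) = try f(); nothing; catch e; e; end
err(() -> nth_naive((x for x in 1:3), 5)) isa ArgumentError   # true
err(() -> nth_naive([1, 2, 3], 5)) isa BoundsError            # true
```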
A lot of the code is for optimizing out-of-bounds checking. If we go with davidanthoff's suggestion of letting
I disagree with throwing an error. In cases where you don't know if an nth element exists, that forces a try-catch which is both slow and brittle. I would imagine that most ordered iterators with a known length support indexing, so this would probably mostly be used precisely when the length is unknown.
I think another consideration here is consistency: the other functions we have that take an individual element from an iterator are

I agree with @jakobnissen that in some situations being able to handle this without an exception would be nice, but on the flip side, I can also see scenarios where an error seems much better, in particular in interactive sessions where I might be playing around with some data and this function could be very useful. And especially in an interactive scenario it would be super inconvenient if

Maybe the best design would be to allow for both scenarios. Say something like `nth(itr, n, nothrow=false)`. So the default would be that an exception is thrown if the
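A minimal sketch of the "allow for both" idea, assuming `nothrow` is a keyword argument (the comment writes `nothrow=false`; whether it would be positional or keyword is not settled in the thread) and using the hypothetical name `nth_kw`. One consequence: the return type depends on a runtime `Bool`, so the inferred result will generally be a `Union` that includes `Nothing` unless the compiler can constant-propagate the flag.

```julia
# Hypothetical: throw by default, return `nothing` instead when `nothrow = true`.
function nth_kw(itr, n::Integer; nothrow::Bool = false)
    y = n < 1 ? nothing : iterate(itr)
    for _ in 2:n
        y === nothing && break            # iterator exhausted early
        y = iterate(itr, y[2])
    end
    if y === nothing
        nothrow && return nothing
        throw(BoundsError(itr, n))
    end
    return y[1]
end

nth_kw(1:3, 2)                  # 2
nth_kw(1:3, 5; nothrow = true)  # nothing
# nth_kw(1:3, 5) throws a BoundsError
```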
We could also opt for relying on the
Although I see the similarity with
the error in `lastindex(a::AbstractArray) = (@inline; last(eachindex(IndexLinear(), a)))  # equals to last(OneTo(0))`. Similarly, both

From this comes my idea that, in principle, iterators are non-throwing by default, and any throwing should be done one level higher, not at the iterator level itself (like how
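One reading of the `lastindex` point, shown as a small check rather than a claim about the commenter's full argument: for an empty vector, `lastindex` is computed from the index range without error (it equals `last(Base.OneTo(0))`, i.e. `0`), and the exception only appears when that index is used to access an element.

```julia
v = Int[]

eachindex(IndexLinear(), v)   # Base.OneTo(0): an empty index range
lastindex(v)                  # 0, no error at the index level
# last(v) evaluates v[0] and therefore throws a BoundsError
```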
I have to admit, I think that is the option I like least of all of the proposed options so far :) It would make it very tricky to write generic code that uses the
To me
Agreed, but the whole difference between

I still think that my proposal with an argument like
Is there any precedent for a

We could also follow
I think it's already hard to write generic code that covers both generic collections and
Not really, I don't have particularly strong opinions about it. In the original issue I had proposed something similar with
My proposal for
Throwing together some "PR literature review" for cross-reference, since I think this PR can depend on or interact with these:
Having thought about it, I do have some sympathy for the argument of @davidanthoff that it should behave like

I do see myself wanting to use it in code like:
```julia
fourth_field = @something nth(eachsplit(line, '\t'), 4) throw(FormatError(
    lazy"Line $lineno does not contain four tab-separated fields"
))
```
which would now instead be
```julia
fourth_field = first(@something iterate(drop(eachsplit(line, '\t'), 3)) throw(FormatError(
    lazy"Line $lineno does not contain four tab-separated fields"
)))
```
That's certainly doable (especially since, for iterators of unknown length, most of the clever tricks that
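The second pattern above runs on current Julia (1.8 or later, for `eachsplit` and `lazy"..."`). This sketch wraps it in a hypothetical `fourth_field` function and assumes `ArgumentError` as a stand-in for the `FormatError` in the comment, which is not a Base type.

```julia
# `FormatError` from the comment above is replaced by Base's ArgumentError here.
function fourth_field(line::AbstractString, lineno::Integer)
    fields = eachsplit(line, '\t')            # lazy iterator over the fields
    return first(@something(
        iterate(Iterators.drop(fields, 3)),   # (4th field, state) or nothing
        throw(ArgumentError(lazy"Line $lineno does not contain four tab-separated fields"))
    ))
end

fourth_field("a\tb\tc\td\te", 1)   # "d" (as a SubString)
# fourth_field("a\tb", 2) throws an ArgumentError
```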
What is the semantic difference between this function and
matching
Yes, agreed! Having two distinct functions probably also helps with type stability. Another naming scheme I thought about is
Julia has a bunch of patterns for handling this already, so one has some freedom to choose "consistent with what?" :)
I see 4 ways of handling errors in Julia:
Personally I'd be happy with Base having both
The 5th option is `Union{T,S}` where you supply `S` -- like
It takes the iteration count, not the index. (Same on `Vector`, different on `Dict` or `OffsetArray`.)
That one's not as bad, as it's either an index or
I see, thanks.
The problem with "just assume users do what you expect" is that (1) nobody ever documents what they expect and (2) even if they did, it's still error-prone. No library function using
Hi,
I've turned the open-ended issue #54454 into an actual PR.
Tangentially related to #10092?
This PR introduces the `nth(itr, n)` function to iterators to give `getindex`-like behaviour. I've tried my best to optimize as much as possible by specializing on different types of iterators.
In the spirit of iterators, any out-of-bounds (OOB) access returns `nothing`. (edit: instead of throwing an error, i.e. `first(itr, n)` and `last(itr, n)`)

Here is the comparison of running the test suite (~22 different iterators) using generic `nth` and specialized `nth`: