-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reverse iteration for eachline #42225
Conversation
I don't think non-seekable IO can satisfy the Lines 456 to 460 in 4805d54
A similar discussion on |
It would be easy to restrict this to seekable Update: I've changed the PR to drop non-seekable support, since that seemed controversial. (I saved the non-seekable branch at https://github.com/stevengj/julia/tree/sgj/reverse-eachline-nonseekable.) |
Unrelated CI error on freebsd64:
|
52f8cb4
to
b72fdf3
Compare
b72fdf3
to
e26b3d2
Compare
Apparently unrelated error on freebsd64 (
This PR should be ready for review? |
e26b3d2
to
905ba00
Compare
Updated to handle CRLF line endings, with a test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
(CI failures on freebsd64 and win32 are unrelated: |
Looks like JuliaLang/LinearAlgebra.jl#873? |
c812e9e
to
c93259e
Compare
Rebased. Hopefully can be merged if green? |
This PR implements
Iterators.Reverse
iteration for theEachLine
iterator, so that now e.g.last(eachline(io))
will work. (This seems to have been requested multiple times on discourse etcetera, and is nontrivial for users implement efficiently themselves. It also seems to be a pretty common question in Python, where the most common answers involve reading the whole file to get the last few lines., or executing an externaltail
program.)For seekable streams, it works by reading the data from the end in 4kiB chunks. Performance seems comparable to forwards iteration.
For non-seekable streams,
it falls back to reading the whole stream into memory. However, forit throws an exception whenlast(eachline(...))
andtake(Reverse(eachline(...)), n)
it uses an algorithm (similar to how the Unixtail
function works) that requires only bounded memory — it reads the file in a sequence of 4kiB chunks, but discards earlier chunks if they are not needed for the requested number of lines.seekend
orseek
is called.I noticed that
first(r::Reverse)
was callinglast(r.itr)
, which fails for non-indexable iterators becauselast(itr)
defaults toitr[end]
, so I changed it to use the default implementation (in abstractarrays.jl) which iterates once. Otherwise, you can't implementlast(itr::MyType)
byfirst(Iterators.Reverse(itr))
. This is analogous to howlast(itr, n)
is already implemented. (It might be good to change the defaultlast(itr)
method, but that's a topic for another PR.)