RFC: Change 'readdir' to return an iterator. #27450
Aside from iterator name bikeshedding, looks good to me.
base/file.jl
@@ -572,14 +572,41 @@ struct uv_dirent_t
    typ::Cint
end

struct ReadDirIter
This seems unnecessarily abbreviated. How about ReadDirIterator?
Sure
It would be simpler (and faster and more memory efficient) to implement this as:
lazy_readdir(name) = (path for path in readdir(name))
Just my 2¢
Wait, how is that lazy? It would still be loading all the paths into memory at once, wouldn't it?
(In reply to Jameson Nash's comment of Wed, Jun 6, 2018 at 10:28 AM.)
They are already all loaded into memory; this PR would just make accessing them slower. Support for the lazy version was proposed a few years ago, so it might land soon(ish) in libuv master: libuv/libuv#416
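A minimal sketch of the point above, assuming only Base functions (`readdir`, `mktempdir`, `touch`); the temporary directory and file names are made up for illustration. The generator form only wraps the vector that `readdir` has already built, so nothing about it is lazy at the file-system level:

```julia
# The one-liner from the comment above.
lazy_readdir(name) = (path for path in readdir(name))

# Illustrative setup: a scratch directory with three empty files.
dir = mktempdir()
foreach(n -> touch(joinpath(dir, n)), ["a.txt", "b.txt", "c.txt"])

itr = lazy_readdir(dir)

# The generator's underlying iterable is the fully materialized listing.
println(typeof(itr.iter))
println(length(itr.iter))

# Sorting makes the result independent of the order the OS returns entries in.
names = sort(collect(itr))
println(names)
```

Creating `itr` already performs the full directory read; iterating the generator afterwards touches only memory.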
I guess the question for us at this point is: do we want to expose an iterator interface to readdir even though the current libuv implementation is always eager, in anticipation of libuv soon having a truly lazy implementation for which an iterator interface can be advantageous? I would say yes. The choice of implementation doesn't matter much and @malmaud's one here has the advantage of possibly not needing to change when libuv changes theirs.
I agree with that. I'll also run a quick benchmark just in case there really is a non-trivial performance hit.
(In reply to Stefan Karpinski's comment of Wed, Jun 6, 2018 at 10:41 AM.)
It's hitting the file system. Speed doesn't matter.
I don't see why an iterator is better. Arrays support more operations, e.g. |
The fact that it's an issue for libuv would seem to indicate that there are cases where it matters. Otherwise they would presumably not be bothering to change this.
Yes, I agree that the fact that most other libraries already do this lazily or have open issues from people running into trouble from it not being lazy should tell us something. I think there's no practical limit in Linux anymore on how many files can be in a directory. Thus someone, somewhere, will run into performance issues that we can never remedy once we've committed to an API where We could potentially just add another function like Jameson's |
Ok, I'm convinced, we can make it an iterator.
Big mistake.
Note that to address the "double file system call" issue in nodejs/node#15699 we would need to return some kind of |
OTOH, I think it's just good practice to have streaming APIs to things that can be done incrementally. That will guide people naturally towards writing more scalable code for this sort of thing.
And fail fast if path name is invalid.
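To make the interface question concrete, here is a rough sketch of an iterator type built on Julia's `Base.iterate` protocol. It is not the code from this PR: the type name follows the review suggestion above, and the body simply calls the eager `readdir` on first use, as an assumption standing in for whatever libuv eventually provides. Callers written against it would not need to change if the backend became truly lazy.

```julia
# Hypothetical iterator type; not the PR's actual implementation.
struct ReadDirIterator
    path::String
end

# The number of entries is not known before the directory is read.
Base.IteratorSize(::Type{ReadDirIterator}) = Base.SizeUnknown()
Base.eltype(::Type{ReadDirIterator}) = String

# First step: read the listing (eagerly here) and start at index 1.
function Base.iterate(it::ReadDirIterator)
    entries = readdir(it.path)
    return iterate(it, (entries, 1))
end

# Later steps: hand out one entry at a time from the cached listing.
function Base.iterate(::ReadDirIterator, state::Tuple{Vector{String},Int})
    entries, i = state
    i > length(entries) && return nothing
    return entries[i], (entries, i + 1)
end

# Illustrative usage with a scratch directory.
dir = mktempdir()
foreach(n -> touch(joinpath(dir, n)), ["a.txt", "b.txt"])
listed = sort(collect(ReadDirIterator(dir)))
println(listed)
```

In this sketch an invalid path only raises an error on the first call to `iterate`, not at construction; failing fast, as suggested above, would mean validating the path in the constructor instead.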
Bump, it's now or never. This is unlikely to break too terribly many usages since the most common usage is simply to iterate through the output of |
It's still not clear to me how often you would need to use this function on directories with more than tens of millions of entries, such that you would want to slightly increase the cost and complexity of using |
If that's the call, ok. If it becomes an issue in the future, we could have |
Alright, shall we just do Stefan's suggestion (presumably a type-stable variant) at some point then?
cf #27393