Load DirectoryTree contents in a worker #2545
Conversation
As I work on what's to come (loading DirectoryTree with a worker), I'm going to want to construct slow loads so I can test the effectiveness of the changes. This means faking a very slow source of directory information. So let's drop this into its own method so we can then do silly things like add a sleep to really slow stuff down.
This isn't the final form, not even close; it's more to help test out the idea and how well it will work. Note the very deliberate sleep in the code, which is there to emulate loading from a slow blocking source. This will be removed and tidied up before a final PR, of course. The main aim here is to emulate a worst-case scenario so that the use of a worker can be tried out with some confidence. See Textualize#2456.
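For illustration, the shape of the idea is something like this (a rough sketch, not the actual diff; the method name and the sleep duration are made up for the example):

```python
from pathlib import Path
from time import sleep
from typing import Iterator


def _get_directory_contents(location: Path) -> Iterator[Path]:
    """Yield the entries within the given directory.

    The sleep is purely to emulate a slow, blocking filesystem so the
    benefit of moving the load off the main thread can actually be seen;
    it would never ship in the real code.
    """
    for entry in location.iterdir():
        sleep(0.05)  # Pretend every single entry is painfully slow to read.
        yield entry
```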
Move the node population code into its own method, the idea here being that the update happens in one call to call_from_thread rather than spawning lots of calls to it.
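In other words, roughly this shape (a sketch with hypothetical helper names; call_from_thread and TreeNode.add are the real Textual APIs):

```python
from pathlib import Path

from textual.app import App


def _populate_node(node, contents: list[Path]) -> None:
    """Add all of the gathered entries to the given tree node in one go."""
    for entry in contents:
        node.add(entry.name, data=entry)


def _load_in_thread(app: App, node, location: Path) -> None:
    """Gather everything in the worker thread, then hop to the main thread once."""
    contents = sorted(location.iterdir())
    # One call_from_thread for the whole batch, rather than one call per entry.
    app.call_from_thread(_populate_node, node, contents)
```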
Having got the initial version of background loading of nodes in the directory tree working, this moves to a gentler approach where only so many loads run at once, and a queue of jobs that need to be completed is kept. This is an end-of-coding-session WiP commit; there's more to come on this. But at the moment I'm happy with the way it's going.
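The "only so many at once" idea looks roughly like this (a stdlib-only sketch of the approach, with made-up names and an arbitrary cap; this is the version that gets reworked further down):

```python
import asyncio
from pathlib import Path

MAX_CONCURRENT_LOADS = 4  # Arbitrary cap, purely for illustration.


async def _load_one(location: Path, limiter: asyncio.Semaphore) -> None:
    """Load a single directory, but only when the semaphore lets us."""
    async with limiter:
        # Only MAX_CONCURRENT_LOADS of these run at once; the rest wait their
        # turn on the semaphore. The blocking work happens off the event loop.
        entries = await asyncio.to_thread(lambda: sorted(location.iterdir()))
        print(f"{location}: {len(entries)} entries")


async def main() -> None:
    jobs: asyncio.Queue[Path] = asyncio.Queue()
    for location in (Path("."), Path.home()):
        jobs.put_nowait(location)
    limiter = asyncio.Semaphore(MAX_CONCURRENT_LOADS)
    loads = []
    while not jobs.empty():
        loads.append(asyncio.create_task(_load_one(jobs.get_nowait(), limiter)))
    await asyncio.gather(*loads)


if __name__ == "__main__":
    asyncio.run(main())
```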
So far this is working fine, but there was an issue where, if the load of a very large directory was started and the application was then quit right away, the application would close down but there would be a long pause before the shell prompt came back, the cause presumably being that we were waiting for that particular thread to end. So here I make sure I check the cancelled state of the worker. This also suggests that, while I originally turned the use of iterdir into a loop just so I could throw the sleep in to emulate a slow directory load, I *actually* want a loop anyway so I can check the cancelled state as we stream in the directory content.
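The check in the loading loop ends up with roughly this shape (a sketch; get_current_worker and is_cancelled are the documented worker API, the rest of the names are illustrative):

```python
from pathlib import Path

from textual.worker import get_current_worker


def _load_directory_contents(location: Path) -> list[Path]:
    """Stream in a directory's contents, bailing out early if cancelled."""
    worker = get_current_worker()
    contents: list[Path] = []
    for entry in location.iterdir():
        if worker.is_cancelled:
            # The application is shutting down, or this load is no longer
            # wanted; stop streaming entries rather than holding up exit.
            return []
        contents.append(entry)
    return contents
```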
Setting the path to anything other than "." is going to result in a reset happening, so we need the tracking support in place first.
One instance of a call to _load_directory that I missed.
This marks all current jobs as cancelled and also removes all pending jobs.
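That is, something along these lines (a sketch with illustrative bookkeeping names; Worker.cancel is the real Textual API):

```python
import asyncio

from textual.worker import Worker


class LoadBookkeeping:
    """Illustrative stand-in for the tree's record of load jobs."""

    def __init__(self) -> None:
        self.current_loads: list[Worker] = []
        self.queue: asyncio.Queue = asyncio.Queue()

    def cancel_all(self) -> None:
        """Cancel every in-flight load and drop everything still queued."""
        for worker in self.current_loads:
            worker.cancel()
        self.current_loads.clear()
        while not self.queue.empty():
            self.queue.get_nowait()
```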
Explain some of the decisions made, and also throw in a bit of over-cautious worker-cancellation checking.
Makes sense to me and I couldn't find any issues... But I am not familiar with the @work decorator!
Some thread-safety issues here. The max concurrency is, in effect, a thread pool, but implemented not quite so optimally. I suggest we don't try to implement the max concurrency right now, and instead plan an extension to the worker system. Chat about it on Monday.
The main reason I did this is that, in conversation last week, we spoke about adding a version of this to… It would be good to hear the thread-safety issues you're seeing here; I tried as much as possible to follow all the advice in the worker API docs, so if I've missed something or misunderstood something it'd be great to get that cleared up.
Recent changes meant it wasn't needed any more.
It turns out there's a maximum number of threads you can have going in the underlying pool, and it's tied to the number of CPUs. As such, there was a limit on how many directory trees you could have up and running before it would start to block all sorts of operations in the surrounding application (in Parallels on macOS, with the Windows VM appearing to have just the one CPU, it would give up after 8 directory trees). So here we move to a slightly different approach: the main loader still runs "forever", but as an async task; it then farms the loading out to threads which close once the loading is done. So far tested on macOS, where it behaves as expected. Next up: testing on Windows.
The code that was using this was removed earlier.
Plan C; or is it plan D? Something like that. Anyway... in this approach we keep a single "forever" async task worker per directory tree, which in turn watches the async Queue; when a new node appears on it, it starts a short-lived thread to load the directory data. This seems to be working fine on macOS. Next up is testing on Windows.
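The overall shape is roughly this (a runnable sketch of the pattern rather than the actual DirectoryTree code; the @work decorator and worker behaviour are real Textual APIs, while the widget, the names, and the use of asyncio.to_thread stand in for the PR's thread worker):

```python
import asyncio
from pathlib import Path

from textual import work
from textual.app import App, ComposeResult
from textual.widgets import Static


class LoaderSketch(App[None]):
    """A minimal sketch of the queue-plus-forever-worker shape."""

    def compose(self) -> ComposeResult:
        yield Static("Waiting for a load...")

    def on_mount(self) -> None:
        self._load_queue: asyncio.Queue[Path] = asyncio.Queue()
        self._loader()                          # Start the "forever" worker.
        self._load_queue.put_nowait(Path("."))  # Queue up a directory to load.

    @work(exclusive=True)
    async def _loader(self) -> None:
        """Loop forever, servicing the queue of directories that need loading."""
        while True:
            location = await self._load_queue.get()
            # The blocking filesystem work runs in a short-lived thread and is
            # awaited; the actual PR spins up a Textual thread worker here.
            contents = await asyncio.to_thread(lambda: sorted(location.iterdir()))
            self.query_one(Static).update(f"{location}: {len(contents)} entries")
            self._load_queue.task_done()


if __name__ == "__main__":
    LoaderSketch().run()
```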
Normally it's not a great idea to eat and hide exceptions within library code; but I think it makes sense to make an exception here. This is a UI element that lets the user navigate about a filesystem. If there is something they don't have permission for, that should not cause an exception, it should just give up with the best possible outcome. If actually doing something with the exception is important, the developer using this could use the filter to do tests and act accordingly. See Textualize#2564.
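Concretely, the loading path treats "not allowed" as "empty", along these lines (a sketch; the real change is inside DirectoryTree's loading code):

```python
from pathlib import Path


def _safe_directory_contents(location: Path) -> list[Path]:
    """Get a directory's contents, treating lack of permission as empty."""
    try:
        return sorted(location.iterdir())
    except PermissionError:
        # The user can't see inside this directory; the friendliest outcome
        # for a navigation widget is to show nothing rather than blow up.
        return []
```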
Looking good, just nitpicks.
In doing so, fix an issue where, after the previous change, I wasn't marking the root of the tree as loaded.
So, to make sure I follow, this iteration just gets a single queue of nodes to be loaded instead of pooling the load jobs?
    worker: The worker that the loading is taking place in.

Yields:
    Path: A entry within the location.
Suggested change:
-    Path: A entry within the location.
+    Path: An entry within the location.
Ish, kinda, sorta. So there's an async Queue into which nodes that need loading get dropped. There's then an exclusive async worker that loops around, waiting for stuff to show up in the queue. When something does turn up in the queue, a thread worker is created and waited upon that does the work of loading the data.
This PR updates DirectoryTree so that the contents of any given directory associated with a node in the tree are loaded in a worker. In most situations this should appear to make no difference to the user. However, in environments where loading filesystem data is slow, or even when attempting to load a directory with a lot of content, this should keep the rest of the application responsive.

The worst case envisaged here is that someone is viewing a very large directory, which in turn contains lots of directories, which in turn contain lots of entries, and then an expand_all is performed¹; this PR is designed such that every request to load is placed in a queue, with the queue being processed by a worker thread that services it.

Note that this PR has been through at least 3 iterations. It's been a helpful shakedown of the @work decorator and we've managed to learn a couple of things that we should be aware of and warn people about (going to add a card to the backlog about making a note of these things in the docs).

So, anyway, here goes background-loading directory tree plan C ¯\_(ツ)_/¯
PS: Also contains code to address #2564. Not strictly related to this, but while I was working on the loading code it seemed to make sense to roll it in here.
Footnotes

¹ Note that due to the nature of DirectoryTree's lazy loading, an expand_all won't expand every single level in the filesystem below that node, because the underlying Tree code can't see non-leaf nodes that haven't been loaded yet. Technically a bug, I guess, but in this case likely best considered a feature, because performing an expand_all on / in some filesystem is going to be painful in a few ways.