Async Population(multiple workers) of DataCollection #361
Added sort and sort test for DataCollections
Wait, I'm confused. Didn't we agree this would make deadlocks more likely if there are lots of folders?
@ashmrtn There are two components, and as long as one of them is making progress we will not deadlock. On the left, the current implementation's GC blocks on the first directory and does not move on. KW's threads are assigned arbitrarily, not to the blocked directory, so the instance times out after ten minutes. On the right, the GC moves to the next directory once it has serialized all the messages in the first directory. I am in no way saying that this will stop all the failures for Issue #356, and the current implementation can be extremely slow if KW sits idle waiting on the smallest directories (see Issue #362). It will stop the deadlock, though. But, of course, you are correct, because that is not how Golang works. We don't want to worry about the size of the channel; however, we do need to put an arbitrary number in there that we should not reasonably exceed in testing. So we are changing the title to increase the limit and working from there.
ok, that makes more sense. I was confused because the patch was removing the channel buffer which meant that it would block the first time it was pulling data for a folder outside the set of folders kopia was currently uploading |
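To illustrate the point about the channel buffer, here is a minimal Go sketch. It is not code from this PR, and wouldBlock is a hypothetical helper introduced only for the example. It shows that a send on an unbuffered channel cannot proceed unless a receiver is ready, while a buffered channel accepts sends until its buffer is full.

```go
package main

import "fmt"

// wouldBlock is a hypothetical helper: it attempts a non-blocking send
// and reports whether a plain send on ch would have blocked right now.
func wouldBlock(ch chan int) bool {
	select {
	case ch <- 1:
		return false // the send went through
	default:
		return true // no receiver ready and no buffer space left
	}
}

func main() {
	unbuffered := make(chan int)  // no buffer: needs a waiting receiver
	buffered := make(chan int, 2) // room for two items

	fmt.Println(wouldBlock(unbuffered)) // true
	fmt.Println(wouldBlock(buffered))   // false
	fmt.Println(wouldBlock(buffered))   // false
	fmt.Println(wouldBlock(buffered))   // true, the buffer is now full
}
```

With no buffer, a producer filling one folder's channel stalls on its first item whenever the consumer is reading a different folder.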
@ryanfkeepers, added a change to be able to call a function |
for aFolder := range tasklist {
// async call to populate
collections, err := gc.launchProcesses(ctx, tasklist, user)
As launchProcesses may do synchronous work if it reaches the limit on workers, wouldn't this cause it to block buffering data until launchProcesses returns?
With a large BFS, the worker processes are used up and the main thread takes the next folder. Ideally, main would return and more processes would have finished in the meantime. I think I will add a sleep for main instead, because it would be unfortunate if main took a folder and was held up by the number of items allowed in the channel.
In regard to the sorting question, @ashmrtn, you are 110% correct. I left the sorting in for the requirement involving fault tolerance. The key point of the proposal, as I saw it, was that a lot of these mechanisms are going to be thrown away later. In the interim, however, it is an interesting look at some of the capabilities and workflows we can create with Go.
re: launchProcesses, I think eventually we can solve the blocking requirement and streamline the code by doing a couple of things:
- decouple returning the data collections from getting the data in said collections
- if we don't switch to a solution where KW/the DataCollection launches getting items, we may want to use a worker pool to fetch the items
1/ is already mostly implemented, thanks to how the functions are broken down. The major change will be what is actually executed in the background. For example,
collections := serializeMessages(...)
go func() {
// loop through task list and either call populateItems(...) or
// create another goroutine to run populateItems(...)
}()
return collections
will get the collections synchronously, return them, and then fetch all the data in the background. The code that is currently in the anonymous function could be pulled into a separate function if desired (the above is just a quick example). Dispatch of folder -> goroutine, with a bound on the number of goroutines running at any one time, can then be done in launchProcesses.
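A slightly fuller sketch of that bounded dispatch, to make the shape concrete. Everything here is an assumption for illustration: the collection type, the launch function, and the per-folder item count are hypothetical stand-ins rather than the PR's real types, and each channel is sized to hold the whole folder so that producers never block on a slow consumer.

```go
package main

import "fmt"

// collection is a hypothetical stand-in for a DataCollection:
// items are streamed to the consumer over a channel.
type collection struct {
	folder string
	items  chan string
}

// launch returns the collections right away and fills them in the
// background, running at most maxWorkers populate goroutines at once.
func launch(folders []string, perFolder, maxWorkers int) []*collection {
	cols := make([]*collection, 0, len(folders))
	for _, f := range folders {
		// Assumption: the buffer holds every item in the folder.
		cols = append(cols, &collection{folder: f, items: make(chan string, perFolder)})
	}

	go func() {
		sem := make(chan struct{}, maxWorkers) // counting semaphore
		for _, c := range cols {
			sem <- struct{}{} // wait for a free worker slot
			go func(c *collection) {
				defer func() { <-sem }() // release the slot when done
				defer close(c.items)     // tell the consumer the folder is complete
				for i := 0; i < perFolder; i++ {
					c.items <- fmt.Sprintf("%s/item-%d", c.folder, i)
				}
			}(c)
		}
	}()

	return cols
}

func main() {
	total := 0
	for _, c := range launch([]string{"Inbox", "Sent", "Drafts"}, 3, 2) {
		for range c.items {
			total++
		}
	}
	fmt.Println(total) // 9
}
```

The dispatching goroutine only ever waits on the semaphore, so the caller gets its collections back without waiting for any item to be fetched.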
For 2/, the worker pool will ensure good parallelism even if we have situations where the workers got folders with very few items while the dispatching goroutine ended up fetching a folder with many items (thus keeping it from dispatching more work)
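As a rough illustration of the worker pool idea, assuming work is queued per item rather than per folder: the folder type and populateAll below are hypothetical names, and the actual fetch is replaced by a counter. A fixed set of workers pulls from one shared queue, so a folder with many items is spread across all workers instead of tying up a single one.

```go
package main

import (
	"fmt"
	"sync"
)

// folder is a hypothetical stand-in for a mail folder and its item count.
type folder struct {
	name  string
	items int
}

// populateAll processes every item of every folder using a fixed number
// of workers and returns how many items were handled per folder.
func populateAll(folders []folder, workers int) map[string]int {
	type job struct {
		folder string
		item   int
	}

	jobs := make(chan job)
	fetched := make(map[string]int)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				// A real implementation would fetch the item here.
				mu.Lock()
				fetched[j.folder]++
				mu.Unlock()
			}
		}()
	}

	// The dispatcher only enqueues work; it never fetches items itself.
	for _, f := range folders {
		for i := 0; i < f.items; i++ {
			jobs <- job{folder: f.name, item: i}
		}
	}
	close(jobs)
	wg.Wait()
	return fetched
}

func main() {
	got := populateAll([]folder{{"Inbox", 500}, {"Drafts", 2}, {"Sent", 40}}, 4)
	fmt.Println(got["Inbox"], got["Drafts"], got["Sent"]) // 500 2 40
}
```

Because the dispatcher never takes a folder for itself, it cannot get stuck on a large one while idle workers wait for more work.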
Added a sleep command for the interim
Added sleep call, renamed context -> ctx, and removed debug statements through fmt.
Adding a block - let's not merge till we agree on final design.
This PR is to be closed in favor of a PR that fixes the deadlock circumstance.
Removes the cap on items in the DataCollection. This will remove the deadlocking condition for now.