This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Async Population(multiple workers) of DataCollection #361

Closed
wants to merge 14 commits into from

Conversation

dadams39
Contributor

Removes the cap on items in the DataCollection. This removes the deadlocking condition for now.

@dadams39 dadams39 linked an issue Jul 19, 2022 that may be closed by this pull request
3 tasks
@dadams39 dadams39 requested a review from ashmrtn July 19, 2022 03:12
@dadams39 dadams39 requested review from ryanfkeepers and removed request for ashmrtn July 19, 2022 03:12
@dadams39 dadams39 requested review from ashmrtn and vkamra July 19, 2022 03:12
@dadams39 dadams39 added the bug Something isn't working label Jul 19, 2022
@dadams39 dadams39 self-assigned this Jul 19, 2022
@dadams39 dadams39 temporarily deployed to Testing July 19, 2022 03:29 Inactive
@dadams39 dadams39 temporarily deployed to Testing July 19, 2022 03:34 Inactive
dadams39 added 2 commits July 19, 2022 10:21
Added sort and sort test for DataCollections
@dadams39 dadams39 marked this pull request as draft July 19, 2022 14:51
@ashmrtn
Contributor

ashmrtn commented Jul 19, 2022

wait, I'm confused, didn't we agree this would make things more likely to deadlock if there's lots of folders?

@dadams39
Contributor Author

dadams39 commented Jul 19, 2022

@ashmrtn There are two components, and as long as one of them is making progress we will not deadlock. On the left, the current implementation's GC blocks on the first directory and does not move; KW's threads are arbitrarily assigned, not necessarily to the blocked directory, and the instance times out after ten minutes. On the right, the GC moves on to the next directory after it serializes all the messages in the first directory. I am in no way saying this will stop all the failures for Issue #356; the current implementation can be extremely slow if KW sits idle waiting on the smallest directories (see Issue #362). It will stop the deadlock, though.
[screenshot: Screen Shot 2022-07-19 at 2 33 11 PM, diagram of the two flows described above]

But, of course, you are correct, because that is not how Go channels work. We don't want to have to worry about the size of the channel; however, we do need to pick an arbitrary buffer size that we should reasonably never exceed in testing. So we are changing the title to "increase the limit" and working from there.
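The buffering behavior under discussion can be shown with a toy Go sketch (illustrative only, not project code): as long as a consumer is draining the channel, a producer gets every item through regardless of the buffer size; without a consumer, any send beyond the buffer blocks forever, which is the deadlock described above.

```go
package main

import "fmt"

// produceAndConsume sends n items through a channel with the given buffer
// size and returns how many items a single consumer drained. With the
// consumer running, all n items get through even when n exceeds the
// buffer; remove the consumer and the (buffer+1)-th send blocks forever.
func produceAndConsume(n, buffer int) int {
	items := make(chan int, buffer)
	counted := make(chan int)

	// consumer: drains the channel, standing in for KW uploading a folder
	go func() {
		count := 0
		for range items {
			count++
		}
		counted <- count
	}()

	// producer: sends more items than the buffer holds
	for i := 0; i < n; i++ {
		items <- i
	}
	close(items)
	return <-counted
}

func main() {
	fmt.Println(produceAndConsume(5, 2)) // all 5 items drain despite a buffer of 2
}
```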

@dadams39 dadams39 changed the title Issue #360: Remove limit on the data collection. Issue #360: Increase limit on the data collection. Jul 19, 2022
@ashmrtn
Contributor

ashmrtn commented Jul 19, 2022

ok, that makes more sense. I was confused because the patch was removing the channel buffer, which meant it would block the first time it pulled data for a folder outside the set of folders kopia was currently uploading.

Fixes Issue #364, Fixes deadlock from #360
@dadams39 dadams39 temporarily deployed to Testing July 20, 2022 00:19 Inactive
@dadams39 dadams39 marked this pull request as ready for review July 20, 2022 00:20
@dadams39 dadams39 requested review from ryanfkeepers and ashmrtn July 20, 2022 00:20
@dadams39
Contributor Author

@ryanfkeepers, added a change that exposes a function MergeStatus(num int), which waits on the status of everything in the pipe and compiles all the statuses into one.
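A minimal sketch of the idea behind MergeStatus(num int), assuming workers report a status on a shared channel; the status type and field names here are hypothetical, not the connector's actual API:

```go
package main

import "fmt"

// status is a hypothetical stand-in for the connector's status type;
// the fields are illustrative only.
type status struct {
	objects int
	errs    int
}

// mergeStatus waits for num status reports on the channel and folds them
// into a single combined summary, as MergeStatus(num int) is described
// doing for everything in the pipe.
func mergeStatus(num int, in <-chan status) status {
	var total status
	for i := 0; i < num; i++ {
		s := <-in
		total.objects += s.objects
		total.errs += s.errs
	}
	return total
}

func main() {
	ch := make(chan status, 3)
	// three workers report their results concurrently
	for i := 1; i <= 3; i++ {
		go func(n int) { ch <- status{objects: n} }(i)
	}
	fmt.Println(mergeStatus(3, ch)) // objects total: 6
}
```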

@dadams39 dadams39 temporarily deployed to Testing July 20, 2022 00:23 Inactive
@dadams39 dadams39 temporarily deployed to Testing July 20, 2022 13:01 Inactive
src/internal/connector/exchange_data_collection.go (outdated)
src/internal/connector/graph_connector.go (outdated)
src/internal/connector/graph_connector.go (outdated)
src/internal/connector/graph_connector.go
src/internal/connector/graph_connector.go (outdated)
src/internal/connector/graph_connector.go (outdated)
src/internal/connector/support/status.go
src/internal/connector/graph_connector.go
src/internal/connector/graph_connector.go

for aFolder := range tasklist {
// async call to populate
collections, err := gc.launchProcesses(ctx, tasklist, user)
Contributor

as launchProcesses may do synchronous work if it reaches the limit on workers, wouldn't this cause it to block buffering data until launchProcesses returns?

Contributor Author

With a large BFS, the worker processes are all used up and the main thread would take the next folder itself. Ideally, the main would return, and more workers would have finished in the meantime. I think I will add a sleep call for the main instead, because it would be unfortunate if the main took a folder and was held up by the number of items going into the channel.
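The sleep-instead-of-working-inline idea can be sketched like this (all names are illustrative, not the PR's actual code): the dispatcher backs off briefly when every worker slot is busy, rather than populating a folder on the main goroutine, where a large folder could stall dispatch.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// dispatch hands folders to a bounded set of workers. When all worker
// slots are busy it sleeps briefly and retries, instead of processing a
// folder inline on the main goroutine.
func dispatch(folders []string, maxWorkers int) []string {
	slots := make(chan struct{}, maxWorkers) // semaphore bounding workers
	var mu sync.Mutex
	var done []string
	var wg sync.WaitGroup

	for _, f := range folders {
		for {
			select {
			case slots <- struct{}{}: // acquired a free worker slot
			default:
				time.Sleep(10 * time.Millisecond) // back off, don't work inline
				continue
			}
			break
		}
		wg.Add(1)
		go func(folder string) {
			defer wg.Done()
			defer func() { <-slots }() // release the slot
			mu.Lock()
			done = append(done, folder) // stand-in for populating the folder
			mu.Unlock()
		}(f)
	}
	wg.Wait()
	return done
}

func main() {
	fmt.Println(len(dispatch([]string{"inbox", "sent", "archive"}, 2)))
}
```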

Contributor Author

In regard to the sorting question, @ashmrtn, you are 110% correct. I left the sorting in for the requirement involving fault tolerance. The key point of the proposal, as I read it, is that a lot of these mechanisms will be thrown away later. In the interim, however, it is an interesting look at some of the capabilities and workflows we can create with Go.

Contributor

re: launchProcesses, I think eventually we can solve the blocking requirement and streamline the code by doing a couple of things:

  1. decouple getting the data collections to return from getting the data in said collections
  2. if we don't switch to a solution where KW/the DataCollection launch getting items, we may want to use a worker pool to fetch the items

1/ is already mostly implemented, thanks to how the functions are broken down. The major change will be what is actually executed in the background. For example,

  collections := serializeMessages(...)

  go func() {
    // loop through task list and either call populateItems(...) or
    // create another goroutine to run populateItems(...)
  }()

  return collections

will get the collections synchronously, return them, and then fetch all the data in the background. The code that is currently in the anonymous function could be pulled into a separate function if desired (the above is just a quick example). Dispatching folder -> goroutine, with a bound on the number of goroutines running at any one time, can then be done in launchProcesses.

For 2/, the worker pool will ensure good parallelism even if we have situations where the workers got folders with very few items while the dispatching goroutine ended up fetching a folder with many items (thus keeping it from dispatching more work)
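Putting 1/ and 2/ together, a runnable sketch might look like the following (the collection type and function bodies are illustrative assumptions, not the repository's actual code): collections are built and returned synchronously, while a fixed pool of workers fetches each collection's items in the background.

```go
package main

import (
	"fmt"
	"sync"
)

// collection is a hypothetical stand-in for a DataCollection.
type collection struct {
	folder string
	items  int
}

// launchProcesses returns the collections immediately (step 1) and starts
// a bounded worker pool to populate their items in the background (step 2).
// The returned WaitGroup lets the caller wait for population to finish.
func launchProcesses(folders []string, workers int) ([]*collection, *sync.WaitGroup) {
	// step 1: build collections up front so the caller gets them before
	// any item data is fetched
	cols := make([]*collection, len(folders))
	for i, f := range folders {
		cols[i] = &collection{folder: f}
	}

	// step 2: workers pull collections off a channel, so one huge folder
	// cannot keep the dispatcher from handing out more work
	work := make(chan *collection)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range work {
				c.items = len(c.folder) // stand-in for populateItems(...)
			}
		}()
	}
	go func() {
		for _, c := range cols {
			work <- c
		}
		close(work)
	}()
	return cols, &wg
}

func main() {
	cols, wg := launchProcesses([]string{"inbox", "sent"}, 2)
	wg.Wait()
	for _, c := range cols {
		fmt.Println(c.folder, c.items)
	}
}
```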

Contributor Author

Added a sleep command for the interim

@dadams39 dadams39 changed the title Issue #360: Increase limit on the data collection. Async Population(multiple workers) of DataCollection Jul 20, 2022
dadams39 added 2 commits July 20, 2022 14:39
Added sleep call, renamed context -> ctx, and removed debug statements that used fmt.
@dadams39 dadams39 temporarily deployed to Testing July 20, 2022 18:40 Inactive
@dadams39 dadams39 temporarily deployed to Testing July 20, 2022 18:58 Inactive
Contributor

@vkamra vkamra left a comment


Adding a block: let's not merge until we agree on the final design.

@dadams39 dadams39 marked this pull request as draft July 20, 2022 20:47
@dadams39 dadams39 closed this Jul 21, 2022
@dadams39
Contributor Author

This PR is being closed in favor of a PR that fixes the deadlock condition.

@dadams39 dadams39 deleted the gc-remove-limit branch August 19, 2022 16:51
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor GC populator to address deadlock with Kopia uploader
4 participants