Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Execute exchange tasks as new coroutines. #1962

Merged
merged 1 commit into from
Dec 16, 2024

Conversation

robinsons
Copy link
Contributor

@robinsons robinsons commented Dec 10, 2024

This fix changes how exchange tasks are executed in order to allow multiple tasks to run concurrently.

In the past, exchange tasks were launched as child coroutines of a specified CoroutineScope. At some point this logic was updated so that ExchangeTaskExecutor.execute() would create its own scope and launch jobs to execute tasks like so:

// ExchangeTaskExecutor.kt, shorted to emphasize the relevant parts
suspend fun execute(exchangeStep: ApiClient.ClaimedExchangeStep) = coroutineScope {
  launch {
    // executes exchangeStep in a try/catch block
  }
}

Since the outer coroutineScope { ... } block doesn't return until all of its child coroutines have completed, this effectively made it so that at most 1 exchange task could be executed at a time.

With this fix, we instead launch child coroutines from the site which calls ExchangeTaskExecutor.execute(), which once again allows us to run multiple exchange tasks concurrently.

@wfa-reviewable
Copy link

This change is Reviewable

@robinsons robinsons force-pushed the robinsons-panelmatch-task-launcher branch 2 times, most recently from 04d5f39 to e03ff9c Compare December 12, 2024 17:10
@robinsons robinsons marked this pull request as ready for review December 12, 2024 17:37
Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: 0 of 7 files reviewed, 3 unresolved discussions (waiting on @robinsons and @stevenwarejones)


src/main/kotlin/org/wfanet/panelmatch/client/deploy/ExchangeWorkflowDaemon.kt line 125 at r1 (raw file):

   */
  protected open suspend fun runDaemon() = coroutineScope {
    throttler.loopOnReady {

I give a more detailed response in the other comment, but using launch from an outside coroutine scope inside a throttler ready block doesn't make sense.


src/main/kotlin/org/wfanet/panelmatch/client/deploy/ExchangeWorkflowDaemon.kt line 152 at r1 (raw file):

  protected open suspend fun runCronJob() = coroutineScope {
    do {
      throttler.onReady {

I'm confused at the use of a Throttler combined with launch. In addition to providing a timing guarantee, a Throttler is used to ensure that only one execution happens at a time. Do you want this to be throttled, or do you want to execute steps in parallel?

If you just want to throttle step claiming, do just the claim inside of onReady, which can return the step. You can then launch the execution in the outer coroutine scope if you don't want the next claim to wait for execution to occur.


src/main/kotlin/org/wfanet/panelmatch/client/deploy/ExchangeWorkflowDaemon.kt line 171 at r1 (raw file):

        }
      }
    } while (coroutineContext.isActive && coroutineContext.job.children.any { it.isActive })

You shouldn't need this part. If the parent coroutine is cancelled, all of its children will be cancelled too.

Code quote:

coroutineContext.job.children.any { it.isActive }

@robinsons robinsons force-pushed the robinsons-panelmatch-task-launcher branch 2 times, most recently from 4b0fdfd to db554ca Compare December 12, 2024 19:03
Copy link
Contributor Author

@robinsons robinsons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 7 files reviewed, 3 unresolved discussions (waiting on @SanjayVas and @stevenwarejones)


src/main/kotlin/org/wfanet/panelmatch/client/deploy/ExchangeWorkflowDaemon.kt line 125 at r1 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

I give a more detailed response in the other comment, but using launch from an outside coroutine scope inside a throttler ready block doesn't make sense.

Done.


src/main/kotlin/org/wfanet/panelmatch/client/deploy/ExchangeWorkflowDaemon.kt line 152 at r1 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

I'm confused at the use of a Throttler combined with launch. In addition to providing a timing guarantee, a Throttler is used to ensure that only one execution happens at a time. Do you want this to be throttled, or do you want to execute steps in parallel?

If you just want to throttle step claiming, do just the claim inside of onReady, which can return the step. You can then launch the execution in the outer coroutine scope if you don't want the next claim to wait for execution to occur.

Good point. The goal is to throttle task claiming, but allow claimed tasks to execute in parallel.


src/main/kotlin/org/wfanet/panelmatch/client/deploy/ExchangeWorkflowDaemon.kt line 171 at r1 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

You shouldn't need this part. If the parent coroutine is cancelled, all of its children will be cancelled too.

The intent of checking if any children are active was to give the cron job a stop condition. If there are no child jobs active, then any available tasks must have been claimed and run to completion, and there are no further tasks available at this time. In that case, the job can stop until it gets scheduled again.

I tweaked this to try to make that intent more clear, but I'm open to suggestions if there's a better way of going about this.

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 7 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @robinsons and @stevenwarejones)


src/main/kotlin/org/wfanet/panelmatch/client/deploy/ExchangeWorkflowDaemon.kt line 171 at r1 (raw file):

Previously, robinsons wrote…

The intent of checking if any children are active was to give the cron job a stop condition. If there are no child jobs active, then any available tasks must have been claimed and run to completion, and there are no further tasks available at this time. In that case, the job can stop until it gets scheduled again.

I tweaked this to try to make that intent more clear, but I'm open to suggestions if there's a better way of going about this.

Okay, so you want to run until you're out of exchange steps to claim. I think what you want is to exit the loop once claimExchangeStep() returns null. The coroutineScope will still ensure that all children complete before returning.

Copy link
Contributor Author

@robinsons robinsons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @SanjayVas and @stevenwarejones)


src/main/kotlin/org/wfanet/panelmatch/client/deploy/ExchangeWorkflowDaemon.kt line 171 at r1 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Okay, so you want to run until you're out of exchange steps to claim. I think what you want is to exit the loop once claimExchangeStep() returns null. The coroutineScope will still ensure that all children complete before returning.

As long as the job is running (executing any step), I'd prefer to keep checking for new steps even if a previous call to claimExchangeStep() returned null.

The reason is that some steps take a long time to run, for example a few hours. If a new step becomes available during that time, we'd might as well execute it rather than wait for the next instance of the cron job.

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @robinsons and @stevenwarejones)


src/main/kotlin/org/wfanet/panelmatch/client/deploy/ExchangeWorkflowDaemon.kt line 171 at r1 (raw file):

Previously, robinsons wrote…

As long as the job is running (executing any step), I'd prefer to keep checking for new steps even if a previous call to claimExchangeStep() returned null.

The reason is that some steps take a long time to run, for example a few hours. If a new step becomes available during that time, we'd might as well execute it rather than wait for the next instance of the cron job.

Gotcha.

nit: Maybe update the method KDoc to explain that even after the steps are exhausted, this will continue claiming steps while any step is still being executed? There's probably a better way to phrase that, it's just that saying only "until all available steps are exhausted" isn't technically true.

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test that fails without this fix?

Reviewed 3 of 7 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @robinsons)

@robinsons robinsons force-pushed the robinsons-panelmatch-task-launcher branch from db554ca to e2f891a Compare December 13, 2024 21:29
@robinsons robinsons force-pushed the robinsons-panelmatch-task-launcher branch from e2f891a to a0692c0 Compare December 13, 2024 21:36
Copy link
Contributor Author

@robinsons robinsons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Reviewable status: 6 of 9 files reviewed, all discussions resolved (waiting on @SanjayVas and @stevenwarejones)

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 7 files at r1, 3 of 3 files at r3, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @SanjayVas)

@robinsons robinsons merged commit f72f537 into main Dec 16, 2024
4 checks passed
@robinsons robinsons deleted the robinsons-panelmatch-task-launcher branch December 16, 2024 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants