Pending job batch queue for compaction job creation #3678
Labels: enhancement, parent-issue
Background
Split from:
Description
We'd like to parallelise compaction job creation by adding a separate step to actually send the jobs. The compaction job creation lambda can just decide which compaction jobs should be run, group them into large batches, and send each batch to a pending job batch queue. A separate lambda can then send the jobs, with multiple instances handling batches in parallel.
This should avoid requiring the compaction job creation lambda to make a large number of API calls to both create and send all the compaction jobs.
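As a rough sketch, the message on the pending job batch queue might carry just enough to locate the batch and track retries. The class and field names below are illustrative, not existing Sleeper code:

```java
/**
 * Minimal sketch of a pending job batch message. All names here are
 * hypothetical; the real message would be defined with the new lambda.
 */
public record PendingJobBatchMessage(
        String tableId,         // Sleeper table the jobs belong to
        String batchKey,        // S3 object key holding the serialised compaction jobs
        int retryCount,         // how many times the batch has been requeued
        boolean statusRecorded  // whether the "created" job status has already been written
) {
}
```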
Analysis
The code in `CreateCompactionJobs.batchCreateJobs` is what we want to move to another lambda, as it takes some time to execute across all the batches. For each batch, we can write the jobs to S3, then put a message on a pending jobs queue pointing to the batch. This would allow multiple lambda instances to receive those batches and create the jobs. This should let us make the batch size much bigger (`sleeper.table.compaction.job.send.batch.size`), and create a much larger number of compaction jobs in a single invocation (`sleeper.compaction.job.creation.limit`).

For each batch, as we add it to the pending jobs queue we can also submit a file assignment commit for the whole batch to the state store committer. The lambda that receives the batch can check whether the file assignment has been applied, and if not it can put the batch back on the queue with a delay. A batch that has been retried enough times without the file assignment being applied can go to the dead letter queue.
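A rough sketch of how the pending jobs lambda could handle a batch, reusing the hypothetical `PendingJobBatchMessage` above. The checker, sender and queue interfaces stand in for whatever Sleeper components would actually do the work, and the retry limit and delay are illustrative:

```java
/** Sketch of the pending jobs lambda flow, under the assumptions described above. */
public class PendingBatchHandler {

    // Hypothetical collaborators; Sleeper's real classes will differ.
    public interface FileAssignmentChecker {
        boolean isAssignmentApplied(PendingJobBatchMessage batch);
    }

    public interface BatchJobSender {
        void sendJobs(PendingJobBatchMessage batch); // reads the jobs from S3 and sends them
    }

    public interface PendingBatchQueue {
        void requeueWithDelay(PendingJobBatchMessage batch, int delaySeconds);
        void sendToDeadLetterQueue(PendingJobBatchMessage batch);
    }

    private static final int MAX_RETRIES = 5;        // illustrative limit
    private static final int RETRY_DELAY_SECONDS = 30; // illustrative delay

    private final FileAssignmentChecker assignmentChecker;
    private final BatchJobSender jobSender;
    private final PendingBatchQueue queue;

    public PendingBatchHandler(FileAssignmentChecker assignmentChecker,
                               BatchJobSender jobSender,
                               PendingBatchQueue queue) {
        this.assignmentChecker = assignmentChecker;
        this.jobSender = jobSender;
        this.queue = queue;
    }

    public void handle(PendingJobBatchMessage batch) {
        if (assignmentChecker.isAssignmentApplied(batch)) {
            // Input files are assigned to the jobs, so the batch can be sent.
            jobSender.sendJobs(batch);
        } else if (batch.retryCount() < MAX_RETRIES) {
            // Assignment not applied yet; put the batch back with a delay so the
            // state store committer has time to catch up.
            queue.requeueWithDelay(incrementRetry(batch), RETRY_DELAY_SECONDS);
        } else {
            // Assignment still not applied after enough retries; give up on the batch.
            queue.sendToDeadLetterQueue(batch);
        }
    }

    private PendingJobBatchMessage incrementRetry(PendingJobBatchMessage batch) {
        return new PendingJobBatchMessage(batch.tableId(), batch.batchKey(),
                batch.retryCount() + 1, batch.statusRecorded());
    }
}
```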
We can avoid status store updates in the compaction job creation lambda and do them in the pending jobs lambda instead. We could still apply the created status there, even when we're putting a batch back on the queue, and record in the message that the status has already been applied so it isn't written again on a later retry.
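For example, the pending jobs lambda could write the created status the first time it sees a batch and carry a flag forward on requeue so the status isn't written twice. The status store interface and method here are assumptions, not Sleeper's real API:

```java
/** Sketch of writing the "created" job status at most once per batch. */
public class CreatedStatusRecorder {

    // Hypothetical status store interface; the real one would come from Sleeper.
    public interface JobStatusStore {
        void jobsCreated(PendingJobBatchMessage batch);
    }

    private final JobStatusStore statusStore;

    public CreatedStatusRecorder(JobStatusStore statusStore) {
        this.statusStore = statusStore;
    }

    /** Returns the message to requeue, with the flag set once the status is written. */
    public PendingJobBatchMessage recordCreatedStatusIfNeeded(PendingJobBatchMessage batch) {
        if (batch.statusRecorded()) {
            return batch; // status was already written on an earlier receive
        }
        statusStore.jobsCreated(batch);
        return new PendingJobBatchMessage(batch.tableId(), batch.batchKey(),
                batch.retryCount(), true);
    }
}
```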