Pending job batch queue for compaction job creation #3678

Closed
patchwork01 opened this issue Nov 12, 2024 · 1 comment
Labels
enhancement New feature or request parent-issue An issue that is or should be split into multiple sub-issues

patchwork01 commented Nov 12, 2024

Background

Split from:

Description

We'd like to parallelise compaction job creation by adding a separate step to actually send the jobs. The compaction job creation lambda can just decide what compaction jobs should be run, create large batches of them and send them to a pending job batch queue. A separate lambda can send the jobs, with multiple instances handling batches in parallel.

This should avoid requiring the compaction job creation lambda to make a large number of API calls to both create and send all the compaction jobs.

Analysis

The code in CreateCompactionJobs.batchCreateJobs is what we want to move to another lambda, as it takes some time to execute across all the batches. For each batch, we can write the jobs to S3, then put a message on a pending jobs queue pointing to the batch. Multiple lambda instances can then receive those batches and send the jobs in parallel. This should allow us to make the batch size much bigger (sleeper.table.compaction.job.send.batch.size), and to create a much larger number of compaction jobs in a single invocation (sleeper.compaction.job.creation.limit).
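As a rough illustration of the batching step, here is a minimal sketch that uses in-memory collections in place of S3 and the pending jobs queue. The class `PendingBatchSketch`, the `BatchMessage` shape and the object key layout are all hypothetical, chosen only to show the flow of writing a batch and then queueing a pointer to it. They are not Sleeper's actual classes or message format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Illustrative only: in-memory stand-ins for S3 and the pending jobs queue. */
final class PendingBatchSketch {

    /** Hypothetical message shape: a pointer to where the batch was written. */
    static final class BatchMessage {
        final String tableId;
        final String batchKey;

        BatchMessage(String tableId, String batchKey) {
            this.tableId = tableId;
            this.batchKey = batchKey;
        }
    }

    private final Map<String, List<String>> bucket = new HashMap<>(); // stands in for S3
    private final List<BatchMessage> queue = new ArrayList<>();       // stands in for the queue

    /** Splits the jobs into batches, stores each batch, then queues a pointer to it. */
    void createBatches(String tableId, List<String> jobIds, int batchSize) {
        for (int start = 0; start < jobIds.size(); start += batchSize) {
            int end = Math.min(start + batchSize, jobIds.size());
            List<String> batch = new ArrayList<>(jobIds.subList(start, end));
            // Assumed key layout, for illustration only
            String key = tableId + "/pending-batches/" + UUID.randomUUID() + ".json";
            bucket.put(key, batch);                    // write the batch first
            queue.add(new BatchMessage(tableId, key)); // then point the queue at it
        }
    }

    List<BatchMessage> pendingMessages() {
        return queue;
    }

    /** What a receiving lambda instance would do: load the batch the message points to. */
    List<String> readBatch(BatchMessage message) {
        return bucket.get(message.batchKey);
    }
}
```

Because each message carries only a pointer, the batch size is not limited by the queue's message size limit, which is what would let the batch size grow.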

For each batch, as we add it to the pending jobs queue we can also submit a file assignment commit for the whole batch to the state store committer. The lambda that receives the batch can check if the file assignment has been applied, and if not it can put it back on the queue with a delay. A batch that has been retried enough times without file assignment can go to the dead letter queue.
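The decision the receiving lambda makes for each batch could be sketched as below. The names `BatchRetryDecision`, `Action` and the `maxRetries` parameter are assumptions made for illustration. The sketch also assumes the retry count is tracked in the message itself. If a batch is put back by sending a new message, a queue-level receive count would not carry over, though the real design may handle this differently.

```java
/** Illustrative only: decides what to do with a received batch. */
final class BatchRetryDecision {

    enum Action { SEND_JOBS, REQUEUE_WITH_DELAY, DEAD_LETTER }

    /**
     * @param filesAssigned whether the file assignment commit has been applied
     * @param retriesSoFar  how many times this batch has already been put back (assumed to be in the message)
     * @param maxRetries    hypothetical limit before giving up on the batch
     */
    static Action decide(boolean filesAssigned, int retriesSoFar, int maxRetries) {
        if (filesAssigned) {
            return Action.SEND_JOBS;          // assignment is in place, safe to send the jobs
        }
        if (retriesSoFar >= maxRetries) {
            return Action.DEAD_LETTER;        // retried enough times without file assignment
        }
        return Action.REQUEUE_WITH_DELAY;     // give the state store committer more time
    }
}
```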

We can avoid status store updates in the compaction job creation lambda, and do them in the pending jobs lambda instead. We could keep the created status and apply it if we're putting a batch back on the queue. We could record in the message that we've already applied that status, so that it is not applied again on a later retry.
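One way to picture the flag in the message is the sketch below. `CreatedStatusSketch`, `PendingBatch` and the `createdStatusApplied` field are hypothetical names, and the list of strings stands in for the real status store.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: applies the created status at most once per batch. */
final class CreatedStatusSketch {

    /** Hypothetical message body carrying a flag for the created status. */
    static final class PendingBatch {
        final String batchKey;
        final boolean createdStatusApplied;

        PendingBatch(String batchKey, boolean createdStatusApplied) {
            this.batchKey = batchKey;
            this.createdStatusApplied = createdStatusApplied;
        }
    }

    /** Stands in for the status store. */
    final List<String> statusUpdates = new ArrayList<>();

    /** Called when a batch must go back on the queue; returns the message to re-send. */
    PendingBatch prepareRequeue(PendingBatch batch) {
        if (!batch.createdStatusApplied) {
            statusUpdates.add("created:" + batch.batchKey); // first requeue records the status
        }
        return new PendingBatch(batch.batchKey, true);      // later requeues skip the update
    }
}
```

With this shape, requeueing the same batch several times produces a single created status update.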


patchwork01 commented Dec 5, 2024

All child issues are done. Also considering closing the following on-hold issue:

Closing.
