Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v2.2] Port #22508 and prereqs (ContainerRuntime: Process incoming batches op-by-op instead of waiting for the whole batch) #22654

Merged

Conversation

markfields
Copy link
Member

@markfields markfields commented Sep 26, 2024

Description

Porting the following commits from main to release/client/2.2:

  • ce0a14c (Offline: Model empty batches differently, exposing the empty Grouped Batch message to more of the system)
  • e024b86 (ContainerRuntime: Refactor batch processing code to support either op-by-op or batch-all-at-once semantics)
  • 709f085 (ContainerRuntime: Process incoming batches op-by-op instead of waiting for the whole batch)

Reviewer Guidance

There was a ton of churn between 2.2 and 2.3 in this code (subset of these changes to container-runtime dir), so the cherry-pick turned into more like a manual re-implementing of the change. In some ways it's a smaller diff compared to what needed to happen in main.

So please review carefully as you would an original PR.

Offline: Model empty batches differently, exposing the empty Grouped Batch message to more of the system (microsoft#22343)

Empty batches are a unique case, where we submit a Grouped Batch message
with no inner runtime messages. This is needed for tracking batches
between forks of a container (each trying to resubmit the same local
content).

There are several flows that need the message or at least
sequenceNumber-related metadata:
* BatchStart/BatchEnd events
* DeltaScheduler
* savedOps mechanism in PendingStateManager
* The DataProcessingError thrown if the container forks
* Future (See microsoft#22310): MinimumSequenceNumber

We used to include the `emptyBatchSequenceNumber` and that partially
addressed this, but it's better to include the whole message. So add
this as the `keyMessage` to `InboundBatch` type - and for non-empty
batches this is just the first message (we were already referring to
that in certain places).
ContaineRuntime: Refactor batch processing code to support either op-by-op or batch-all-at-once semantics (microsoft#22501)

No behavior change here, just refactoring to be able to express the
result of RemoteMessageProcessor in terms of ops or batches.

Follow-up change to adjust the logic will be much more scoped then.
ContainerRuntime: Process incoming batches op-by-op instead of waiting for the whole batch (microsoft#22508)

We are concerned that holding batch messages in ContainerRuntime even
while DeltaManager advances its tracked sequence numbers through the
batch could have unintended consequences. So this PR restores the old
behavior of processing each message in a batch one-by-one, rather than
holding until the whole batch arrives.

Note that there's no change in behavior here for Grouped Batches.

PR microsoft#21785 switched the RemoteMessageProcessor from returning ungrouped
batch ops as they arrived, to holding them and finally returning the
whole batch once the last arrived. The downstream code was also updated
to take whole batches, whereas before it would take individual messages
and use the batch metadata to detect batch start/end.

Too many other changes were made after that PR to straight revert it.
Logic was added throughout CR and PSM that looks at info about that
batch which is found on the first message in the batch. So we can
reverse the change and process one-at-a-time, but we need a way to carry
around that "batch start info" with the first message in the batch.

So we are now modeling the result that RMP yields as one of three cases:

- A full batch of messages (could be from a single-message batch or a
Grouped Batch)
- The first message of a multi-message batch
- The next message in a multi-message batch

The first two cases include the "batch start info" needed for the recent
Offline work. The third case just indicates whether it's the last
message or not.

"batch start info" and updating the downstream code to use that instead
of reading it off the old "Inbound Batch" type. This PR now adds those
other two cases to the RMP return type and handles processing them
throughout CR and PSM.
@github-actions github-actions bot added area: runtime Runtime related issues changeset-present base: release PRs targeted against a release branch labels Sep 26, 2024
Copy link
Contributor

Warning

WARNING: This PR is targeting a release branch!

All changes must first be merged into main and then backported to the target release branch.
Please include a link to the main PR in the description of this PR.

Changes to release branches require approval from the Patch Triage group before merging.
You should have already discussed this change with them so they know to expect it.

For more details, see our internal documentation for the patch policy and processes for
patch releases.

@markfields markfields marked this pull request as ready for review September 27, 2024 19:39
@markfields markfields requested a review from a team as a code owner September 27, 2024 19:39
@msfluid-bot
Copy link
Collaborator

msfluid-bot commented Sep 27, 2024

Could not find a usable baseline build with search starting at CI 1d1d489

Generated by 🚫 dangerJS against f488a2a

@markfields
Copy link
Member Author

@jzaffiro - Thanks for the feedback on the changeset! I started to incorporate your suggestions, but then thought that since this changeset has already been published with the 2.3 release notes, it's probably better to leave it alone here, so it's obvious that it's the same change, if anyone's looking closely.

Thoughts? @tylerbutler? I'm open to whatever you two think is best.

@jzaffiro
Copy link
Contributor

@jzaffiro - Thanks for the feedback on the changeset! I started to incorporate your suggestions, but then thought that since this changeset has already been published with the 2.3 release notes, it's probably better to leave it alone here, so it's obvious that it's the same change, if anyone's looking closely.

Thoughts? @tylerbutler? I'm open to whatever you two think is best.

Good point - I would say leave it then :)

Copy link
Contributor

@dannimad dannimad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@frankmueller-msft frankmueller-msft merged commit 567feff into microsoft:release/client/2.2 Oct 1, 2024
29 checks passed
@markfields markfields deleted the v2.2/cr-batch-fix branch October 1, 2024 21:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: runtime Runtime related issues base: release PRs targeted against a release branch changeset-present
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants