
Fix the transfer batching design for the position handler #3488

Closed · 6 of 24 tasks

PaulGregoryBaker opened this issue Aug 22, 2023 · 4 comments

PaulGregoryBaker commented Aug 22, 2023

Goal:

As a development team working on performance enhancement
I want to make use of the batching design
so that I can implement the batching design to get performance gains for Mojaloop transfers

Acceptance Criteria:

  • Verify that the design caters for a mix of different message types within a batch, and that the batch algorithm increments (prepare) and decrements (fulfil, abort/timeout) the position based on the message type, e.g.:
    • prepare
    • fulfil
    • abort
    • timeout
    • batch equivalents
  • Verify that the design includes a partitioning strategy in which the participant account impacted by the position change is used as the message key, e.g. Prepare (reservation) will use the Payer DFSP account Id (participantCurrencyId), Fulfil will use the Payee DFSP account Id, and Abort will use the Payer DFSP account Id (see the sketch after this list).
  • Verify that a new partitioning strategy is designed that utilises the account Id to publish the message to the partition with the same Id.
  • Verify that an admin API is designed to update the Kafka partitions as new accounts for participants are created.
  • Verify that the sequence diagrams in the Mojaloop documentation (docs) are updated.
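
As an illustration of the partitioning criterion above, here is a minimal sketch of the message-key selection in TypeScript. The message shape and field names are assumptions for illustration only, not the actual central-ledger types.

```typescript
// Hypothetical message shape -- the real central-ledger payloads differ.
interface PositionMessage {
  action: 'prepare' | 'fulfil' | 'abort' | 'timeout'
  payerParticipantCurrencyId: string
  payeeParticipantCurrencyId: string
}

// Pick the participantCurrencyId of the account whose position is impacted:
// prepare reserves against the payer, fulfil commits against the payee,
// abort/timeout release the payer's reservation.
function positionMessageKey (msg: PositionMessage): string {
  switch (msg.action) {
    case 'fulfil':
      return msg.payeeParticipantCurrencyId
    case 'prepare':
    case 'abort':
    case 'timeout':
      return msg.payerParticipantCurrencyId
  }
}
```

Keying position events this way means the producer's partitioner sends all events for a given account to the same partition, so they are consumed in order by a single consumer.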

Complexity: <High|Medium|Low> > A short comment to remind the reason for the rating

Uncertainty: <High|Medium|Low> > A short comment to remind the reason for the rating


Tasks:

  • Go through the current batch implementation in the position handler
  • Discuss with the team and document various options / designs for batch processing
  • Create sequence diagram for the design
  • Present it to all the team members internally and agree on one design
  • Discuss the sequence diagrams with the team and agree on them
  • Write up the final design document

Done

  • Acceptance Criteria pass
  • Designs are up-to-date
  • Unit Tests pass
  • Integration Tests pass
  • Code Style & Coverage meets standards
  • Changes made to config (default.json) are broadcast to team and follow-up tasks added to update helm charts and other deployment config.
  • TBD

Pull Requests:

Follow-up:

  • N/A

Dependencies:

  • N/A

Accountability:

  • Owner: TBC
  • QA/Review: TBC
vijayg10 commented Aug 30, 2023

Design - 1: Binned mixed message processing (Rev: 2)

[Diagram: Binning Design (draw.io)]

  1. Binning (see the sketch after this list):
     • The initial batch size is determined by the batchSize configuration.
     • The consumed messages will be stored in memory.
     • Assign a batch of messages into bins based on the impacted position account ID (Prepare: Payer Position Account, Fulfil: Payee Position Account) and action.
     • Audit each message as it's assigned to a bin.
     • Sort each bin from smallest to largest amount (future enhancement).
     • Note: The messages in bins should be references to the message content in memory (no duplication of message content).
  2. Handler:
     • Start a transaction (TX) on MySQL.
     • For each bin:
       • Call the Bin Processor with the bin and the TX.
       • Send and audit notifications based on the transfers result list from the Bin Processor (to be confirmed). Note: batch messages can also be sent to Kafka.
     • Commit the offset.
     • End the TX (commit the TX to MySQL). Rollback is reserved for complete infrastructure failure only.
  3. Bin Processor:
     • Input: accepts a bin and a TX to be processed.
     • Output: provides a list of transfers in either a reserved or aborted state.
     • Processing order:
       • Fulfil
       • Aborts and timeouts
       • Prepare
  4. Supports single-message processing as well.
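
A rough sketch of the binning step and the handler loop described above, assuming simplified message, database and Kafka handles (the names below are placeholders for illustration, not the actual central-ledger modules):

```typescript
interface PositionMessage {
  action: 'prepare' | 'fulfil' | 'abort' | 'timeout'
  participantCurrencyId: string   // account whose position is impacted
  amount: number
}

// A bin groups message *references* per impacted account, keyed by action,
// so the message content is never duplicated in memory.
type Bin = Map<string, PositionMessage[]>

function binMessages (batch: PositionMessage[]): Map<string, Bin> {
  const bins = new Map<string, Bin>()
  for (const msg of batch) {
    const accountBin = bins.get(msg.participantCurrencyId) ?? new Map<string, PositionMessage[]>()
    const byAction = accountBin.get(msg.action) ?? []
    byAction.push(msg)                        // reference only, no copy
    accountBin.set(msg.action, byAction)
    bins.set(msg.participantCurrencyId, accountBin)
    // audit(msg)                             // audit each message as it is binned
  }
  return bins
}

// Handler skeleton: one DB transaction around all bins, Kafka offset committed
// before the transaction is ended, rollback reserved for infrastructure failure.
async function handleBatch (
  batch: PositionMessage[],
  db: { startTransaction(): Promise<{ commit(): Promise<void>, rollback(): Promise<void> }> },
  kafka: { commitOffsets(): Promise<void> }
): Promise<void> {
  const bins = binMessages(batch)
  const tx = await db.startTransaction()
  try {
    for (const [accountId, bin] of bins) {
      const results = await processBin(accountId, bin, tx)  // Bin Processor
      await sendNotifications(results)                      // notify/audit per transfer; batches may also go to Kafka
    }
    await kafka.commitOffsets()
    await tx.commit()
  } catch (err) {
    await tx.rollback()                                     // complete infrastructure failure only
    throw err
  }
}

// Bin Processor sketch: apply fulfils first, then aborts and timeouts, then
// prepares, returning the transfers in a reserved or aborted state.
async function processBin (accountId: string, bin: Bin, tx: unknown): Promise<unknown[]> {
  const results: unknown[] = []
  for (const action of ['fulfil', 'abort', 'timeout', 'prepare']) {
    for (const msg of bin.get(action) ?? []) {
      // ...apply the position change for `msg` against `accountId` within `tx`...
      results.push({ accountId, action, amount: msg.amount })
    }
  }
  return results
}

declare function sendNotifications (results: unknown[]): Promise<void>
```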

Questions

  • Message Replay Prevention: How do we ensure that we do not replay messages?

Issues

  • Locking Issue: The above high-level algorithm might suffer from long "locks" on the position table when Kafka messages are not pre-partitioned correctly by the position account ID. This issue applies when processing mixed messages or when the number of partitions exceeds the number of consumers.

Mitigation

  • Ensure a sufficient number of consumers, ideally more than partitions.
  • Have enough partitions to ensure unique binning (no message key hash conflicts).
  • Use the correct message key when producing position events, based on the position account ID and the scenario (Prepare: Payer Position Account, Fulfil: Payee Position Account).
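
To illustrate the hash-conflict point above: keyed messages are assigned to a partition by hashing the message key, and the exact hash depends on the client library. The function below is a simplified stand-in, not Kafka's actual partitioner.

```typescript
// Simplified stand-in for a key-based partitioner (illustration only).
function partitionFor (key: string, numPartitions: number): number {
  let hash = 0
  for (const ch of key) {
    hash = (hash * 31 + ch.charCodeAt(0)) | 0   // simple rolling hash
  }
  return Math.abs(hash) % numPartitions
}

// Two different participantCurrencyIds can still map to the same partition,
// so a consumer of that partition sees mixed accounts; keeping the partition
// count high relative to the number of active accounts reduces such conflicts.
```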

Future Enhancements

  • Enhanced Auditing: Support batch events auditing (e.g., audits for ingress/egress, etc).

Sequence Diagrams

TBD

vijayg10 commented Aug 30, 2023

Design - 2: Batching on the fly in combination with Kafka partitioning and assignment strategy

[Diagram: Batch Processing Algorithm 2 (draw.io)]

Overview

This is a proposed algorithm for enhancing the existing batch processing implementation with minimal deviation. The primary objective is to improve processing efficiency while addressing potential Kafka misconfiguration issues.

Algorithm Description

The algorithm introduces a new configuration parameter named maxBatchSize. This parameter determines the maximum number of messages that should be processed in a single batch. The algorithm operates as follows:

  1. The consumer consumes incoming messages from Kafka, and as it processes them, it keeps track of the accountID associated with each message.

  2. Whenever the consumer detects a change in the accountID or if the batch size reaches maxBatchSize, a batch is formed using the collected messages.

  3. The formed batch is then sent for processing to the designated handler function. After successful processing, the consumer commits the offset, ensuring the messages in the batch are considered processed.

  4. The consumer continues to repeat steps 1-3, efficiently processing messages in batches based on the accountID and the maxBatchSize configuration.

Example

Let's illustrate the algorithm with an example:

Assuming the accountIDs of consumed messages from Kafka are: 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, and maxBatchSize is set to 5.

  • The first batch formed will contain (1, 1, 1). This batch is processed by the handler function and its offset is committed.

  • Subsequently, a new batch (2, 2) is created and processed, followed by the offset being committed.

  • Finally, the remaining ten messages with accountID 1 are split into two batches of (1, 1, 1, 1, 1); each batch is formed, processed, and its offset committed.
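
A minimal sketch of this batching loop, assuming a simplified message shape (the names below are placeholders, not the actual central-ledger handler code):

```typescript
interface KeyedMessage {
  accountId: string
  payload: unknown
}

// Cut a new batch whenever the accountId changes or the current batch reaches
// maxBatchSize; in the real handler each batch would be processed and its
// offset committed before the next one is formed.
function formBatches (messages: KeyedMessage[], maxBatchSize: number): KeyedMessage[][] {
  const batches: KeyedMessage[][] = []
  let current: KeyedMessage[] = []
  for (const msg of messages) {
    const boundary =
      current.length >= maxBatchSize ||
      (current.length > 0 && current[current.length - 1].accountId !== msg.accountId)
    if (boundary) {
      batches.push(current)
      current = []
    }
    current.push(msg)
  }
  if (current.length > 0) batches.push(current)
  return batches
}
```

With the accountIDs 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 and maxBatchSize = 5, this yields the batches (1, 1, 1), (2, 2), (1, 1, 1, 1, 1) and (1, 1, 1, 1, 1), matching the example above.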

Advantages

This algorithm brings several advantages to the table:

  • No Sorting Required: Unlike some other batch processing approaches, this algorithm eliminates the need for sorting messages before processing. This simplifies the implementation and potentially improves processing speed.

  • Fault Tolerance: The algorithm accounts for potential misconfiguration of Kafka by ensuring that messages are efficiently processed in distinct batches based on accountID. This adds a layer of fault tolerance against misconfigurations.

  • Minimal Deviation: The proposed algorithm builds upon the existing batch processing implementation, requiring only additional logic for forming batches and processing them in the handler function.

Important Note

It's crucial to note that the algorithm performs optimally when Kafka's partitioning and assignment strategy is well-aligned with accountIDs. If messages are mixed across partitions in a way that doesn't match the accountID grouping, the actual batch size might be smaller than maxBatchSize.

In summary, this algorithm strives to enhance batch processing for account-based messages, offering improved fault tolerance without straying far from the current implementation.

vijayg10 commented Aug 30, 2023

Design - 3: Position Message Aggregator

[Diagram: Batch Processing Algorithm 3 (draw.io)]

Overview

This option involves the introduction of a new service called position-message-aggregator. This service focuses on constructing bins from an existing topic, committing offsets, and producing these bins to a new topic. This can streamline the processing for the position handlers.

Algorithm Description

  1. position-message-aggregator:

    • Listens to the existing topic for incoming messages.
    • Aggregates messages into bins based on accountID.
    • Commits the consumed offsets on the Kafka consumer.
    • Produces the constructed bins to a new topic.
  2. Position Handlers:

    • Consume the constructed bins directly from the new topic.
    • Process each bin one by one, effectively handling a batch of messages.
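
A hypothetical outline of the aggregator loop; the consumer/producer interfaces below are simplified placeholders, not the actual Mojaloop Kafka wrappers:

```typescript
// Simplified stand-ins for a Kafka consumer and producer.
interface AggregatorConsumer {
  poll (): Promise<{ accountId: string, value: string, offset: number }[]>
  commit (offset: number): Promise<void>
}
interface AggregatorProducer {
  send (topic: string, key: string, value: string): Promise<void>
}

// Consume from the existing position topic, group the messages into
// per-account bins, publish each bin as one keyed message on the new topic,
// then commit the consumed offsets.
async function aggregateOnce (
  consumer: AggregatorConsumer,
  producer: AggregatorProducer,
  binTopic: string
): Promise<void> {
  const messages = await consumer.poll()
  if (messages.length === 0) return

  const bins = new Map<string, string[]>()
  for (const msg of messages) {
    const bin = bins.get(msg.accountId) ?? []
    bin.push(msg.value)
    bins.set(msg.accountId, bin)
  }

  // Each bin becomes a single keyed message, so a position handler reads a
  // whole batch for one account in a single consume.
  for (const [accountId, bin] of bins) {
    await producer.send(binTopic, accountId, JSON.stringify(bin))
  }

  // Commit only after the bins have been produced; a crash before this point
  // re-aggregates rather than losing messages (at the cost of possible duplicates).
  const lastOffset = messages[messages.length - 1].offset
  await consumer.commit(lastOffset)
}
```

Keying each produced bin by accountId keeps the new topic partitioned by the affected account, so the position handlers still see all changes for one account in order.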

Advantages of this approach:

  • Optimized Consumption: The position handlers can directly consume pre-aggregated bins, reducing the complexity of message sorting and binning.
  • Efficient Processing: Handling bins instead of individual messages can improve processing efficiency and reduce overhead.
  • Simplified Logic: The message aggregation and bin creation are isolated in the position-message-aggregator, simplifying the handlers' responsibilities.

Considerations:

  • Scalability: Ensure that the position-message-aggregator and the Kafka Consumer scale appropriately based on message volume.
  • Bin Definition: Define the criteria for constructing bins to align with the requirements of the position handlers.

Future Enhancements:

  • Dynamic Binning: Explore the possibility of dynamically adjusting bin creation criteria based on real-time metrics or patterns.

This approach offers a potential way to streamline the processing of position messages by centralising the aggregation step and providing a cleaner workflow for the position handlers.
