
Fix the transfer batching design for the position handler #3488

Closed · 6 of 24 tasks

PaulGregoryBaker opened this issue Aug 22, 2023 · 4 comments

PaulGregoryBaker commented Aug 22, 2023

Goal:

As a development team working on performance enhancement
I want to make use of the batching design
so that I can implement the batching design to get performance gains for Mojaloop transfers

Acceptance Criteria:

  • Verify that the design caters for a mix of different message types within a batch, and that the batch algorithm increments (prepare) and decrements (fulfil, abort/timeout) the position based on the message type, e.g.:
    • prepare
    • fulfil
    • abort
    • timeout
    • batch equivalents
  • Verify that the design includes a partitioning strategy in which the participant account impacted by the position change is used as the message key, e.g. Prepare (reservation) will use the Payer DFSP account Id (participantCurrencyId), Fulfil will use the Payee DFSP account Id, and Abort will use the Payer DFSP account Id (see the sketch after this list).
  • Verify that a new partitioning strategy is designed that utilises the account Id to publish the message to the partition with the same Id.
  • Verify that an admin API is designed to update the Kafka partitions as new accounts for participants are created.
  • Verify that the sequence diagrams in the Mojaloop documentation (docs) are updated.
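
As an illustration of the partitioning criterion above, here is a minimal sketch of the message-key selection in TypeScript. The message shape and field names are assumptions for illustration only, not the actual central-ledger types.

```typescript
// Hypothetical message shape -- the real central-ledger payloads differ.
interface PositionMessage {
  action: 'prepare' | 'fulfil' | 'abort' | 'timeout'
  payerParticipantCurrencyId: string
  payeeParticipantCurrencyId: string
}

// Pick the participantCurrencyId of the account whose position is impacted:
// prepare reserves against the payer, fulfil commits against the payee,
// abort/timeout release the payer's reservation.
function positionMessageKey (msg: PositionMessage): string {
  switch (msg.action) {
    case 'fulfil':
      return msg.payeeParticipantCurrencyId
    case 'prepare':
    case 'abort':
    case 'timeout':
      return msg.payerParticipantCurrencyId
  }
}
```

Keying position events this way means the producer's partitioner sends all events for a given account to the same partition, so they are consumed in order by a single consumer.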

Complexity: <High|Medium|Low> > A short comment to remind the reason for the rating

Uncertainty: <High|Medium|Low> > A short comment to remind the reason for the rating


Tasks:

  • Go through the current batch implementation in the position handler
  • Discuss with the team and document various options / designs for batch processing
  • Create sequence diagram for the design
  • Present it to all the team members internally and agree on one design
  • Discuss the sequence diagrams with the team and agree on them
  • Write up the final design document

Done

  • Acceptance Criteria pass
  • Designs are up-to-date
  • Unit Tests pass
  • Integration Tests pass
  • Code Style & Coverage meets standards
  • Changes made to config (default.json) are broadcast to team and follow-up tasks added to update helm charts and other deployment config.
  • TBD

Pull Requests:

Follow-up:

  • N/A

Dependencies:

  • N/A

Accountability:

  • Owner: TBC
  • QA/Review: TBC
vijayg10 commented Aug 30, 2023

Design - 1: Binned mixed message processing (Rev: 2)

[Diagram: Binning Design (draw.io)]

  1. Binning (see the sketch after this list):
     • The initial batch size is determined by the batchSize configuration.
     • The consumed messages will be stored in memory.
     • Assign a batch of messages into bins based on the impacted position account ID (Prepare: Payer Position Account, Fulfil: Payee Position Account) and action.
     • Audit each message as it's assigned to a bin.
     • Sort each bin from smallest to largest amount (future enhancement).
     • Note: The messages in bins should be references to the message content in memory (no duplication of message content).
  2. Handler:
     • Start a transaction (TX) on MySQL.
     • For each bin:
       • Call the Bin Processor with the bin and the TX.
       • Send and audit notifications based on the transfers result list from the Bin Processor (to be confirmed). Note: batch messages can also be sent to Kafka.
     • Commit the offset.
     • End the TX (commit the TX to MySQL). Rollback is reserved for complete infrastructure failure only.
  3. Bin Processor:
     • Input: accepts a bin and a TX to be processed.
     • Output: provides a list of transfers in either a reserved or aborted state.
     • Processing order:
       • Fulfil
       • Aborts and timeouts
       • Prepare
  4. Supports single-message processing as well.
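
A rough sketch of the binning step and the handler loop described above, assuming simplified message, database and Kafka handles (the names below are placeholders for illustration, not the actual central-ledger modules):

```typescript
interface PositionMessage {
  action: 'prepare' | 'fulfil' | 'abort' | 'timeout'
  participantCurrencyId: string   // account whose position is impacted
  amount: number
}

// A bin groups message *references* per impacted account, keyed by action,
// so the message content is never duplicated in memory.
type Bin = Map<string, PositionMessage[]>

function binMessages (batch: PositionMessage[]): Map<string, Bin> {
  const bins = new Map<string, Bin>()
  for (const msg of batch) {
    const accountBin = bins.get(msg.participantCurrencyId) ?? new Map<string, PositionMessage[]>()
    const byAction = accountBin.get(msg.action) ?? []
    byAction.push(msg)                        // reference only, no copy
    accountBin.set(msg.action, byAction)
    bins.set(msg.participantCurrencyId, accountBin)
    // audit(msg)                             // audit each message as it is binned
  }
  return bins
}

// Handler skeleton: one DB transaction around all bins, Kafka offset committed
// before the transaction is ended, rollback reserved for infrastructure failure.
async function handleBatch (
  batch: PositionMessage[],
  db: { startTransaction(): Promise<{ commit(): Promise<void>, rollback(): Promise<void> }> },
  kafka: { commitOffsets(): Promise<void> }
): Promise<void> {
  const bins = binMessages(batch)
  const tx = await db.startTransaction()
  try {
    for (const [accountId, bin] of bins) {
      const results = await processBin(accountId, bin, tx)  // Bin Processor
      await sendNotifications(results)                      // notify/audit per transfer; batches may also go to Kafka
    }
    await kafka.commitOffsets()
    await tx.commit()
  } catch (err) {
    await tx.rollback()                                     // complete infrastructure failure only
    throw err
  }
}

// Bin Processor sketch: apply fulfils first, then aborts and timeouts, then
// prepares, returning the transfers in a reserved or aborted state.
async function processBin (accountId: string, bin: Bin, tx: unknown): Promise<unknown[]> {
  const results: unknown[] = []
  for (const action of ['fulfil', 'abort', 'timeout', 'prepare']) {
    for (const msg of bin.get(action) ?? []) {
      // ...apply the position change for `msg` against `accountId` within `tx`...
      results.push({ accountId, action, amount: msg.amount })
    }
  }
  return results
}

declare function sendNotifications (results: unknown[]): Promise<void>
```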

Questions

  • Message Replay Prevention: How do we ensure that we do not replay messages?

Issues

  • Locking Issue: The above high-level algorithm might suffer from long "locks" on the position table when Kafka messages are not pre-partitioned correctly by the position account ID. This issue applies when processing mixed messages or when the number of partitions exceeds the number of consumers.

Mitigation

  • Ensure a sufficient number of consumers, ideally more than partitions.
  • Have enough partitions to ensure unique binning (no message key hash conflicts).
  • Use the correct message key when producing position events, based on the position account ID and the scenario (Prepare: Payer Position Account, Fulfil: Payee Position Account).
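
To illustrate the hash-conflict point above: keyed messages are assigned to a partition by hashing the message key, and the exact hash depends on the client library. The function below is a simplified stand-in, not Kafka's actual partitioner.

```typescript
// Simplified stand-in for a key-based partitioner (illustration only).
function partitionFor (key: string, numPartitions: number): number {
  let hash = 0
  for (const ch of key) {
    hash = (hash * 31 + ch.charCodeAt(0)) | 0   // simple rolling hash
  }
  return Math.abs(hash) % numPartitions
}

// Two different participantCurrencyIds can still map to the same partition,
// so a consumer of that partition sees mixed accounts; keeping the partition
// count high relative to the number of active accounts reduces such conflicts.
```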

Future Enhancements

  • Enhanced Auditing: Support batch events auditing (e.g., audits for ingress/egress, etc).

Sequence Diagrams

TBD

vijayg10 commented Aug 30, 2023

Design - 2: Batching on the fly in combination with Kafka partitioning and assignment strategy

[Diagram: Batch Processing Algorithm 2 (draw.io)]

Overview

This is a proposed algorithm for enhancing the existing batch processing implementation with minimal deviation. The primary objective is to improve processing efficiency while addressing potential Kafka misconfiguration issues.

Algorithm Description

The algorithm introduces a new configuration parameter named maxBatchSize. This parameter determines the maximum number of messages that should be processed in a single batch. The algorithm operates as follows:

  1. The consumer consumes incoming messages from Kafka, and as it processes them, it keeps track of the accountID associated with each message.

  2. Whenever the consumer detects a change in the accountID or if the batch size reaches maxBatchSize, a batch is formed using the collected messages.

  3. The formed batch is then sent for processing to the designated handler function. After successful processing, the consumer commits the offset, ensuring the messages in the batch are considered processed.

  4. The consumer continues to repeat steps 1-3, efficiently processing messages in batches based on the accountID and the maxBatchSize configuration.

Example

Let's illustrate the algorithm with an example:

Assuming the accountIDs of consumed messages from Kafka are: 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, and maxBatchSize is set to 5.

  • The first batch formed will contain (1, 1, 1). This batch is processed by the handler function and its offset is committed.

  • Subsequently, a new batch (2, 2) is created and processed, followed by the offset being committed.

  • Finally, the remaining ten messages with accountID 1 are split into two batches of (1, 1, 1, 1, 1); each batch is formed, processed, and its offset committed.
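
A minimal sketch of this batching loop, assuming a simplified message shape (the names below are placeholders, not the actual central-ledger handler code):

```typescript
interface KeyedMessage {
  accountId: string
  payload: unknown
}

// Cut a new batch whenever the accountId changes or the current batch reaches
// maxBatchSize; in the real handler each batch would be processed and its
// offset committed before the next one is formed.
function formBatches (messages: KeyedMessage[], maxBatchSize: number): KeyedMessage[][] {
  const batches: KeyedMessage[][] = []
  let current: KeyedMessage[] = []
  for (const msg of messages) {
    const boundary =
      current.length >= maxBatchSize ||
      (current.length > 0 && current[current.length - 1].accountId !== msg.accountId)
    if (boundary) {
      batches.push(current)
      current = []
    }
    current.push(msg)
  }
  if (current.length > 0) batches.push(current)
  return batches
}
```

With the accountIDs 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 and maxBatchSize = 5, this yields the batches (1, 1, 1), (2, 2), (1, 1, 1, 1, 1) and (1, 1, 1, 1, 1), matching the example above.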

Advantages

This algorithm brings several advantages to the table:

  • No Sorting Required: Unlike some other batch processing approaches, this algorithm eliminates the need for sorting messages before processing. This simplifies the implementation and potentially improves processing speed.

  • Fault Tolerance: The algorithm accounts for potential misconfiguration of Kafka by ensuring that messages are efficiently processed in distinct batches based on accountID. This adds a layer of fault tolerance against misconfigurations.

  • Minimal Deviation: The proposed algorithm builds upon the existing batch processing implementation, requiring only additional logic for forming batches and processing them in the handler function.

Important Note

It's crucial to note that the algorithm performs optimally when Kafka's partitioning and assignment strategy is well-aligned with accountIDs. If messages are mixed across partitions in a way that doesn't match the accountID grouping, the actual batch size might be smaller than maxBatchSize.

In summary, this algorithm strives to enhance batch processing for account-based messages, offering improved fault tolerance without straying far from the current implementation.

vijayg10 commented Aug 30, 2023

Design - 3: Position Message Aggregator

[Diagram: Batch Processing Algorithm 3 (draw.io)]

Overview

This option involves the introduction of a new service called position-message-aggregator. This service focuses on constructing bins from an existing topic, committing offsets, and producing these bins to a new topic. This can streamline the processing for the position handlers.

Algorithm Description

  1. position-message-aggregator:

    • Listens to the existing topic for incoming messages.
    • Aggregates messages into bins based on accountID.
    • Commits the consumed offsets on the Kafka consumer.
    • Produces the constructed bins to a new topic.
  2. Position Handlers:

    • Consume the constructed bins directly from the new topic.
    • Process each bin one by one, effectively handling a batch of messages.
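
A hypothetical outline of the aggregator loop; the consumer/producer interfaces below are simplified placeholders, not the actual Mojaloop Kafka wrappers:

```typescript
// Simplified stand-ins for a Kafka consumer and producer.
interface AggregatorConsumer {
  poll (): Promise<{ accountId: string, value: string, offset: number }[]>
  commit (offset: number): Promise<void>
}
interface AggregatorProducer {
  send (topic: string, key: string, value: string): Promise<void>
}

// Consume from the existing position topic, group the messages into
// per-account bins, publish each bin as one keyed message on the new topic,
// then commit the consumed offsets.
async function aggregateOnce (
  consumer: AggregatorConsumer,
  producer: AggregatorProducer,
  binTopic: string
): Promise<void> {
  const messages = await consumer.poll()
  if (messages.length === 0) return

  const bins = new Map<string, string[]>()
  for (const msg of messages) {
    const bin = bins.get(msg.accountId) ?? []
    bin.push(msg.value)
    bins.set(msg.accountId, bin)
  }

  // Each bin becomes a single keyed message, so a position handler reads a
  // whole batch for one account in a single consume.
  for (const [accountId, bin] of bins) {
    await producer.send(binTopic, accountId, JSON.stringify(bin))
  }

  // Commit only after the bins have been produced; a crash before this point
  // re-aggregates rather than losing messages (at the cost of possible duplicates).
  const lastOffset = messages[messages.length - 1].offset
  await consumer.commit(lastOffset)
}
```

Keying each produced bin by accountId keeps the new topic partitioned by the affected account, so the position handlers still see all changes for one account in order.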

Advantages of this approach:

  • Optimized Consumption: The position handlers can directly consume pre-aggregated bins, reducing the complexity of message sorting and binning.
  • Efficient Processing: Handling bins instead of individual messages can improve processing efficiency and reduce overhead.
  • Simplified Logic: The message aggregation and bin creation are isolated in the position-message-aggregator, simplifying the handlers' responsibilities.

Considerations:

  • Scalability: Ensure that the position-message-aggregator and the Kafka Consumer scale appropriately based on message volume.
  • Bin Definition: Define the criteria for constructing bins to align with the requirements of the position handlers.

Future Enhancements:

  • Dynamic Binning: Explore the possibility of dynamically adjusting bin creation criteria based on real-time metrics or patterns.

This approach offers a potential way to streamline the processing of position messages by centralising the aggregation step and providing a cleaner workflow for the position handlers.
