Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improve][broker] PIP-192 Added Deleted and Init states in ServiceUnitState #19546

Merged

Conversation

heesung-sn
Copy link
Contributor

@heesung-sn heesung-sn commented Feb 17, 2023

Master Issue: #16691

Motivation

There is a possible edge case where bundle ownership could be in an invalid state in the new load balancer.

Let's say strategic compaction happened in the middle of the bundle transfer.
Owned->Transfer_Assigned // compaction happened.

After the compaction, when a tableview joins and builds its cache from the compacted topic, it will first see a transition, null -> Transfer_Assigned. However, because this is invalid, the tableview will skip this msg, causing an inconsistent view.

To avoid this issue, this transition(null -> Transfer_Assigned) has been changed to a valid transition in the state diagram, but I realized that this could bring another wrong state when transfer and split occur concurrently.

Racing condition
leader : Owned -> Transfer_Assigned -> Released -> Owned
Broker-2 : Owned -> Splitting -> null(parent-bundle)

Wrong state: a parent bundle is owned by another broker after the split.
Owned -> Splitting -> null -> Transfer_Assigned -> Released -> Owned

To break this invalid transition, we need semi-terminal states(Deleted, and Free) and Init state to better represent tombstoned bundles.

Owned -> Splitting -> Deleted -> Transfer_Assigned // invalid
Owned -> Released -> Free -> Transfer_Assigned // invalid

Modifications

This PR

  • added Deleted and Init States in ServiceUnitState.
  • added the Owned- > Released transition to make the Unload command a two-phase protocol.
  • scheduled monitorOwnerships() on the leader broker's ServiceUnitStateChannel, which overrides states to Owned or Init for service units(bundles)
    • from inactive brokers (to Owned)
    • in in-flight states too long (orphan bundles) (to Owned)
    • in semi-terminal states( Deleted and Free) long enough to tombstone (to Init).

Bundle States (14)

Future Work / Discussion

Also, we may need to think about pre-creating all bundles in Owned state whenever a new namespace is created.

If all bundles are pre-created and do exist in the service units state channel, tombstone can only happen on parent bundles after split(Deleted state).

Also, with the bundle pre-creations and the transfer command, there will be no need for the assign command,(which could cause thundering herds impact when many brokers concurrently issue bundle ownership). This will simplify the state transitions by removing the 'Free' state and 'Free -> Assigned' transition.

However, this pre-creation requires some additional memory space from keeping bundles in Owned state(some might not be used at all).

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

  • *Updaated unit tests.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

We will have separate PRs to update the Doc later.

Matching PR in forked repository

PR in forked repository: heesung-sn#32

}
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we add waitForReconnection?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When seek() resets the cursor, this reader will be temporarily disconnected.

Then, when calling acknowledgeCumulativeAsync() at the end of the compaction(below code), the reader might throw an exception because state == Connecting. This issue could likely happen if there is only one message to compact.

                .thenCompose(v -> {
                    log.info("Acking ledger id {}", phaseOneResult.firstId);
                    return ((CompactionReaderImpl<T>) reader)
                            .acknowledgeCumulativeAsync(
                                    phaseOneResult.lastId, Map.of(COMPACTED_TOPIC_LEDGER_PROPERTY,
                                            ledger.getId()));
                })

@heesung-sn heesung-sn changed the title [improve][broker] PIP-192 Added Disabled and Init states in ServiceUnitState [improve][broker] PIP-192 Added Disabled, Deleted, and Init states in ServiceUnitState Feb 17, 2023
@heesung-sn heesung-sn force-pushed the pip-192-service-unit-state-change branch from 3f7c850 to c485357 Compare February 17, 2023 22:41
Copy link
Contributor

@gaoran10 gaoran10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Left some comments.

@heesung-sn heesung-sn changed the title [improve][broker] PIP-192 Added Disabled, Deleted, and Init states in ServiceUnitState [improve][broker] PIP-192 Added Deleted and Init states in ServiceUnitState Feb 20, 2023
@heesung-sn heesung-sn force-pushed the pip-192-service-unit-state-change branch from 7d4eac5 to e2290b3 Compare February 22, 2023 03:15
@heesung-sn heesung-sn force-pushed the pip-192-service-unit-state-change branch from 993b8c6 to 2fbafc9 Compare February 22, 2023 07:50
Copy link
Member

@Demogorgon314 Demogorgon314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@heesung-sn heesung-sn force-pushed the pip-192-service-unit-state-change branch from 5feff8a to 77209e7 Compare February 22, 2023 18:05
@heesung-sn heesung-sn force-pushed the pip-192-service-unit-state-change branch from 77209e7 to 0689771 Compare February 22, 2023 19:58
@Demogorgon314 Demogorgon314 reopened this Feb 23, 2023
@Technoboy- Technoboy- merged commit 389792b into apache:master Feb 23, 2023
@Technoboy- Technoboy- added this to the 3.0.0 milestone Feb 23, 2023
@Technoboy- Technoboy- added type/feature The PR added a new feature or issue requested a new feature area/broker labels Feb 23, 2023
@heesung-sn heesung-sn deleted the pip-192-service-unit-state-change branch April 2, 2024 17:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/broker doc-not-needed Your PR changes do not impact docs ready-to-test type/feature The PR added a new feature or issue requested a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants