[fix][ml] There are two same-named managed ledgers in the one broker #18688
Force-pushed from 9089c62 to c3bf336.
Force-pushed from c3bf336 to b439bcd.
The PR had no activity for 30 days; marking it with the Stale label.
Force-pushed from b439bcd to e491665.
Codecov Report
Coverage diff:

| | master | #18688 | +/- |
| --- | ---: | ---: | ---: |
| Coverage | 63.30% | 63.90% | +0.60% |
| Complexity | 26123 | 3490 | -22633 |
| Files | 1836 | 1843 | +7 |
| Lines | 134416 | 135163 | +747 |
| Branches | 14772 | 14859 | +87 |
| Hits | 85087 | 86371 | +1284 |
| Misses | 41649 | 40949 | -700 |
| Partials | 7680 | 7843 | +163 |
Flags with carried forward coverage won't be shown.
The PR had no activity for 30 days; marking it with the Stale label.
@poorbarcode I assume this PR is still relevant? Please merge the latest changes from master and check that the tests still pass.
Force-pushed from f64173d to 35a073e.
Yes.
Done. Could you please help review this PR?
@poorbarcode The change to the production code itself looks good, but I don't like the test. Addressing that would require more changes to the ML factory so that it can be subclassed for tests with test hooks, instead of relying on Mockito. One way would be to extract a protected method for the logic in pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerFactoryImpl.java (lines 376 to 379 in 152b4a2), and then, in pulsar/managed-ledger/src/test/java/org/apache/bookkeeper/test/MockedBookKeeperTestCase.java (line 86 in 152b4a2), use a test-specific ManagedLedgerFactoryImpl subclass that can create a custom ManagedLedgerImpl subclass with the required hooks for testing.
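A minimal sketch of the suggested refactoring, assuming simplified stand-ins: `Ledger`, `LedgerFactory`, `newLedger` and `HookedLedgerFactory` are hypothetical names, not Pulsar's real `ManagedLedgerFactoryImpl` API. Ledger construction moves into a protected method, so a test subclass can return an instrumented ledger without spying on it with Mockito.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for ManagedLedgerImpl (not Pulsar's API).
class Ledger {
    final String name;

    Ledger(String name) {
        this.name = name;
    }

    void close() {
        // Production close logic would go here.
    }
}

// Hypothetical stand-in for ManagedLedgerFactoryImpl.
class LedgerFactory {
    protected final ConcurrentHashMap<String, Ledger> ledgers = new ConcurrentHashMap<>();

    Ledger open(String name) {
        return ledgers.computeIfAbsent(name, this::newLedger);
    }

    // Extracted creation hook: production code builds the plain ledger.
    protected Ledger newLedger(String name) {
        return new Ledger(name);
    }
}

// Test-only subclass: injects hooks by overriding newLedger() instead of spying with Mockito.
class HookedLedgerFactory extends LedgerFactory {
    final List<String> events = Collections.synchronizedList(new ArrayList<>());

    @Override
    protected Ledger newLedger(String name) {
        events.add("created " + name);
        return new Ledger(name) {
            @Override
            void close() {
                events.add("closing " + this.name); // a test could block or signal here
                super.close();
            }
        };
    }
}

class FactoryHookDemo {
    public static void main(String[] args) {
        HookedLedgerFactory factory = new HookedLedgerFactory();
        Ledger ledger = factory.open("tenant/ns/topic");
        factory.open("tenant/ns/topic"); // cached: newLedger() is not called again
        ledger.close();
        System.out.println(factory.events); // [created tenant/ns/topic, closing tenant/ns/topic]
    }
}
```

Because the hook is an ordinary override, the instrumented ledger is a real subclass whose methods run on whatever thread calls them, which avoids relying on Mockito stubs from several threads at once.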
Good catch @poorbarcode. The production code change is good. I've suggested some changes to the way the test is implemented.
```java
private ManagedLedgerImpl makeManagedLedgerWorksWithStrictlySequentially(ManagedLedgerImpl originalManagedLedger,
                                                                         ProcessCoordinator processCoordinator)
        throws Exception {
    ManagedLedgerImpl sequentiallyManagedLedger = spy(originalManagedLedger);
    // step-1.
    doAnswer(invocation -> {
        synchronized (originalManagedLedger) {
            // step-3.
            // Wait for `managedLedger.close`, then run the task "asyncCreateLedger()".
            // The thread selector in "managedLedger.executor" is random, so this step can fail.
            // Queuing 1000 tasks to block the executor gives a high chance of success.
            for (int i = 0; i < 1000; i++) {
                originalManagedLedger.getExecutor().execute(() -> {
                    processCoordinator.waitPreviousAndSetStep(3);
                });
            }
            LedgerHandle lh = (LedgerHandle) invocation.getArguments()[0];
            processCoordinator.waitPreviousAndSetStep(1);
            originalManagedLedger.ledgerClosed(lh);
        }
        return null;
    }).when(sequentiallyManagedLedger).ledgerClosed(any(LedgerHandle.class));
    // step-2.
    doAnswer(invocation -> {
        processCoordinator.waitPreviousAndSetStep(2);
        originalManagedLedger.close();
        return null;
    }).when(sequentiallyManagedLedger).close();
    return sequentiallyManagedLedger;
}
```
This looks like a hack, especially the loop that adds 1000 tasks to the executor. The `ProcessCoordinator` implementation looks like something that could be handled with `java.util.concurrent.Phaser`.
It would be better to modify `ManagedLedgerFactoryImpl` so that the method that creates the ledger instance can be overridden. Tests could then override that method and inject test logic without relying on Mockito, which isn't thread-safe; that by itself could cause issues.
Thanks for your suggestion. I have rewritten the test to make it simpler. Could you take a look again?
I also added some other changes[1]:
- Fix the wrong state of the closed managed ledger.
- Release the `ledgerHandle` that is created after the ML is closed.
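A minimal sketch of what these two follow-up fixes amount to, assuming a simplified single-class model: `SimpleLedger`, `Handle`, `startSwitch` and `onHandleCreated` are hypothetical names, not `ManagedLedgerImpl`'s real methods. When the new handle finishes creating after the ledger was closed, the callback keeps the `Closed` state instead of switching to `LedgerOpened`, and releases the late handle.

```java
// Hypothetical, simplified model of the two follow-up fixes (not Pulsar's ManagedLedgerImpl).
class Handle {
    boolean closed;

    void close() {
        closed = true;
    }
}

class SimpleLedger {
    enum State { LedgerOpened, CreatingLedger, Closed }

    private State state = State.LedgerOpened;
    private Handle current;

    // Switching ledgers starts creating a new handle asynchronously.
    synchronized void startSwitch() {
        if (state != State.Closed) {
            state = State.CreatingLedger;
        }
    }

    synchronized void close() {
        state = State.Closed;
    }

    // Completion callback for the newly created handle.
    synchronized void onHandleCreated(Handle handle) {
        if (state == State.Closed) {
            // Closed while the handle was being created: keep Closed and release the handle.
            handle.close();
            return;
        }
        current = handle;
        state = State.LedgerOpened;
    }

    synchronized State state() {
        return state;
    }

    synchronized Handle current() {
        return current;
    }
}

class CloseDuringSwitchDemo {
    public static void main(String[] args) {
        SimpleLedger ledger = new SimpleLedger();
        ledger.startSwitch();          // switch ledgerHandle begins
        ledger.close();                // close runs concurrently and completes first
        Handle late = new Handle();
        ledger.onHandleCreated(late);  // the new handle arrives after close
        System.out.println(ledger.state() + " " + late.closed); // Closed true
    }
}
```

Keeping the `Closed` state also means a repeated `close()` call can be recognized as a no-op rather than running the close path a second time.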
Force-pushed from 35a073e to 40298db.
LGTM. Good work @poorbarcode.
…pache#18688) (cherry picked from commit d7186a6)
As discussed on the mailing list https://lists.apache.org/thread/w4jzk27qhtosgsz7l9bmhf1t7o9mxjhp, there is no plan to release 2.9.6, so I am going to remove the release/2.9.6 label.
Motivation
In PR #17526, we learned that a `topic` can be closed multiple times, and that two same-named objects of class `PersistentTopic` can exist in the same `broker` instance. Closing a `topic` triggers the closure of its `ManagedLedger`, so if the `topic` object is closed multiple times, the `ManagedLedger` is closed multiple times too. This PR shows that if a `ManagedLedger` is closed more than once, and its switch-`ledgerHandle` operation runs concurrently with its `close` method, two same-named `ManagedLedger` instances can end up in the same broker, possibly with different numbers of cursors. If both instances are usable and hold different sets of cursors, the `trimLedgers` operation can delete too many ledgers from the `ManagedLedger` metadata, because an instance that does not know about a cursor can trim ledgers that this cursor still needs.
Here is the process:
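
| Step | `managedLedger_1.close` | switch `ledgerHandle` (`managedLedger_1`) | create `managedLedger_2` | create `managedLedger_3` |
| --- | --- | --- | --- | --- |
| 1 | | close the current `LedgerHandle` | | |
| 2 | | start creating a new `LedgerHandle` | | |
| 3 | set state `Closed` | | | |
| 4 | remove `managedLedger_1` from `ManagedLedgerFactory.ledgers` | | | |
| 5 | | new `LedgerHandle` created; set state `LedgerOpened` | | |
| 6 | | | create `managedLedger_2` | |
| 7 | | | add `managedLedger_2` to `ManagedLedgerFactory.ledgers` | |
| 8 | closed again: remove `managedLedger_2` from `ManagedLedgerFactory.ledgers` | | | |
| 9 | | | add `cursor_1` into `managedLedger_2` | |
| 10 | | | | create `managedLedger_3` |
| 11 | | | | add `managedLedger_3` to `ManagedLedgerFactory.ledgers` |
| 12 | | | | load `cursor_1` from meta; add `cursor_2` into `managedLedger_3` |
| Cursors | | | `cursor_1` | `cursor_1`, `cursor_2` |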
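The factory-map part of this interleaving can be replayed on a single thread. A minimal sketch, assuming the factory cache is a `ConcurrentHashMap` keyed by ledger name that holds ledger instances directly; `SameNameReplay`, `Ml`, `open` and `onClose` are hypothetical names, not Pulsar code. It compares the unconditional `remove(k)` with the conditional `remove(k, v)` when `managedLedger_1` is closed a second time.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, single-threaded replay of the interleaving above (not Pulsar code).
class SameNameReplay {
    static final class Ml {
        final String name;
        final Set<String> cursors = new HashSet<>();

        Ml(String name) {
            this.name = name;
        }
    }

    private final ConcurrentHashMap<String, Ml> ledgers = new ConcurrentHashMap<>();
    private final boolean conditionalRemove;

    SameNameReplay(boolean conditionalRemove) {
        this.conditionalRemove = conditionalRemove;
    }

    Ml open(String name) {
        return ledgers.computeIfAbsent(name, Ml::new);
    }

    // What the factory cache does when `ml` is closed.
    void onClose(Ml ml) {
        if (conditionalRemove) {
            ledgers.remove(ml.name, ml); // only drops the entry if it still maps to `ml`
        } else {
            ledgers.remove(ml.name);     // drops whichever instance currently holds the name
        }
    }

    /** Returns how many distinct instances ended up serving the same name. */
    int replay() {
        String name = "persistent://tenant/ns/topic";
        Ml ml1 = open(name);
        onClose(ml1);                 // managedLedger_1.close
        Ml ml2 = open(name);          // create managedLedger_2
        onClose(ml1);                 // managedLedger_1 closed a second time
        ml2.cursors.add("cursor_1");
        Ml ml3 = open(name);          // create managedLedger_3, or get managedLedger_2 back
        ml3.cursors.add("cursor_1");
        ml3.cursors.add("cursor_2");
        return ml2 == ml3 ? 1 : 2;
    }

    public static void main(String[] args) {
        System.out.println(new SameNameReplay(false).replay()); // 2: two same-named ledgers
        System.out.println(new SameNameReplay(true).replay());  // 1: the second close is a no-op
    }
}
```

With `remove(k)`, the stale close of `managedLedger_1` evicts `managedLedger_2`, so the next open creates `managedLedger_3` alongside it. With `remove(k, v)`, the entry is removed only if it still points at the closing instance.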
Modifications
- Use `remove(k,v)` instead of `remove(k)` when removing a `ManagedLedger` from `ManagedLedgerFactory.ledgers`.
- Release the `ledgerHandle` that is created after the ML is closed.
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: