SOLR-16701: Race condition on PRS enabled collection deletion #1460
solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java
solr/solrj/src/java/org/apache/solr/common/util/CommonTestInjection.java
} else {
  log.info(
      "Breakpoint with key {} is triggered but there's no implementation set. Skipping...",
      key);
I don't think we should log here (and fwiw, there is no corresponding logging for an absent injectDelay()). The issue is that if you were to enable assertions on a running system (not in a test context), you might get a ton of spam from this. If you want to keep it, maybe lower the log level from log.info() to log.debug()?
Yes, let's change that to debug. I was a bit concerned that someone might accidentally use different key names for injectBreakpoint and setImplementation when both are meant for the same Breakpoint; with this log message they could see whether the breakpoint is triggered.
Though I guess this could be really noisy, as you pointed out. I assume the dev can instead search for the "Breakpoint with key ... is triggered" message; its absence means either the execution path does not reach the breakpoint or the breakpoint has no implementation set.
public void setImplementation(String key, Breakpoint implementation) {
  if (breakpoints.containsKey(key)) {
    throw new IllegalArgumentException(
        "Cannot refine Breakpoint implementation with key " + key);
nit: "modify" or "replace" rather than "refine"? We'd be replacing, not refining.
Ahh, that was a typo; it was supposed to be "redefine".
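The two threads above settle on two behaviours for the breakpoint hook: an unset breakpoint is skipped with debug-level logging only, and registering a second implementation under an existing key is rejected. The full source of the hook (which appears to live in CommonTestInjection) is not shown here, so the following is a minimal, hypothetical sketch of those two behaviours. The names BreakpointRegistryDemo, BreakpointRegistry, executeAndResume and removeImplementation are assumptions, and it uses java.util.logging (FINE as the debug level) instead of the SLF4J logger in the diff so that it stays self-contained. The reviewer's remark about enabling assertions suggests the hook is called from an assert statement, so the sketch assumes that and always returns true.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;

public class BreakpointRegistryDemo {
  /** Hypothetical callback executed when a named breakpoint is reached. */
  @FunctionalInterface
  interface Breakpoint {
    void executeAndResume(Object... args);
  }

  /** Hypothetical registry; an illustration, not Solr's CommonTestInjection. */
  static final class BreakpointRegistry {
    private static final Logger log = Logger.getLogger(BreakpointRegistry.class.getName());
    private final Map<String, Breakpoint> breakpoints = new ConcurrentHashMap<>();

    /** Registers an implementation; redefining an existing key is rejected. */
    void setImplementation(String key, Breakpoint implementation) {
      // putIfAbsent performs the "already defined?" check and the insert atomically.
      if (breakpoints.putIfAbsent(key, implementation) != null) {
        throw new IllegalArgumentException(
            "Cannot redefine Breakpoint implementation with key " + key);
      }
    }

    /** Lets a test clean up so the key can be registered again later. */
    void removeImplementation(String key) {
      breakpoints.remove(key);
    }

    /** Runs the implementation for key, if any; returns true so it can sit inside an assert. */
    boolean injectBreakpoint(String key, Object... args) {
      Breakpoint implementation = breakpoints.get(key);
      if (implementation != null) {
        implementation.executeAndResume(args);
      } else {
        // FINE (debug) level: a running system with assertions enabled is not spammed.
        log.fine(() -> "Breakpoint with key " + key
            + " is triggered but there's no implementation set. Skipping...");
      }
      return true;
    }
  }

  public static void main(String[] args) {
    BreakpointRegistry registry = new BreakpointRegistry();
    registry.setImplementation("demo", a -> System.out.println("paused with " + a.length + " arg(s)"));
    registry.injectBreakpoint("demo", "x"); // prints "paused with 1 arg(s)"
    registry.injectBreakpoint("unset");     // no implementation: logged at FINE only
    try {
      registry.setImplementation("demo", a -> {});
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // Cannot redefine Breakpoint implementation with key demo
    }
  }
}
```

Using putIfAbsent instead of a separate containsKey check followed by put is a design choice of this sketch: it keeps the "cannot redefine" rule correct even if two test threads register the same key concurrently.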
Added unit test case in ZkStateReaderTest#testDeletePrsCollection
…undException as their conditions/handling are quite different.
…erReplicaStatesOps)
(cherry picked from commit f22a51c)
https://issues.apache.org/jira/browse/SOLR-16701
Description
This fixes a race condition on PRS-enabled collection deletion, which triggers the exception:
This can be triggered by the following sequence:
1. fetchCollectionState is called, and the collection's state.json is fetched.
2. Before fetchCollectionState fetches the PRS entries, the collection's state.json and PRS entries are deleted by someone else.
3. fetchCollectionState throws the exception when it reaches the PRS fetching logic, because the ZK node state.json no longer exists.
Solution
Create a specific exception, PrsZkNodeNotFoundException (which extends SolrException), thrown when the PRS entries cannot be fetched. Then, in ZkStateReader#fetchCollectionState, we added a new catch clause for this exception that uses a check similar to the existing NoNodeException handling. If exists is null, this is the expected race condition (the PRS entries were deleted after state.json was fetched), so fetchCollectionState should just return null (the collection is gone). Otherwise it rethrows the PrsZkNodeNotFoundException, since that case is unexpected (the PRS entries are gone but state.json is still around).
Tests
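The race and the handling described under Solution can be modelled in a few lines without ZooKeeper. The following is a hypothetical toy, not Solr code: an in-memory map stands in for ZooKeeper, PRS entries are modelled as child paths of the state.json node, PrsNodeNotFoundException stands in for PrsZkNodeNotFoundException (it extends RuntimeException rather than SolrException), and the afterStateJsonRead hook plays the role of the Breakpoint used by the test to delete the collection deterministically between the two reads. All names here are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

public class PrsRaceSketch {
  /** Hypothetical stand-in for PrsZkNodeNotFoundException. */
  static class PrsNodeNotFoundException extends RuntimeException {
    PrsNodeNotFoundException(String path) {
      super("PRS parent node not found: " + path);
    }
  }

  /** Toy in-memory "ZooKeeper": znode path -> data. */
  static final Map<String, String> zk = new ConcurrentSkipListMap<>();
  /** Hook run between reading state.json and reading PRS entries (the test's breakpoint). */
  static volatile Runnable afterStateJsonRead = () -> {};
  /** Counts how often the PRS-not-found path was hit, as the test verifies. */
  static final AtomicInteger prsNotFoundCount = new AtomicInteger();

  static String stateJsonPath(String collection) {
    return "/collections/" + collection + "/state.json";
  }

  /** Deletes state.json and its PRS children, as another client deleting the collection would. */
  static void deleteCollection(String collection) {
    String path = stateJsonPath(collection);
    zk.keySet().removeIf(p -> p.equals(path) || p.startsWith(path + "/"));
  }

  /** PRS entries are the children of state.json; a missing parent node is an error. */
  static List<String> fetchPrsEntries(String stateJsonPath) {
    if (!zk.containsKey(stateJsonPath)) {
      throw new PrsNodeNotFoundException(stateJsonPath);
    }
    String prefix = stateJsonPath + "/";
    return zk.keySet().stream()
        .filter(p -> p.startsWith(prefix))
        .map(p -> p.substring(prefix.length()))
        .collect(Collectors.toList());
  }

  /** Returns "stateJson|prs1,prs2,..." or null when the collection is gone. */
  static String fetchCollectionState(String collection) {
    String path = stateJsonPath(collection);
    String stateJson = zk.get(path);
    if (stateJson == null) {
      return null; // collection already gone (analogous to the NoNode case)
    }
    afterStateJsonRead.run(); // the race window
    try {
      return stateJson + "|" + String.join(",", fetchPrsEntries(path));
    } catch (PrsNodeNotFoundException e) {
      prsNotFoundCount.incrementAndGet();
      if (!zk.containsKey(path)) {
        return null; // expected race: deleted after state.json was read
      }
      throw e; // unexpected: state.json exists but the PRS lookup failed
    }
  }

  public static void main(String[] args) {
    String path = stateJsonPath("c1");
    zk.put(path, "v0");
    zk.put(path + "/replica1", "");
    System.out.println(fetchCollectionState("c1")); // v0|replica1
    afterStateJsonRead = () -> deleteCollection("c1");
    System.out.println(fetchCollectionState("c1")); // null
  }
}
```

Because the hook runs on the calling thread, the deletion always lands between the two reads, which is the same reason the test relies on a breakpoint rather than on timing.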
Added ZkStateReaderTest#testDeletePrsCollection, which reproduces this race condition and verifies that:
1. ZkStateReader#fetchCollectionState does not throw an exception; instead, it eventually returns null, which indicates the collection is deleted.
2. PrsZkNodeNotFoundException was indeed triggered.
Please note that the test case is built on the Breakpoint introduced by another PR, #1457.
Checklist
Please review the following and check all that apply:
- … main branch.
- … ./gradlew check.