SAI-5162: Add sysprop solrcloud.publishDownOnStart to control whether to publish down on node start or not
#233
…publish down on node start or not
QQ: is this PRS message to add the replica as down "core_node2:54:D:L"?
@patsonluk Here are the tests which I think Ishan/Noble used to run. Would be good to run those tests with 9.7 and 9.3 to compare the results.
Yes
Thanks, I will run those tests!
I have only run the We probably want to run it using solrperf clusters; however, I suspect the impact will be very similar to the test that isolates the ZK fetching part (https://fullstory.atlassian.net/browse/SAI-5162 description -> benchmarking -> Cluster state fetching).
Testing against 9.7 vs 9.3 could also hide the performance impact of this change, since other changes (?) might actually speed up startup (we even see that 9.7 has faster startup with the solrbench test). That, however, does not mean that publishing downnode on start has no performance impact.
That being said, I think we should still run the 9.7 vs 9.3 benchmarking (with the FS changes and setup), which is similar to what we ran for the Solr 8 -> 9 migration (https://fullstory.atlassian.net/issues/SAI-4430?jql=text%20~%20%22benchmark%20solr%209%2A%22), plus another test for restart with a high number of collections/replicas. Even though the new test will not pinpoint the publish-downnode-on-start change, it should still give us confidence in restart performance in general.
QQ: is this message
No. For PRS, the downnode change is applied from the data node to ZK directly as in here
@hiteshk25 can we get this into our fs/branch_9x. This is likely to be temporary and we could totally remove it after we confirm the performance of
Add sysprop `solrcloud.skipPublishDownOnStart` to control whether to publish down on node start or not
Changed `solrcloud.skipPublishDownOnStart` to `solrcloud.publishDownOnStart`, which means if such flag is NOT defined (hence solrcloud.publishDownOnStart=false), then by default it will bypass publishAndWaitForDownStates
LGTM
…ether publish down on node start or not (#233)
* Add sysprop `solrcloud.skipPublishDownOnStart` to control whether to publish down on node start or not
* Use RTimer instead
* Added timer for the whole publish down ops, including persist ops
* ./gradlew tidy
* Changed `solrcloud.skipPublishDownOnStart` to `solrcloud.publishDownOnStart`, which means if such flag is NOT defined (hence solrcloud.publishDownOnStart=false), then by default it will bypass publishAndWaitForDownStates
Descriptions
Detailed in https://fullstory.atlassian.net/browse/SAI-5162
We are adding a sys prop `solrcloud.publishDownOnStart` to give us an option to bypass downnode publishing upon node start. If set to `true`, it will publish down on start (as in the 9.7 behavior). However, if set to `false` or undefined, it will NOT publish down on start (bypass the fix in 9.7).
Also adding logging to assess actual overhead of the downnode call to determine if we need further action.
This change is likely to be temporary. Depending on the latency reported, we might pursue further optimization or just take the 9.7 change as is.
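For readers who want to see the flag semantics in isolation, here is a minimal sketch of the gating and timing described above. It is not the code from this PR: the class `PublishDownOnStartSketch`, the method `maybePublishDown`, and the `Runnable` stand-in for the real publish-down call are all hypothetical, and plain `System.nanoTime()` is used in place of Solr's `RTimer` so the example has no Solr dependency. Only the property name `solrcloud.publishDownOnStart` and its default-to-bypass behavior come from the description.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the PR's implementation.
class PublishDownOnStartSketch {
  // Property name taken from the PR description.
  static final String PUBLISH_DOWN_ON_START_PROP = "solrcloud.publishDownOnStart";

  // Boolean.getBoolean returns true only if the system property exists and
  // equals "true" (case-insensitive), so an undefined property means false.
  public static boolean shouldPublishDownOnStart() {
    return Boolean.getBoolean(PUBLISH_DOWN_ON_START_PROP);
  }

  // Runs the (placeholder) publish-down action only when the flag is enabled.
  // Returns the elapsed time in milliseconds, or -1 when the call is bypassed.
  public static long maybePublishDown(Runnable publishDown) {
    if (!shouldPublishDownOnStart()) {
      System.out.println("Bypassing publish down on start (" + PUBLISH_DOWN_ON_START_PROP + " is not true)");
      return -1L;
    }
    long startNanos = System.nanoTime();
    publishDown.run();
    long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    System.out.println("Publish down on start took " + elapsedMs + " ms");
    return elapsedMs;
  }

  public static void main(String[] args) {
    // Placeholder action standing in for the real publish-down logic.
    maybePublishDown(() -> System.out.println("publishing down states (placeholder)"));
  }
}
```

Under these assumptions, starting the JVM with `-Dsolrcloud.publishDownOnStart=true` would take the publish path and log its latency, while omitting the property or setting it to `false` would take the bypass path.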