Change synonyms index auto-expand replicas to 0-1 #115078
Pinging @elastic/es-search-relevance (Team:Search Relevance)
and that causes failures in CI related to #113478.
Note that we've just reverted this in #115019 due to an incident (in another component). However, we'll be re-introducing it in the near future, so it's still best to get the synonyms tests working ahead of that.
I can't think of a simple change in tests that would fix this
Another option I mentioned would be to ignore the timeout when waiting for green status, like:
```yaml
- do:
    cluster.health:
      index: .synonyms
      timeout: 1m
      wait_for_status: green
      ignore: 408
```
Either way, your tests previously worked with 2 unassigned replicas, because the synonyms were retrieved/searched from the 2 assigned copies; the index was yellow. A 1m timeout gives the search shard in stateless time to become ready for searches. However, it does not strictly require it: if for some reason the search shard takes more than 1m to start, the test will break, although hopefully that should be rare.
Changes auto-expand replicas in synonyms index to 0-1 from 0-all.
Yes, after reading your description, this should work! However, please consider whether 0-1 is fine for your use case; I am not an expert in this, so I'm reviewing purely from the perspective of making the test(s) work. So please get approval(s) from your team as well.
@carlosdelest I have a stupid question. Why can't we wait for yellow?
I believe I can answer that, @benwtrent: yellow means only the primary is ready. In stateless, though, that means the search shard might not yet be ready to serve searches, so the test fails.
@kingherc it seems like we are chasing the wrong thing here. That looks like a bug in how we determine health in serverless. Traditionally, yellow means "Data is available but degraded"; now you are saying yellow in serverless means "Your data isn't available". That seems bad. @carlosdelest why did we replicate this index to all data nodes to begin with? What happens if, for a search, we hit a node where a replica doesn't exist? Maybe I am misunderstanding the purpose of this system index.
I don't think we were looking closely at this when we created the system index, as no other system index uses 0-all. I don't remember a specific reason; do you, @mayya-sharipova?
The content of this index is retrieved when building the analyzers (either on shard recovery or on analyzer reloading). So replica availability doesn't matter, as long as we could load the synonyms from the system index when the analyzers were loaded. The index is not needed for the search operation once the analyzers are created.
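To illustrate where the index is read: a synonyms set is referenced by name from an analyzer's synonym filter, and its rules are loaded from `.synonyms` when that analyzer is built, not on each search. Below is a sketch as YAML REST test steps; the set name, rule, index name and field are all hypothetical, and the mapping is only there to attach the analyzer at search time.

```yaml
# Store a synonyms set; its rules are persisted in the .synonyms system index.
- do:
    synonyms.put_synonym:
      id: my-synonyms-set            # hypothetical set name
      body:
        synonyms_set:
          - synonyms: "hello, hi"    # hypothetical rule

# Reference the set from a search-time analyzer. The rules are fetched when
# the analyzer is created (shard recovery or analyzer reload), not per search.
- do:
    indices.create:
      index: my-index                # hypothetical index name
      body:
        settings:
          analysis:
            filter:
              my_synonyms_filter:
                type: synonym_graph
                synonyms_set: my-synonyms-set
                updateable: true     # reloadable filters are search-time only
            analyzer:
              my_search_analyzer:
                tokenizer: standard
                filter: [lowercase, my_synonyms_filter]
        mappings:
          properties:
            title:                   # hypothetical field
              type: text
              analyzer: standard
              search_analyzer: my_search_analyzer
```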
OK, this means that at index opening time we must serialize potentially large synonym sets between nodes. Besides the load-time delay this will add, it will add inter-node traffic costs for our users. With this change, every time an analyzer is opened, the cost will be larger.
I think there have been discussions about that for serverless, but I also believe these APIs are internal in serverless. Note also that there are some subtle differences between using the health API and the cluster health API. I would say this discussion is out of the scope of this PR, but we can open it elsewhere/more broadly if needed. From my point of view, it's weird that the test was running against a yellow index to begin with; I expect it to be green for OK/passing tests. So at least 0-1 achieves that, although I'm not familiar enough to say whether it's good for the index and the use case.
It's a tradeoff:
Since synonyms are only retrieved on shard recovery and synonyms updates, and only the synonyms used by the analyzer are retrieved, that seems enough of a guarantee to me that this traffic will stay limited. I don't necessarily see this as a bad tradeoff, unless I'm missing something?
I would say it's 100% in scope, as we are making a production code change because of our inability to test it correctly. What we want is "Is this index searchable? Please wait until it's searchable"; that is what we need. Does that not exist at all? We don't care about green, yellow, etc. Just: is this index searchable, please wait until it is, and make sure our settings for the index are applied in such a way that the index eventually becomes searchable.
Yes, there is: wait for green. What I meant to say is that we should not delve into the details of yellow. Green makes total sense for a test like this.
BTW, there's no need to make a production code change if there's another way to make the test wait for green. Since synonyms are outside my area of expertise, I will revoke my approval, just to avoid the misinterpretation that the production change is generally approved. The Search team should definitely agree on the best way to wait for green in the test.
Until the best way to wait for green is agreed upon.
@kingherc I'm afraid that won't work: there is no need for replicas to be present in stateful. BwC tests do use replicas, as there is more than 1 node, but default YAML tests run against a single node, so there are no replicas. We would need different test conditions for stateful and for BwC / serverless 😢
True. I actually deleted my comment as soon as I posted it 😆 I still believe waiting for green health is somehow best, but I don't have an immediate idea of an easy way to do that in the upgrade/BwC case.
@carlosdelest another idea I got from the team is that you could try waiting for yellow combined with wait_for_no_initializing_shards=true. Could you check?
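A sketch of that suggestion as a YAML REST test step, reusing the `.synonyms` index and the 1m timeout from the earlier snippet; it has not been verified against the failing tests. `wait_for_no_initializing_shards` is a standard cluster health parameter. The idea, presumably, is that a replica which can never be allocated stays unassigned rather than initializing, so it would not block the wait, while a search shard that is still starting up would.

```yaml
# Wait until the primary is active (yellow) AND no shard of the index is still
# initializing. Unassigned copies do not block; in-flight recoveries do.
- do:
    cluster.health:
      index: .synonyms
      wait_for_status: yellow
      wait_for_no_initializing_shards: true
      timeout: 1m
```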
Sorry for joining the conversation late; I am +1 on setting auto-expand replicas to 0-1. Reasons for that:
Changes auto-expand replicas in the synonyms index to 0-1 from 0-all. This is the only system index that uses a 0-all setting, and that causes failures in CI related to #113478. #113478 changed fast indices so that they now read from search shards. This broke some CI tests, as the synonyms system index is only created when accessed for the first time, and accessing it immediately afterwards did not guarantee that a search replica shard had been created yet (see elasticsearch serverless issue 2922).
The obvious solution was to wait for the shards to be green, which was introduced in #114400. However, this broke BwC tests in non-serverless CI.
The reason is that the index cannot be green in BwC tests:
This change should fix the above problem when backported to 8.x, and is a prerequisite for unmuting the synonyms tests (see issues #114432, #114443, #114444). Once this is merged, we can fix the CI tests by waiting for a green index status, as the single replica can then be allocated (1 primary and 1 replica, on the two nodes upgraded to the new version).
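With the replica count capped at 1, the test-side wait could look roughly like this. It is a sketch of a YAML REST test step reusing the `.synonyms` index and 1m timeout from the earlier snippet, not necessarily the exact step used when unmuting #114432, #114443 and #114444.

```yaml
# Wait for the primary and, on multi-node clusters, its single replica.
# With "0-1": a single-node run has 0 replicas, and in the BwC case the one
# replica fits on the second upgraded node, so green is reachable in both.
- do:
    cluster.health:
      index: .synonyms
      wait_for_status: green
      timeout: 1m
```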
I can't think of a simple change in tests that would fix this (as both non-BwC and BwC tests are involved, and they have different numbers of nodes), and this change would align the synonyms system index replica settings with the rest of the system indices.