Possible Bulk Update Optimizations #26802
Comments
@jpountz you are right, thanks a lot for the clarification. I have updated the issue.
We spoke about this in FixitFriday. While this optimization is doable on the client side (for now), we would like to first look into a more generic optimization: holding a secondary (engine-private) index reader that is used internally and does not pay the price of refreshing caches, loading global ordinals, rebuilding parent-child relationships, etc. It would also mean that we can honor the refresh semantics if a realtime get is used. I will keep this issue open for now and open a new one to explore the opportunities.
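To make the idea concrete, here is a minimal Lucene-level sketch of what such a secondary reader could look like. All names here are hypothetical and the real engine would need locking, refcounting, and reader wrapping; the point is only that a GET-only reader can skip the warming (caches, global ordinals, parent-child joins) that the external search reader pays for:

```java
import java.io.IOException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;

// Hypothetical sketch: two readers over the same IndexWriter. The external
// reader is the one searches see and is the one that gets wrapped/warmed;
// the internal reader is refreshed on demand for realtime gets and skips
// all of that work.
class TwoReaderEngine {
    private final IndexWriter writer;
    private DirectoryReader externalReader; // refreshed on the index refresh interval
    private DirectoryReader internalReader; // refreshed cheaply, only for GETs

    TwoReaderEngine(IndexWriter writer) throws IOException {
        this.writer = writer;
        this.externalReader = DirectoryReader.open(writer);
        this.internalReader = DirectoryReader.open(writer);
    }

    synchronized DirectoryReader readerForGet() throws IOException {
        // Reopening only picks up newly flushed segments; no cache warming
        // or global-ordinal rebuild is triggered by this call.
        DirectoryReader changed = DirectoryReader.openIfChanged(internalReader, writer);
        if (changed != null) {
            internalReader.close();
            internalReader = changed;
        }
        return internalReader;
    }
}
```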
…er to disk (#26972) Today, when ES detects it's using too much heap vs. the configured indexing buffer (default 10% of the JVM heap), it opens a new searcher to force Lucene to move the bytes to disk, clear the version map, etc. But this has the unexpected side effect of making newly indexed/deleted documents visible to future searches, which is not nice for users who are trying to prevent that, e.g. #3593. This is also an indirect spinoff from #26802, where we potentially pay a big price on rebuilding caches etc. when updates / realtime gets are used. We refresh the internal reader for realtime gets, which causes, for instance, global ordinals to be rebuilt. I think we can gain quite a bit if we used a reader that is only used for GETs and not for searches; that way we can also solve the problem of searchers being refreshed unexpectedly, apart from replica recovery / relocation. Closes #15768 Closes #26912
@imotov I got the index reader change in that I was talking about. While batching up those kinds of updates might be a double-edged sword, I am still interested in seeing what it would look like code-wise. Would it be as simple as sorting the updates by ID, using insertion order as a secondary sort, and then running them all consecutively? I think it can be as simple as checking whether the previous ID is the same and not fetching the document again. I really wonder if we can have a prototype that does the sorting in the test as a start?
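For what it's worth, here is a minimal sketch of that sorting idea. All helper names are hypothetical, and error handling, versioning, and the actual engine plumbing are omitted:

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical item: one scripted update inside a shard-level bulk request.
record BulkItem(int position, String id, String script) {}

class UpdateBatcher {
    void execute(BulkItem[] items) {
        // Sort by document ID, with the original bulk position as a secondary
        // key so that updates to the same document keep their insertion order.
        Arrays.sort(items, Comparator.comparing(BulkItem::id)
                                     .thenComparingInt(BulkItem::position));
        String previousId = null;
        Object doc = null; // stand-in for the fetched document source
        for (BulkItem item : items) {
            if (!item.id().equals(previousId)) {
                // Refresh + realtime get happens once per distinct ID
                // instead of once per update.
                doc = getFromEngine(item.id());
                previousId = item.id();
            }
            // Consecutive updates to the same ID reuse the in-memory document.
            doc = applyScript(doc, item.script());
        }
        // ... index the final version of each document once (omitted)
    }

    private Object getFromEngine(String id) { /* hypothetical */ return new Object(); }
    private Object applyScript(Object doc, String script) { /* hypothetical */ return doc; }
}
```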
@s1monw yes, I think this rearrangement shouldn't cause any differences in the final result of the bulk operation (at least at the level where we can offer any guarantees). We will need special handling for deletes (clearing the buffer), but otherwise that should address one of the issues. We will still have the issue of indexing the same record over and over again and sending multiple copies of the same record to replicas, but I am fully sold on progress over perfection here :)
Can you clarify what you mean here? We have to send these operations to maintain the semantics of the translog as an operation log and the semantics of sequence IDs.
@jasontedor yes, if we reindex the same record again and again on the primary shard, we have to send it to the replicas just as many times. I was just thinking that, maybe, if we index it on the primary only after all sequential operations on the same record have been applied, we would need to send only one copy to the replicas. But as I said, we don't need to bundle that in; it is a somewhat separate optimization.
@imotov For a shard-level bulk request, we still send a single bulk request to the replicas containing the results of all the indexing operations on the primary. That is, a shard bulk request is a single replication operation.
@jasontedor yes, the question here is: if a bulk request contains 100 script updates on the same record, do we need to send 100 slightly different copies of this record to the replicas, or can we just send the final version and ignore all intermediate states on both the primary and the replicas?
On the replica we cannot do that: if we apply an operation on the primary, we have to send it to the replica. We cannot coalesce operations from the primary to the replica, as doing so would violate operation and sequence ID semantics.
@jasontedor if I get the suggestion right, the idea is to do one get from the engine, apply multiple updates, and then index the versioned result once. This would result in one write operation to the engine with one issued seq#. I think that's fine? All that said, I'm really not happy with the state of TransportShardBulkAction and its complexity. I think we should figure out what we want to do there (I don't have a good suggestion yet) before introducing more complexity.
We historically removed reading from the transaction log in order to get consistent results from _GET calls. There was also the motivation that the read-modify-update principle we apply should not be hidden from the user. We still agree that we should not hide these aspects, but the impact on updates is quite significant, especially if the same document is updated before it's written to disk and made searchable. This change adds back the ability to read from the transaction log, but only for update calls. Calls to the _GET API will always do a refresh if necessary to return consistent results, i.e. if stored fields or doc values fields are requested. Closes #26802
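Conceptually, the path this adds back might look something like the following sketch. The types and method names here are invented for illustration and are not the actual Elasticsearch classes:

```java
// Hypothetical shapes, invented for illustration; not the real Elasticsearch classes.
class TranslogGet {
    interface VersionMap { TranslogLocation locationFor(String id); }
    interface Translog { byte[] readSource(TranslogLocation location); }
    record TranslogLocation(long generation, long offset) {}

    private final VersionMap versionMap;
    private final Translog translog;

    TranslogGet(VersionMap versionMap, Translog translog) {
        this.versionMap = versionMap;
        this.translog = translog;
    }

    // GET issued by an update: if the live version map still knows where the
    // latest copy of the document sits in the translog, read it back from the
    // log directly, with no refresh and no reader reopen.
    byte[] getForUpdate(String id) {
        TranslogLocation location = versionMap.locationFor(id);
        if (location != null) {
            return translog.readSource(location);
        }
        // Document is already searchable: fall back to the normal reader-based get.
        return getFromIndexReader(id);
    }

    private byte[] getFromIndexReader(String id) { /* omitted */ return null; }
}
```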
We had several reports of slow bulk updates in 5.0 and above; some of them are summarized in #23792. The main issue that leads to the significant slowdown seems to be the refresh that is now performed before each update operation. Basically, in order to perform a bulk request with 1000 updates on the same record, we have to perform a refresh 1000 times. This problem could be avoided if we combined all update operations on the same record within a shard batch into a single update operation; that way we could get away with performing the refresh only once.
For example, suppose we have a bulk request along these lines (an illustrative request with repeated scripted updates to a single document; the exact payload doesn't matter):
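```
POST _bulk
{ "update" : { "_index" : "test", "_type" : "doc", "_id" : "1" } }
{ "script" : { "source" : "ctx._source.counter += 1" } }
{ "update" : { "_index" : "test", "_type" : "doc", "_id" : "1" } }
{ "script" : { "source" : "ctx._source.counter += 1" } }
{ "update" : { "_index" : "test", "_type" : "doc", "_id" : "1" } }
{ "script" : { "source" : "ctx._source.counter += 1" } }
```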
it currently translates into something like this on the primary shard, with one refresh-and-get cycle per update:
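```
refresh → GET doc 1 → run script → INDEX doc 1   (update #1)
refresh → GET doc 1 → run script → INDEX doc 1   (update #2)
refresh → GET doc 1 → run script → INDEX doc 1   (update #3)
```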
If we combined all the updates together, we could transform it into a single cycle, roughly:
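```
refresh → GET doc 1 → run script → run script → run script → INDEX doc 1
```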
We would still have an issue if a mix of index or delete operations and update operations targets the same record in the same bulk request, but perhaps we can either fall back to the current slow way of doing things or optimize only as much as we can. For example, we can ignore all updates before an index operation, and all updates and index operations before a delete, etc.