-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bust the request cache when the mapping changes #66295
Conversation
This makes sure that we only serve a hit from the request cache if it was build using the same mapping and that the entire search phase is processed using the same mapping. These are both bugs that Elasticsearch has had for a long, long time but now that we have runtime fields they are much more likely to matter because runtime fields can be changed *drastically* from moment to moment. It does by adding a counter to the local mapping that is incremented whenever the mapping changes. This counter and the mapping is stored under a single `volatile` variable that is updated to point to the new mapping whenever the mapping changes. When the search phase begins we read the contents of the `volatile` and use it to power the entire phase, and including the version counter into the cache key. Before this change we'd go back to the `volatile` over and over again, potentially getting a new version of the mapping every time. This was convenient but made it possible that we'd see the mapping update half way through the search phase which could cause trouble. Mechanically, this creates a `snapshot()` method on `MapperService` to do the volatile read of the current mapping. The snapshot does all of the reading we need from here on out. I kept all of the methods on `MapperService` that read directly from the mapping but deprecated all of them because using them can lead to seeing the mapping updates when you don't expect them. We can slowly remove the usages in follow up changes. This change is big enough as is. Sadly, the `Mapper`s are also mutable. Luckilly, we only mutate them before sticking them behind the `volatile` reference. Hopefully. There is a lot of code there and it is hard to be sure. But it looks like we *intend* to do this. We should make them properly immutable in a follow up change. Closes elastic#62033
Draft so I can iterate on this while Jenkins runs the tests for me. |
Now that elastic#66295 has landed in 7.11 we can run the bwc tests it adds against that branch.
This uses the mapping snapshot that we built for the search phase in elastic#66295 for fetching nested documents. This is simpler to reason about because the mapping snapshot is immutable.
…66780) Currently, an incoming document is parsed through `DocumentMapper#parse`, which in turns calls `DocumentParser#parseDocument` providing `this` among other arguments. As part of the effort to reduce usages of `DocumentMapper` when possible, as it represents the mutable side of mappings (through mappings updates) and involves complexity, we can carry around only the needed components. This does add some required arguments to `DocumentParser#parseDocument` , though it makes dependencies clearer. This change does not affect end consumers as they all go through DocumentMapper anyways, but by not needed to provide DocumentMapper to parseDocument, we may be able to unblock further improvements down the line. Relates to #66295
…lastic#66780) Currently, an incoming document is parsed through `DocumentMapper#parse`, which in turns calls `DocumentParser#parseDocument` providing `this` among other arguments. As part of the effort to reduce usages of `DocumentMapper` when possible, as it represents the mutable side of mappings (through mappings updates) and involves complexity, we can carry around only the needed components. This does add some required arguments to `DocumentParser#parseDocument` , though it makes dependencies clearer. This change does not affect end consumers as they all go through DocumentMapper anyways, but by not needed to provide DocumentMapper to parseDocument, we may be able to unblock further improvements down the line. Relates to elastic#66295
The refactor #66877 made me notice that we aren't guaranteed to use the same
It's great we covered 1, as it was a critical issue with runtime fields. I think point 2 is a strong goal, but not a blocker. Does this match your understanding too? We could file a dedicated issue to track it, listing the places where it's not yet consistent. |
Yeah. We don't do anything in particular to protect against |
…66780) Currently, an incoming document is parsed through `DocumentMapper#parse`, which in turns calls `DocumentParser#parseDocument` providing `this` among other arguments. As part of the effort to reduce usages of `DocumentMapper` when possible, as it represents the mutable side of mappings (through mappings updates) and involves complexity, we can carry around only the needed components. This does add some required arguments to `DocumentParser#parseDocument` , though it makes dependencies clearer. This change does not affect end consumers as they all go through DocumentMapper anyways, but by not needed to provide DocumentMapper to parseDocument, we may be able to unblock further improvements down the line. Relates to #66295
As part of elastic#66295 we made QueryShardContext perform mapping lookups through MappingLookup rather than MapperService. That helps as MapperService relies on DocumentMapper which may change througout the execution of the search request. At search time, the percolate query also needs to parse documents, which made us add a parse method to MappingLookup.Such parse method currently relies on calling DocumentMapper#parseDocument through a function, but we would like to rather make this easier to follow. (see https://github.com/elastic/elasticsearch/pull/66295/files#r544639868) We recently removed the need to provide the entire DocumentMapper to DocumentParser#parse, opening the possibility for using DocumentParser directly when needing to parse a document at query time. This commit adds everything that is needed (namely Mapping, IndexSettings and IndexAnalyzers) to MappingLookup so that it can parse a document through DocumentParser without relying on DocumentMapper. As a bonus, given that MappingLookup holds a reference to these three additional objects, we can make DocumentMapper rely on MappingLookup to retrieve those and not hold its own same references to them. Along the same lines, given that MappingLookup holds all that's necessary to parse a document, the signature of DocumentParser#parse can be simplified by replacing most of its arguments with MappingLookup and retrieving what is needed from it.
As part of #66295 we made QueryShardContext perform mapping lookups through MappingLookup rather than MapperService. That helps as MapperService relies on DocumentMapper which may change througout the execution of the search request. At search time, the percolate query also needs to parse documents, which made us add a parse method to MappingLookup.Such parse method currently relies on calling DocumentMapper#parseDocument through a function, but we would like to rather make this easier to follow. (see https://github.com/elastic/elasticsearch/pull/66295/files#r544639868) We recently removed the need to provide the entire DocumentMapper to DocumentParser#parse, opening the possibility for using DocumentParser directly when needing to parse a document at query time. This commit adds everything that is needed (namely Mapping, IndexSettings and IndexAnalyzers) to MappingLookup so that it can parse a document through DocumentParser without relying on DocumentMapper. As a bonus, given that MappingLookup holds a reference to these three additional objects, we can make DocumentMapper rely on MappingLookup to retrieve those and not hold its own same references to them. Along the same lines, given that MappingLookup holds all that's necessary to parse a document, the signature of DocumentParser#parse can be simplified by replacing most of its arguments with MappingLookup and retrieving what is needed from it.
As part of elastic#66295 we made QueryShardContext perform mapping lookups through MappingLookup rather than MapperService. That helps as MapperService relies on DocumentMapper which may change througout the execution of the search request. At search time, the percolate query also needs to parse documents, which made us add a parse method to MappingLookup.Such parse method currently relies on calling DocumentMapper#parseDocument through a function, but we would like to rather make this easier to follow. (see https://github.com/elastic/elasticsearch/pull/66295/files#r544639868) We recently removed the need to provide the entire DocumentMapper to DocumentParser#parse, opening the possibility for using DocumentParser directly when needing to parse a document at query time. This commit adds everything that is needed (namely Mapping, IndexSettings and IndexAnalyzers) to MappingLookup so that it can parse a document through DocumentParser without relying on DocumentMapper. As a bonus, given that MappingLookup holds a reference to these three additional objects, we can make DocumentMapper rely on MappingLookup to retrieve those and not hold its own same references to them. Along the same lines, given that MappingLookup holds all that's necessary to parse a document, the signature of DocumentParser#parse can be simplified by replacing most of its arguments with MappingLookup and retrieving what is needed from it.
As part of #66295 we made QueryShardContext perform mapping lookups through MappingLookup rather than MapperService. That helps as MapperService relies on DocumentMapper which may change througout the execution of the search request. At search time, the percolate query also needs to parse documents, which made us add a parse method to MappingLookup.Such parse method currently relies on calling DocumentMapper#parseDocument through a function, but we would like to rather make this easier to follow. (see https://github.com/elastic/elasticsearch/pull/66295/files#r544639868) We recently removed the need to provide the entire DocumentMapper to DocumentParser#parse, opening the possibility for using DocumentParser directly when needing to parse a document at query time. This commit adds everything that is needed (namely Mapping, IndexSettings and IndexAnalyzers) to MappingLookup so that it can parse a document through DocumentParser without relying on DocumentMapper. As a bonus, given that MappingLookup holds a reference to these three additional objects, we can make DocumentMapper rely on MappingLookup to retrieve those and not hold its own same references to them. Along the same lines, given that MappingLookup holds all that's necessary to parse a document, the signature of DocumentParser#parse can be simplified by replacing most of its arguments with MappingLookup and retrieving what is needed from it.
This uses the mapping snapshot that we built for the search phase in #66295 for fetching nested documents. This is simpler to reason about because the mapping snapshot is immutable.
…66877) This uses the mapping snapshot that we built for the search phase in elastic#66295 for fetching nested documents. This is simpler to reason about because the mapping snapshot is immutable.
This makes sure that we only serve a hit from the request cache if it
was build using the same mapping and that the entire search phase is
processed using the same mapping. These are both bugs that Elasticsearch
has had for a long, long time but now that we have runtime fields they
are much more likely to matter because runtime fields can be changed
drastically from moment to moment.
Closes #62033