Bust the request cache when the mapping changes #66295

nik9000 · 2020-12-14T19:44:01Z

This makes sure that we only serve a hit from the request cache if it
was build using the same mapping and that the entire search phase is
processed using the same mapping. These are both bugs that Elasticsearch
has had for a long, long time but now that we have runtime fields they
are much more likely to matter because runtime fields can be changed
drastically from moment to moment.

Closes #62033

This makes sure that we only serve a hit from the request cache if it was build using the same mapping and that the entire search phase is processed using the same mapping. These are both bugs that Elasticsearch has had for a long, long time but now that we have runtime fields they are much more likely to matter because runtime fields can be changed *drastically* from moment to moment. It does by adding a counter to the local mapping that is incremented whenever the mapping changes. This counter and the mapping is stored under a single `volatile` variable that is updated to point to the new mapping whenever the mapping changes. When the search phase begins we read the contents of the `volatile` and use it to power the entire phase, and including the version counter into the cache key. Before this change we'd go back to the `volatile` over and over again, potentially getting a new version of the mapping every time. This was convenient but made it possible that we'd see the mapping update half way through the search phase which could cause trouble. Mechanically, this creates a `snapshot()` method on `MapperService` to do the volatile read of the current mapping. The snapshot does all of the reading we need from here on out. I kept all of the methods on `MapperService` that read directly from the mapping but deprecated all of them because using them can lead to seeing the mapping updates when you don't expect them. We can slowly remove the usages in follow up changes. This change is big enough as is. Sadly, the `Mapper`s are also mutable. Luckilly, we only mutate them before sticking them behind the `volatile` reference. Hopefully. There is a lot of code there and it is hard to be sure. But it looks like we *intend* to do this. We should make them properly immutable in a follow up change. Closes elastic#62033

nik9000 · 2020-12-14T19:44:21Z

Draft so I can iterate on this while Jenkins runs the tests for me.

Now that elastic#66295 has landed in 7.11 we can run the bwc tests it adds against that branch.

Now that #66295 has landed in 7.11 we can run the bwc tests it adds against that branch.

This uses the mapping snapshot that we built for the search phase in elastic#66295 for fetching nested documents. This is simpler to reason about because the mapping snapshot is immutable.

…66780) Currently, an incoming document is parsed through `DocumentMapper#parse`, which in turns calls `DocumentParser#parseDocument` providing `this` among other arguments. As part of the effort to reduce usages of `DocumentMapper` when possible, as it represents the mutable side of mappings (through mappings updates) and involves complexity, we can carry around only the needed components. This does add some required arguments to `DocumentParser#parseDocument` , though it makes dependencies clearer. This change does not affect end consumers as they all go through DocumentMapper anyways, but by not needed to provide DocumentMapper to parseDocument, we may be able to unblock further improvements down the line. Relates to #66295

…lastic#66780) Currently, an incoming document is parsed through `DocumentMapper#parse`, which in turns calls `DocumentParser#parseDocument` providing `this` among other arguments. As part of the effort to reduce usages of `DocumentMapper` when possible, as it represents the mutable side of mappings (through mappings updates) and involves complexity, we can carry around only the needed components. This does add some required arguments to `DocumentParser#parseDocument` , though it makes dependencies clearer. This change does not affect end consumers as they all go through DocumentMapper anyways, but by not needed to provide DocumentMapper to parseDocument, we may be able to unblock further improvements down the line. Relates to elastic#66295

jtibshirani · 2021-01-04T19:25:22Z

The refactor #66877 made me notice that we aren't guaranteed to use the same MappingLookup for the query and fetch phases. This is because we create a new QueryShardContext in each phase. Here's my understanding of the current situation:

When the mapping changes, we will never serve a cached response that used an old version of the mappings.
We try hard to use the same mappings/ settings through the search request. However there are some cases where that's not true -- for example we could use a newer mapping version when fetching vs. querying.

It's great we covered 1, as it was a critical issue with runtime fields. I think point 2 is a strong goal, but not a blocker. Does this match your understanding too? We could file a dedicated issue to track it, listing the places where it's not yet consistent.

nik9000 · 2021-01-04T21:08:02Z

Yeah. We don't do anything in particular to protect against can_match getting a different mapping either. But the things that can cause it to return true or false are mostly immutable at the moment.

…66780) Currently, an incoming document is parsed through `DocumentMapper#parse`, which in turns calls `DocumentParser#parseDocument` providing `this` among other arguments. As part of the effort to reduce usages of `DocumentMapper` when possible, as it represents the mutable side of mappings (through mappings updates) and involves complexity, we can carry around only the needed components. This does add some required arguments to `DocumentParser#parseDocument` , though it makes dependencies clearer. This change does not affect end consumers as they all go through DocumentMapper anyways, but by not needed to provide DocumentMapper to parseDocument, we may be able to unblock further improvements down the line. Relates to #66295

As part of elastic#66295 we made QueryShardContext perform mapping lookups through MappingLookup rather than MapperService. That helps as MapperService relies on DocumentMapper which may change througout the execution of the search request. At search time, the percolate query also needs to parse documents, which made us add a parse method to MappingLookup.Such parse method currently relies on calling DocumentMapper#parseDocument through a function, but we would like to rather make this easier to follow. (see https://github.com/elastic/elasticsearch/pull/66295/files#r544639868) We recently removed the need to provide the entire DocumentMapper to DocumentParser#parse, opening the possibility for using DocumentParser directly when needing to parse a document at query time. This commit adds everything that is needed (namely Mapping, IndexSettings and IndexAnalyzers) to MappingLookup so that it can parse a document through DocumentParser without relying on DocumentMapper. As a bonus, given that MappingLookup holds a reference to these three additional objects, we can make DocumentMapper rely on MappingLookup to retrieve those and not hold its own same references to them. Along the same lines, given that MappingLookup holds all that's necessary to parse a document, the signature of DocumentParser#parse can be simplified by replacing most of its arguments with MappingLookup and retrieving what is needed from it.

As part of #66295 we made QueryShardContext perform mapping lookups through MappingLookup rather than MapperService. That helps as MapperService relies on DocumentMapper which may change througout the execution of the search request. At search time, the percolate query also needs to parse documents, which made us add a parse method to MappingLookup.Such parse method currently relies on calling DocumentMapper#parseDocument through a function, but we would like to rather make this easier to follow. (see https://github.com/elastic/elasticsearch/pull/66295/files#r544639868) We recently removed the need to provide the entire DocumentMapper to DocumentParser#parse, opening the possibility for using DocumentParser directly when needing to parse a document at query time. This commit adds everything that is needed (namely Mapping, IndexSettings and IndexAnalyzers) to MappingLookup so that it can parse a document through DocumentParser without relying on DocumentMapper. As a bonus, given that MappingLookup holds a reference to these three additional objects, we can make DocumentMapper rely on MappingLookup to retrieve those and not hold its own same references to them. Along the same lines, given that MappingLookup holds all that's necessary to parse a document, the signature of DocumentParser#parse can be simplified by replacing most of its arguments with MappingLookup and retrieving what is needed from it.

As part of elastic#66295 we made QueryShardContext perform mapping lookups through MappingLookup rather than MapperService. That helps as MapperService relies on DocumentMapper which may change througout the execution of the search request. At search time, the percolate query also needs to parse documents, which made us add a parse method to MappingLookup.Such parse method currently relies on calling DocumentMapper#parseDocument through a function, but we would like to rather make this easier to follow. (see https://github.com/elastic/elasticsearch/pull/66295/files#r544639868) We recently removed the need to provide the entire DocumentMapper to DocumentParser#parse, opening the possibility for using DocumentParser directly when needing to parse a document at query time. This commit adds everything that is needed (namely Mapping, IndexSettings and IndexAnalyzers) to MappingLookup so that it can parse a document through DocumentParser without relying on DocumentMapper. As a bonus, given that MappingLookup holds a reference to these three additional objects, we can make DocumentMapper rely on MappingLookup to retrieve those and not hold its own same references to them. Along the same lines, given that MappingLookup holds all that's necessary to parse a document, the signature of DocumentParser#parse can be simplified by replacing most of its arguments with MappingLookup and retrieving what is needed from it.

As part of #66295 we made QueryShardContext perform mapping lookups through MappingLookup rather than MapperService. That helps as MapperService relies on DocumentMapper which may change througout the execution of the search request. At search time, the percolate query also needs to parse documents, which made us add a parse method to MappingLookup.Such parse method currently relies on calling DocumentMapper#parseDocument through a function, but we would like to rather make this easier to follow. (see https://github.com/elastic/elasticsearch/pull/66295/files#r544639868) We recently removed the need to provide the entire DocumentMapper to DocumentParser#parse, opening the possibility for using DocumentParser directly when needing to parse a document at query time. This commit adds everything that is needed (namely Mapping, IndexSettings and IndexAnalyzers) to MappingLookup so that it can parse a document through DocumentParser without relying on DocumentMapper. As a bonus, given that MappingLookup holds a reference to these three additional objects, we can make DocumentMapper rely on MappingLookup to retrieve those and not hold its own same references to them. Along the same lines, given that MappingLookup holds all that's necessary to parse a document, the signature of DocumentParser#parse can be simplified by replacing most of its arguments with MappingLookup and retrieving what is needed from it.

This uses the mapping snapshot that we built for the search phase in #66295 for fetching nested documents. This is simpler to reason about because the mapping snapshot is immutable.

…66877) This uses the mapping snapshot that we built for the search phase in elastic#66295 for fetching nested documents. This is simpler to reason about because the mapping snapshot is immutable.

…67451) This uses the mapping snapshot that we built for the search phase in #66295 for fetching nested documents. This is simpler to reason about because the mapping snapshot is immutable.

nik9000 added 21 commits November 24, 2020 09:17

Refactor

8c808e7

Refactor again

3fc61e8

Javadoc

1dedc0a

Moar javadoc

fddfdb0

Merge branch 'master' into cache_bust

b42f493

Add example in core

58af6fe

Merge branch 'master' into cache_bust

5f54e5b

WIP

177e4d8

WIP

e597164

Merge branch 'master' into cache_bust

72887cd

ooops:

3587385

Compile plz

82b2a53

Fix test

d30da9a

Checkstyle

bcd8e52

Merge branch 'master' into cache_bust

c3ace97

WIP

81b9824

Merge branch 'master' into cache_bust_take_two

a36ea7f

Checkstyle

dde4996

Implement sourceEnabled

7519d77

Fixup test

56cc647

nik9000 added 5 commits December 15, 2020 09:24

Merge branch 'master' into cache_bust_take_two

a075886

Fix test

76a512c

Merge branch 'master' into cache_bust_take_two

5769100

Merge branch 'master' into cache_bust_take_two

d4fe787

Merge branch 'master' into cache_bust_take_two

c1cf01b

nik9000 added :Search/Search Search-related issues that do not fall into other categories >bug v7.11.0 labels Dec 15, 2020

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Dec 23, 2020

Update skip after backport of elastic#66295

488ff8d

Now that elastic#66295 has landed in 7.11 we can run the bwc tests it adds against that branch.

nik9000 mentioned this pull request Dec 23, 2020

Update skip after backport of #66295 #66808

Merged

nik9000 added a commit that referenced this pull request Dec 23, 2020

Update skip after backport of #66295 (#66808)

dd1ffe3

Now that #66295 has landed in 7.11 we can run the bwc tests it adds against that branch.

nik9000 removed the backport pending label Dec 23, 2020

nik9000 added a commit that referenced this pull request Dec 23, 2020

Update skip after backport of #66295 (#66807)

d231202

Now that #66295 has landed in 7.11 we can run the bwc tests it adds against that branch.

nik9000 mentioned this pull request Dec 29, 2020

Use a mapping snapshot for fetching nested docs #66877

Merged

javanna mentioned this pull request Jan 4, 2021

Don't require DocumentMapper as an argument when parsing a document #66930

Merged

javanna mentioned this pull request Jan 7, 2021

Remove last DocumentMapper reference from MappingLookup #67157

Merged

javanna mentioned this pull request Jan 12, 2021

Remove last DocumentMapper reference from MappingLookup (#67157) #67340

Merged

nik9000 mentioned this pull request Jan 13, 2021

Use a mapping snapshot for fetching nested docs (backport of #66877) #67451

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bust the request cache when the mapping changes #66295

Bust the request cache when the mapping changes #66295

nik9000 commented Dec 14, 2020 •

edited

Loading

nik9000 commented Dec 14, 2020

jtibshirani commented Jan 4, 2021

nik9000 commented Jan 4, 2021

Bust the request cache when the mapping changes #66295

Bust the request cache when the mapping changes #66295

Conversation

nik9000 commented Dec 14, 2020 • edited Loading

nik9000 commented Dec 14, 2020

jtibshirani commented Jan 4, 2021

nik9000 commented Jan 4, 2021

nik9000 commented Dec 14, 2020 •

edited

Loading