Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bust the request cache when the mapping changes #66295

Merged
merged 62 commits into from
Dec 23, 2020

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Dec 14, 2020

This makes sure that we only serve a hit from the request cache if it
was build using the same mapping and that the entire search phase is
processed using the same mapping. These are both bugs that Elasticsearch
has had for a long, long time but now that we have runtime fields they
are much more likely to matter because runtime fields can be changed
drastically from moment to moment.

Closes #62033

This makes sure that we only serve a hit from the request cache if it
was build using the same mapping and that the entire search phase is
processed using the same mapping. These are both bugs that Elasticsearch
has had for a long, long time but now that we have runtime fields they
are much more likely to matter because runtime fields can be changed
*drastically* from moment to moment.

It does by adding a counter to the local mapping that is incremented
whenever the mapping changes. This counter and the mapping is stored
under a single `volatile` variable that is updated to point to the new
mapping whenever the mapping changes. When the search phase begins we
read the contents of the `volatile` and use it to power the entire
phase, and including the version counter into the cache key.

Before this change we'd go back to the `volatile` over and over again,
potentially getting a new version of the mapping every time. This was
convenient but made it possible that we'd see the mapping update half
way through the search phase which could cause trouble.

Mechanically, this creates a `snapshot()` method on `MapperService` to
do the volatile read of the current mapping. The snapshot does all of
the reading we need from here on out. I kept all of the methods on
`MapperService` that read directly from the mapping but deprecated all
of them because using them can lead to seeing the mapping updates when
you don't expect them. We can slowly remove the usages in follow up
changes. This change is big enough as is.

Sadly, the `Mapper`s are also mutable. Luckilly, we only mutate them
before sticking them behind the `volatile` reference. Hopefully. There
is a lot of code there and it is hard to be sure. But it looks like we
*intend* to do this. We should make them properly immutable in a follow
up change.

Closes elastic#62033
@nik9000
Copy link
Member Author

nik9000 commented Dec 14, 2020

Draft so I can iterate on this while Jenkins runs the tests for me.

@nik9000 nik9000 added :Search/Search Search-related issues that do not fall into other categories >bug v7.11.0 labels Dec 15, 2020
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Dec 23, 2020
Now that elastic#66295 has landed in 7.11 we can run the bwc tests it adds
against that branch.
nik9000 added a commit that referenced this pull request Dec 23, 2020
Now that #66295 has landed in 7.11 we can run the bwc tests it adds
against that branch.
nik9000 added a commit that referenced this pull request Dec 23, 2020
Now that #66295 has landed in 7.11 we can run the bwc tests it adds
against that branch.
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Dec 29, 2020
This uses the mapping snapshot that we built for the search phase
in elastic#66295 for fetching nested documents. This is simpler to reason about
because the mapping snapshot is immutable.
javanna added a commit that referenced this pull request Jan 4, 2021
…66780)

Currently, an incoming document is parsed through `DocumentMapper#parse`, which in turns calls `DocumentParser#parseDocument` providing `this` among other arguments. As part of the effort to reduce usages of `DocumentMapper` when possible, as it represents the mutable side of mappings (through mappings updates) and involves complexity, we can carry around only the needed components. This does add some required arguments to `DocumentParser#parseDocument` , though it makes dependencies clearer. This change does not affect end consumers as they all go through DocumentMapper anyways, but by not needed to provide DocumentMapper to parseDocument, we may be able to unblock further improvements down the line.

Relates to #66295
javanna added a commit to javanna/elasticsearch that referenced this pull request Jan 4, 2021
…lastic#66780)

Currently, an incoming document is parsed through `DocumentMapper#parse`, which in turns calls `DocumentParser#parseDocument` providing `this` among other arguments. As part of the effort to reduce usages of `DocumentMapper` when possible, as it represents the mutable side of mappings (through mappings updates) and involves complexity, we can carry around only the needed components. This does add some required arguments to `DocumentParser#parseDocument` , though it makes dependencies clearer. This change does not affect end consumers as they all go through DocumentMapper anyways, but by not needed to provide DocumentMapper to parseDocument, we may be able to unblock further improvements down the line.

Relates to elastic#66295
@jtibshirani
Copy link
Contributor

The refactor #66877 made me notice that we aren't guaranteed to use the same MappingLookup for the query and fetch phases. This is because we create a new QueryShardContext in each phase. Here's my understanding of the current situation:

  1. When the mapping changes, we will never serve a cached response that used an old version of the mappings.
  2. We try hard to use the same mappings/ settings through the search request. However there are some cases where that's not true -- for example we could use a newer mapping version when fetching vs. querying.

It's great we covered 1, as it was a critical issue with runtime fields. I think point 2 is a strong goal, but not a blocker. Does this match your understanding too? We could file a dedicated issue to track it, listing the places where it's not yet consistent.

@nik9000
Copy link
Member Author

nik9000 commented Jan 4, 2021

Yeah. We don't do anything in particular to protect against can_match getting a different mapping either. But the things that can cause it to return true or false are mostly immutable at the moment.

javanna added a commit that referenced this pull request Jan 5, 2021
…66780)

Currently, an incoming document is parsed through `DocumentMapper#parse`, which in turns calls `DocumentParser#parseDocument` providing `this` among other arguments. As part of the effort to reduce usages of `DocumentMapper` when possible, as it represents the mutable side of mappings (through mappings updates) and involves complexity, we can carry around only the needed components. This does add some required arguments to `DocumentParser#parseDocument` , though it makes dependencies clearer. This change does not affect end consumers as they all go through DocumentMapper anyways, but by not needed to provide DocumentMapper to parseDocument, we may be able to unblock further improvements down the line.

Relates to #66295
javanna added a commit to javanna/elasticsearch that referenced this pull request Jan 7, 2021
As part of elastic#66295 we made QueryShardContext perform mapping lookups through MappingLookup rather than MapperService. That helps as MapperService relies on DocumentMapper which may change througout the execution of the search request. At search time, the percolate query also needs to parse documents, which made us add a parse method to MappingLookup.Such parse method currently relies on calling DocumentMapper#parseDocument through a function, but we would like to rather make this easier to follow. (see https://github.com/elastic/elasticsearch/pull/66295/files#r544639868)

We recently removed the need to provide the entire DocumentMapper to DocumentParser#parse, opening the possibility for using DocumentParser directly when needing to parse a document at query time. This commit adds everything that is needed (namely Mapping, IndexSettings and IndexAnalyzers) to MappingLookup so that it can parse a document through DocumentParser without relying on DocumentMapper.

As a bonus, given that MappingLookup holds a reference to these three additional objects, we can make DocumentMapper rely on MappingLookup to retrieve those and not hold its own same references to them.
Along the same lines, given that MappingLookup holds all that's necessary to parse a document, the signature of DocumentParser#parse can be simplified by replacing most of its arguments with MappingLookup and retrieving what is needed from it.
javanna added a commit that referenced this pull request Jan 12, 2021
As part of #66295 we made QueryShardContext perform mapping lookups through MappingLookup rather than MapperService. That helps as MapperService relies on DocumentMapper which may change througout the execution of the search request. At search time, the percolate query also needs to parse documents, which made us add a parse method to MappingLookup.Such parse method currently relies on calling DocumentMapper#parseDocument through a function, but we would like to rather make this easier to follow. (see https://github.com/elastic/elasticsearch/pull/66295/files#r544639868)

We recently removed the need to provide the entire DocumentMapper to DocumentParser#parse, opening the possibility for using DocumentParser directly when needing to parse a document at query time. This commit adds everything that is needed (namely Mapping, IndexSettings and IndexAnalyzers) to MappingLookup so that it can parse a document through DocumentParser without relying on DocumentMapper.

As a bonus, given that MappingLookup holds a reference to these three additional objects, we can make DocumentMapper rely on MappingLookup to retrieve those and not hold its own same references to them.
Along the same lines, given that MappingLookup holds all that's necessary to parse a document, the signature of DocumentParser#parse can be simplified by replacing most of its arguments with MappingLookup and retrieving what is needed from it.
javanna added a commit to javanna/elasticsearch that referenced this pull request Jan 12, 2021
As part of elastic#66295 we made QueryShardContext perform mapping lookups through MappingLookup rather than MapperService. That helps as MapperService relies on DocumentMapper which may change througout the execution of the search request. At search time, the percolate query also needs to parse documents, which made us add a parse method to MappingLookup.Such parse method currently relies on calling DocumentMapper#parseDocument through a function, but we would like to rather make this easier to follow. (see https://github.com/elastic/elasticsearch/pull/66295/files#r544639868)

We recently removed the need to provide the entire DocumentMapper to DocumentParser#parse, opening the possibility for using DocumentParser directly when needing to parse a document at query time. This commit adds everything that is needed (namely Mapping, IndexSettings and IndexAnalyzers) to MappingLookup so that it can parse a document through DocumentParser without relying on DocumentMapper.

As a bonus, given that MappingLookup holds a reference to these three additional objects, we can make DocumentMapper rely on MappingLookup to retrieve those and not hold its own same references to them.
Along the same lines, given that MappingLookup holds all that's necessary to parse a document, the signature of DocumentParser#parse can be simplified by replacing most of its arguments with MappingLookup and retrieving what is needed from it.
javanna added a commit that referenced this pull request Jan 12, 2021
As part of #66295 we made QueryShardContext perform mapping lookups through MappingLookup rather than MapperService. That helps as MapperService relies on DocumentMapper which may change througout the execution of the search request. At search time, the percolate query also needs to parse documents, which made us add a parse method to MappingLookup.Such parse method currently relies on calling DocumentMapper#parseDocument through a function, but we would like to rather make this easier to follow. (see https://github.com/elastic/elasticsearch/pull/66295/files#r544639868)

We recently removed the need to provide the entire DocumentMapper to DocumentParser#parse, opening the possibility for using DocumentParser directly when needing to parse a document at query time. This commit adds everything that is needed (namely Mapping, IndexSettings and IndexAnalyzers) to MappingLookup so that it can parse a document through DocumentParser without relying on DocumentMapper.

As a bonus, given that MappingLookup holds a reference to these three additional objects, we can make DocumentMapper rely on MappingLookup to retrieve those and not hold its own same references to them.
Along the same lines, given that MappingLookup holds all that's necessary to parse a document, the signature of DocumentParser#parse can be simplified by replacing most of its arguments with MappingLookup and retrieving what is needed from it.
nik9000 added a commit that referenced this pull request Jan 13, 2021
This uses the mapping snapshot that we built for the search phase
in #66295 for fetching nested documents. This is simpler to reason about
because the mapping snapshot is immutable.
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jan 13, 2021
…66877)

This uses the mapping snapshot that we built for the search phase
in elastic#66295 for fetching nested documents. This is simpler to reason about
because the mapping snapshot is immutable.
nik9000 added a commit that referenced this pull request Jan 13, 2021
…67451)

This uses the mapping snapshot that we built for the search phase
in #66295 for fetching nested documents. This is simpler to reason about
because the mapping snapshot is immutable.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v7.11.0 v7.12.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Request cache and Runtime Fields
5 participants