Implement/smarter index selection #5642
… implement/smarterIndexSelection
When using segmented fetch we currently assume that documents are stored in index patterns based on time, but this is no longer necessary with the use of the field stats API. Since documents can be in any order, we need to fetch documents from every index and then sort them client side. Because fetching the documents can take some time, we use the bounds of the time field in each index to identify indices that can't produce hits for the result set, and skip fetching their documents. It works like this:
- get the list of indices and the min/max of their time field
- fetch the entire sample size from each index until we have enough documents to satisfy the sample
- for each remaining index pattern:
  - if the min/max of its time field overlaps the min/max of the fetched documents, also fetch the documents for that index pattern
  - otherwise, fetch with size=0
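The steps above can be sketched in ES5-style JavaScript. This is a minimal, synchronous illustration, not Kibana's implementation: the `{ name, min, max }` index objects, the `fetchHits(index, size)` callback, the `{ time }` hit shape and the newest-first sort order are all assumptions. One deliberate choice: rather than a strict overlap test, the sketch skips an index only when its newest document is older than the oldest hit in an already-full window. That covers the overlap case described above and also keeps indices that are entirely newer than the window, which could still improve the sample.

```javascript
// Hypothetical sketch: index objects, fetchHits and the hit shape are illustrative only.
function compareNewestFirst(a, b) {
  return b.time - a.time;
}

function fetchSample(indices, desiredSize, fetchHits) {
  var hits = [];           // current hit window, newest first, at most desiredSize long
  var requestedSizes = {}; // index name -> size requested (0 means "count only")

  indices.forEach(function (index) {
    var size = desiredSize;

    if (hits.length >= desiredSize) {
      // The window is full; its oldest hit is the last one.
      var oldest = hits.length ? hits[hits.length - 1].time : Infinity;
      // An index whose newest document is older than that cannot improve the window.
      if (index.max < oldest) size = 0;
    }

    requestedSizes[index.name] = size;
    if (size > 0) {
      hits = hits.concat(fetchHits(index, size))
        .sort(compareNewestFirst)
        .slice(0, desiredSize);
    }
  });

  return { hits: hits, requestedSizes: requestedSizes };
}

// Usage with fake data; each array is already sorted newest first.
var docs = { a: [200, 150, 120], b: [90, 80], c: [300, 110], d: [500, 400] };
var indices = [
  { name: 'a', min: 120, max: 200 },
  { name: 'b', min: 80, max: 90 },
  { name: 'c', min: 110, max: 300 },
  { name: 'd', min: 400, max: 500 }
];
function fakeFetch(index, size) {
  return docs[index.name].map(function (t) { return { time: t }; }).slice(0, size);
}
var result = fetchSample(indices, 3, fakeFetch);
// result.hits times: 500, 400, 300; result.requestedSizes: { a: 3, b: 0, c: 3, d: 3 }
```

Index `b` is only counted (size 0) because everything in it is older than the full window, while `d` is fetched even though its range lies entirely above the window.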
This isn't for 4.2.2 since the field stats stuff was added in 4.3.0.
Please correct me if I'm wrong!
const hitWindow = this._hitWindow;

// the order of documents isn't important, just get us more
if (!this._sortFn) return Math.max(this._desiredSize - hitWindow.size, 0);
hitWindow is used before it is verified to exist
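A guard for the problem raised in this comment could look like the following ES5 sketch. The function name `remainingSize`, the `state` argument standing in for `this`, and the assumption that `_hitWindow` is `null` until the first segment has been merged are all hypothetical; the sorted branch is a placeholder, not the PR's actual logic.

```javascript
// Hypothetical stand-alone version of the size calculation quoted above.
// `state` stands in for `this`; assumption: _hitWindow is null until the
// first segment has been merged.
function remainingSize(state) {
  var hitWindow = state._hitWindow;

  // No window yet: nothing has been fetched, so ask for the whole sample.
  if (!hitWindow) return state._desiredSize;

  // The order of documents isn't important, just get us more.
  if (!state._sortFn) return Math.max(state._desiredSize - hitWindow.size, 0);

  // Placeholder for this sketch: with a sort, request a full sample so the
  // client-side sort can pick the best documents.
  return state._desiredSize;
}
```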
Get rid of the mix of ES5 and ES6. Stick with ES5 in this pull request because it changes functionality. You're welcome to convert to ES6 in a separate pull request that contains only ES6 changes.
  .sort(this._sortFn)
  .slice(0, this._desiredSize);
});
}
merged.hits.hits needs to be sliced even if this._sortFn is not defined
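A sketch of the merge step with the slice moved outside the sort condition, so the result is capped whether or not a sort function is set. `mergeHits` is a hypothetical helper that works on plain arrays rather than the `merged.hits.hits` response shape.

```javascript
// Hypothetical helper: combine collected hits with a new segment's hits.
function mergeHits(existing, incoming, desiredSize, sortFn) {
  var merged = existing.concat(incoming);

  // Sorting is optional: only do it when a sort function is configured.
  if (sortFn) merged.sort(sortFn);

  // Trimming is not: always keep at most desiredSize hits.
  return merged.slice(0, desiredSize);
}

// Usage: without a sort function the result is still capped at 3 hits.
var unsorted = mergeHits([1, 2], [3, 4], 3);
var sorted = mergeHits([1, 5], [3], 2, function (a, b) { return b - a; });
// unsorted: [1, 2, 3]; sorted: [5, 3]
```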
…rterIndexSelection
notify.event('flatten hit and count fields', function () {
-  var counts = $scope.fieldCounts;
+  var counts = $scope.fieldCounts = (sortFn ? {} : $scope.fieldCounts) || {};
Doesn't need to block this PR, but this line is pretty complicated: two assignments, two conditionals, three branches. In the future we should err on the side of splitting something like this across multiple lines.
I hoped no one would notice
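Following the reviewer's suggestion, the one-liner can be spread over several lines without changing its behavior. `resetFieldCounts` and its `scope` argument (standing in for Angular's `$scope`) are hypothetical names used only for this sketch.

```javascript
// Equivalent to:
//   var counts = $scope.fieldCounts = (sortFn ? {} : $scope.fieldCounts) || {};
function resetFieldCounts(scope, sortFn) {
  var counts;

  if (sortFn) {
    // A sort function is set: start from an empty object.
    counts = {};
  } else {
    // No sort function: reuse the existing counts, or start empty if there are none.
    counts = scope.fieldCounts || {};
  }

  scope.fieldCounts = counts;
  return counts;
}
```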
I'm all for changing the default value of desiredSize to
24fd838 to e07f8ac
When using segmented fetch we currently assume that documents are stored in index patterns based on time, but this is no longer necessary with the use of the field stats API. Since documents can be in any order, we need to fetch documents from every index and then sort them client side. Because fetching the documents can take some time, we use the bounds of the time field in each index to identify indices that can't produce hits for the result set, and skip fetching their documents. It works like this:
- get the list of indices and the min/max of their time field
- fetch the entire sample size from each index until we have enough documents to satisfy the sample
- for each remaining index pattern:
  - if the min/max of its time field overlaps the min/max of the fetched documents, also fetch the documents for that index pattern
  - otherwise, fetch with size=0

Fixes #5642
When using segmented fetch we currently assume that documents are stored in index patterns based on time, but this is no longer a safe assumption. Since documents can be in any order, we need to fetch documents from every index and then sort them client side. Because fetching the documents can take some time, we use the bounds of the time field in each index to identify indices that can't produce hits for the result set, and skip fetching their documents.
It works like this:
Issues:
For #5605