Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement/smarter index selection #5642

Merged
merged 13 commits into from
Dec 11, 2015

Conversation

spalger
Copy link
Contributor

@spalger spalger commented Dec 11, 2015

When using segmented fetch we currently assume that documents are stored in index patterns based on time, but this is no longer a safe assumption. Since documents can be in any order we need to fetch documents from every index and then sort them client side. Since fetching the documents can take some time, we are using the bounds of the time field in an index to try and identify indices which can't produce hits for a result set, and then not fetch their documents.

It works like this:

  • get the list of indices and the min/max of their timefield
  • fetch the entire sample size from each index until we have enough documents to satisfy the sample.
  • for each remaining index
    • if the min/max of it's timefield overlaps the min/max of the fetched documents also fetch the documents for that index pattern
    • otherwise fetch with size=0

Issues:

  • it seems that the loading indicator on the histogram is not happy

For #5605

spalger added 2 commits December 10, 2015 16:55
When using segmented fetch we currently assume that documents are stored in index patterns based on time, but this is no long necessary with the use of the field stats API. Since documents can be in any order we need to fetch documents from every index and then sort them client side. Since fetching the documents can take some time, we are using the bounds of the time field in an index to try and identify indices which can't produce hits for a result set, and then not fetch their documents.

It works like this:
 - get the list of indices and the min/max of their timefield
 - fetch the entire sample size from each index until we have enough documents to satisfy the sample.
 - for each remaining index pattern
   - if the min/max of it's timefield overlaps the min/max of the fetched documents also fetch the documents for that index pattern
   - otherwise fetch with size=0
@epixa
Copy link
Contributor

epixa commented Dec 11, 2015

This isn't for 4.2.2 since the field stats stuff was added in 4.3.0.

@epixa
Copy link
Contributor

epixa commented Dec 11, 2015

Please correct me if I'm wrong!

@epixa epixa removed the v4.2.2 label Dec 11, 2015
const hitWindow = this._hitWindow;

// the order of documents isn't important, just get us more
if (!this._sortFn) return Math.max(this._desiredSize - hitWindow.size, 0);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hitWindow is used before it is verified to exist

@rashidkpc
Copy link
Contributor

Get rid of the mix of es5 and es6. Stick with es5 in this pull because it changes functionality. You're welcome to convert to es6 in a separate pull that only contains es6 changes.

.sort(this._sortFn)
.slice(0, this._desiredSize);
});
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merged.hits.hits needs to be sliced even if this._sortFn is not defined

@spalger spalger added the review label Dec 11, 2015

notify.event('flatten hit and count fields', function () {
var counts = $scope.fieldCounts;
var counts = $scope.fieldCounts = (sortFn ? {} : $scope.fieldCounts) || {};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't need to block this PR, but this line is pretty complicated - two assignments, 2 conditionals, 3 branches. In the future we should try to err on the side of multiple lines for something like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hoped no one would notice

@epixa
Copy link
Contributor

epixa commented Dec 11, 2015

I'm all for changing the default value of desiredSize to undefined, but what do you think about using isFinite() (either from lodash or Number) instead of dealing with null coercion in a bunch of places. It's a little more explicit and expressive.

@spalger spalger force-pushed the implement/smarterIndexSelection branch from 24fd838 to e07f8ac Compare December 11, 2015 22:52
spalger pushed a commit that referenced this pull request Dec 11, 2015
spalger pushed a commit that referenced this pull request Dec 11, 2015
spalger pushed a commit that referenced this pull request Dec 11, 2015
spalger pushed a commit that referenced this pull request Dec 11, 2015
spalger pushed a commit that referenced this pull request Dec 11, 2015
Fixes #5642
spalger pushed a commit that referenced this pull request Dec 11, 2015
When using segmented fetch we currently assume that documents are stored in index patterns based on time, but this is no long necessary with the use of the field stats API. Since documents can be in any order we need to fetch documents from every index and then sort them client side. Since fetching the documents can take some time, we are using the bounds of the time field in an index to try and identify indices which can't produce hits for a result set, and then not fetch their documents.

It works like this:
 - get the list of indices and the min/max of their timefield
 - fetch the entire sample size from each index until we have enough documents to satisfy the sample.
 - for each remaining index pattern
   - if the min/max of it's timefield overlaps the min/max of the fetched documents also fetch the documents for that index pattern
   - otherwise fetch with size=0

Fixes #5642
spalger pushed a commit that referenced this pull request Dec 11, 2015
Fixes #5642
spalger pushed a commit that referenced this pull request Dec 11, 2015
spalger pushed a commit that referenced this pull request Dec 11, 2015
spalger pushed a commit that referenced this pull request Dec 11, 2015
spalger pushed a commit that referenced this pull request Dec 11, 2015
spalger pushed a commit that referenced this pull request Dec 11, 2015
spalger pushed a commit that referenced this pull request Dec 11, 2015
spalger pushed a commit that referenced this pull request Dec 11, 2015
Fixes #5642
spalger pushed a commit that referenced this pull request Dec 11, 2015
When using segmented fetch we currently assume that documents are stored in index patterns based on time, but this is no long necessary with the use of the field stats API. Since documents can be in any order we need to fetch documents from every index and then sort them client side. Since fetching the documents can take some time, we are using the bounds of the time field in an index to try and identify indices which can't produce hits for a result set, and then not fetch their documents.

It works like this:
 - get the list of indices and the min/max of their timefield
 - fetch the entire sample size from each index until we have enough documents to satisfy the sample.
 - for each remaining index pattern
   - if the min/max of it's timefield overlaps the min/max of the fetched documents also fetch the documents for that index pattern
   - otherwise fetch with size=0

Fixes #5642
@elasticsearch-bot
Copy link

Court Ewing merged this into the following branches!

Branch Commits
4.3.1 3a55e6c, 7619843, b78e797, c23e839, 6c9ac7b, 66cb5e8, 4bff792, 875d445, 4180d40

spalger pushed a commit that referenced this pull request Dec 11, 2015
spalger pushed a commit that referenced this pull request Dec 11, 2015
spalger pushed a commit that referenced this pull request Dec 11, 2015
spalger pushed a commit that referenced this pull request Dec 11, 2015
spalger pushed a commit that referenced this pull request Dec 11, 2015
spalger pushed a commit that referenced this pull request Dec 11, 2015
spalger pushed a commit that referenced this pull request Dec 11, 2015
spalger pushed a commit that referenced this pull request Dec 11, 2015
@spalger spalger deleted the implement/smarterIndexSelection branch February 25, 2016 22:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants