Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Reporting] use point-in-time for paging search results #144201

Merged
merged 15 commits into from
Nov 7, 2022

Conversation

tsullivan
Copy link
Member

@tsullivan tsullivan commented Oct 29, 2022

Summary

Closes #143144

Based on de10eae

Checklist

Reviewer guide

See documentation on paging search results:

The search response includes an array of sort values for each hit. ... To retrieve the next page of results, repeat the request, take the sort values from the last hit, and insert those into the search_after array:

The open point in time request and each subsequent search request can return different id; thus always use the most recently received id for the next search request.

The search response includes an array of sort values for each hit. If you used a PIT, a tiebreaker is included as the last sort values for each hit. This tiebreaker called _shard_doc is added automatically on every search requests that use a PIT. The _shard_doc value is the combination of the shard index within the PIT and the Lucene’s internal doc ID, it is unique per document and constant within a PIT.

Release Note

Fixed a bug with CSV export in Discover, where searching over hundreds of shards would result in an incomplete CSV file.

@tsullivan tsullivan force-pushed the reporting/searchsource-exporttype branch 2 times, most recently from 316043b to 31822a2 Compare November 3, 2022 02:51
@tsullivan tsullivan changed the title [Reporrting] use point-in-time for paging search results [Reporting] use point-in-time for paging search results Nov 3, 2022
@tsullivan tsullivan force-pushed the reporting/searchsource-exporttype branch 2 times, most recently from 6317085 to 31608ff Compare November 3, 2022 03:13
@elastic elastic deleted a comment from kibana-ci Nov 3, 2022
@tsullivan tsullivan added Team:SharedUX Team label for AppEx-SharedUX (formerly Global Experience) ci:cloud-deploy Create or update a Cloud deployment ci:cloud-redeploy Always create a new Cloud deployment labels Nov 3, 2022
@tsullivan tsullivan force-pushed the reporting/searchsource-exporttype branch 6 times, most recently from 538cfee to d90e0c4 Compare November 3, 2022 20:53
@tsullivan tsullivan force-pushed the reporting/searchsource-exporttype branch from d90e0c4 to dcccbb5 Compare November 3, 2022 21:54
@tsullivan tsullivan marked this pull request as ready for review November 4, 2022 17:34
@tsullivan tsullivan requested review from a team as code owners November 4, 2022 17:34
@tsullivan tsullivan added release_note:fix (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead labels Nov 4, 2022
@tsullivan tsullivan marked this pull request as draft November 5, 2022 00:29
@tsullivan tsullivan marked this pull request as ready for review November 5, 2022 19:13
searchAfter?: estypes.SortResults;
/**
* Allow querying to use a point-in-time ID for paging results
* Requires searchAfter when the page index is > 1.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be enforced in search source, before setting the pit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we chatted a bit to clarify this, and agreed it's better to just get rid of this comment.

Copy link
Contributor

@jloleysens jloleysens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job @tsullivan ! I'm glad to see we can employ the recommended way of deeply paging through data for CSV exports 👏🏻


Tesed locally on a small-ish sample - do you have any data on how this performs in Kibana with larger datasets?

@tsullivan tsullivan requested a review from a team as a code owner November 7, 2022 16:02
@kibana-ci
Copy link
Collaborator

kibana-ci commented Nov 7, 2022

💚 Build Succeeded

Metrics [docs]

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
data 401.3KB 401.3KB +43.0B
Unknown metric groups

API count

id before after diff
data 3261 3263 +2

ESLint disabled in files

id before after diff
osquery 1 2 +1

ESLint disabled line counts

id before after diff
enterpriseSearch 19 21 +2
fleet 58 64 +6
osquery 108 113 +5
securitySolution 440 446 +6
total +19

References to deprecated APIs

id before after diff
discover 25 30 +5
reporting 5 0 -5
total -0

Total ESLint disabled count

id before after diff
enterpriseSearch 20 22 +2
fleet 66 72 +6
osquery 109 115 +6
securitySolution 517 523 +6
total +20

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting ci:cloud-deploy Create or update a Cloud deployment ci:cloud-redeploy Always create a new Cloud deployment (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead release_note:fix Team:SharedUX Team label for AppEx-SharedUX (formerly Global Experience) v8.6.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Reporting] Improve deep pagination method for CSV export
5 participants