Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core.savedObjects] Add helper for using find with pit and search_after. #92981

Merged
merged 15 commits into from
Mar 24, 2021

Conversation

lukeelmers
Copy link
Member

@lukeelmers lukeelmers commented Feb 26, 2021

Closes #91175
Unblocks #93770

Summary

In #89915 we introduced a PointInTimeFinder to make it easy to iterate over large numbers of saved objects using pit and search_after. This helper was used internally by the SavedObjectsExporter.

This PR exposes the PIT finder as a helper directly on the saved objects client[0] so that it can be used by other plugins which are working with large numbers of saved objects.

Usage

The helper takes in the options you would normally pass to savedObjectsClient.find[1], and returns an async generator which you can iterate over to retrieve subsequent pages of results.

The generator handles opening a Point-In-Time (PIT) in Elasticsearch, and uses the PIT with the search_after param to handle "paging" through results at the specified perPage interval. It will auto-close the PIT on any errors, or when finder.close is called[2].

 const finder = savedObjectsClient.createPointInTimeFinder({
   type: 'visualization',
   search: 'foo*',
   perPage: 100,
 });

 const responses: SavedObjectFindResponse[] = [];
 for await (const response of finder.find()) {
   responses.push(...response);
   if (doneSearching) {
     await finder.close();
   }
 }

Internally, the SO repository injects a logger which is used inside the finder, and the APIs which the finder uses under the hood (find, openPointInTimeForType, and closePointInTime) are passed in from the clients as dependencies, to ensure SO client wrappers still work.

Notes

[0] Note that createPointInTimeFinder is only exposed on the server SO client -- (not on the public SO client or via any REST APIs).

[1] Technically it accepts a subset of the SavedObjectsFindOptions, as the page, pit and searchAfter params are managed by the finder. There's a SavedObjectsCreatePointInTimeFinderOptions interface which reflects this difference.

[2] Do not neglect to close the finder when you are done paging (which closes the PIT), otherwise you will cause your cluster to consume resources unnecessarily.

Testing

The main thing to verify here is that there have been no regressions in the behavior of SavedObjectsExporter... otherwise this is a new API which is currently not used elsewhere. I added some notes to the original PR with a description of how to manually test the exporter.

@lukeelmers
Copy link
Member Author

@elasticmachine merge upstream

*/
export function createPointInTimeFinder({
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As suggested in the last PR, I got rid of this as it wasn't really doing anything. Now the repository just uses the PointInTimeFinder class directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we added the SavedObjectsPointInTimeFinderDependencies, I would actually have kept the factory method 😅

type PITfactory = (deps: SavedObjectsPointInTimeFinderDependencies) => 
   (options: SavedObjectsPointInTimeFinderOptions) => PointInTimeFinder

but I don't mind the current implementation.

src/core/server/server.api.md Outdated Show resolved Hide resolved
@lukeelmers lukeelmers added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc and removed WIP Work in progress labels Mar 11, 2021
@lukeelmers lukeelmers marked this pull request as ready for review March 11, 2021 05:15
@lukeelmers lukeelmers requested review from a team as code owners March 11, 2021 05:15
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

@lukeelmers lukeelmers requested review from rudolf and pgayvallet March 11, 2021 05:15
@lukeelmers
Copy link
Member Author

@elasticmachine merge upstream

@jportner jportner self-requested a review March 11, 2021 21:32
Copy link
Member

@ppisljar ppisljar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

app arch changes LGTM

Copy link
Member Author

@lukeelmers lukeelmers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't any API exposed by a plugin using createPointInTimeFinder method end up being stateful?

@thomheymann That's a good point -- and yes, this helper would not be well-suited to a case where you want your plugin to expose an API that has a traditional paging mechanism where you make a single API call from the client for each new page of results. It was originally designed for use with saved object exports and exposed to the server only... so it really just works in scenarios where you are either collecting items on the server in memory, or streaming responses back to the client.

I think if we wanted to support page-by-page requests, we'd need to build a dedicated REST API for handling this, which bypasses the generator and more or less proxies queries directly to ES, requiring the API consumer to pass back the _pit and search_after params and implement the generator's logic on their own.

I think that could be a logical next step if folks are asking for it, but I'm not sure it's strictly necessary now. WDYT? (cc @pgayvallet @rudolf)

we could add a unit test to x-pack/plugins/security/server/saved_objects/secure_saved_objects_client_wrapper.test.ts that exercises the PIT Finder and verifies that audit logs are created. I'm a bit ambivalent about this though as it's sort of outside the scope of unit tests for this wrapper, that would be exercising functionality of the PIT Finder itself. WDYT?

@jportner IMHO such a test wouldn't really be providing additional value. We are already testing that the wrapper's find method is being called by the PIT finder, so as long as we already have tests confirming that find itself will produce an audit log, then I think we have coverage for this scenario.

@lukeelmers
Copy link
Member Author

@elasticmachine merge upstream

@jportner
Copy link
Contributor

@jportner IMHO such a test wouldn't really be providing additional value. We are already testing that the wrapper's find method is being called by the PIT finder, so as long as we already have tests confirming that find itself will produce an audit log, then I think we have coverage for this scenario.

Makes sense, agreed, thanks!

Copy link
Contributor

@rudolf rudolf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly small nits and documentation suggestions

@rudolf
Copy link
Contributor

rudolf commented Mar 17, 2021

I think if we wanted to support page-by-page requests, we'd need to build a dedicated REST API for handling this, which bypasses the generator and more or less proxies queries directly to ES, requiring the API consumer to pass back the _pit and search_after params and implement the generator's logic on their own.

I think that could be a logical next step if folks are asking for it, but I'm not sure it's strictly necessary now. WDYT? (cc @pgayvallet @rudolf)

yeah I think the intention is for this pointInTimeFinder helper to be convenient but not solve every use case. In this case it explicitly doesn't solve the paging from a browser use case. There's some disparate conversations happening about paging through saved objects. For performance reasons we really shouldn't build UI's that try to allow users to page through e.g. more than 1000 results.

Performance impact #93770
UI / Design discussion #46435

It might be necessary to build an HTTP API that allows external other developers to iterate through many pages of saved objects. We could either solve that like import/export by streaming all results in a single request, or like luke's proxying the PIT and search_after params idea.

@pgayvallet
Copy link
Contributor

I might be misunderstanding the use case but I'm a bit unsure how generators will work for setups with multiple Kibana instances. Wouldn't any API exposed by a plugin using createPointInTimeFinder method end up being stateful?

In addition to #92981 (comment) and #92981 (review):

Yea, this API was meant to be used by server-side code processing the scroll in a single 'batch', not to keep the PIT open to have the client-side ask for the next batch. This use case will effectively not work in multi-instance (or in clustering mode FWIW). Also, even in single instance, that kind of scenarios would risk to reach the PIT timeout if the request for the next page is based on user interaction.

Should we document the limitations and 'forbidden' use cases of this new API/helper?

@lukeelmers
Copy link
Member Author

Should we document the limitations and 'forbidden' use cases of this new API/helper?

Yeah I will document this before merging. I think it would be easy to look at the code examples and think it would be perfect for paging, when really it can only be used if you are handling paging entirely on the client and shipping all the data up front.

@lukeelmers lukeelmers requested a review from rudolf March 18, 2021 21:13
@lukeelmers
Copy link
Member Author

@elasticmachine merge upstream

@lukeelmers
Copy link
Member Author

@elasticmachine merge upstream

@lukeelmers lukeelmers added the auto-backport Deprecated - use backport:version if exact versions are needed label Mar 24, 2021
@lukeelmers lukeelmers enabled auto-merge (squash) March 24, 2021 21:49
@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @lukeelmers

@kibanamachine
Copy link
Contributor

💚 Backport successful

7.x / #95374

This backport PR will be merged automatically after passing CI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Deprecated - use backport:version if exact versions are needed Feature:Saved Objects release_note:enhancement Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v7.13.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[core.savedObjects] Expose PointInTimeFinder as a helper for paging through SOs
8 participants