Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add search_after support to SO find API using PIT #86301

Closed
joshdover opened this issue Dec 17, 2020 · 15 comments · Fixed by #89915
Closed

Add search_after support to SO find API using PIT #86301

joshdover opened this issue Dec 17, 2020 · 15 comments · Fixed by #89915
Assignees
Labels
Feature:Saved Objects Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@joshdover
Copy link
Contributor

joshdover commented Dec 17, 2020

Part of #77961 (comment)

Integrate with exports in the same PR to exercise the new API

@joshdover joshdover added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Feature:Saved Objects labels Dec 17, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

@pgayvallet
Copy link
Contributor

pgayvallet commented Dec 18, 2020

From https://www.elastic.co/guide/en/elasticsearch/reference/7.x/paginate-search-results.html, even when using a PIT,

We recommend you include a tiebreaker field in your sort. This tiebreaker field should contain a unique value for each document. If you don’t include a tiebreaker field, your paged results could miss or duplicate hits.

If one of the initial need for search_after is allowing SO export of more than 10k, we may still need a tiebreaker field (missing records when exporting seems dealbreaker)

Also

Part of #52226 (comment)

I don't think this is pointing to the correct issue?

@joshdover
Copy link
Contributor Author

If one of the initial need for search_after is allowing SO export of more than 10k, we may still need a tiebreaker field (missing records when exporting seems dealbreaker)

If this is the case, then I don't think moving forward with this gets us much. We're likely to need to do #86300, but since that requires some non-trivial changes to SO migrations, we had decided to delay that until after SO Migrations v2 has shipped.

@lukeelmers Could you confirm that the note in the doc above is accurate with the Elasticsearch (Distrib?) team? And if so, do you agree that we should kick this can down the road until after 7.12 / Migrations v2?

@lukeelmers
Copy link
Member

If one of the initial need for search_after is allowing SO export of more than 10k, we may still need a tiebreaker field (missing records when exporting seems dealbreaker)

As discussed offline, it looks as if this should be resolved in 7.12 by elastic/elasticsearch#66093, so we can still move forward.

@lukeelmers
Copy link
Member

lukeelmers commented Jan 27, 2021

Been spending some time working on this, and I think the key question is how we want to deal with PIT...

Adding search_after support is pretty straightforward, e.g.

await so.find({
  search: 'foo*',
  type: ['whatever'],
  sortField: '@timestamp',
  sortOrder: 'desc',
+ searchAfter: [123456, 'some_id']
});

We would add a pit option too:

await so.find({
  search: 'foo*',
  type: ['whatever'],
  sortField: '@timestamp',
  sortOrder: 'desc',
+ pit: { id: 'some_pit', keep_alive: '1m' },
});

Then we’d also need to update SavedObjectsFindResponse to ensure the pit_id returned from a search is included, as this id is provided at the top level outside of hits:

export interface SavedObjectsFindResponse<T = unknown> {
  saved_objects: Array<SavedObjectsFindResult<T>>;
  total: number;
  per_page: number;
  page: number;
+ pit_id?: string;
}

But here's where it gets interesting: you need to open a PIT against the relevant index in advance (as a separate request), however consumers of find don't necessarily know the underlying index for their SO type. So we need to either make an easy way for folks to retrieve a type's index so they can handle opening/closing the PIT on their own, or we need to provide a way to magically handle this for them in find.

Here are the options I've come up with so far:

  1. Make consumers manually look up a SO type from the registry, get its indexPattern, and fall back to kibana index if unavailable.
  2. Expose a way to get the underlying index for a particular type from the client (repository already has a private getIndexForType method which could be made public).
  3. Add a public method to SO client for opening a PIT for a certain SO type, so you don't need to retrieve the index at all, just call something like client.openPitForType
  4. Provide some type of configuration option to find which could automatically open a PIT for you (openPit: true?), and then include it in the search without requiring you to do anything else.

I'm not a huge fan of (1) or (3), so I'm leaning toward (2) since (so far) we will be the only ones using PIT with the SO client, and it is the least invasive. Plus it wouldn't preclude us from later adding a more magical way to do this as in (4). But I'd like to get feedback from others.

@rudolf @pgayvallet @joshdover WDYT? Are there any other options I'm missing?

@pgayvallet
Copy link
Contributor

however consumers of find don't necessarily know the underlying index for their SO type

It's important to note that SOC.find accepts an array of types. And those types are not necessarily in the same index.

const esOptions = {
index: this.getIndicesForTypes(allowedTypes),

From what I see in the ES documentation, a PIT seems to be bound to a (single) specific index, which means that when using find with a PIT, we should either restrict to single type search, or assert that all the types the consumers wants to search for live in the same index (first option seems less error prone imho, but it's open to discussion)

My first question would be: is this single-type limitation dealbreaker for the issue we want to address here? I don't think it is, but just want to be sure.

Here are the options I've come up with so far:

Imho, 1. and 2. are too low level, and I don't really see any upside compared to either 3. or 4..

Also, I don't like consumers being forced to use both the es AND so services to be able to perform SO related operations. Imho, consumers should be able to interact with SOs exclusively with the SO client. (eh, I'm a madman: ES should be an implementation detail of SOs - as much as possible. But let's keep that discussion for TrollDay)

Now, between an explicit API (openPitForType) and an implicit one (openPit: true), I think I would personally go with the first one. That would allow to pass options when creating the PIT (such as the keep_alive period, which is the only one atm, but that seems a little more future-proof in case of additions?). So I would go with 3. But 4. would be perfectly fine too, as long as we add the appropriate validation (like, can't use pit AND createPit when calling find). I don't have a strong opinion between these two.

So in short, for me, it would be either 3. or 4.. But I would love to have @rudolf thoughts on that one.

since (so far) we will be the only ones using PIT with the SO client

Are we? from #77961 (comment), the main need seems to be coming from plugins?

Are there any other options I'm missing?

We could go with a totally distinct API instead of increasing (yet again) the options of find, but I don't think that would make much sense atm (this would be part of a bigger plan). Apart from that, I think your list is quite exhaustive.

@rudolf
Copy link
Contributor

rudolf commented Jan 27, 2021

I tried using the API and it successfully opened a PIT against multiple indices, so I think the docs could be misleading (but I didn't do a full test to search against my PIT).

My first question would be: is this single-type limitation dealbreaker for the issue we want to address here? I don't think it is, but just want to be sure.

We'd like to use this to make backup exports which should include all saved object types (if we really need to we could make a request per type, but it would be more awkward and we'll loose the benefit of the consistency of a single PIT).

I agree with @pgayvallet, option three seems preferable. It's two api calls for a consumer but if that's how the ES API works it seems better to do the same.

This is a bit of a meta comment... but we have many examples where saved objects have added too much abstraction on top of ES and it's limiting the features we can expose and holding back plugins. So I envision saved objects becoming a thinner layer over ES in the future rather than the original goal of saved objects being a black box with ES strictly being an implementation detail that needs to be completely abstracted away. I think this also has the advantage of developers only having to learn how ES works instead of having to understand the intricate details of how SO works with ES under the hood.

@pgayvallet
Copy link
Contributor

pgayvallet commented Jan 27, 2021

We'd like to use this to make backup exports which should include all saved object types

Of course we do... In that case, I guess we will be technically forced to perform one request per index? Note that I would love to be wrong here. Please, someone tells me that PITs are supporting multi-indices searches?

So I envision saved objects becoming a thinner layer over ES in the future rather than the original goal of saved objects being a black box with ES strictly being an implementation detail that needs to be completely abstracted away.

The worse thing is, I agree and disagree with your at the same time. I guess SO discussions are dangerous for one's sanity 😅 .

@rudolf
Copy link
Contributor

rudolf commented Jan 27, 2021

Whatever we decide, we will probably regret at least 10% of our decisions in two years time 😂

@lukeelmers
Copy link
Member

lukeelmers commented Jan 27, 2021

we have many examples where saved objects have added too much abstraction on top of ES and it's limiting the features we can expose and holding back plugins

this also has the advantage of developers only having to learn how ES works instead of having to understand the intricate details of how SO works with ES under the hood

These points are the reason why I had originally preferred option (2), as it means consumers would need to make the separate PIT request on their own and be aware of how that works. However, as @pgayvallet rightfully points out, this is also less than ideal from a DX perspective - it would be nice if you didn't need to use both ES client and SO client.

That said, I'm not strongly opposed to (3) and am comfortable moving in that direction since it seems to strike a balance between the two extremes (you don't need to have ES client, but you still need to explicitly open the PIT).

But 4. would be perfectly fine too, as long as we add the appropriate validation (like, can't use pit AND createPit when calling find)

Worth pointing out that we'll need some special validation in find either way: You cannot perform a _search with PIT against a specific index (e.g. /my-logs/_search { ...search with pit } will return a 400, it must be performed against /_search directly as the index is already stored with the PIT). Same goes for preference (used in find) and routing (not used). So we'll either need to enforce via typings or runtime checks that these parameters are not provided when a pit is also provided.

I tried using the API and it successfully opened a PIT against multiple indices, so I think the docs could be misleading (but I didn't do a full test to search against my PIT).

Glad you tried this @rudolf! I will do a test of this and verify w/ ES team that multiple indices are supported. When I originally tested I only used a single index as that's what the docs seemed to indicate, so my plan in that case was to take an approach like what Pierre describes & restrict PIT searches to only be permitted against a single type. But not having this restriction would be ideal... will report back 🤞

we will probably regret at least 10% of our decisions in two years time

In my experience so far, I would say this is an accurate estimate 😉

TLDR - Will confirm that PIT supports multiple indices, and if so, plan to move ahead with option (3) unless we think of any other good reasons not to.

@lukeelmers
Copy link
Member

I will do a test of this and verify w/ ES team that multiple indices are supported.

Tested and confirmed that both opening a PIT and searching against a PIT works with multiple indices, even though it isn't explicitly mentioned in the docs. Also double-checked with the ES search team & they confirmed that this is correct. 🎉

@lukeelmers
Copy link
Member

lukeelmers commented Jan 27, 2021

After chatting with @jimczi, it turns out that the PR linked above does not fully complete the work of automatic tiebreaking after all: it adds the sort field for tiebreaking but doesn't yet apply it to all queries within a PIT.

The issue to follow for the full feature is elastic/elasticsearch#56828; I'll keep an eye on that to make sure it lands before this PR does.

(EDIT: Worst case scenario this shouldn't block us: we can still manually add the sort field on Kibana side if we need to.)

@pgayvallet
Copy link
Contributor

Tested and confirmed that both opening a PIT and searching against a PIT works with multiple indices, even though it isn't explicitly mentioned in the docs

That's amazing news!

The issue to follow for the full feature is elastic/elasticsearch#56828; I'll keep an eye on that to make sure it lands before this PR does.

Do they have any idea when this one is approximatively planned for?

@lukeelmers
Copy link
Member

lukeelmers commented Jan 27, 2021

Do they have any idea when this one is approximatively planned for?

@pgayvallet Targeting 7.12 still, but even if that misses we could consider shipping our changes. The previous PR introduces a new field that can be used as a tiebreaker, so even without it automatically being applied, we would be able to take that field and insert it into the request in Kibana.

Basically ES has already done the work of creating the tiebreaker field for us, what we are waiting on still is for it to be applied automatically. But in the interim we can still do it manually.

[Edit: I should add that I haven't tested adding it manually yet, this is just based on my chat with Jim]

@pgayvallet
Copy link
Contributor

Basically ES has already done the work of creating the tiebreaker field for us, what we are waiting on still is for it to be applied automatically. But in the interim we can still do it manually.

Ok. In that case we're right, the impact on us would be minor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Saved Objects Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants