[Search Pipelines] Let a processor factory know if it's being called from an ad hoc / per-request pipeline #8163

msfroh · 2023-06-20T01:06:58Z

Is your feature request related to a problem? Please describe.

Currently, search pipeline processor implementers define a factory to create the processor. The processor instance's lifecycle is bound to that of the pipeline.

In some cases, processor creation may be "heavy" -- like it might create a service client and associated connection pool (along with a pool skimmer thread). That's okay for persistent, named search pipelines (i.e. ones created via a PUT /_search/pipeline/<id> request), but users can also define a pipeline in their search request (#6717) -- in theory, just for testing before creating a persistent pipeline.

Since it's possible for a user to define a search pipeline in every search request (which creates a new pipeline with new processors on every search request), it would be great if "heavy" processor factories can take some action if they're being called from a search request, even if it means throwing an exception to say "No, I don't work that way".

Describe the solution you'd like

The solution I have in mind (and have implemented on my fork) is to pass a config parameter, like "ad_hoc_pipeline":true to the processor factory if and only if the processor is being created based on a pipeline specified in the search request source.

Describe alternatives you've considered

A more invasive approach could involve explicitly passing an isAdHoc or isRequestScoped parameter to the Factory.create method. Since (AFAIK) nobody outside of opensearch-project has written any processors yet, we could still change that. (Or we could add a default overloaded create method that ignores the parameter.)

Another option could involve doing nothing -- users who call an external service from a request or response processor specified in a pipeline in the search request may run out of memory because every processor instantiates a credential rotation thread. That's their problem -- they should have read the documentation! I don't like that solution -- we should at least pass the relevant context to processor factory and let the author of the processor factory provide appropriate safeguards.

We could also try to cache ad hoc search pipelines for reuse. That would avoid recreating heavy resources, as the same processor instances could be reused on subsequent requests. I'm shying away from that solution just because it's hard and I'm lazy. Maybe we could still do that in the future -- it wouldn't break the proposed solution.

Additional context

The immediate driver for this is the PersonalizeRankingResponseProcessor, whose factory does create an AWSCredentialsProvider and an AWS SDK client. Reusing those across multiple calls to the same pipeline is great. Creating a new one on every search request would be bad.

The text was updated successfully, but these errors were encountered:

msfroh added enhancement Enhancement or improvement to existing feature or request untriaged labels Jun 20, 2023

msfroh mentioned this issue Jun 20, 2023

[Search pipelines] Pass "adhocness" flag to processor factories #8164

Merged

6 tasks

msfroh added this to Search Project Board Jun 20, 2023

github-project-automation bot moved this to 🆕 New in Search Project Board Jun 20, 2023

msfroh moved this from 🆕 New to 🏗 In progress in Search Project Board Jun 20, 2023

sejli removed the untriaged label Jun 20, 2023

sejli assigned msfroh Jun 20, 2023

sejli added the Search Search query, autocomplete ...etc label Jun 20, 2023

macohen added the v2.9.0 'Issues and PRs related to version v2.9.0' label Jun 22, 2023

macohen moved this from 🏗 In progress to 👀 In review in Search Project Board Jun 30, 2023

mingshl mentioned this issue Jun 30, 2023

[Search pipelines] Pass "adhocness" flag to processor factories #8390

Closed

6 tasks

dblock closed this as completed in #8164 Jul 6, 2023

github-project-automation bot moved this from 👀 In review to ✅ Done in Search Project Board Jul 6, 2023

msfroh mentioned this issue Jul 25, 2023

Prevent Personalize ranking processor from being created as part of an ad hoc pipeline opensearch-project/search-processor#182

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Search Pipelines] Let a processor factory know if it's being called from an ad hoc / per-request pipeline #8163

[Search Pipelines] Let a processor factory know if it's being called from an ad hoc / per-request pipeline #8163

msfroh commented Jun 20, 2023

[Search Pipelines] Let a processor factory know if it's being called from an ad hoc / per-request pipeline #8163

[Search Pipelines] Let a processor factory know if it's being called from an ad hoc / per-request pipeline #8163

Comments

msfroh commented Jun 20, 2023