Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Search Pipelines] Let a processor factory know if it's being called from an ad hoc / per-request pipeline #8163

Closed
msfroh opened this issue Jun 20, 2023 · 0 comments · Fixed by #8164
Assignees
Labels
enhancement Enhancement or improvement to existing feature or request Search Search query, autocomplete ...etc v2.9.0 'Issues and PRs related to version v2.9.0'

Comments

@msfroh
Copy link
Collaborator

msfroh commented Jun 20, 2023

Is your feature request related to a problem? Please describe.

Currently, search pipeline processor implementers define a factory to create the processor. The processor instance's lifecycle is bound to that of the pipeline.

In some cases, processor creation may be "heavy" -- like it might create a service client and associated connection pool (along with a pool skimmer thread). That's okay for persistent, named search pipelines (i.e. ones created via a PUT /_search/pipeline/<id> request), but users can also define a pipeline in their search request (#6717) -- in theory, just for testing before creating a persistent pipeline.

Since it's possible for a user to define a search pipeline in every search request (which creates a new pipeline with new processors on every search request), it would be great if "heavy" processor factories can take some action if they're being called from a search request, even if it means throwing an exception to say "No, I don't work that way".

Describe the solution you'd like

The solution I have in mind (and have implemented on my fork) is to pass a config parameter, like "ad_hoc_pipeline":true to the processor factory if and only if the processor is being created based on a pipeline specified in the search request source.

Describe alternatives you've considered

A more invasive approach could involve explicitly passing an isAdHoc or isRequestScoped parameter to the Factory.create method. Since (AFAIK) nobody outside of opensearch-project has written any processors yet, we could still change that. (Or we could add a default overloaded create method that ignores the parameter.)

Another option could involve doing nothing -- users who call an external service from a request or response processor specified in a pipeline in the search request may run out of memory because every processor instantiates a credential rotation thread. That's their problem -- they should have read the documentation! I don't like that solution -- we should at least pass the relevant context to processor factory and let the author of the processor factory provide appropriate safeguards.

We could also try to cache ad hoc search pipelines for reuse. That would avoid recreating heavy resources, as the same processor instances could be reused on subsequent requests. I'm shying away from that solution just because it's hard and I'm lazy. Maybe we could still do that in the future -- it wouldn't break the proposed solution.

Additional context

The immediate driver for this is the PersonalizeRankingResponseProcessor, whose factory does create an AWSCredentialsProvider and an AWS SDK client. Reusing those across multiple calls to the same pipeline is great. Creating a new one on every search request would be bad.

@msfroh msfroh added enhancement Enhancement or improvement to existing feature or request untriaged labels Jun 20, 2023
@msfroh msfroh moved this from 🆕 New to 🏗 In progress in Search Project Board Jun 20, 2023
@sejli sejli removed the untriaged label Jun 20, 2023
@sejli sejli added the Search Search query, autocomplete ...etc label Jun 20, 2023
@macohen macohen added the v2.9.0 'Issues and PRs related to version v2.9.0' label Jun 22, 2023
@macohen macohen moved this from 🏗 In progress to 👀 In review in Search Project Board Jun 30, 2023
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Search Project Board Jul 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request Search Search query, autocomplete ...etc v2.9.0 'Issues and PRs related to version v2.9.0'
Projects
Archived in project
3 participants