Ingest Node and number of script compilations #49763
Pinging @elastic/es-core-features (:Core/Features/Ingest)
A cache limit of 100 compiled scripts is going to add a lot of overhead to ingest, where we use a lot of very short scripts. Can we not base the cache on memory usage instead? Or alternatively, change the compilation rate limit so it does not apply to these small scripts?
We discussed this today and agree that we need to address this issue. How we do it, though, is still a question: while it would be possible to simply change the existing settings, that may not address the underlying problem.

On the topic of max size and compilation throttle: the cache currently has a configurable max size, but it does not offer size-based eviction. The throttle is configurable, but as seen here, one large pipeline can easily exceed it. We wonder if these small scripts warrant special treatment:

a) How much actual memory is used by these normally small scripts?

b) Similarly, should these small scripts count against the compilation throttle at all?

@jdconrad @stu-elastic - thoughts?
One initial thought to consider: if an ingest script processor didn't allow the use of stored scripts, the script could be compiled when the ingest processor is parsed; it could then be cached locally, and compilation would only occur when the processor changes.

However, to address the questions posed by @jakelandis: the memory footprint of a compiled script should be relatively small, and the difference between a large script and a small script shouldn't be much, so it probably doesn't make sense to have special handling for small conditional ingest scripts specifically. As to the second point, we do agree that the throttle limit should be removed for conditional ingest scripts, because they are not changed on a per-request basis.

After discussion with @stu-elastic and @rjernst, we would propose making the cache size and compilation limits per context. This way ingest could be set to have a much larger cache and no compilation limit by default for this type of use case.
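As a hypothetical sketch of what per-context settings could look like in elasticsearch.yml — the setting names, context names, and values here are illustrative assumptions, not a definitive reference:

```yaml
# Hypothetical per-context script cache configuration; names and
# values are illustrative assumptions, not a definitive reference.

# Opt out of the single shared cache and rate limit, using one cache
# per script context instead:
script.max_compilations_rate: use-context

# Give the ingest script contexts a larger cache and a generous
# compilation rate limit:
script.context.ingest.cache_max_size: 400
script.context.ingest.max_compilations_rate: 400/5m
script.context.processor_conditional.cache_max_size: 400
script.context.processor_conditional.max_compilations_rate: 400/5m
```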
So today conditional scripts and scripts in the `script` processor are compiled at pipeline execution time; could we compile them at parse time instead, and would the rate limit still apply the same way?
That's exactly part of the purpose of #50152. By splitting the cache per context, we can control the rate limit separately per cache, and have the ability to disable the rate limit completely. I don't think this is different whether they are compiled at parse or pipeline runtime.

Additionally, while I do think we should move to parse-time compilation for ingest condition scripts, I'm not sure we can until we limit the existing API. These can currently be stored scripts (just referenced by name), which means they might change after the pipeline is parsed. If we want to compile at parse time, I think we need to limit these scripts to inline only, which makes sense to me because they are essentially already stored scripts, just stored in a different place than the global stored scripts.
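For concreteness, here is a minimal sketch of the two forms in Beats-style pipeline YAML (the Painless expression and script name are made up for illustration): an inline script, whose source is fixed once the pipeline is parsed, versus a stored script reference, whose body can change afterwards:

```yaml
processors:
  - script:
      # Inline: the source travels with the pipeline definition, so it
      # could safely be compiled once at pipeline parse time.
      lang: painless
      source: "ctx.event_duration = ctx.end_ts - ctx.start_ts"
  - script:
      # Stored: the pipeline only holds a name; the script body can be
      # updated after the pipeline is parsed, so compilation has to
      # stay at runtime unless stored references are disallowed.
      id: my-stored-ingest-script
```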
👍
Agreed, for the ingest context this shouldn't be different and disabling the rate limit for the ingest context shouldn't depend on moving the compilation from runtime to parse time.
Yes, only allowing inline scripts is required in order to move compilation from runtime to parse time. This would be a breaking change, since both the script processor and the code that parses the conditional currently accept stored script references and invoke compilation at runtime.
I've removed the team-discuss label. When #50152 is implemented, we should disable the rate limit for the ingest context.
For 7.X, would it be possible to improve the error message when the cache is too small? If we have more evictions per minute than the compilation rate limit allows, the error message could point at the cache size rather than just the compilation rate.
That's a reasonable request @leehinman, let's continue that discussion in #52497
Closed by #59267
The Beats Cisco ASA module has a fairly complex Ingest Node pipeline. The pipeline is generated by the module from YAML, but the resulting pipeline can be found in this gist. Due to the large number of `if` conditions, and because each is an individual script, we run into "Too many dynamic script compilations" errors.

As explained by @adriansr, Elasticsearch has a cache of 100 compiled scripts and a default limit of 75 script compilations per 5 minutes. That pipeline alone has close to 100 scripts, so the cache runs out quickly enough.
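To illustrate the shape of the problem, here is a tiny hypothetical excerpt in the style of the module's pipeline YAML (the processor fields and message IDs are made up): every processor carries its own small `if` condition, and each distinct condition string is compiled and cached as a separate script:

```yaml
processors:
  # Each distinct "if" string compiles and caches as its own script,
  # so ~100 processors like these consume ~100 cache entries.
  - set:
      if: "ctx.cisco?.message_id == '106023'"
      field: event.action
      value: denied
  - set:
      if: "ctx.cisco?.message_id == '302013'"
      field: event.action
      value: connection-started
  - set:
      if: "ctx.cisco?.message_id == '302014'"
      field: event.action
      value: connection-finished
```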
We have written docs for the workaround, which is mostly to increase the limit and the cache size.
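Concretely, that workaround amounts to raising two node settings in elasticsearch.yml; the values below are placeholders, to be sized against the pipelines actually in use:

```yaml
# Placeholder values; size these against the pipelines actually in use.

# Cache more compiled scripts (default is 100 entries):
script.cache.max_size: 300

# Permit more compilations per window (default is 75 per 5 minutes):
script.max_compilations_rate: 300/5m
```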
While we hit this limit with our own module, based on the huge Logstash configs that I've seen, I suspect users and customers are going to hit similar problems as they adopt Ingest Node more.
Questions: should we adjust the cache and limit on the Elasticsearch side? Or should we consider refactoring the pipeline to use a large Painless script? Are there other options?