[Reporting] Improving CSV generation by supporting concurrent background tasks #181064
Pinging @elastic/appex-sharedux (Team:SharedUX)
Assuming my understanding of #108485 is accurate, and still relevant, running 10 CSV exports concurrently has the chance of causing Kibana to crash due to an OOM. @elastic/appex-sharedux can you all confirm that each CSV export task could use approximately 100MB of memory?
@kobelb Your understanding of how we currently chunk the reports seems accurate. @vadimkibana has created an issue to hardcode the chunk size to 4MB (#180829), which will stop reports from causing OOMs on 1GB instances.
Thanks, @tsullivan. If we use 4 MB chunks, then I don't have concerns about doing 10 concurrently. |
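To make the memory reasoning above concrete, here is a back-of-envelope sketch. The helper function is hypothetical; the numbers come straight from the thread (~100MB per export under the old behavior vs. the hardcoded 4MB chunk size from #180829):

```typescript
// Hypothetical helper restating the arithmetic from the comments above.
function worstCaseMemoryMb(concurrentTasks: number, perTaskMb: number): number {
  return concurrentTasks * perTaskMb;
}

// 10 concurrent exports at ~100 MB each: ~1000 MB, enough to OOM a 1 GB instance.
const before = worstCaseMemoryMb(10, 100); // 1000

// With 4 MB chunks, 10 concurrent exports hold only ~40 MB of chunk buffers.
const after = worstCaseMemoryMb(10, 4); // 40
```

This is why the 4MB chunking removes the OOM concern about running 10 exports concurrently.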
@mikecote @elastic/kibana-security Having concurrent background tasks for CSV reports will also help in case we want to generate an API key to use as authentication for the report during its runtime. The API key will need a set expiration time, which needs to cover not just (execution time * the number of attempts), but also the time that the report is in "pending" status while waiting for earlier reports to finish. If we can execute CSV reports in parallel, that reduces the time that the report is waiting in pending status.
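The expiration constraint described above can be sketched as a small formula. None of these names exist in Kibana; they are hypothetical and just restate the arithmetic:

```typescript
// Hypothetical types restating the API-key lifetime constraint above.
interface ReportTiming {
  maxPendingMs: number; // time the report may wait behind earlier reports
  executionMs: number;  // expected runtime of a single attempt
  maxAttempts: number;  // retry budget for the task
}

// The API key must outlive the whole lifecycle: the pending wait plus
// every possible execution attempt.
function minApiKeyTtlMs(t: ReportTiming): number {
  return t.maxPendingMs + t.executionMs * t.maxAttempts;
}
```

Running CSV reports in parallel shrinks `maxPendingMs`, which directly shrinks the minimum TTL the API key needs.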
Thanks @tsullivan, just as a heads up, we're working on supporting API keys at the task manager level (#190661). For reporting, it would mean the reporting document wouldn't have to store and encrypt the API key. If task manager manages the API key, would that suffice for reporting's use case? From my thinking it does, but I wanted to make sure. We can also work around the complications of API key expiration by invalidating the API key at the same time the task is done / deleted.
@mikecote Yes! This would be very beneficial to keep the Reporting code as simple as possible. Thanks! |
The `report:execute` task today has a concurrency per Kibana node set to `1`. When a task type is configured like this, Task Manager prevents more than one reporting task from running on the same node at any given time. The following exposes some limitations that we have in serverless:
What I propose is running the CSV generation tasks under a new task type, `report:execute-csv`, that doesn't have `maxConcurrency` set within its task definition, and keeping `report:execute` multi-purpose in case there are still CSV tasks in the queue. This will allow 10x throughput per Kibana node for generating CSVs and will benefit serverless, ESS, and on-prem users. One thing to keep an eye on: with 10x concurrency, we also put 10x the memory / CPU pressure on the node, and I am not familiar with the internals of how much resource utilization each task needs.