
rpk: aio tuner should take into account /proc/sys/fs/aio-nr when setting /proc/sys/fs/aio-max-nr #4004

Open
esteban opened this issue Mar 14, 2022 · 5 comments


esteban commented Mar 14, 2022

Version & Environment

Redpanda version: v21.11.9

What went wrong?

On hosts where /proc/sys/fs/aio-nr has been previously tuned via sysctl, running rpk redpanda tune sets /proc/sys/fs/aio-max-nr to 1048576. In most cases this isn't a problem, but on hosts where /proc/sys/fs/aio-nr is already at or very close to that value, Redpanda aborts with the following message:

rpk[19642]: ERROR 2022-03-14 02:18:12,717 [shard 47] seastar - Could not setup Async I/O: Resource temporarily unavailable. The most common cause is not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application
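
For reference, the current AIO usage and the limit can be read directly, which is a quick way to check how much headroom a host has before (or after) tuning:

sysctl fs.aio-nr fs.aio-max-nr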

What should have happened instead?

rpk redpanda tune should take into account the value of /proc/sys/fs/aio-nr and offset /proc/sys/fs/aio-max-nr by the same amount, so that the right number of AIO slots remains allocatable.
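
A minimal sketch of the proposed logic in shell (1048576 is the value the tuner writes today, per the report above; the offset is the suggested change, not current rpk behavior):

target=1048576
in_use=$(cat /proc/sys/fs/aio-nr)
echo $((target + in_use)) > /proc/sys/fs/aio-max-nr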

JIRA Link: CORE-859

@esteban esteban added kind/bug Something isn't working supportability labels Mar 14, 2022
@twmb twmb added the area/rpk label May 16, 2022

JapuDCret commented Jan 20, 2023

I have a similar issue with the Redpanda container (observed in redpanda:v22.3.10 and redpanda:v22.3.11) in my Testcontainers setup.

When resources are a little scarce, I get

libc++abi: terminating with uncaught exception of type std::runtime_error:
    Could not setup Async I/O: Resource temporarily unavailable.
    The most common cause is not enough request capacity in /proc/sys/fs/aio-max-nr.
    Try increasing that number or reducing the amount of logical CPUs available for your application

and

WARN  2023-01-20 13:01:00,316 seastar - Requested AIO slots too large,
please increase request capacity in /proc/sys/fs/aio-max-nr. available:54510 requested:88208

Unfortunately, one cannot guarantee that this many resources are available in every Testcontainers run.

The worst part is that the container does not report itself as unhealthy.
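
One workaround in constrained environments is to cap the shard count, since Seastar sizes its AIO request per shard; fewer shards means fewer requested slots. A hedged sketch using the start flags that also appear later in this thread (image reference abbreviated as above):

docker run redpanda:v22.3.11 \
  redpanda start --mode dev-container --overprovisioned \
  --smp 1 --memory 256MiB --reserve-memory 0M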

@fracasula

Same here. I'm using the attached docker-compose.yaml file (I got it from the Redpanda quickstart page here).
I'm running this on my laptop, which isn't doing anything else: a 12th-gen i9-12900HK (14 cores) with 64 GB of DDR5 memory.

DEBUG 2024-01-26 16:43:53,804 seastar - smp::count: 20
DEBUG 2024-01-26 16:43:53,804 seastar - latency_goal: 0.00075
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU0 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU1 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU2 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU3 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU4 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU5 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU6 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU7 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU8 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU9 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU10 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU11 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU12 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU13 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU14 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU15 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU16 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU17 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU18 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Assign CPU19 to NUMA0
DEBUG 2024-01-26 16:43:53,815 seastar - Auto-configure 1 IO groups
WARN  2024-01-26 16:43:53,826 seastar - Requested AIO slots too large, please increase request capacity in /proc/sys/fs/aio-max-nr. configured:65536 available:16 requested:220520
Could not initialize seastar: std::runtime_error (Could not setup Async I/O: Not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application)
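
The warning already contains the arithmetic: with smp::count = 20, the 220520 requested slots work out to 11026 per shard, while only 16 of the configured 65536 were still free (fs.aio-nr was already near the limit, matching the 65520 shown two comments below):

echo $((220520 / 20))   # 11026 AIO slots requested per shard
echo $((65536 - 16))    # 65520 already in use, i.e. fs.aio-nr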

[Screenshot from 2024-01-26 17-41-21]

[Screenshot from 2024-01-26 17-42-34]

[Attachment: docker-compose.zip]

@fracasula

In my case I had to increase the threshold by doing:

echo 1048576 > /proc/sys/fs/aio-max-nr

Now all 3 brokers are able to run:

sudo sysctl -a | grep -i aio
fs.aio-max-nr = 1048576
fs.aio-nr = 65520
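
Worth noting: echoing into /proc only lasts until reboot. A persistent variant through standard sysctl mechanics (the drop-in file name here is arbitrary):

sudo sysctl -w fs.aio-max-nr=1048576
echo 'fs.aio-max-nr = 1048576' | sudo tee /etc/sysctl.d/99-aio-max-nr.conf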


jsilvao commented Nov 6, 2024

Any progress here? It's almost 2025

@rodkevich

@jsilvao
Hi,
I’m not sure if there’s a problem with the Redpanda code.
I had a situation like this, and here’s what I found out.
In my environment the main instance of Redpanda runs via Docker Compose.
At the same time, Dockertest runs tests by creating Redpanda instances with additional settings like:
"redpanda start",
"--mode dev-container",
"--overprovisioned",
"--smp 1",
"--memory 256MiB",
"--reserve-memory 0M",
"--node-id 0",
"--check=false"
Until I added exactly the same settings to the main instance (which had more resources), I kept getting these errors in the Dockertest container, regardless of how many resources Docker as a whole was consuming at the time.
If I turned the main instance off, everything worked fine.
Once I added resource limits to the compose file, both the main Redpanda instance and the test containers started working.
It might be some sort of configuration collision.
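
For anyone reproducing this, the same settings as a single docker run, with explicit container limits standing in for the compose-file limits (the limit values and image tag are illustrative):

docker run --cpus 1 --memory 512m redpanda:latest \
  redpanda start --mode dev-container --overprovisioned \
  --smp 1 --memory 256MiB --reserve-memory 0M \
  --node-id 0 --check=false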
