Allow more MDT flexibility in Rabbit lustre allocations #171

Open
jameshcorbett opened this issue Jul 2, 2024 · 1 comment

@jameshcorbett

Problem: by default, Flux creates one MDT per rabbit for Lustre file systems. However, directivebreakdown resources carry information that may indicate that fewer (or more) MDTs should be created.

@behlendorf said:

We really haven't done much testing with multiple rabbits but being able to control the number of MDTs and OSTs is going to be important at scale.

Further, creating one MDT per rabbit

[Is] going to be an issue at scale. Lustre has issues beyond 50'ish MDTs in a filesystems, it should work but performance will get worse as MDTs are added.

Flux should look at directivebreakdowns to see if they offer hints on how many MDTs to create.
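As a rough illustration of the idea, here is a minimal sketch of reading such a hint and falling back to the current one-MDT-per-rabbit default when none is present. It assumes the allocation set has already been loaded into a plain Python dict and that the hint lives under `constraints` / `count`; the function name `mdt_count_hint` and the exact dict layout are hypothetical and not taken from flux-coral2 or DWS.

```python
def mdt_count_hint(allocation_set, num_rabbits):
    """Return how many MDTs to create for one allocation set.

    `allocation_set` is assumed to be a dict taken from a
    directivebreakdown resource (layout is an assumption).
    `num_rabbits` is the number of rabbits assigned to the job.
    """
    # Look for an explicit count; a missing key means "no hint".
    count = allocation_set.get("constraints", {}).get("count")
    if count is None:
        # No hint: keep the existing default of one MDT per rabbit.
        return num_rabbits
    return int(count)


# Usage: a job spanning 8 rabbits whose breakdown asks for 2 MDTs.
hinted = mdt_count_hint({"label": "mdt", "constraints": {"count": 2}}, 8)
default = mdt_count_hint({"label": "mdt"}, 8)
```

The sketch does not cap the hint at the number of rabbits, since the hint may ask for more MDTs as well as fewer.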

@jameshcorbett

jameshcorbett added a commit to jameshcorbett/flux-coral2 that referenced this issue Jul 15, 2024
Problem: as described in issue flux-framework#171, creating many MDTs is
bad for performance, and usually goes against what is explicitly
required by directivebreakdown resources. However, there is not
yet a good way to get Fluxion to handle MDT allocation.

Bypass Fluxion allocation completely, and tell DWS to create
exactly the number of allocations requested in the
.constraints.count field (which is usually found on MDTs).

Place the allocations on the rabbits which have the most compute
nodes allocated to the job.

This is intended only as a temporary solution, because it
introduces a new potential problem: some rabbit storage is used
without being tracked by Fluxion. That could lead to
overallocation of resources and cause jobs to fail.
However, this seems unlikely in practice: MDTs are small and
Fluxion always gives jobs more storage than they asked for, so
there should usually be some spare storage.
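
The placement rule in the commit message above (put the allocations on the rabbits with the most compute nodes in the job) could look roughly like the following. This is only a sketch under stated assumptions, not the code from the commit: the function name `pick_rabbits`, the input mapping of rabbit name to allocated compute nodes, and the alphabetical tie-break are all hypothetical.

```python
def pick_rabbits(compute_nodes_by_rabbit, count):
    """Choose up to `count` rabbits, busiest first.

    `compute_nodes_by_rabbit` maps a rabbit name to the list of
    compute nodes the job was allocated on that rabbit
    (an assumed input shape).
    """
    ranked = sorted(
        compute_nodes_by_rabbit,
        # Most compute nodes first; break ties by name (an assumption)
        # so the result is deterministic.
        key=lambda rabbit: (-len(compute_nodes_by_rabbit[rabbit]), rabbit),
    )
    return ranked[:count]


# Usage: three rabbits, two allocations requested via .constraints.count.
nodes = {
    "rabbit-a": ["n1", "n2", "n3"],
    "rabbit-b": ["n4"],
    "rabbit-c": ["n5", "n6"],
}
chosen = pick_rabbits(nodes, 2)
```

The sketch places at most one allocation per rabbit and leaves open what should happen when the requested count exceeds the number of rabbits in the job.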