flux account commands hang while fairshare is being updated #451

Open
ryanday36 opened this issue May 14, 2024 · 1 comment
Labels
improvement Upgrades to an already existing feature

Comments

@ryanday36
Contributor

I've noticed that, on corona, flux account commands hang while the fairshare tasks in the accounting crontab are running. For example, flux account view-user ... usually runs in less than a second, but takes ~2.5 minutes when run at the same time as the accounting crontab. That's not terrible, so this probably isn't the highest priority, but it would be good if those updates could be done in a way that doesn't block queries.

I also don't know how much scaling testing has been done with the update-usage and related scripts, but I do worry that this could be a larger issue as we move Flux to larger systems with, potentially, many more jobs.

@cmoussa1 cmoussa1 self-assigned this May 14, 2024
@cmoussa1 cmoussa1 added the improvement Upgrades to an already existing feature label May 14, 2024
@cmoussa1
Member

Thanks for pointing this out. Definitely something to look into further. SQLite is pretty lightweight and allows only one writer at a time, so a large number of concurrent reads and writes (and a fair-share update involves a lot of writes) can lead to poor concurrency.

I wonder if there is potential to optimize the command that updates the job usage values for all of the associations in a flux-accounting DB. Right now, the command iterates through every row in the association_table, and whenever a row needs an update, it acquires a lock on the database to make that update. Repeat this process for a large number of users, and that's a lot of locks... I'll look into whether I can wrap all of these updates into a single transaction that is written to the database all at once instead of row by row.
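A minimal sketch of the difference between committing per row and batching everything into one transaction, using Python's standard `sqlite3` module. The column names (`username`, `bank`, `job_usage`) and the shape of the `new_usage` list are hypothetical placeholders, not the actual flux-accounting schema or code; only the table name `association_table` comes from the comment above.

```python
import sqlite3

# Hypothetical UPDATE statement; column names are placeholders.
UPDATE_SQL = (
    "UPDATE association_table SET job_usage = ? "
    "WHERE username = ? AND bank = ?"
)


def update_usage_per_row(conn, new_usage):
    """Current pattern: one transaction (and one write-lock cycle) per row."""
    cur = conn.cursor()
    for username, bank, usage in new_usage:
        cur.execute(UPDATE_SQL, (usage, username, bank))
        conn.commit()  # acquire/release the write lock for every single row


def update_usage_single_transaction(conn, new_usage):
    """Batched pattern: all updates are written in one transaction."""
    # `with conn:` commits once if the block succeeds and rolls back on error.
    with conn:
        conn.executemany(
            UPDATE_SQL,
            [(usage, username, bank) for username, bank, usage in new_usage],
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE association_table "
        "(username TEXT, bank TEXT, job_usage REAL, PRIMARY KEY (username, bank))"
    )
    conn.executemany(
        "INSERT INTO association_table VALUES (?, ?, 0.0)",
        [("user%d" % i, "bankA") for i in range(1000)],
    )
    conn.commit()

    updates = [("user%d" % i, "bankA", float(i)) for i in range(1000)]
    update_usage_single_transaction(conn, updates)
    print(conn.execute("SELECT SUM(job_usage) FROM association_table").fetchone()[0])
```

With the batched version the write lock is taken once per fair-share run rather than once per updated association, which avoids the repeated lock and commit overhead. The trade-off is that the single transaction may hold the lock for a longer stretch, so whether it actually shortens the time queries like view-user are blocked would need to be measured.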
