I've noticed that, on corona, `flux account` commands hang when the fairshare tasks in the accounting cron tab are running. E.g. `flux account view-user ...` usually runs in less than a second, but takes ~2.5 minutes when run at the same time as the accounting cron tab. That's not terrible, so this probably isn't the highest priority, but it would be good if those updates could be done in a way that doesn't block queries.
I also don't know how much scaling testing has been done with the `update-usage` and related scripts, but I do worry that this could be a larger issue as we move Flux to larger systems with, potentially, many more jobs.
Thanks for pointing this out. This is definitely something to look into further. I know that SQLite is pretty lightweight, and under a heavy mix of concurrent reads and writes, especially during a fair-share update (which performs a lot of writes), lock contention could leave queries waiting.
I wonder if there is potential to optimize the command that updates the job usage values for all of the associations in a flux-accounting DB. Right now, the command iterates through every row in the `association_table` and, if it needs to make an update, acquires a lock on the database to make that update. Repeat this process for a large number of users, and that's a lot of locks... I'll look into whether I can wrap all of the updates that would happen here into a single transaction that gets written to the database at once, instead of committing once per row.
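As a rough illustration of the single-transaction idea, here is a minimal sketch using Python's standard `sqlite3` module. It is not the flux-accounting implementation: the column names (`username`, `bank`, `job_usage`), the sample rows, and the helper `update_usage_batched` are all hypothetical and only stand in for whatever the real schema and update code look like.

```python
import sqlite3


def update_usage_batched(conn, new_usage):
    """Apply every job-usage update inside one transaction.

    new_usage maps (username, bank) -> usage value. The column names
    used here are assumptions, not the real flux-accounting schema.
    """
    rows = [(usage, user, bank) for (user, bank), usage in new_usage.items()]
    # Using the connection as a context manager commits once on success
    # and rolls back if an exception is raised, so the write lock is
    # taken once for the whole batch rather than once per row.
    with conn:
        conn.executemany(
            "UPDATE association_table SET job_usage = ? "
            "WHERE username = ? AND bank = ?",
            rows,
        )
    return len(rows)


# Hypothetical in-memory table for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE association_table ("
    "username TEXT, bank TEXT, job_usage REAL DEFAULT 0.0, "
    "PRIMARY KEY (username, bank))"
)
conn.executemany(
    "INSERT INTO association_table (username, bank) VALUES (?, ?)",
    [("alice", "A"), ("bob", "A"), ("carol", "B")],
)
conn.commit()

updated = update_usage_batched(conn, {("alice", "A"): 12.5, ("bob", "A"): 3.0})
usage = dict(conn.execute("SELECT username, job_usage FROM association_table"))
```

Computing the new values first and only then opening the transaction keeps the window during which the database is locked as short as possible, which is the part that matters for not blocking `flux account` queries.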