[v8.0] Match all the things #7907

Merged
1 commit merged into DIRACGrid:rel-v8r0 from the match-all-the-things branch
Dec 2, 2024

Conversation

@chrisburr chrisburr commented Nov 22, 2024

In LHCbDIRAC we found that the matcher struggled when many jobs were running at a single site. The cause was the JobDB being slow to run `select Type, count(*) from Jobs where Site = %(site)s group by Type`. Every 10 seconds the cache would expire, and every thread would then query the DB again and hang for a long time.

Looking at the cumulative response time over a couple of hours (with a similar number of requests in both periods), the new version is 98.9% faster.

Before this change:

[Two screenshots: "Screenshot 2024-11-22 at 12 55 22" and "Screenshot 2024-11-22 at 14 20 42"]

After this change:

[Two screenshots: "Screenshot 2024-11-22 at 12 55 40" and "Screenshot 2024-11-22 at 14 20 56"]

BEGINRELEASENOTES

*WorkloadManagement
CHANGE: Better caching performance in the Matching Limiter

ENDRELEASENOTES

@DIRACGridBot DIRACGridBot added the alsoTargeting:integration Cherry pick this PR to integration after merge label Nov 22, 2024
@chrisburr chrisburr force-pushed the match-all-the-things branch 5 times, most recently from 7819a7b to 4ba5244 Compare November 22, 2024 10:59
@chrisburr chrisburr requested a review from fstagni November 22, 2024 11:59
@chrisburr chrisburr force-pushed the match-all-the-things branch from 4ba5244 to b10bed7 Compare November 22, 2024 13:15
class Limiter:
    # static variables shared between all instances of this class
    csDictCache = DictCache()
    condCache = DictCache()
    newCache = TwoLevelCache(10, 300)

It's probably worth replacing condCache with newCache but I don't have time for it.
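
For readers unfamiliar with the idea, the sketch below shows one way a cache with two expiry levels can decide what to do with an entry. It assumes the two arguments in `TwoLevelCache(10, 300)` are a short ("soft") and a long ("hard") time-to-live in seconds, which this conversation does not confirm. The names `Freshness` and `classify` are hypothetical and are not part of DIRAC.

```python
from enum import Enum


class Freshness(Enum):
    """Hypothetical states for a cached entry, based on its age."""

    FRESH = "fresh"      # younger than the soft TTL: return it as-is
    STALE = "stale"      # between the soft and hard TTL: usable while a refresh runs
    EXPIRED = "expired"  # older than the hard TTL: the caller must wait for a refresh


def classify(age_seconds: float, soft_ttl: float = 10, hard_ttl: float = 300) -> Freshness:
    """Classify an entry by age. The defaults mirror the assumed (10, 300) arguments."""
    if age_seconds < soft_ttl:
        return Freshness.FRESH
    if age_seconds < hard_ttl:
        return Freshness.STALE
    return Freshness.EXPIRED
```

Under these assumptions, only entries older than the hard TTL would force a caller to block on the database, so a slow query every 10 seconds would no longer stall every thread.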

@chrisburr chrisburr marked this pull request as ready for review November 22, 2024 13:17
Comment on lines +357 to +358
data = result["Value"]
data = {k[0][attName]: k[1] for k in data}

Suggested change
- data = result["Value"]
- data = {k[0][attName]: k[1] for k in data}
+ data = {k[0][attName]: k[1] for k in result["Value"]}
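
To make the one-line form concrete, here is a small runnable sketch. It assumes `result["Value"]` is a list of `(attributes_dict, count)` pairs, which is the shape the comprehension implies but which the diff does not show. The sample values and the snake_case names are illustrative only.

```python
# Assumed shape of the query result: a list of (attributes, count) pairs.
att_name = "Type"
result = {
    "OK": True,
    "Value": [({"Type": "User"}, 12), ({"Type": "MCSimulation"}, 340)],
}

# The suggested one-liner: map each attribute value to its count.
counts = {row[0][att_name]: row[1] for row in result["Value"]}

# Equivalent form using tuple unpacking, which avoids the numeric indices.
counts_unpacked = {attrs[att_name]: count for attrs, count in result["Value"]}
```

Both comprehensions build the same mapping; the unpacked form only changes readability.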

return result
# It is critical that ``future`` is waited for outside of the lock as
# _work aquires the lock before filling the caches. This also means
# we can gaurentee that the future has not yet been removed from the

Suggested change
- # we can gaurentee that the future has not yet been removed from the
+ # we can guarantee that the future has not yet been removed from the
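
The locking rule described in that code comment can be illustrated with a minimal standalone sketch. This is not the code from the PR: `SingleFlight`, its attributes, and the key `"Site.Example"` are hypothetical, and the sketch uses a single global lock and no expiry for brevity. It shows concurrent callers sharing one `Future` per key and waiting for it only after releasing the lock.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class SingleFlight:
    """Hypothetical helper: concurrent callers for one key share one computation."""

    def __init__(self, max_workers: int = 4):
        self.lock = threading.Lock()
        self.futures: dict[str, Future] = {}
        self.values: dict[str, Any] = {}
        self.pool = ThreadPoolExecutor(max_workers=max_workers)

    def get(self, key: str, populate_func: Callable[[], Any]) -> Any:
        with self.lock:
            if key in self.values:
                return self.values[key]
            if key not in self.futures:
                self.futures[key] = self.pool.submit(self._work, key, populate_func)
            future = self.futures[key]
        # Wait OUTSIDE the lock: _work needs the same lock to publish its result,
        # so blocking here while holding it would deadlock.
        return future.result()

    def _work(self, key: str, populate_func: Callable[[], Any]) -> Any:
        try:
            value = populate_func()
        except Exception:
            with self.lock:
                del self.futures[key]  # let a later caller retry
            raise
        with self.lock:
            self.values[key] = value
            del self.futures[key]
        return value


calls = []


def slow_query() -> dict:
    """Stand-in for a slow database query."""
    calls.append(1)
    time.sleep(0.1)
    return {"User": 12}


cache = SingleFlight()
with ThreadPoolExecutor(max_workers=8) as callers:
    results = list(callers.map(lambda _: cache.get("Site.Example", slow_query), range(8)))
```

Because the future is registered and removed only while the lock is held, every caller either finds the stored value or a live future, so `slow_query` runs once even with eight simultaneous requests.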

@@ -12,10 +21,109 @@
from DIRAC.WorkloadManagementSystem.Client import JobStatus


class TwoLevelCache:

It does not seem to me there's anything "specific" to this class. What if you move it to a generic utility module?

self.futures: dict[str, Future] = {}
self.pool = ThreadPoolExecutor(max_workers=max_workers)

def get(self, key: str, populate_func: Callable[[], Any]):

Suggested change
- def get(self, key: str, populate_func: Callable[[], Any]):
+ def get(self, key: str, populate_func: Callable[[], Any]) -> dict:

@fstagni fstagni changed the title from "[master] Match all the things" to "[v8.0] Match all the things" Nov 27, 2024

fstagni commented Dec 2, 2024

I will merge this one right now, and take care of minor comments (and possibly other usage of the new cache) in a later PR.

@fstagni fstagni merged commit 5889b98 into DIRACGrid:rel-v8r0 Dec 2, 2024
26 checks passed
@DIRACGridBot DIRACGridBot added the sweep:done All sweeping actions have been done for this PR label Dec 2, 2024
DIRACGridBot pushed a commit to DIRACGridBot/DIRAC that referenced this pull request Dec 2, 2024
@DIRACGridBot

Sweep summary

Sweep ran in https://github.com/DIRACGrid/DIRAC/actions/runs/12115525328

Successful:

  • integration
