kvs: rename kvs namespace events to limit calls to flux_event_subscribe()
#2779
Comments
Doh! There is also a
ugh ... I am wasting more time thinking about a good naming scheme than I'd like to admit:
I thought of I didn't want to do something stupid like "kvs.create-namespace", which works, but then I'm just being stupid about creating a different "prefix" substring. I am almost inclined to just leave it as is for now. If I can't come up with a good crossover naming scheme, then perhaps it should stay as is. It simply means too many events have specialized purposes or "wide" usage.
We should try to avoid the back-to-back subscribes everywhere though, if possible. My guess is that this might have a measurable impact on the high-throughput jobs case, since the kvs modules can't do anything else while they are synchronously waiting for responses to subscribe requests. It might be a better trade-off to recv and drop the events you aren't interested in than to have many, many subscriptions. If you just threw out a branch with this kind of change, we could see if there is even any impact on the job workload.
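To make the common-prefix idea concrete, here is a minimal sketch (the event names are made up for illustration, not the real kvs topics) of how topic-string prefix matching in flux_event_subscribe() lets one call replace several:

```c
#include <flux/core.h>

/* Sketch only: event names below are illustrative, not the actual kvs
 * topics.  Event subscriptions match by topic-string prefix, so events
 * that share a common substring can be covered by a single subscription.
 */
static int subscribe_before (flux_t *h)
{
    /* One synchronous subscribe request per event topic */
    if (flux_event_subscribe (h, "kvs.setroot") < 0
        || flux_event_subscribe (h, "kvs.error") < 0
        || flux_event_subscribe (h, "kvs.namespace-create") < 0
        || flux_event_subscribe (h, "kvs.namespace-remove") < 0)
        return -1;
    return 0;
}

static int subscribe_after (flux_t *h)
{
    /* If the events are renamed to share a "kvs.namespace" prefix
     * (e.g. kvs.namespace-setroot, kvs.namespace-error, ...), a single
     * subscribe covers them all.
     */
    return flux_event_subscribe (h, "kvs.namespace");
}
```

The cost, as noted above, is that a broader subscription delivers events the module may not care about and therefore has to receive and drop.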
Seems like a good idea to try. We want to remove the 3 successive flux_event_subscribe() calls.
Another thought: at what point do we just have to accept that we should wait for (or make higher priority) #1557?
Even if we have a fix for #1157, fewer calls to subscribe will be beneficial.
Yeah, just using a Here's a current version. Run with:

flux mini run --dry-run sleep 0 > jobs/0.json
for i in `seq 1 1023`; do
    cp jobs/0.json jobs/$i.json
done

It is also useful to reload
import time
import sys
import flux
from flux import job
from flux import constants

t0 = time.time()
h = flux.Flux()

jobs = []


# Small helper around the job-manager.alloc-admin RPC, used to stop
# scheduling while jobs are submitted and re-enable it afterwards.
class QueueAdmin:
    data = {'query_only': False, 'enable': False, 'reason': "Testing"}

    def __init__(self, h):
        self.h = h

    def start(self):
        msg = dict(self.data, enable=True)
        return self.h.rpc("job-manager.alloc-admin", msg).get()

    def stop(self):
        msg = dict(self.data)
        return self.h.rpc("job-manager.alloc-admin", msg).get()

    def status(self):
        msg = dict(self.data, query_only=True)
        p = self.h.rpc("job-manager.alloc-admin", msg).get()
        if p["enable"]:
            return "scheduling enabled"
        else:
            return "scheduling disabled, reason={}".format(p["reason"])


label = "bulksubmit"


def log(s):
    print(label + ": " + s)


def progress(fraction, length=72, suffix=""):
    fill = int(round(length * fraction))
    bar = '\u2588' * fill + '-' * (length - fill)
    s = '\r|{0}| {1:.1f}% {2}'.format(bar, 100 * fraction, suffix)
    sys.stdout.write(s)
    if fraction == 1.:
        sys.stdout.write('\n')


def submit_cb(f, arg):
    jobs.append(job.submit_get_id(f))


log("Starting...")

# Disable scheduling while jobs are submitted
q = QueueAdmin(h)
q.stop()
log(q.status())

# Submit each jobspec file given on the command line asynchronously
for file in sys.argv[1:]:
    with open(file) as jobspec:
        job.submit_async(h, jobspec.read(), waitable=True).then(submit_cb)

if h.reactor_run(h.get_reactor(), 0) < 0:
    h.fatal_error("reactor start failed")

# print(jobs)
total = len(jobs)
dt = time.time() - t0
jps = len(jobs) / dt
log("submitted {0} jobs in {1:.2f}s. {2:.2f}job/s".format(total, dt, jps))

# Re-enable scheduling and wait for all jobs to complete
q.start()
t0 = time.time()
log(q.status())

count = 0
while count < total:
    jobid, result, s = job.wait(h)
    if not result:
        print("{}: {}".format(jobid, s))
    count = count + 1
    if count == 1:
        log("First job finished in about {0:.3f}s".format(time.time() - t0))
    suffix = "({0:.1f} job/s)".format(count / (time.time() - t0))
    progress(count / total, length=58, suffix=suffix)

dt = time.time() - t0
log("Ran {0} jobs in {1:.1f}s. {2:.1f} job/s".format(total, dt, total / dt))

# vi: ts=4 sw=4 expandtab
So running the benchmark above, there's a net win from renaming the namespaces and removing calls to flux_event_subscribe().

Before (code of PR #2777)

After

This was the best run for "Before" and the worst run for "After", so on average the performance improvement was a tad better than this (68.4 vs 67.1). As I thought about the 4 events, there are only two events that will get dropped:
So really we're trading the drops of all the
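For illustration, the "recv and drop" side of that trade might look roughly like the sketch below (not the actual kvs code; lookup_namespace() is a hypothetical stand-in for the module's own namespace table): with one broad subscription, the event callback simply ignores events for namespaces it isn't tracking.

```c
#include <stdbool.h>
#include <flux/core.h>

/* Hypothetical helper: returns true if this module tracks the namespace
 * named in the event topic.  A stand-in for the kvs module's own lookup.
 */
bool lookup_namespace (void *ctx, const char *topic);

/* Sketch: with a single broad subscription, every "kvs.namespace" event
 * is delivered here, and events for untracked namespaces are dropped.
 */
static void namespace_event_cb (flux_t *h, flux_msg_handler_t *mh,
                                const flux_msg_t *msg, void *arg)
{
    const char *topic;

    if (flux_msg_get_topic (msg, &topic) < 0)
        return;
    if (!lookup_namespace (arg, topic))
        return;   /* not a namespace we track: drop the event */
    /* ... otherwise handle setroot / error / removed as before ... */
}
```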
After discussion at the meeting, we decided this should go in. My primary concern is that, unlike other performance fixes, there was a clear "speed one thing up, slow another thing down" tradeoff, and just because performance is a win in this case doesn't mean it'll be a win later. And we don't have performance tests in Travis. But given the low probability of changes in the future, this should go in. I'll of course add many comments :P
In the kvs and kvs-watch modules, collapse multiple calls to flux_event_subscribe() or flux_event_unsubscribe() into a single call using the same event substrings. Fixes flux-framework#2779
As @garlick mentions in #2777, we could further reduce the number of calls to flux_event_subscribe() and flux_event_unsubscribe() through clever naming of the events and ensuring there are common substrings.