
kvs: rename kvs namespace events to limit calls to flux_event_subscribe() #2779

Closed
chu11 opened this issue Feb 26, 2020 · 10 comments

chu11 commented Feb 26, 2020

As @garlick mentions in #2777, we could further reduce the number of calls to flux_event_subscribe() and flux_event_unsubscribe() through clever naming of the events and ensuring there are common substrings.

@chu11 chu11 self-assigned this Feb 27, 2020

chu11 commented Feb 27, 2020

I'm clearly not that clever, as @garlick figured out we can use kvs.namespace-%s-setroot as the pattern, so rank 0 can subscribe to kvs.namespace and the workers can subscribe to kvs.namespace-%s.

Doh! There is also a kvs.namespace-created-%s event, which the primary KVS module does not listen for, but the kvs-watch module does. Gotta get a little more creative on names, or just modify the created event name.
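
For reference, here is a minimal sketch in C of the idea being aimed at (not the actual kvs module code: subscribe_namespace_events() is a made-up helper, and flux_event_subscribe() is the only real API used). Event subscriptions match on topic-string prefix, which is why a shared prefix lets one call cover several events.

#include <flux/core.h>
#include <stdio.h>

/* Sketch only -- hypothetical helper, not from flux-core.  Event
 * subscriptions match on topic prefix, so one subscribe can cover
 * every event whose name shares a common substring. */
static int subscribe_namespace_events (flux_t *h, uint32_t rank, const char *ns)
{
    char topic[256];

    if (rank == 0)  /* rank 0: all namespaces, all namespace events */
        snprintf (topic, sizeof (topic), "kvs.namespace");
    else            /* other ranks: just this namespace,
                     * e.g. "kvs.namespace-primary-setroot" */
        snprintf (topic, sizeof (topic), "kvs.namespace-%s", ns);

    return flux_event_subscribe (h, topic);
}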


chu11 commented Feb 27, 2020

Ugh ... I am wasting more time thinking about a good naming scheme than I'd like to admit:

kvs.setroot - subscribed to by the kvs & kvs-watch modules
kvs.namespace-removed - subscribed to by the kvs & kvs-watch modules
kvs.error - subscribed to only by the kvs module
kvs.namespace-create - subscribed to only by the kvs-watch module

I thought of kvs.event-internal and kvs.event-external (but abbreviated), but there are multiple events that are used by both modules.

I didn't want to do something stupid like "kvs.create-namespace", which works, but only by contriving a different "prefix" substring.

I'm almost inclined to just leave it as is for now. If I can't come up with a good crossover naming scheme, then perhaps it should stay as is; it simply means too many events have specialized purposes or "wide" usage.


grondo commented Feb 28, 2020

We should try to avoid the back-to-back subscribes everywhere though, if possible. My guess is that this might have a measurable impact on the high-throughput jobs case, since the kvs modules can't do anything else while they are synchronously waiting for responses to subscribe requests. It might be a better trade-off to recv and drop the events you aren't interested in than to have many, many subscriptions.

If you just threw out a branch with this kind of change we could see if there is even any impact on the job workload.


chu11 commented Feb 28, 2020

If you just threw out a branch with this kind of change we could see if there is even any impact on the job workload.

Seems like a good idea to try. We want to remove the 3 successive flux_event_subscribe() calls in kvs-watch as well.
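
For reference, a rough sketch of what collapsing those calls looks like. The "before" topics are the three events kvs-watch subscribes to from the list above; the "after" topic assumes the events get renamed to share the kvs.namespace prefix. flux_event_subscribe() is the only real API here; the helper functions are illustrative.

#include <flux/core.h>

/* Before: three back-to-back subscribes, each a synchronous round trip
 * to the broker. */
static int subscribe_separately (flux_t *h)
{
    if (flux_event_subscribe (h, "kvs.setroot") < 0)
        return -1;
    if (flux_event_subscribe (h, "kvs.namespace-removed") < 0)
        return -1;
    if (flux_event_subscribe (h, "kvs.namespace-create") < 0)
        return -1;
    return 0;
}

/* After: with the events renamed to share a common substring, a single
 * prefix subscription covers all of them. */
static int subscribe_by_prefix (flux_t *h)
{
    return flux_event_subscribe (h, "kvs.namespace");
}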


chu11 commented Feb 28, 2020

Another thought: at what point do we just have to accept that we should wait for (or raise the priority of) #1557?


grondo commented Feb 28, 2020

Even if we have a fix for #1157, fewer calls to subscribe will be beneficial.


chu11 commented Feb 28, 2020

@grondo is there a job workload you've been using for this and #2727?


grondo commented Feb 28, 2020

Yeah, just using a bulksubmit script I have lying around:

Here's a current version. Run with flux python bulksubmit.py jobs/*.json.
I've been testing with 1024 sleep 0 jobs I think, e.g.

flux mini run --dry-run sleep 0 > jobs/0.json
for i in `seq 1 1023`; do
  cp jobs/0.json jobs/$i.json
done

It is also useful to reload sched-simple in unlimited alloc mode, e.g. I run on fluke with something like

$ srun --pty --mpi=none -N16 flux start sh -c 'flux module reload sched-simple unlimited && flux python bulksubmit jobs/*json'

bulksubmit.py:
import time
import sys
import flux
from flux import job
from flux import constants

t0 = time.time()

h = flux.Flux()
jobs = []

class QueueAdmin:
    data = {'query_only': False, 'enable': False, 'reason': "Testing"}

    def __init__(self, h):
        self.h = h

    def start(self):
        msg = dict(self.data, enable = True)
        return self.h.rpc("job-manager.alloc-admin", msg).get()

    def stop(self):
        msg = dict(self.data)
        return self.h.rpc("job-manager.alloc-admin", msg).get()

    def status(self):
        msg = dict(self.data, query_only = True)
        p = self.h.rpc("job-manager.alloc-admin", msg).get()
        if p["enable"]:
            return "scheduling enabled"
        else:
            return "scheduling disabled, reason={}".format(p["reason"])


label="bulksubmit"

def log(s):
    print(label + ": " + s)

def progress(fraction, length=72, suffix=""):
    fill = int(round(length * fraction))
    bar = '\u2588' * fill + '-' * (length - fill)
    s = '\r|{0}| {1:.1f}% {2}'.format(bar, 100*fraction, suffix)
    sys.stdout.write(s)
    if fraction == 1.:
        sys.stdout.write('\n')

def submit_cb(f, arg):
    jobs.append (job.submit_get_id (f))


log("Starting...")
q = QueueAdmin(h)
q.stop()
log(q.status())

for file in sys.argv[1:]:
    with open(file) as jobspec:
        job.submit_async(h, jobspec.read(), waitable=True).then(submit_cb)

if h.reactor_run(h.get_reactor(), 0) < 0:
    h.fatal_error("reactor start failed")

# print(jobs)

total = len(jobs)
dt = time.time() - t0
jps = len(jobs)/dt
log("submitted {0} jobs in {1:.2f}s. {2:.2f}job/s".format(total, dt, jps))

q.start()
t0 = time.time()
log(q.status())

count = 0
while count < total:
    jobid, result, s = job.wait(h)
    if not result:
        print("{}: {}".format(jobid, s))
    count = count + 1
    if count == 1:
        log("First job finished in about {0:.3f}s".format(time.time() - t0))
    suffix = "({0:.1f} job/s)".format(count/(time.time() - t0))
    progress(count/total, length=58, suffix=suffix)

dt = time.time() - t0
log("Ran {0} jobs in {1:.1f}s. {2:.1f} job/s".format(total, dt, total/dt))

# vi: ts=4 sw=4 expandtab


chu11 commented Feb 29, 2020

So running the benchmark above, there's a net win from renaming the events: it removes calls to flux_event_subscribe(), at the cost of a few extra events that get received and dropped.

Before (code of PR #2777)

bulksubmit: Starting...                                                                                                                                                                                            
bulksubmit: scheduling disabled, reason=Testing                                                                                                                                                                    
bulksubmit: submitted 1024 jobs in 1.73s. 590.63job/s                                                                                                                                                              
bulksubmit: scheduling enabled                                                                                                                                                                                     
bulksubmit: First job finished in about 1.043s                                                                                                                                                                     
bulksubmit: Ran 1024 jobs in 15.1s. 67.7 job/s 

After

bulksubmit: Starting...                                                                                                                                                                                            
bulksubmit: scheduling disabled, reason=Testing                                                                                                                                                                    
bulksubmit: submitted 1024 jobs in 2.15s. 475.93job/s                                                                                                                                                              
bulksubmit: scheduling enabled                                                                                                                                                                                     
bulksubmit: First job finished in about 0.631s                                                                                                                                                                     
bulksubmit: Ran 1024 jobs in 15.0s. 68.1 job/s    

this was the best run for "Before" and the worst run for "After"; on average the improvement was a tad better than this (68.4 vs 67.1 job/s).

Thinking through the 4 events, only two will ever get dropped:

kvs.error will be dropped by kvs-watch, but under the assumption that errors are rare, this shouldn't be a big deal.

kvs.namespace-create will be dropped by the kvs module every time a namespace is created.

So really we're trading dropped kvs.namespace-create events for fewer event subscriptions, which appears to be a slight win overall for this benchmark.
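
As a sketch of how that trade-off looks in a module (not the real kvs/kvs-watch code: the topic globs and callbacks are illustrative, while flux_event_subscribe(), flux_msg_handler_addvec(), and flux_reactor_run() are the real APIs): the module subscribes once to the common prefix, registers handlers only for the events it actually wants, and events that match the subscription but no handler (e.g. the create event arriving at the kvs module) are simply dropped.

#include <flux/core.h>

static void setroot_cb (flux_t *h, flux_msg_handler_t *mh,
                        const flux_msg_t *msg, void *arg)
{
    /* handle setroot ... */
}

static void removed_cb (flux_t *h, flux_msg_handler_t *mh,
                        const flux_msg_t *msg, void *arg)
{
    /* handle namespace removal ... */
}

/* Handlers only for the events this module cares about (others omitted);
 * anything else arriving via the broad subscription is dropped unhandled. */
static const struct flux_msg_handler_spec htab[] = {
    { FLUX_MSGTYPE_EVENT, "kvs.namespace-*-setroot", setroot_cb, 0 },
    { FLUX_MSGTYPE_EVENT, "kvs.namespace-removed-*", removed_cb, 0 },
    FLUX_MSGHANDLER_TABLE_END,
};

int mod_main (flux_t *h, int argc, char **argv)
{
    flux_msg_handler_t **handlers = NULL;

    /* One subscribe covers every topic in htab[] via prefix match. */
    if (flux_event_subscribe (h, "kvs.namespace") < 0)
        return -1;
    if (flux_msg_handler_addvec (h, htab, NULL, &handlers) < 0)
        return -1;
    if (flux_reactor_run (flux_get_reactor (h), 0) < 0)
        return -1;
    flux_msg_handler_delvec (handlers);
    return 0;
}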


chu11 commented Mar 3, 2020

After discussion at the meeting, we decided this should go in.

My primary concern is that, unlike other performance fixes, there is a clear "speed one thing up, slow another thing down" tradeoff here, and just because performance is a win in this case doesn't mean it'll be a win later. And we don't have performance tests in Travis.

But given the low probability of changes in the future, this should go in. I'll of course add many comments :P

chu11 added a commit to chu11/flux-core that referenced this issue Mar 3, 2020
In the kvs and kvs-watch modules, collapse multiple calls to
flux_event_subscribe() or flux_event_unsubscribe() into a single
call using the same event substrings.

Fixes flux-framework#2779
chu11 added a commit to chu11/flux-core that referenced this issue Mar 4, 2020
In the kvs and kvs-watch modules, collapse multiple calls to
flux_event_subscribe() or flux_event_unsubscribe() into a single
call using the same event substrings.

Fixes flux-framework#2779
@mergify mergify bot closed this as completed in 225518f Mar 5, 2020
chu11 added a commit to chu11/flux-core that referenced this issue Mar 7, 2020
In the kvs and kvs-watch modules, collapse multiple calls to
flux_event_subscribe() or flux_event_unsubscribe() into a single
call using the same event substrings.

Fixes flux-framework#2779