Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: verify that zeromq/czmq built with security, epgm #7

Closed
garlick opened this issue Sep 28, 2014 · 15 comments
Closed

build: verify that zeromq/czmq built with security, epgm #7

garlick opened this issue Sep 28, 2014 · 15 comments

Comments

@garlick
Copy link
Member

garlick commented Sep 28, 2014

from: Mark Grondona [email protected]
to: Jim Garlick [email protected]
date: Sun, Sep 28, 2014 at 11:12 AM

On my laptop running Ubuntu 14.04.1 with

libzmq3-dev:amd64 4.0.4+dfsg-2
libmunge-dev 0.5.11-1ubuntu1

And czmq built from git and installed in /usr/local.. I'm getting
zeroed keys for curve, which seems to be causing an assert
when starting cmbd. I assume I'm missing some dependency
or didn't build something right, but I'm not sure where to start
looking...

$ ./flux keygen
Saving /home/grondo/.flux/curve/client
Saving /home/grondo/.flux/curve/client_private
Saving /home/grondo/.flux/curve/server
Saving /home/grondo/.flux/curve/server_private

grondo@moron:~/git/flux-core/src/cmd$ cat /home/grondo/.flux/curve/server
#   ****  Generated on 2014-09-28 11:09:06 by CZMQ  ****
#   ZeroMQ CURVE Public Certificate
#   Exchange securely, or use a secure mechanism to verify the contents
#   of this file after exchange. Store public certificates in your home
#   directory, in the .curve subdirectory.

metadata
    time = "2014-09-28T11:09:06"
    role = "server"
curve
    public-key = "0000000000000000000000000000000000000000"

$ ./flux start --size=1
cmbd: zcertstore.c:173: zcertstore_insert: Assertion `rc == 0' failed.
flux-start: 0 (pid 12491) Aborted

from: Jim Garlick [email protected]
to: Mark Grondona [email protected]
date: Sun, Sep 28, 2014 at 12:27 PM

The security stuff came in zeromq4 so you'll need to build that too.
I think the steps are:
-build and install libsodium to /usr/local (me: 0.5.0)
-build and install libzmq (me: 4.0.4), I think you need --with-pgm
(but it is now included in the source tree)
-build and install czmq (me: 2.2.0)

Sounds like we might want to build in checks for functioning pgm and
curve to our configure.ac.
(Maybe open an issue on that?)

from: Mark Grondona [email protected]
to: Jim Garlick [email protected]
date: Sun, Sep 28, 2014 at 12:44 PM

The libzmq3 in Ubuntu is confusingly zeromq 4.0.4 (see version of package). Maybe not built with libsodium? Would runtime checks be possible so that keygen doesn't silently create zero keys?

@grondo
Copy link
Contributor

grondo commented Sep 29, 2014

My guess is we might be hitting something like zeromq/czmq #325

which was fixed in this commit (At least zcert_new() would return NULL)

zeromq/czmq@6009411

However, that behavior has since changed, which is why I'm still getting zeroed keys from
czmq git master.

As a workaround (though I admit an unsatisfying one), we could check for a cert with zero keys
as I did here:

grondo@76601b5

Or, as I just noticed is used in CZMQ, judicious use of zsys_has_curve() might be enough
to detect at runtime if we HAVE_ZAUTH but don't have Curve support.

(Sorry, a bit new at this code and missed the obvious the first time)

@garlick
Copy link
Member Author

garlick commented Sep 29, 2014

Possibly we could call this and look for a return of -1 and errno == ENOTSUPP?

http://api.zeromq.org/4-0:zmq-curve-keypair

Does that make sense as an autoconf test?

@grondo
Copy link
Contributor

grondo commented Sep 29, 2014

The autoconf test would be a definite improvement.

Since you're building with CZMQ anyway, zsys_has_curve() might be easier to use,
but I'm not sure if it is reliable yet against libzmq without zmq_has().

There is still a chance of generating zero keys silently at runtime though, since
one could configure against working libzmq and run against non-working libzmq.
Therefore, best case would be to detect this at runtime. Best best case would be
if zcert_new() failed instead of creating zero keys (which cause an assert at runtime
in other parts of the code)

@garlick
Copy link
Member Author

garlick commented Sep 29, 2014

Good point, agreed we need a runtime check.

I think zsys_has_curve() is pretty new (newer than czmq 2.2.0).
Maybe AC_CHECK_FUNCS for that and test it at runtime if available?
And if not fall back to your null cred check code?

@grondo
Copy link
Contributor

grondo commented Sep 29, 2014

I have a version that uses zsys_has_curve() if available and falls back to
the null cred check if not, but now it occurs to me -- why don't we just use
zmq_curve_keypair() and zcert_new_from() in gencurve() instead of
zcert_new()? This would be much more straightforward, unless you had a
reason to avoid zmq_curve_keypair()?

@garlick
Copy link
Member Author

garlick commented Sep 29, 2014

No, I don't think I did. Most likely I was looking at the czmq manual nd it just worked so I never got to the underlying stuff. Sorry I didn't think of that!

@grondo
Copy link
Contributor

grondo commented Sep 29, 2014

Ok, I tested this out in grondo@8970b23

I'll try it on my laptop tonight and generate a pull request if it works. Please comment if you have the chance.

@garlick
Copy link
Member Author

garlick commented Sep 30, 2014

I commented on the commit but I'll say it here too: looks good and thank you!

grondo referenced this issue in grondo/flux-core Sep 30, 2014
Use AC_RUN_IFELSE with zmq_curve_keypair() to detect at configure
time if the libzmq we're building against is compiled with
security (libsodium).

Fixes #7
@grondo
Copy link
Contributor

grondo commented Sep 30, 2014

Build time check might be overkill? But I propose one above....

@garlick
Copy link
Member Author

garlick commented Sep 30, 2014

Yes let's do both! Nice.

@grondo grondo closed this as completed in 8970b23 Sep 30, 2014
@garlick
Copy link
Member Author

garlick commented Sep 30, 2014

I tried regenerating my keys (flux keygen -f) after applying these changes and although they superficially look OK, zmq connections can no longer be made. No problem if keys were generated with the old code. Will have a look tomorrow. This is on my ubuntu test box.

@garlick garlick reopened this Sep 30, 2014
@garlick
Copy link
Member Author

garlick commented Sep 30, 2014

The code for zcert_new() calls crypto_box_keypair () (I guess a libsodium call) and passes the result to zcert_new_from(). Maybe zcert_new_from() wants keys unarmored?

@grondo
Copy link
Contributor

grondo commented Sep 30, 2014

Oh, sorry! I did try to test my branch, but obviously failed miserably.

I think you are right about z85 armor. Now it seems obvious why zcert_new_from() takes byte *..
:-(

@grondo
Copy link
Contributor

grondo commented Sep 30, 2014

Test with branch grondo/flux-core:zcert_curve_new-fixes to make sure I've resolved the issue with curve keys. I tested with

$ cat t.sh
#!/bin/sh
sleep 1
./flux up

$ ./flux keygen --force && ./flux start --size=2 sh t.sh 2>/dev/null | grep -q '^ok:.*\[0-1\]' && echo "Success" || echo "Failed."

Saving /g/g0/grondo/.flux/curve/client
Saving /g/g0/grondo/.flux/curve/client_private
Saving /g/g0/grondo/.flux/curve/server
Saving /g/g0/grondo/.flux/curve/server_private
Success

@grondo
Copy link
Contributor

grondo commented Oct 7, 2014

Closing this one as the work has been done and merged I think.

@grondo grondo closed this as completed Oct 7, 2014
garlick added a commit to garlick/flux-core that referenced this issue Sep 15, 2020
Problem: unloading resource module with events posted to eventlog
in flight can resut in segfault.

Program terminated with signal SIGSEGV, Segmentation fault.

 #0  __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:102
 102     ../sysdeps/x86_64/multiarch/strcmp-avx2.S: No such file or directory.
 [Current thread is 1 (Thread 0x7fe74b7fe700 (LWP 3495430))]
 (gdb) bt
 #0  __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:102
 #1  0x00007fe764f40de0 in aux_item_find (key=<optimized out>,
     head=0x7fe73c006180) at aux.c:88
 #2  aux_get (head=<optimized out>, key=0x7fe764f5b000 "flux::log") at aux.c:119
 #3  0x00007fe764f1f0d4 in getctx (h=h@entry=0x7fe73c00c6d0) at flog.c:72
 #4  0x00007fe764f1f3a5 in flux_vlog (h=0x7fe73c00c6d0, level=7,
     fmt=0x7fe7606318fc "%s: %s event posted", ap=ap@entry=0x7fe74b7fd790)
     at flog.c:146
 #5  0x00007fe764f1f333 in flux_log (h=<optimized out>, lev=lev@entry=7,
    fmt=fmt@entry=0x7fe7606318fc "%s: %s event posted") at flog.c:195
 flux-framework#6  0x00007fe76061166a in reslog_cb (reslog=<optimized out>,
     name=0x7fe73c016380 "online", arg=0x7fe73c013000) at acquire.c:319
 flux-framework#7  0x00007fe760610deb in notify_callback (event=<optimized out>,
     reslog=0x7fe73c005b90) at reslog.c:47
 flux-framework#8  post_handler (reslog=reslog@entry=0x7fe73c005b90, f=0x7fe73c00a510)
     at reslog.c:91
 flux-framework#9  0x00007fe760611250 in reslog_destroy (reslog=0x7fe73c005b90)
     at reslog.c:182
 flux-framework#10 0x00007fe76060e6b8 in resource_ctx_destroy (ctx=ctx@entry=0x7fe73c016640)
     at resource.c:129
 flux-framework#11 0x00007fe76060ef18 in resource_ctx_destroy (ctx=0x7fe73c016640)
     at resource.c:331

It looks like the acquire subsystem got a callback for a rank coming online
after its context was freed.  Set the reslog callback to NULL before
destroying the acquire context.

Also, set the monitor callback to NULL before destroying the discover
context, as it appears this destructor has a similar safety issue.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants