add support for data transfer encryption via rc4 and aes #236
Hmm @colinmarc, I haven't been able to reproduce the failure seen on the Travis run and have no idea what is causing it. Any ideas as to what I can do to track it down? :)
Wooh nice! This looks pretty good - I think shifting some abstractions may be in order, though. And we'll need to figure out some testing. Can you add a case to travis.yml for data transfer encryption? The Append error is unrelated.
I'll see about adding a case to travis.yml for testing data transfer encryption; let me know what shifting around of the abstractions you're thinking of. There were a couple of aspects here that I wasn't completely sure about design-wise, but I wanted to keep things as unobtrusive as possible. My testing was also only against Hadoop 2.x, and all the Travis test cases failing with the Append error and timing out are for CDH6, which is Hadoop 3.x. Is there any way we can resolve the Append and timeout issue so that I can ensure the problem isn't some incompatibility between my change and Hadoop 3.x?
Go ahead and add
W00t! Figured out the cause of the failure and put a fix in. :) I'll play around with getting a good test case added for the encrypted data case tomorrow.
@colinmarc unit test added for the digestmd5, and travis updated with test cases for the data transfer encryption, which are now passing. They even found a couple of bugs in my implementation that I've since fixed. I see a couple of avenues for cleaning up some of the abstractions and interfaces, so I'll take a look at that. In the meantime, if you have any suggestions, let me know.
Hey @colinmarc, anything else you need from me here to get this merged?
Hey @zeroshade, Thanks for the ping. I’ll give this a review in the next few days (if I don’t, please harass me). Excited to have this feature! Colin
@colinmarc harass harass harass harass harass :) consider this me harassing you :)
Ok, looking through - I think the raw material is really good here but I think it would be better if the functionality was broken up a bit into composable pieces. Here's how I would organize this:
Some other general comments:
Thanks so much for your work on this! Based on the number of people that have been asking for it, it's sorely needed.
One more performance suggestion: note that
The reason i created the
If you look at transport.go:127 I don't wrap it in the default case, if you're not using
That's the situation where the client config specifies encryption but the server config doesn't match and the server doesn't respond with encryption (a common implementation, which I pulled from libhdfs3). From a consumer standpoint, I'd prefer for it to work rather than die when the client config supports encryption but the server has it disabled.
I see what you mean. Let's encapsulate that in a
BlockReader{
    DialFunc: (&datanodeSaslDialer{
        token: block.GetBlockToken(),
        dialer: f.client.options.DatanodeDialFunc,
        ...
    }).DialContext
    ...
}
I don't think it's abusing that interface to do negotiation as part of the dial; that seems to be a pattern elsewhere. As a side note, I've been writing rust for a bit and it just about broke my brain to remember that you can put the receiver in a closure like that: https://play.golang.org/p/OXQBmmsdLEG
I really think this is a bad idea, regardless of what libhdfs does. If we're talking to a malicious interloper, they can just respond in plaintext and we'll send them the nuclear access codes anyway. On the other hand, the hadoop docs for
Am I missing something, or is that really poorly designed? I'd say for now we require clients to be explicit, and fail if things don't match up. If someone shows up to complain that we're missing a feature for "deducing" server encryption settings, then maybe they can explain to me why it's fine.
@colinmarc From a consumer standpoint it can make sense: you don't need to worry about configuring the client; instead the client asks the server whether or not it wants to use encryption and then does what the server requests. At the same time, I can agree with where you're coming from. It makes sense for a client to be able to say "I want to use an encrypted connection, and I do not want to allow my data across an unencrypted one". I'm fine with requiring clients to be explicit for now and failing if the settings don't match up. I'll make that change.
So quick question, when you say:
Do you mean reusing the input buffer for the output? I'm not sure how I feel about overwriting the input buffer with the modified data; I think I prefer doing the copy. Or was there a different kind of reuse you were referring to?
I meant that we shouldn't be doing any allocation inside
b := make([]byte, n)
conn.Read(b)
It's much better to have a
d.buf.Reset()
io.Copy(d.buf, conn) // This will use ReadFrom under the hood
b := d.buf.Bytes()
Ah, gotcha. That makes sense.
Hmm, @colinmarc any reason you can think of to not just do the check and wrap of the dialer inside of
Sounds elegant to me!
If you rebase, you can remove the
@colinmarc I still need to go through and put more comments in and such, but I'd like your opinion on the refactor / redesign based on our conversations. Let me know what you think. Thanks! I also added a new unit test for the encrypted case to ensure it doesn't get broken without having to run the Travis tests to find out :)
Hmmm don't know why that
Looked at master; that test is currently skipped, so the failure is not my fault :) I just got overzealous when you said I could rebase and remove the skips haha. I've put the skip back and hopefully everything should succeed now.
@colinmarc Finally! :) Passing all tests, and redesigned based on your comments. Consider this me harassing you to take a look and let me know what you think now :)
Ok, did a first pass through. This is looking really good.
EncryptedDataNode bool
// SecureDataNode specifies whether we're using block access tokens to
// communicate with datanodes, specified by dfs.block.access.token.enable in the config
SecureDataNode bool
I thought this was something else, actually? We've always sent the block token, and I think this is just a server-side option for verifying them (with or without kerberos). In any case it should be EnableBlockAccessToken
Turns out you're right, this should be changed to reference dfs.data.transfer.protection, not dfs.block.access.token.enable.
Let me know if the new comments / updated name/comment is good.
options.EncryptedDataNode = true
options.SecureDataNode = true
}
The comment for this method needs to be updated (to say that we munge these fields)
comment updated. let me know if this is good.
origHashStart := msgTypeStart - macHMACLen

if !bytes.Equal(hmac, input[origHashStart:origHashStart+macHMACLen]) ||
!bytes.Equal(macMsgType[:], input[msgTypeStart:msgTypeStart+macMsgTypeLen]) ||
Is the [:] necessary?
bytes.Equal needs a slice, but we statically initialized macMsgType as a [2]byte, so you need the [:] for it to be passed to bytes.Equal.
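A small standalone illustration of the array-versus-slice point. The `hasMsgType` helper and the byte values are illustrative and not copied from the PR.

```go
package main

import (
	"bytes"
	"fmt"
)

// macMsgType mirrors the [2]byte value discussed above; the contents here
// are illustrative.
var macMsgType = [2]byte{0x00, 0x01}

// hasMsgType is a hypothetical helper. bytes.Equal takes two []byte values,
// so the array has to be sliced with [:] before it can be passed.
func hasMsgType(b []byte) bool {
	// bytes.Equal(macMsgType, b) would not compile: [2]byte is not []byte.
	return bytes.Equal(macMsgType[:], b)
}

func main() {
	input := []byte{0xAA, 0x00, 0x01}
	fmt.Println(hasMsgType(input[1:3]))

	// Arrays are comparable with ==, but only against another array of the
	// same type, which would mean copying the input bytes out first.
	fmt.Println(macMsgType == [2]byte{input[1], input[2]})
}
```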
@@ -223,6 +223,24 @@ func (c *NamenodeConnection) Execute(method string, req proto.Message, resp prot
return nil
}

// GetEncryptionKeys will use the `getDataEncryptionKey` operation on the
// namenode in order to fetch the current data encryption keys
func (c *NamenodeConnection) GetEncryptionKeys() *EncryptionKey {
I'm not sure about exporting this. Does it have utility to users?
Right now, I kept it exported because it's Client which is using it to pass to the DatanodeSaslDialer as the KeyFunc, so that the dialer doesn't need to have a reference to the namenode connection directly, just a function which gives it the key. Since it's being used in the client.go file, it needs to be exported.
hm, maybe just pass the two options into NamenodeConnection and then put the wrapDatanodeDialer func there?
but then I'd have to export wrapDatanodeDialer since NamenodeConnection is in internal/rpc and this needs to be called from file_reader / file_writer. Either way something would need to get exposed since the block reader / writer don't have a reference to the namenode, which they shouldn't need anyways.
I figured it made more sense to export this utility function which is just a function call on the Namenode than it would have been to export a wrapping function for the datanode dialer.
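The decoupling described in this thread, where the dialer holds only a key-fetching function rather than the namenode connection, can be sketched as follows. All type and method names below are simplified stand-ins, not the PR's real definitions.

```go
package main

import "fmt"

// encryptionKey is a trimmed-down, hypothetical stand-in for the key the
// namenode returns.
type encryptionKey struct {
	Nonce []byte
}

// keyFunc is all the dialer needs: a way to obtain the current key.
type keyFunc func() *encryptionKey

// namenode is a fake namenode connection that counts how often it is asked.
type namenode struct {
	calls int
}

// GetEncryptionKeys mimics the exported method discussed above.
func (n *namenode) GetEncryptionKeys() *encryptionKey {
	n.calls++
	return &encryptionKey{Nonce: []byte{1, 2, 3}}
}

// saslDialer holds only the function, so it never sees the namenode type.
type saslDialer struct {
	Key keyFunc
}

// encrypted reports whether the fetched key carries a nonce.
func (d *saslDialer) encrypted() bool {
	k := d.Key()
	return k != nil && len(k.Nonce) > 0
}

func main() {
	nn := &namenode{}
	// The method value nn.GetEncryptionKeys satisfies keyFunc directly.
	d := &saslDialer{Key: nn.GetEncryptionKeys}
	fmt.Println(d.encrypted(), nn.calls)
}
```

With this shape, a test can hand the dialer any function returning a key, without standing up a namenode at all.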
@colinmarc holy crap, all the tests still pass and I've addressed the feedback. W00t w00t :) Just waiting for a few more comments from you on some of the things to let me know if they are good.
@colinmarc well, I'm stumped. Something about how I've changed things is changing the way the allocations happen for the size of that []byte slice, and now it's not exhibiting that issue anymore -_- oy vey. I'm so confused, but at least it all works now haha
Can you rebase in 15f6da0 (and squash) to make sure those tests pass too?
// if the server didn't send us a Nonce, then the data isn't encrypted
// but we will still attempt to authenticate
if useSecure && len(key.Nonce) == 0 {
In which case does the server not send a nonce?
If the client has the data transfer protection set to privacy, but the server is configured for integrity or authentication rather than privacy, then there is no key to send and so the nonce will be empty. In that case we fall back to the server's qop.
Just found this in the RFC:
Sounds like we can just use the server's QOP, and hope we're not mitm'd. I'm munging some comments now, so I'll make the change.
d = newDigestMD5IntegrityConn(conn, kic[:], kis[:])
}

if len(msg.GetCipherOption()) > 0 {
Shouldn't this always be true if the qop is privacy? Put another way, is digestMD5PrivacyConn ever actually used (beyond the call to decode below)?
If for some reason the server doesn't support AES, then this would be empty and it would only use the RC4 that was negotiated during the digest authentication.
@colinmarc Rebased and squashed.
According to Travis this completed successfully; I don't know why it is still listed as in progress here :(
When I extend the matrix to test different Edit: actually, my fault. I thought the configuration could be entirely determined at connection time, but looks like that's not the case.
Merged :) thanks so much for your work on this!!!
Just a note: I've been trying to track down a flaky test failure in privacy mode, in case you have any insight: https://travis-ci.org/github/colinmarc/hdfs/jobs/709475318 This is the error from the datanode logs:
I haven't started looking through the traceback, but I will tomorrow.
@colinmarc yeah, you can't use the
Aha! I missed that detail when futzing with your code. Sorry about that.
@colinmarc any chance of seeing a release with this? I'm having to use mod replace to point to the master branch right now since there's no release with the changes yet.
addresses #145
I've only implemented rc4 encryption here as I haven't figured out 3des / des yet, but this at least solves the use case in my own environment, which is nice.
As references for the implementation I used:
I was able to test this with my own setup using encrypted data transfer and it works! Huzzah!