Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libroach: encrypt data at rest #21580

Merged
merged 1 commit into from
May 2, 2018
Merged

Conversation

mberhault
Copy link
Contributor

@mberhault mberhault commented Jan 19, 2018

Part of encryption-at-rest (see #19783)

Quick overview:

  • file registry records encryption settings for encrypted files
  • an encrypted env has a "env level" (plain, store, data) and a key
    manager as a source
  • the data key manager uses an encrypted env with the store key manager as
    key source
  • rocksdb uses an encrypted env with the data key manager as key source

Release note: none

@mberhault mberhault requested review from a team January 19, 2018 15:15
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@mberhault mberhault requested a review from bdarnell January 19, 2018 15:16
@mberhault mberhault mentioned this pull request Jan 19, 2018
29 tasks
@mberhault mberhault force-pushed the marc/env_encrypt branch 5 times, most recently from 20f0058 to caeb344 Compare January 22, 2018 09:19
@knz
Copy link
Contributor

knz commented Jan 22, 2018

nit: place the Release note line at the end of the commit message(s). Here it doesn't matter but it's good to form a habit.

(The release not extraction assumes the release note can contain multiple paragraphs so the entire text of your PR description goes into it. Except when it's "none" which is why it's just a nit here.)

@bdarnell
Copy link
Contributor

I haven't done a complete first pass, but I have some questions about the EnvContext/SwitchingProvider design so here are my initial comments.


Reviewed 3 of 30 files at r4, 1 of 33 files at r7, 1 of 4 files at r8, 13 of 30 files at r10, 1 of 2 files at r11, 2 of 5 files at r12, 3 of 6 files at r13.
Review status: 20 of 32 files reviewed at latest revision, 10 unresolved discussions, all commit checks successful.


c-deps/libroach/switching_provider.cc, line 24 at r6 (raw file):

                                                               CipherStreamCreator* creator) {
  auto res =
      creators_.insert(std::make_pair(env_level, std::unique_ptr<CipherStreamCreator>(creator)));

You can use std::make_unique(creator) to avoid repeating the type here. (although it would be better if the function parameter was a unique_ptr so you can tell from the function signature that it takes ownership).


c-deps/libroach/switching_provider.h, line 27 at r4 (raw file):

// deletion.
//
// Some useful envs are keps:

keps?


c-deps/libroach/switching_provider.h, line 28 at r4 (raw file):

//
// Some useful envs are keps:
// base_env: the env that all other envs must wrap. Usually a Env::Default or MemEnv.

When would you want to access base_env other than when creating db_env?


c-deps/libroach/switching_provider.h, line 30 at r4 (raw file):

// base_env: the env that all other envs must wrap. Usually a Env::Default or MemEnv.
// db_env: the env used by rocksdb. This can be an EncryptedEnv.
struct EnvContext {

s/EnvContext/EnvManager/? That's not a great name either, but "Env" and "Context" are nearly synonymous. (also, when I saw EnvContext in another header, i wasn't sure which file would have its declaration, so having some word in common between EnvContext and SwitchingProvider would be helpful).


c-deps/libroach/switching_provider.h, line 67 at r13 (raw file):

// and whether the file is new.
//
// The 0 env level (for plaintext) is always present.

"Env levels" need some documentation. It looks like this should be an enum instead of an int.


c-deps/libroach/switching_provider.h, line 69 at r13 (raw file):

// The 0 env level (for plaintext) is always present.
//
// The base_env should be a plain rocksdb::Env, either Env::Default(), or

I don't see a base_env here.


c-deps/libroach/rocksdbutils/aligned_buffer.h, line 4 at r1 (raw file):

//  This source code is licensed under both the GPLv2 (found in the
//  COPYING file in the root directory) and Apache 2.0 License
//  (found in the LICENSE.Apache file in the root directory).

The apache and bsd references here should be rewritten to refer to the copies in our licenses directory. For GPL, I'd probably just link out to a canonical web copy.


c-deps/libroach/rocksdbutils/env_encryption.h, line 76 at r10 (raw file):

  virtual ~EncryptionProvider() {}

  virtual rocksdb::Status CreateCipherStream(int env_level, const std::string& fname, bool new_file,

Why did you introduce this env_level parameter that gets plumbed through EncryptedEnv to CreateCipherStream, instead of giving each EncryptedEnv a separate EncryptionProvider that knows its "level"?


pkg/ccl/cmdccl/enc_utils/main.go, line 9 at r5 (raw file):

//     https://github.com/cockroachdb/cockroach/blob/master/licenses/CCL.txt

package main

Should this be linked into the main binary as a debug command instead of a separate executable? Or is it too much of a proof-of-concept to even be useful as a debug command? (Turning this into a unit test might be nice just to make sure that the encryption is happening as expected)


pkg/ccl/cmdccl/enc_utils/main.go, line 28 at r5 (raw file):

)

const fileRegistryPath = "COCKROACHDB_REGISTRY"

I think these should just reference constants from engineccl.


Comments from Reviewable

@mberhault
Copy link
Contributor Author

Review status: 12 of 37 files reviewed at latest revision, 10 unresolved discussions.


c-deps/libroach/switching_provider.cc, line 24 at r6 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

You can use std::make_unique(creator) to avoid repeating the type here. (although it would be better if the function parameter was a unique_ptr so you can tell from the function signature that it takes ownership).

I tried that but it couldn't fine std::make_unique (it's supposed to be in <memory>, so I gave up pretty quickly. I also switched a few of the constructors and utilities to just take raw pointers as the unique_ptr was getting a bit much. I'm happy to require unique_ptr everywhere we take ownership, but I'm also happy to mention it in the comments (more fragile, obviously).


c-deps/libroach/switching_provider.h, line 27 at r4 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

keps?

Done. "kept".


c-deps/libroach/switching_provider.h, line 28 at r4 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

When would you want to access base_env other than when creating db_env?

The store-level Env (the one using the store key manager) uses the base_env. Even though the pointer for db_env is the right one at the time, I prefer to clearly refer to base_env as that never changes.


c-deps/libroach/switching_provider.h, line 30 at r4 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

s/EnvContext/EnvManager/? That's not a great name either, but "Env" and "Context" are nearly synonymous. (also, when I saw EnvContext in another header, i wasn't sure which file would have its declaration, so having some word in common between EnvContext and SwitchingProvider would be helpful).

Sorry. Renaming to EnvManager and giving it its own file to make it easier to find.


c-deps/libroach/switching_provider.h, line 67 at r13 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

"Env levels" need some documentation. It looks like this should be an enum instead of an int.

Switched to the enginepb::EnvLevel. Not sure why I didn't to begin with. Clarified env levels.


c-deps/libroach/switching_provider.h, line 69 at r13 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I don't see a base_env here.

moved to the mention of level 0 env.


c-deps/libroach/rocksdbutils/aligned_buffer.h, line 4 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

The apache and bsd references here should be rewritten to refer to the copies in our licenses directory. For GPL, I'd probably just link out to a canonical web copy.

I've added them to the same directory. This feels more contained than putting everything in ours, and cleaner than doing a mix. This also has the advantage of having everything clearly in one place (the original intent of this subdirectory).


c-deps/libroach/rocksdbutils/env_encryption.h, line 76 at r10 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Why did you introduce this env_level parameter that gets plumbed through EncryptedEnv to CreateCipherStream, instead of giving each EncryptedEnv a separate EncryptionProvider that knows its "level"?

The goal of the env_level is to allow for multiple cipher stream types and ensure that we can tell that all the required ones are present (OSS vs CCL plain vs CCL with encryption options).

I'll update this PR to have the CipherStreamCreator know its own level (simpler registration and EncryptedEnv creation), but I'd like to keep the registration for detection of bad setups (eg: no encryption options when encrypted files are present). This would remove the creator map and odd-looking lookup but would keep the safety of matching registered stream creators vs file registry entries, as well as detecting mismatched accesses (eg: level 1 env opening a level 2 file).

The SwitchingProvider itself is still useful as it provides the coordination with the file registry (and the added bonus of owning all creators).

Ultimately, I also want to move plaintext (when using the switching env) to use a level 0 stream creator. This still bypasses block operations so should be fine to replace the Env::Default.


pkg/ccl/cmdccl/enc_utils/main.go, line 9 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Should this be linked into the main binary as a debug command instead of a separate executable? Or is it too much of a proof-of-concept to even be useful as a debug command? (Turning this into a unit test might be nice just to make sure that the encryption is happening as expected)

This is mostly proof-of-concept for now, hence the completely separate command. CLI interaction with rocksdb files should be properly-wrapped debug commands and perhaps some high-level encryption-related command in C++ (eg: show encryption settings and usage). I expect this code to disappear in favor of a utility used in Go-side testing.


pkg/ccl/cmdccl/enc_utils/main.go, line 28 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I think these should just reference constants from engineccl.

Do you mean one set of constants on the Go side, and the same set of constants on the C++ side? Co-located with the proto could be ok, but either way we'll have two things to keep in sync.


Comments from Reviewable

@mberhault
Copy link
Contributor Author

I've addressed most review comments. I'll change the env_level use as described in a comment below. Should be in tomorrow morning.


Review status: 12 of 37 files reviewed at latest revision, 10 unresolved discussions.


Comments from Reviewable

@bdarnell
Copy link
Contributor

Reviewed 4 of 30 files at r10, 1 of 5 files at r12, 7 of 12 files at r14, 7 of 7 files at r15.
Review status: 31 of 37 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


c-deps/libroach/db.cc, line 1486 at r15 (raw file):

  if (env_ctx->switching_provider != nullptr) {
    // We are using the switching provider. Make sure the OpenHook sets all the needed env levels.
    auto status = env_ctx->switching_provider->CheckEnvLevels();

Why isn't this done in the CCL DBOpenHook? Then I don't think we'd need to expose switching_provider on the OSS side at all and can move it all over to CCL. (I guess we'd need a check something like if DBOpenHook == nil { assert file_registry.UsedLevels() == {0} })

(If this goes, I think we can get EnvManager over to the CCL side too, but I'll save that for the next round. I'm pushing for this not because I want to maximize how much stuff falls under the CCL but because I think straddling the divide is forcing some awkward design decisions and I don't think we need this many crossing points)


c-deps/libroach/env_manager.h, line 25 at r15 (raw file):

// EnvManager manages all created Envs, as well as the switching provider.
// Rocksdb owns Env::Default (global static). All other envs are owned by EnvManager.

Can we just make a copy of Env::Default when we use it so that we can have everything be owned by the EnvManager instead of a mix of raw and unique pointers?


c-deps/libroach/rocksdbutils/aligned_buffer.h, line 4 at r1 (raw file):

Previously, mberhault (marc) wrote…

I've added them to the same directory. This feels more contained than putting everything in ours, and cleaner than doing a mix. This also has the advantage of having everything clearly in one place (the original intent of this subdirectory).

We want to include all the licenses for anything in the repo in the top-level licenses directory (but since this is dual-licensed I don't think we need to include the GPL COPYING file and can just link to a version on the web). Also, in the other files (where we've made less trivial changes) we should add our own copyright notice in addition to the facebook one.


pkg/ccl/cmdccl/enc_utils/main.go, line 28 at r5 (raw file):

Previously, mberhault (marc) wrote…

Do you mean one set of constants on the Go side, and the same set of constants on the C++ side? Co-located with the proto could be ok, but either way we'll have two things to keep in sync.

I was thinking just one set of constants period, but I wasn't thinking about cross-language issues. It's not worth plumbing constants through cgo just to avoid this duplication.


pkg/storage/engine/enginepb/file_registry.proto, line 26 at r15 (raw file):

}

// EnvLevel determines which rocksdb::Env is used.

"Level" makes these sound ordered/comparable, but they're really not.


pkg/storage/engine/enginepb/file_registry.proto, line 53 at r15 (raw file):

message FileEntry {
  // Env level identifies which rocksdb::Env is responsible for this file.
  EnvLevel env_level = 1;

Why isn't this inside the encryption_settings protobuf? We don't create FileEntries for plaintext files, so this would just indicate whether it's "store" or "data", which only the CCL side cares about.

Combining this and my previous comment, I suggest moving this to the ccl key_registry.proto and renaming to something like KeyScope or KeyType.


Comments from Reviewable

@mberhault
Copy link
Contributor Author

Review status: 20 of 35 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


c-deps/libroach/db.cc, line 1486 at r15 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Why isn't this done in the CCL DBOpenHook? Then I don't think we'd need to expose switching_provider on the OSS side at all and can move it all over to CCL. (I guess we'd need a check something like if DBOpenHook == nil { assert file_registry.UsedLevels() == {0} })

(If this goes, I think we can get EnvManager over to the CCL side too, but I'll save that for the next round. I'm pushing for this not because I want to maximize how much stuff falls under the CCL but because I think straddling the divide is forcing some awkward design decisions and I don't think we need this many crossing points)

Because we absolutely must check whether encryption was used before we start up, regardless of the current build mode or flags. Otherwise the only indication is garbage files (eg: rocksdb's CURRENT) and a very obscure error message.


c-deps/libroach/env_manager.h, line 25 at r15 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Can we just make a copy of Env::Default when we use it so that we can have everything be owned by the EnvManager instead of a mix of raw and unique pointers?

We could have yet another env wrapper that only implements the things that can't be inherited and use that right away. The use of EnvManager is fairly limited though, it's pretty much just about having an owner.


c-deps/libroach/rocksdbutils/aligned_buffer.h, line 4 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

We want to include all the licenses for anything in the repo in the top-level licenses directory (but since this is dual-licensed I don't think we need to include the GPL COPYING file and can just link to a version on the web). Also, in the other files (where we've made less trivial changes) we should add our own copyright notice in addition to the facebook one.

Done.


pkg/ccl/cmdccl/enc_utils/main.go, line 28 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I was thinking just one set of constants period, but I wasn't thinking about cross-language issues. It's not worth plumbing constants through cgo just to avoid this duplication.

Yeah, I with proto3 had constants, or even the proto2 "default" work-around.


pkg/storage/engine/enginepb/file_registry.proto, line 26 at r15 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

"Level" makes these sound ordered/comparable, but they're really not.

True. Right now, the higher level requires the lower level env, but if we add different types of ciphers we may want to give them their own levels (it's pointless to ask a CTR cipher to decode GCM cipher data). I could call them EnvType.


pkg/storage/engine/enginepb/file_registry.proto, line 53 at r15 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Why isn't this inside the encryption_settings protobuf? We don't create FileEntries for plaintext files, so this would just indicate whether it's "store" or "data", which only the CCL side cares about.

Combining this and my previous comment, I suggest moving this to the ccl key_registry.proto and renaming to something like KeyScope or KeyType.

We don't, but there's no reason we couldn't save some (or all) plaintext files as well. Ultimately, it's to be able to have the OSS side properly handle mismatched levels. This will become even more important when we have other users of the encryption envs. Having just a blob we can't read is a bit unfriendly.


Comments from Reviewable

@bdarnell
Copy link
Contributor

Review status: 17 of 35 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


c-deps/libroach/db.cc, line 1486 at r15 (raw file):

Previously, mberhault (marc) wrote…

Because we absolutely must check whether encryption was used before we start up, regardless of the current build mode or flags. Otherwise the only indication is garbage files (eg: rocksdb's CURRENT) and a very obscure error message.

Right, but can we get that from the FileRegistry (UsedLevels() == {0})?


pkg/storage/engine/enginepb/file_registry.proto, line 26 at r15 (raw file):

Previously, mberhault (marc) wrote…

True. Right now, the higher level requires the lower level env, but if we add different types of ciphers we may want to give them their own levels (it's pointless to ask a CTR cipher to decode GCM cipher data). I could call them EnvType.

EnvType works for me.


pkg/storage/engine/enginepb/file_registry.proto, line 53 at r15 (raw file):

We don't, but there's no reason we couldn't save some (or all) plaintext files as well.

Then we could store a bool for plaintext or not, or just infer it from the non-empty encryption_settings.

This will become even more important when we have other users of the encryption envs.

What other users are you thinking about?


Comments from Reviewable

@mberhault
Copy link
Contributor Author

Review status: 0 of 37 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


c-deps/libroach/db.cc, line 1486 at r15 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Right, but can we get that from the FileRegistry (UsedLevels() == {0})?

We can ask the file registry how many it needs, but if I register the wrong ones (eg: eventually we have AES-GCM, but you're specifying AES-CTR mode), I still need to know which ones are in use.


pkg/storage/engine/enginepb/file_registry.proto, line 26 at r15 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

EnvType works for me.

Done.


pkg/storage/engine/enginepb/file_registry.proto, line 53 at r15 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

We don't, but there's no reason we couldn't save some (or all) plaintext files as well.

Then we could store a bool for plaintext or not, or just infer it from the non-empty encryption_settings.

This will become even more important when we have other users of the encryption envs.

What other users are you thinking about?

True, but this is again about nicer error handling and misconfiguration detection. The other uses I'm thinking of is the sideloading stuff. We may be able to reuse the env, but not always (eg: debug commands).


Comments from Reviewable

@bdarnell
Copy link
Contributor

Review status: 0 of 37 files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.


c-deps/libroach/db.cc, line 1486 at r15 (raw file):

Previously, mberhault (marc) wrote…

We can ask the file registry how many it needs, but if I register the wrong ones (eg: eventually we have AES-GCM, but you're specifying AES-CTR mode), I still need to know which ones are in use.

But that part of the validation can happen on the CCL side in DBOpenHook.


pkg/storage/engine/enginepb/file_registry.proto, line 53 at r15 (raw file):

Previously, mberhault (marc) wrote…

True, but this is again about nicer error handling and misconfiguration detection. The other uses I'm thinking of is the sideloading stuff. We may be able to reuse the env, but not always (eg: debug commands).

I'm not seeing how an int plus a blob on the OSS side is any better than just a blob. Any user of this is going to need to be able to see into the blob.


Comments from Reviewable

@mberhault
Copy link
Contributor Author

Review status: 0 of 37 files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.


c-deps/libroach/db.cc, line 1486 at r15 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

But that part of the validation can happen on the CCL side in DBOpenHook.

Sure, I can check in the CCL hook 1) when I have no encryption args, and 2) when I do, then in OSS always. At some point, it seems simpler to have a single safeguard that all encryption objects are loaded before trying to start rocksdb.


pkg/storage/engine/enginepb/file_registry.proto, line 53 at r15 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I'm not seeing how an int plus a blob on the OSS side is any better than just a blob. Any user of this is going to need to be able to see into the blob.

It keeps all the EnvType checking within the SwitchingProvider, limiting repetition in the actual stream creators. For example, the stream creator doesn't care about the previous file when we're creating a new one, but the SwitchingProvider checks that any existing file is of the same type.


Comments from Reviewable

@bdarnell
Copy link
Contributor

Review status: 0 of 37 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


c-deps/libroach/rocksdbutils/env_encryption.cc, line 322 at r26 (raw file):

class EncryptedEnv : public rocksdb::EnvWrapper {
 public:
  EncryptedEnv(rocksdb::Env* base_env, EncryptionProvider* provider, CipherStreamCreator* creator)

I'm not sure passing a CipherStreamCreator is any better than passing the level. When I suggested getting rid of the level argument here, I was thinking of making the EncryptionProvider self-contained: there wouldn't be a SwitchingProvider any more. Instead, there would be a separate EncryptionProvider for each level. (The validation functionality now performed by SwitchingProvider would go in some "registry" object instead of in an EncryptionProvider)

Basically I'm trying to keep the stuff on the OSS side limited to "registry" type objects and anything dealing with encryption, cipher streams, etc on the CCL SIDE>


Comments from Reviewable

@mberhault mberhault requested a review from a team March 20, 2018 13:29
@mberhault mberhault force-pushed the marc/env_encrypt branch 2 times, most recently from 569c24f to dcea9bb Compare April 3, 2018 13:22
@mberhault mberhault force-pushed the marc/env_encrypt branch 2 times, most recently from 7c2d7ec to 4b2e509 Compare April 23, 2018 13:47
@mberhault
Copy link
Contributor Author

Dropped the switching provider in the last commit, making the encrypted env(s) directly own the cipher stream creators. Since last review batch, some small tweaks to 1) fix path handling (rocksdb something uses double slashes) 2) dump more info in the Go tool (still need to figure out the long term goal for this, most likely move it into debug commands)

@bdarnell
Copy link
Contributor

This is looking pretty good.


Reviewed 8 of 30 files at r30, 1 of 2 files at r31, 1 of 5 files at r32, 3 of 18 files at r36, 1 of 2 files at r38, 4 of 12 files at r39, 2 of 4 files at r40, 1 of 5 files at r43, 1 of 1 files at r44, 16 of 16 files at r45.
Review status: all files reviewed at latest revision, 2 unresolved discussions.


c-deps/libroach/ccl/ctr_stream.cc, line 133 at r45 (raw file):

  // XOR data with ciphertext.
  // TODO(mberhault): this is not an efficient XOR. Instead, we could move

You'd be surprised what compilers can do. GCC and Clang can both turn this into something optimal provided both data and scratch have the __restrict__ annotation (and the compiler is given either -O3 or -ftree-vectorize):

https://godbolt.org/g/nzgwQx


c-deps/libroach/ccl/db.cc, line 42 at r45 (raw file):

  // The Go code sets the "switching_env" storage version if we specified encryption flags,
  // but let's double check anyway.
  if (!db_opts.use_switching_env) {

Should use_switching_env be renamed now?


c-deps/libroach/ccl/db.cc, line 108 at r45 (raw file):

  // TODO(mberhault): enable at some point. We still want to make sure people do not use it.
  // return rocksdb::Status::InvalidArgument("encryption is not supported");

Remove these comments? I'm not too worried about people stumbling across unreleased enterprise features and using them.


c-deps/libroach/rocksdbutils/env_encryption.cc, line 22 at r45 (raw file):

//  and Apache 2.0 License (found in licenses/APL.txt in the root
//  of this repository).

Are there any particular parts of this file I should be looking at? It looks like it's mostly copied from the rocksdb version so I haven't gone through it too closely.


pkg/ccl/cmdccl/enc_utils/README.md, line 6 at r45 (raw file):

Given a key and a cockroach node started with:

$ openssl rand 48 > aes-128.key

Coming back to this after a long time away, the use of 48-byte keys is striking (why can't we just use 16-byte keys again?). We should (in a subsequent PR) offer a cockroach subcommand to generate keys so we don't need to duplicate the key length in so many places.


pkg/storage/engine/enginepb/file_registry.proto, line 53 at r15 (raw file):

Previously, mberhault (marc) wrote…

It keeps all the EnvType checking within the SwitchingProvider, limiting repetition in the actual stream creators. For example, the stream creator doesn't care about the previous file when we're creating a new one, but the SwitchingProvider checks that any existing file is of the same type.

Now that SwitchingProvider is gone, is it still useful to have EnvType here?


pkg/storage/engine/enginepb/file_registry.proto, line 46 at r45 (raw file):

  // Map of filename -> FileEntry.
  // Filename is relative to the rocksdb dir if the file is inside it.
  // Otherwise it is an absolute path.

What kinds of files could be written to this registry that are outside the rocksdb dir? I assume the reason for relative paths is that we want to allow moving the rocksdb directory to a different path; could that be needed for other files too? We might need an enum of different roots and always store paths relative to some root.


Comments from Reviewable

@mberhault
Copy link
Contributor Author

I've added some cleanup/rename/todos to #19783, some of them are trivial but touch a lot of files.


Review status: 37 of 38 files reviewed at latest revision, 8 unresolved discussions.


c-deps/libroach/ccl/ctr_stream.cc, line 133 at r45 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

You'd be surprised what compilers can do. GCC and Clang can both turn this into something optimal provided both data and scratch have the __restrict__ annotation (and the compiler is given either -O3 or -ftree-vectorize):

https://godbolt.org/g/nzgwQx

Yeah, I'm not too concerned, but cryptopp (and others) have optimized primitives for these kinds of things, it may be worth investigating depending on the benchmark/profiling results.


c-deps/libroach/ccl/db.cc, line 42 at r45 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Should use_switching_env be renamed now?

Sure. The main change now is the file registry. This touches the Go engine code as well, so I propose to do this in a followup.


c-deps/libroach/ccl/db.cc, line 108 at r45 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Remove these comments? I'm not too worried about people stumbling across unreleased enterprise features and using them.

Done.


c-deps/libroach/rocksdbutils/env_encryption.cc, line 322 at r26 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I'm not sure passing a CipherStreamCreator is any better than passing the level. When I suggested getting rid of the level argument here, I was thinking of making the EncryptionProvider self-contained: there wouldn't be a SwitchingProvider any more. Instead, there would be a separate EncryptionProvider for each level. (The validation functionality now performed by SwitchingProvider would go in some "registry" object instead of in an EncryptionProvider)

Basically I'm trying to keep the stuff on the OSS side limited to "registry" type objects and anything dealing with encryption, cipher streams, etc on the CCL SIDE>

Done.


c-deps/libroach/rocksdbutils/env_encryption.cc, line 22 at r45 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Are there any particular parts of this file I should be looking at? It looks like it's mostly copied from the rocksdb version so I haven't gone through it too closely.

The main bits are *AndCreateCipherStream which used to be in the SwitchingProvider. The rest is mostly removing all the prefix stuff, and passing different objects around. Not much changed.


pkg/ccl/cmdccl/enc_utils/README.md, line 6 at r45 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Coming back to this after a long time away, the use of 48-byte keys is striking (why can't we just use 16-byte keys again?). We should (in a subsequent PR) offer a cockroach subcommand to generate keys so we don't need to duplicate the key length in so many places.

This was when we switched to grabbing a key ID out of the key file itself as opposed to a hash of the key. The reason for the 256 bit ID length (which is definitely excessive) is to be able to tell from the size of the file what key size we're talking about and detect when the user generated the wrong size (eg: 16/24/32 bytes are invalid, you need 48/56/64).

I'm happy to revisit this, either in this PR or later (later might be best, it's a small, easy change after all).


pkg/storage/engine/enginepb/file_registry.proto, line 53 at r15 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Now that SwitchingProvider is gone, is it still useful to have EnvType here?

EnvType is still checked (see env_encryption.cc LookupAndCreateCipherStream and InitAndCreateCipherStream) to make sure we handle files at the right level. I also plan to address the TODO to check that the needed env levels are loaded (important to not screw up when forgetting encryption flags). It'll also be useful when computing statistics to go from file listing back to env level.


pkg/storage/engine/enginepb/file_registry.proto, line 46 at r45 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

What kinds of files could be written to this registry that are outside the rocksdb dir? I assume the reason for relative paths is that we want to allow moving the rocksdb directory to a different path; could that be needed for other files too? We might need an enum of different roots and always store paths relative to some root.

The most obvious ones will be ingested sstables and temporary work dir. There's quite a bit of work left to do for those, so I left this unspecified for now. I propose to flesh this out properly in the next phase ("encrypt other uses of local disk")


Comments from Reviewable

@mberhault
Copy link
Contributor Author

Ping? I think this is ready to go in post alpha sha picking.

@mberhault mberhault force-pushed the marc/env_encrypt branch 2 times, most recently from 5e345c9 to 7b6fee7 Compare May 1, 2018 16:45
@bdarnell
Copy link
Contributor

bdarnell commented May 2, 2018

:lgtm:


Reviewed 8 of 30 files at r49, 1 of 2 files at r50, 1 of 5 files at r51, 3 of 18 files at r55, 1 of 2 files at r57, 4 of 12 files at r58, 2 of 4 files at r59, 1 of 5 files at r62, 1 of 1 files at r63, 12 of 16 files at r64, 2 of 2 files at r65.
Review status: all files reviewed at latest revision, 2 unresolved discussions.


Comments from Reviewable

Part of encryption-at-rest (see cockroachdb#19783)

Quick overview:
* file registry records encryption settings for encrypted files
* an encrypted env has a "env level" (plain, store, data) and a key
manager as a source
* the data key manager uses an encrypted env with the store key manager as
key source
* rocksdb uses an encrypted env with the data key manager as key source

Release note: none
@mberhault mberhault force-pushed the marc/env_encrypt branch from 7b6fee7 to b641eac Compare May 2, 2018 15:55
@mberhault mberhault changed the title libroach: perform actual encryption libroach: encrypt data at rest May 2, 2018
@mberhault
Copy link
Contributor Author

bors r+

craig bot pushed a commit that referenced this pull request May 2, 2018
21580: libroach: encrypt data at rest r=mberhault a=mberhault

Part of encryption-at-rest (see #19783)

Quick overview:
* file registry records encryption settings for encrypted files
* an encrypted env has a "env level" (plain, store, data) and a key
manager as a source
* the data key manager uses an encrypted env with the store key manager as
key source
* rocksdb uses an encrypted env with the data key manager as key source

Release note: none

25240: distsqlrun: forward-port regression test for topk panic r=jordanlewis a=jordanlewis

Release note: None

Co-authored-by: marc <[email protected]>
Co-authored-by: Jordan Lewis <[email protected]>
@craig
Copy link
Contributor

craig bot commented May 2, 2018

Build succeeded

@craig craig bot merged commit b641eac into cockroachdb:master May 2, 2018
@mberhault mberhault deleted the marc/env_encrypt branch May 2, 2018 18:41
mberhault pushed a commit to mberhault/cockroach that referenced this pull request May 2, 2018
Cleanup following cockroachdb#21580, the switching env went away.

Release note: None
craig bot pushed a commit that referenced this pull request May 3, 2018
25248: encryption: rename "switching env" format version to "file registry". r=mberhault a=mberhault

Cleanup following #21580, the switching env went away.

This is a rename only, the only thing persisted was the format version iota, and that has not changed.
 
Release note: None

Co-authored-by: marc <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants