
Encrypted blob store repository - take I #46170

Conversation

Contributor

@albertzaharovits albertzaharovits commented Aug 30, 2019

This is the first iteration of the client-side encrypted snapshot repository. I would much appreciate @elastic/es-distributed's confirmation of the approach in the code (I have specific questions), and I also have some questions about the next iteration.

To create the encrypted repository use the following command:

curl -X PUT "localhost:9200/_snapshot/my_fs_encrypted_backup?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "encrypted",
  "settings": {
    "delegate_type": "fs",
    "password": "password",
    "location": "/tmp/encrypted_snapshot"
  }
}
'
  • delegate_type can be any type backed by a BlobStoreRepository: fs, url, azure, gcs, s3.
  • password will eventually be stored in the keystore, but it's easier to pass it as a plain setting for now

Q1:
How do you feel about the approach of EncryptedRepository extending BlobStoreRepository and also containing a BlobStoreRepository? Especially about the lifecycle methods:

    @Override
    protected void doStart() {
        this.delegatedRepository.start();
        super.doStart();
    }

    @Override
    protected void doStop() {
        super.doStop();
        this.delegatedRepository.stop();
    }

    @Override
    protected void doClose() {
        super.doClose();
        this.delegatedRepository.close();
    }

Q2:
How do you feel about having the encryption metadata (right now containing only the encrypted data encryption key) as a separate blob inside a blob container named after the master encryption key? Having a different blob for every data key helps with rotation in the presence of the eventual consistency issue (rotating keys creates a new file) and avoids partial blob updates.
But it does incur one extra round trip to write the metadata blob - can we overlap the read/write ops? Can two writeBlob or readBlob calls run simultaneously on the same BlobStoreRepository?

In addition, I have a couple more questions for the follow-up implementation:

To guard against chosen-ciphertext attacks (manipulations of the ciphertext result in predictable manipulations of the plaintext), which AES schemes are vulnerable to in various degrees depending on the mode, it is preferable that we use authenticated encryption. This produces an "authentication tag" (a MAC over the ciphertext) that we must verify before "trusting" the decryption result. For that we have to buffer the ciphertext before decrypting and only release the result after the tag validation succeeds. We can buffer in memory or spool to disk in a temporary file. Which approach would you favor? I lean towards spooling to disk. (This implementation does in-memory buffering, which is disastrous performance-wise.)
And a related question: given that we would have to buffer (in memory or on disk), what chunk size should we use out of the box? A smaller chunk size requires smaller temporary buffers but increases the number of API calls.

No key rotation is considered in this iteration.
This could be a separate plugin or part of the x-pack module (as it currently is); I don't think it matters for now.

Relates: #41910

@albertzaharovits albertzaharovits added >feature :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs :Security/Security Security issues without another label labels Aug 30, 2019
@albertzaharovits albertzaharovits self-assigned this Aug 30, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-security

Member

@original-brownbear original-brownbear left a comment


Thanks @albertzaharovits, I will take a proper look as soon as I find some time for it (next week, I'm afraid).
In general the approach looks fine and matches what we discussed in the past. I found one issue with the blob sizes that we'll need to address though.

}
Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
cipher.init(Cipher.ENCRYPT_MODE, dataEncryptionKey, gcmParameterSpec);
this.delegatedBlobContainer.writeBlob(blobName, new CipherInputStream(inputStream, cipher), blobSize, failIfAlreadyExists);
Member


This seems wrong (and like it'll potentially be a problem!). The blobSize for the purposes of cloud stores like S3 must be the exact number of bytes that will be written. This currently works with the FsRepository because it doesn't use the size here, but it will write partial data (since the encrypted bytes are more than the unencrypted ones) with S3 and the like.

Can we guess the size up front here in some form (via padding magic or so ... I'm admittedly not that knowledgeable here)?

Contributor Author


Great catch! We can compute the ciphertext length from the padding and mode we use. It's not complicated. I will make the changes.
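
For illustration, a minimal sketch of what that size computation could look like, reusing the names from the snippet above (the use of Cipher#getOutputSize and the int cast are assumptions, not the PR's actual change):

    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    cipher.init(Cipher.ENCRYPT_MODE, dataEncryptionKey, gcmParameterSpec);
    // For AES/GCM/NoPadding the ciphertext is the plaintext plus the fixed-size authentication
    // tag, so the exact encrypted length is known before any bytes are produced.
    long encryptedBlobSize = cipher.getOutputSize(Math.toIntExact(blobSize));
    // for blobs larger than Integer.MAX_VALUE one would add the tag length to blobSize directly
    this.delegatedBlobContainer.writeBlob(blobName, new CipherInputStream(inputStream, cipher),
            encryptedBlobSize, failIfAlreadyExists);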


@Override
public Map<String, BlobMetaData> listBlobs() throws IOException {
return this.delegatedBlobContainer.listBlobs();
Member


This won't work so easily. We need the correct metadata to be returned for each blob here (i.e. the correct size), but now it returns the encrypted size.

Contributor Author


Yes, I'll make the length computation in this case as well. Thanks!
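
A hypothetical sketch of that follow-up, assuming the decrypted length can be derived by subtracting the fixed tag overhead and that a PlainBlobMetaData-style constructor is available (names are illustrative, not the PR's code):

    @Override
    public Map<String, BlobMetaData> listBlobs() throws IOException {
        Map<String, BlobMetaData> encryptedMetadata = this.delegatedBlobContainer.listBlobs();
        Map<String, BlobMetaData> result = new HashMap<>(encryptedMetadata.size());
        for (Map.Entry<String, BlobMetaData> entry : encryptedMetadata.entrySet()) {
            // report the plaintext length rather than the encrypted length
            long decryptedLength = entry.getValue().length() - GCM_TAG_BYTES_LENGTH;
            result.put(entry.getKey(), new PlainBlobMetaData(entry.getKey(), decryptedLength));
        }
        return result;
    }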

@original-brownbear
Member

As far as Q1 is concerned:

How do you feel about the approach of EncryptedRepository extending BlobStoreRepository and also containing a BlobStoreRepository?

I think this is fine. In the end, currently neither doStart nor doStop really do much/anything in production code for any implementation ... the optics of wrapping those methods might not be great, but I can't see a practical issue with it, so it's fine with me.

But it does incur one extra round trip to write the metadata blob - can we overlap the read/write ops? Can two writeBlob or readBlob calls run simultaneously on the same BlobStoreRepository?

Yea sure we can concurrently read and write to the repository, we do it all the time by using the SNAPSHOT threadpool to parallelize uploads. Just make sure to not block in whatever things you code up and you're good to (and likely should :)) parallelize these operations.

To guard against chosen-ciphertext attacks (manipulations of the ciphertext result in predictable manipulations of the plaintext), which AES schemes are vulnerable to in various degrees depending on the mode, it is preferable that we use authenticated encryption. This produces an "authentication tag" (a MAC over the ciphertext) that we must verify before "trusting" the decryption result. For that we have to buffer the ciphertext before decrypting and only release the result after the tag validation succeeds. We can buffer in memory or spool to disk in a temporary file. Which approach would you favor? I lean towards spooling to disk. (This implementation does in-memory buffering, which is disastrous performance-wise.)
And a related question: given that we would have to buffer (in memory or on disk), what chunk size should we use out of the box? A smaller chunk size requires smaller temporary buffers but increases the number of API calls.

I might lack some knowledge here, but why does a smaller chunk size in terms of the HMAC validation lead to more API calls? Can't we still write all the blocks for one unencrypted file (that are separately validated) into a single file and verify them as we stream them back to the client?

@albertzaharovits
Contributor Author

albertzaharovits commented Sep 2, 2019

Thanks a lot for the very fast feedback @original-brownbear !

Yea sure we can concurrently read and write to the repository, we do it all the time by using the SNAPSHOT threadpool to parallelize uploads. Just make sure to not block in whatever things you code up and you're good to (and likely should ) parallelize these operations.

I now see the example of BlobStoreRepository#snapshotFile being called on the SNAPSHOT threadpool, thanks!
But I worry that all the calls are done using the same client of the same BlobStore instance.

To be clearer, for example:

        @Override
        public void writeBlob(String blobName, InputStream inputStream, long blobSize, boolean failIfAlreadyExists) throws IOException {
                try (InputStream stream = new ByteArrayInputStream(wrappedDataEncryptionKey)) {
                    this.encryptionMetadataBlobContainer.writeBlob(blobName, stream, wrappedDataEncryptionKey.length, failIfAlreadyExists);
                }
                this.delegatedBlobContainer.writeBlob(blobName, new CipherInputStream(inputStream, cipher), blobSize, failIfAlreadyExists);
        }

Calling writeBlob on encryptionMetadataBlobContainer and delegatedBlobContainer boils down to the same BlobStore instance and cloud client.

The metadata blob is small, so forking onto a threadpool is arguably unnecessary (the calls are eventually serialized anyway) and might even be detrimental to performance.

I will try using the threadpool and compare against no thread pool, we'll see!
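
For illustration, a rough sketch of overlapping the two writes; `executor` here is a plain ExecutorService stand-in for forking onto the SNAPSHOT threadpool, and the remaining names come from the snippet above:

    Future<Void> metadataWrite = executor.submit(() -> {
        // the small metadata blob is written concurrently with the (much larger) data blob
        try (InputStream stream = new ByteArrayInputStream(wrappedDataEncryptionKey)) {
            encryptionMetadataBlobContainer.writeBlob(blobName, stream, wrappedDataEncryptionKey.length, failIfAlreadyExists);
        }
        return null;
    });
    this.delegatedBlobContainer.writeBlob(blobName, new CipherInputStream(inputStream, cipher), blobSize, failIfAlreadyExists);
    try {
        metadataWrite.get(); // surface any failure from the metadata write
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IOException(e);
    } catch (ExecutionException e) {
        throw new IOException(e.getCause());
    }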

why does a smaller chunk size in terms of the HMAC validation lead to more API calls? Can't we still write all the blocks for one unencrypted file (that are separately validated) into a single file and verify them as we stream them back to the client?

I think you're suggesting that a single blob be a concatenation of several "encryption call instances". I think this option is still on the table; there is nothing in the crypto precluding it. But the code would be more complex (we would have to implement the streaming ourselves, to recognize "segment" boundaries), the AES algorithm already works in blocks (and is "on-line", i.e. streaming), and the BlobStoreRepository already has the blob notion. Introducing our own intermediate blocks/segments (i.e. one encryption call per segment) would work, but it would be inelegant and would add complexity to the crypto code, which I would rather avoid. It's still on the table though, I'll think about it, thanks!

@original-brownbear
Member

@albertzaharovits npnp :)

But I worry that all the calls are done using the same client of the same BlobStore instance.

This is not an issue, the existing code already assumes these are threadsafe :)

The metadata blob is small, so forking onto a threadpool is arguably unnecessary (the calls are eventually serialized anyway) and might even be detrimental to performance.

I think it very much depends on the situation here. Keep in mind that on cloud-backed repositories we can routinely see something like 100ms of latency for a write.
For operations that write a large number of blobs, I would agree that parallelizing the write of a blob and its metadata doesn't get you much, because you can only do so many writes in parallel anyway.
But for operations that only write a small number of blobs (all the things master does) it would nicely lower the overall time that e.g. finalizing a snapshot takes.

That said, I wouldn't invest too much time into this. It might not be possible to parallelize non-segment blob writes at the moment without an API change anyway. Probably something to optimize in the future by moving the API of Repository to something async for all operations first.

@original-brownbear
Member

@albertzaharovits ah sorry forgot one of my comments :)

On the whole HMAC topic:

Can't we simply verify this via the InputStream we create from an encrypted blob? As we stream we can keep calculating the checksum/hash (forgive me for not knowing the correct terminology here :)) and then when we read the last byte (we can wrap the original stream in some way to "read 1 byte ahead" or so to make this work nicely) we check if it matches with expectations and throw if it doesn't?
That wouldn't require any buffering would it?
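
For illustration, a minimal sketch of such a verifying stream (hypothetical, not code from this PR): it computes an HMAC over the ciphertext bytes it hands out, holds back the trailing tag, and throws at end-of-stream if the tag does not match. The HmacSHA256 algorithm and the "ciphertext || tag" layout are assumptions.

    import javax.crypto.Mac;
    import javax.crypto.SecretKey;
    import java.io.EOFException;
    import java.io.FilterInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.security.GeneralSecurityException;
    import java.security.MessageDigest;

    final class MacVerifyingInputStream extends FilterInputStream {
        private final Mac mac;
        private final int tagLength;
        private final byte[] lookahead; // ring buffer of the last tagLength bytes read from the delegate
        private int lookaheadSize = 0;
        private int lookaheadPos = 0;
        private boolean finished = false;

        MacVerifyingInputStream(InputStream in, SecretKey macKey, int tagLength) throws GeneralSecurityException {
            super(in);
            this.mac = Mac.getInstance("HmacSHA256");
            this.mac.init(macKey);
            this.tagLength = tagLength;
            this.lookahead = new byte[tagLength];
        }

        @Override
        public int read() throws IOException {
            if (finished) {
                return -1;
            }
            while (lookaheadSize < tagLength) { // always keep tagLength bytes buffered ahead
                int b = in.read();
                if (b == -1) {
                    throw new EOFException("blob is shorter than the authentication tag");
                }
                lookahead[(lookaheadPos + lookaheadSize) % tagLength] = (byte) b;
                lookaheadSize++;
            }
            int next = in.read();
            if (next == -1) {
                verifyTag(); // the bytes still buffered are the expected tag
                finished = true;
                return -1;
            }
            int emitted = lookahead[lookaheadPos] & 0xFF;
            lookahead[lookaheadPos] = (byte) next;
            lookaheadPos = (lookaheadPos + 1) % tagLength;
            mac.update((byte) emitted); // the MAC covers exactly the ciphertext bytes handed out
            return emitted;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // naive bulk read on top of the single-byte read; a real implementation would read in bulk
            int count = 0;
            while (count < len) {
                int next = read();
                if (next == -1) {
                    return count == 0 ? -1 : count;
                }
                b[off + count++] = (byte) next;
            }
            return count;
        }

        private void verifyTag() throws IOException {
            byte[] expected = new byte[tagLength];
            for (int i = 0; i < tagLength; i++) {
                expected[i] = lookahead[(lookaheadPos + i) % tagLength];
            }
            if (MessageDigest.isEqual(mac.doFinal(), expected) == false) {
                throw new IOException("authentication tag mismatch");
            }
        }
    }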

@albertzaharovits
Contributor Author

albertzaharovits commented Sep 2, 2019

Can't we simply verify this via the InputStream we create from an encrypted blob? As we stream we can keep calculating the checksum/hash (forgive me for not knowing the correct terminology here :)) and then when we read the last byte (we can wrap the original stream in some way to "read 1 byte ahead" or so to make this work nicely) we check if it matches with expectations and throw if it doesn't?
That wouldn't require any buffering would it?

Yes, this is a very good option, and it is a close second.

For popularity/practicality reasons we have converged on AES encryption in GCM mode. This mode has MAC calculation built in, it is supposed to be hardware accelerated (if the JVM JIT-compiles the code that does the decryption, and if using the SunJCE crypto provider), and we have guidance on usage errors to avoid in the NIST SP 800-38D specification.
Unfortunately this same specification recommends against "interpreting" decrypted ciphertext before validating the authentication tag, which is the reason for the internal buffering in the GaloisCounterMode cipher from SunJCE. As you pointed out in a separate discussion, FileRestoreContext assures us that a partial blob, one that ends with an IOException when the authentication tag fails verification, will not get "interpreted" (this made the temporary spooling file for decrypted ciphertext that I was asking about redundant). But there is no way to turn off the internal buffering in the GaloisCounterMode cipher of the SunJCE provider.

Without resorting to buffering we have two options, both of which I will explore to gauge the performance:

  • Use a GCM implementation from the Bouncy Castle Crypto provider that does not employ the internal buffering.
  • Use a different mode, CBC, which does not offer authentication, and wrap an HMAC computation around the ciphertext to obtain authentication.

GCM is extremely popular and it minimizes the amount of crypto code we have to write, so I am favoring it as a first option. A close second is your suggestion to do our own HMAC on the ciphertext from an AES mode that does not do authentication.
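
For illustration, a rough sketch of that encrypt-then-MAC alternative (the key names, the separate macKey, and the "IV || ciphertext || tag" layout are assumptions, not the PR's code):

    // encrypt with a non-authenticating AES mode ...
    Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
    cipher.init(Cipher.ENCRYPT_MODE, dataEncryptionKey, ivParameterSpec);
    byte[] ciphertext = cipher.doFinal(plaintext);
    // ... then authenticate the IV and ciphertext with a separate MAC key
    Mac hmac = Mac.getInstance("HmacSHA512");
    hmac.init(macKey);
    hmac.update(ivParameterSpec.getIV());
    byte[] tag = hmac.doFinal(ciphertext);
    // store IV || ciphertext || tag; on restore, recompute the HMAC over IV || ciphertext and
    // compare with MessageDigest.isEqual before trusting the decrypted bytes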

}
Cipher cipher = Cipher.getInstance(ENCRYPTION_MODE, BC_PROV);
cipher.init(Cipher.ENCRYPT_MODE, dataEncryptionKey, ivParameterSpec);
this.delegatedBlobContainer.writeBlob(blobName, new CipherInputStream(inputStream, cipher), blobSize + GCM_TAG_BYTES_LENGTH,
Contributor


CipherInputStream does not support marking (i.e. CipherInputStream.markSupported() returns false), which means that failed requests to S3 / GCS can't be auto-retried.

Member


Urgh, good find ... I guess we could create our own cipher stream that supports it, but then we'd either have to buffer the encrypted data to disk or re-encrypt the whole thing on retry, right? (Maybe there's a better way that I don't see?)

Contributor

@tvernum tvernum Sep 9, 2019


If those repositories (blob stores) only require resetting back to the beginning of the stream (to retry the whole request), then we could probably handle it with re-encryption.
But if we want to mark an arbitrary point in the stream and reset to it, then re-encryption probably isn't going to be an option, because you can't reset a Cipher to an arbitrary point, and that means any checksums will be thrown off because the Cipher will think it has processed more data than was actually written to the blob store.

Member


But if we want to mark an arbitrary point in the stream and reset to it, then re-encryption probably isn't going to be an option, because you can't reset a Cipher to an arbitrary point, and that means any checksums will be thrown off because the Cipher will think it has processed more data than was actually written to the blob store.

My thinking was to simply reset back to 0 to get a fresh Cipher, but then just dump the bytes up to whatever point x we actually want to reset to, and only start producing the bytes starting from x, as a CPU-expensive but IO-cheap way of implementing resetting to an arbitrary point. Wouldn't that work?
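
For illustration, a hypothetical helper for that idea (all names illustrative): re-create the cipher stream from the beginning over a re-opened plaintext source and discard ciphertext up to the reset position; CPU-expensive, but no buffering.

    static InputStream resetEncryptedStream(Supplier<InputStream> plaintextSource,
                                            SecretKey dataEncryptionKey,
                                            GCMParameterSpec gcmParameterSpec,
                                            long resetPosition) throws IOException, GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, dataEncryptionKey, gcmParameterSpec); // same key and IV as the original pass
        InputStream encrypted = new CipherInputStream(plaintextSource.get(), cipher);
        byte[] scratch = new byte[8192];
        long remaining = resetPosition;
        while (remaining > 0) {
            // discard ciphertext that the upload already consumed before the retry
            int read = encrypted.read(scratch, 0, (int) Math.min(scratch.length, remaining));
            if (read < 0) {
                throw new IOException("stream ended before reaching position " + resetPosition);
            }
            remaining -= read;
        }
        return encrypted; // produces ciphertext starting at resetPosition
    }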

Contributor Author

@albertzaharovits albertzaharovits Sep 12, 2019


Thanks for raising this point @ywelsch !

I understand that this is a problem for upload (hence encryption). Decryption is the CipherInputStream wrapping the cloud client's stream so it should not be affected. Please correct me if I'm mistaken.

I looked over the S3 SDK sources and I believe AwsChunkedEncodingInputStream is where an input stream that does not support mark/reset gets buffered in memory during an upload. This chunking is recommended for uploads larger than 100MB. Note that this chunking is different from the chunking we do at the blob level; this is chunking done inside the SDK during the multipart upload. The chunk buffers are generally small, only 128KB by default. Hence I believe general mark/reset support is required (not only resetting to the beginning).

Assuming that giving up the cloud SDK's chunked-upload-and-retry is a last-resort option, we're left with four other choices:

  1. Buffer the ciphertext in memory. This incurs 128KB of memory overhead, and the client loses the ability to control the chunk size (at the SDK level).
  2. Buffer the ciphertext on disk in a spool file. The disk IO performance impact is to be determined.
  3. Use AES/CTR with HmacSha512. This encryption scheme can produce a mark/reset-able CipherInputStream (there is such an implementation in the BC library). It means coding the decryption and authentication separately, which exposes us to some pitfalls, but I am confident we can straighten them all out in reviews.
  4. AES/GCM can be translated to an AES/CTR scheme, therefore we might seek within the plaintext and redo the encryption upon a reset, using a CTR cipher specifically configured for the seek position. The code will be nasty, but we can test that rewound and re-encrypted streams are identical to the original AES/GCM ciphertext.

From a pure engineering perspective I would pick option 3. The code implemented with BC should be very neat, and the caveats of doing the MAC yourself are manageable. I don't like buffering given how much has been invested in developing seekable streams all over the codebase (hence discounting 1 and 2), and option 4 is a bit too complex. That being said, I think option 4, when it finally works, will be easier to "prove" correct (because we can test ciphertext equality). Also, on-disk spooling, if feasible, would give us great leeway in terms of future configurability of the encryption plugin; maybe spooling is required by other features too?

I am curious what your thinking on this is, @original-brownbear @ywelsch @tvernum.
In the meantime I will try to implement option 3 using BouncyCastle to see if there's a noticeable performance impact.
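
For illustration, a sketch of the property that makes option 3 workable: an AES/CTR cipher can be re-initialized at an arbitrary block offset by advancing the big-endian counter in the IV, so a mark/reset-able cipher stream needs no buffering (hypothetical helper, not the BouncyCastle implementation):

    static Cipher ctrCipherAtOffset(SecretKey key, byte[] iv, long byteOffset) throws GeneralSecurityException {
        final int blockSize = 16; // AES block size in bytes
        byte[] counter = iv.clone();
        long carry = byteOffset / blockSize; // number of whole blocks to skip
        for (int i = counter.length - 1; i >= 0 && carry != 0; i--) {
            long sum = (counter[i] & 0xFFL) + (carry & 0xFF);
            counter[i] = (byte) sum;
            carry = (carry >>> 8) + (sum >>> 8);
        }
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(counter));
        // the caller still discards (byteOffset % blockSize) bytes of output to land exactly on byteOffset
        return cipher;
    }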

Member


I would agree that 3 seems like the cleanest option here.
But as I said above, I think 4 is fine too. The situation of having to reset the stream and retry the upload of a chunk should be a relatively rare occurrence (obviously this depends on the specific networking situation, but even then the time spent on retries should be small relative to the overall snapshot time).
=> Assuming 3 doesn't work, I think it'd be better to spend some CPU here and go with 4 than to introduce the complications of spooling to disk (2.), which raises a number of open questions regarding disk usage IMO, or restricting upload chunk sizing flexibility (1.).

@albertzaharovits
Contributor Author

@bizybot I think you can push the code that moves this plugin under the x-pack directory.
Given that the code sits in a few self-contained files, moving them to another directory should carry over @original-brownbear's comments. Worst case, we can revert the move and open another PR if the comments get into an unusable state.

@bizybot bizybot force-pushed the encrypted-repo-poc branch from 9bba5f4 to ab8d6c6 on October 1, 2019 04:51
@albertzaharovits albertzaharovits changed the base branch from master to client-side-encrypted-snapshot-repos October 9, 2019 23:48
@albertzaharovits albertzaharovits marked this pull request as ready for review October 9, 2019 23:51
@albertzaharovits
Contributor Author

I am going to merge this into a feature branch, although it did not pass review.

Given the excellent feedback I've already got, I know what needs to change (make CipherInputStream support mark), but I don't want to hold up this PR until I get there; I'd rather unblock @bizybot.

I hope the follow-up PRs that address the issues raised here will not be perturbed by me merging this, since most of the changes here are new code (it won't strain the reviewer with changes upon changes).

@albertzaharovits albertzaharovits merged commit ef2f8aa into elastic:client-side-encrypted-snapshot-repos Oct 10, 2019
@jmlrt
Member

jmlrt commented Oct 11, 2019

@albertzaharovits FYI, elastic+elasticsearch+pull-request+multijob+packaging-tests-unix-next-sample had been running for 37 hours on the ubuntu-16.04 CI workers, and /var/lib/jenkins/jobs/elastic+elasticsearch+pull-request+multijob+packaging-tests-unix-next-sample/configurations/axis-os/ubuntu-16.04-packaging/builds/183/build-logs.log used 245GB on elasticsearch-ci master.

I had to cancel the job and delete the build as part of https://github.com/elastic/infra/issues/15235.

@jmlrt
Member

jmlrt commented Oct 11, 2019

@albertzaharovits elastic+elasticsearch+pull-request+multijob+packaging-tests-unix-next-sample is still running after 37 hours and is taking 131GB on centos-7.

root@elasticsearch-ci-master-blue:/var/lib/jenkins/jobs/elastic+elasticsearch+pull-request+multijob+packaging-tests-unix-next-sample/configurations/axis-os/centos-7-packaging/builds/183# du -sh *
131G    build-logs.log
12K     build-logs.yml
4.0K    build-results.yml
496K    changelog.xml
1.5G    log
4.0K    meta.json
4.0K    module
16K     registry

I won't stop the job or delete the log files yet, in case you want to investigate.

@albertzaharovits
Contributor Author

@jmlrt please cancel the job and delete any files. This PR has been merged into a feature branch and did not pass all compilation tests, i.e. I'm not interested in the results of those jobs. I will fix the issues in later PRs.

albertzaharovits added a commit that referenced this pull request Jan 6, 2020
This adds a new bare snapshot repository project which contains the classes implementing encryption (and decryption) input stream decorators that support mark and reset.

Relates #48221 , #46170
albertzaharovits added a commit that referenced this pull request Nov 25, 2020
This adds a new bare snapshot repository project which contains the classes implementing encryption (and decryption) input stream decorators that support mark and reset.

Relates #48221 , #46170
albertzaharovits added a commit to albertzaharovits/elasticsearch that referenced this pull request Nov 30, 2020
This adds a new bare snapshot repository project which contains the classes implementing encryption (and decryption) input stream decorators that support mark and reset.

Relates elastic#48221 , elastic#46170