Encrypted blob store repository - take I #46170
Conversation
Pinging @elastic/es-security
Thanks @albertzaharovits I will take a proper look as soon as I find some time for it (next week I'm afraid).
In general the approach looks fine and like what we discussed in the past. I found one issue with the blob sizes that we'll need to address though.
}
Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
cipher.init(Cipher.ENCRYPT_MODE, dataEncryptionKey, gcmParameterSpec);
this.delegatedBlobContainer.writeBlob(blobName, new CipherInputStream(inputStream, cipher), blobSize, failIfAlreadyExists);
This seems wrong (and like it'll be a problem potentially!). The `blobSize`, for the purposes of cloud stores like S3, must be the exact number of bytes that will be written. This currently works with the `FsRepository` because it doesn't use the size here, but it will write partial data with S3 and such (since the encrypted bytes are more than the unencrypted ones).
Can we guess the size up front here in some form (via padding magic or so ... I'm admittedly not that knowledgeable here)?
Great catch! We can compute the ciphertext length from the padding and mode we use. It's not complicated. I will make the changes.
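A minimal sketch of that computation, assuming the `AES/GCM/NoPadding` mode from the snippet above, a 128-bit authentication tag, and an IV that is stored separately (e.g. in the metadata blob) rather than prepended to the ciphertext — assumptions for illustration, not the PR's final layout:

```java
// GCM is a counter-mode (stream-like) cipher, so there is no block padding:
// the ciphertext is exactly the plaintext plus the appended authentication tag.
private static final int GCM_TAG_BYTES_LENGTH = 16; // 128-bit GCM tag (an assumption)

private static long encryptedBlobSize(long plainBlobSize) {
    return plainBlobSize + GCM_TAG_BYTES_LENGTH;
}

// which would turn the write call above into something like:
// this.delegatedBlobContainer.writeBlob(blobName, new CipherInputStream(inputStream, cipher),
//         encryptedBlobSize(blobSize), failIfAlreadyExists);
```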
|
@Override
public Map<String, BlobMetaData> listBlobs() throws IOException {
    return this.delegatedBlobContainer.listBlobs();
This won't work so easily. We need the correct metadata to be returned for each blob here (i.e. the correct size, but right now it returns the encrypted size).
Yes, I'll make the length computation in this case as well. Thanks!
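A hedged sketch of that adjustment, reusing the 16-byte GCM tag overhead from above and the existing `PlainBlobMetaData` helper (both assumed to be available in the surrounding code; this is not the PR's actual implementation):

```java
@Override
public Map<String, BlobMetaData> listBlobs() throws IOException {
    Map<String, BlobMetaData> encryptedBlobs = this.delegatedBlobContainer.listBlobs();
    Map<String, BlobMetaData> plainBlobs = new HashMap<>(encryptedBlobs.size());
    for (Map.Entry<String, BlobMetaData> entry : encryptedBlobs.entrySet()) {
        // report the plaintext size: stored (encrypted) length minus the constant GCM overhead
        long plainLength = entry.getValue().length() - GCM_TAG_BYTES_LENGTH;
        plainBlobs.put(entry.getKey(), new PlainBlobMetaData(entry.getKey(), plainLength));
    }
    return plainBlobs;
}
```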
As far as Q1 is concerned:
I think this is fine. In the end, currently neither …
Yea sure we can concurrently read and write to the repository, we do it all the time by using the …
I might lack some knowledge here, but why does a smaller chunk size in terms of the HMAC validation lead to more API calls? Can't we still write all the blocks for one unencrypted file (that are separately validated) into a single file and verify them as we stream them back to the client?
Thanks a lot for the very fast feedback @original-brownbear !
I now see the example of … To be clearer, for example:
Calling … The metadata blob is small, therefore unnecessary forking on a threadpool (because the calls are eventually serialized) might be detrimental to performance. I will try using the threadpool and compare against no thread pool, we'll see!
I think you're suggesting that a single blob be a concatenation of several "encryption call instances". I think this is an option still on the table, there is nothing in the crypto precluding this. But the code would be more complex (we would have to implement streaming ourselves, to recognize "segment" boundaries), the AES algorithm already works by blocks (and it is "on-line", i.e. streaming), and the BlobStoreRepository also has the blob notion. Introducing our own intermediate blocks/segments (i.e. encryption calls) would work but would surely be inelegant and add complexity to the crypto code, which I would abstain from. But it's still on the table, I'll think about it, thanks!
@albertzaharovits npnp :)
This is not an issue, the existing code already assumes these are threadsafe :)
I think it very much depends on the situation here. Keep in mind that on cloud backed repositories we can have something like … That said, I wouldn't invest too much time into this. It might not be possible to parallelize non-segment blob writes at the moment without an API change anyway. Probably something to optimize in the future by moving the API of …
@albertzaharovits ah sorry forgot one of my comments :) On the whole HMAC topic: Can't we simply verify this via the …
Yes, this is a very good option, and it is a close second. For popularity/practicality reasons we have converged around AES encryption in the GCM mode. This mode has MAC calculation built-in, it is supposed to be hardware accelerated (if the JVM JIT compiles the code that does the decryption, and if using the SunJCE crypto provider), and we have guidance on errors to avoid in its usage in the NIST SP800-38D specification. Without resorting to buffering we have two options, both of which I will explore to gauge the performance:
GCM is uber popular and it minimizes the code we have to write that deals with crypto, so I am favoring it as a first option. A close second is your suggestion to do our own HMAC on a ciphertext from an AES mode that does not do authentication.
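For illustration, a rough sketch of that second option with plain JCA primitives (encrypt-then-MAC); the key, IV and plaintext variables are assumed to exist, and this is not code from this PR:

```java
// AES/CTR provides confidentiality only; authenticity comes from a separate
// HMAC-SHA512 computed over the ciphertext (encrypt-then-MAC).
Cipher ctrCipher = Cipher.getInstance("AES/CTR/NoPadding");
ctrCipher.init(Cipher.ENCRYPT_MODE, dataEncryptionKey, ivParameterSpec);
Mac hmac = Mac.getInstance("HmacSHA512");
hmac.init(macKey); // a MAC key separate from the data encryption key
byte[] ciphertext = ctrCipher.doFinal(plaintext);
byte[] authenticationTag = hmac.doFinal(ciphertext);
// on the read path, recompute the HMAC over the ciphertext and compare it (constant-time)
// to the stored tag before trusting any decrypted bytes
```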
}
Cipher cipher = Cipher.getInstance(ENCRYPTION_MODE, BC_PROV);
cipher.init(Cipher.ENCRYPT_MODE, dataEncryptionKey, ivParameterSpec);
this.delegatedBlobContainer.writeBlob(blobName, new CipherInputStream(inputStream, cipher), blobSize + GCM_TAG_BYTES_LENGTH,
`CipherInputStream` does not support marking (i.e. `CipherInputStream.markSupported()` returns false), which means that failed requests to S3 / GCS can't be auto-retried.
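For reference, the limitation is easy to reproduce with the JDK classes directly (a standalone illustration, not code from this PR):

```java
Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
cipher.init(Cipher.ENCRYPT_MODE, dataEncryptionKey, gcmParameterSpec);
InputStream encrypted = new CipherInputStream(inputStream, cipher);
// javax.crypto.CipherInputStream never supports mark/reset,
// so the S3/GCS SDKs cannot rewind the stream to retry a failed part upload
assert encrypted.markSupported() == false;
```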
Urgh good find ... I guess we could create our own cipher stream that supports it, but then we'd either have to buffer the encrypted data to disk or reencrypt the whole thing on retry right? (maybe there's a better way that I don't see?)
If those repositories (blob stores) only require resetting back to the beginning of the stream (to retry the whole request), then we could probably handle it with re-encryption.
But if we want to `mark` an arbitrary point in the stream and `reset` to it, then re-encryption probably isn't going to be an option, because you can't reset a `Cipher` to an arbitrary point, and that means any checksums will be thrown out because the Cipher will think it has processed more data than was actually written to the blob store.
> But if we want to mark an arbitrary point in the stream and reset to it, then re-encryption probably isn't going to be an option because you can't reset a Cipher to an arbitrary point, and that means any checksums will be thrown out because the Cipher will think it has processed more data than was actually written to the blob store.
My thinking was to simply reset back to 0 to get a fresh Cipher, but then just dump the bytes up to whatever point x we actually want to reset to, and only start producing the bytes starting from x — a CPU-expensive but IO-cheap way of implementing resetting to an arbitrary point. Wouldn't that work?
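A minimal sketch of that idea, under the assumptions that the plaintext source can be re-opened from the start and that the same key and IV are reused so the regenerated ciphertext is byte-identical (class and variable names are made up for illustration):

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.util.function.Supplier;

// Supports mark/reset by throwing away the current cipher stream, re-encrypting the
// plaintext from position 0 and skipping bytes until the marked offset is reached.
class ReencryptOnResetInputStream extends InputStream {
    private final Supplier<InputStream> plaintextSupplier; // must reopen the source from the start
    private final SecretKey key;
    private final GCMParameterSpec iv;
    private InputStream current;
    private long position;
    private long mark;

    ReencryptOnResetInputStream(Supplier<InputStream> plaintextSupplier, SecretKey key, GCMParameterSpec iv)
            throws GeneralSecurityException {
        this.plaintextSupplier = plaintextSupplier;
        this.key = key;
        this.iv = iv;
        this.current = newCipherStream();
    }

    private InputStream newCipherStream() throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, iv); // same key + IV over the same data => identical ciphertext
        return new CipherInputStream(plaintextSupplier.get(), cipher);
    }

    @Override
    public boolean markSupported() { return true; }

    @Override
    public synchronized void mark(int readLimit) { mark = position; }

    @Override
    public synchronized void reset() throws IOException {
        try {
            current.close();
            current = newCipherStream(); // CPU-expensive: encrypt everything again
        } catch (GeneralSecurityException e) {
            throw new IOException(e);
        }
        for (position = 0; position < mark; position++) {
            if (current.read() == -1) { // dump the bytes before the marked offset
                throw new IOException("premature end of stream while seeking to mark");
            }
        }
    }

    @Override
    public int read() throws IOException {
        int b = current.read();
        if (b != -1) {
            position++;
        }
        return b;
    }
}
```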
Thanks for raising this point @ywelsch !
I understand that this is a problem for upload (hence encryption). Decryption is the `CipherInputStream` wrapping the cloud client's stream, so it should not be affected. Please correct me if I'm mistaken.
I looked over the S3 SDK sources and I believe `AwsChunkedEncodingInputStream` is where an input stream which does not support mark/reset will be memory buffered during an upload. This chunking is recommended for buffers larger than 100MB. Note that this chunking is different from the chunking we do at the blob level; this is chunking done inside the SDK during the multipart upload. The chunk buffers are generally small, only 128KB by default. Hence I believe general mark/reset is required (not only from the beginning).
Assuming that renouncing the cloud SDK library's chunked-upload-and-retry is a last-resort option, we're left with four other choices:
1. Buffer the ciphertext in memory. This incurs 128KB of memory overhead, and the client loses the ability to control the chunk size (on the SDK level).
2. Buffer the ciphertext on disk in a spool file. Disk IO performance impact to be determined.
3. Use `AES/CTR` with `HmacSha512`. This encryption scheme is able to produce a mark/reset-able `CipherInputStream` (there is such an implementation in the BC library). This gets us coding the decryption and authentication separately, exposing us to some pitfalls, but I am confident we can straighten them all out in reviews.
4. `AES/GCM` can be translated to an `AES/CTR` scheme, therefore we might seek the plaintext and redo the encryption upon a `reset`, using the `CTR` cipher specifically configured for the seek position. The code will be nasty, but we can test that rewound and re-encrypted streams are identical to the original `AES/GCM` ciphertext.
From my pure engineering perspective I would pick option 3. The code implemented with BC should be very neat, and the caveats of doing the MAC yourself are manageable. I don't like buffering given how much has been invested in developing seekable streams all over the codebase (hence discounting 1 and 2), and option 4 is a bit too complex. That being said, I think option 4, when it finally works, will be easier to "prove" correct (because we can test ciphertext equality). Also, on-disk spooling, if feasible, would give us great leeway in terms of future configurability of the encryption plugin; maybe spooling is required by other features too?
I am curious what your thinking on this is, @original-brownbear @ywelsch @tvernum.
In the meantime I will try to implement option 3 using BouncyCastle to see if there's a noticeable performance impact.
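For reference, a rough cut of what option 3's stream construction might look like with the BC lightweight API, assuming `SICBlockCipher` (CTR mode) as the skipping stream cipher behind BC's `CipherInputStream`, which then reports mark/reset support when the source stream is markable; treat it as a sketch to validate, not working code from this PR:

```java
import org.bouncycastle.crypto.engines.AESEngine;
import org.bouncycastle.crypto.io.CipherInputStream;
import org.bouncycastle.crypto.modes.SICBlockCipher;
import org.bouncycastle.crypto.params.KeyParameter;
import org.bouncycastle.crypto.params.ParametersWithIV;

import java.io.BufferedInputStream;
import java.io.InputStream;

class CtrStreams {
    // Wraps the source in an AES/CTR cipher stream; HMAC-SHA512 authentication over the
    // ciphertext would be layered separately (option 3 above).
    static InputStream ctrStream(InputStream source, byte[] keyBytes, byte[] ivBytes, boolean forEncryption) {
        SICBlockCipher ctrCipher = new SICBlockCipher(new AESEngine()); // CTR ("SIC") mode, seekable
        ctrCipher.init(forEncryption, new ParametersWithIV(new KeyParameter(keyBytes), ivBytes));
        // the source must itself support mark/reset for the cipher stream to support it
        return new CipherInputStream(new BufferedInputStream(source), ctrCipher);
    }
}
```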
I would agree that 3 seems like the cleanest option here.
But as I said above, I think 4 is fine too. The situation of having to reset the stream and retry the upload of a chunk should be a relatively rare occurrence (obviously this depends on the specific networking situation, but even then the time spent on retries should be small relative to the overall snapshot time).
=> Assuming 3 doesn't work, I think it'd be better to spend some CPU here and go with 4 than to introduce the complications (there are a number of open questions this would raise in regards to disk usage IMO) of spooling to disk (2.) or restricting upload chunk sizing flexibility (1.).
@bizybot I think you can push the code that moves this plugin under the x-pack directory.
Force-pushed from 9bba5f4 to ab8d6c6
I am going to merge this into a feature branch, although it did not pass reviews. Given the excellent feedback that I've already got, I know what needs to be changed (make …). I hope the following PRs that will address the issues raised here will not be perturbed by me merging this, since most of the changes here are new code (it won't strain the reviewer with changes upon changes).
Merged ef2f8aa into elastic:client-side-encrypted-snapshot-repos
@albertzaharovits fyi elastic+elasticsearch+pull-request+multijob+packaging-tests-unix-next-sample had been running for 37 hours on ubuntu-16.04 CI workers and I had to cancel the job and delete the build as part of https://github.com/elastic/infra/issues/15235.
@albertzaharovits elastic+elasticsearch+pull-request+multijob+packaging-tests-unix-next-sample has still been running for 37 hours and is taking up 131GB on centos-7.
I won't stop the job or delete the log files yet, in case you want to investigate.
@jmlrt please cancel the job and delete any files. This PR has been merged into a feature branch; it did not pass all compilation tests, i.e. I'm not interested in the results of those jobs. I will fix the issues later in follow-up PRs.
This adds a new bare snapshot repository project which contains the classes implementing encryption (and decryption) input stream decorators that support mark and reset. Relates elastic#48221, elastic#46170
This is the first iteration at the client-side encrypted snapshot repository. I would much appreciate the @elastic/es-distributed confirmation about the approach in the code (I have specific questions) and I also have some questions about the next iteration.
To create the encrypted repository use the following command:
- `delegate_type` can be any type backed by a `BlobStoreRepository`: `fs`, `url`, `azure`, `gcs`, `s3`.
- `password` will be stored in the keystore, but it's easier to have it like this for now.

Q1:
How do you feel about the approach of `EncryptedRepository` extending `BlobStoreRepository` and also containing a `BlobStoreRepository`? Especially about the lifecycle methods:

Q2:
How do you feel about having the encryption metadata (right now containing only the encrypted data encryption key) as a separate blob inside a blob container named from the master encryption key? Having a different blob for every data key helps with rotation in the presence of the eventual consistency issue (rotating keys creates a new file) and avoids partial updates for blobs.
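To make the metadata blob concrete, a hedged sketch using standard JCA AES key wrapping; the blob/container names and variables here are illustrative assumptions, not this PR's actual layout:

```java
// Wrap (encrypt) the per-blob data encryption key under the master key and store it as
// its own small blob; a new data key simply becomes a new blob, so key rotation never
// has to overwrite an existing blob (helpful with eventually consistent stores).
Cipher keyWrapCipher = Cipher.getInstance("AESWrap");
keyWrapCipher.init(Cipher.WRAP_MODE, masterEncryptionKey);
byte[] wrappedDataKey = keyWrapCipher.wrap(dataEncryptionKey);
metadataBlobContainer.writeBlob(metadataBlobName, new ByteArrayInputStream(wrappedDataKey),
        wrappedDataKey.length, true /* failIfAlreadyExists */);
```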
But it does incur one extra call round trip to write the metadata blob - can we overlap the read/write ops, can two `writeBlob` or `readBlob` calls be made simultaneously on the same `BlobStoreRepository`?

In addition I have other questions for the follow-up implementation:
To guard against chosen ciphertext attacks (manipulations of the ciphertext result in predictable manipulations of the plaintext), which AES schemes are vulnerable to (in various proportions, depending on the mode), it is preferable that we use authenticated encryption. This produces an "authentication tag" (an HMAC over the ciphertext) that we must verify before "trusting" the decryption result. For that we have to buffer either the ciphertext before decrypting, or the decrypted plaintext until the tag validation succeeds. We can buffer this in memory or spool to disk in a temporary file. What approach would you favor? I would lean towards spooling to disk. (This implementation does in-memory buffering, which is disastrous performance wise.)
And a related question: given that we would have to buffer (in memory or on disk), what chunk size should we use out of the box? A smaller chunk size requires smaller temporary buffers but increases the API call count.
No key rotation consideration in this.
This could be a separate plugin or as part of the xpack module (current), I don't think it matters for now.
Relates: #41910