Make MRenclave measurement independent of file options appearing in the Graphene manifest #2208
This would be an excellent feature. As Prakash mentioned, the primary use case is federated learning. In this case, there would be an enclave on different remote nodes. Each enclave would run the same software (DL model training), but the data on each node would be different. The participants have to register their dataset hash before the federation is even proposed, so we know the expected hash; the worry is that, while the process runs in the enclave, the data owner could change the data after its hash has been confirmed. So the question boils down to: is there an easy method to make sure that the files in a trusted directory (outside of the enclave) don't change while the enclave exists? Or do we need to explicitly check the data against the hash every time we load the files? |
Why not use protected files for this? See https://graphene.readthedocs.io/en/latest/manifest-syntax.html#protected-files. This is the intended way of shipping additional files at runtime. |
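For reference, a protected-files setup in a Graphene manifest looks roughly like the sketch below. The paths and the key value are placeholders, and the exact syntax may differ between Graphene versions, so treat this as an illustration and check the linked manifest-syntax page for the authoritative form:

```toml
# Hypothetical example -- consult the Graphene docs for your version's exact syntax.
# Wrap key used to encrypt/decrypt the files (provisioned securely in production):
sgx.protected_files_key = "ffeeddccbbaa99887766554433221100"

# Files transparently decrypted and integrity-checked inside the enclave:
sgx.protected_files.dataset = "file:data/train.csv"
sgx.protected_files.labels  = "file:data/labels.csv"
```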
Do protected files get encrypted and loaded into the enclave? I guess what we're worried about is that this is typically the case where there is a directory of maybe a few thousand files with each file being maybe 100 MB in size. So I'm just wondering about performance issues for speed. |
Thank you for the question. In addition to the performance hit that @tonyreina alluded to, I am afraid protected files fundamentally do not solve the problem at hand. Here is why: the data at rest resides with the data owner. No one outside the data owner node has access to the data. So we are not "protecting" data from the data owner. It does not matter if the data at rest is encrypted or not; there is no privacy issue here. The attack we are worried about is the data owner itself providing incorrect data during training, after having committed to a dataset before training starts. So as far as I understand, there are only two ways to solve this: 1. either list the files as trusted files in the manifest, or 2. the application, after loading the data, computes the hash and checks whether it matches what the data owner committed to in the first place. Whether the data at rest is encrypted or not does not affect the situation, in my opinion. Thanks |
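Option 2 from the list above is straightforward to sketch. This assumes the committed hash is a hex-encoded SHA-256; the function name and chunked-read scheme are illustrative, not part of Graphene:

```python
import hashlib

def verify_dataset(path, committed_hash, chunk_size=1 << 20):
    """Re-hash the dataset at load time and compare against the hash the
    data owner committed to before training started (option 2 above)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so multi-GB files need not fit in enclave memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == committed_hash
```

The application would call this right after opening each data file and abort training on a mismatch.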
To put it another way, I think what @prakashngit and @tonyreina want boils down to this:
There is no need to encrypt or somehow additionally protect the files from the second set; integrity protection is all that matters for them. Of course, one can write her own code plugged into Graphene to implement this in an ad-hoc way. But this requires non-trivial implementation effort, and it makes sense to incorporate such code in mainline Graphene. Also, this is exactly the scenario envisioned for SGX's CONFIGID (see e.g. openenclave/openenclave#3054). So it makes sense to implement some default behavior for this in Graphene. |
Yes, they are encrypted at rest and decrypted + integrity-checked inside enclave memory. The decryption uses HW acceleration, so it should be quite fast (but still, a small overhead may be seen).
Thanks for the explanation, I think I now see what your case is exactly. I proposed protected files because they are a superset of trusted files - they also do the integrity check and additionally allow shipping files at runtime. The encryption is there and doesn't hurt, but you also get integrity in the package. Anyways, I'd suggest either using protected files or just listing everything as trusted in the manifest. It seems to me that you treat signing/verifying the Graphene enclave and signing/verifying the input for it as completely separate things which can be done by different entities. Unfortunately, from what I know this doesn't make sense, at least in the case of Graphene - if you control the filesystem contents, then you also control the Graphene code, so the first signature (the one certifying that you're indeed running a specific Graphene version) is meaningless; filesystem changes may override it. Maybe it could be possible to separate both, but that will always be app-specific and very brittle from the security perspective. So, my point is, you should think of the input the same way you think of the code which runs inside SGX. You may just shift the signing of the enclave to the entity which provides the input and you'll have the same security properties, but no need for two signatures. |
The reason not to use protected files is that the data files in question may have use in addition to the enclave usage, but placing them in protected files requires a copy of that (large) dataset in order to encrypt it. The envisioned use case only requires integrity protection, so using protected files creates a large wasted space (100's GB or TB). |
Ok, that's a good argument against using protected files in this case. Anyways, what about my last point? ("You may just shift the signing of the enclave to the entity which provides input and you'll have the same security properties, but no need for two signatures.") |
We want the attestation to represent the algorithm being performed on the input data (including algorithmic defenses against the platform host manipulating the data) to provide assurance to other parties. If the data owner (which in FL is likely the platform host) signs the enclave, then verification of the attestation does not prove anything of value to the verifier(s) since they would need the data to reason about what the attestation represents. If instead the attestation reflects the algorithm (code) and a guarantee that the algorithm faithfully prevents the data modification, a verifier can proceed to reason about the algorithm independently of the data. As a bonus all enclaves are the same at each endpoint. |
Ok, thanks for the explanation. This is exactly what I was afraid you wanted to achieve with the solution initially suggested by @prakashngit. In short, it doesn't work this way, and the guarantees you're expecting aren't actually provided by @prakashngit's design. The main problem is that if you allow the data owner to define the contents of the filesystem, then they can easily take over control of the whole enclave and e.g. overwrite the algorithm code (at runtime, so it wouldn't be reflected in MRENCLAVE), rendering the first measurement meaningless. Simple example: the data owner plants a dynamic library (.so) in a place where it will take precedence over the intended one and this way executes their own code. And as you said, the "algorithm owner" doesn't have the input data, so they can't really verify whether anything suspicious was added (quoting: "since they would need the data to reason about what the attestation represents").
So, because of what I said above, this is not possible, at least in the general case. We could try to rescue this idea by limiting what can be added by the data owner, but this is highly app-specific and will almost always result in insecure setups if left to the users. Example: a user wants to create a TF-as-a-service and allows others to provide TF models and data. The TF engine would go into MRENCLAVE, and the input would go into that second measurement. But what they don't know is that the TensorFlow SavedModel format wasn't designed to handle untrusted data, and if you control SavedModel data you can execute arbitrary code, thus breaking the TF-engine-provider's assumptions. There are actually quite few setups where this is secure; e.g. you could limit the data owner's powers to a specific directory with specific file name patterns and then teach users which formats are secure and which aren't, and ensure they load only the safe ones. But as said above, this makes the whole solution very app-specific and puts a lot of trust in its users, who most likely are not trained in security, especially in the tricky SGX threat model. |
My five cents: to me the main (actually, only) general requirement here is that, as a verifier of an attestation, you can reason compositionally/separately about (files of) separate components of your application, and hence manage (e.g., distribute) roots-of-trust/measurements independently (and potentially differently) for the various components. E.g., in Prakash's case it is about separating the application from the specific workload (so you can enforce that one party in the FL scenario always uses the same dataset). Note that, crucially, from an assurance perspective you want exactly the same behavior from the Graphene runtime in terms of file integrity verification as if all the files were defined in the same manifest! Of course, as already today, as application developer and attestation verifier you have to be very careful to make sure which files have to be measured, which values can be trusted, and where you can trust the application to safely process untrusted files (i.e., what you have in sgx.allowed_files). In fact, you could see Prakash's case as one where you would have simply placed the datasets in
Picking up on what dmitrii mentioned above, one easy way to achieve the above requirement would be to
I think that should be easy to do and enables compositional reasoning of attestation measurements. I'm also convinced that Prakash's scenario is unlikely the only use-case which has such a requirement. Another cases might be just to separate the core graphene files (PAL, maybe libc) from the actual application files as this also could make devops and policy-management easier from a separation of concern perspective even in simple outsourcing use-cases. |
How do you exactly define composition of the measurements? Because from what @nsxfreddy wrote (and from what @prakashngit explained to me in a call yesterday) the algorithm owner doesn't have the data which were added to the Graphene configuration, yet they want to be able to reason about the algorithm which is running there (so, they want to reason only from MRENCLAVE, and treat CONFIGID just as a black box commitment to something constant). And this is not possible, unless you restrict the added data very precisely (and usually in an app-specific way).
The difference is that in the solution with allowed files the algorithm owner could verify what files exactly are allowed in the config (i.e. paths and names). If you allow data owner to add trusted files entries, you're out of control what they add to the filesystem.
For this you can just use the hash of the .tar.gz of the Graphene release which was included in the final image (which you need to have) and you'll have exactly the same security properties. You can't reason about which Graphene binaries are running inside the enclave without knowing the application files, because they could overwrite Graphene. You either know all the binaries upfront or you can't really reason about what's running inside, as Graphene doesn't have an internal security sandbox. Overall, I think it would be possible to add a very limited and restricted "measured mount" which would allow mounting a directory that would go into CONFIGID instead of MRENCLAVE, but that would require a lot of precautions and would still be very risky from the security perspective. And this is already different from the design proposed in this issue, which assumed that data and code are separate and FS data can't take over code. |
You wouldn't treat CONFIGID as a blackbox, only the particular value of the content of the filenames specified in the CONFIGID manifest. Of course for this to work you have to rely on the application to do proper input-validation of these files. The semantics of the overall attestation of course has to be application specific and there is always some dependencies between the manifests of the components (as it true for any composition). The key part is though that it makes it easier to reason and work with if you can decompose.
Well, a data owner (or, in fact, anybody ever running any enclave) of course can add arbitrary files with arbitrary paths, but the key part would be that as long as this is visible in the attestation (via MRENCLAVE and CONFIGID), a verifier can simply reject/ignore that enclave. The key part for that to work, of course, are the steps in my proposal where the enclave verifies that the (hash of the) configid-manifest matches CONFIGID and then enforces the combined policy from both manifests as if they had been in a single manifest
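That enclave-side step can be sketched as follows. The two-manifest layout here is hypothetical (trusted files modeled as a simple `path -> sha256-hex` map, the extra manifest as one `path hash` pair per line), since Graphene has no such API today:

```python
import hashlib

def verify_and_merge(base_trusted, extra_manifest_bytes, configid):
    """Confirm that the hash of the second (configid-)manifest matches the
    CONFIGID value reported in the attestation, then enforce the union of
    both trusted-file policies as if they had appeared in one manifest."""
    if hashlib.sha256(extra_manifest_bytes).digest() != configid:
        raise ValueError("CONFIGID does not match the provided manifest")
    # Hypothetical format of the extra manifest: one "path sha256-hex" per line.
    extra = dict(line.split() for line in extra_manifest_bytes.decode().splitlines() if line)
    merged = dict(base_trusted)
    for path, digest in extra.items():
        if merged.get(path, digest) != digest:
            raise ValueError("conflicting trusted-file entry for " + path)
        merged[path] = digest
    return merged
```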
Hmm, but as a verifier of an attestation there is no obvious relation between the hash of the tar.gz and the measurement? That said, you can of course achieve the same compositional logic I have outlined also without CONFIGID. Assuming the manifest is loaded and measured last into EPC, you could essentially compute the state of the hash function leading to MRENCLAVE, and based on that value and the manifest, a verifier could compute MRENCLAVE and verify not with MRENCLAVE as ground truth but with this partial hash state and constraints on manifest values. (In fact, https://arxiv.org/abs/2008.09501v1 does that for slightly different reasons.) It's just that with CONFIGID this is much simpler.
I think there is a disconnect here: in my proposal the verifier knows exactly all the filenames and all their hashes, just for some she might not care about what the actual content is as she trusts the application to do proper input validation. So for prakash there is no need for a sandbox. Of course, there could be other applications, e.g., where your main application is an interpreter such as with faas or for block chain applications like in Private Data Objects, where you would want the application to be a sandbox. But again, it's requirements on your applications, it's not something graphene has to provide.
I disagree, in my proposal there is nothing which distinguishes data from code from a graphene level (and practically speaking there is not much graphene can even do, given that any data essentially can be (interpreted) code). Of course, applications need to be super careful in how they write their manifest, but this is true even right now. I don't think anything in my proposal has fundamental changed that as the assurance provided in my proposal is equivalent to just having had all trusted files in a single manifest. Note that graphene really gives you (by necessity) only assurance that trusted files are guaranteed to be integrity protected but cannot say anything about the "goodness" of the file (e.g., non-buggy code, non-malicous data). The trust on the latter always will have to come from the application |
Yup, I think this was the confusion - I assumed that the algorithm owner doesn't know what was added to the manifest, only the resulting CONFIGID. I guess in the future I'll just ask for a detailed information and verification flows for such designs, will be harder to misinterpret than long discussions :)
I agree with your disagreement :) My comment (as noted above) assumed that the algorithm owner doesn't know the added entries. So, if we assume that the algorithm owner can verify all the entries in the manifest (original + added by data owner) then I think it's fine from security / security-foolproofness perspective. But... If both parties know all the entries, why bother with the split at all? (that's why I assumed that the algorithm owner doesn't know the added entries, and thus followed with my security concerns) Is it because you want to commit the files after the enclave was signed? But then, you could just either sign it on the data owner side or sign it after commitment - after all MRSIGNER is not really useful in this scenario if we already verify MRENCLAVE (MRENCLAVE is "stronger" than MRSIGNER).
I'd say it's just a tooling problem, which is quite easy to solve (much easier than implementing CONFIGID support). As noted, you need the final image of the enclave to reason about anything, and if you have it, it's quite easy to verify that it was generated from given Graphene binaries - for LibOS you check the hash of
And maybe some explanation from my side on why I'm pushing so hard to really understand what you're trying to achieve:
|
This I completely agree with (and have mentioned to Mona et al in the past) ...
... although that part is somewhat debatable. I don't think CONFIGID support should be really complicated, but it is certainly also true that all the code already exists somewhere in Graphene for a non-CONFIGID solution. All you would have to do is refactor and re-use it so that (a) the build returns the pre-manifest-load-into-EPC hash state (in addition to, or instead of, MRENCLAVE) and (b) there would be a library with a function which computes MRENCLAVE given this pre-manifest-load-into-EPC hash state and a manifest as input. It does have the advantage that it wouldn't affect anything in the trusted part of Graphene (but on the other hand it seems a bit less intuitive for folks who are used to MRENCLAVE and know KSS/CONFIGID). |
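The (a)/(b) split above can be sketched with `hashlib`'s copyable hash objects. This is only an illustration of the "resumable partial state" idea - the real MRENCLAVE is built from SGX's EADD/EEXTEND update structures, not a plain SHA-256 over raw bytes, and the function names are made up:

```python
import hashlib

def build_partial_state(binaries):
    """(a) The build emits the hash state after measuring all binaries,
    just before the manifest would be loaded into EPC."""
    h = hashlib.sha256()
    for blob in binaries:
        h.update(blob)
    return h

def finish_measurement(partial_state, manifest):
    """(b) A verifier combines the pre-manifest hash state with a concrete
    manifest to obtain the final measurement, without needing the binaries."""
    h = partial_state.copy()  # copy, so the partial state can be reused
    h.update(manifest)
    return h.hexdigest()
```

The verifier never sees the raw binaries, only the partial state plus a candidate manifest, which is exactly the compositional property discussed here.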
So, do we agree that this whole issue is only about tooling, and KSS/CONFIGID doesn't give any more properties than we can have without it? So, the actual problem underneath this issue is "how can end users create enclaves compositionally"? (I'm not sure yet how to define precisely what "compositionally" means in this context, so that's open for input) If so, then I think we should analyze potential solutions in two aspects:
For 1. I think both solutions can be identical for the end users - whether we'll use CONFIGID or not, I wouldn't expect end users to know anything about it - they'll be just calling some CLI/library wrappers for this functionalities. |
It is rather the verification than the creation which is the issue. |
Isn't this the same thing actually? Building an enclave is just simulating loading of binaries to calculate hashes, and I think for verification you usually want to do the same (get the claimed Graphene binaries, get the specific app version, build and compare hash). One disclaimer: this assumes that the enclave building is reproducible, but I think it is in Graphene? Also, moving one step back, what about my question about the algorithm owner knowing the added manifest entries? (but not the data) If they know them, then the enclave can be easily built with just concatenation of the two entries lists (after verifying the contents of the latter). What's the problem with doing it this way? |
Reproducible build is a somewhat orthogonal argument to the composition in verification I'm talking about, i.e., it is related to how you get confidence in various binary code artifacts (PAL, lib, executable) vs. whether the artifacts plus other values together make a meaningful manifest. It's the latter which I think it would be good to make composable and enable some divide'n'conquer approaches for. Yes, with current tooling you could, if you have all binary artifacts and manifest inputs, compute MRENCLAVE using the existing Graphene tooling to verify whether an MRENCLAVE is "good" even though you didn't know all the "ingredients" a priori. However, if some values are relatively dynamic, that doesn't really work well: (a) it is all-or-nothing in terms of the "ingredients" you need to create an MRENCLAVE candidate to verify against during validation, which is neither efficient nor very usable if you think of this verification from a programming perspective; and (b) more importantly, it doesn't work if the verification happens inside an enclave, as the tooling doesn't support that. I think my proposal above does address this, certainly for the two use cases mentioned, in a clean and simple way, e.g., hiding all the gory details, many of them relying on Graphene internals which might change, behind a simple API which can be used during verification (i.e., my point (b))
See above (essentially, easy automation/programmability and support inside enclaves for the case the "ingredients" are not all more or less static and defined in "human time") |
@prakashngit @g2flyer @mkow What about a completely different approach? With Graphene, we've been thinking of having a central entity that would simplify SGX remote attestation and secret provisioning. Currently, we don't have a good attestation story when there is a cluster of SGX enclaves working towards one goal. E.g., the Federated Learning case has a bunch of loosely coupled SGX enclaves (on different machines, probably in different data centers), and the SGX remote attestation becomes a deployment/verification nightmare. So for a cluster of SGX enclaves, it makes sense to have a single entity that manages remote attestation and secret/key provisioning. The end users do not perform SGX attestation of separate enclaves but only perform SGX attestation and verification of a single "attestation service". And this "attestation service" is bootstrapped with a policy file that contains all the measurements/policies for each of the SGX enclaves. There are already multiple centralized attestation services existing (or announced): Microsoft Azure Attestation, SCONE CAS, Fortanix Confidential Computing Manager, etc. And it seems that there is already an open-source attestation service that may fit well: Marblerun Coordinator from Edgeless Systems. See https://www.youtube.com/watch?v=e_7q1uOpCqw and https://github.com/edgelesssys/marblerun. In particular, Marblerun Coordinator is:
Now applying this "centralized attestation service" to our use case of Federated Learning:
I believe this Coordinator ("centralized attestation service") approach also solves the other problem of @prakashngit -- #2243 ("Sign the Cert by an external Authority rather than Self Signed"). Since the Coordinator serves as a root CA (or maybe an intermediate CA, if this is needed), there is no need to send RA-TLS-enhanced certificates to end users; instead these can be normal classic X.509 certificates. What do you think about this approach? P.S. One obvious drawback of this approach is centralization -- now there is a Single Point of Failure: the Coordinator. On the other hand, it is much, much easier for deployment and attestation/verification. |
Thanks a lot Dmitrii for summarizing this discussion. I think this approach of close integration of Graphene with Marblerun can serve a number of use cases even outside of K8s/service-mesh deployments of confidential compute applications with Graphene. I am tagging Felix and Moritz from Edgeless here as well: @flxflx and @m1ghtym0 |
Let me answer @g2flyer first, then I'll comment on Dmitrii's and Mona's proposal in a separate comment.
Sorry, I wasn't clear, by "reproducible build" in this context I meant only the final step, the enclave "assembling" from binaries and configs.
So, I meant +/- this part :) Although I'm not sure what you mean by "making a meaningful manifest" - do you want Graphene tooling to sanity-check the interactions between the assembled components? Like checking whether the added manifest entries can overwrite Graphene, or something similar?
Can you give me an example where something less than all makes sense? In the design discussed in this thread it seems that both sides know all the entries in the final manifest (one just doesn't know the file contents, but they aren't directly part of MRENCLAVE). What current tooling is for sure missing is a way to stop it from calculating the hashes for those trusted files which have hashes added manually to the manifest (but that would be a trivial change).
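That "trivial change" could look roughly like this (a sketch, not the actual Graphene signer code): entries that already carry a manually supplied hash are kept as-is, and only the rest are hashed from disk.

```python
import hashlib

def trusted_file_hashes(entries):
    """`entries` maps a file path to either a precomputed hex sha256
    (e.g. committed by the data owner, whose file contents are unavailable)
    or None (hash the local file, as the current tooling always does)."""
    result = {}
    for path, precomputed in entries.items():
        if precomputed is not None:
            result[path] = precomputed  # trust the manually supplied hash
        else:
            with open(path, "rb") as f:
                result[path] = hashlib.sha256(f.read()).hexdigest()
    return result
```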
Good point, I think the verification tooling should be written in no-stdlib C then, so that you can use it wherever you want. Ad your proposal, see below.
Hmm, but as I asked above in this very comment, could you show an example of a model/flow in which this makes sense? It seems to be that in the initial problem which was presented in this issue both parties know all the data which go directly into MRENCLAVE, so there's no need for these partial MRENCLAVEs at all (the idea itself sounds fine, although as with configid, I don't see a workflow/use case for them which doesn't reduce into a simple recalculation of MRENCLAVE over all the binaries). |
My comments on Dmitrii's and Mona's proposal below. Sounds good, but I think it requires the few points we already concluded on with @g2flyer, which are needed also for @prakashngit's use case, namely:
I think I got lost here. In 3. you mention "what we called the second, extended Graphene manifest" as something which is known to the Coordinator, but is a secret for the Data Owner. But originally this was known to both sides (and that's why I couldn't understand the split of the manifest into two parts, which I still don't understand why would ever be needed). Overall this proposal makes sense, but I think it solves a different problem - how to easily manage secret bootstrapping into enclaves at scale. The problem which @prakashngit had was that the contents of some trusted files can be unknown to the Algorithm Owner and Coordinator, but I believe this whole issue reduces to just making our attestation tools work with file hashes instead of file data (right now it forcibly calculates all trusted hashes itself). |
@mkow Dmitrii used the term "secret" in the sense that those hashes will be passed to the enclave over a secure channel during the phase after attestation that is typically called secret provisioning. We need to get those hashes inside the enclave (the old proposal suggested using CONFIGID); we are just proposing to send them over a secure channel, as those hashes do need to be registered with the Coordinator anyway - both sides know them and they are not secret to the enclave. I also agree that our initial idea is to just move towards this overall approach to easily manage attestation and secret bootstrapping. Our general impression is that it will address both requests that Prakash has for the FL use case via the various policies exposed by the Coordinator. But we need to work through those scenarios in a bit more detail. |
Yes. I think it is. By "verification" of this, do you mean just a thorough code review of our
I don't see a need for this with the Coordinator approach:
...But in general yes, it would be nice to have a standalone nostdlib C library that calculates MRENCLAVE and other measurements in Graphene. Currently we don't have in our
Again, I propose a slightly different change to Graphene -- Graphene must be able to augment the in-enclave TOML representation of the manifest with additional entries (like ...But in general yes, this is a good feature to have and also trivial to add. TLDR: With the Coordinator approach, we don't have a "second extra manifest" but instead have a "Coordinator configuration file" which contains all these extra details and emits them to Graphene instances upon startup.
My wording was bad probably. I shouldn't have called it a "secret", it's just an extra piece of information for the base manifest. There is no secret in |
The approach you sketched makes sense to me @dimakuv. I think it is a good use case for Marblerun. In general, we're happy to help with integrating Graphene and Marblerun.
True. To reduce the risk, we plan to replicate the Coordinator using Raft (via etcd) in the future. |
@flxflx Great to hear that you are going to look at Raft support for the Coordinator. That is definitely an area where we can work together. One other idea I have is for Marblerun to explore integrating with Azure MAA and Azure Key Vault and to use the attestation/key provisioning services provided by Azure instead of Marblerun's own. This way anyone integrating with Marblerun will also automatically get integration with Azure. What are your thoughts on that? |
That's an interesting thought. We have MAA support on the roadmap but haven't spent much thought on AKV. The primary reason being that AKV seems to rely on HSMs without RA. Thus, establishing a secure channel (in the CC sense) between AKV and enclaves doesn't seem possible. But I may be mistaken here. |
Yes. But the fact that the C loader and the Python signer give the same measurements is a bit reassuring here.
Ok, now that I re-read everything I see that it's not directly needed in this approach, as Coordinator doesn't need to calculate anything in runtime. But on the other hand (also, a bit off-topic to this whole discussion), this approach assumes that "all enclaves have the same MRENCLAVE, known to all parties" - in practice, all the parties will need a tool to actually verify this MRENCLAVE (i.e. take binary blobs of the claimed software and verify if it builds into the same enclave hash).
Yup, now I get it, this "secret" misled me. Overall, this approach seems to me like circumventing the inability to easily calculate the updated MRENCLAVE - we just push some of the variables to runtime, to not have to calculate the updated MRENCLAVE, right? If that's the case, wouldn't better tooling on our side solve this problem at the core? (as I proposed above) It wouldn't require any changes to Graphene internals (especially no new APIs) and I think it would solve all the problems at once. And it's still compatible with Marblerun, the difference would be that all the parties would calculate the MRENCLAVE with the appended manifest entries and use this as the expected MRENCLAVE. |
Yes, this is correct.
Yes, your proposed approach also works. |
The valid part of this issue will be resolved when we implement gramineproject/gramine#11. I'm closing this one to keep the discussion in one place. |
This is a feature request for supporting extended measurements via SGX's CONFIGID (instead of measuring user trusted/allowed files via MRENCLAVE).
Context
In applications such as Federated Learning, one would like multiple data owners to execute the same training algorithm but with different datasets. We use Graphene to run the training algorithm. In terms of the manifest, there is a base manifest that gets shared with all data owners. Each data owner then adds the list of data files that it will use for training to the base manifest before building the Graphene image.
Ideally, the data owner should list these data-files as trusted files on the manifest so that files used for training get pre-committed before the training task starts. However, in this case, all data owners will have different MRenclaves, and it becomes hard to use the MRenclave to verify that all data-owners are executing the same training algorithm.
Our current workaround is to mount a common folder as an allowed folder (and use wildcards as well) so that all manifests, and hence MRENCLAVEs, are the same. The challenge with this workaround is that it becomes the application's responsibility to capture measurements of files before Graphene deployment and to verify their integrity when the files are loaded within Graphene (since we can no longer make use of the trusted file option).
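That workaround can be sketched as two steps (function names are illustrative): the application records per-file hashes before deployment, then re-checks each file against that record whenever it loads one from the allowed folder inside Graphene.

```python
import hashlib
import os

def measure_directory(root):
    """Before deployment: record a sha256 per file under the data folder."""
    measurements = {}
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            measurements[os.path.relpath(path, root)] = digest
    return measurements

def load_verified(root, relpath, measurements):
    """Inside Graphene: load a file from the allowed folder and refuse to
    use it unless it matches the pre-deployment measurement."""
    with open(os.path.join(root, relpath), "rb") as f:
        data = f.read()
    if hashlib.sha256(data).hexdigest() != measurements.get(relpath):
        raise ValueError("integrity check failed for " + relpath)
    return data
```

The measurements dict would itself have to be committed (e.g. registered with the other FL participants) before training starts, otherwise the check proves nothing.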
Feature Request
From @dimakuv I learned that it is possible to separate measurements into two fields: 1) a "basic" measurement captured via MRENCLAVE, and 2) "extended" measurements (measuring the list of trusted/allowed/protected files, etc.) captured via CONFIGID.
It would be great for our use case (and I guess also a number of other data analytics use cases) if the above feature gets added to Graphene. If it is already part of the plan, could I kindly ask for a timeline of when this feature can be expected?
Thanks!
Prakash