Create proposal for DAG encryption. #135
I appreciate the level of thought and effort put into this @kevincox, I think @warpfork has outlined some of the reasons that this is very difficult to get over the line, although certainly not impossible. I think this can probably be merged as is, in the spirit of the notebook/ directory. In terms of pushing it forward, there are two things that I'd suggest:
To reiterate, both @warpfork and I have already suggested that dag-pb may not be the best place to start. While it's true that it's at the base of the majority of data that exists today in IPFS because UnixFS relies on it, it's a home of many regrets and there's little interest in doubling-down on it rather than pursuing more modern and flexible alternatives; our current recommendation is CBOR / dag-cbor. We have some other folks who are considering and pursuing encryption at the data layer, so it would probably be good for us to get a bit more organised with some groups discussions/forums to start coming up with better answers than what we have available today. There are also likely to be some good networking and group opportunities to figure this out.
#### Example
QmVhiZNnvhqrbTLRsh6JnryJ5eTUwygTjTg5hUBKAfeP1H-z7Z9ZajGKvb6C6LaiB7fnsWQZNwq8roEKCFdt3Fbb1qt9
This is a tricky one: there's so much infrastructure baked into the CID pattern, where it either needs to be a valid CID or a CID plus a valid path (/like/this); otherwise these strings don't propagate through the various layers. So adding an arbitrary extension to a CID is going to be a difficult thing to make happen.
There's always handwaving we do about CIDv2 that may allow valid extensions for various reasons, but it really is just handwaving and thought-bubbling for now.
Yeah, this will definitely break some assumptions. If you have a solution that could make this easier, I'm open to ideas. I'm not particular about the exact format. But the "magic" in this design is that you need to pass a key in to get the decrypted data out.
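To make the proposed reference shape concrete, here is a minimal sketch that splits a string like the example above into its CID part and its key part. It assumes the format is `<cid>-<multibase key>` with the key in multibase base58btc (the leading `z`); the function and type names are hypothetical, and nothing here decodes the key or validates the CID.

```python
from typing import NamedTuple

# Base58btc (Bitcoin) alphabet: no 0, O, I or l. Used by CIDv0 strings and by
# multibase strings carrying the 'z' prefix.
BASE58BTC = frozenset("123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz")


class EncryptedRef(NamedTuple):
    cid: str  # CID of the encrypted block, not decoded or validated further
    key: str  # decryption key, still multibase-encoded ('z' prefix kept)


def split_encrypted_ref(ref: str) -> EncryptedRef:
    """Split a hypothetical '<cid>-<multibase key>' string into its two parts.

    '-' is not in the base58btc alphabet, so the first '-' is an
    unambiguous separator for base58btc CIDs and keys.
    """
    cid, sep, key = ref.partition("-")
    if not sep or not cid or not key:
        raise ValueError("expected '<cid>-<key>'")
    if not key.startswith("z") or not set(key[1:]) <= BASE58BTC:
        raise ValueError("this sketch only accepts base58btc ('z') keys")
    return EncryptedRef(cid, key)
```

A plain CID parser handed the full string would reject it, which is the propagation problem described above; the separator and key encoding here are placeholders, not a settled format.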
Thanks for the helpful tips! Since you had so many different comments, I decided to split the topics into separate threads so that the discussion is easier to follow. (The lines they are opened on are irrelevant, but GitHub makes you pick one if you want threads.)
# DAG-PB Encryption
> Engage with folks who have more expertise in encryption

If you have ideas I'd be happy to invite them to the conversation.
> it would probably be good for us to get a bit more organised with some groups discussions/forums

Also feel free to drag me into any relevant conversations. I'm just trying to move this forward.
> or piggy-back existing encryption standards more

I purposely tried to stay very vanilla with the encryption. We are simply encrypting blocks of data. The only interesting bit is that we are doing convergent encryption by using the hash of the object as the key. This has been done before and is well understood. If you are aware of any related standards that would be helpful, I'll gladly consider whether they provide any value.
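The convergent-encryption idea (key = hash of the block) can be sketched in a few lines: identical plaintext blocks always produce identical ciphertexts, and therefore identical CIDs, while decryption still requires the key. This is a toy, stdlib-only illustration under stated assumptions: the SHA-256 counter-mode keystream stands in for a real cipher, there is no authentication, and the `KEY_CONTEXT` prefix and function names are hypothetical. A real implementation would use a vetted cipher from a cryptography library.

```python
import hashlib

# Hypothetical domain-separation prefix so the key is not simply the plaintext's
# ordinary SHA-256 digest (which may already be public as part of a CID).
KEY_CONTEXT = b"convergent-key-v0:"


def convergent_key(plaintext: bytes) -> bytes:
    """Derive the key from the content itself: equal blocks get equal keys."""
    return hashlib.sha256(KEY_CONTEXT + plaintext).digest()


def _keystream(key: bytes, length: int) -> bytes:
    """TOY keystream: SHA-256(key || counter) blocks. Not a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_block(plaintext: bytes) -> tuple[bytes, bytes]:
    """Return (key, ciphertext); the same plaintext always yields the same pair.

    No nonce: each key belongs to exactly one plaintext, so the keystream is only
    ever reused for identical blocks, which is the intended deduplication.
    """
    key = convergent_key(plaintext)
    stream = _keystream(key, len(plaintext))
    return key, bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt_block(key: bytes, ciphertext: bytes) -> bytes:
    """XOR with the same keystream; only a holder of the key can do this."""
    stream = _keystream(key, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))
```

The block's CID would then be computed over the ciphertext. Because the key is a function of the content, anyone who can guess a block's plaintext can recompute its key and confirm the guess; that is the standard trade-off of convergent encryption.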
This is a proposed update to the DAG-PB spec that includes the possibility of transparent encryption. All discussion about the data-model and constraints of the previous specification apply except where specifically changed by this specification. It aims to be a complete superset of the previous spec so that eventually DAG-PB can be retained only for reading old blocks. For now it is expected that DAG-PB will continue to be used for unencrypted data until support for DAG-PB-Encrypted is widespread.
> what dag-jose does

The dag-jose spec appears to be largely unrelated to what I am describing here. If I understand correctly, it is largely about representing a JOSE object in the DAG, whereas here I am talking about encrypting a DAG.
The JOSE spec itself also seemed unhelpful to the goal. It deals a lot with authentication, recipients and encoding of data structures. Since this is all already defined by the IPFS DAG with serialization it seems like unnecessary complication. We can simply focus on encrypting the data, then IPFS and IPNS can provide authenticity.
> the JOSE spec might not be a bad place to start to look at some of these things because they lean on much more recent thinking about how best to structure encryption and signing standards

I'll try to find some time to read it closely, but the objective of the spec itself seems wildly unrelated to what is being proposed here. Although maybe you are right that some of the fundamentals it uses can be reused here.
Warning: I am not a cryptographer but tried to stick to simple patterns that have been encouraged by real cryptographers. If you are a cryptographer I would love feedback. (If desired I can keep your name private.)
> PoC Implementation

Thanks. I'll definitely consider that, but would like to get feedback on the spec first. The implementation would be very simple, so it seems more important to get some early review. (Of course, the implementation always looks simple, so it is important not to put it off for too long.)
The interesting thing in dag-jose that's relevant is that it takes an arbitrary IPLD "payload" and passes it through JOSE to get the right forms for encryption and/or signing. To avoid getting entangled in JOSE payload specifics, they carry the dag-cbor data inline in an identity CID and tell JOSE that it's just bytes. So anything that can be represented as dag-cbor (basically the entire IPLD data model) can be passed through this process.
Now there are concerns around the use of identity CIDs for this, but I think that it's an internal-enough concern that we can treat it as simply an implementation detail and not something that gets exposed outside of the format.
So what dag-jose offers (in theory) is the ability to put arbitrary IPLD data through an existing encryption and/or signing format that's well specified, has been worked on by a large group of well-qualified security researchers, and already has a large implementation base outside of its use with IPLD. So that doesn't seem like a bad starting point to me. Maybe we can do better ourselves; we've tried and others have tried, but we've yet to get anything off the ground.
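The identity-CID approach can be made concrete. A binary CIDv1 is `<version varint><codec varint><multihash>`, and a multihash is `<hash code varint><digest length varint><digest>`; with the identity hash (code `0x00`) the "digest" is the raw data itself, so a dag-cbor payload (codec `0x71`) ends up inline in the CID. This minimal sketch uses hypothetical function names and stops at raw bytes (no multibase string encoding).

```python
def encode_uvarint(n: int) -> bytes:
    """Unsigned LEB128 varint, the encoding multiformats uses for codes and lengths."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


CID_V1 = 0x01
DAG_CBOR = 0x71  # multicodec for dag-cbor
IDENTITY = 0x00  # multihash "identity": the digest is the data itself


def identity_cid_v1(dag_cbor_bytes: bytes) -> bytes:
    """Binary CIDv1 whose identity multihash carries the dag-cbor bytes inline."""
    multihash = (
        encode_uvarint(IDENTITY)
        + encode_uvarint(len(dag_cbor_bytes))
        + dag_cbor_bytes
    )
    return encode_uvarint(CID_V1) + encode_uvarint(DAG_CBOR) + multihash
```

For example, `identity_cid_v1(b"\xa0")` (the dag-cbor encoding of an empty map) is five bytes, but the CID grows byte-for-byte with the payload, which is why identity CIDs only suit small data.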
I think I'm having trouble understanding the spec, then, because it seems that it can only reference a CID. I also can't really find the docs for an identity CID, but it sounds like that keeps the data inline. Furthermore, multiformats/multihash#130 suggests that CIDs must be quite small; most people seem to agree on <64 B. So it seems that you can only encrypt really small DAGs. I must be missing something critical.
This spec talks about encrypting arbitrary blocks, and provides the system for referencing them, which allows for arbitrary-size DAGs. If dag-jose does in fact provide that, then maybe this spec isn't helpful.
## Objective
> dag-pb may not be the best place to start

I am definitely interested in this conversation. I simply picked dag-pb because it is what UnixFS is based on; that is my main interest, so this made sense. However, if it would be more favorable to replace UnixFS with something based on the encoding format of the day, I am completely fine with that. If it makes it easier to discuss, we can start by just replacing the protocol buffer messages with something more abstract. Or, if you are sure that CBOR is the way to go and that we just want to replace UnixFS, I can do that. At the end of the day I just want to get this implemented for UnixFS, so I'll go with whatever you think the best route is. I'm not the expert on the current preference of IPFS developers.
Some prior discussions that may be interesting:
There's been plenty more, but we've not organised it very well!
Okay, circling back around in my queue -- I confess I don't have any time allocated to review encryption proposals in depth. I don't anticipate this changing any time soon (unfortunately). Since this is going in as an exploration report file, I'm still provisionally okay with merging it, on the understanding that it collects some interesting notes (with no warranty implied!) and makes something discoverable in the hope that more community could find it and engage in the future. I think we could also put a link to this file in the repo from https://ipld.io/docs/synthesis/encryption/ , and I'd be happy to do that after merging. Any objections?
I've added links to the page, as threatened :) @kevincox, do you mind if I also put a short tagline about the author at the top of the file? Just name and GitHub handle and/or email or something. (We haven't done this 100% consistently in documents in the notebook folder, but it seems like a good practice, especially when the hope is that people connect over it.)
Sure, you can put ***@***.*** or my GitHub handle.
See requests and discussion on ipfs/notes#270