
Support dag-jose codec by default #8364

Closed · 3 tasks done · Tracked by #8343
oed opened this issue Aug 20, 2021 · 13 comments · Fixed by #8569
Labels: kind/feature (A new feature), need/maintainer-input (Needs input from the current maintainer(s))

Comments

@oed

oed commented Aug 20, 2021

Checklist

  • My issue is specific & actionable.
  • I am not suggesting a protocol enhancement.
  • I have searched on the issue tracker for my issue.

Description

With ipld-prime merged (🎉) into go-ipfs, we (the Ceramic team) are currently working on a plugin that adds support for dag-jose to go-ipfs. However, the plugin approach has several disadvantages. The main one is that a secondary binary needs to be distributed and used by everyone who wants to support dag-jose. Another is that it's unclear how the plugin would work with different versions of ipfs (maybe someone could provide clarity around this?).

To mitigate the issues with adding dag-jose via a plugin, support could be added for it in go-ipfs directly. I created this ticket to gauge support for this from the go-ipfs maintainers.
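To make concrete what "support in go-ipfs directly" would mean mechanically: with ipld-prime in place, a codec is essentially a pair of encode/decode functions registered under its multicodec code (0x85 for dag-jose). Below is a minimal, illustrative sketch assuming go-dag-jose exports Encode/Decode functions matching go-ipld-prime's codec.Encoder/codec.Decoder function types; the package name here is hypothetical.

```go
// Illustrative sketch only: registering dag-jose with go-ipld-prime's default
// multicodec registry, which go-ipfs consults after the ipld-prime migration.
// Assumes go-dag-jose exposes Encode/Decode with the codec.Encoder /
// codec.Decoder signatures.
package dagjosesupport

import (
	"github.com/ceramicnetwork/go-dag-jose/dagjose"
	"github.com/ipld/go-ipld-prime/multicodec"
)

// 0x85 is the code assigned to dag-jose in the multicodec table.
const dagJoseCode = 0x85

func init() {
	multicodec.RegisterEncoder(dagJoseCode, dagjose.Encode)
	multicodec.RegisterDecoder(dagJoseCode, dagjose.Decode)
}
```

Whether that registration lives in a separately distributed plugin or in go-ipfs itself is exactly what this ticket is about.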

DAG-JOSE adoption:
Several projects are already looking to use dag-jose and it really is the easiest integration point between the Decentralized Identifier (DID) community and IPFS.

  • Specification
  • In Ceramic, dag-jose is a central component
  • IdentityHubs also uses dag-jose as the main (currently only) way to store signed/encrypted data
  • Textile is experimenting with an alternative implementation of threads which uses dag-jose for signatures
  • QRI has expressed interest in the format as well

Implementations:

  • javascript, currently used in production by Ceramic
  • go, was created as part of a joint grant from PL + EF

Maintenance:
The go-dag-jose code is very minimal and the overhead in go-ipfs should be negligible. As for the go-dag-jose package itself, it was created by @alexjg, but the Ceramic team is happy to take over maintenance work. The repo can be transferred to the ceramicnetwork or ipld GitHub orgs, whichever makes the most sense.

If the go-ahead is given, we would be happy to contribute the integration to the go-ipfs code base!

@oed oed added the kind/feature A new feature label Aug 20, 2021
@BigLep
Contributor

BigLep commented Aug 20, 2021

@oed: thanks for opening. The use case makes sense. This is hitting on bigger questions about how 3rd-party plugins are supported (onboarding, maintenance, offboarding). I know there have been some conversations about this, but we need to track that down and get decisions made. I haven't engaged much here myself, so I don't have a timeline on this yet, but we'll provide an update on where we're at by the end of next week.

@BigLep BigLep added the need/maintainer-input Needs input from the current maintainer(s) label Aug 20, 2021
@warpfork
Member

+1 to @BigLep's comment that some story work is needed on what it means to have a plugin be contributed, what it would mean for something to be bundled or become default, etc.


Here are some technical questions I suspect will be asked, because they could be relevant to estimating impact on release processes and maintainability:

  • How many SLOC of new code is this? (As a very crude heuristic for the real question, which is "how much maintenance might this require?")
  • What's the compiled binary size?
  • What's the churn rate of the serial form and its mapping to the Data Model? (AKA is the spec solid, and has it settled?)
  • What's the approximate churn rate of the implementation code?
  • Does the implementation code have any transitive dependencies that are unusually concerning on any of the above axes?

These are in addition to the other questions that are likely relevant (e.g. relating to the human side of ownership and maintenance, or relating to the size of the ecosystem pull for the feature, etc). I'm identifying these as some of the more objective, technical, and thus hopefully easy-to-answer metrics I can think of :)

@warpfork
Member

warpfork commented Aug 25, 2021

Here are my attempts to answer those questions:

  • SLOC:
    • methodology: some good ol' find -name \*.go | xargs wc -l.
    • result: 2618
    • detail: This seems like a very moderate number.
  • binary size:
    • methodology: todo
    • result: todo
  • churn in spec:
    • methodology: eyeballing it.
    • result: it seems solid.
    • detail: The dag-jose spec hasn't moved in a while. The upstream JOSE specs that dag-jose is based on are firmly settled, afaik. I think @oed and friends have already been using this via the JS implementation, so we expect it to be load-bearing already, which increases the confidence in its solidity dramatically.
  • churn in implementation:
    • methodology: eyeballing it.
    • result: We don't know until it becomes load-bearing, I guess. But I see no reason to expect this to be bad/high.
    • detail: I have inspected the code briefly. There are probably more manual implementations of "Node" in here than seems like a joy, but it works, and I don't argue with what works. The interfaces this is coupled with are interfaces in go-ipld-prime that are coupled with very nearly "everything", so this does not represent significant new burdens (it just repeats the same burdens we already have).
    • detail 2: I remember we discussed for a long time whether this would be better implemented as an ADL vs a Codec. The reason we discussed this was concern that treating it as a Codec would result in a design where low-level details aren't inspectable. I don't know if we were ever able to confirm that the design that happened definitely avoids future problems like that. Personally: I don't care (and I love to say that -- it means good segmentation of boundaries). If the folks who are using this are also promising to maintain it, and they've already accepted whatever the situation is on this, then I'm happy to let them make those choices and own whatever the consequences are. Also: since, afaik, the JS implementation should've forced the same discussions, if problems haven't been raised there yet, we're probably safe to assume problems won't be raised for the Go implementation either.
  • transitive dependencies:
    • gopkg.in/square/go-jose.v2...
      • is somewhat sizable -- 🤔 may be good reason to double-check the binary size impacts.
      • but is maintained by somebody else, and I think we can safely assume they'll maintain it well -- 👍 not worried about this.
      • EDIT: actually, this is also just a test dependency. Excellent. Even less to worry about.
    • that's it. (There are some testing dependencies that are large but I think we do not generally concern ourselves with this; it's unfortunate that go mod doesn't have a way to account for them separately, but that's a legibility problem, not a real problem.)

Measuring the binary size seems to be the trickiest part, so I'm going to post these notes with that still as a todo. I think the best way to do this will be to literally build the plugin and look at the file size of go-ipfs built with and without it. (The number with real-world relevance is how much larger the total binary gets -- measuring the package alone is an underestimate, and measuring the package plus its transitive dependencies is an overestimate, because many of those dependencies are already in go-ipfs. There exist tools like goda weight to explore binary sizes, but I don't think they help with this trivially, unfortunately.) @oed, do you think you have anyone who could do this and report? Or @aschmahmann, do we have any tool or script that would make this check easy?

@oed
Author

oed commented Aug 25, 2021

We would definitely be willing to look at the binary size impact of this. If anyone has pointers on the best way to go about it, that would be helpful ofc!

Afaict gopkg.in/square/go-jose.v2 is only a test dep?

About the spec: yeah, we've been using js-dag-jose, which implements the spec, in production for quite some time. I don't expect the spec to change.

@warpfork
Member

I'm not actually deeply familiar with this myself -- but I think https://github.com/ipfs/go-ipfs/blob/4e132af3ef11d4d1690a0e5b9e3639526eb742f6/docs/plugins.md#preloaded-plugins contains some description of how to get something built? Then just good ol' du -h bin/go-ipfs (or wherever it comes out), with and without the thing, is the ultimate answer.

(I think that how big the thing is if built as a golang plugin (concretely: go [...] -buildmode=plugin happens at some point) is a different question. That would be interesting to know too, but optional. We can perhaps skip that info because, regardless of where the source code actually lands, it seems we usually favor the all-in-one binary approach for operational reasons.)

@warpfork
Member

warpfork commented Aug 25, 2021

Okay, a couple of updates:

  • yes, you're right, gopkg.in/square/go-jose.v2 is only a test dep. Even better.


  • I have built a go-ipfs that includes the dag-jose library and have size results! (I believe this is what the go-ipfs docs call "preloaded" style.)
    • The total go-ipfs binary is 60585kb without, 60713kb with. Empirically: dag-jose costs 128kb of binary size increase.
      • Cool. Not much. Not much at all!
      • Verified that this sounds sane via goda weight -h -- yeah, it describes something similar.
  • I have also, for fun, built the same dag-jose-glue-to-go-ipfs code using go plugin mode, and have size results!
    • It makes a plugin file that's a whopping 49M, lol.
      • Maybe let's not use this approach.
      • This is... some powerful additional support for internal discussions we've been having about whether go plugins should be, um, used. cough
        • n.b., terminology is confusing here. We talk about "plugins" to go-ipfs as a logical concept, and then separately and distinctly there's "go plugins", by which I mean https://golang.org/pkg/plugin/ and some CLI incantations involving go build -buildmode=plugin -- I'm saying the latter is problematic, not commenting on the former.


As a result of that last item, which is an upstream need, I don't quite have working code to push into a PR right now. But I imagine it's close, and I can hand this off to someone to finish together with that last bit of glue.

@warpfork
Member

warpfork commented Aug 29, 2021

I come with notes from a meeting that included myself, @BigLep, @aschmahmann, @Stebalien, @momack2, @jbenet, @rvagg, and @willscott. There has been a great deal of deliberation about this, because while we want to accept all the enhancements and features into go-ipfs that we reasonably can, we have to balance this against maintainability and stability. In addition to merely technical concerns, there are also social concerns, in that we should have some care regarding what we communicate and commit to by how we connect things, and what we suggest by making something available by default.

Ultimately, our collective inclination is that yes, we will try to get dag-jose bundled into go-ipfs and available by default.

We feel that bundling this codec will be net valuable because of our overall high confidence in the implementors; because of our faith that the work will be maintained; because the burden it adds in a mechanical sense (e.g. binary size) is fairly minimal; because the code is relatively simple and we do not think it is likely to represent an operational hazard; and because we've been convinced that enough value will be provided to enough users to make the trade worth it overall.

This is not a choice made lightly, and it's also not one we're entirely thrilled about. There was a strong desire expressed in the meeting to move towards some form of codec pluggability based on technology such as WASM, which would allow hot-loading of codecs (and provide a form of sandboxing), and thus remove many reasons to deliberate about a process and conditions for codec inclusion at all. However, we ultimately had to acknowledge that we don't have that technology in place today, and the dag-jose codec proposal is sufficiently well-developed, and has a sufficiently immediate need, that it would not be reasonable to make it wait until such hot-loading technology is in place.

It should be noted that we've done some very cursory review of the codec code, but with an aim of ensuring it won't destabilize go-ipfs, and little further. Specifically, we check for basic rules of engagement such as: the code does not reach out to the network; the code does not execute system processes; the code does not use the unsafe package; the code does not spawn goroutines unreasonably; the code does not have unreasonably large transitive dependencies that would unduly expand the surface area that go-ipfs is exposed to and may need to audit; etc. We do not review the code for logical correctness, fitness for purpose, cryptographic security, performance, etc. We assume that the developers and maintainers of the codec have done this work to their own satisfaction, and our bundling of it does not represent any especial or additional certification effort. Belatedly finding major issues in any of these areas could result in dropping the plugin from go-ipfs.

As to how exactly we expect to implement the relationship:

  • We will have a small folder of glue code in go-ipfs for the plugin, and use the "preload_list" mechanism to build it into the go-ipfs binary. (A rough sketch of what that glue might look like follows this list.)
    • Because of how that organizes the source code, all the dependencies will be pinned by go-ipfs's go.mod file, which provides an immediate answer to many operational questions.
  • The source code for the codec can stay where it is.
    • (We were worried about having versions pinned; but we do, per the above. And we'd be worried about disaster recovery in case the repo disappears; but this is no worse than for any other golang dependency, and the Go module proxy servers are a last-resort disaster recovery option that we can lean on with no special advance effort.)
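For illustration, here is a rough sketch of what that small folder of glue code could look like, assuming the PluginIPLD interface and the multicodec registry that go-ipfs hands to IPLD plugins after the ipld-prime migration. The file location, plugin name, and version string are hypothetical; #8399 outlines the actual wiring.

```go
// Hypothetical glue package, compiled into the go-ipfs binary via the
// "preload_list" mechanism mentioned above.
package dagjose

import (
	dagjosecodec "github.com/ceramicnetwork/go-dag-jose/dagjose"
	"github.com/ipfs/go-ipfs/plugin"
	"github.com/ipld/go-ipld-prime/multicodec"
)

// Plugins is the exported symbol the preload machinery looks for.
var Plugins = []plugin.Plugin{
	&dagJosePlugin{},
}

type dagJosePlugin struct{}

var _ plugin.PluginIPLD = (*dagJosePlugin)(nil)

func (*dagJosePlugin) Name() string                     { return "ipld-dag-jose" }
func (*dagJosePlugin) Version() string                  { return "0.0.1" }
func (*dagJosePlugin) Init(_ *plugin.Environment) error { return nil }

// Register hooks the codec's Encode/Decode functions up under the dag-jose
// multicodec code (0x85) in the registry go-ipfs passes to IPLD plugins.
func (*dagJosePlugin) Register(reg multicodec.Registry) error {
	reg.RegisterEncoder(0x85, dagjosecodec.Encode)
	reg.RegisterDecoder(0x85, dagjosecodec.Decode)
	return nil
}
```

Because the preload_list entry just names a package, this gets compiled into the normal all-in-one binary rather than built as a separate go plugin.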

Note that we are expecting that the codec will be maintained. We have not decided in advance on an exact policy for how go-ipfs will respond in the event that the codec falls out of maintenance, but if it becomes difficult to make new go-ipfs releases which pull in updates of other libraries that would also affect the codec, then it is possible that we would be forced to drop the codec plugin if the situation cannot be resolved in a reasonable time.


Next steps:

  • The "want encoder and decoder methods" issue (ceramicnetwork/go-dag-jose#4) does need to be addressed. (A sketch of the expected function signatures follows this list.)
    • (As of today, there's already discussion ongoing in that issue about how to resolve it, so it seems reasonable to hope it will progress presently.)
  • I'll make my preliminary plugin wiring PR available. Edit: Here it is! "outline of how dag-jose codec could be connected as a plugin" #8399
    • Someone will need to pick this up and finish it when the above issue has been addressed.
  • We have one more condition for bringing this into wide deployment: we need to get at least a handful of serial fixtures into the ipld meta-repo.
    • A likely place for this would be in some folder under https://github.com/ipld/ipld/tree/master/specs/codecs/dag-jose
    • The purpose of this is to be a forcing function to make sure we've agreed and locked in on exactly what the serial forms of the data are.
    • It's possible either @rvagg or I will try to do some path-forging work here to make an example of how we'd like the fixtures to be presented -- but we'll probably hand off a fill-in-the-blank sort of PR back to the folks working on dag-jose to finish filling in with data. If the dag-jose developers want to push this ahead without waiting for our cues, that's fine too -- we should be able to converge on something satisfactory no matter who gets the ball rolling first.
  • We would like to see some end-to-end tests verifying that we know what data flowing in and out of the go-ipfs APIs looks like.
    • "sharness 53" should be a template of what this testing should look like.

@warpfork
Member

I've made a draft PR which shows how the plugin wiring can be done: #8399

(It just contains some example code and highlights where the changes need to appear -- the best way to carry it forward is probably for someone from the Ceramic team to grab that diff and iterate on it in their own PR to add dag-jose, and do any of the testing needed in go-ipfs in the same PR, etc. I'll close that draft PR when we get there. Don't worry about trying to maintain attribution on that diff; it's not interesting enough :))

@BigLep
Contributor

BigLep commented Sep 28, 2021

Hi @oed: no pressure from PL on this, but I wanted to see how this is going on your end so we can plan accordingly.

@smrz2001
Contributor

smrz2001 commented Sep 28, 2021

Hi @BigLep, I've been looking into the integration and made some progress. I've been traveling this week and will resume this activity once I'm back this weekend.

I "think" I'm close to completion but will post questions should I get stuck at any point.

@BigLep
Contributor

BigLep commented Nov 13, 2021

Hi @smrz2001 : just curious how the integration is going. Again, no pressure here, but let us know if you have any questions.

@smrz2001
Contributor

smrz2001 commented Nov 13, 2021

Hey @BigLep, I can finally say that I am a couple of days away from a fully ready PR 😁

The PR is up as a draft while I finalize the unit tests.

I'm also testing my branch against @rvagg's new dag-jose fixtures and working through a couple of questions there.

Some quick history/context for everyone monitoring this thread:
I had an earlier PR for this that I have now (with great help from @warpfork) almost completely reworked to use IPLD codegen.

There are now no handwired node assemblers, and only small, surgical changes over generated code. The handwired assemblers were (at least for me) a nightmare to debug, and also (IMHO) somewhat fragile in general. Having said that, the prior version of the code was a great starting point thanks to Alex, and it was done before codegen was available.

@smrz2001
Contributor

Just marked the PR ready for review.

I am still adding more tests, but the updated code is now working against @rvagg's fixtures PR.

@guseggert guseggert mentioned this issue Nov 23, 2021
@BigLep BigLep modified the milestones: go-ipfs 0.11, go-ipfs 0.13 Nov 23, 2021
@BigLep BigLep linked a pull request Nov 30, 2021 that will close this issue
@BigLep BigLep modified the milestones: go-ipfs 0.13, go-ipfs 0.11 Nov 30, 2021