Skip to content
This repository has been archived by the owner on Dec 6, 2022. It is now read-only.

Switch to @ipld/multiformats #7

Merged
merged 23 commits into from
Aug 28, 2020
Merged

Switch to @ipld/multiformats #7

merged 23 commits into from
Aug 28, 2020

Conversation

rvagg
Copy link
Member

@rvagg rvagg commented May 13, 2020

The Uint8Array thing in multiformats.CID is really getting in the way of compatibility unfortunately, this conversion has to be all-or-nothing.

Currently not passing in the browser because it's still using ipld-dag-cbor which is having trouble covering CIDs to tags because of the Buffer / Uint8Array thing. When @ipld/dag-cbor is done this might help.

One interesting problem area is in constructing a CID from a parsed form. I'm using Buffer here for decoding and encoding and don't plan to switch from that because of the utility of it, but if I take a Buffer slice and feed it to new multiformats.CID() I get an error, so I have to wrap it:new Uint8Array(slice). This might be something we can handle in multiformats/bytes.js#coerce automatically.

Copy link
Member

@vmx vmx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not deep enough into this code base. Why did you do the switch from the Block API to raw { cid, data }?

lib/util.js Outdated
return typeof block === 'object' && isCID(block.cid) && isBuffer(block.binary)
}

function isCID (key) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the isCID() check from CID just pass with old and new CIDs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, I think it uses a symbol check still

So in this case, the latest iteration on my computer even drops the codec check so it's basically checking for new CIDs only and old CIDs won't pass. So I should probably drop this function entirely and use multiformats.CID.isCID() instead. In most places it's used it falls back to attempting a new multiformats.CID(thing.toString()) so that should be fine for converting old CIDs.

Trying to mix the two just doesn't work, it's like playing wac-a-mole with errors. I'm discovering that you need to be fully in either ecosystem for any of this to work. Code my my computer has dropped ipld-dag-cbor for the multiformats branch of @ipld/dag-cbor which seems to be working well but it's still using ipld-dag-pb and that gets very awkward as you should be able to see in fixture-data.js. If I was using more complex linking within those PB blocks then I'd be in real trouble but the simplicity of my graphs is saving me.

Things to either fix or document very well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, I think it uses a symbol check still

Yes. @mikeal's plan was to use the same symbol for the new version. I also don't know if that's a good idea.

@rvagg
Copy link
Member Author

rvagg commented May 13, 2020

I'm not deep enough into this code base. Why did you do the switch from the Block API to raw { cid, data }?

Good question with a potentially interesting answer.

First: because I need this code to work now with my ipld-bitcoin code, and js-block hasn't had its makeover yet so I don't have an option to even use it and I don't want to rush Mikeal to get around to it.

But also, looping back around to our discussions previously about needing a better blockstore interface for IPLD, this library (and zipcar too) stretches the interface-datastore API too far, it's just not working with what I want to be able to do with it. key / value pairs where the key is assumed by default to be a string and there's not space for a single entity that contains both of these things. So for now, it's just cleaner to ignore js-block and go with a basic abstraction.

I'd like to change this and bring Block back as a first-class part of this, but it probably needs an honest assessment of interface-datastore just being the wrong fit and we need to come up with something new.

  • store.put(Block)
  • store.get(CID) : Block
  • store.has(CID) : boolean
  • store.iterator({ queryParams }?) : AsyncIterator<Block>
  • maybe store.cids({ queryParams }?) : AsyncIterator<CID>

Something like that, simple but enough primitives to build useful things. I'm sure Mikeal's work on dagdb could inform some of this too.

@vmx
Copy link
Member

vmx commented May 13, 2020

Good question with a potentially interesting answer.

Thanks for the detailed answer. So the tl;dr is that it's easier like this for now, but in the future the Block API should be used.

@rvagg
Copy link
Member Author

rvagg commented May 14, 2020

Tests all passing on Node.js and browser. Introduced a top-level factory: const CarDatastore = require('datastore-car')(multiformats) so you don't have to pass multiformats to everything (although it's still possible to do it that way if you really wanted to.

Needs docs but I think think this is otherwise done.

@rvagg
Copy link
Member Author

rvagg commented May 15, 2020

docs done

this will have to sit and wait though, it's got two dependencies pointing to github

Copy link
Member

@vmx vmx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one minor comment that does not block merging.

lib/coding-browser.js Outdated Show resolved Hide resolved
@rvagg
Copy link
Member Author

rvagg commented May 22, 2020

Latest commit adds a readFileIndexed() mode. You give it a path to a CAR file, it'll do a quick scan of it and keep an index of CID:location mapping in memory and then do random access reads for get() and query() operations (quick from-memory has() of course too). So it's not a memory hog like readFileComplete() for large files and not as fast as readFileStreaming() for sequential reads but supports random access reads.

Pretty neat for interacting with large files and wanting to do arbitrary get()s.

@rvagg
Copy link
Member Author

rvagg commented Jun 29, 2020

Updated to latest multiformats with the ESM+CJS exports but poledina now fails to run. I'm guessing this is because of the self-referencing used in multiformats now which isn't supported by Webpack. I don't know if it's possible to sort that out at the top-level though. The hack used in multiformats/js-multiformats#22 probably isn't going to work here (process.cwd() is going to resolve to this package, not multiformats).

@mikeal
Copy link
Contributor

mikeal commented Jun 29, 2020

Webpack shouldn’t even be finding the CJS exports since the version in polendina doesn’t look at export maps. That’s why we keep the exported names identical to the module’s file layout, so the old webpack algorithms find the correct files without parsing the export map.

@rvagg
Copy link
Member Author

rvagg commented Jul 1, 2020

ESM is still a very sad land to live in. @mikeal the root cause is this: webpack/webpack#3929

multiformats/basics.js is bundled like this:

/*! exports provided: default */
/***/ (function(module, __webpack_exports__, __webpack_require__) {
"use strict";
__webpack_require__.r(__webpack_exports__);
var _index_js__WEBPACK_IMPORTED_MODULE_0__ = __webpack_require__(/*! ./index.js */ "./node_modules/multiformats/index.js");
...

const multiformats = Object(_index_js__WEBPACK_IMPORTED_MODULE_0__["create"])()
multiformats.multihash.add(_hashes_sha2_js__WEBPACK_IMPORTED_MODULE_5__["default"])
...

__webpack_exports__["default"] = (multiformats);
/***/ }),

And when I require('multiformats/basics.js') and that gets bundled, it does this:

const multiformats = __webpack_require__(/*! multiformats/basics.js */ "./node_modules/multiformats/basics.js")

But then I just get an object that has a single 'default' property.

So your mapping works OK (ish) on Node, but in Webpack you get a different behaviour still. I now have to either:

  1. add a test for .default every time I require() a multiformats thing and switch to that if so (no)
  2. switch to ESM everywhere (no)
  3. rage and say this library doesn't support the browser (what I'm going to do for now cause I have better things to do)
  4. something else?

Again, I think ESM is too new, this stuff is too immature and this is begging to take on more work than is necessary right now. ESM should still be considered broken until all of the tooling has a consistent story about how the pieces fit together.

@rvagg
Copy link
Member Author

rvagg commented Jul 1, 2020

and I can't bypass this crazyness with require('multiformats/dist/basics.cjs') because it's not defined by the package exports in package.json

@rvagg rvagg force-pushed the rvagg/multiformats branch from 9b8594e to 9dd5d1a Compare July 1, 2020 05:07
@rvagg
Copy link
Member Author

rvagg commented Jul 1, 2020

I've removed browser support. We just have to make sure this doesn't leak into the browser version of js-ipfs if we add cmdline js-ipfs export and js-ipfs import there.

@mikeal
Copy link
Contributor

mikeal commented Jul 1, 2020

looks like webpack has slightly different behavior here than I thought.

webpack merged support for package exports a few weeks ago, if you can update polendina to use the latest I can track down what the best option here is. it might actually do “the right thing” here and bundle the require endpoint by default, but even if it doesn’t I’ll have more options for how to deal with it.

@mikeal
Copy link
Contributor

mikeal commented Jul 1, 2020

this is actually a rather contentious issue in webpack that people have been arguing about for years webpack/webpack#4742

i’m pretty sure rollup doesn’t do this because we’re using rollup to do the CJS compile and it’s doing what you’d expect instead of this weird webpack thing.

@rvagg
Copy link
Member Author

rvagg commented Jul 1, 2020

if you can update polendina to use the latest

if I do that then I have to deal with the missing node_libs support and everything that's bundled needs to have explicit require()s and inclusion in package.json, including Buffer. That's my understanding at least and I'm hoping to avoid that because it's yet another viral rabbit hole that requires updating everything to work with it.

@rvagg
Copy link
Member Author

rvagg commented Jul 1, 2020

  • webpack v5 is still in beta (a very long beta) and imposes considerable migration costs on users (especially those trying to support both Node and the browser), there are good reasons that people are not jumping on board the v5 train in bulk at the moment but would rather defer those costs until v5 has properly solidified and the benefits are clear. It's not even clear that webpack@5 would solve this particular problem.
  • it's not just a matter of updating polendina (I've been considering putting out a polendina w/ webpack@5 version on an alt dist-tag for folks who want to start preparing), this is something we'd be asking all users that want to have a browser story to adopt unless they are prepared to migrate all of their stack to ESM.
  • webpack isn't the only bundler, ESM has a very inconsistent story across the ecosystem still. It'll clear up, but it's not clear today, at all.

multiformats is at the heart of everything IPLD and therefore IPFS, or at least that's the proposal for this new multiformats+IPLD stack. Therefore, unless you can provide a seamless experience for CJS on Node and in the browser with the various bundlers people are using, then you're essentially saying: either adopt ESM wholesale through your entire application stack, or adopt a specific bundler that'll do exactly what's needed to make this work.

This is very user-hostile and is not a good path to encouraging adoption. Maybe experimenting with ESM in a library that is optional and/or toward the edge of our stack would be a good idea, but this is imposing very large costs on anyone that wants to adopt our stuff. I'm not keen on saying that to potential users. There's also no good reason for this, at all. ESM provides us and our users with zero real benefits today. Yeah, there's some future utopia where we're serving all our files directly, but that future isn't soon and people are going to be bundling for a long time to come until the whole stack is solid and proven - from server to browser (I'd also like to place a bet on people still bundling for performance reasons even in the utopian parallel download http/2&3 world).

@rvagg rvagg force-pushed the rvagg/multiformats branch from d88bb07 to c58509c Compare August 27, 2020 09:14
@rvagg
Copy link
Member Author

rvagg commented Aug 27, 2020

Massive changes in here now! Full ESM support, using the latest ipjs method and I get to put browser support back in. I also took the opportunity to remove explicit Buffer support so it's Uint8Arrays all round now. I've picked up the latest js-multiformats with the breaking CID constructor changes. Upgraded to mocha@8 and a newer polendina to support that (there's a minor bug in 8.1.2 that leads to a bundle warning but it's not fatal and will presumably go away when resolved upstream).

Also pulled in the changes in #8 to here.

Good to merge and release I think, although very breaking so will be a 2.0.0.

The next major change (later) I think might be to extract the pieces of this that aren't needed for interface-datastore and move to a simpler API that isn't trying to be like a datastore, and use the Block abstraction for all operations rather than these awkward {key:string, value:Uint8Array} interfaces.

@rvagg
Copy link
Member Author

rvagg commented Aug 27, 2020

I added a verify-car.js executable in here, didn't add it to a "bin" list though because it's a bit limited so far. If you run it against a .car file and it has the right codecs installed it'll read the CAR and do a round-trip re-serialization of the blocks. I added this after doing a validation on Filecoin blockchain .car files while looking for a specific bad block.

@rvagg
Copy link
Member Author

rvagg commented Aug 28, 2020

Some more refactoring to clean up the CID handling.

I'm going to merge this but not release, I think I want to attempt to replace interface-datastore with something better, it's getting too frustrating.

@rvagg rvagg merged commit 57cc917 into master Aug 28, 2020
@rvagg rvagg deleted the rvagg/multiformats branch August 28, 2020 11:40
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants