This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

How to get/cat a hash when don't know the type of the hash? #1049

Closed
mitra42 opened this issue Oct 23, 2017 · 9 comments
Labels
kind/support A question or request for support kind/wontfix-migration-available status/ready Ready to be worked topic/docs Documentation

Comments

@mitra42

mitra42 commented Oct 23, 2017

The scenario is that our archive.org gateway adds a file to IPFS and we want to retrieve it in a browser from its multihash. It's easy to do from the IPFS gateways but seems impossible in JS with the current APIs?

Let's take two cases:

A: 10.1001/jama.2009.1064 (paper about Alzheimer's), 262438 bytes
B: 10.1002/asjc.93 (paper about microscopes), 184324 bytes
I'm guessing the sharding size is 250k, which accounts for the different behavior.

Both were submitted using the HTTP API, which returned these hashes:
A = Qmbzs7jhkBZuVixhnM3J3QhMrL6bcAoSYiRPZrdoX3DhzB
B = QmTds3bVoiM9pzfNJX6vT2ohxnezKPdaGHLd4Ptc4ACMLa

Let's fetch them locally:
A: https://ipfs.dweb.me/ipfs/Qmbzs7jhkBZuVixhnM3J3QhMrL6bcAoSYiRPZrdoX3DhzB
B: https://ipfs.dweb.me/ipfs/QmTds3bVoiM9pzfNJX6vT2ohxnezKPdaGHLd4Ptc4ACMLa
All good

or via ipfs.io
A: https://ipfs.dweb.me/ipfs/Qmbzs7jhkBZuVixhnM3J3QhMrL6bcAoSYiRPZrdoX3DhzB
B: https://ipfs.dweb.me/ipfs/QmTds3bVoiM9pzfNJX6vT2ohxnezKPdaGHLd4Ptc4ACMLa
Also both work.

If we try to retrieve the bytes via block.get:
ipfs.block.get(new CID("Qmbzs7jhkBZuVixhnM3J3QhMrL6bcAoSYiRPZrdoX3DhzB")) retrieves 102 bytes, which I presume is the IPLD node, which is not what we want.
ipfs.block.get(new CID("QmTds3bVoiM9pzfNJX6vT2ohxnezKPdaGHLd4Ptc4ACMLa")) retrieves 184324 bytes, which is the paper.

Let's move to files.cat:
ipfs.files.cat(new CID("Qmbzs7jhkBZuVixhnM3J3QhMrL6bcAoSYiRPZrdoX3DhzB")) retrieves a stream that then generates events for a total of 262438 bytes. GOOD
ipfs.files.cat(new CID("QmTds3bVoiM9pzfNJX6vT2ohxnezKPdaGHLd4Ptc4ACMLa")) retrieves a stream, but that stream generates NO events. It just sits there: no 'data', 'end' or 'error' events.

The problem seems to be that either block.get or files.cat works, depending on the hash, but I don't have a way to know which kind I've got. I think the files.cat behavior is particularly bad, as there is no error, just a hung thread.

I've also seen a third behavior where block.get just sits and hangs, which seems to correspond to cases where ipfs.io also hangs. Later attempts on the same URLs seem to work, so I think this is just a case of the fetch taking a really long time, and I don't have a repeatable case yet.

@mitra42
Author

mitra42 commented Oct 26, 2017

Any thoughts on this? Is there really no deterministic way in JavaScript to load a hash returned by the HTTP interface when you don't know if it points to a DAG/IPLD node or to the bytes?

@daviddias
Member

@mitra42 Readable streams in browsers sometimes behave oddly and don't resume automatically. Have you tried calling .resume()?

We are doing it in the example to make sure it always works https://github.com/ipfs/js-ipfs/blob/master/examples/exchange-files-in-browser/public/js/app.js#L102

We are solving this by providing an alternative API with pull-streams and another one that buffers everything. Follow here: ipfs-inactive/interface-js-ipfs-core#162
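As a rough illustration of the "buffers everything" idea, here is a minimal sketch that drains a Node-style Readable into a single Buffer and calls resume() explicitly. The helper name collectStream is hypothetical and is not part of js-ipfs. In Node, attaching a 'data' listener already starts the flow, so the resume() call is only a safeguard for environments where the stream stays paused.

```javascript
const { Readable } = require('stream');

// Hypothetical helper (not a js-ipfs API): drain a Readable into one Buffer.
function collectStream(stream) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    stream.on('data', (chunk) => chunks.push(Buffer.from(chunk)));
    stream.on('error', reject);
    stream.on('end', () => resolve(Buffer.concat(chunks)));
    // Explicitly switch to flowing mode in case the stream is still paused.
    stream.resume();
  });
}

// Demo with an in-memory stream standing in for the stream files.cat returns.
const demo = Readable.from([Buffer.from('ab'), Buffer.from('c')]);
collectStream(demo).then((buf) => console.log(buf.length, 'bytes collected'));
```

With a helper like this, a stream that ends without any 'data' events resolves to a zero-length Buffer, which the caller can check.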

@mitra42
Author

mitra42 commented Oct 27, 2017

I can try that, but it doesn't sound like the problem.

Once the streams open they work just fine. The issue is that:
if I have a multihash returned by the HTTP API for a file smaller than some size (I believe 250k), I MUST call the block.get API,
and if I have a multihash for a file larger than 250k, I MUST call the Files API.

I presume that's because the larger files are sharded and turned into IPLD nodes. The problem, of course, is that I have no way that I can see (since all I have is the multihash) of knowing which kind of hash I have.
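To illustrate why the string alone does not help, here is a small sketch that base58-decodes the two hashes from this thread and reads the two-byte multihash header. The function names are hypothetical, and it assumes both strings are plain base58btc multihashes. The header only records the hash function and digest length, so both hashes look identical at this level regardless of what the underlying block contains.

```javascript
// Base58btc alphabet (no 0, O, I or l).
const ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';

// Hypothetical helper: decode a base58btc string into bytes.
function base58Decode(text) {
  let value = 0n;
  for (const ch of text) {
    const digit = ALPHABET.indexOf(ch);
    if (digit === -1) throw new Error(`Invalid base58 character: ${ch}`);
    value = value * 58n + BigInt(digit);
  }
  const bytes = [];
  while (value > 0n) {
    bytes.unshift(Number(value & 0xffn));
    value >>= 8n;
  }
  // Each leading '1' encodes a leading zero byte.
  for (const ch of text) {
    if (ch !== '1') break;
    bytes.unshift(0);
  }
  return Uint8Array.from(bytes);
}

// Hypothetical helper: read the multihash header (hash function code, digest length).
function multihashHeader(text) {
  const bytes = base58Decode(text);
  return { hashCode: bytes[0], digestLength: bytes[1], totalBytes: bytes.length };
}

const hashA = 'Qmbzs7jhkBZuVixhnM3J3QhMrL6bcAoSYiRPZrdoX3DhzB';
const hashB = 'QmTds3bVoiM9pzfNJX6vT2ohxnezKPdaGHLd4Ptc4ACMLa';
console.log(multihashHeader(hashA));
console.log(multihashHeader(hashB));
```

Both calls report hash code 0x12 (sha2-256) with a 32-byte digest, so nothing in the header distinguishes the block that holds raw file bytes from the one that holds a DAG node.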

@mitra42
Author

mitra42 commented Oct 27, 2017

@diasdavid - I added the resume().
Long file: files.cat("Qmbzs7jhkBZuVixhnM3J3QhMrL6bcAoSYiRPZrdoX3DhzB") works with or without the resume().
Short file: files.cat("QmTds3bVoiM9pzfNJX6vT2ohxnezKPdaGHLd4Ptc4ACMLa"):
Without resume(): sits waiting, no events generated.
With resume(): immediate 'end' event with no 'data' events, so 0 bytes.
That sounds like a bug, i.e. it should always be EITHER 'data' events or an 'error' event?

I guess worst case I could try files.cat and then, if it returns 0 bytes, try block.get. I've tried that and it works, but it sounds like an awful kludge!
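The fallback described above could be sketched roughly as follows. This is not the code from the referenced commit. The catFile and getBlock parameters are hypothetical adapters that the caller would write around ipfs.files.cat and ipfs.block.get, each resolving to a Buffer, and the demo uses in-memory stubs instead of a real IPFS node.

```javascript
// Sketch of the "try files.cat, then fall back to block.get" workaround.
// catFile and getBlock are hypothetical adapters supplied by the caller.
async function fetchBytes(hash, { catFile, getBlock }) {
  const fromCat = await catFile(hash);
  if (fromCat.length > 0) {
    return { via: 'files.cat', bytes: fromCat };
  }
  // Zero bytes from cat: assume the hash points at a raw block instead.
  const fromBlock = await getBlock(hash);
  return { via: 'block.get', bytes: fromBlock };
}

// Stubs that mimic the behavior reported in this thread (sizes are shortened).
const stubApi = {
  catFile: async (hash) => (hash === 'long' ? Buffer.alloc(10) : Buffer.alloc(0)),
  getBlock: async () => Buffer.alloc(5),
};

fetchBytes('long', stubApi).then((r) => console.log(r.via, r.bytes.length));
fetchBytes('short', stubApi).then((r) => console.log(r.via, r.bytes.length));
```

One limitation of this approach is that a file that is genuinely empty would also trigger the fallback.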

mitra42 added a commit to internetarchive/dweb-transport that referenced this issue Oct 27, 2017
@mitra42
Author

mitra42 commented Oct 29, 2017

I think I'm onto something... It looks like there may be a difference between Node and Chrome on this. I built a simpler test case: I stripped out my code and just included IPFS. It works in Node and fails in Chrome.

I've created three files, which it won't let me attach here, so I've put them in my repository:
test_ipfs.js - simple test JavaScript
test_ipfs_bundled.js - bundled version of this (after updating npm packages)
test_ipfs.html - skeleton to call test_ipfs_bundled.js
You can also run the bundled test directly at https://dweb.me/examples/test_ipfs.html

With these files, if you run "node test_ipfs.js", both tests run: the long file version gets 3 chunks of 0, 262144 and 294 bytes, and the short file gets a single chunk of 184324 bytes.

If you open test_ipfs.html in Chrome and look at the console, the long file version gets 2 chunks, of 262144 and 294 bytes, while the short file just ends immediately (no calls of "on").

I don't know the guts of IPFS well enough to fix, but now that I've stripped all my code out of this test and it still shows the difference, I'm pretty sure the problem is internal to IPFS.
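The chunk sizes reported above are consistent with a fixed-size split at 262144 bytes. The sketch below is plain arithmetic, not IPFS code: the chunkSizes function is hypothetical and the 262144-byte boundary is taken from the first data chunk observed in the test output. Under that assumption the long file spills into a second chunk while the short file fits in one.

```javascript
// Hypothetical helper: split a byte count into fixed-size chunks.
// The default of 262144 bytes is taken from the chunk size observed above.
function chunkSizes(totalBytes, chunkSize = 262144) {
  if (!Number.isInteger(totalBytes) || totalBytes < 0) {
    throw new RangeError('totalBytes must be a non-negative integer');
  }
  const sizes = [];
  for (let remaining = totalBytes; remaining > 0; remaining -= chunkSize) {
    sizes.push(Math.min(remaining, chunkSize));
  }
  return sizes;
}

console.log(chunkSizes(262438)); // long file:  [ 262144, 294 ]
console.log(chunkSizes(184324)); // short file: [ 184324 ]
```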

@mitra42
Author

mitra42 commented Oct 29, 2017

P.S. If you looked at them earlier, I just pushed a cleaner set of tests - the notes in the previous comment still apply.

@daviddias daviddias added status/deferred Conscious decision to pause or backlog kind/support A question or request for support labels Jan 25, 2018
@daviddias daviddias added status/ready Ready to be worked and removed status/deferred Conscious decision to pause or backlog labels Dec 9, 2018
@daviddias daviddias added the topic/docs Documentation label Oct 1, 2019
@jacobheun
Contributor

Is this still a problem in the latest version of js-ipfs? There have been some significant updates in these systems.

@mitra42
Author

mitra42 commented Jun 18, 2020

Sorry, no idea - we obviously built workarounds for this and other bugs when we were still using IPFS.

@whizzzkid

js-ipfs is being deprecated in favor of Helia. See #4336 and read the migration guide.

Please feel free to reopen with any comments before 2023-06-05. We will do a final pass on reopened issues afterward (see #4336).
