Way to get CIDs of intermediate objects when querying with a path #8526

stbrody · 2021-10-26T19:55:26Z

Checklist

My issue is specific & actionable.
I am not suggesting a protocol enhancement.
I have searched on the issue tracker for my issue.

Description

Summary:
dag.get with a path argument should be able to return an array of CIDs, representing all the intermediate IPFS objects it traversed along the path to eventually reach the object it ultimately returns. That would enable much more efficient sequential iteration over complex IPLD data structures.
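To make the request concrete, here is a minimal in-memory sketch of path resolution that also reports the CIDs it crossed. It is not the IPFS API: `STORE`, `is_link`, and `resolve_with_trail` are hypothetical names, the CIDs are made-up labels, and links are written in the DAG-JSON form `{"/": "<cid>"}`.

```python
from typing import Any, Dict, List, Tuple

# Toy block store: CID label -> decoded node. These are placeholder labels,
# not real CIDs, and the store stands in for whatever the node can fetch.
STORE: Dict[str, Any] = {
    "cid-root": {"left": {"/": "cid-a"}, "right": {"/": "cid-b"}},
    "cid-a": {"left": {"/": "cid-leaf0"}, "right": {"/": "cid-leaf1"}},
    "cid-b": {"left": {"/": "cid-leaf2"}, "right": {"/": "cid-leaf3"}},
    "cid-leaf0": {"value": 0},
    "cid-leaf1": {"value": 1},
    "cid-leaf2": {"value": 2},
    "cid-leaf3": {"value": 3},
}


def is_link(value: Any) -> bool:
    """True if value looks like a DAG-JSON link, i.e. {"/": "<cid>"}."""
    return isinstance(value, dict) and set(value) == {"/"}


def resolve_with_trail(root_cid: str, path: str) -> Tuple[Any, List[str]]:
    """Resolve path from root_cid; return the value and every CID crossed."""
    trail = [root_cid]
    node = STORE[root_cid]
    for segment in (s for s in path.split("/") if s):
        node = node[segment]
        if is_link(node):
            # Crossing a block boundary: record the CID, then load the block.
            trail.append(node["/"])
            node = STORE[node["/"]]
    return node, trail


value, trail = resolve_with_trail("cid-root", "left/right")
parent_cid = trail[-2]  # CID of the block that links to the final one
```

In this model the last entry of `trail` is the CID of the returned object and the entry before it is its parent, which is the piece of information a plain path query does not expose.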

Use case:
Imagine you are trying to do an in-order traversal over a tree structure encoded in IPLD. From knowing the number of elements in the tree (which could be stored in the root of the tree) and how many children each intermediate node has, you can deterministically calculate the depth of the tree. That would allow you to build a path selector specifying the path from the root of the tree to the left-most leaf node fairly easily, which could then be passed to ipfs.dag.get to get the data from the first leaf node in the tree.

But now you want to fetch the second leaf node. You could once again deterministically build a path selector from the root to the second leaf node, but that would have the path once again running from the root, which if the tree is large may involve traversing many intermediate nodes multiple times. Instead, ideally you'd like to already have the CID of the parent node of the first leaf node, and then be able to issue a new query with just the path from that parent node to its second child to get the second leaf node of the overall tree.

The problem is that dag.get with the path to the first leaf node will only return the data of the leaf node, not any information about the intermediate nodes it passed through to get there, so you have no way to know the CID of its parent. If the dag.get call returned not just the data from the first leaf node, but also an array of the CIDs it passed through to get there when traversing the path, then you'd be able to intelligently pop CIDs off the back of the resulting list to move back up the tree, and issue new dag queries with new paths to other children nodes as you continue to iterate over the tree structure.
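The use case can be sketched with a toy model that counts block loads. Everything here is an assumption made for illustration: the four-leaf binary tree, the child keys `"0"` and `"1"`, the placeholder CID labels, and the helper names `load`, `resolve_with_trail`, and `tree_depth`. None of it is an existing IPFS call.

```python
from typing import Any, Dict, List, Tuple

# Toy binary tree with four leaves. Children live under the keys "0" and "1";
# links use the DAG-JSON form {"/": "<cid>"}. All CIDs are made-up labels.
STORE: Dict[str, Any] = {
    "cid-root": {"0": {"/": "cid-a"}, "1": {"/": "cid-b"}},
    "cid-a": {"0": {"/": "cid-leaf0"}, "1": {"/": "cid-leaf1"}},
    "cid-b": {"0": {"/": "cid-leaf2"}, "1": {"/": "cid-leaf3"}},
    "cid-leaf0": {"value": "first"},
    "cid-leaf1": {"value": "second"},
    "cid-leaf2": {"value": "third"},
    "cid-leaf3": {"value": "fourth"},
}
LOADED: List[str] = []  # every block load is recorded here to compare costs


def load(cid: str) -> Any:
    LOADED.append(cid)
    return STORE[cid]


def resolve_with_trail(root_cid: str, path: str) -> Tuple[Any, List[str]]:
    """Hypothetical path query that also returns the CIDs it crossed."""
    trail = [root_cid]
    node = load(root_cid)
    for segment in (s for s in path.split("/") if s):
        node = node[segment]
        if isinstance(node, dict) and set(node) == {"/"}:
            trail.append(node["/"])
            node = load(node["/"])
    return node, trail


def tree_depth(leaf_count: int, fanout: int) -> int:
    """Smallest depth at which a full tree with this fanout holds leaf_count leaves."""
    depth, capacity = 0, 1
    while capacity < leaf_count:
        capacity *= fanout
        depth += 1
    return depth


# Left-most leaf: follow child "0" at every level.
depth = tree_depth(leaf_count=4, fanout=2)
first, trail = resolve_with_trail("cid-root", "/".join(["0"] * depth))

# Second leaf, starting from the parent CID taken off the trail.
parent_cid = trail[-2]
LOADED.clear()
second, _ = resolve_with_trail(parent_cid, "1")
loads_via_parent = len(LOADED)

# Same leaf again, resolved from the root for comparison.
LOADED.clear()
again, _ = resolve_with_trail("cid-root", "0/1")
loads_via_root = len(LOADED)
```

In this small tree the query from the parent loads one block fewer than the query from the root; the gap grows with the depth of the tree, since the query from the parent skips every ancestor above it.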


welcome · 2021-10-26T19:55:27Z

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review.
In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment.
Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

"Priority" labels will show how urgent this is for the team.
"Status" labels will show if this is ready to be worked on, blocked, or in progress.
"Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

stbrody · 2021-10-26T19:56:47Z

An alternative that would accomplish the same thing: instead of returning an array of the CIDs that were traversed, return a CAR file containing the CIDs and data for every IPFS object along the path given in the initial query.

aschmahmann · 2021-12-03T16:24:57Z

@stbrody is this a feature request for something like ipfs resolve --give-cids /ipfs/path... that outputs all the CIDs along the path, or would just doing #8239 be enough?

BigLep · 2022-01-07T16:24:45Z

2022-01-07 discussion: this would be a common, use-case-specific form of #8239. We'd likely implement this specific use case using the more generic form of being able to fetch for a specific selector.

@stbrody: do you have a sense from Ceramic's perspective as to which of these two is higher priority?

Also, this isn't something the go-ipfs maintainers expect to get to in the short term, but we could certainly direct others on where and how to solve it.

I'm marking this as blocked until #8239 is handled.

stbrody · 2022-01-10T19:54:28Z

I suppose if #8239 were done in such a way that we could get the entire tree structure loaded onto our local ipfs node, then doing multiple iterative calls over the same paths in the tree wouldn't be nearly as bad. You'd still wind up re-processing the same path multiple times, but with data that's all local so it will be much more performant.

My sense is that both this and #8239 are valuable in different ways, but I'd imagine this one would likely be easier to implement. And there are cases where this ticket actually helps more than #8239 does, such as an in-order traversal over part of a tree structure. If you're only going to wind up processing some part of the tree, then pulling the whole tree to your local node is overkill, which can be especially bad if the tree is large. It would also be bad if you had to wait for all the data matching the selector (in this case the whole tree) to be loaded locally before you could get the result for the first item you want to process.
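The partial-traversal point can be illustrated with a lazy in-order walk that loads blocks only as they are needed. This is a sketch under assumptions: the in-memory `store`, the `build` helper, the placeholder CID labels, and `iter_leaves` are all hypothetical, and a real client would fetch blocks over the network rather than from a dict.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterator, List


def build(store: Dict[str, Any], lo: int, hi: int) -> str:
    """Build a binary tree over leaves lo..hi-1 and return its root label."""
    if hi - lo == 1:
        cid = f"leaf-{lo}"
        store[cid] = {"value": lo}
    else:
        mid = (lo + hi) // 2
        cid = f"node-{lo}-{hi}"
        store[cid] = {"children": [build(store, lo, mid), build(store, mid, hi)]}
    return cid


def iter_leaves(root_cid: str, load: Callable[[str], Any]) -> Iterator[Any]:
    """Yield leaf values left to right, loading each block only when reached."""
    stack = [root_cid]  # CIDs still to visit, left-most on top
    while stack:
        node = load(stack.pop())
        children = node.get("children")
        if children is None:
            yield node["value"]
        else:
            stack.extend(reversed(children))


store: Dict[str, Any] = {}
root = build(store, 0, 8)  # 8 leaves, 15 blocks in total

loaded: List[str] = []


def load(cid: str) -> Any:
    loaded.append(cid)
    return store[cid]


# Process only the first three leaves; the rest of the tree is never loaded.
first_three = list(islice(iter_leaves(root, load), 3))
```

Here the first result is available after loading only the blocks on the left-most path, and stopping after three leaves touches 7 of the 15 blocks. Keeping a stack of CIDs like this is only possible if the client can learn the intermediate CIDs in the first place.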

> is this a feature request for something like ipfs resolve --give-cids /ipfs/path... that outputs all the CIDs along the path

Yes, that's more or less what I'm imagining, though I'd want it exposed via the http-client.

> do you have a sense from Ceramic's perspective as to which of these two is higher priority?

I'll defer to @oed on this one

stbrody · 2022-01-10T19:56:00Z

> I'm marking this as blocked until #8239 is handled.

FWIW, while I do see these two as related in the use cases they help improve, technically I think they're probably fairly independent.

BigLep · 2022-03-18T15:56:22Z

2022-03-18 conversations: maintainer priority and plan of record is:

(in progress) Add selector support in gateways (https://github.com/ipfs/go-ipfs/issues/8769)
(easy followup) Add support for selectors in dag-export (#8239: dag API should let a user ask for the daemon to fetch data matching a selector)

General:
Paths are easy
Selectors are harder to use

As people have been asking about selectors, we're going to add them to more APIs but we don't want to overload users with the more complicated selector syntax.

We're treating paths and selectors separately (resolve the path and then apply the selector).
We're starting with CAR files because those users are already more "advanced". It's also possible to write a path as a selector (in most cases).

BigLep · 2022-06-03T15:07:23Z

2022-06-03 conversation: this is still blocked per the discussion above. There will be a relevant gateway selector spec in the next month.

stbrody added the kind/feature (A new feature) label on Oct 26, 2021

stbrody mentioned this issue on Oct 26, 2021 in #8239, "dag API should let a user ask for the daemon to fetch data matching a selector" (Open)

aschmahmann added the exp/intermediate (Prior experience is likely helpful), exp/expert (Having worked on the specific codebase is important), and P2 (Medium: Good to have, but can wait until someone steps up) labels and removed the exp/intermediate (Prior experience is likely helpful) label on Nov 19, 2021

BigLep added the exp/intermediate (Prior experience is likely helpful), help wanted (Seeking public contribution on this issue), and status/blocked (Unable to be worked further until needs are met) labels and removed the exp/expert (Having worked on the specific codebase is important) label on Jan 7, 2022

BigLep added this to the Best Effort Track milestone Mar 3, 2022

BigLep added this to IPFS Shipyard Team Jun 10, 2022
