This repository has been archived by the owner on Dec 27, 2022. It is now read-only.

Feature Request: Linked Dats/files #794

Closed
HughIsaacs2 opened this issue Dec 19, 2017 · 19 comments
Labels
discussion feature request Suggested change that's under consideration but not yet on the roadmap

Comments

@HughIsaacs2
Contributor

I've been thinking about this quite a bit (especially in the context of torrents). We need a feature that lets Dats declare other Dats, or files from them, that they need in order to function when we store them offline in our libraries.

Like how almost all Rotonde pages use the same JavaScript file ("dat://2714774d6c464dd12d5f8533e28ffafd79eec23ab20990b5ac14de940680a6fe/rotonde.js").

There should be a way to tell the browser that when the user adds this Dat site to the library to prompt them to also store another Dat or specific files from it, complete with version support to protect the host site from any breaking changes.

@webdesserts
Contributor

Related Discussion: #752

@pfrazee pfrazee added discussion feature request Suggested change that's under consideration but not yet on the roadmap labels Dec 20, 2017
@pfrazee
Member

pfrazee commented Dec 20, 2017

We'll keep this on our minds. I think subdats may end up being the solution for this but we'll see.

@HughIsaacs2
Contributor Author

What are subdats?

Also I was thinking something like an additional resources list within dat.json

@taravancil
Contributor

taravancil commented Dec 22, 2017 via email

@Treora

Treora commented Dec 23, 2017

Thanks for looping me in, Tara; I do have thoughts, but no answers. All I will note right now is that having a deterministic way to tell which assets constitute a document/app/site seems generally desirable, and it might be best to try to find more generic solutions rather than solving this problem in a way that is specific to Beaker/dats.

I am not very knowledgeable about this, but here are some related specs and efforts I came across, that in some way list the resources required for offline consumption:

Instead of adding an explicit manifest with resources, I would also consider the option of extracting the required resources from the document itself; e.g. all the srcs of imgs and scripts are required dependencies of an html document. See this discussion in the wpub group for a similar view. Although more complex to define and implement, a big bonus is that this way, content creators need not do anything special to make their documents usable offline, so it could work with existing websites.
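To make the extraction idea concrete, here is a minimal sketch of what pulling dependencies out of a document could look like. It is only an illustration, not anything Beaker implements: the names `ResourceCollector` and `static_resources` are hypothetical, and the set of tags considered is deliberately small. It reads only the static markup, so anything a script injects at runtime is not seen.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collects the URLs that the static markup of a page refers to."""

    # Tag -> attribute that carries the resource URL (an assumed, minimal set).
    URL_ATTRS = {"img": "src", "script": "src", "link": "href"}

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.resources.append(value)


def static_resources(html):
    """Return resource URLs in document order, exactly as written in the markup."""
    collector = ResourceCollector()
    collector.feed(html)
    return collector.resources


page = (
    '<img src="a.png">'
    '<script src="dat://ffff..ff/rotonde.js"></script>'
    '<link rel="stylesheet" href="style.css">'
)
print(static_resources(page))
```

The URLs are returned unresolved; turning relative paths into absolute dat:// URLs would be a separate step.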

@webdesserts
Contributor

I would also consider the option of extracting the required resources from the document itself; e.g. all the srcs of imgs and scripts are required dependencies of an html document.

I think the biggest problem with this is that these tags can be changed at runtime. For example, Rotonde right now injects its list of dependencies (img, script, and style tags) after you load the initial script tag on the portal's index.html. Also these are generally direct links to single files. For dependencies that function more as a database (a folder of json files), you would need to declare a dependency on a folderset.

@Treora

Treora commented Dec 23, 2017 via email

@millette

Subdats, and eventually a dat-cdn, who knows :-)

@pfrazee
Member

pfrazee commented Dec 31, 2017

Most of my observations are already captured in #752. I'll just add a few more.

HTML elements are not the ideal place for this because any policy we'd want to create regarding "save to library" would operate at the site level, not the page level. So, we need a specific file that can tell us the policy information. JSON is much easier to parse, in that case, than HTML is, and you might not always have HTML in a site, but you will have the dat.json manifest.

The 'subdat' concept is an idea that gets brought up a lot. In unix terms, it's basically a symlink from one dat to another. In git terms, it's like a submodule. It's a way to map an archive to a subfolder of another archive. Eg:

/foo -> dat://ffff..ff/
/foo/index.html -> dat://ffff..ff/index.html

Subdats are interesting because they could solve a lot of problems at once -- one such problem being this question of caching dependent dats. We could do a policy where subdats are saved along with their parent dats.
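As a rough illustration of the mapping described above, the sketch below resolves a path in a parent archive against a table of mounts, and lists the archives that a "save subdats with their parent" policy would pull in. The function names and the `mounts` dictionary are assumptions made for this example; they are not part of Dat or Beaker.

```python
def resolve(path, mounts):
    """Map a path in the parent archive to (archive_url, path_inside_it).

    `mounts` maps a mount point such as "/foo" to the URL of the mounted dat.
    Returns (None, path) when the path belongs to the parent archive itself.
    """
    # Try longer mount points first so nested mounts win over their parents.
    for mount_point in sorted(mounts, key=len, reverse=True):
        prefix = mount_point.rstrip("/")
        if path == prefix or path.startswith(prefix + "/"):
            inner = path[len(prefix):] or "/"
            return mounts[mount_point], inner
    return None, path


def archives_to_save(mounts):
    """Hypothetical policy: saving a parent also saves every mounted subdat."""
    return sorted(set(mounts.values()))


mounts = {"/foo": "dat://ffff..ff/"}
print(resolve("/foo/index.html", mounts))
print(archives_to_save(mounts))
```

Matching on whole path segments keeps a file such as /foobar.txt in the parent archive rather than treating it as part of the /foo mount.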

I've been hesitant to 👍 subdats so far because they also add complexity to the core rules of dat, but I think there's a good chance we'll end up implementing them eventually. I just want to give us time to think about it.

@taravancil
Contributor

We should consider the Web Packaging standard in our discussions about this

https://github.com/WICG/webpackage

@RangerMauve

Not sure if it was mentioned, but this sounds like a perfect extension for the existing dat.json manifest.

Maybe something as simple as

{
  "title": "Application Title",
  "dependencies": [
    { "url": "dat://4483a2..66/" },
    { "url": "dat://4483a2..66/" }
  ]
}

This could have potential for performance improvements by pre-fetching the metadata of dependency dats while the initial dat's metadata is being downloaded.

Plus this is a dat-specific extension that could work for dats that weren't necessarily made to work with HTML or even a browser.
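A client could read such a field with very little code. The sketch below assumes the `dependencies` key is a list of objects with a `url` property; that shape is only the proposal from this comment, not part of the dat.json spec, and `dependency_urls` is a hypothetical helper name.

```python
import json


def dependency_urls(manifest_text):
    """Return the unique dat:// URLs listed under the proposed "dependencies" key."""
    manifest = json.loads(manifest_text)
    urls = []
    for entry in manifest.get("dependencies", []):
        url = entry.get("url") if isinstance(entry, dict) else None
        # Skip malformed entries, non-dat URLs, and duplicates.
        if isinstance(url, str) and url.startswith("dat://") and url not in urls:
            urls.append(url)
    return urls


example = """
{
  "title": "Application Title",
  "dependencies": [
    { "url": "dat://4483a2..66/" },
    { "url": "dat://ffff..ff/" }
  ]
}
"""
# These are the archives a client might start pre-fetching metadata for.
print(dependency_urls(example))
```

A manifest without the key simply yields an empty list, so existing dats would be unaffected.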

@Gozala

Gozala commented Jun 5, 2018

I was under the impression that the Dat protocol also uses content addressability via Merkle trees under the hood (does it not?), but it seems that, unlike IPFS, it is scoped to an individual archive.

Are there technical reasons (other than the implementation effort it would take) why Dat could not make content addressability work across all of the Dat protocol? It seems like it would resolve the issue and likely improve overall network performance.

In general, I think supporting links at the protocol level, say dat ln foo dat://ffff..ff/index.html, would be a much better option than storing that elsewhere, as all other dat clients would get support for this out of the box.

@RangerMauve

The concepts page in the docs and the security and privacy page have a pretty good overview of why it is the way it is.

One of the main advantages of this is privacy. With IPFS, where everything is content addressed, it's easy to globally see who has a given file. With Dat, you only know if somebody is looking for a specific dat. And if you don't know the URL, you don't know what's in it or who has it. If you're looking for a specific piece of content, it's impossible to know which dats contain it.

@HughIsaacs2
Contributor Author

HughIsaacs2 commented Jun 5, 2018

Just returning to say that dat.json now has a links object.

https://github.com/datprotocol/dat.json

It's likely that'll be used for this feature.

This opens dat.json up to the possibility of using the subresource, prefetch, dns-prefetch, preconnect, prerender and preload features in browsers, so those are options now.

I vote for "subresource". It was a non-standard addition to Chrome (removed in Chrome 50), and while the term doesn't fit the HTTP web use case, I think it fits the Dat web well. Plus, many developers are already familiar with it, and its use in Dat sites wouldn't be far off from its original intent in Chrome. The only problem I can think of right now is confusion with the subresource-integrity feature.

EDIT: Also, we should lock this feature down to just specific files included in Dats, not entire Dats, as I can definitely see this being a hard drive space problem in the future. We have to avoid the situation where someone new to all of this loads terabytes of files onto many computers just because they wanted to use X amount of Dat-based CDNs.
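For illustration, a file-only restriction could be enforced when reading the manifest. The sketch below assumes that `links` maps a rel value to a list of objects with an `href`, and that "subresource" is the rel chosen for this feature; both are assumptions based on this comment, and `file_subresources` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def file_subresources(manifest):
    """Return subresource hrefs that point at a single file inside a dat."""
    entries = manifest.get("links", {}).get("subresource", [])
    files = []
    for entry in entries:
        href = entry.get("href", "")
        parts = urlsplit(href)
        # An empty path or one ending in "/" names a whole dat or a folder,
        # so it is skipped to avoid pulling in an entire archive.
        if parts.scheme == "dat" and parts.path and not parts.path.endswith("/"):
            files.append(href)
    return files


manifest = {
    "links": {
        "subresource": [
            {"href": "dat://ffff..ff/rotonde.js"},
            {"href": "dat://ffff..ff/"},
        ]
    }
}
print(file_subresources(manifest))
```

Folder-style dependencies, such as a directory of JSON files, would be rejected by this check and would need a separate rule.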

@pfrazee closed this as completed on May 6, 2020
@Treora

Treora commented May 28, 2020

@pfrazee sorry for necroposting, but I'm curious whether closing this issue means the idea faded off the radar, or whether it became irrelevant due to other developments. Might you have a pointer to discussions/publications reflecting the current state of play, if there are any?

You said above “I think subdats may end up being the solution for this but we'll see.”. And indeed, with the one-way mounts now having been introduced in Hyperdrive 10, I suppose one could mount all external resources’ drives and only use relative paths to point at them (though I guess you would have to mount their whole drives..). Does this solve the issue in your view?

@Treora

Treora commented May 28, 2020

PS Also related seems this recent discussion in dat-ecosystem/comm-comm#134 about a format-agnostic approach to linked dats: “a generic seeding service should not need any data structure specific code to know how to seed the data.” (source)

@pfrazee
Member

pfrazee commented May 29, 2020

@Treora I do think mounts are our answer for Beaker. Ultimately, for commanding any remote to cohost data, I think the API will be based on hypercores, so the client commanding the remote needs to be data-structure aware.

@serapath

serapath commented Jun 12, 2020

@Treora thank you for linking the comm-comm issue and the source link.

If you want to discuss further I'll answer here datdotorg/datdot-research#17 (comment)
I think there are multiple approaches with different pros/cons, and I think a standard is needed, not only for key rotation/replacement/revocation but also for dependencies. Having a custom solution per app/protocol/data structure is bad.
Also, it's different when people control domains and want to change the content for one, compared to providing proof that they have the write key to any given archive.
Yes, the latter can always be proven by challenging somebody to add a specific message, but why not avoid that by having a proper standard?

There are many reasons why feeds need to be linked: parent to dependent to dependencies, dependencies to dependent, domain to content, feed to author, and related feeds amongst each other. I think it would be bad to have everyone (app/protocol/data structure) make those things up instead of following a general standard.

9 participants