Feature Request: Linked Dats/files #794

HughIsaacs2 · 2017-12-19T23:03:56Z

I've been thinking about this quite a bit (especially in the context of torrents). We need a feature that lets Dats declare other Dats, or files from them, that they need in order to function when we store them offline in our libraries.

Like how almost all Rotonde pages use the same JavaScript file ("dat://2714774d6c464dd12d5f8533e28ffafd79eec23ab20990b5ac14de940680a6fe/rotonde.js").

There should be a way to tell the browser that, when the user adds this Dat site to the library, it should prompt them to also store another Dat or specific files from it, complete with version support to protect the host site from breaking changes.

webdesserts · 2017-12-19T23:15:36Z

Related Discussion: #752

pfrazee · 2017-12-20T16:56:10Z

We'll keep this on our minds. I think subdats may end up being the solution for this but we'll see.

HughIsaacs2 · 2017-12-22T15:52:30Z

What are subdats?

Also, I was thinking of something like an additional resources list within dat.json.

taravancil · 2017-12-22T17:25:28Z

Resource listing, for example in a manifest file, is actually a bit controversial (I know @Treora has thoughts about this). With <img>, <script>, <link>, etc., we already have a way to declare what resources a website/app depends on. While using a manifest file does allow you to do things like state which resources are mandatory and which are optional, it also introduces maintenance problems. Every time you update a <script> tag in your document, you then have to update the manifest. Realistically, manifest files won’t be well-maintained, so you have to wonder if they're worth using to solve this problem at all.

taravancil · 2017-12-22T17:28:44Z

That said, this is an important problem to solve. We need to explore whether or not it makes sense to also cache external assets when you save an app to your library, or choose to help seed it. I’m just not sure that using a manifest is the right choice.

Treora · 2017-12-23T00:21:42Z

Thanks for looping me in, Tara. I do have thoughts, but no answers. All I will note right now is that having a deterministic way to tell which assets constitute a document/app/site seems generally desirable, and it might be best to try to find more generic solutions than solving this problem in a way that is specific to beaker/dats.

I am not very knowledgeable about this, but here are some related specs and efforts I came across, that in some way list the resources required for offline consumption:

* The manifest of an epub lists all resources it consists of.
* The Web Publications group is discussing (e.g. here, here) the same question of how to list a publication's constituent resources, possibly building upon web application manifests.
* Different but related are formats that don't only list the identifiers of assets, but also include their contents. The introduction of the web packages draft lists some such formats.

Instead of adding an explicit manifest with resources, I would also consider the option of extracting the required resources from the document itself; e.g. all the srcs of imgs and scripts are required dependencies of an html document. See this discussion in the wpub group for a similar view. Although more complex to define and implement, a big bonus is that this way, content creators need not do anything special to make their documents usable offline, so it could work with existing websites.
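To make the "extract the required resources from the document itself" idea concrete, here is a minimal sketch using Python's standard `html.parser`. The tag-to-attribute mapping and the function name `extract_dependencies` are assumptions made for illustration; a real implementation would also need to handle things like `srcset`, CSS `url()` references, and link types that are not actual dependencies.

```python
from html.parser import HTMLParser

# Tag -> attribute that carries the URL of a required resource.
# This mapping is an illustrative assumption, not an exhaustive list.
RESOURCE_ATTRS = {"img": "src", "script": "src", "link": "href"}


class DependencyCollector(HTMLParser):
    """Collects resource URLs that are declared statically in the markup."""

    def __init__(self):
        super().__init__()
        self.dependencies = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.dependencies.append(value)


def extract_dependencies(html):
    """Return resource URLs in document order (hypothetical helper)."""
    collector = DependencyCollector()
    collector.feed(html)
    collector.close()
    return collector.dependencies


page = """
<html>
  <head>
    <link rel="stylesheet" href="/style.css">
    <script src="dat://2714774d6c464dd12d5f8533e28ffafd79eec23ab20990b5ac14de940680a6fe/rotonde.js"></script>
  </head>
  <body><img src="/media/icon.png"></body>
</html>
"""
deps = extract_dependencies(page)
```

This only sees what is present in the static markup, so anything injected by scripts at runtime would be missed.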

webdesserts · 2017-12-23T00:28:54Z

> I would also consider the option of extracting the required resources from the document itself; e.g. all the srcs of imgs and scripts are required dependencies of an html document.

I think the biggest problem with this is that these tags can be changed at runtime. For example, Rotonde right now injects its list of dependencies (img, script, and style tags) after you load the initial script tag on the portal's index.html. Also these are generally direct links to single files. For dependencies that function more as a database (a folder of json files), you would need to declare a dependency on a folderset.

Treora · 2017-12-23T10:47:58Z

You would indeed need to explicitly declare that these will (possibly) be required. The questions are (Q1) how, and (Q2) whether the 'statically' depended assets also have to be declared in that way. Two answer sets that seem natural to me:

* (Q1) declare assets in a separate manifest file; (Q2) yes, everything goes in there.
* (Q1) add <link rel="preload"> for each dynamic dependency; (Q2) no, these links will be extracted just like the src of an img.

> Also these are generally direct links to single files. For dependencies that function more as a database (a folder of json files), you would need to declare a dependency on a folderset.

Another good point. One thought on this: if you would solve this using a syntax for folders, e.g. putting dat:1234ab/mydata/* in your manifest file (or better even without the asterisk), you would also be able to put it in a link tag. While http does not have a concept of folders, perhaps dat urls do?
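As a rough sketch of how a folder-style declaration could be interpreted, the snippet below treats any declared entry that ends in "/" as covering everything beneath it, and any other entry as a single file. The function name `is_covered`, the trailing-slash convention, and the `dat://1234ab/...` URLs are assumptions for illustration only; nothing in this thread fixes a syntax.

```python
def is_covered(url, declared):
    """Return True if `url` falls under one of the declared dependencies.

    An entry ending in "/" is treated as a folder and covers everything
    beneath it; any other entry must match the URL exactly.
    """
    for entry in declared:
        if entry.endswith("/"):
            if url.startswith(entry):
                return True
        elif url == entry:
            return True
    return False


# Hypothetical declarations, with a placeholder key.
declared = [
    "dat://1234ab/mydata/",        # folder: every file under mydata/
    "dat://1234ab/lib/helper.js",  # single file
]
```

Because the folder entry keeps its trailing slash, a sibling such as `dat://1234ab/mydata2/` is not matched by accident.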

millette · 2017-12-31T22:42:34Z

Subdats, and eventually a dat-cdn, who knows :-)

pfrazee · 2017-12-31T23:37:32Z

Most of my observations are already captured in #752. I'll just add a few more.

HTML elements are not the ideal place for this because any policy we'd want to create regarding "save to library" would operate at the site level, not the page level. So, we need a specific file that can tell us the policy information. JSON is much easier to parse, in that case, than HTML is, and you might not always have HTML in a site, but you will have the dat.json manifest.

The 'subdat' concept is an idea that gets brought up a lot. In unix terms, it's basically a symlink from one dat to another. In git terms, it's like a submodule. It's a way to map an archive to a subfolder of another archive. Eg:

/foo -> dat://ffff..ff/
/foo/index.html -> dat://ffff..ff/index.html

Subdats are interesting because they could solve a lot of problems at once -- one such problem being this question of caching dependent dats. We could do a policy where subdats are saved along with their parent dats.

I've been hesitant to 👍 subdats so far because they also add complexity to the core rules of dat, but I think there's a good chance we'll end up implementing them eventually. I just want to give us time to think about it.
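The symlink-style mapping described above can be sketched as a simple lookup table. This is only an illustration of the idea, not how Dat or Beaker implement anything: the mount table, the function names `resolve` and `dats_to_save`, and the `dat://parent-key/` placeholder are all hypothetical.

```python
# Hypothetical mount table: local folder -> root URL of another dat.
MOUNTS = {"/foo": "dat://ffff..ff/"}


def resolve(path, mounts):
    """Map a local path to a URL in a mounted dat, or None if not mounted."""
    # Check longer mount points first so nested mounts win over their parents.
    for mount_point in sorted(mounts, key=len, reverse=True):
        if path == mount_point or path.startswith(mount_point + "/"):
            remainder = path[len(mount_point):].lstrip("/")
            return mounts[mount_point] + remainder
    return None


def dats_to_save(parent_url, mounts):
    """Under a 'save subdats with their parent' policy, list every dat to store."""
    return [parent_url] + sorted(set(mounts.values()))


root = resolve("/foo", MOUNTS)
page = resolve("/foo/index.html", MOUNTS)
to_save = dats_to_save("dat://parent-key/", MOUNTS)  # placeholder parent URL
```

With a table like this, the "save to library" policy reduces to walking the mounts of the parent dat and storing each target alongside it.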

taravancil · 2018-01-09T02:27:15Z

We should consider the Web Packaging standard in our discussions about this

https://github.com/WICG/webpackage

RangerMauve · 2018-05-01T21:21:53Z

Not sure if it was mentioned, but this sounds like a perfect extension for the existing dat.json manifest.

Maybe something as simple as

{
  "title": "Application Title",
  "dependencies": [
    { "url": "dat://4483a2..66/" },
    { "url": "dat://4483a2..66/" }
  ]
}

This could allow performance improvements by pre-fetching the dependencies' metadata while the initial metadata is being downloaded.

Plus this is a dat-specific extension that could work for dats that weren't necessarily made to work with HTML or even a browser.
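A client could read such a field with very little code. The sketch below assumes the proposed `dependencies` field is an array of objects with a `url` key; that shape is only the suggestion made in this comment, not part of any published dat.json spec, and `dependency_urls` is a hypothetical helper name.

```python
import json

# Example manifest using the field proposed in this thread (an assumption,
# not an existing dat.json feature). The URL is the elided one from above.
MANIFEST = """
{
  "title": "Application Title",
  "dependencies": [
    {"url": "dat://4483a2..66/"}
  ]
}
"""


def dependency_urls(manifest_text):
    """Return the dat URLs listed under the proposed `dependencies` field."""
    manifest = json.loads(manifest_text)
    urls = []
    for entry in manifest.get("dependencies", []):
        url = entry.get("url") if isinstance(entry, dict) else None
        # Skip malformed entries and duplicates, preserving order.
        if isinstance(url, str) and url.startswith("dat://") and url not in urls:
            urls.append(url)
    return urls


urls = dependency_urls(MANIFEST)
```

The returned list is what a client might hand to its pre-fetching or "save to library" logic; manifests without the field simply yield an empty list.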

Gozala · 2018-06-05T00:05:43Z

I was under the impression that the Dat protocol also uses content addressability via Merkle trees under the hood (does it not?), but it seems that, unlike IPFS, it is scoped to an individual archive.

Are there technical reasons (other than the implementation effort it would take) why Dat could not make content addressability work across the whole Dat protocol? It seems like it would resolve the issue and likely improve overall network performance.

In general, I think supporting links at the protocol level, say dat ln foo dat://ffff..ff/index.html, would be a much better option than storing that elsewhere, as all other dat clients would get support for this out of the box.

RangerMauve · 2018-06-05T13:58:35Z

The concepts page in the docs and the security and privacy page have a pretty good overview of why it is the way it is.

One of the main advantages of this is privacy. With IPFS, where everything is content addressed, it's easy to globally see who has a given file. With Dat, you only know if somebody is looking for a specific dat. And if you don't know the URL, you don't know what's in it or who has it. If you're looking for a specific piece of content, it's impossible to know which dats contain it.

HughIsaacs2 · 2018-06-05T14:15:52Z

Just returning to say that dat.json now has a links object.

https://github.com/datprotocol/dat.json

It's likely that'll be used for this feature.

This opens dat.json up to the possibility of using the subresource, prefetch, dns-prefetch, preconnect, prerender and preload features in browsers, so those are options now.

I vote for "subresource". It was a non-standard addition to Chrome (removed in Chrome 50), and while the term doesn't fit the HTTP web use case, I think it fits the Dat web well. Plus, many developers are already familiar with it, and its use in Dat sites wouldn't be far off from its original intent in Chrome (the only problem I can think of right now is confusion with the subresource-integrity feature).

EDIT: Also, we should lock this feature down to just specific files included in Dats, not entire Dats, as I can definitely see this becoming a hard drive space problem in the future. We have to avoid the situation where someone new to all of this loads terabytes of files onto many computers just because they wanted to use X number of Dat-based CDNs.

Treora · 2020-05-28T19:32:31Z

@pfrazee sorry for necroposting, but I'm curious whether closing this issue means the idea has faded off the radar, or whether it has become irrelevant due to other developments. Might you have a pointer to discussions/publications reflecting the current state of play, if there are any?

You said above, “I think subdats may end up being the solution for this but we'll see.” And indeed, with one-way mounts now having been introduced in Hyperdrive 10, I suppose one could mount all external resources’ drives and only use relative paths to point at them (though I guess you would have to mount their whole drives). Does this solve the issue in your view?

Treora · 2020-05-28T19:32:49Z

PS: Also related, it seems, is this recent discussion in dat-ecosystem/comm-comm#134 about a format-agnostic approach to linked dats: “a generic seeding service should not need any data structure specific code to know how to seed the data.” (source)

pfrazee · 2020-05-29T19:05:40Z

@Treora I do think mounts are our answer for Beaker. Ultimately, for commanding any remote to cohost data, I think the API will be based on hypercores, so the client commanding the remote then needs to be data-structure aware.

serapath · 2020-06-12T03:15:02Z

@Treora thank you for linking the comm-comm issue and the source link.

If you want to discuss further I'll answer here datdotorg/datdot-research#17 (comment)
I think there are multiple approaches with different pros/cons, and I think a standard is needed, not only for key rotation/replacement/revocation, but also for dependencies. Having a custom solution per app/protocol/datastructure is bad.
Also, it's different when people control domains and want to change the content for one, compared to providing proof that they have the writekey to any given archive.
Yes, the latter can always be proven by challenging somebody to add a specific message, but why not avoid that by having a proper standard?

There are many reasons why feeds need to be linked: parent to dependant to dependencies, dependencies to dependant, domain to content, feed to author, and related feeds amongst each other. I think it would be bad to have everyone (app/protocol/datastructure) make those things up instead of following a general standard.

pfrazee added the "discussion" and "feature request" (suggested change that's under consideration but not yet on the roadmap) labels on Dec 20, 2017

pfrazee closed this as completed May 6, 2020
