Allow users to pin a URI-M or all captures for a URI-R from their local replay UI #201

machawk1 · 2017-06-27T15:49:36Z

Related to #60.

When replaying a URI-M whose header and payload are accessed through another node via IPFS, the header and payload will eventually get garbage collected from the local system per https://discuss.ipfs.io/t/how-are-conflicts-handled/469/3. Provide a UI element to allow a user to explicitly indicate that they wish to retain the URI-M (i.e., the payload and headers associated with the URI-M) on their local system. This can be accomplished by pinning the corresponding objects in IPFS.
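As a rough illustration of what pinning a single URI-M could involve, here is a minimal sketch that asks a local IPFS daemon to pin the header and payload hashes through its HTTP API (`pin/add`). The API address is the usual local default and is an assumption here, and `pin_add_url`, `pin_capture`, and the injectable `post` parameter are hypothetical names, not part of ipwb.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local IPFS daemon exposing its HTTP API on the default port.
IPFS_API = "http://127.0.0.1:5001/api/v0"


def pin_add_url(ipfs_hash, api_base=IPFS_API):
    """Build the pin/add endpoint URL for one IPFS hash."""
    query = urllib.parse.urlencode({"arg": ipfs_hash})
    return f"{api_base}/pin/add?{query}"


def _post_json(url):
    """POST to the daemon and decode its JSON reply."""
    request = urllib.request.Request(url, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def pin_capture(header_hash, payload_hash, post=_post_json):
    """Pin both objects behind a URI-M and return the daemon's replies."""
    return [post(pin_add_url(h)) for h in (header_hash, payload_hash)]
```

A "pin" button in the replay UI could call something like `pin_capture` with the two hashes of the capture being viewed. The `post` parameter exists only so the request logic can be swapped out without a running daemon.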

Doing so will allow a user to accumulate captures locally and will facilitate collaboration on arbitrary sets of captures.

ibnesayeed · 2017-06-27T23:56:55Z

We need to understand what we mean by "user" here. Pinning functionality would work in the context of the IPFS service our API is connecting to. For replay, however, we don't necessarily need to connect to a service; instead, we can just query the global network to resolve the hash for us. If by client we mean end users, then that scenario would be helpful if we were using a client-side JS-based implementation.

machawk1 · 2017-06-28T00:32:30Z

However we fetch the content, through ipwb replay or the js ipfs implementation, it would be useful to allow a user to pin the contents* for persistent offline viewing and allow them to view and share it without needing to reconnect to other peers. We may also want to allow the user that indexes the WARC and pushes to IPFS (via the ipwb indexer) to recommend pinning in the index -- for this I am thinking on an individual or very small group basis.

* The locally stored dereferenced IPFS hash contents. Disregard the sloppy nomenclature.

ibnesayeed · 2017-06-28T00:45:55Z

I am still not sure which layer you are focusing on. Some of these things can be achieved by utilizing client-side caching too.

machawk1 · 2017-06-28T00:50:22Z

This is dealing with the client side -- the user of the replay system. The discussion above was a tangent of allowing this functionality from the indexer's perspective.

Per the ticket title, the goal is to add a means of allowing a user to explicitly pin the payload retrieved from IPFS via the IPWB replay UI. I'd rather make this independent of the client, i.e., the browser -- the ticket essentially amounts to enabling client-side caching using an agent without regard to the user-agent used in subsequent replay.

ibnesayeed · 2017-06-28T00:59:14Z

Replay users in general should have no business telling the replay server to pin or not to pin the content on the IPFS service it is primarily connected to (which is not an essential piece for replay). However, the replay server itself might decide to ask its corresponding IPFS server (if there is one) to pin more frequently resolved (or all) content locally for faster successive fetches. The ultimate client, that is the browser, can utilize regular caching of the combined response, which will only be useful if the same client is requesting the same URI-M multiple times.

machawk1 · 2017-06-28T01:07:18Z

Ok, you're probably right in the scenario of a user viewing the replay web UI -- that user has no business dictating what's pinned with the potentially remote ipfs daemon.

What if there was something akin to "pin locally" that could instruct a local daemon to pin what they're viewing, which more often than not will probably be the same local instance?

The idea of browser-based caching is a few steps off, still, as we have yet to really resolve the impending issues of remotely accessing an ipwb replay instance (#146).

The crux of this ticket is a single user, reading in a CDXJ shared with them (or locally generated), ensuring that the content of the hashes they push from WARCs is pinned -- a sort of base case.

ibnesayeed · 2017-06-28T01:25:40Z

If the replay is connected to a local IPFS instance (or one controlled by the same body), then we have a couple of potential options to ensure availability of the content when resolved. The replay can ask to pin every resource when it is requested; pinning is an idempotent action, so duplicate requests will cause no harm. If, when creating the index, the content was pushed to the same IPFS node that is linked to the replay, the content will already be there, no need to perform explicit pinning. However, if the index was shared/moved elsewhere and/or the replay is connected to a different IPFS node, then we can have a separate process that can be run one time (after every index change) to pull all the references in the index to the local IPFS instance and pin them. This can be an independent process which is not tied to the replay system.
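To make the separate one-time process more concrete, here is a minimal sketch of its first step: collecting every distinct IPFS hash referenced by a CDXJ index so that each can then be fetched and pinned on the local node. It assumes each index line has the form `<SURT key> <datetime> <JSON>`, that the JSON carries a `locator` of the form `urn:ipfs/<header hash>/<payload hash>`, and that metadata lines start with `!`. `hashes_from_cdxj` is a hypothetical helper, not an existing ipwb function.

```python
import json


def hashes_from_cdxj(lines):
    """Yield each distinct IPFS hash referenced by the index, in first-seen order."""
    seen = set()
    for line in lines:
        line = line.strip()
        # Assumption: blank lines and "!"-prefixed metadata lines carry no locator.
        if not line or line.startswith("!"):
            continue
        # Assumed fields: SURT key, datetime, JSON block.
        _surt, _datetime, json_block = line.split(" ", 2)
        locator = json.loads(json_block)["locator"]
        # Assumed locator shape: "urn:ipfs/<header hash>/<payload hash>".
        for ipfs_hash in locator.split("/")[1:]:
            if ipfs_hash not in seen:
                seen.add(ipfs_hash)
                yield ipfs_hash
```

Each yielded hash could then be handed to the daemon's pin command. Because pinning is idempotent, re-running the process after every index change would be harmless; the deduplication here only avoids redundant requests within a single run.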

machawk1 · 2017-06-28T17:02:51Z

> the content will already be there, no need to perform explicit pinning

From my understanding, it may be there now but potentially not in the future if garbage collected.

As a related note, a user will likely wish to also pin all embedded resources. There exists an opportunity to pin resources (if replay is running locally) as they are fetched. This would allow a subset of the CDXJ entries to be locally pinned instead of requiring everything listed in the index.
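A minimal sketch of pinning as resources are fetched might look like the following. It assumes the replay can call a hook with each resolved index locator (assumed form `urn:ipfs/<header hash>/<payload hash>`) and that `pin` is any callable that pins a single hash on the local daemon. `InteractivePinner` and `on_fetch` are hypothetical names. The index itself is never modified; only the set of locally pinned hashes grows.

```python
class InteractivePinner:
    """Pins the hashes behind each fetched resource, once per hash."""

    def __init__(self, pin):
        self._pin = pin      # callable taking one IPFS hash (assumed to pin it locally)
        self.pinned = set()  # hashes pinned so far in this session

    def on_fetch(self, locator):
        """Pin any not-yet-pinned hashes in the locator; return the newly pinned ones."""
        newly_pinned = []
        for ipfs_hash in locator.split("/")[1:]:
            if ipfs_hash not in self.pinned:
                self._pin(ipfs_hash)
                self.pinned.add(ipfs_hash)
                newly_pinned.append(ipfs_hash)
        return newly_pinned
```

Calling `on_fetch` for the root page and then for each embedded resource the browser requests would pin only what was actually viewed, leaving every other index entry resolvable through the network as before.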

ibnesayeed · 2017-06-28T17:15:37Z

> This would allow a subset of the CDXJ entries to be locally pinned instead of requiring everything listed in the index.

You are mixing something up here. At the moment, losing entries from the CDX would be disastrous, even if those entries are for the embedded resources, because from the replay perspective they are all independent resources; it is the browser that puts them together to compose the page the way it looks. Pinning resources locally and losing entries from the CDX would make them non-discoverable. Pinning, in my opinion, can be done separately as a batch process independent of the replay system. One can grab the list of hashes extracted from the index or from the access log (or from any other source for that matter), pull them down locally, and pin them. This should be done at the node where IPFS is running, and not necessarily coupled with the replay. I think, separation of concerns is very important here.

machawk1 · 2017-06-28T17:23:17Z

> I think, separation of concerns is very important here.

I agree, and batch pinning is not what this ticket is about.

> losing entries from the CDX would be disastrous

Entries will not be lost; only a subset will be pinned, on an interactive basis.

machawk1 added enhancement ipwb replay labels Jun 27, 2017

machawk1 added this to the 2.0 (Extended more featureful implementation) milestone Jun 27, 2017
