This repository has been archived by the owner on Dec 6, 2022. It is now read-only.

Inlining Small Files? #4

Closed
kevina opened this issue Nov 3, 2017 · 4 comments

Comments

@kevina
Contributor

kevina commented Nov 3, 2017

@Stebalien suggested in #1 that we inline small files and directories. This issue is to discuss whether we should do this.

@kevina
Contributor Author

kevina commented Nov 3, 2017

The major question I have is: should the contents replace the link?

If both the link and the contents are included, the file's content may end up available only through the directory, so anyone who later tries to retrieve the file by its hash will fail.

If the link is not included, inlined entries become a special case that needs to be handled in things like directory listings.
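To make the "contents in place of the link" option concrete, here is a minimal Python sketch of a directory entry that carries either a link or inline contents, never both, plus a listing function that has to special-case the inline form. `DirEntry` and `list_directory` are hypothetical names, not part of any UnixFS API, and the `-1` size for linked entries simply stands for "unknown without fetching".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DirEntry:
    """Hypothetical directory entry: a link to the child OR its inlined contents."""
    name: str
    link: Optional[bytes] = None    # content hash of the child object
    inline: Optional[bytes] = None  # the child's bytes, stored in place of the link

    def __post_init__(self) -> None:
        # Exactly one of the two: carrying both risks the file existing only
        # inside this directory; carrying neither is meaningless.
        if (self.link is None) == (self.inline is None):
            raise ValueError("entry needs exactly one of link or inline")


def list_directory(entries: List[DirEntry]) -> List[Tuple[str, str, int]]:
    """Return (name, kind, size) rows for a directory listing."""
    rows = []
    for e in entries:
        if e.inline is not None:
            # Special case: no child object to resolve; size is known locally.
            rows.append((e.name, "inline", len(e.inline)))
        else:
            # Normal case: size would come from the linked object (-1 = not fetched).
            rows.append((e.name, "link", -1))
    return rows
```

Every consumer that walks directories (listing, path resolution, garbage collection) would need the same two-branch check, which is the cost of dropping the link.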

@Stebalien

Inline Tiny Objects

One solution is to embed files of at most 32 bytes directly in the link using the identity multihash (maybe 64 bytes if we're willing to make some links a bit longer). Honestly, we should be doing this as part of IPLD anyway.

If we go to 64 (or even 128) bytes, we may even be able to inline deep and narrow directory trees.
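A minimal sketch of the identity-multihash idea, assuming a binary CIDv1 with the `raw` codec (0x55) and the multihash layout `<varint code><varint length><digest>`, where the identity function code is 0x00. `inline_cid` and `contents_from_cid` are hypothetical helpers, and the 32-byte default is the threshold proposed above, not a spec value. The point it shows: the link itself carries the bytes, so resolving it never requires a fetch.

```python
from typing import Optional, Tuple

IDENTITY = 0x00  # multihash code for the identity "hash" (digest = the data)
RAW = 0x55       # multicodec code for raw bytes
CID_V1 = 0x01


def uvarint(n: int) -> bytes:
    """Encode n as an unsigned LEB128 varint (the multiformats varint)."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)


def read_uvarint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    value = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if not b & 0x80:
            return value, pos
        shift += 7


def inline_cid(data: bytes, max_inline: int = 32) -> Optional[bytes]:
    """Binary CIDv1 (raw codec) whose identity multihash embeds data, or None if too big."""
    if len(data) > max_inline:
        return None
    multihash = uvarint(IDENTITY) + uvarint(len(data)) + data
    return uvarint(CID_V1) + uvarint(RAW) + multihash


def contents_from_cid(cid: bytes) -> Optional[bytes]:
    """Return the inlined bytes of an identity CID, or None if it holds a real digest."""
    version, pos = read_uvarint(cid, 0)
    _codec, pos = read_uvarint(cid, pos)
    code, pos = read_uvarint(cid, pos)
    if version != CID_V1 or code != IDENTITY:
        return None  # a real hash: the contents have to be fetched
    size, pos = read_uvarint(cid, pos)
    return cid[pos:pos + size]
```

Under this encoding, a 32-byte identity CID is 36 bytes, the same size as a sha2-256 CIDv1 (1 + 1 + 1 + 1 + 32 bytes), which is one way to read the 32-byte threshold; a 64-byte payload makes the link 68 bytes.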

Inline Small Files

If we want to support inlining larger files, then yeah, we'll have issues with direct links to files. We can create files on the fly as needed but that may hurt retrievability (if I auto-generate a file on the fly so I can link to it, nobody else will have that file).

On the other hand, if we assume that people will generally put files in directories and not directly link to them, then this won't really be an issue. This is not how things work today but it may actually be a better approach.

Application Layer Problem

An alternative is to leave this optimization up to the application layer. That is,

  1. Use DAGSwap to send objects. We're already planning on doing this.
  2. More efficient datastores. Technically, we don't need to store our filesystems in their encoded IPLD format as long as we can seamlessly (and quickly) transition back and forth.

Given this, it may be worth it to just punt on this issue and keep UnixFS simple (although I would still embed tiny IPLD objects).
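The "more efficient datastores" point can be sketched as a store that keeps directories decoded (here as Python dicts mapping names to child hashes) and re-encodes them only when a block is requested. `DecodedStore` is hypothetical, and JSON with sorted keys stands in for a real IPLD codec; what matters is that the encoding is deterministic, so the re-encoded bytes always hash to the key the directory was stored under.

```python
import hashlib
import json
from typing import Dict


class DecodedStore:
    """Hypothetical datastore that keeps directories decoded (name -> child hash)
    and only produces encoded block bytes on demand."""

    def __init__(self) -> None:
        self._dirs: Dict[str, Dict[str, str]] = {}

    @staticmethod
    def encode(directory: Dict[str, str]) -> bytes:
        # Deterministic encoding: sorted keys, no insignificant whitespace.
        # JSON stands in for a real IPLD codec in this sketch.
        return json.dumps(directory, sort_keys=True, separators=(",", ":")).encode()

    def put(self, directory: Dict[str, str]) -> str:
        key = hashlib.sha256(self.encode(directory)).hexdigest()
        self._dirs[key] = dict(directory)  # stored decoded, not as bytes
        return key

    def get_block(self, key: str) -> bytes:
        block = self.encode(self._dirs[key])
        # The transition back must reproduce exactly the bytes that were hashed.
        if hashlib.sha256(block).hexdigest() != key:
            raise ValueError("re-encoded block does not match its hash")
        return block
```

If the encoding were not deterministic (for example, if key order depended on insertion order), the round trip could produce different bytes and the stored key would no longer verify.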

@warpfork

The question kevina raised about retrieving the file from the hash seems like the crux to me.

Even if directory objects are published with small files inlined, the files should still be published and replicated with the same attention as if they had never been inlined. Inline publishing is a nicety a publisher provides to reduce latency for some future retriever. It goes without saying that it MUST be transparent to the content hash of parent objects; it should be just as much a MUST that it be transparent to availability.
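One way to read "transparent to the content hash and to availability" as code: a hypothetical `publish` step that hashes the directory over its link-only form, stores every child block under its own hash regardless of size, and only then adds small children to the outgoing payload as an optional inline extra. SHA-256 over a plain-text listing stands in for real IPLD blocks and CIDs; all names are illustrative.

```python
import hashlib
from typing import Dict, Tuple


def block_hash(data: bytes) -> str:
    """Content hash of a block (SHA-256 hex, standing in for a CID)."""
    return hashlib.sha256(data).hexdigest()


def canonical_dir(entries: Dict[str, bytes]) -> bytes:
    """Link-only directory encoding: one 'name hash' line per child, sorted by name."""
    lines = [f"{name} {block_hash(data)}" for name, data in sorted(entries.items())]
    return "\n".join(lines).encode()


def publish(entries: Dict[str, bytes], store: Dict[str, bytes],
            inline_max: int = 32) -> Tuple[str, dict]:
    # Availability: every child is stored under its own hash, inlined or not.
    for data in entries.values():
        store[block_hash(data)] = data
    canonical = canonical_dir(entries)
    dir_hash = block_hash(canonical)
    store[dir_hash] = canonical
    # Latency nicety: small children ride along in the payload, but they are
    # not part of the bytes that dir_hash was computed over.
    inlined = {name: data for name, data in entries.items() if len(data) <= inline_max}
    return dir_hash, {"dir": canonical, "inlined": inlined}
```

Because `inline_max` only affects the payload, publishing the same entries with inlining disabled yields the same directory hash, and a child seen only inline can still be fetched from the store by its own hash.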

To me, that suggests inlining small objects is not a concern specific to filesystem representations in IPLD. Should this still be an open issue on this repo?

@rvagg
Member

rvagg commented Dec 6, 2022

closing for archival

@rvagg rvagg closed this as completed Dec 6, 2022