This repository has been archived by the owner on Dec 6, 2022. It is now read-only.
TLDR;
I’ve listed every feature I can find that has been considered for UnixFSv2 below. We discussed this in a short meeting (notes at the end of the document, recording posted soon) and the following action items surfaced:
@mikeal will kick off an issue in this repo to define and scope a UnixFSv2* we can ship on a reasonable timeline.
ipfs/spec to add file metadata to UnixFSv1
UnixFS vNext Reboot
For some time we’ve been directing issues, feature requests, and the general future of UnixFS at “UnixFSv2.” Because the size and scope of this future version were never locked down, this has delayed improvements to UnixFSv1 and has failed to tie UnixFSv2 to a clear deadline and set of functionality.
The goal of this document is to describe the various issues and features we’d like to see in UnixFS and link to the historical discussions about those features. We can then use this document to discuss and prioritize each feature and find the best path to development, whether it be improvements to UnixFSv1, an incremental UnixFSv2 on dag-cbor, or a bigger future version built on features that are still being researched.
General Links
Development Targets
This section briefly describes the difficulties and limitations of different development strategies, which should help inform how best to approach solving each issue.
Improvements to UnixFSv1
One problem with improving UnixFSv1 is that none of the generic improvements we make can be leveraged by applications outside of IPFS. For instance, the work we’ve done for directory sharding lives in UnixFSv1 and can’t be used for other generic sharding problems. This means that solving fairly generic problems via UnixFSv1 is less valuable and eventually leads to duplicated effort.
The other problem is dag-pb, best summarized by @stebalian. In short, it’s very rigid, and adding fields and other features is more cumbersome than in dag-cbor.
UnixFSv2 on dag-cbor soonish
This development route solves the dag-pb related issues and makes some of the generic improvements leverageable outside of IPFS.
However, there is one major problem remaining: upgradability. All new features and improvements must exist and be relatively consistent between two versions of IPFS manipulating the same data. There is no good way to ensure this without future IPLD features that are still in the research phase.
This route of development is most problematic when tackling the “Reproducible Hashes” issue.
It should also be noted that there is future, undeveloped IPLD work that we want to leverage for UnixFS. We can therefore be fairly certain that, even if we released this version of UnixFSv2, we would still face another major version migration at some point in the future.
The actual development time for this would not be very long. @mikeal has already written draft implementations of several iterations of the UnixFSv2 spec in JS. A much more important factor to consider is the upgrade cost to IPFS users.
UnixFSv2 on “IPLD Future”
Most of the big problems facing UnixFS are problems facing IPLD generally. These problems are all being actively worked on in the form of engineering and research and at some future date can be leveraged for an ideal, future-proof (upgradable), version of UnixFS. However, when this will be available can’t be predicted with a high level of certainty.
Issues
Standard File/Directory metadata
Links
Arbitrary file metadata
The ability for users to add their own optional metadata to files could be very useful. However, doing arbitrary anything in dag-pb is problematic.
Reproducible Hashing
Put simply, this is the ability for a given UnixFS implementation to look at an existing UnixFS encoded file and a file on a traditional file system and to reproduce the UnixFS encoding identically.
This feature is relatively simple if there is no optionality and every version of IPFS is in perfect alignment. However, this is almost never the case.
IPFS has several options that can be used when encoding a file, and these alter the resulting encoding.
One path is to encode all options into the encoded version of the file. This works only as long as both versions of IPFS are in alignment, so it can often fail to produce identical hashes in upgrade scenarios. The only way to completely guarantee reproducible hashing is to guarantee that the applications are also identical, but this is very difficult without “IPLD Future.”
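To make the problem concrete, here is a minimal sketch of how a single encoding option changes the root hash, and how recording the option next to the root lets another implementation reproduce it. Everything here is an assumption for illustration: the fixed-size chunker, the one-level "hash of chunk hashes" layout, and the names `chunk_fixed`, `root_digest`, `encode`, and `reproduces` are hypothetical and are not the real UnixFS layout or CID computation.

```python
import hashlib
from typing import Dict, List


def chunk_fixed(data: bytes, size: int) -> List[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def root_digest(data: bytes, chunk_size: int) -> str:
    """Hash each chunk, then hash the concatenated chunk digests.

    A stand-in for a one-level file DAG, not the real UnixFS encoding.
    """
    leaves = [hashlib.sha256(c).digest() for c in chunk_fixed(data, chunk_size)]
    return hashlib.sha256(b"".join(leaves)).hexdigest()


def encode(data: bytes, chunk_size: int) -> Dict[str, object]:
    """Record the options used alongside the resulting root (hypothetical format)."""
    return {
        "chunker": "fixed",
        "chunk_size": chunk_size,
        "root": root_digest(data, chunk_size),
    }


def reproduces(record: Dict[str, object], data: bytes) -> bool:
    """Re-encode local data with the recorded options and compare roots."""
    return root_digest(data, int(record["chunk_size"])) == record["root"]


data = bytes(range(256)) * 64  # 16 KiB of sample content

# Same bytes, different chunk size: the two roots are not equal.
print(root_digest(data, 1024) == root_digest(data, 4096))

# Same bytes, options read back from the record: the root is reproduced.
record = encode(data, 1024)
print(reproduces(record, data))
```

Note that this only works because both sides run the same `chunk_fixed`. If a newer version changed what "fixed" means (or how a content-defined chunker such as rabin behaves), the recorded options would no longer be enough, which is the upgrade problem described above.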
“Inline” files and directories
For small files and directories the benefits of de-duplication are often outweighed by the cost of retrieving additional blocks.
There are also use cases, like websites, where it may be highly beneficial to inline certain data into the root block of the directory tree for faster early rendering.
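A rough sketch of the trade-off, assuming a hypothetical size threshold (`INLINE_LIMIT`) and a made-up directory node shape: files at or under the threshold are embedded in the directory node, while larger files become separate blocks referenced by digest. None of these names come from an existing UnixFS implementation.

```python
import hashlib
from typing import Dict, Tuple

INLINE_LIMIT = 64  # hypothetical threshold in bytes, chosen only for this example


def build_directory(
    files: Dict[str, bytes], inline_limit: int = INLINE_LIMIT
) -> Tuple[Dict[str, dict], Dict[str, bytes]]:
    """Return (directory_node, blocks).

    Small files are embedded in the directory node itself; larger files are
    stored as separate blocks and referenced by their digest.
    """
    node: Dict[str, dict] = {}
    blocks: Dict[str, bytes] = {}
    for name, content in sorted(files.items()):
        if len(content) <= inline_limit:
            node[name] = {"inline": content}
        else:
            digest = hashlib.sha256(content).hexdigest()
            blocks[digest] = content
            node[name] = {"link": digest}
    return node, blocks


def fetches_needed(node: Dict[str, dict]) -> int:
    """Blocks to retrieve to read every file: the root plus one per link."""
    return 1 + sum(1 for entry in node.values() if "link" in entry)


site = {
    "index.html": b"<h1>hello</h1>",
    "style.css": b"h1 { color: red }",
    "photo.jpg": b"\xff" * 5000,
}

node, blocks = build_directory(site)
print(fetches_needed(node))  # root block plus photo.jpg

no_inline, _ = build_directory(site, inline_limit=0)
print(fetches_needed(no_inline))  # root block plus one block per file
```

The cost is that inlined bytes are no longer addressable on their own, so an identical `style.css` in another directory would not be de-duplicated.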
Support for non-utf8 Filenames
Link
Seeking in large directories
It’s often necessary to paginate through large directories and the current implementations do not easily support this.
Question: Given that you can only paginate through a randomized ordering using the current sharding data structure, how useful would this be without ordered collections?
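To illustrate the question, here is a small sketch of cursor-based pagination over a directory whose entries are positioned by a hash of their name. It is a simplification under stated assumptions: the real sharded directory is a HAMT, and `sha256`, `shard_key`, and `list_page` are stand-ins invented for this example.

```python
import hashlib
from typing import List, Optional, Tuple


def shard_key(name: str) -> str:
    """Position of an entry in a hash-sharded directory (simplified stand-in)."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()


def list_page(
    names: List[str], cursor: Optional[str], limit: int
) -> Tuple[List[str], Optional[str]]:
    """Return up to `limit` names after `cursor`, in shard (hash) order.

    The second value is the cursor for the next page, or None when done.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    ordered = sorted(names, key=shard_key)
    if cursor is not None:
        ordered = [n for n in ordered if shard_key(n) > cursor]
    page = ordered[:limit]
    next_cursor = shard_key(page[-1]) if len(ordered) > limit else None
    return page, next_cursor


names = [f"file-{i:03d}.txt" for i in range(10)]

seen: List[str] = []
cursor: Optional[str] = None
while True:
    page, cursor = list_page(names, cursor, limit=4)
    seen.extend(page)
    if cursor is None:
        break

# Every entry is visited exactly once...
print(sorted(seen) == sorted(names))
# ...but the pages follow hash order, which need not match name order.
print(seen == sorted(names))
```

Pagination like this is stable and complete, but a user asking for "the first page" gets whichever names happen to hash lowest, which is why ordered collections matter for this feature.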
Symlinks
Link
Protobuf Performance
While I’ve heard people say on numerous occasions that dag-pb performance is an issue (compared to dag-cbor), I can’t find any good links or resources on what the real impact of this is.
Miscellaneous
Meeting Notes: August 8th 2019
performance things
questions about external information we can feed into priorities
generation style versioning?
more worried about changes to things like rabin chunking than anything else
some kinds of data might be easier to maintain read of and maybe that's useful?
[][]byte is easy, even if chunker to write it changed
worth mentioning that dir list order in most existing filesystems isn't... really specified.
anyone wanna talk about attribs?
https://gist.github.com/warpfork/3948bd951e93c0f0b4e355d78b736f83