This repository has been archived by the owner on Dec 6, 2022. It is now read-only.

"Size" fields to include #9

Closed
kevina opened this issue May 2, 2018 · 7 comments

@kevina
Contributor

kevina commented May 2, 2018

The old unixfs has two sizes: the file size and the total size of the protocol-wrapped objects (the physical size). The same sizes were used for directory entries, except perhaps not for sharded directories (see #7).

The question is: are both sizes still useful to include? Based on some discussion in #2, I think maybe we should simplify things and just have the file size. Is the physical size even used anywhere?

In addition, the file size isn't really useful for directories. A better size to include would be a count of the number of entries. This count would also allow seeking in sharded directories (see #6).

Thoughts?

@warpfork

warpfork commented May 9, 2018

+1 to both file size only (not "physical" size) and +1 to dirent as a count.

I've always been completely bewildered by ls reporting dirs as 4k (the "physical" size -- which I've precisely never cared about). Can't imagine there's much use for physical size reports on files either; the only context I can imagine is if generating a report on the overall physical size use of an IPFS repo, and that would need to report non-unixfs objects as well, making a special inclusion of that in unixfs objects redundant at best.

@Stebalien

I would still include both the physical size and the file size. I would not include directory sizes.

  • The physical size is useful for calculating file download progress, space usage (well, modulo deduplication), etc.
  • The file size is useful for seeking, etc.
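
To illustrate the seeking use case, here is a minimal sketch, assuming a file is split into chunks and the parent node records the file-data size of each chunk. The `locate` helper and the chunk sizes are hypothetical and not taken from any unixfs implementation; the point is only that file sizes let a reader jump to the right chunk without fetching the earlier ones.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(chunk_sizes, offset):
    """Return (chunk_index, offset_within_chunk) for a byte offset in the file."""
    total = sum(chunk_sizes)
    if not 0 <= offset < total:
        raise ValueError("offset outside file")
    # ends[i] is the file offset one past the last byte of chunk i.
    ends = list(accumulate(chunk_sizes))
    # The first chunk whose end lies strictly beyond the offset contains it.
    index = bisect_right(ends, offset)
    start = ends[index - 1] if index else 0
    return index, offset - start


# Hypothetical file: two 256 KiB chunks followed by a 100000-byte tail.
chunk_sizes = [262144, 262144, 100000]
print(locate(chunk_sizes, 300000))  # (1, 37856)
```

Physical (block) sizes would not work here, because protocol wrapping makes them differ from the number of file bytes each chunk holds.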

@mikeal
Contributor

mikeal commented Sep 21, 2018

I'm trying to understand the use case for knowing the size of all the nodes and not just the data.

I wouldn't want to trust this kind of information for managing quotas since it's not a guarantee.

For space usage it's also not entirely accurate. There's no guarantee that because I have one of these blocks that I've succeeded in also storing the rest of the graph.

Having the content size of each file, and the cumulative size of all the files in each directory, is enough to show download progress.
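
As a rough sketch of that claim, assuming the reader knows each file's declared content size and tracks how many content bytes it has received so far (the `download_progress` function and the file names are made up for illustration):

```python
def download_progress(declared_sizes, received):
    """Fraction of directory content received, in the range [0, 1].

    declared_sizes: file name -> declared content size in bytes
    received:       file name -> content bytes received so far
    """
    total = sum(declared_sizes.values())
    if total == 0:
        return 1.0  # nothing to download
    # Clamp per file so an inaccurate declared size cannot push progress past 100%.
    done = sum(min(received.get(name, 0), size)
               for name, size in declared_sizes.items())
    return done / total


declared = {"a.txt": 100, "b.bin": 300}
print(download_progress(declared, {"a.txt": 100, "b.bin": 100}))  # 0.5
```

No block-level sizes are needed for this; only the content sizes enter the calculation.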

If we adopt the file-data format we could even get away with not including the size attribute since you can easily figure this out by looking at the data array.

@achingbrain
Member

Can the directory size as the sum of all the directory entry sizes be included as well?

In v1 we can't calculate directory sizes without traversing all children of the node, as it may be a HAMT shard, so that is out of the question. But we can't create the directory unless we know which files are in it, so we do have the directory size at creation time. Seems weird to throw that information away.

I would not include directory sizes.

@Stebalien could you expand on why not?

@mikeal
Contributor

mikeal commented Jun 5, 2019

@achingbrain that’s actually how it works now :) https://github.com/ipfs/unixfs-v2/blob/master/SPEC.md#ipld-dir

The size of a directory is the sum of all the size properties in data, so that includes the sizes of files and sub-directories.
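
A minimal sketch of that rule, using plain dicts as stand-ins for nodes. The `type`, `size`, and `data` keys follow the wording above, but the exact layout is an assumption for illustration and not the spec's encoding:

```python
def file_node(content: bytes) -> dict:
    # A file's size is the length of its data, not of the blocks that carry it.
    return {"type": "file", "size": len(content)}


def dir_node(entries: dict) -> dict:
    # A directory's size is the sum of the size property of every entry in data,
    # which covers files and sub-directories alike.
    return {
        "type": "dir",
        "data": entries,
        "size": sum(entry["size"] for entry in entries.values()),
    }


docs = dir_node({"a.txt": file_node(b"hello"), "b.txt": file_node(b"world!")})
root = dir_node({"docs": docs, "readme": file_node(b"hi")})
print(docs["size"], root["size"])  # 11 13
```

Because each sub-directory already carries its own cumulative size, the writer only needs the direct entries to compute the parent's size at creation time.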

However, this is the cumulative size of file “data” and not the size of the blocks. We got rid of that information because it doesn’t really work well in this new model where the block boundaries are transparent.

Also, as @warpfork reminded me today, we need to call out in the spec that while implementations of unixfsv2 MUST encode this accurately, readers of this data should consider the property advisory, since there is no way to guarantee it is accurate without parsing the entire graph.
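
To make the "advisory" point concrete, here is a sketch of what checking a declared size would involve, again using hypothetical dict-shaped nodes (with file content inlined under a made-up `content` key for simplicity). The check has to visit every node, which is exactly why a reader holding only the root cannot rely on the value:

```python
def checked_size(node: dict) -> int:
    """Recompute a node's size from the whole graph; raise if the declared size differs."""
    if node["type"] == "file":
        actual = len(node["content"])
    else:
        # Recurse into every entry: there is no shortcut around a full traversal.
        actual = sum(checked_size(child) for child in node["data"].values())
    if node["size"] != actual:
        raise ValueError(f"declared size {node['size']} != actual size {actual}")
    return actual


honest = {"type": "dir", "size": 7, "data": {
    "a": {"type": "file", "size": 5, "content": b"hello"},
    "b": {"type": "file", "size": 2, "content": b"hi"},
}}
# Same tree, but the writer declared a wrong total.
lying = dict(honest, size=1000)

print(checked_size(honest))  # 7
try:
    checked_size(lying)
except ValueError as err:
    print("mismatch:", err)
```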

@achingbrain
Member

Hooray!

V1 DAGLink sizes have been similarly untrustworthy since forever.

@rvagg
Member

rvagg commented Dec 6, 2022

closing for archival

@rvagg rvagg closed this as completed Dec 6, 2022
6 participants