Include store path exact spec in the docs #9295

Merged
merged 12 commits into from
Feb 12, 2024
1 change: 1 addition & 0 deletions doc/manual/src/SUMMARY.md.in
@@ -106,6 +106,7 @@
- [Architecture and Design](architecture/architecture.md)
- [Protocols](protocols/index.md)
- [Serving Tarball Flakes](protocols/tarball-fetcher.md)
- [Exact Store Path Specification](protocols/store-path.md)
- [Derivation "ATerm" file format](protocols/derivation-aterm.md)
- [Glossary](glossary.md)
- [Contributing](contributing/index.md)
104 changes: 104 additions & 0 deletions doc/manual/src/protocols/store-path.md
@@ -0,0 +1,104 @@
# Complete Store Path Calculation

This is the complete specification for how store paths are calculated.

Regular users do *not* need to know this information --- store paths can be treated as black boxes computed from the properties of the store objects they refer to.
But for those interested in exactly how Nix works, e.g. if they are reimplementing it, this information can be useful.

```bnf
<realized-path> ::= <store-dir>/<digest>-<name>
```
where

- `<digest>` = base-32 representation of the first 160 bits of a [SHA-256] hash of `<pre>`

This is the hash part of the store name.

- `<pre>` = the string `<type>:sha256:<inner-digest>:<store>:<name>`;

Note that it includes the location of the store as well as the name to make sure that changes to either of those are reflected in the hash
(e.g. you won't get `/nix/store/<digest>-name1` and `/nix/store/<digest>-name2`, or `/gnu/store/<digest>-name1`, with equal hash parts).

- `<name>` = the name of the store object.

- `<store>` = the [store directory](@docroot@/store/store-path.md#store-directory)

- `<type>` = one of:

- ```bnf
text:<r1>:<r2>:...<rN>
```

For encoded derivations written to the store.
`<r1> ... <rN>` are the store paths referenced by this path.
Those are encoded in the form described by `<realized-path>`.

- ```bnf
source:<r1>:<r2>:...:<rN>:self
```

For paths copied to the store and hashed via a [Nix Archive (NAR)] and [SHA-256][sha-256].
Just like in the text case, we can have the store objects referenced by their paths.
Additionally, we can have an optional `:self` label to denote self reference.

- ```bnf
output:<id>
```

For either the outputs built from derivations,
or paths copied to the store that are hashed directly as a single file or via a hash algorithm other than [SHA-256][sha-256]
(in the NAR + SHA-256 case, "source" is used instead; it's silly, but it's done that way for compatibility).

`<id>` is the name of the output (usually, "out").
For content-addressed store objects, `<id>` is always "out".

- `<inner-digest>` = base-16 representation of a SHA-256 hash of `<inner-pre>`

- `<inner-pre>` = one of the following based on `<type>`:

- if `<type>` = `text:...`:

the string written to the resulting store path.

- if `<type>` = `source:...`:

the hash of the [Nix Archive (NAR)] serialization of the [file system object](@docroot@/store/file-system-object.md) of the store object.

- if `<type>` = `output:<id>`:

- For input-addressed derivation outputs:

the [ATerm](@docroot@/protocols/derivation-aterm.md) serialization of the derivation modulo fixed output derivations.

- For content-addressed store paths:

the string `fixed:out:<rec><algo>:<hash>:`, where

- `<rec>` = one of:

- `r:` for hashes of the [Nix Archive (NAR)] (arbitrary file system object) serialization

- `` (empty string) for hashes of the flat (single file) serialization

- `<algo>` = `md5`, `sha1` or `sha256`

- `<hash>` = base-16 representation of the path or flat hash of the contents of the path (or expected contents of the path for fixed-output derivations).

Note that `<id>` = `out`, regardless of the name part of the store path.
Also note that NAR + SHA-256 must not use this case, and instead must use the `<type>` = `source:...` case.
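
The content-addressed `output:out` case can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Nix implementation: the function names `fixed_inner_pre` and `inner_digest` are hypothetical, and the example input is made up for the demonstration. The sketch builds the `fixed:out:<rec><algo>:<hash>:` string and rejects the NAR + SHA-256 combination, which must use the `source:...` case.

```python
import hashlib


def fixed_inner_pre(algo: str, content_hash_hex: str, recursive: bool) -> str:
    """Build `fixed:out:<rec><algo>:<hash>:` (hypothetical helper name)."""
    if algo not in ("md5", "sha1", "sha256"):
        raise ValueError(f"unsupported <algo>: {algo}")
    if recursive and algo == "sha256":
        # NAR + SHA-256 must use <type> = source:... instead of this case.
        raise ValueError("NAR + SHA-256 belongs to the source:... case")
    rec = "r:" if recursive else ""  # "r:" for NAR hashes, "" for flat hashes
    return f"fixed:out:{rec}{algo}:{content_hash_hex}:"


def inner_digest(inner_pre: str) -> str:
    """<inner-digest>: base-16 SHA-256 of <inner-pre>."""
    return hashlib.sha256(inner_pre.encode("utf-8")).hexdigest()


# Flat (single file) SHA-256 hash of some example file contents.
flat_hash = hashlib.sha256(b"hello\n").hexdigest()
inner_pre = fixed_inner_pre("sha256", flat_hash, recursive=False)
print(inner_pre)
print(inner_digest(inner_pre))
```

Raising an error for the recursive SHA-256 combination keeps the two cases disjoint in the sketch, mirroring the note above.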

[Nix Archive (NAR)]: @docroot@/glossary.md#gloss-NAR
[sha-256]: https://en.m.wikipedia.org/wiki/SHA-256
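
As a rough illustration of how the pieces of the grammar fit together, the following Python sketch assembles `<type>` and `<pre>` for the `text:...` case. It is a sketch under assumptions rather than the reference implementation: `make_type` and `make_pre` are hypothetical names, the reference path is a made-up placeholder, and the ordering of references is not stated in this section, so the sketch keeps them in the order given. The final step (taking 160 bits of the hash and encoding them in Nix's base-32) is deliberately left out, because this section does not spell out those encoding details.

```python
import hashlib


def make_type(kind: str, references: list[str], self_ref: bool = False) -> str:
    """Build `text:<r1>:...` or `source:<r1>:...:self` (hypothetical helper)."""
    if kind not in ("text", "source"):
        raise ValueError(f"unsupported kind: {kind}")
    if self_ref and kind != "source":
        raise ValueError("only source:... may carry the :self label")
    parts = [kind, *references]
    if self_ref:
        parts.append("self")
    return ":".join(parts)


def make_pre(type_: str, inner_digest: str, store: str, name: str) -> str:
    """Build `<type>:sha256:<inner-digest>:<store>:<name>`."""
    return f"{type_}:sha256:{inner_digest}:{store}:{name}"


# text case: <inner-pre> is the string written to the store path.
contents = b"example contents\n"
inner_digest = hashlib.sha256(contents).hexdigest()

# Placeholder reference, not a real store path.
reference = "/nix/store/00000000000000000000000000000000-dep"
type_ = make_type("text", [reference])
pre = make_pre(type_, inner_digest, "/nix/store", "example.txt")

# Full SHA-256 of <pre>; shortening and base-32 encoding are not shown.
pre_hash = hashlib.sha256(pre.encode("utf-8")).digest()
print(pre)
print(pre_hash.hex())
```

Because `<store>` and `<name>` are both part of `<pre>`, changing either one changes the resulting hash, which is the property the note on `<pre>` describes.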

## Historical Note

The `<type>` = `source:...` and `<type>` = `output:out` grammars technically overlap, in that both can represent data hashed by its SHA-256 NAR serialization.

The original reason for this way of computing names was to prevent name collisions (for security).
For instance, the thinking was that it shouldn't be feasible to come up with a derivation whose output path collides with the path for a copied source.
The former would have a `<pre>` starting with `output:out:`, while the latter would have a `<pre>` starting with `source:`.

Since `64519cfd657d024ae6e2bb74cb21ad21b886fd2a` (2008), however, it was decided that separating derivation-produced vs manually-hashed content-addressed data like this was not useful.
Now, data that is to be SHA-256 + NAR-serialization content-addressed always uses the `source:...` construction, regardless of how it was produced (manually or by derivation).
This allows freely switching between using [fixed-output derivations](@docroot@/glossary.md#gloss-fixed-output-derivation) for fetching, and fetching out-of-band and then manually adding.
It also removes the ambiguity from the grammar.
Member


Not sure if we should reference future possibilities, but

86 changes: 7 additions & 79 deletions src/libstore/store-api.cc
@@ -65,85 +65,13 @@ StorePath Store::followLinksToStorePath(std::string_view path) const
}


/* Store paths have the following form:

<realized-path> = <store>/<h>-<name>

where

<store> = the location of the Nix store, usually /nix/store

<name> = a human readable name for the path, typically obtained
from the name attribute of the derivation, or the name of the
source file from which the store path is created. For derivation
outputs other than the default "out" output, the string "-<id>"
is suffixed to <name>.

<h> = base-32 representation of the first 160 bits of a SHA-256
hash of <s>; the hash part of the store name

<s> = the string "<type>:sha256:<h2>:<store>:<name>";
note that it includes the location of the store as well as the
name to make sure that changes to either of those are reflected
in the hash (e.g. you won't get /nix/store/<h>-name1 and
/nix/store/<h>-name2 with equal hash parts).

<type> = one of:
"text:<r1>:<r2>:...<rN>"
for plain text files written to the store using
addTextToStore(); <r1> ... <rN> are the store paths referenced
by this path, in the form described by <realized-path>
"source:<r1>:<r2>:...:<rN>:self"
for paths copied to the store using addToStore() when recursive
= true and hashAlgo = "sha256". Just like in the text case, we
can have the store paths referenced by the path.
Additionally, we can have an optional :self label to denote self
reference.
"output:<id>"
for either the outputs created by derivations, OR paths copied
to the store using addToStore() with recursive != true or
hashAlgo != "sha256" (in that case "source" is used; it's
silly, but it's done that way for compatibility). <id> is the
name of the output (usually, "out").

<h2> = base-16 representation of a SHA-256 hash of <s2>

<s2> =
if <type> = "text:...":
the string written to the resulting store path
if <type> = "source:...":
the serialisation of the path from which this store path is
copied, as returned by hashPath()
if <type> = "output:<id>":
for non-fixed derivation outputs:
the derivation (see hashDerivationModulo() in
primops.cc)
for paths copied by addToStore() or produced by fixed-output
derivations:
the string "fixed:out:<rec><algo>:<hash>:", where
<rec> = "r:" for recursive (path) hashes, or "" for flat
(file) hashes
<algo> = "md5", "sha1" or "sha256"
<hash> = base-16 representation of the path or flat hash of
the contents of the path (or expected contents of the
path for fixed-output derivations)

Note that since an output derivation has always type output, while
something added by addToStore can have type output or source depending
on the hash, this means that the same input can be hashed differently
if added to the store via addToStore or via a derivation, in the sha256
recursive case.

It would have been nicer to handle fixed-output derivations under
"source", e.g. have something like "source:<rec><algo>", but we're
stuck with this for now...

The main reason for this way of computing names is to prevent name
collisions (for security). For instance, it shouldn't be feasible
to come up with a derivation whose output path collides with the
path for a copied source. The former would have a <s> starting with
"output:out:", while the latter would have a <s> starting with
"source:".
Comment on lines -131 to -146
Member Author

@Ericson2314 Ericson2314 Jan 18, 2024


Note that the first paragraph at the end is wrong, and the second two are misleading. That is why the new version (the historical note) looks quite different.

/*
The exact specification of store paths is in `protocols/store-path.md`
in the Nix manual. These few functions implement that specification.

If changes to these functions go beyond mere implementation changes and
also alter the user-visible behavior, please update the specification
to match.
*/

