Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sources: Add explicitSource #56985

Closed
wants to merge 1 commit into from
Closed

Conversation

nh2
Copy link
Contributor

@nh2 nh2 commented Mar 6, 2019

Motivation for this change

Edit: PR description slightly outdated, please see the commit message for the current one (which has feedback addressed).

This has the intention to address various issues with the existing
builtins.filterSource and surrounding library functions:

  • The library functions existing so far are not straightforward to use correctly and are hard to remember. Consequently only few people use them.

  • Most existing functions are about excluding unwanted files, making various hardcodes (such as .o and .so in cleanSourceFilter) that typically still forget some other unwanted files. The added approach is more about explicitly listing wanted files,
    thus being more reliable.

  • This very frequently results in non-reproducible builds when working from project working trees (as opposed to nixpkgs pkgs builds, which always operate from clean checkouts), with "build", "gen" or similar directories being accidentally included as sources.
    It is also a common sources of the dreaded
    warning: dumping very large path (> 256 MiB); this may run out of memory
    message (and as indicated, bad for performance).

  • The most common antipattern regarding the above is src = ./..
    To prevent it, we need to provide functionality that's straightforward to understand and convenient.

  • builtins.filterSource f ./. can result in build impurities depending on the basename of the parent directory (see Basename of current directory goes into derivation for filterSource on ./. nix#1305).
    A solution to that is builtins.path. The added function makes its use very convenient.

  • The existing functions make it difficult to debug what files are included and why.
    The added function provides an argument debugEnableTracing, that makes debugging easy.
    Output looks like:

    trace: myproject: include regular   /home/user/myproject/Setup.hs
    trace: myproject: skip    regular   /home/user/myproject/myproject.nix
    trace: myproject: skip    directory /home/user/myproject/dist
    trace: myproject: include directory /home/user/myproject/images
    trace: myproject: include regular   /home/user/myproject/images/image.svg
    trace: myproject: include directory /home/user/myproject/src
    trace: myproject: include directory /home/user/myproject/src/MyDir
    trace: myproject: include regular   /home/user/myproject/src/MyDir/File1.hs
    trace: myproject: include regular   /home/user/myproject/src/MyDir/File2.hs
    

After spending tens of hours of tracking down irreproducible builds resulting from bad src = assignments in various projects, I hope that this function will put an end to it.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Assured whether relevant documentation is up to date
  • Fits CONTRIBUTING.md.
  • Manual testing of the added functions
    • Examples:
      • lib.explicitFilterSource "lib" ./lib { srcDirs = [ ./lib ]; pathComponentExcludes = ["tests"]; }

TODO

  • Get general feedback on the idea.
  • Update nixpkgs manual, adding a section on using nix packages in project working trees, explaining how build files are often accidentally included, and recommending explicitFilterSource as a solution.
  • Update nix manual, mentioning in the sections about builtins.filterSource and builtins.path's filter that nixpkgs has convenience functions to handle common situations.

@nh2 nh2 requested review from edolstra and nbp as code owners March 6, 2019 20:55
@nh2
Copy link
Contributor Author

nh2 commented Mar 6, 2019

Thanks to @infinisil @tilpner @srhb @toonn @Myrl for helping with various parts of it.

@tazjin
Copy link
Member

tazjin commented Mar 6, 2019

Very cool! One request from my side would be to ensure the doc comments are formatted in the loose convention understood by nixdoc. This means using multi-line comments for "public" functions (those which should be included in the docs), having a Type: line and putting examples at the end following an Example: line. See #49383 for examples.

We don't include sources.nix in the manual generation yet, but we should and this would help 🙂

@GrahamcOfBorg GrahamcOfBorg added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Mar 6, 2019
Copy link
Member

@infinisil infinisil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As already indicated on IRC, I really like this function :D.

You might want to consider adding sources.nix to the autogenerated library docs we now have since #53055. Would require polishing up the comments even more though

#
# Requiring an explicit name prevents the basename of the directory
# making it into the store path when `./.` is used, thus preventing
# impure builds in situations where ./. is a directory with random
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how I feel about requiring a name, it makes the function more annoying to use, and I personally don't care much about being able to identify source derivations, so I'd be fine with some constant name = "filtered-src". Having to specify a name for a src and the derivation itself is a bit weird too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how eg. fetchpatch has an optional name. imo a default constant is great though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK; I think it's very useful to have good names in nix-store -qR or to identify slow downloads, but given that it being optional seems to be popular, so I will change it accordingly.

Luckily fetchpatch users have started to use name more aggressively; hopefully it will also be the case here.

lib/sources.nix Outdated Show resolved Hide resolved
lib/sources.nix Outdated
# isPrefixOfLeafPath [a b] == true
# isPrefixOfLeafPath [a b c] == true
# isPrefixOfLeafPath [a b c x] == true
# isPrefixOfLeafPath [a d] == false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this function. Shouldn't isPrefixOfLeafPath [a d] == true? Because isPrefixOfLeafPath [a] == true and [a] is a prefix of [a d].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, isPrefixOfLeafPath [a] == false; that was a typo.

lib/sources.nix Outdated Show resolved Hide resolved
lib/sources.nix Outdated Show resolved Hide resolved
lib/sources.nix Outdated Show resolved Hide resolved
lib/sources.nix Outdated
# trace: myproject: include directory /home/user/myproject/src/MyDir
# trace: myproject: include regular /home/user/myproject/src/MyDir/File1.hs
# trace: myproject: include regular /home/user/myproject/src/MyDir/File2.hs
debugEnableTracing ? false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe debugTraceEnable, aligning it with debugTracePrefix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

then false
else
let
component = builtins.head path;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these builtins (except builtins.trace) are also reexported by lib, so you can drop builtins.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't work, as with lib; is only enabled on a per-function level and isPrefixOfLeafPath doesn't have it, so it seems that it's needed.

I could add a with lib; to it, but is there a drawback/benefit of using lib vs builtins? It appeared to me that it's nice to only use builtins when possible, so that it's clear that you could easily copy out the function if needed / that it has not dependencies.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's preferable to avoid with, so Nix can perform scope checking and improve lookup performance of performance-sensitive code like this.

lib/sources.nix Show resolved Hide resolved
@nh2
Copy link
Contributor Author

nh2 commented Apr 23, 2019

any progress in the last 1.5 months?

Not yet, currently bisecting some kernel issues :D

@FRidh FRidh added this to the 19.09 milestone Apr 29, 2019
lib/sources.nix Outdated Show resolved Hide resolved
@stale
Copy link

stale bot commented Jun 1, 2020

Thank you for your contributions.
This has been automatically marked as stale because it has had no activity for 180 days.
If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity.
Here are suggestions that might help resolve this more quickly:

  1. Search for maintainers and people that previously touched the
    related code and @ mention them in a comment.
  2. Ask on the NixOS Discourse. 3. Ask on the #nixos channel on
    irc.freenode.net.

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jun 1, 2020
@toonn
Copy link
Contributor

toonn commented Jun 1, 2020

I subscribed to this issue because I ran into confusion with the currently available methods so I'd definitely still like to see progress.

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jun 1, 2020
@ryantm ryantm added 2.status: merge conflict This PR has merge conflicts with the target branch and removed 2.status: merge conflict This PR has merge conflicts with the target branch labels Oct 3, 2020
This has the intention to address various issues with the existing
`builtins.filterSource` and surrounding library functions:

* The library functions existing so far are not straightforward to use
  correctly and are hard to remember.
  Consequently only few people use them.

* Most existing functions are about *excluding* unwanted files,
  making various hardcodes (such as `.o` and `.so` in `cleanSourceFilter`)
  that typically still forget some other unwanted files.
  The added approach is more about explicitly listing wanted files,
  thus being more reliable.

* This very frequently results in non-reproducible builds when working
  from project working trees (as opposed to nixpkgs `pkgs` builds, which
  always operate from clean checkouts), with "build", "gen" or similar
  directories being accidentally included as sources.
  It is also a common sources of the dreaded
      warning: dumping very large path (> 256 MiB); this may run out of memory
  message (and as indicated, bad for performance).

* The most common antipattern regarding the above is `src = ./.`.
  To prevent it, we need to provide functionality that's straightforward
  to understand and convenient.

* `builtins.filterSource f ./.` can result in build impurities
  depending on the basename of the parent directory
  (see NixOS/nix#1305).
  A solution to that is `builtins.path`, or its use in `cleanSourceWith`
  (see NixOS#67996).
  The added function makes its use very convenient.

* The existing functions make it difficult to debug what files are included
  and why. The added function provides an argument `debugTraceEnable`,
  that makes debugging easy.

After spending tens of hours of tracking down irreproducible builds resulting
from bad `src =` assignments in various projects, I hope that this function
will put an end to it.
@nh2 nh2 force-pushed the explicitFilterSource branch from d20e5d2 to d3dcd71 Compare December 13, 2020 22:27
Copy link
Contributor Author

@nh2 nh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've addressed the feedback, and force-pushed a rebase on top of master.

I've also slightly improved the alignment in the debug output ("symlink" is a new file type that can also occur).

So I think this is quite ready from my side now.

@ofborg ofborg bot removed the 2.status: merge conflict This PR has merge conflicts with the target branch label Dec 13, 2020
@nh2 nh2 changed the title sources: Add explicitFilterSource sources: Add explicitSource Dec 13, 2020
@nh2 nh2 mentioned this pull request Dec 13, 2020
1 task
Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function implementation does too much by itself. Its implementation must be refactored by pulling out functions that perform the various tasks. Otherwise we'll end up maintaining multiple implementations of the same operations side by side.

I don't think a function with many parameters is a good replacement for something that can be expressed as set-like operations, because the structure of such an expression matters to the meaning. To illustrate (with bool ops instead), (a && b) || c is generally not equal to a && (b || c), so the meaning of an expression f { inherit a b c; } is ambiguous without knowing the intricate details of f that aren't easily expressed in natural language docs.

We can only judge the value of explicitSource by knowing how users use the currently missing or undocumented functions. Recommending it may or may not be a mistake in this new situation. If the implementation of explicitSource is small enough, perhaps we can take the risk.

The library functions existing so far are not straightforward to use correctly and are hard to remember. Consequently only few people use them.

We seem to lack the functions that make this easy. This PR has the right ideas for what those functions could be.

The added approach is more about explicitly listing wanted files,
thus being more reliable.

👍 this has been missing.

The most common antipattern regarding the above is src = ./..
To prevent it, we need to provide functionality that's straightforward to understand and convenient.

We need to document the source functions in the manual and update our examples.

A composition of well-named filters source operations will be even easier to understand than a set of parameters that don't exhibit their priorities at the call site.

The functions

The parameters are a good guide of what's missing.

  • Instead of includeDirs, we should have unionSources from list of sources to a single source starting at the longest common subpath.

  • includeFiles should be a function [path] -> source -> source the returns true for the paths and their ancestry without consulting the original source. includeFiles can be implemented with unionSources.

  • excludeHidden should be a function source -> source.

  • pathComponentExcludes should be a function [string] -> source -> source. Large part of its added value is that it only applies the excludes below origSrc.

  • debugTraceEnable should be turned into a function source -> source. Similarly, we could have traceSourceWith : string -> source -> source or traceSourceWith : (path -> type -> string) -> source -> source

  • name can be turned into a function too, to make the possibility of renaming more visible. cleanSourceWith { inherit name src; } will do the trick, or we could refactor cleanSourceWith to call the new renaming function.

Functions like these will make explicitFilterSource either redundant, or easy to understand and maintain.

@Profpatsch
Copy link
Member

I really want to see something like this to succeed, the missing best practices around filterSource are a giant problem.

A composable design with mostly straightforward semantics would be amazing to have, yet for some reason it seems like it’s very hard to create.

@roberth
Copy link
Member

roberth commented Dec 14, 2020

for some reason it seems like it’s very hard to create.

Appearances can be deceiving. I'm thinking a combination of learned helplessness and lack of urgency.

cleanSourceWith is about three years old. Took another 21 months for another contributor to make name settable and then another 6 months for yet another contributor to improve on that and solve the name problem by default. Meanwhile, I suspect everyone assumes that this is "core nixpkgs" or something and infer incorrectly from there. Nixpkgs itself only uses cleanSourceWith indirectly and only in 8 places.

@nh2
Copy link
Contributor Author

nh2 commented Dec 15, 2020

@roberth Splitting the function into reusable components sounds good, but it's a bigger task than I have time to tackle currently.

Would you mind giving a shot at implementing your proposed ideas?

@roberth
Copy link
Member

roberth commented Dec 15, 2020

@roberth Splitting the function into reusable components sounds good, but it's a bigger task than I have time to tackle currently.

Would you mind giving a shot at implementing your proposed ideas?

I'm currently prioritizing reviews and maintenance over code contributions for the same reason.

@veprbl veprbl removed this from the 19.09 milestone May 31, 2021
@stale
Copy link

stale bot commented Jan 9, 2022

I marked this as stale due to inactivity. → More info

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jan 9, 2022
if tree == null
then true
else
if path == []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid the hadoukif
image

Usually we pile the else ifs

if cond1 then val1
else if cond2 then val2
else if cond3 then val3
else ...

component = builtins.head path;
restPath = builtins.tail path;
in
if !(builtins.hasAttr component tree)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert the logic here. if not is a bad, less readable pattern.

@stale stale bot removed 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md labels Dec 7, 2022
@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jun 18, 2023
@infinisil
Copy link
Member

infinisil commented Jun 19, 2023

I recently developed a more composable, safer and easier-to-use abstraction for source filtering in a draft PR to Nixpkgs, please take a look, try it out, and give feedback! https://discourse.nixos.org/t/easy-source-filtering-with-file-sets/29117

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jun 19, 2023
@AndersonTorres
Copy link
Member

I will close this as abandoned. You are free to reopen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.kind: enhancement Add something new 9.needs: community feedback 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux
Projects
None yet
Development

Successfully merging this pull request may close these issues.