Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sources_fingerprint to peek on source-creating targets #18383

Merged
merged 4 commits into from
Mar 3, 2023

Conversation

huonw
Copy link
Contributor

@huonw huonw commented Feb 28, 2023

This has the peek output include the fingerprint of the sources referenced in a target. This is a step towards #8445, by putting more information into peek.

For instance, with this, one way to get a crude "hash" of a target would be something like:

{ 
   pants dependencies --transitive --closed path/to:target | xargs pants peek
   # these might change behaviour and so need to be included
   cat pants.toml
   cat 3rdparty/python/default.lock # or whatever other lock files are relevant
} | openssl sha256

This is conservative: the hash can be different without the behaviour of the target changing at all. For instance:

  • irrelevant changes in pants.toml: adjusting comments, unrelated subsystem config (e.g. settings in [golang] when path/to:target is a Python-only pex_binary)
  • upgrading 3rd party dependencies in the resolve that aren't (transitively) used by path/to:target. This relates to --dependencies-transitive doesn't list third-party transitive dependencies #12733: if all transitive 3rd party deps appeared in pants dependencies --transitive, and pants peek included the right info for them (e.g. version and fingerprints), the cat 3rdparty/... could be removed because the peek pipe would handle it.
  • target fields that don't impact execution behaviour, e.g. changing the skip_black setting on a python_source target, without changing the file contents (this might be most fields on the (transitive) dependencies of a packageable target?)

This is also only the hash of the input configuration, rather than a hash of a built artefact. If there's processes that aren't deterministic (e.g. shell_command(command="date > output.txt", output_files=["output.txt"]) somewhere in the chain), the exact output artefact might be different if built twice, even if the hash hasn't changed.

This PR is, in some sense, a partial revival of #8450, although is much simpler, because the JSON-outputting peek target already exists, and this doesn't try to solve the full problem.

@huonw huonw changed the title Add "sources_fingerprint" to peek on a source-creating target Add sources_fingerprint to peek on source-creating targets Feb 28, 2023
@kaos
Copy link
Member

kaos commented Feb 28, 2023

as we're getting more data into the peek output, I am more and more inclined to suggest we add more scoping on the data, to reduce the risk of name conflicts and also make it easier to trim/pick the desired data out of it.

Concretely what I'm suggesting is to have the actual target field values pushed down into fields for instance, so the peek output could look like:

[
  {
    "address": "src/python/pants/bin/remote_pants_runner.py",
    "target_type": "python_source",
    "fields": {
      "dependencies": [
        "src/python/pants/__init__.py",
        "src/python/pants/base/exiter.py",
        "src/python/pants/engine/internals/native_engine.pyi",
        "src/python/pants/option/global_options.py",
        "src/python/pants/option/options_bootstrapper.py",
        "src/python/pants/pantsd/pants_daemon_client.py"
      ],
      "description": null,
      "interpreter_constraints": null,
      "resolve": null,
      "restartable": false,
      "run_goal_use_sandbox": null,
      "skip_autoflake": false,
      "skip_black": false,
      "skip_docformatter": false,
      "skip_flake8": false,
      "skip_isort": false,
      "skip_mypy": false,
      "sources": [
        "src/python/pants/bin/remote_pants_runner.py"
      ],
      "tags": null
    },
    "raw_fields": {
      "dependencies": null,
      "source": "remote_pants_runner.py"
    },
    "sources_fingerprint": "...",
    "visibility": {
      "dependents_rules": ..,
      ..
    }
  },
  ..
]

maybe even write a schema for it's output format.. ;)

@huonw
Copy link
Contributor Author

huonw commented Mar 2, 2023

Yeah, that sounds good. Do you think this PR should wait before doing that? My inclination is that that sort of refactoring should/could happen independently (but soon).

@huonw huonw marked this pull request as ready for review March 2, 2023 02:23
@huonw huonw requested review from kaos and benjyw March 2, 2023 02:24
@kaos
Copy link
Member

kaos commented Mar 2, 2023

Yeah, that sounds good. Do you think this PR should wait before doing that? My inclination is that that sort of refactoring should/could happen independently (but soon).

Agree to not do that refactoring here but in a dedicated PR. Just mentioned it here as we both have been introducing new fields to the peek data and I've been thinking about it for a while and this got me into typing it out ;)

Copy link
Member

@kaos kaos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm :)

Copy link
Contributor

@benjyw benjyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants