add --output-format=json output option to v2 list #8450

cosmicexplorer · 2019-10-11T00:56:34Z

Problem

Resolves #8445.

Solution

Add an --output-format option to the list v2 @console_rule, and make --provides and --documented point to values of that enum option.
Add --output-format=json, which prints out lines of json with the keys:
- was_root: whether the target was a target root, or one of the transitive dependencies of one of the roots
- address: the target address
- target_type: the target's type alias as written in BUILD files, e.g. python_library
- intransitive_fingerprint: the intransitive fingerprint for that TargetAdaptor
- transitive_fingerprint: the transitive fingerprint for that TargetAdaptor
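The difference between the two fingerprint keys can be sketched with a toy model. This is hypothetical and not the code in this PR. It assumes each target is a dict of field values plus a list of direct dependency addresses, and it hashes fields with SHA-1 over sorted-key JSON, in the spirit of the `stable_json_sha1()` helper mentioned in review. The transitive fingerprint folds in each dependency's address and transitive fingerprint in sorted order.

```python
import hashlib
import json
from functools import lru_cache
from typing import Any, Dict, List, Tuple

# Hypothetical toy build graph: address -> (fields, direct dependency addresses).
GRAPH: Dict[str, Tuple[Dict[str, Any], List[str]]] = {
    "lib:a": ({"type": "python_library", "sources": ["a.py"]}, []),
    "lib:b": ({"type": "python_library", "sources": ["b.py"]}, ["lib:a"]),
    "my/python:binary": ({"type": "python_binary", "entry_point": "main"}, ["lib:b"]),
}


def stable_sha1(value: Any) -> str:
    """SHA-1 of a sorted-key JSON encoding, so dict ordering cannot change the hash."""
    encoded = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha1(encoded).hexdigest()


def intransitive_fingerprint(address: str) -> str:
    """Depends only on the target's own fields."""
    fields, _deps = GRAPH[address]
    return stable_sha1(fields)


@lru_cache(maxsize=None)
def transitive_fingerprint(address: str) -> str:
    """Own fingerprint plus (address, transitive fingerprint) of each dependency.

    Memoized so shared dependencies are hashed once; call
    transitive_fingerprint.cache_clear() after mutating GRAPH.
    """
    _fields, deps = GRAPH[address]
    dep_fingerprints = sorted((dep, transitive_fingerprint(dep)) for dep in deps)
    return stable_sha1([intransitive_fingerprint(address), dep_fingerprints])
```

With this shape, editing `lib:a` changes both of its own fingerprints and the transitive fingerprints of `lib:b` and `my/python:binary`, while their intransitive fingerprints stay the same.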

Result

The following command line will output a string representing a stable hash of the transitive closure of the target my/python:binary:

$ ./pants list --output-format=json my/python:binary \
  | jq -r 'select(.was_root) | .transitive_fingerprint'
ef54aa0c26c8d91bb74cd575e0cac9378bef8e4a
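For consumers without `jq`, the same selection can be done in Python. This sketch relies only on what the description above states: one JSON object per line, carrying `was_root` and `transitive_fingerprint` keys. The sample input is made up; the `1111`/`2222`/`3333` values are placeholders, not real fingerprints.

```python
import json
from typing import Iterable, List


def root_fingerprints(lines: Iterable[str]) -> List[str]:
    """Equivalent of `jq -r 'select(.was_root) | .transitive_fingerprint'` over JSON lines."""
    fingerprints = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("was_root"):
            fingerprints.append(record["transitive_fingerprint"])
    return fingerprints


# Made-up sample output: one root and one transitive dependency.
SAMPLE_OUTPUT = """\
{"was_root": true, "address": "my/python:binary", "target_type": "python_binary", "intransitive_fingerprint": "1111", "transitive_fingerprint": "ef54aa0c26c8d91bb74cd575e0cac9378bef8e4a"}
{"was_root": false, "address": "my/python:lib", "target_type": "python_library", "intransitive_fingerprint": "2222", "transitive_fingerprint": "3333"}
"""

print("\n".join(root_fingerprints(SAMPLE_OUTPUT.splitlines())))
# prints ef54aa0c26c8d91bb74cd575e0cac9378bef8e4a
```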

Once #7356 lands, we can use the following command to print the fingerprints of all python_binary targets whenever their source files change:

$ ./pants --loop --query="type_filter('python_binary')" list --output-format=json :: \
  | jq -c 'select(.was_root) | {address, transitive_fingerprint}'
{"address":"my/python:binary","transitive_fingerprint":"ef54aa0c26c8d91bb74cd575e0cac9378bef8e4a"}

cosmicexplorer · 2019-10-11T01:01:09Z

Note that I didn't discuss this specific implementation with folks earlier and would love to hear input on this approach.

cosmicexplorer · 2019-10-11T01:07:04Z

This is likely to be extremely useful when running pants with --loop for scalameta/metals#935, according to a discussion with the author of that PR (but this does not at all block that PR).

benjyw

How do you feel about renaming the option to (in order of my personal preference) --output=json or --json or --verbose or something? That allows us to add more info in the future if needed. Essentially we generalize this from a specific hack to a more general-purpose mechanism, currently only used for the specific hack...

cosmicexplorer · 2019-10-11T02:53:28Z

I absolutely love --output=json or maybe --output-format=json!

cosmicexplorer · 2019-10-11T02:54:07Z

(Ideally that sets the stage for a v2 ./pants export too!)

cosmicexplorer · 2019-10-11T17:24:50Z

Done! Made an enum --output-format option, defaulting to --output-format=address-specs!
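A minimal sketch of the option's shape, using the standard library `enum` module rather than Pants' own option registration API (which is not shown in this thread). The value strings come from the discussion: `address-specs` as the default, `json`, and values standing in for the old `--provides` and `--documented` flags. The exact spellings `provides`/`documented` and the `render` helper are assumptions.

```python
import json
from enum import Enum
from typing import Dict, List


class OutputFormat(Enum):
    """Hypothetical values for --output-format; the string is what the user types."""
    ADDRESS_SPECS = "address-specs"  # default: one address per line
    PROVIDES = "provides"            # stands in for the old --provides flag
    DOCUMENTED = "documented"        # stands in for the old --documented flag
    JSON = "json"                    # one JSON object per line


def render(targets: List[Dict[str, str]],
           output_format: OutputFormat = OutputFormat.ADDRESS_SPECS) -> List[str]:
    """Render one output line per target."""
    if output_format is OutputFormat.ADDRESS_SPECS:
        return [t["address"] for t in targets]
    if output_format is OutputFormat.JSON:
        return [json.dumps(t, sort_keys=True) for t in targets]
    # provides/documented need target data that this sketch does not model.
    raise NotImplementedError(output_format.value)
```

`OutputFormat("json")` maps a command-line string to a member, and an unknown string raises `ValueError`, so validation of the option value comes for free with the enum.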

benjyw

Neat!

blorente

If I understand this code correctly, this calculates the hashes of the fields of the target, right? So, if I have

target(
  name = "a",
  sources=["a.py", "b.py"],
)

The intransitive hash will be: hash([hash("a"), hash("a.py"), hash("b.py")]).

If this is true, should we consider hashing the contents of the files, instead of (or in addition to) the filenames?
So it would be:
hash([hash("a"), hash(read("a.py")), hash(read("b.py"))])

I think for the case of #8445, they might want to redeploy every time a source file changes, not just the list of sources. Or maybe not, I'm not too sure.
We could also have each target adaptor define its own intransitive_fingerprint, so targets with sources would know how to hash themselves. This might not be best, because it might introduce a layer of caching above the engine graph itself, but could be worth thinking about.

Also, I might totally be missing something.

Even if we end up hashing only the filenames, I think there's still value in this, so wouldn't be opposed to merging it.
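The two formulas in this comment can be compared with a toy model. This is not Pants code: temporary files stand in for `a.py` and `b.py`, SHA-1 stands in for the comment's `hash`, and the helper names are hypothetical.

```python
import hashlib
import os
import tempfile
from typing import List, Tuple


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def fingerprint_names(name: str, sources: List[str]) -> str:
    """hash([hash(name), hash("a.py"), hash("b.py")]): file names only."""
    parts = [sha1_hex(name.encode())] + [sha1_hex(s.encode()) for s in sources]
    # Each part is a fixed-length hex digest, so plain concatenation is unambiguous.
    return sha1_hex("".join(parts).encode())


def fingerprint_contents(name: str, sources: List[str], root: str) -> str:
    """hash([hash(name), hash(read("a.py")), hash(read("b.py"))]): file contents."""
    parts = [sha1_hex(name.encode())]
    for source in sources:
        with open(os.path.join(root, source), "rb") as f:
            parts.append(sha1_hex(f.read()))
    return sha1_hex("".join(parts).encode())


def edit_is_detected() -> Tuple[bool, bool]:
    """Edit a.py in place; report (names changed?, contents changed?)."""
    sources = ["a.py", "b.py"]
    with tempfile.TemporaryDirectory() as root:
        for source in sources:
            with open(os.path.join(root, source), "w") as f:
                f.write("x = 1\n")
        names_before = fingerprint_names("a", sources)
        contents_before = fingerprint_contents("a", sources, root)
        with open(os.path.join(root, "a.py"), "w") as f:
            f.write("x = 2\n")
        return (fingerprint_names("a", sources) != names_before,
                fingerprint_contents("a", sources, root) != contents_before)
```

Only the contents-based variant notices the edit. In Pants itself, the thread suggests mixing in `TargetAdaptor.sources.snapshot` rather than re-reading files.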

cosmicexplorer · 2019-10-16T19:39:14Z

It's correct that this previously calculated just the hashes of the fields. It's not clear to me why the sources field is excluded from calculation in e.g. Struct._key(), but this implementation I've just pushed will explicitly attempt to extract sources from targets.

illicitonion · 2019-10-17T09:20:14Z

Stepping-back question: What's this actually useful for, if it doesn't include the digests of source files?

(FWIW it would now be trivial to mix in a source-file digest by mixing-in TargetAdaptor.sources.snapshot)

illicitonion · 2019-10-17T09:25:10Z


Oops, I was just looking at the comments here not the code - you're already doing this! :)

blorente

Thank you so much for the thorough testing!

blorente · 2019-10-17T09:30:04Z

src/python/pants/engine/legacy/graph.py

+    for dep in tht.dependencies:
+      dep_intransitive_fingerprint = intransitive_fingerprint_dict.get(dep.root.address, None)
+      if not dep_intransitive_fingerprint:
+        dep_sources = getattr(dep.root.adaptor, 'sources', None)


Nit: This block and the one before look like they could be extracted into a common function

Done now via @memoized_classproperty!

illicitonion

Looks good, but I have a couple of questions :) Thanks!

illicitonion · 2019-10-17T09:31:33Z

src/python/pants/engine/legacy/structs.py

+    # `stable_json_sha1()` to fail with a cycle detection. Since some python targets are only mapped
+    # to `TargetAdaptor` (and not `PythonTargetAdaptor`), we check every single target for a
+    # `requirements` kwarg, which is fine for now.
+    key, value = super()._coerce_key_values(key, value)


I'm not sure from reading - does this cover all actual key-values of the Target, or just the ones explicitly listed in BUILD files?

In particular, if we change a default value in pants, or set a default with a flag or something, and the target doesn't set it, will the fingerprint change? Presumably it should, right?

This covers all the key-values that are provided as field_adaptors, I believe, essentially because we're getting this info from a TargetAdaptor, which will explicitly only use HydrateableField @union members as the _kwargs provided by Struct -- see hydrate_targets:

pants/src/python/pants/engine/legacy/graph.py, lines 497 to 509 in 1988587:

@rule
def hydrate_target(hydrated_struct: HydratedStruct) -> HydratedTarget:
  target_adaptor = hydrated_struct.value
  """Construct a HydratedTarget from a TargetAdaptor and hydrated versions of its adapted fields."""
  # Hydrate the fields of the adaptor and re-construct it.
  hydrated_fields = yield [Get(HydratedField, HydrateableField, fa)
                           for fa in target_adaptor.field_adaptors]
  kwargs = target_adaptor.kwargs()
  for field in hydrated_fields:
    kwargs[field.name] = field.value
  yield HydratedTarget(target_adaptor.address,
                       type(target_adaptor)(**kwargs),
                       tuple(target_adaptor.dependencies))

In particular, if we change a default value in pants, or set a default with a flag or something, and the target doesn't set it, will the fingerprint change? Presumably it should, right?

Short answer: no, this fingerprint will not necessarily change, and yes, I absolutely think it should before we merge this.

Longer answer: Not everything that contributes to Target#fingerprint transfers to a TargetAdaptor (which subclasses StructWithDeps, which subclasses Struct); only the things marked as hydrateable fields do. As you imply, this is possibly not what we want at all for this purpose, and my use of TargetAdaptor here might not be correct.

@stuhood do you have any insight on how to bridge this? Does it break the v2 build graph traversal model if we're able to get an instance of a real Target in order to get a more normal fingerprint (i.e. a fingerprint containing everything the target payload does)? Are the Payload/Target concepts necessarily v1-only/requiring a v1 build graph, or is it possible to avoid having to reimplement payloads for all targets? I would love to pair on this or make an issue as necessary.
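The gap described in this reply can be shown with a toy model. The names are hypothetical and this is not the `TargetAdaptor` API. If only the kwargs written in a BUILD file are hashed, changing a default in Pants (or via a flag) leaves the fingerprint unchanged; hashing the effective values, meaning defaults overlaid with the explicit kwargs, does not have that blind spot.

```python
import hashlib
import json
from typing import Any, Dict


def stable_sha1(value: Any) -> str:
    return hashlib.sha1(json.dumps(value, sort_keys=True).encode("utf-8")).hexdigest()


def explicit_fingerprint(build_kwargs: Dict[str, Any]) -> str:
    """Hashes only what the BUILD file spells out; defaults cannot affect it."""
    return stable_sha1(build_kwargs)


def effective_fingerprint(build_kwargs: Dict[str, Any], defaults: Dict[str, Any]) -> str:
    """Hashes defaults (from Pants or flags) overlaid with the BUILD file's explicit values."""
    return stable_sha1({**defaults, **build_kwargs})


# A target that does not set strict_deps itself.
build_kwargs = {"name": "lib", "sources": ["lib.py"]}
old_defaults = {"strict_deps": False}
new_defaults = {"strict_deps": True}  # e.g. a flag flips the default
```

Flipping the default changes the effective fingerprint, while an explicit `strict_deps=True` under the old default and an implicit one under the new default hash identically, which is what a fingerprint of behaviour should do.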

Can we remove the fingerprints from the output before merging this?
It feels like target_type is an obviously useful thing to include; it's less obvious that these fingerprints are things we should be exposing as-is, but we can always add them in the future if we need to / firm them up :)

illicitonion · 2019-10-17T09:34:16Z

tests/python/pants_test/backend/graph_info/tasks/test_list_targets.py

+        "was_root": True,
+        "address": "f:alias",
+        "target_type": "target",
+        "intransitive_fingerprint": "c108686e1fc1b327af1dbc295008762559d0b410",


These tests look like they may be fragile because of the hard-coding of both ordering of dumped fields, and fingerprints. I'm worried that if we add new values to Target constructors, or change defaults, we'll need to blindly update fingerprints. Could we instead phrase the tests more like:

- Run ./pants list --output-format=json f:alias
- json.loads(output) and assert that the fields we expect to be stable are correct
- Make a change to a file which we expect to alter the fingerprint, run ./pants list again, and check that the fingerprints we expect to change do, and the ones we don't expect to change don't?

The particular tests I'd be wanting to see are:

- If I change a source file, both of the owning target's fingerprints change, but for a dependee only the transitive fingerprint changes
- If I change a random attribute on the target, as above
- If I run pants with a flag which will change an attribute (say whether strict_deps is default on or off), as above
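The three checks can be phrased as property assertions rather than hard-coded hashes. Since running `./pants` is out of scope here, this sketch tests a hypothetical in-memory model (`fingerprints`, the target dict layout, and the `defaults` argument are all assumptions). The model deliberately merges defaults into the hashed fields so that the third check can pass; for a global default flip, the sketch only asserts that targets which don't set the value change.

```python
import hashlib
import json
from typing import Any, Dict, Tuple

# Hypothetical toy model: address -> {"fields": {...}, "sources": {path: text}, "deps": [...]}
Targets = Dict[str, Dict[str, Any]]
Fingerprints = Dict[str, Tuple[str, str]]


def _sha1(value: Any) -> str:
    return hashlib.sha1(json.dumps(value, sort_keys=True).encode("utf-8")).hexdigest()


def fingerprints(targets: Targets, defaults: Dict[str, Any]) -> Fingerprints:
    """Map address -> (intransitive, transitive); assumes the graph is acyclic."""
    intransitive = {
        addr: _sha1([{**defaults, **t["fields"]}, t["sources"]])
        for addr, t in targets.items()
    }
    transitive: Dict[str, str] = {}

    def visit(addr: str) -> str:
        if addr not in transitive:
            deps = sorted((dep, visit(dep)) for dep in targets[addr]["deps"])
            transitive[addr] = _sha1([intransitive[addr], deps])
        return transitive[addr]

    return {addr: (intransitive[addr], visit(addr)) for addr in targets}


def changed(before: Fingerprints, after: Fingerprints, addr: str) -> Tuple[bool, bool]:
    """(intransitive changed?, transitive changed?) for one address."""
    return (before[addr][0] != after[addr][0], before[addr][1] != after[addr][1])


def base_targets() -> Targets:
    return {
        "lib": {"fields": {"name": "lib"}, "sources": {"lib.py": "x = 1\n"}, "deps": []},
        "app": {"fields": {"name": "app"}, "sources": {"app.py": "import lib\n"}, "deps": ["lib"]},
    }


def test_fingerprint_properties() -> None:
    before = fingerprints(base_targets(), {"strict_deps": False})

    # 1. Editing a source file: owner changes both ways, dependee only transitively.
    edited = base_targets()
    edited["lib"]["sources"]["lib.py"] = "x = 2\n"
    after = fingerprints(edited, {"strict_deps": False})
    assert changed(before, after, "lib") == (True, True)
    assert changed(before, after, "app") == (False, True)

    # 2. Editing an attribute on the target: same expectations.
    edited = base_targets()
    edited["lib"]["fields"]["tags"] = ["new"]
    after = fingerprints(edited, {"strict_deps": False})
    assert changed(before, after, "lib") == (True, True)
    assert changed(before, after, "app") == (False, True)

    # 3. Flipping a default (as a flag might): targets that don't set it must change.
    after = fingerprints(base_targets(), {"strict_deps": True})
    assert changed(before, after, "lib")[0]
    assert changed(before, after, "app")[0]
```

Against real output, `before` and `after` would instead come from parsing each line of `./pants list --output-format=json` with `json.loads` and keying on `address`.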

Yes, I agree to all of the above, and would absolutely prefer not to add new testing that blindly tries to match fingerprints. I will add this testing!

ShaneDelmore · 2020-02-12T17:33:15Z

+1 I could use this for the work I am currently doing. The addition of target type to list is particularly useful.

ShaneDelmore · 2020-02-12T17:35:04Z

An additional attribute of targets that would be useful, but that I would not block the PR for, is internal/external.

cosmicexplorer · 2020-02-12T18:26:09Z

internal/external

./pants list as we've implemented here only lists target roots -- are we looking to list dependencies as well?

cosmicexplorer · 2020-02-15T04:24:18Z

What is the definition of target root? I thought it was the output of filter-minimize, the targets you needed to invoke to get all code compiled eventually, but based on your usage I'm thinking I was wrong and it is actually all targets defined in BUILD.* files maybe.

"target roots" are all targets on the command line. filter-minimize (which is a v1 task that subclasses Filter in the twitter monorepo) mimics that to obtain the smallest number of target roots that pull in all of the invalid dependent targets. That's useful for us because we can then execute each target in its own pants invocation, which twitter uses in its internal CI -- that reduces the number of separate parallel pants invocations we know we have to run via https://github.com/twitter/scoot.

As a note, I've added a minimize() method to #7356, so filter-minimize should be available for the general public soon too.
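The minimization described above can be sketched as: keep only those targets in the set that no other target in the set transitively depends on, since invoking the survivors pulls in everything else. This illustrates the idea only; it is not the twitter `filter-minimize` task or the `minimize()` method in #7356. It assumes an acyclic dependency graph given as a dict from address to direct dependencies (with a cycle, every member of the cycle would be dropped).

```python
from typing import Dict, Iterable, List, Set


def reachable(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """All targets transitively depended on by `start` (excluding `start` in a DAG)."""
    seen: Set[str] = set()
    stack = list(graph.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def minimize(graph: Dict[str, List[str]], targets: Iterable[str]) -> List[str]:
    """Drop every target already pulled in by another target in the set (assumes a DAG)."""
    wanted = set(targets)
    covered: Set[str] = set()
    for target in wanted:
        covered |= reachable(graph, target)
    return sorted(wanted - covered)


graph = {
    "app:bin": ["lib:a", "lib:b"],
    "lib:a": ["lib:c"],
    "lib:b": [],
    "lib:c": [],
    "tool:x": ["lib:c"],
}
print(minimize(graph, ["app:bin", "lib:a", "lib:c", "tool:x"]))
# prints ['app:bin', 'tool:x']
```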

cosmicexplorer · 2020-02-20T08:08:25Z

This failed on unrelated test timeouts and a mypy check which is now fixed, so it is definitely green. Since @ShaneDelmore and @olafurpg have expressed interest in this feature and I don't know of an obvious alternative, I would love to merge this change. @stuhood I have modified the tests to avoid relying on any specific fingerprinting mechanism, which means that no code should be relying on these fingerprints, or the structure of this json output, being the same across pants versions (yet). Would that guarantee be sufficient to overcome your concern about investing further in list vs --query?

cosmicexplorer · 2020-02-20T21:51:06Z

Going to merge this if there are no further comments in the next day or so since there is a clear user need and some review has been performed.

- add --with-fingerprints to list
- coerce the provides key when fingerprinting targets
- coerce the provides key when fingerprinting targets
- convert the option to be named --output-format!
- ensure fingerprints incorporate sources snapshots
- use the new Enum type!
- make fingerprints easier to create
- clean up impl
- bump deprecation version
- fix ci
- [ci skip-rust-tests] # No Rust changes made.
- [ci skip-jvm-tests] # No JVM changes made.

cosmicexplorer · 2020-05-04T03:20:26Z

Splitting this into two PRs to separate:
(1) refactoring list_targets.py
(2) adding TransitiveFingerprintedTargets (and conforming to the new v2 target API!)

cosmicexplorer · 2020-05-14T17:40:55Z

Noting that the buck build tool has supported a “show target hash” option for a while, which has allowed hooking it up to a system containing bazel (https://eng.uber.com/go-monorepo-bazel/). It would have been really great to have had less pushback on this PR initially when there was a clearly present user need.

stuhood · 2020-05-14T19:23:55Z

See the discussion in the linked ticket for more information on why this is subtle: bazelbuild/bazel#7962 ... file digests and action digests are useful for very different things: it's really important that users don't think that this is the latter thing (ie, they will need to do their own digesting of all of pants' other config, etc).

I've suggested in slack that I think a RuleGraph-aware query might allow for exposing more of the subtlety here, because it has the potential to allow for querying the digest of a particular goal, or of the inputs to a particular process (which would include all of the kinds of config you need in an action graph fingerprint).

Given that that might be a ways away though, I think that we could move forward here if we resolve a few things:

- the naming of the properties: we should make sure it is clear that they only include file contents (ie, the digest will not change if pants' configuration changes): maybe files_only_fingerprint?
- by default, list does not require hydrating targets, or walking into their transitive deps: both of those add a non-trivial cost, so the json output should probably not contain all of those properties by default. So the json should maybe further allow for field filtering, with digests disabled by default.
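One way to read the second point: the JSON output takes an explicit field selection, cheap fields are emitted by default, and expensive fields such as fingerprints are computed only when requested. A minimal sketch, where the field names (including `files_only_fingerprint`, one of the names floated in this thread), the `compute_fingerprint` callable, and the `render_target` helper are all hypothetical:

```python
import json
from typing import Any, Callable, Dict, Iterable, Optional

Getter = Callable[[Dict[str, Any]], Any]

# Cheap: data `list` already has without hydrating targets.
CHEAP_FIELDS: Dict[str, Getter] = {
    "address": lambda t: t["address"],
    "target_type": lambda t: t["target_type"],
}
# Expensive: needs hydration and/or a transitive walk, so opt-in only.
EXPENSIVE_FIELDS: Dict[str, Getter] = {
    "files_only_fingerprint": lambda t: t["compute_fingerprint"](),
}
DEFAULT_FIELDS = tuple(CHEAP_FIELDS)


def render_target(target: Dict[str, Any], fields: Optional[Iterable[str]] = None) -> str:
    """Emit one JSON line with only the selected fields, in the order requested."""
    selected = tuple(fields) if fields is not None else DEFAULT_FIELDS
    out: Dict[str, Any] = {}
    for name in selected:
        getter = CHEAP_FIELDS.get(name) or EXPENSIVE_FIELDS.get(name)
        if getter is None:
            raise ValueError(f"unknown field: {name}")
        out[name] = getter(target)  # an expensive getter runs only if selected
    return json.dumps(out)
```

Because the expensive getter is only called when its field is selected, the default `list --output-format=json` path never pays for hydration or a transitive walk.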

benjyw · 2020-05-14T19:32:01Z

content_fingerprint ?

stuhood · 2020-05-14T19:41:04Z

content_fingerprint ?

Probably not specific enough... content of what? But for that matter, files_only is not very clear either, heh. Not sure what to call this.

Eric-Arellano · 2020-05-30T05:49:47Z

FYI #9912 will impact this, hopefully making things easier. We go back to having only one list implementation, this time using the Target API.

Eric-Arellano · 2021-07-19T19:02:56Z

Closing as stale, which we're doing for all changes that haven't been touched in 1+ years to simplify project management.

This would still be a really neat feature, though. Do feel free to reopen. Thank you for showing what --query could look like for Pants!

This has the `peek` output include the fingerprint of the sources referenced in a target. This is a step towards #8445, by putting more information into `peek`. For instance, with this, one way to get a crude "hash" of a target would be something like:

```shell
{
  pants dependencies --transitive --closed path/to:target | xargs pants peek
  # these might change behaviour and so need to be included
  cat pants.toml
  cat 3rdparty/python/default.lock # or whatever other lock files are relevant
} | openssl sha256
```

This is conservative: the hash can be different without the behaviour of the target changing at all. For instance:

- irrelevant changes in `pants.toml`: adjusting comments, unrelated subsystem config (e.g. settings in `[golang]` when `path/to:target` is a Python-only `pex_binary`)
- upgrading 3rd party dependencies in the resolve that aren't (transitively) used by `path/to:target`. This relates to #12733: if all transitive 3rd party deps appeared in `pants dependencies --transitive`, and `pants peek` included the right info for them (e.g. version and fingerprints), the `cat 3rdparty/...` could be removed because the `peek` pipe would handle it.
- target fields that don't impact execution behaviour, e.g. changing the `skip_black` setting on a `python_source` target, without changing the file contents (this might be _most_ fields on the (transitive) dependencies of a packageable target?)

This is also only the hash of the input configuration, rather than a hash of a built artefact. If there are processes that aren't deterministic (e.g. `shell_command(command="date > output.txt", output_files=["output.txt"])` somewhere in the chain), the exact output artefact might be different if built twice, even if the hash hasn't changed.

This PR is, in some sense, a partial revival of #8450, although it is much simpler, because the JSON-outputting `peek` goal already exists, and this doesn't try to solve the full problem.
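The shell pipeline above concatenates several byte streams and hashes the result. The same digest can be sketched in Python with `hashlib`, feeding chunks incrementally. The inputs below are placeholder bytes standing in for the `pants peek` output, `pants.toml` and a lockfile, not real Pants output.

```python
import hashlib
from typing import Iterable


def combined_sha256(chunks: Iterable[bytes]) -> str:
    """SHA-256 of the chunks as if concatenated, like `{ ...; } | openssl sha256`."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)  # incremental: no need to hold everything in memory
    return digest.hexdigest()


# Placeholder bytes standing in for `pants peek` output, pants.toml and a lockfile.
peek_output = b'[{"address": "path/to:target"}]\n'
pants_toml = b"[GLOBAL]\n"
lockfile = b"# placeholder lockfile\n"

print(combined_sha256([peek_output, pants_toml, lockfile]))
```

Like the shell version, chunk boundaries are not encoded, so moving bytes from the end of one input to the start of the next would not change the hash; prefixing each chunk with its length before hashing would close that gap.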

cosmicexplorer requested review from stuhood, jsirois, benjyw, pierrechevalier83, blorente and Eric-Arellano October 11, 2019 00:56

cosmicexplorer requested a review from illicitonion October 11, 2019 01:51

benjyw reviewed Oct 11, 2019

cosmicexplorer changed the title from "add --with-fingerprints json output option to v2 list" to "add --output-format=json output option to v2 list" Oct 11, 2019

benjyw approved these changes Oct 11, 2019

cosmicexplorer force-pushed the fingerprint-targets-v2 branch from 2a65176 to 343d648 October 14, 2019 05:22

blorente reviewed Oct 14, 2019

blorente approved these changes Oct 17, 2019

illicitonion reviewed Oct 17, 2019

cosmicexplorer force-pushed the fingerprint-targets-v2 branch from 4f68eb8 to 564a926 October 17, 2019 19:01

cosmicexplorer mentioned this pull request Dec 19, 2019: option to output all command lines for all subprocesses #8838 (Closed)

cosmicexplorer force-pushed the fingerprint-targets-v2 branch 2 times, most recently from 47ae05f to 30e15de February 12, 2020 18:24

cosmicexplorer force-pushed the fingerprint-targets-v2 branch from 30e15de to be490ad February 12, 2020 18:29

cosmicexplorer force-pushed the fingerprint-targets-v2 branch from 650dbc7 to 668425e February 15, 2020 03:25

cosmicexplorer force-pushed the fingerprint-targets-v2 branch from 668425e to b752427 February 20, 2020 07:23

cosmicexplorer force-pushed the fingerprint-targets-v2 branch from b752427 to 52fe062 March 25, 2020 00:56

cosmicexplorer mentioned this pull request Mar 31, 2020: hygienic BUILD file modifications (like buildozer) with libCST #9434 (Closed)

cosmicexplorer force-pushed the fingerprint-targets-v2 branch 10 times, most recently from 8eec473 to 66a7bed May 4, 2020 03:02

cosmicexplorer force-pushed the fingerprint-targets-v2 branch from 66a7bed to dba213f May 4, 2020 03:08

Base automatically changed from master to main March 19, 2021 19:20

Eric-Arellano closed this Jul 19, 2021

huonw mentioned this pull request Feb 28, 2023: Add sources_fingerprint to peek on source-creating targets #18383 (Merged)
