-
-
Notifications
You must be signed in to change notification settings - Fork 644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add --output-format=json output option to v2 list #8450
add --output-format=json output option to v2 list #8450
Conversation
Note that I didn't discuss this specific implementation with folks earlier and would love to hear input on this approach. |
This is likely to be extremely useful when running pants with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you feel about renaming the option to (in order of my personal preference) --output=json
or --json
or or --verbose
or something? That allows us to add more info in the future if needed. Essentially we generalize this from a specific hack to a more general-purpose mechanism, currently only used for the specific hack...
I absolutely love |
(Ideally that sets the stage for a v2 |
Done! Made an enum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat!
2a65176
to
343d648
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand this code correctly, this calculates the hashes of the fields of the target, right? So, if I have
target(
name = "a",
sources=["a.py", "b.py"],
)
The intransitive hash will be: hash([hash("a"), hash("a.py"), hash("b.py")])
.
If this is true, should we consider hashing the contents of the files, instead of (or in addition to) the filenames?
So it would be:
hash([hash("a"), hash(read("a.py")), hash(read("b.py"))])
I think for the case of #8445, they might want to redeploy every time a source file changes, not just the list of sources. Or maybe not, I'm not too sure.
We could also have each target adaptor define its own intransitive_fingerprint
, so targets with sources would know how to hash itself. This might not be best, because it might introduce a layer of caching above the engine graph itself, but could be worth thinking about.
Also, I might totally be missing something.
Even if we end up hashing only the filenames, I think there's still value in this, so wouldn't be opposed to merging it.
It's correct that this previously calculated just the hashes of the fields. It's not clear to me why the sources field is excluded from calculation in e.g. |
Stepping-back question: What's this actually useful for, if it doesn't include the digests of source files? (FWIW it would now be trivial to mix in a source-file digest by mixing-in |
Oops, I was just looking at the comments here not the code - you're already doing this! :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for the thorough testing!
for dep in tht.dependencies: | ||
dep_intransitive_fingerprint = intransitive_fingerprint_dict.get(dep.root.address, None) | ||
if not dep_intransitive_fingerprint: | ||
dep_sources = getattr(dep.root.adaptor, 'sources', None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This block and the one before looks like it could be extracted into a common function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done now via @memoized_classproperty
!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but I have a couple of questions :) Thanks!
# `stable_json_sha1()` to fail with a cycle detection. Since some python targets are only mapped | ||
# to `TargetAdaptor` (and not `PythonTargetAdaptor`), we check every single target for a | ||
# `requirements` kwarg, which is fine for now. | ||
key, value = super()._coerce_key_values(key, value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure from reading - does this cover all actual key-values of the Target
, or just the ones explicitly listed in BUILD files?
In particular, if we change a default value in pants, or set a default with a flag or something, and the target doesn't set it, will the fingerprint change? Presumably it should, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This covers all the key-values that are provided as field_adaptors
, I believe, essentially because we're getting this info from a TargetAdaptor
, which will explicitly only use HydrateableField
@union
members as the _kwargs
provided by Struct
-- see hydrate_targets
:
pants/src/python/pants/engine/legacy/graph.py
Lines 497 to 509 in 1988587
@rule | |
def hydrate_target(hydrated_struct: HydratedStruct) -> HydratedTarget: | |
target_adaptor = hydrated_struct.value | |
"""Construct a HydratedTarget from a TargetAdaptor and hydrated versions of its adapted fields.""" | |
# Hydrate the fields of the adaptor and re-construct it. | |
hydrated_fields = yield [Get(HydratedField, HydrateableField, fa) | |
for fa in target_adaptor.field_adaptors] | |
kwargs = target_adaptor.kwargs() | |
for field in hydrated_fields: | |
kwargs[field.name] = field.value | |
yield HydratedTarget(target_adaptor.address, | |
type(target_adaptor)(**kwargs), | |
tuple(target_adaptor.dependencies)) |
In particular, if we change a default value in pants, or set a default with a flag or something, and the target doesn't set it, will the fingerprint change? Presumably it should, right?
Short answer: no, this fingerprint will not necessarily change, and yes, I absolutely think it should before we merge this.
Longer answer: Not all the things that contribute to Target#fingerprint
do not transfer to a TargetAdaptor
(which subclasses StructWithDeps
, which subclasses Struct
), only the things which are marked as hydrateable fields. As you imply, this is possibly not what we want at all for this purpose, and my use of TargetAdaptor
here might not be correct.
@stuhood do you have any insight on how to bridge this? Does it break the v2 build graph traversal model if we're able to get an instance of a real Target
in order to get a more normal fingerprint (i.e. a fingerprint containing everything the target payload does)? Are the Payload
/Target
concepts necessarily v1-only/requiring a v1 build graph, or is it possible to avoid having to reimplement payloads for all targets? I would love to pair on this or make an issue as necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove the fingerprints from the output before merging this?
It feels like target_type is an obviously useful thing to include; it's less obvious that these fingerprints are things we should be exposing as-is, but we can always add them in the future if we need to / firm them up :)
"was_root": True, | ||
"address": "f:alias", | ||
"target_type": "target", | ||
"intransitive_fingerprint": "c108686e1fc1b327af1dbc295008762559d0b410", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests look like they may be fragile because of the hard-coding of both ordering of dumped fields, and fingerprints. I'm worried that if we add new values to Target
constructors, or change defaults, we'll need to blindly update fingerprints. Could we instead phrase the tests more like:
- Run
./pants list --ouptut-format=json f:alias
json.loads(outptut)
and assert that the fields we expect to be stable are correct- Make a change to a file which we expect to alter the fingerprint, run
./pants list
again, and see that the fingerprints we expect to change do, and the ones we don't expect to don't.
?
The particular tests I'd be wanting to see are:
- If I change a source file, both of the owning target's fingerprints change, but for a dependee only the transitive fingerprint changes
- If I change a random attribute on the target, as above
- If I run pants with a flag which will change an attribute (say whether strict_deps is default on or off), as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree to all of the above, and would absolutely prefer not to add new testing that blindly tries to match fingerprints. I will add this testing!
4f68eb8
to
564a926
Compare
+1 I could use this for the work I am currently doing. The addition of target type to list is particularly useful. |
An additional attribute of targets that would be useful, but that I would not block the PR for, is internal/external. |
47ae05f
to
30e15de
Compare
|
30e15de
to
be490ad
Compare
650dbc7
to
668425e
Compare
"target roots" are all targets on the command line. As a note, I've added a |
668425e
to
b752427
Compare
This failed on unrelated test timeouts and a mypy check which is now fixed, so it is definitely green. Since @ShaneDelmore and @olafurpg have expressed interest in this feature and I don't know of an obvious alternative, I would love to merge this change. @stuhood I have modified the tests to avoid relying on any specific fingerprinting mechanism, which means that no code should be relying on these fingerprints, or the structure of this json output, being the same across pants versions (yet). Would that guarantee be sufficient to overcome your concern about investing further in |
Going to merge this if there are no further comments in the next day or so since there is a clear user need and some review has been performed. |
b752427
to
52fe062
Compare
8eec473
to
66a7bed
Compare
add --with-fingerprints to list coerce the provides key when fingerprinting targets coerce the provides key when fingerprinting targets convert the option to be named --output-format! ensure fingerprints incorporate sources snapshots use the new Enum type! make fingerprints easier to create clean up impl bump deprecation version fix ci [ci skip-rust-tests] # No Rust changes made. [ci skip-jvm-tests] # No JVM changes made.
66a7bed
to
dba213f
Compare
Splitting this into two PRs to separate: |
Noting that the buck build tool supports a “show target hash” option for a while, which has allowed hooking it up to a system containing bazel (https://eng.uber.com/go-monorepo-bazel/). It would have been really great to have had less pushback on this PR initially when there was a clearly present user need. |
See the discussion in the linked ticket for more information on why this is subtle: bazelbuild/bazel#7962 ... file digest and action digests are useful for very different things: it's really important that users don't think that this is the latter thing (ie, they will need to do their own digesting of all of pant's other config, etc). I've suggested in slack that I think a Given that that might be a ways away though, I think that we could move forward here if we resolve a few things:
|
|
Probably not specific enough... content of what? But for that matter, |
FYI #9912 will impact this, hopefully making things easier. We go back to having only one |
Closing as stale, which we're doing for all changes that haven't been touched in 1+ years to simplify project management. This would still be a really neat feature, though. Do feel free to reopen. Thank you for showing what |
This has the `peek` output include the fingerprint of the sources referenced in a target. This is a step towards #8445, by putting more information into `peek`. For instance, with this, one way to get a crude "hash" of a target would be something like: ```shell { pants dependencies --transitive --closed path/to:target | xargs pants peek # these might change behaviour and so need to be included cat pants.toml cat 3rdparty/python/default.lock # or whatever other lock files are relevant } | openssl sha256 ``` This is conservative: the hash can be different without the behaviour of the target changing at all. For instance: - irrelevant changes in `pants.toml`: adjusting comments, unrelated subsystem config (e.g. settings in `[golang]` when `path/to:target` is a Python-only `pex_binary`) - upgrading 3rd party dependencies in the resolve that aren't (transitively) used by `path/to:target`. This relates to #12733: if all transitive 3rd party deps appeared in `pants dependencies --transitive`, and `pants peek` included the right info for them (e.g. version and fingerprints), the `cat 3rdparty/...` could be removed because the `peek` pipe would handle it. - target fields that don't impact execution behaviour, e.g. changing the `skip_black` setting on a `python_source` target, without changing the file contents (this might be _most_ fields on the (transitive) dependencies of a packageable target?) This is also only the hash of the input configuration, rather than a hash of a built artefact. If there's processes that aren't deterministic (e.g. `shell_command(command="date > output.txt", output_files=["output.txt"])` somewhere in the chain), the exact output artefact might be different if built twice, even if the hash hasn't changed. This PR is, in some sense, a partial revival of #8450, although is much simpler, because the JSON-outputting `peek` target already exists, and this doesn't try to solve the full problem.
Problem
Resolves #8445.
Solution
--output-format
option to thelist
v2@console_rule
, and make--provides
and--documented
point to values of that enum option.--output-format=json
, which prints out lines of json with the keys:was_root
: whether the target was a target root, or one of the transitive dependencies of one of the rootsaddress
: the target addresstarget_type
: the stringified version of the target's BUILD file name, e.g.python_library
intransitive_fingerprint
: the intransitive fingerprint for thatTargetAdaptor
transitive_fingerprint
: the transitive fingerprint for thatTargetAdaptor
Result
The following command line will output a string representing a stable hash of the transitive closure of the target
my/python:binary
:Once #7356 lands, we can use the following command to print the fingerprints of all python_binary targets whenever their source files change: