-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ngclient: review preorder dfs code #1463
Conversation
Opened as a draft not only because it is dependent on #1408 but also it is not well tested yet. |
I've had a first look and the ideas look fine. Some smaller comments:
I'm suspicious about how this operates when target is not found:
Also, is_in_trusted_paths() returns False if both paths and path_hash_prefixes are unset. Is this correct? I can see how that's the "safe" assumption but it also makes the whole idea of unset paths and unset path_hash_prefixes meaningless: the delegation has no reason to exist... |
I really think it would be a lot simpler as a recursive function... |
tuf/ngclient/updater.py
Outdated
self._load_targets(role_name, parent_role) | ||
|
||
role_metadata = self._bundle[role_name].signed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this implies we should make bundle (or trusted_set as it maybe now) return the loaded metadata to avoid a lookup immediately afterwards. Can you file an issue please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done #1507
I agree, it does make the code look much simpler, let me test implementing those terminating delegations correctly with it :) |
Forget what I wrote, the check is for both being set ... |
ead4b19
to
8bd34de
Compare
Summarizing the changes made to this PR:
Still there are issues pending and I am keeping this a draft to have a chance of reviewing it again myself.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forget what I wrote, the check is for both being set ...
Which leads me to the question, should the input validation include the check whether both are not set?
I think we should check for that.
I created an issue where we can discuss that #1497.
I like that it's much simpler and easier to read now and agree that the recursive version simplifies the code.
I think that before making this pr non-draft it's good to have tests for the ngclient updater.
I am a little worried about merging this before confirming we are not missing use-cases with this simplification.
tuf/ngclient/updater.py
Outdated
self, | ||
target_filepath: str, | ||
visited_role_names: Set[str], | ||
current_role_pair: List[Tuple[str, ...]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does ...
mean in List[Tuple[str, ...]]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
annotations for both visited_role_names and current_role_pair are wrong
tuf/ngclient/updater.py
Outdated
return targetinfo, terminated | ||
|
||
# Pop the role name from the top of the stack. | ||
role_name, parent_role = current_role_pair |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the type annotation of current_role_pair
I see it's a List[Tuple[str, ...]
.
Don't you need to get an exact element from current_role_pair
which will contain role_name
and parent_role
pair?
I have some reservations against recursion: when you use it, you then have to think about the call stack depth limitations and how that affects your code... In our case maximum number of delegations now depends on call stack depth. Many hundreds of delegations is not a use case we should explicitly design for, I accept that. Still, I don't like setting an artificial limit when we don't event know what that limit is: stack depth is implementation specific, cpython uses 1000, but we also don't know how much of that we can use. The application developer might love recursion and may have used half the stack already, and maybe the network stack is special as well and needs a few hundred stack levels... Basically I think expected call stack depth is larger than maximum reasonable number of delegations... but it's only maybe 5-10 times as large and that sounds uncomfortably close to me. |
As you said, I don't think any reasonable use case has that many delegations to worry about. Furthermore, it makes for more readable code, which is more important, I think. |
e23e64b
to
c79f063
Compare
Rebased on develop + some fixes of stupid errors in last commit. |
a3ff2b8
to
b502434
Compare
Thanks for the reviews and suggestions. I think it is time for this PR to stop being a draft and get a full review. I've intentionally kept the diffs in multiple smaller commits hoping that it may ease the review but you tell me how valuable it is for the history. I do mean to squash whatever feels redundant. Some issues that were raised meanwhile:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the changes seem reasonable and the history indeed is easier to follow when there are multiple smaller commits. Thanks for that @sechkova!
The only missing part in this pr is validation through more tests.
The _preorder_depth_first_walk
is not an easy to understand function and we should
make sure we aren't introducing bugs.
We have a couple of tests using get_one_valid_targetinfo()
in test_updater_ng.py
, but
there are a couple of code paths not tested yet (according to coverage all):
- delegation role with
terminating
option turn on - test when a target is not found
- reaching the max limit of visiting delegations, but there are more delegations left
Additionally, we never test DelegatedRole.is_in_trusted_paths()
when path_hash_prefixes
is set and thus never actually test calculating the hashes.
Make _visit_child_role a public method of DelegatedRole class. Reduce debug logging. Signed-off-by: Teodora Sechkova <[email protected]>
b502434
to
a182ca9
Compare
Rebased on latest changes in develop. |
No two ways about that but I don't know whether this PR is the right way given that this is a common issue for the whole Updater: #1499 |
|
||
role_metadata = self._trusted_set[role_name].signed | ||
role_metadata: Targets = self._trusted_set[role_name].signed | ||
target = role_metadata.targets.get(target_filepath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
target = role_metadata.targets.get(target_filepath) | |
target: Union[TargetFile, None] = role_metadata.targets.get(target_filepath) |
According to the metadata API if role_metadata
is an instance of Targets
, then
role_metadata.targets.get(target_filepath)
should be a TargetFile
if it exists.
I think it make sense to add annotation here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(as reply to Martins comment:) since role_metadata is now a Targets
, this does not need an annotation for target: static type check already figures out that target is a TargetFile | None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a major improvement, much easier to follow now. _preorder_depth_first_walk()
could maybe still be simplified a bit but this is so much better that I say let's get it in and do smaller tweaks afterwards (so the testing work can start).
A good set of test cases would be nice but I think we can leave that to #1499 -- the testing needs some serious thinking as well to prevent a mess of testing spagetti so makes sense to do in its own PR
DelegatedRole.is_in_trusted_paths()
seems like a good idea. I tried to think of a better name but ... can't come up with anything, I guess this is fine.
the paths/path_hash_prefixes resolution seems reasonable.
I would remove both the recursive implementation commit and the revert if it's not a direction that was chosen.
I Also left some comments in code but overall this LGTM: I'm approving, but please have a look at whether you want to act on the comments.
tuf/ngclient/updater.py
Outdated
def get_one_valid_targetinfo(self, target_path: str) -> Dict: | ||
def get_one_valid_targetinfo( | ||
self, target_path: str | ||
) -> Union[Dict[str, Any], None]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
) -> Union[Dict[str, Any], None]: | |
) -> Optional[Dict[str, Any]]: |
tuf/ngclient/updater.py
Outdated
def _preorder_depth_first_walk(self, target_filepath) -> Dict: | ||
def _preorder_depth_first_walk( | ||
self, target_filepath: str | ||
) -> Union[Dict[str, Any], None]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
) -> Union[Dict[str, Any], None]: | |
) -> Optional[Dict[str, Any]]: |
these mean union is then likely unused import
|
||
role_metadata = self._trusted_set[role_name].signed | ||
role_metadata: Targets = self._trusted_set[role_name].signed | ||
target = role_metadata.targets.get(target_filepath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(as reply to Martins comment:) since role_metadata is now a Targets
, this does not need an annotation for target: static type check already figures out that target is a TargetFile | None
tuf/api/metadata.py
Outdated
# Calculate the hash of the filepath to determine which bin | ||
# to find the target. The client currently assumes the repository | ||
# uses sslib's default algorithm to generate hashes. | ||
digest_object = sslib_hash.digest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can remove the comment about assumptions and specify sha256 as algorithm here (as per spec) , no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right ... it does say exactly sha256
tuf/api/metadata.py
Outdated
raise ValueError( | ||
"At least one of the attributes 'paths' and" | ||
"'path_hash_prefixes' must be set!" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd really appreciate if we could squeeze the exceptions and comments to the minimum number of lines that are required to convey the message... in the long term this has a major effect on readability. The content in this one seems a bit misleading as well so maybe:
raise ValueError( | |
"At least one of the attributes 'paths' and" | |
"'path_hash_prefixes' must be set!" | |
) | |
raise ValueError("One of paths or path_hash_prefixes must be set") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to shorten the exception messages and also make them look consistently. It may have slightly changed the meaning though: 34f59c9
Returns: | ||
A targetinfo dictionary or None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine as long as we aim to get rid of the whole dictionary soon... the dict is just bad API.
The goal for this method would be to just return Optional[TargetFile] but I guess we can't do that yet since
- updated_targets() for some reason takes multiple TargetFiles and needs targetpaths for each -- this will not look good if caller needs to provide the targetpaths ... but some API redesign might help
- TargetFile does not have targetpath in it (yet at least)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully we can change that with the progress of #1514
"no hash or path prefix": | ||
'{"keyids": ["keyid"], "name": "a", "terminating": true, "threshold": 3}', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't there now be a test that checks that deserialization fails with this input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added one with invalid DelegatedRoles: 34f59c9
Thanks for the comments, let me try to address them before merging. |
d49fab1
to
fbca198
Compare
I think I addressed everything and removed the commits about the recursive implementation. Should be ready now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lovely set of clean ups, nice work @sechkova.
I'd prefer a different name for is_in_trusted_paths()
, added a comment in-line. Let me know what you think.
Rename to DelegatedRole.is_delegated_path and return a boolean flag instead of the role's name. Minor comments and code style improvements. Some code simplification. Signed-off-by: Teodora Sechkova <[email protected]>
Remove _get_filepath_hash and call sslib_hash.digest directly instead. Signed-off-by: Teodora Sechkova <[email protected]>
Simplify the code in _preorder_depth_first_walk. Signed-off-by: Teodora Sechkova <[email protected]>
Reduce redundant logging and simplify code further. Signed-off-by: Teodora Sechkova <[email protected]>
Improve type annotations. Signed-off-by: Teodora Sechkova <[email protected]>
The specification does not state clearly what is the behaviour when none of delegation's "paths" and "path_hash_prefixes" is set. See theupdateframework#1497. Until this issue is clarified, copy current Updater which raises an error in such case. Signed-off-by: Teodora Sechkova <[email protected]>
Give 'role_names' the more verbose description 'delegations_to_visit'. Add some comments and docstrings. Signed-off-by: Teodora Sechkova <[email protected]>
fbca198
to
bc073b8
Compare
Fixes #1398
Description of the changes being introduced by the pull request:
Updater._preorder_depth_first_walk
code._visit_child_role
a member ofDelegatedRole
Please verify and check that the pull request fulfills the following
requirements: