
Merge ngclient: a new client library implementation #1408

Merged: jku merged 14 commits into develop from merge-ngclient on Jul 5, 2021

Conversation

@jku (Member) commented May 21, 2021

This is something we've been working on in the experimental-client branch so far and would like to merge to develop now. It's a new client library implementation:

  • Uses Metadata API
  • Is partly based on current tuf/client/, but no longer shares any modules with it (except exceptions)
  • There's a major simplification in the way downloads are handled due to removing mirrors support
  • Tracking of valid metadata has been separated into another component
  • There's no MultiRepoUpdater or mirrors support

The implementation is roughly feature complete and ready for a test-drive.

There are still TODO-items that we'd like to handle before this could replace the current client, but we feel this would be a good time to merge so future work can be shared and is visible to everyone. Teodora has promised to post a list of things currently on our TODO-list here.

On the PR itself:

  • Review is very welcome, but I hope we can consider the ngclient directory a work-in-progress, in that we can file issues instead of fixing all review items here
  • I've decided to squash the commits in this branch (the content matches experimental-client). The reasons were:
    • there were a number of file moves while we were deciding what to do
    • we made decisions in the branch that we later backed away from (but due to other changes did not use revert), so quite a few commits are just red herrings
    • there are ~100 commits and multiple merges of develop in experimental-client, so the history becomes a little hard to track

If others would rather see the full history, I can make this PR from the experimental-client branch too.

@jku jku marked this pull request as ready for review May 21, 2021 11:54
@sechkova (Contributor) commented May 21, 2021

As mentioned in the description, we don't consider this PR a fully complete client; here is the list of our current TODOs, which will very likely be extended further:

@jku (Member Author) commented May 21, 2021

I'll add a few more potential TODOs that I dug from my notes:

@trishankatdatadog trishankatdatadog self-requested a review May 26, 2021 02:58
@trishankatdatadog (Member)

Thanks for the great work, and would love to review! Give me a deadline, please...

@joshuagl (Member)

> Thanks for the great work, and would love to review! Give me a deadline, please...

Thanks @trishankatdatadog! The team are focused on supporting changes targeting develop (with no dependency on this PR) for the next two weeks, but after that it would be really good to keep moving this forward. Would a deadline of Wed 9th work?

>>> with open(timestamp_path, "rb") as f:
>>>     bundle.update_timestamp(f.read())
>>> except (RepositoryError, OSError):
>>>     pass  # failure to load a local file is ok
Collaborator

From what I understand, the RepositoryError exception will be caught when BadVersionNumberError or ExpiredMetadataError is thrown (commit 91935fa).
So, I don't think this is a simple failure to load a local file.

Maybe we shouldn't give examples where you would want to just pass on RepositoryError, when this exception could be a serious one?

Member Author

The question is, when would you want to stop the update process here? I don't think those cases exist: I don't want to stop if the local metadata is expired, corrupted, or not signed with current keys, for example.

Collaborator

If your local keys are broken, how are you supposed to verify the new timestamp file?
How do you recover from that?

@jku (Member Author) Jun 8, 2021

The spec is fairly silent on what to demand from local metadata. But the basic rule is that we cannot demand local metadata to even exist (apart from the demand that one root version must exist).

I don't know what you mean by broken keys, but for some cases:

  • local timestamp is signed by keys that are not the correct signing keys according to root: this is uncommon but not an error. The keys have likely been rotated and we must now download the new timestamp and cannot do the rollback check ("version number of the trusted timestamp metadata file, if any, MUST be less than or equal to the version number of the new timestamp metadata file") -- we do not have a trusted timestamp file at this point in time, so we proceed without one
  • local timestamp file has been corrupted on disk or tampered with: we do not have a trusted timestamp file at this point so should proceed without one. Logging something might be a good idea but I don't think failing the process makes sense as a trusted timestamp is not required
  • the local software installation is broken (e.g. a dependency got uninstalled) so e.g. signature verification fails: This would be nice to handle separately but I'm not currently convinced we can meaningfully separate this case from e.g. an attacker messing with the signatures and feeding bad data to us and the signature verification failing weirdly because of that. Again, if local metadata fails to load, we should proceed without it.

@MVrachev (Collaborator) Jun 9, 2021

> • local timestamp is signed by keys that are not the correct signing keys according to root: this is uncommon but not an error. The keys have likely been rotated and we must now download the new timestamp and cannot do the rollback check: version number of the trusted timestamp metadata file, if any, MUST be less than or equal to the version number of the new timestamp metadata file -- we do not have a trusted timestamp file at this point in time, so we proceed without one
> • local timestamp file has been corrupted on disk or tampered with: we do not have a trusted timestamp file at this point so should proceed without one. Logging something might be a good idea but I don't think failing the process makes sense as a trusted timestamp is not required
> • the local software installation is broken (e.g. a dependency got uninstalled) so e.g. signature verification fails: This would be nice to handle separately but I'm not currently convinced we can meaningfully separate this case from e.g. an attacker messing with the signatures and feeding bad data to us and the signature verification failing weirdly because of that. Again, if local metadata fails to load, we should proceed without it.

You mean the updating process will start all over again and we will download from version 1 of the timestamp, right?

Member Author

There is nothing to start over. The whole timestamp loading process is (sketched in code after this list):

  • load local timestamp (if it's available, readable and valid)
  • load remote timestamp (latest version), validating against the already loaded timestamp if it was loaded.
  • If the remote timestamp was not valid for any reason, abort
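A minimal sketch of that flow, using the update_timestamp() API from this PR (the wrapper function, timestamp_path and fetch_from_remote are hypothetical):

from tuf import exceptions

def load_timestamp(bundle, timestamp_path, fetch_from_remote):
    # 1. load local timestamp, if it's available, readable and valid
    try:
        with open(timestamp_path, "rb") as f:
            bundle.update_timestamp(f.read())
    except (OSError, exceptions.RepositoryError):
        pass  # proceeding without a trusted local timestamp is fine

    # 2. load remote timestamp (latest version): update_timestamp()
    # validates it against the already loaded timestamp, if any
    # 3. any validation failure raises, aborting the update
    bundle.update_timestamp(fetch_from_remote("timestamp.json"))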

in the beginning of load_timestamp()...
* some metadata interactions might work better in Metadata itself
* Progress through Specification update process should be documented
(not sure yet how: maybe a spec_logger that logs specification events?)
Collaborator

Do we have an issue for this one?
Seems useful for debugging.
The suggestion for spec_logger.log or something sounds reasonable.

Member Author

yeah, I don't have an issue... I'm still on the fence about it -- it sounds like a nice idea but it does add quite a few lines of code that maybe aren't necessary (the fact that something is an item in the spec might not mean it's worth logging)

Collaborator

I imagine logging will be useful for the bigger, more time-consuming steps.
For example, logging when the update process is finished for each of the metadata classes.

@MVrachev (Collaborator) left a comment

I am sorry, I left two reviews according to GitHub.
I thought I had just left comments there, but somehow I shared it as a review...

I like the MetadataBundle.
It's simple and easy to read. Can't wait to see what will happen when we update the rest of the code!
There are one or two important comments I want us to answer; the others are style improvements.

raise ValueError("Call is valid only on delegator metadata")

if role is None:
    raise ValueError(f"Delegated role {role_name} not found")
Collaborator

Do you allow calling verify_with_threshold when delegator.signed is Targets, but there are no delegations?
Or do you make that check somewhere else?
Then you would end up with role being None, and throw that exception.

Member Author

I do allow that on purpose, and I think the exception is correct: the caller is trying to verify a role that isn't delegated by the delegator.

@MVrachev (Collaborator) Jun 2, 2021

But it's a valid, allowed situation to have Targets without delegations and thus to call verify_with_threshold, which will throw that exception.
Then you should either add a check inside update_targets that doesn't call verify_with_threshold on empty delegations, or catch the exception.

Member Author

btw, we should polish this code in the PR for #1306 (I plan to do one soon) -- it's here just as a placeholder

Member Author

Can you comment in the code where we should handle the exception, because I am not sure what this refers to?

If there are no delegations, then how does the client end up calling update_targets() for a delegate?

Collaborator

You can call it here: https://github.com/jku/tuf/blob/f057d57e6aa2cc1bad1a153343fc6335c9c5c11c/tuf/ngclient/_internal/metadata_bundle.py#L445 line 445 in your current changes.
What's interesting is that you aren't testing if we support targets with no delegations field.

The reason is that you don't call update_delegated_targets() in your tests with a target file without delegations.
I thought you should have been testing this in test_metadata_bundle.py when updating role2.json like this:

        with open(os.path.join(repo_dir, "role2.json"), "rb") as f:
            bundle.update_delegated_targets(f.read(), "role2", "role1")

but after a quick check, I found that role2.json had delegations, but the delegations value itself consists only of an empty keys dict and an empty roles list.
I asked if that was the intention when marking delegations as optional here.

When I removed delegations from role2.json I got:

tuf.exceptions.UnsignedMetadataError: New role2 is not signed by role1

@jku (Member Author) Jun 8, 2021

> When I removed delegations from role2.json I got:
>
> tuf.exceptions.UnsignedMetadataError: New role2 is not signed by role1

Did you re-sign the metadata after modifying it? If you didn't, this seems like the correct thing to do -- the signature in role2 is no longer valid.

I'm not saying there isn't a bug here (and we definitely should have better ways of modifying and resigning the data) but I just don't see the bug... The delegations in the delegated metadata are not processed at all (apart from what happens in Metadata.from_bytes()) during update_delegated_targets().

The delegations in the delegator metadata are processed -- but in that case calling update_delegated_targets() on metadata that does not have delegations would be a bug and would result in a ValueError.

Collaborator

Yea, I had forgotten to re-sign the metadata. I did the experiment again.

> The delegations in the delegated metadata are not processed at all (apart from what happens in Metadata.from_bytes()) during update_delegated_targets().
> The delegations in the delegator metadata are processed -- but in that case calling update_delegated_targets() on metadata that does not have delegations would be a bug and would result in a ValueError.

Yes, you are right, and I verified that after debugging it.
Probably there is no bug here and it's just me spending too much time on optional attributes. :D

unique_keys = set()
for keyid in role.keyids:
key_dict = keys[keyid].to_dict()
key, dummy = sslib_keys.format_metadata_to_key(key_dict)
Collaborator

Nit: maybe you can just key, _ = sslib_keys.format_metadata_to_key(key_dict) instead of giving a name to the second value.

Member Author

I agree but our pylint configuration does not currently allow that. Possibly we should change that (Google style guide and their pylintrc does allow it)

Member Author

btw, Teodora noticed that this call should not be done: this is being fixed in #1417

Collaborator

So, do we need to merge #1417 in order to merge this?

@jku (Member Author) Jun 8, 2021

Personally I consider the implementation of verify_with_threshold in this PR a placeholder: I don't mind it if it's not perfect, and I don't think waiting on the replacement is necessary.

I suggest we polish #1423 and #1436 as much as we want and use the result here when they are ready, but don't block this PR on those two

self._bundle["root"] = new_root
logger.debug("Updated root")

def root_update_finished(self):
Collaborator

Do we want to be able to reset root_update_finished back to False at some point?
Most likely it's reasonable to assume that a MetadataBundle instance won't live beyond one root version, but I wanted to make sure I understand the assumption.
If this assumption is true, then do we want to document that somewhere?

Member Author

Correct, the idea is that if you ever want to "refresh again", you create a new bundle and start from scratch -- or in other words: updating a specific piece of metadata is only possible until you have loaded any "depending metadata" (this means delegated metadata, but also the special cases for top-level metadata, where each depends on the previous one in the order timestamp -> snapshot -> targets). A sketch of the resulting call order is below.
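A sketch of that ordering with the method names from this PR (fetch() and the constructor argument are assumptions; update_root, update_snapshot and update_targets mirror the update_timestamp/update_delegated_targets calls seen elsewhere in this review):

# each update_*() call is only valid once the delegator is trusted, and
# becomes impossible once any depending metadata has been loaded
bundle = MetadataBundle("path/to/local/metadata")  # loads trusted local root

bundle.update_root(fetch("2.root.json"))           # root updates allowed here
bundle.root_update_finished()                      # ...but not after this

bundle.update_timestamp(fetch("timestamp.json"))   # requires finished root
bundle.update_snapshot(fetch("snapshot.json"))     # requires timestamp
bundle.update_targets(fetch("targets.json"))       # requires snapshot
bundle.update_delegated_targets(                   # requires the delegator
    fetch("role1.json"), "role1", "targets"
)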

Collaborator

Thanks for the clarifications!

except DeserializationError as e:
raise exceptions.RepositoryError("Failed to load timestamp") from e

if new_timestamp.signed.type != "timestamp":
Collaborator

I just realized that we are checking Timestamp.type a couple of times:

  1. once when we call Metadata.from_bytes() -> deserializer.deserialize() -> Metadata.from_dict(), where we check that _type is one of the four metadata types.
  2. from Metadata.from_dict() -> Timestamp.from_dict() -> Signed._common_fields_from_dict(), where we check if _type != cls._signed_type:
  3. this check here

I understand why you made that check here, I am just wondering: do we need all of them?
PS: This is true for all of the top-level metadata classes.

@jku (Member Author) Jun 2, 2021

Good question:
Number 1 is the dispatcher, we need to do this somewhere.
Number 2 is just a sanity check IIRC: could be removed but it doesn't hurt anyone
Number 3 we do need but you are right that it looks like we could do better: I wish I could do something like this:

try:
    new_timestamp = Metadata[Timestamp].from_bytes(data)
except DeserializationError as e:
    raise exceptions.RepositoryError("Failed to load timestamp") from e

Where Metadata was a Generic type over (Root/Timestamp/Snapshot/Targets) and the constructor used above would

  • fail if the type is not Timestamp.type
  • type annotate new_timestamp.signed correctly as Timestamp

I have no idea if this is possible: I've done an experiment and managed to get close but not quite there.

A simpler alternative is adding an optional argument to the factory constructors: Metadata.from_bytes(data, type=Timestamp), which would raise if the type is not correct (but would do nothing to fix the annotation issue). A rough sketch of this alternative is below.
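A standalone toy sketch of that alternative (metadata_from_bytes() is a hypothetical helper; the real change would live in the factory constructors themselves):

from typing import Optional, Type

from tuf.api.metadata import Metadata, Signed, Timestamp
from tuf.api.serialization import DeserializationError

def metadata_from_bytes(
    data: bytes, signed_type: Optional[Type[Signed]] = None
) -> Metadata:
    """Deserialize metadata, optionally enforcing the expected Signed type."""
    md = Metadata.from_bytes(data)
    if signed_type is not None and not isinstance(md.signed, signed_type):
        raise DeserializationError(
            f"Expected {signed_type.__name__}, got '{md.signed.type}'"
        )
    return md

# usage, replacing the string comparison in the diff above:
# new_timestamp = metadata_from_bytes(data, Timestamp)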

Member Author

-> #1433

new_timestamp.signed.meta["snapshot.json"].version
< self.timestamp.signed.meta["snapshot.json"].version
):
# TODO not sure about the correct exception here
Collaborator

Do we want another exception here?
I thought that this is the correct one?

Member Author

yes I guess that's fine

meta = self.timestamp.signed.meta["snapshot.json"]

# Verify against the hashes in timestamp, if any
hashes = meta.hashes or {}
Collaborator

Nit: This line made me question the default value for MetaFile.hashes.
Maybe we can use an empty dictionary instead of None?
Then, you wouldn't have to use or.

Member Author

I thought about it, but doing that in Metadata means the content changes when you do deserialize + serialize: no hashes becomes empty hashes... (a toy illustration below)
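A toy illustration of that round-trip concern, with plain dicts standing in for MetaFile (de)serialization:

def metafile_from_dict(d):
    # hypothetical default: missing hashes deserialize to {}
    return {"version": d["version"], "hashes": d.get("hashes", {})}

def metafile_to_dict(m):
    return dict(m)  # naive serializer: always writes the "hashes" key

original = {"version": 1}  # no hashes in the original JSON
roundtrip = metafile_to_dict(metafile_from_dict(original))
assert roundtrip == {"version": 1, "hashes": {}}
assert roundtrip != original  # content changed by deserialize + serialize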

digest_object.update(data)
observed_hash = digest_object.hexdigest()
if observed_hash != stored_hash:
# TODO: Error should derive from RepositoryError
Collaborator

I think this is fixed in 91935fa.
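For reference, a minimal sketch of the check being discussed (the helper name is hypothetical; BadHashError is the securesystemslib exception used by the current download code, and per the TODO the error raised here should derive from RepositoryError):

import hashlib

from securesystemslib import exceptions as sslib_exceptions

def _check_hashes(data: bytes, hashes: dict):
    """Raise if data does not match every trusted hash."""
    for algorithm, stored_hash in hashes.items():
        digest_object = hashlib.new(algorithm)
        digest_object.update(data)
        observed_hash = digest_object.hexdigest()
        if observed_hash != stored_hash:
            raise sslib_exceptions.BadHashError(stored_hash, observed_hash)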

* exceptions are not final: the idea is that client could just handle
a generic RepositoryError that covers every issue that server provided
metadata could inflict (other errors would be user errors), but this is not
yet the case
Collaborator

I am not sure this is still not the case.
Do you still throw exceptions that aren't RepositoryError or don't inherit from RepositoryError?
The only ones I see are RuntimeError and ValueError, but we don't want to document those, right?

Member Author

I think you are probably correct but I also think another close look would be good

Member Author

at least hash verification failure is not a RepositoryError yet

Collaborator

Okay, well this shouldn't stop us from merging this.
We have an issue where we document this step, so I will consider this comment resolved.

After the client has retrieved the target information for those targets
they are interested in updating, they would call this method to
determine which targets have changed from those saved locally on disk.
All the targets that have changed are returns in a list. From this
Collaborator

Suggested change
- All the targets that have changed are returns in a list. From this
+ All the targets that have changed are returned in a list. From this

import json
import tracemalloc

if sys.version_info >= (3, 3):
Collaborator

I don't think we need this check given that we support Python 3.6+.

Comment on lines 445 to 449
if not verify_with_threshold(delegator, role_name, new_delegate):
raise exceptions.UnsignedMetadataError(
f"New {role_name} is not signed by {delegator_name}",
new_delegate,
)
@jku (Member Author) Jun 8, 2021

this might be more readable if we called it new_delegate_name instead of role_name -- so the connection between the name and new_delegate metadata is clear.

Of course, if Metadata object included the name (#1425), this would be even clearer:

if not verify_with_threshold(delegator, new_delegate):
    raise exceptions.UnsignedMetadataError(
        f"New {new_delegate.name} is not signed by {delegator.name}",
        new_delegate,
    )

Member

👍 completely agree with using variable names that make the connection clearer.

@trishankatdatadog (Member) left a comment

Very nice work, everyone, I love how the new Updater is a few hundred lines, like my PoC! 👏🏽

A few overall comments:

  1. Could use more typing hints.
  2. I feel the Metadata API could be more intuitive, but that's a different PR.
  3. A few functions are a bit too long. Please consider refactoring.
  4. Please double-check against the spec and another implementation (e.g., my PoC) to check whether you missed anything.

Thank you very much again, great work, the ngclient really lives up to its name! Very nice and easy to read like Joshua promised.


"""TUF client bundle-of-metadata

MetadataBundle keeps track of current valid set of metadata for the client,
Member

As I've mentioned before, I think it is a fantastic idea to use a state machine to keep track of the update workflow, but I do think MetadataBundle is not the best name for this. Could we please rename to something much more descriptive? Maybe something like Workflow? Not the best, but you get the idea.

@jku (Member Author) Jun 10, 2021

yeah we can, I've not been totally happy with the name. A lot of the code is indeed about implementing the update workflow so Workflow makes some sense...

The core idea (or at least the big change) I was starting from is that the currently valid set of metadata must be considered a unit. A single metadata file cannot usually be considered "valid": validity can only be decided for the whole set, and the set must be valid through the update process, at every step. I was trying to figure out a name for this set (which could contain just root metadata or possibly dozens of individual metadata files).

Would be nice to get more opinions, I've been looking at this for way too long to be able to switch viewpoints now...

Contributor

Why does no one like my TrustedMetadata option 😄
This is how the specification refers to metadata files that have been verified by the client and persisted, and MetadataBundle is implemented as a mapping class rather than a workflow.

Member

I like the name TrustedMetadata, but MetadataBundle is both a mapping and a workflow class, no?

Member Author

Yeah, Trusted would be nice in the name, but I don't like TrustedMetadata because it sounds like a type of Metadata (a class we already have). The component does contain a lot of the update workflow, but in a way the workflow is just a side effect of keeping track of our collection of trusted metadata...

https://theupdateframework.github.io/specification/latest/#detailed-client-workflow)

**tuf.ngclient.FetcherInterface** is an abstract class that client
implementers can optionally use to integrate with their own
Member

Suggested change
- implementers can optionally use to integrate with their own
+ implementers can implement a concrete class of in order to reuse their own


**tuf.ngclient.FetcherInterface** is an abstract class that client
implementers can optionally use to integrate with their own
network/download infrastructure -- a Requests-based implementation is
Member

Suggested change
- network/download infrastructure -- a Requests-based implementation is
+ networking/download libraries -- a Requests-based implementation is

This package:
* Aims to be a clean, easy-to-validate reference client implementation
written in modern Python
* At the same time aims to be the library choice for anyone
Member

Suggested change
- * At the same time aims to be the library choice for anyone
+ * At the same time aims to be the library of choice for anyone

* At the same time aims to be the library choice for anyone
implementing a TUF client in Python: light-weight, easy to integrate
and with minimal required dependencies
* Is still under development but planned to become the default client
Member

Suggested change
- * Is still under development but planned to become the default client
+ * Is still under development but is planned to become the default client

def __init__(
self,
repository_dir: str,
metadata_base_url: str,
Member

Great idea to separate metadata vs targets.

return None


def _check_file_length(file_object, trusted_file_length):
Member

Hmm. I get that these are helper functions, but any reason not to fold them into the class?

@jku (Member Author) Jun 10, 2021

This is issue #1410.

In short: I think the Metadata API should offer these sort of functions (so that there is a "canonical" implementation) and Teodora is currently implementing those in #1437 (so these functions are here as placeholders)
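For reference, a minimal sketch of what a helper with this signature checks (the exception type is an assumption, taken from the current client's download code in tuf.exceptions):

import io

from tuf import exceptions

def _check_file_length(file_object, trusted_file_length):
    """Raise if file_object's length does not match trusted_file_length."""
    file_object.seek(0, io.SEEK_END)
    observed_length = file_object.tell()
    file_object.seek(0)
    if observed_length != trusted_file_length:
        raise exceptions.DownloadLengthMismatchError(
            trusted_file_length, observed_length
        )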

strict_required_length=False,
)

def _load_local_metadata(self, rolename: str) -> bytes:
Member

This feels familiar :)

self._bundle["root"] = new_root
logger.debug("Updated root")

def root_update_finished(self):
Member

Where do you check for revoking timestamp/snapshot to recover from fast-forward attacks?

@jku (Member Author) Jun 10, 2021

We don't:

  • because we are strict about the order of loading different metadata, we cannot ever have timestamp or snapshot loaded at this point
  • after this point, we cannot load local timestamp or snapshot if they are no longer signed by the correct (rotated) keys

So I believe there's just nothing to revoke. The spec does also say "delete the trusted timestamp and snapshot metadata files". We don't do that either currently:

  • the request is a bit odd: the metadata files by definition are not trusted anymore: the keys have been rotated and the old timestamp and snapshot are no longer valid
  • There's just no way to load those files into the bundle anymore

We could add the file removal of course... and if we don't, we should definitely document why. I've been reluctant to add file removal unless really needed, because it feels like removing evidence just as something may be going wrong.

Something else we don't make clear is the difference to the current implementation: we have no "current" and "previous" metadata. Currently we store a single copy of each role's metadata (new metadata gets written as soon as it's validated by the bundle). There's probably value in storing some versioned metadata (like every root) but nothing in the spec seems to require storing "current" and "previous".

@joshuagl (Member) Jul 2, 2021

> We could add the file removal of course... and if we don't, we should definitely document why.

This is a good discussion, we need some action to come from this. Even if it's just an issue capturing this intent.

self._bundle.update_delegated_targets(data, role, parent_role)
self._persist_metadata(role, data)

def _preorder_depth_first_walk(self, target_filepath) -> Dict:
Member

Stack-based preorder DFS. Nice, but too long. Could we refactor for simplicity and to check for correctness?

Member Author

#1398 : This is still an almost direct copy-paste from current client. We've been avoiding touching it so far since it's so well isolated from the code that we did touch and also because having type annotations will make following the code so much easier.
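For orientation, the general shape of such a walk (a simplified toy sketch with plain dicts, not the PR's types; terminating roles and path matching are omitted):

def find_target(target_path, delegations, targets_by_role, max_visits=32):
    """delegations: rolename -> child rolenames in priority order;
    targets_by_role: rolename -> {target path: target info}."""
    stack = ["targets"]
    visited = 0
    while stack and visited < max_visits:
        role = stack.pop()
        visited += 1
        info = targets_by_role.get(role, {}).get(target_path)
        if info is not None:
            return info
        # push children in reverse so the highest-priority one is popped first
        stack.extend(reversed(delegations.get(role, [])))
    return None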

Member

Draft PR to improve this code in #1463

@jku (Member Author) commented Jun 10, 2021

Thanks a lot for the review

> 1. Could use more typing hints.

Agreed, issue #1409, I suggest we do this after merge

> 2. I feel the Metadata API could be more intuitive, but that's a different PR.

If you have something that you manage to write down in an issue, that would be great... I feel like there is room for improvement but personally the things I am able to find and express are fairly small details. If there are bigger ideas and/or complaints I think now is a great time to look at them: We don't have too much code built yet on top of it but do have some experience already.

> 3. A few functions are a bit too long. Please consider refactoring.

For the bundle: I agree the update_* methods are long, but I don't want to split them. There are improvements we can make though (like #1433 removing the type check, and making the hash/length checks tighter by implementing them in the Metadata API).

For the Updater, I think the longer functions are the ones we have not refactored yet, full ack here.

> 4. Please double-check against the spec and another implementation (e.g., my PoC) to check whether you missed anything.

👍

@sechkova sechkova mentioned this pull request Jun 15, 2021
@joshuagl (Member)

There are a lot of great comments here, much of which has been filed as issues to tackle after the code has landed. Are there any blockers to merging this in its current state? Obviously a rebase and an approving review are required.

@trishankatdatadog (Member) commented Jun 16, 2021

> There are a lot of great comments here, much of which has been filed as issues to tackle after the code has landed. Are there any blockers to merging this in its current state? Obviously a rebase and an approving review are required.

Nope, provided all the tests for the old Updater continue to pass for the new Updater, and we make it clear it's not production-ready.

@jku (Member Author) commented Jun 16, 2021

Potential work left:

  • fix some smaller issues mentioned in review (Teodora has a branch which I just merged here)
  • The Big Naming Issue (MetadataBundle, TrustedMetadata, Workflow have been suggested -- all acceptable, none overwhelmingly popular)
  • finally, rebase on develop

The test issue is a bit tricky: the existing tests are not black-box tests (in some places not even close) and we have changed the API slightly (e.g. with mirrors config), so getting the old tests to run is not trivial (Teodora tried this and can comment with details if needed). I suggest we merge with the tests we have included, acknowledge that they are inadequate, and start improving them as a priority.

@trishankatdatadog (Member)

> The test issue is a bit tricky: the existing tests are not black-box tests (in some places not even close) and we have changed the API slightly (e.g. with mirrors config), so getting the old tests to run is not trivial (Teodora tried this and can comment with details if needed).

About mirrors: is it possible to ignore them in a backwards-compatible way? With only one mirror, just go with the one-repo workflow, but with multiple mirrors, throw an error with a message.

@jku (Member Author) commented Jun 16, 2021

I don't think I'm interested in trying to preserve compatibility like that:

  • It's more code that can break
  • the changes required from the client will be minimal (assuming they aren't actually using mirrors)
  • I think the door should also be kept open for further tweaks to the API (client refactor: consider API changes, #1317)

@trishankatdatadog (Member)

> I don't think I'm interested in trying to preserve compatibility like that:
>
> • It's more code that can break
> • the changes required from the client will be minimal (assuming they aren't actually using mirrors)
> • I think the door should also be kept open for further tweaks to the API (client refactor: consider API changes, #1317)

Fair, no need to keep the old API happy. We are preserving the old codebase for now anyway.

@jku (Member Author) commented Jun 21, 2021

> • The Big Naming Issue (MetadataBundle, TrustedMetadata, Workflow have been suggested -- all acceptable, none overwhelmingly popular)

Some more brainstorming produced the option TrustedMetadataSet:

  • it's getting a bit long but should still be acceptable as it isn't a class name that will be needed a lot
  • contains "Trusted" which improves on the current name
  • does not have the implication of being about a single Metadata (like TrustedMetadata does)

@trishankatdatadog (Member)

> Some more brainstorming produced the option TrustedMetadataSet:

But it still doesn't hint at any state-keeping about the client update workflow.

@jku (Member Author) commented Jun 22, 2021

This is true. I think I mentioned this earlier (but it's buried in the nested comments so repeating):
I think the core innovation, if there is any, in this architecture is ensuring that the collection of Metadata is trusted at all points in time (in contrast to e.g. the current python-tuf, which might have metadata in memory that is no longer signed by the delegating metadata, and must handle this situation somehow). Yes, the component does implement most of the steps in the detailed client workflow, but the progress is driven by the updater: the updater decides what is done, the bundle just ensures the collection of metadata is trusted at every step.

Based on that I'd rather emphasize "trusted collection of metadata" than "client workflow", even though both are correct and descriptive.

@jku jku force-pushed the merge-ngclient branch from 74fd891 to ce14e93 Compare June 22, 2021 15:17
@jku (Member Author) commented Jun 22, 2021

Two changes since last comment:

  • rebased on develop: I intentionally did not use any new Metadata functionality, this was just a rebase with minimal changes to enable the merge
  • Added a new commit that renames MetadataBundle to TrustedMetadataSet (and variable "bundle" to "trusted_set")

From my POV all open issues are now handled or filed as issues (I thought we had one for the tests but filed #1462 since I could not find one).

@joshuagl (Member) left a comment

Apologies for the delay, but I finally gave this a complete review. Great work folks!

I started out by reading the existing discussion threads and checking to see whether all discussion items are resolved. I believe they are, apart from one where @trishankatdatadog noticed the code does not implement 5.3.11, which @jku indicated was intentional -- but then it needs to either be documented why it's not implemented per spec, or be implemented per spec.

I then did a review of the code, which looks great 👍 . I made only a few minor suggestions on strings/comments/docstrings.

Finally, I did a side-by-side comparison of the detailed client workflow to the new code, which wasn't an enormous task 🎉 – another indication of the solid design here. When doing so I noticed that:

  1. When we compare versions for rollback attack checks, the spec says "version number of the [...] metadata file, if any, MUST be less than or equal to the version number of [...]", but the code checks <, not <=. I don't think this is a major issue, but it's something I was able to notice thanks to the quality of the new implementation. Perhaps worth a comment at least?
  2. The current updater implementation does not handle consistent snapshots in the target file downloading (Updater.download_target()).
  3. Step 5.3.11 is not implemented, as discussed above.

I'm OK with handling all of these as follow-ons, so long as we have issues (or PRs) created before merging this.

from tuf.ngclient._internal import download, metadata_bundle, requests_fetcher
from tuf.ngclient.fetcher import FetcherInterface

# Globals
Member

and there's a draft PR to improve things here at #1470

__metaclass__ = abc.ABCMeta

@abc.abstractmethod
def fetch(self, url, required_length):
Member

Annotations added in draft PR #1448

Comment on lines +37 to +42
>>> # load local timestamp, then update from remote
>>> try:
>>>     with open(timestamp_path, "rb") as f:
>>>         trusted_set.update_timestamp(f.read())
>>> except (RepositoryError, OSError):
>>>     pass  # failure to load a local file is ok
>>>
Member

Suggested change
- >>> # load local timestamp, then update from remote
- >>> try:
- >>>     with open(timestamp_path, "rb") as f:
- >>>         trusted_set.update_timestamp(f.read())
- >>> except (RepositoryError, OSError):
- >>>     pass  # failure to load a local file is ok
- >>>

loading a local copy of timestamp isn't required by the spec; should we simplify this example by removing that step?

Member Author

the spec is basically silent about loading anything local (except the trusted root), as far as I can tell:

  • it still expects us to compare new files to "trusted metadata" for rollback protection. I think loading a local copy is the logical way to find "trusted metadata".
  • I think there's possibly even a case to be made for loading expired local metadata for purposes of rollback prevention but since the spec explicitly mentions in the root case that expired metadata is fine, and does not mention that in other cases, I didn't seriously consider that. Still this may be a discussion topic ...

Member

I changed my mind on this while doing the review and should have deleted this comment.

> • it still expects us to compare new files to "trusted metadata" for rollback protection. I think loading a local copy is the logical way to find "trusted metadata".

Exactly this. The spec implies there's loading of snapshot and timestamp to check for rollback attacks and targets to check for freeze attacks. I made a note while reviewing this PR to do something about this for the spec.

You have a good point that expired metadata is still valid as a source of information for protection against rollback attacks.

self._bundle.update_delegated_targets(data, role, parent_role)
self._persist_metadata(role, data)

def _preorder_depth_first_walk(self, target_filepath) -> Dict:
Member

Draft PR to improve this code in #1463


def _visit_child_role(child_role: Dict, target_filepath: str) -> str:
"""
<Purpose>
Member

Old style docstring here.

Member Author

This gets refactored in #1463

else:
target_base_url = _ensure_trailing_slash(target_base_url)

full_url = parse.urljoin(target_base_url, targetinfo["filepath"])
Member

Per 5.7.3 of the detailed client workflow, if consistent snapshots are enabled for the repository then we should prefix the target's filename with the hash(es) when downloading.

Member Author

oops! Filed #1479

@jku (Member Author) commented Jul 5, 2021

> When we compare versions for rollback attack checks, the spec says "version number of the [...] metadata file, if any, MUST be less than or equal to the version number of [...]", but the code checks <, not <=. I don't think this is a major issue, but it's something I was able to notice thanks to the quality of the new implementation. Perhaps worth a comment at least?

I believe the checks are actually as specified: it's just that the spec defines it as

if trusted_version <= new_version:
    # all good, continue 

and we check for

if new_version < trusted_version:
    # NOT good, raise error

> The current updater implementation does not handle consistent snapshots in the target file downloading (Updater.download_target()).

Good call, filed #1479

> Step 5.3.11 is not implemented, as discussed above.

There was a comment explaining it in root_update_finished() (where 5.3.11 logically would happen): I tried to make it clearer.

Jussi Kukkonen and others added 13 commits July 5, 2021 10:45
Try to make errors derive from RepositoryError if they can be the
result of malicious or malfunctioning remote repository: The idea
here is that client can handle just RepositoryError instead of
individual errors (as it cannot do anything about any of them).

Also improve variable naming.

This is backwards compatible.

Signed-off-by: Jussi Kukkonen <[email protected]>
Start building the next-gen client: Copy existing components from the
current client.

All of these files have some changes compared to the already existing
copies (because ngclient uses the same linting rules as Metadata API).
* download.py is likely to see major changes in the future.
* requests_fetcher is likely to see some minor changes (like allowing
  compression)

Signed-off-by: Jussi Kukkonen <[email protected]>
* Use the same rules as tuf/api
* omit ngclient from coverage limits for now: theupdateframework#1309

Signed-off-by: Jussi Kukkonen <[email protected]>
This is a new client library implementation using the Metadata API
(the only "old" file used is exceptions.py)
* Functional but largely still untested
* Requires work before it can replace tuf.client but should be good
  enough to include in the repo

The major changes compared to current client so far are:
 * Use of Metadata API
 * Major simplification in the way downloads are handled due to removing
   mirrors support
 * Separating tracking of valid metadata into a separate component
 * There's no MultiRepoUpdater

We do not expect other major changes (in the sense of moving large
amounts of code) but do plan to possibly improve the client API. The
API has already changed so this is not going to be a 1:1 compatible
implementation, but porting should not be difficult.

Signed-off-by: Jussi Kukkonen <[email protected]>
This testing lacks coverage but demonstrates the happy cases.

Signed-off-by: Jussi Kukkonen <[email protected]>
Improve the README text.

Signed-off-by: Teodora Sechkova <[email protected]>
Remove unused and outdated imports in test_updater_ng.py

Signed-off-by: Teodora Sechkova <[email protected]>
Remove outdated comments.
Add explanations to non-obvious cases.

Signed-off-by: Teodora Sechkova <[email protected]>
For consistency with the rest of the checks.

Signed-off-by: Teodora Sechkova <[email protected]>
Document why deleting the timestamp and snapshot files
is not needed to recover from a fast-forward attack.

Signed-off-by: Teodora Sechkova <[email protected]>
TrustedMetadataSet is a long name but
 * it better describes the main feature
 * the name isn't used in too many places

Change the variable names "bundle" -> "trusted_set"

Signed-off-by: Jussi Kukkonen <[email protected]>
* updated_targets() both takes and returns a list
* download_target() argument can come from either
  updated_targets() or get_one_valid_targetinfo()

Signed-off-by: Jussi Kukkonen <[email protected]>
@jku jku force-pushed the merge-ngclient branch from ce14e93 to 1b404f3 Compare July 5, 2021 07:47
@jku (Member Author) commented Jul 5, 2021

changes since last push:

  • added three commits to improve docstrings, error messages and annotations based on review comments
  • rebased to fix conflicts in tuf/exceptions.py (so the commit "exceptions: Make more errors RepositoryErrors" had minor changes)

All discussions fixed here have been marked resolved

@jku jku force-pushed the merge-ngclient branch from d4c310c to e8454b4 Compare July 5, 2021 08:03
@joshuagl (Member) commented Jul 5, 2021

> When we compare versions for rollback attack checks, the spec says "version number of the [...] metadata file, if any, MUST be less than or equal to the version number of [...]", but the code checks <, not <=. I don't think this is a major issue, but it's something I was able to notice thanks to the quality of the new implementation. Perhaps worth a comment at least?

> I believe the checks are actually as specified: it's just that the spec defines it as
>
>     if trusted_version <= new_version:
>         # all good, continue
>
> and we check for
>
>     if new_version < trusted_version:
>         # NOT good, raise error

🤦 whoops, this LGTM.

> The current updater implementation does not handle consistent snapshots in the target file downloading (Updater.download_target()).

> Good call, filed #1479

Thanks.

> Step 5.3.11 is not implemented, as discussed above.

> There was a comment explaining it in root_update_finished() (where 5.3.11 logically would happen): I tried to make it clearer.

Updated comment looks good, thanks for doing that. Given the discussion about trusting expired metadata for rollback attack prevention, we may just want to go ahead and delete the old files anyway?

@jku (Member Author) commented Jul 5, 2021

> Updated comment looks good, thanks for doing that. Given the discussion about trusting expired metadata for rollback attack prevention, we may just want to go ahead and delete the old files anyway?

The spec's 5.3.11 talks about deleting "trusted metadata" -- what it really means is metadata that is not signed by the current root: we'll never load metadata like that. What we might want to use for rollback prevention is correctly signed, but expired, metadata.

I just don't see the point in deleting the files -- our local files may get mangled by random bit flips, malware, users, cats sitting on keyboards: we just need to accept that they may not be valid.

@jku jku force-pushed the merge-ngclient branch from e8454b4 to ffff7f5 Compare July 5, 2021 09:12
@jku jku merged commit 745a8f7 into theupdateframework:develop Jul 5, 2021
@joshuagl (Member) commented Jul 5, 2021

> Updated comment looks good, thanks for doing that. Given the discussion about trusting expired metadata for rollback attack prevention, we may just want to go ahead and delete the old files anyway?

> The spec's 5.3.11 talks about deleting "trusted metadata" -- what it really means is metadata that is not signed by the current root: we'll never load metadata like that. What we might want to use for rollback prevention is correctly signed, but expired, metadata.

I believe you are right, but it took me a bit of time to reason it through. 🤔
This further points to the need to specify how trusted metadata is loaded, and in what scenarios it remains trusted (as in theupdateframework/taps#128).
