Update targets delegations in generate_targets_metadata #1074

sechkova · 2020-07-09T10:05:50Z

Fixes issue #: #1037

Description of the changes being introduced by the pull request:

Collect keys and threshold of delegated roles and update delegations in generate_targets_metadata() in a similar manner as generate_root_metadata() does for top-level roles.
Move key dictionary generation from generate_root_metadata() to a helper function _add_keys_to_keydict()
Update tests

Fixing this issue was partially blocked by #574: after loading the repository the delegated roles' 'keyids' and 'threshold' are lost which leads to throwing "missing key" exceptions while trying to update the delegations.

I suggest a short-term solution of #574 by using the recently added logic of hierarchically loading the delegations in load_reposiroty(). This makes it possible to get the delegated roles keyids and threshold from its delegating role metadata.
Clearly this won't work if more than one delegation to the same role exists but ... this is not supported correctly anyway in the current implementation so lets cover the simplest use case here?

Please verify and check that the pull request fulfills the following
requirements:

The code follows the Code Style Guidelines
Tests have been added for the bug fix or new feature
Docs have been added for the bug fix or new feature

joshuagl

Thanks for the PR @sechkova! Your approach seems sound, but I have a couple of questions in-line about the implementation.

joshuagl · 2020-07-09T12:28:55Z

tuf/repository_lib.py

+        storage_backend, repository_name)
+
+    # Update roledb with the latest delegations info
+    if delegations_update:


I assume the reason we're having generate_targets_metadata() return a boolean value and testing it here is because update_roleinfo() is non-performant due to its use of deepcopy()? What is the impact if we just update_roleinfo() without this check?

Yes, the idea is to avoid calling update_roleinfo() for each delegation on every write, especially when their number can grow high if delegated hashed bins are used.

Anyway:

when tested with the scrip from Fix performance and metadata representation issues in roledb #1005 (comment) the overhead when updating all delegations and not using the delegations_update boolean seems to be not more than ~1-2secs (tested in a regular VM on my laptop)

assuming that delegated hashed bins all use the same key, key revocation would mean updating all delegations

So if you are hinting towards simplifying the code at the cost of some negligible efficiency, you may have a point 🙂

In general, the overall overhead of the code in this PR seems to be in the range of 1-2 seconds (too small to actually calculate an exact delta without a dedicated test setup)

Thanks for crunching the numbers. I'd absolutely be in favour of the code simplification.

joshuagl · 2020-07-10T10:32:07Z

tuf/repository_lib.py

+    # Do we have a duplicate?
+    if keyids.count(keyid) > 1:
+      raise securesystemslib.exceptions.Error('Same keyid listed twice:'
+          ' ' + keyid)


This feels odd. We do not handle this exception anywhere, is this likely to happen in practice?
Furthermore, this is all data we manage. If duplicates really are an issue, should we just prune them? What are we expecting the 'user' to do when they see this exception?

There is a test case about it 😁
In practice add_verification_key has a check to avoid duplicates.
For the rest of your questions, I don't have a strong opinion but I agree this does not seem to be the right place of the check. Unfortunately, I cannot claim that I can track all possible data entry points for roledb ... while this is the only "exit" point. Maybe this is the historical reason for it.
I would gladly remove the check and its test but I don't have a great insight on what to put instead. Nothing?

joshuagl · 2020-07-10T10:46:42Z

tuf/repository_lib.py

+    for role in delegations['roles']:
+      rolename = role['name']
+      role_keyids = tuf.roledb.get_role_keyids(rolename, repository_name)
+      role_threshold= tuf.roledb.get_role_threshold(rolename, repository_name)


Suggested change

role_threshold= tuf.roledb.get_role_threshold(rolename, repository_name)

role_threshold = tuf.roledb.get_role_threshold(rolename, repository_name)

joshuagl · 2020-07-10T10:52:45Z

tuf/repository_lib.py

+        role['threshold'] = role_threshold
+        delegation_update = True
+
+      _add_keys_to_keydict(keydict, role_keyids, repository_name)


Should this be part of the if conditional?

No, it has to be outside the if. The key dictionary has to contain delegations keys of all roles, updated or not.

joshuagl · 2020-07-10T10:54:05Z

tuf/repository_lib.py

+    if delegation_update:
+      delegations['keys'] = keydict


Is there any harm in always assigning keydict to delegations['keys']? So far as I can tell delegations['keys'] is only used when delegation_update == True?

No, no obvious harm ...

jku · 2020-07-10T16:14:34Z

tuf/repository_lib.py

+    if keyid not in keydict:
+
+      # This appears to be a new keyid.  Generate the key for it.
+      if key['keytype'] in ['rsa', 'ed25519', 'ecdsa-sha2-nistp256']:


I wonder if there's a reason for hard coding the key types here: we could just let format_keyval_to_metadata() throw FormatError (or handle it and throw something better) ?

Thanks for pointing it :)
After a quick search, format_keyval_to_metadata() is used at a couple of places in tuf the way you described it (just letting it throw FormatError) so I agree applying the same here.

It seems also worth trying to merge the functionality of repository_tool._keys_to_keydict() and repository_lib._add_keys_to_keydict() which seem suspiciously similar 👀

sechkova · 2020-08-05T09:55:35Z

Some time has passed since the last updates on this pull request so I had to rebase and force push. However I think the reformatted set of changes is addressing all the comments up to now.

joshuagl · 2020-08-07T16:02:08Z

Some time has passed since the last updates on this pull request so I had to rebase and force push. However I think the reformatted set of changes is addressing all the comments up to now.

Thank you for the updates. They look good on a first pass, I'm aiming to do a more thorough review next week.

joshuagl

Thank you. I have a few minor comments, mostly on docstring, but otherwise this is looking good.

joshuagl · 2020-08-10T09:27:37Z

tuf/repository_lib.py

+  """
+  Iterate over a list of keys and return a list of keyids and a dict mapping
+  keyid to key metadata
+  """


As we have promoted this to a public method (implicitly, by removing the leading _) we should improve the docstring so that it's easier for callers to make use of. Docstrings for public methods should include <Arguments> and <Returns> sections, at least. The Arguments section for this method should help a reader understand the types and any constraints of the objects in the list.

Done in 1acd07c

joshuagl · 2020-08-10T09:31:49Z

tuf/repository_lib.py

+
+    delegation_update:
+      Boolean set to True if delegations are updated.


I think this value is no longer returned?

Oops ...
Removed in a18beb2

joshuagl · 2020-08-10T09:35:55Z

tuf/repository_lib.py

-  if delegations is not None:
+  if delegations:


Can you help me understand this change?

if delegations is not None ensures we only enter this loop if the delegations argument is changed from its default value of None. As I understand it this is the standard pattern for checking whether an argument with a default value of None has been changed.

My bad ... I didn't think it through ... 🤦
Reverted in a18beb2

joshuagl · 2020-08-10T09:57:55Z

tests/test_repository_lib.py

+    delegations = targets_signable['signed']['delegations']
+
+    roleinfo = {}
+    roleinfo['name'] = 'role1'


Nit: it's probably better read the role name from the loaded data in delegations, as we are for keyids and threshold, in case the generated data changes.

Thanks, done in f1a0fb1

sechkova · 2020-08-17T10:19:43Z

Someone with enough permissions to restart travis-ci? Seems like another timeout 👀

jku · 2020-08-17T11:12:38Z

Bad news is #1096 was merged and this now has conflicts. Good news is the proxy tests should be more reliable once you rebase :)

joshuagl

Thanks for the updates @sechkova, this looks much nicer.

I have made a few minor style suggestions and this PR needs rebasing on the most recent changes in develop, but otherwise this looks good to me.

joshuagl · 2020-08-18T13:22:47Z

tuf/repository_lib.py

+      role['keyids'] = \
+          tuf.roledb.get_role_keyids(role['name'], repository_name)
+      role['threshold']= \
+          tuf.roledb.get_role_threshold(role['name'], repository_name)


The coding style suggests:

The preferred way of wrapping long lines is by using Python's implied line continuation inside parentheses, brackets and braces. If necessary, you can add an extra pair of parentheses around an expression, but sometimes using a backslash looks better.

Which suits my personal preference too.

Suggested change

role['keyids'] = \

tuf.roledb.get_role_keyids(role['name'], repository_name)

role['threshold']= \

tuf.roledb.get_role_threshold(role['name'], repository_name)

role['keyids'] = tuf.roledb.get_role_keyids(role['name'],

repository_name)

role['threshold']= tuf.roledb.get_role_threshold(role['name'],

repository_name)

joshuagl · 2020-08-18T13:23:31Z

tuf/repository_lib.py

 if __name__ == '__main__':
  # The interactive sessions of the documentation strings can
-  # be tested by running repository_tool.py as a standalone module:
+  # be tested by running repository_lib.py as a standalone module:


joshuagl · 2020-08-18T13:26:07Z

tests/test_repository_lib.py

+    targets_metadata  = \
+      repo_lib.generate_targets_metadata(targets_directory, target_files,
+                                         version, expiration_date, delegations,
+                                         False)


I realise you've just copied other examples in this file, but it's more inline with the coding style to do something like:

Suggested change

targets_metadata = \

repo_lib.generate_targets_metadata(targets_directory, target_files,

version, expiration_date, delegations,

False)

targets_metadata = repo_lib.generate_targets_metadata(targets_directory,

target_files, version, expiration_date, delegations, False)

sechkova · 2020-08-19T10:43:38Z

Thanks @joshuagl, applied your code style suggestions and rebased. Let's see the tests :)

Edit: Two of the tests fail due to reasons unknown to me ... would need another look ... seems related to secure-systems-lab/securesystemslib#266

jku · 2020-08-21T14:52:10Z

lint failure is #1115

Use _keys_to_keydict() for the key dictionary generation in generate_root_metadata(). Rename it as a public function keys_to_keydict(). Signed-off-by: Teodora Sechkova <[email protected]>

Collect keys and threshold of delegated roles and update delegations in generate_targets_metadata in a similar manner as generate_root_metadata() does for top-level roles. Signed-off-by: Teodora Sechkova <[email protected]>

Use the delegation graph traversal during load_repository() to load delegated roles' 'keyids' and 'threshold' by reading it from the delegating role metadata. If more than one delegation to the same role exists, only the first one is loaded in roledb for this role. Signed-off-by: Teodora Sechkova <[email protected]>

Tests logic is modified to accommodate for the update of the delegations during generate_targets_metadata(). Signed-off-by: Teodora Sechkova <[email protected]>

Add a new test case in test_generate_targets_metadata to check if targets metadata is up-to-date with its delegated roles. Signed-off-by: Teodora Sechkova <[email protected]>

Signed-off-by: Teodora Sechkova <[email protected]>

sechkova · 2020-08-26T11:01:31Z

Rebased on develop with the linter failure fix.

joshuagl · 2020-08-26T13:58:24Z

tuf/repository_lib.py

-    for keyid in tuf.roledb.get_role_keyids(rolename, repository_name):
+    # Collect keys from all roles in a list
+    keyids = tuf.roledb.get_role_keyids(rolename, repository_name)
+    for keyid in keyids:


Nit: this change to assigning the return value of roledb.get_role_keyids seems superfluous, is keyids used anywhere other than the iterator here? Why not just keep for keyid in tuf.roledb....?

Yes, it is used in https://github.com/theupdateframework/tuf/pull/1074/files#diff-9f585d5478a17bd9e0f6cd5f7484ab60L1284

It replaces the keyids list that was used to avoid duplicates which we already decided not necessary somewhere in the discussion above.

Oh, yes I see. Thanks!

joshuagl · 2020-08-26T14:55:07Z

Good work, thanks!

joshuagl reviewed Jul 10, 2020

View reviewed changes

jku reviewed Jul 10, 2020

View reviewed changes

sechkova force-pushed the delegations-update branch 2 times, most recently from 50817f5 to 7c474bb Compare August 5, 2020 09:52

joshuagl requested changes Aug 10, 2020

View reviewed changes

sechkova force-pushed the delegations-update branch from 7c474bb to 7250f6d Compare August 11, 2020 08:04

joshuagl approved these changes Aug 18, 2020

View reviewed changes

sechkova force-pushed the delegations-update branch from 7250f6d to fb2c01e Compare August 19, 2020 10:41

sechkova added 6 commits August 26, 2020 13:53

Move _keys_to_keydict() to repository_lib

8f396a7

Use _keys_to_keydict() for the key dictionary generation in generate_root_metadata(). Rename it as a public function keys_to_keydict(). Signed-off-by: Teodora Sechkova <[email protected]>

Update delegations in generate_targets_metadata()

f7c9fcb

Collect keys and threshold of delegated roles and update delegations in generate_targets_metadata in a similar manner as generate_root_metadata() does for top-level roles. Signed-off-by: Teodora Sechkova <[email protected]>

Update failing tests for generate_targets_metadata

05a7008

Tests logic is modified to accommodate for the update of the delegations during generate_targets_metadata(). Signed-off-by: Teodora Sechkova <[email protected]>

Add test for delegations update

b3b0c04

Add a new test case in test_generate_targets_metadata to check if targets metadata is up-to-date with its delegated roles. Signed-off-by: Teodora Sechkova <[email protected]>

Fix typo in comment section

b6307dd

Signed-off-by: Teodora Sechkova <[email protected]>

sechkova force-pushed the delegations-update branch from fb2c01e to b6307dd Compare August 26, 2020 10:58

joshuagl approved these changes Aug 26, 2020

View reviewed changes

joshuagl merged commit 7b7a14a into theupdateframework:develop Aug 26, 2020

sechkova deleted the delegations-update branch August 26, 2020 14:59

joshuagl mentioned this pull request Sep 8, 2020

Targets delegations are not updated when adding a new delegation key #1037

Closed

	role_threshold= tuf.roledb.get_role_threshold(rolename, repository_name)
	role_threshold = tuf.roledb.get_role_threshold(rolename, repository_name)


		delegation_update:
		Boolean set to True if delegations are updated.

Update targets delegations in generate_targets_metadata #1074

Update targets delegations in generate_targets_metadata #1074

Conversation

sechkova commented Jul 9, 2020

joshuagl left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jku Jul 10, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sechkova commented Aug 5, 2020

joshuagl commented Aug 7, 2020

joshuagl left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sechkova Aug 11, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sechkova Aug 11, 2020 • edited Loading

Choose a reason for hiding this comment

sechkova commented Aug 17, 2020

jku commented Aug 17, 2020

joshuagl left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sechkova commented Aug 19, 2020 • edited Loading

jku commented Aug 21, 2020

sechkova commented Aug 26, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joshuagl commented Aug 26, 2020

jku Jul 10, 2020 •

edited

Loading

sechkova Aug 11, 2020 •

edited

Loading

sechkova Aug 11, 2020 •

edited

Loading

sechkova commented Aug 19, 2020 •

edited

Loading