-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update targets delegations in generate_targets_metadata #1074
Update targets delegations in generate_targets_metadata #1074
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @sechkova! Your approach seems sound, but I have a couple of questions in-line about the implementation.
tuf/repository_lib.py
Outdated
storage_backend, repository_name) | ||
|
||
# Update roledb with the latest delegations info | ||
if delegations_update: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the reason we're having generate_targets_metadata()
return a boolean value and testing it here is because update_roleinfo()
is non-performant due to its use of deepcopy()
? What is the impact if we just update_roleinfo()
without this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the idea is to avoid calling update_roleinfo()
for each delegation on every write
, especially when their number can grow high if delegated hashed bins are used.
Anyway:
- when tested with the scrip from Fix performance and metadata representation issues in roledb #1005 (comment) the overhead when updating all delegations and not using the
delegations_update
boolean seems to be not more than ~1-2secs (tested in a regular VM on my laptop) - assuming that delegated hashed bins all use the same key, key revocation would mean updating all delegations
So if you are hinting towards simplifying the code at the cost of some negligible efficiency, you may have a point 🙂
In general, the overall overhead of the code in this PR seems to be in the range of 1-2 seconds (too small to actually calculate an exact delta without a dedicated test setup)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for crunching the numbers. I'd absolutely be in favour of the code simplification.
tuf/repository_lib.py
Outdated
# Do we have a duplicate? | ||
if keyids.count(keyid) > 1: | ||
raise securesystemslib.exceptions.Error('Same keyid listed twice:' | ||
' ' + keyid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels odd. We do not handle this exception anywhere, is this likely to happen in practice?
Furthermore, this is all data we manage. If duplicates really are an issue, should we just prune them? What are we expecting the 'user' to do when they see this exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a test case about it 😁
In practice add_verification_key
has a check to avoid duplicates.
For the rest of your questions, I don't have a strong opinion but I agree this does not seem to be the right place of the check. Unfortunately, I cannot claim that I can track all possible data entry points for roledb ... while this is the only "exit" point. Maybe this is the historical reason for it.
I would gladly remove the check and its test but I don't have a great insight on what to put instead. Nothing?
tuf/repository_lib.py
Outdated
for role in delegations['roles']: | ||
rolename = role['name'] | ||
role_keyids = tuf.roledb.get_role_keyids(rolename, repository_name) | ||
role_threshold= tuf.roledb.get_role_threshold(rolename, repository_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
role_threshold= tuf.roledb.get_role_threshold(rolename, repository_name) | |
role_threshold = tuf.roledb.get_role_threshold(rolename, repository_name) |
tuf/repository_lib.py
Outdated
role['threshold'] = role_threshold | ||
delegation_update = True | ||
|
||
_add_keys_to_keydict(keydict, role_keyids, repository_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be part of the if
conditional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it has to be outside the if
. The key dictionary has to contain delegations keys of all roles, updated or not.
tuf/repository_lib.py
Outdated
if delegation_update: | ||
delegations['keys'] = keydict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any harm in always assigning keydict to delegations['keys']
? So far as I can tell delegations['keys']
is only used when delegation_update == True
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, no obvious harm ...
tuf/repository_lib.py
Outdated
if keyid not in keydict: | ||
|
||
# This appears to be a new keyid. Generate the key for it. | ||
if key['keytype'] in ['rsa', 'ed25519', 'ecdsa-sha2-nistp256']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there's a reason for hard coding the key types here: we could just let format_keyval_to_metadata() throw FormatError (or handle it and throw something better) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing it :)
After a quick search, format_keyval_to_metadata()
is used at a couple of places in tuf
the way you described it (just letting it throw FormatError) so I agree applying the same here.
It seems also worth trying to merge the functionality of repository_tool._keys_to_keydict()
and repository_lib._add_keys_to_keydict()
which seem suspiciously similar 👀
50817f5
to
7c474bb
Compare
Some time has passed since the last updates on this pull request so I had to rebase and force push. However I think the reformatted set of changes is addressing all the comments up to now. |
Thank you for the updates. They look good on a first pass, I'm aiming to do a more thorough review next week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I have a few minor comments, mostly on docstring, but otherwise this is looking good.
tuf/repository_lib.py
Outdated
""" | ||
Iterate over a list of keys and return a list of keyids and a dict mapping | ||
keyid to key metadata | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we have promoted this to a public method (implicitly, by removing the leading _
) we should improve the docstring so that it's easier for callers to make use of. Docstrings for public methods should include <Arguments>
and <Returns>
sections, at least. The Arguments
section for this method should help a reader understand the types and any constraints of the objects in the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 1acd07c
tuf/repository_lib.py
Outdated
|
||
delegation_update: | ||
Boolean set to True if delegations are updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this value is no longer returned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops ...
Removed in a18beb2
tuf/repository_lib.py
Outdated
if delegations is not None: | ||
if delegations: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you help me understand this change?
if delegations is not None
ensures we only enter this loop if the delegations argument is changed from its default value of None
. As I understand it this is the standard pattern for checking whether an argument with a default value of None
has been changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad ... I didn't think it through ... 🤦
Reverted in a18beb2
tests/test_repository_lib.py
Outdated
delegations = targets_signable['signed']['delegations'] | ||
|
||
roleinfo = {} | ||
roleinfo['name'] = 'role1' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: it's probably better read the role name from the loaded data in delegations
, as we are for keyids and threshold, in case the generated data changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done in f1a0fb1
7c474bb
to
7250f6d
Compare
Someone with enough permissions to restart travis-ci? Seems like another timeout 👀 |
Bad news is #1096 was merged and this now has conflicts. Good news is the proxy tests should be more reliable once you rebase :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates @sechkova, this looks much nicer.
I have made a few minor style suggestions and this PR needs rebasing on the most recent changes in develop, but otherwise this looks good to me.
tuf/repository_lib.py
Outdated
role['keyids'] = \ | ||
tuf.roledb.get_role_keyids(role['name'], repository_name) | ||
role['threshold']= \ | ||
tuf.roledb.get_role_threshold(role['name'], repository_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The coding style suggests:
The preferred way of wrapping long lines is by using Python's implied line continuation inside parentheses, brackets and braces. If necessary, you can add an extra pair of parentheses around an expression, but sometimes using a backslash looks better.
Which suits my personal preference too.
role['keyids'] = \ | |
tuf.roledb.get_role_keyids(role['name'], repository_name) | |
role['threshold']= \ | |
tuf.roledb.get_role_threshold(role['name'], repository_name) | |
role['keyids'] = tuf.roledb.get_role_keyids(role['name'], | |
repository_name) | |
role['threshold']= tuf.roledb.get_role_threshold(role['name'], | |
repository_name) |
if __name__ == '__main__': | ||
# The interactive sessions of the documentation strings can | ||
# be tested by running repository_tool.py as a standalone module: | ||
# be tested by running repository_lib.py as a standalone module: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
tests/test_repository_lib.py
Outdated
targets_metadata = \ | ||
repo_lib.generate_targets_metadata(targets_directory, target_files, | ||
version, expiration_date, delegations, | ||
False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realise you've just copied other examples in this file, but it's more inline with the coding style to do something like:
targets_metadata = \ | |
repo_lib.generate_targets_metadata(targets_directory, target_files, | |
version, expiration_date, delegations, | |
False) | |
targets_metadata = repo_lib.generate_targets_metadata(targets_directory, | |
target_files, version, expiration_date, delegations, False) |
7250f6d
to
fb2c01e
Compare
Thanks @joshuagl, applied your code style suggestions and rebased. Let's see the tests :) Edit: Two of the tests fail due to reasons unknown to me ... would need another look ... seems related to secure-systems-lab/securesystemslib#266 |
lint failure is #1115 |
Use _keys_to_keydict() for the key dictionary generation in generate_root_metadata(). Rename it as a public function keys_to_keydict(). Signed-off-by: Teodora Sechkova <[email protected]>
Collect keys and threshold of delegated roles and update delegations in generate_targets_metadata in a similar manner as generate_root_metadata() does for top-level roles. Signed-off-by: Teodora Sechkova <[email protected]>
Use the delegation graph traversal during load_repository() to load delegated roles' 'keyids' and 'threshold' by reading it from the delegating role metadata. If more than one delegation to the same role exists, only the first one is loaded in roledb for this role. Signed-off-by: Teodora Sechkova <[email protected]>
Tests logic is modified to accommodate for the update of the delegations during generate_targets_metadata(). Signed-off-by: Teodora Sechkova <[email protected]>
Add a new test case in test_generate_targets_metadata to check if targets metadata is up-to-date with its delegated roles. Signed-off-by: Teodora Sechkova <[email protected]>
Signed-off-by: Teodora Sechkova <[email protected]>
fb2c01e
to
b6307dd
Compare
Rebased on develop with the linter failure fix. |
for keyid in tuf.roledb.get_role_keyids(rolename, repository_name): | ||
# Collect keys from all roles in a list | ||
keyids = tuf.roledb.get_role_keyids(rolename, repository_name) | ||
for keyid in keyids: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this change to assigning the return value of roledb.get_role_keyids
seems superfluous, is keyids
used anywhere other than the iterator here? Why not just keep for keyid in tuf.roledb....
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is used in https://github.com/theupdateframework/tuf/pull/1074/files#diff-9f585d5478a17bd9e0f6cd5f7484ab60L1284
It replaces the keyids list that was used to avoid duplicates which we already decided not necessary somewhere in the discussion above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes I see. Thanks!
Good work, thanks! |
Fixes issue #: #1037
Description of the changes being introduced by the pull request:
generate_targets_metadata()
in a similar manner asgenerate_root_metadata()
does for top-level roles.generate_root_metadata()
to a helper function_add_keys_to_keydict()
Fixing this issue was partially blocked by #574: after loading the repository the delegated roles' 'keyids' and 'threshold' are lost which leads to throwing "missing key" exceptions while trying to update the delegations.
I suggest a short-term solution of #574 by using the recently added logic of hierarchically loading the delegations in
load_reposiroty()
. This makes it possible to get the delegated roleskeyids
andthreshold
from its delegating role metadata.Clearly this won't work if more than one delegation to the same role exists but ... this is not supported correctly anyway in the current implementation so lets cover the simplest use case here?
Please verify and check that the pull request fulfills the following
requirements: