Improve performance of delegate_hashed_bins
Due to the performance overhead of deepcopy(), which roledb uses
extensively, the delegate() function is rather slow. This is especially
noticeable when calling delegate_hashed_bins() with a large
number_of_bins.
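A rough cost model of why the slowdown grows so quickly with number_of_bins. It assumes (the commit message does not spell this out) that each delegate() call deep-copies the parent role's roleinfo once on read and once on write, and that this roleinfo already lists every previously delegated bin. The function name is hypothetical.

```python
def roles_copied_one_at_a_time(number_of_bins: int, copies_per_call: int = 2) -> int:
    """Return the total number of delegated-role entries deep-copied.

    Hypothetical model: before the k-th delegate() call (k = 0, 1, ...), the
    parent's roleinfo already lists k delegated roles, and each call
    deep-copies that roleinfo `copies_per_call` times (e.g. one read, one write).
    """
    # Closed form of sum(copies_per_call * k for k in range(number_of_bins)).
    return copies_per_call * number_of_bins * (number_of_bins - 1) // 2


if __name__ == "__main__":
    for n in (1024, 2048, 16384):
        print(n, roles_copied_one_at_a_time(n))
```

Under this model the copying work is quadratic in number_of_bins: doubling the number of bins roughly quadruples it, and 16,384 bins yield 268,419,072 copied role entries.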

To reduce the number of deepcopy() operations, we remove the direct
calls to delegate() and instead use the newly added helper functions to
replicate its behaviour, with only a single update to the roledb.
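A minimal sketch of the queue-then-update pattern, using a hypothetical ToyRoleDB that deep-copies on every read and write, as roledb is described as doing above. The class, the helper names and the roleinfo layout (a 'delegations' dict holding 'keys' and 'roles') are assumptions for illustration, not TUF's actual internals.

```python
import copy


class ToyRoleDB:
    """Hypothetical stand-in for a role database that deep-copies on every access."""

    def __init__(self):
        self._roles = {"targets": {"delegations": {"keys": {}, "roles": []}}}
        self.deepcopies = 0  # number of deepcopy() calls performed so far

    def get_roleinfo(self, rolename):
        self.deepcopies += 1
        return copy.deepcopy(self._roles[rolename])

    def update_roleinfo(self, rolename, roleinfo):
        self.deepcopies += 1
        self._roles[rolename] = copy.deepcopy(roleinfo)


def delegate_one_at_a_time(db, parent, new_roleinfos, keydict):
    # Like calling delegate() per bin: one read/modify/write cycle per role,
    # each one copying a parent roleinfo that keeps growing.
    for roleinfo in new_roleinfos:
        parent_info = db.get_roleinfo(parent)
        parent_info["delegations"]["keys"].update(keydict)
        parent_info["delegations"]["roles"].append(roleinfo)
        db.update_roleinfo(parent, parent_info)


def delegate_batched(db, parent, new_roleinfos, keydict):
    # Queue first, then apply every delegation in a single read/modify/write.
    parent_info = db.get_roleinfo(parent)
    parent_info["delegations"]["keys"].update(keydict)
    parent_info["delegations"]["roles"].extend(new_roleinfos)
    db.update_roleinfo(parent, parent_info)


if __name__ == "__main__":
    queued = [{"name": f"bin-{i:x}", "keyids": ["k1"], "threshold": 1,
               "terminating": False, "path_hash_prefixes": [f"{i:x}"]}
              for i in range(16)]
    slow, fast = ToyRoleDB(), ToyRoleDB()
    delegate_one_at_a_time(slow, "targets", queued, {"k1": {}})
    delegate_batched(fast, "targets", queued, {"k1": {}})
    print(slow.deepcopies, fast.deepcopies)  # 32 2
```

Both approaches leave the same delegations in the toy database, but the batched version makes two deepcopy() calls in total instead of two per delegated role, and none of its copies is repeated as the list grows.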

On my laptop, this reduces the time taken to delegate to 16k hashed bins
from 1 hr 24 min to 33 s.

Ideally, once issue #1005 has been properly fixed, this commit can be
reverted and we can once again simply call delegate() here.

Signed-off-by: Joshua Lock <[email protected]>
joshuagl committed Apr 2, 2020
1 parent 5ed1b5b commit 82a3afd
Showing 1 changed file with 47 additions and 3 deletions.
50 changes: 47 additions & 3 deletions tuf/repository_tool.py
@@ -2546,14 +2546,58 @@ def delegate_hashed_bins(self, list_of_targets, keys_of_hashed_bins,
       hash_prefix = _get_hash(target_path.replace('\\', '/').lstrip('/'))[:prefix_length]
       ordered_roles[int(hash_prefix, 16) // bin_size]["target_paths"].append(target_path)

+    keyids, keydict = _keys_to_keydict(keys_of_hashed_bins)
+
+    # A queue of roleinfo's that need to be updated in the roledb
+    delegated_roleinfos = []
+
     for bin_rolename in ordered_roles:
+      # TODO: originally we just called self.delegate() for each item in this
+      # iteration. However, this is *extremely* slow when creating a large
+      # number of hashed bins, i.e. 16k as is recommended for PyPI usage in
+      # PEP 458: https://www.python.org/dev/peps/pep-0458/
+      # The source of the slowness is the interactions with the roledb, which
+      # causes several deep copies of roleinfo dictionaries:
+      # https://github.com/theupdateframework/tuf/issues/1005
+      # Once the underlying issues in #1005 are resolved, i.e. some combination
+      # of the intermediate and long-term fixes, we may simplify here by
+      # switching back to just calling self.delegate(), but until that time we
+      # queue roledb interactions and perform all updates to the roledb in one
+      # operation at the end of the iteration.
+
+      relative_paths = {}
+      targets_directory_length = len(self._targets_directory)
+      for path in bin_rolename['target_paths']:
+        relative_paths.update({path[targets_directory_length:]: {}})
+
       # Delegate from the "unclaimed" targets role to each 'bin_rolename'
-      self.delegate(bin_rolename['name'], keys_of_hashed_bins, [],
-          list_of_targets=bin_rolename['target_paths'],
-          path_hash_prefixes=bin_rolename['target_hash_prefixes'])
+      target = self._create_delegated_target(bin_rolename['name'], keyids,
+          paths=relative_paths)
+
+      roleinfo = {'name': bin_rolename['name'],
+                  'keyids': keyids,
+                  'threshold': 1,
+                  'terminating': False,
+                  'path_hash_prefixes': bin_rolename['target_hash_prefixes']}
+      delegated_roleinfos.append(roleinfo)
+
+      for key in keys_of_hashed_bins:
+        target.add_verification_key(key)
+
+      # Add the new delegation to the top-level 'targets' role object (i.e.,
+      # 'repository.targets()').
+      if self.rolename != 'targets':
+        self._parent_targets_object.add_delegated_role(bin_rolename['name'],
+            target)
+
+      # Add 'new_targets_object' to the 'targets' role object (this object).
+      self.add_delegated_role(bin_rolename['name'], target)
       logger.debug('Delegated from ' + repr(self.rolename) + ' to ' + repr(bin_rolename))

+
+    self._update_roledb_delegations(keydict, delegated_roleinfos)
+



   def add_target_to_bin(self, target_filepath, number_of_bins, fileinfo=None):
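The context lines at the top of the hunk assign each target to a bin using a prefix of its path hash. A minimal standalone sketch of that mapping follows. It assumes SHA-256 as the hash behind _get_hash(), a power-of-two number_of_bins, and a prefix_length/bin_size derivation that is not part of this hunk; bin_index_for_target() is a hypothetical name.

```python
import hashlib


def bin_index_for_target(target_path: str, number_of_bins: int) -> int:
    """Map a target path to a hashed-bin index (hypothetical helper).

    Assumptions: number_of_bins is a power of two and the path hash is SHA-256.
    """
    # Enough hex digits to number every bin, e.g. 16384 bins -> "3fff" -> 4.
    prefix_length = len(f"{number_of_bins - 1:x}")
    total_hash_prefixes = 16 ** prefix_length
    # Each bin covers this many consecutive hash prefixes.
    bin_size = total_hash_prefixes // number_of_bins
    # Normalise the path the same way as the diff's context lines.
    normalized = target_path.replace("\\", "/").lstrip("/")
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    hash_prefix = digest[:prefix_length]
    return int(hash_prefix, 16) // bin_size
```

With 16,384 bins the prefix is four hex digits (65,536 possible values), so each bin covers four consecutive prefixes, and Windows-style or leading-slash paths land in the same bin as their normalised form.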
