-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix performance and metadata representation issues in roledb #1005
Comments
I was able to reproduce and profile the issue using below script. Generating 16K hashed bins took ~15 hours on my MBP (early 2015), of which ~3.5 hours (12355.858 seconds) were CPU time. Applying fix (a) form above, the CPU time dropped to ~80 seconds, with fix (b) and (d) it dropped to ~60 seconds. pip install securesystemslib[crypto,pynacl] tuf
curl https://gist.githubusercontent.com/lukpueh/724bd1d7b477f201a9f199b037d85747/raw/profile_tuf_bins.py -o profile_tuf_bins.py
python -m cProfile -o stats profile_tuf_bins.py >>> import pstats, os, glob
>>> from pstats import SortKey
>>> newest_stats = os.path.join(sorted(glob.glob("tuf_stats_*")).pop(), "stats")
>>> p = pstats.Stats(newest_stats)
>>> p.sort_stats(SortKey.CUMULATIVE).print_stats("tuf|securesystemslib|copy", 20)
# Thu Mar 19 10:46:32 2020 tuf_stats__1584550063__bins_16384/stats
#
# 31214279022 function calls (25298241328 primitive calls) in 12355.858 seconds
#
# Ordered by: cumulative time
# List reduced from 945 to 393 due to restriction <'tuf|securesystemslib|copy'>
# List reduced from 393 to 20 due to restriction <20>
#
# ncalls tottime percall cumtime percall filename:lineno(function)
# 1 0.036 0.036 12355.858 12355.858 profile_tuf_bins.py:29(main)
# 1 53.294 53.294 12289.763 12289.763 /Users/lukp/tuf/tuf/tuf/repository_tool.py:2421(delegate_hashed_bins)
# 16385 1.074 0.000 12233.417 0.747 /Users/lukp/tuf/tuf/tuf/repository_tool.py:2164(delegate)
# 4300458036/147547 4273.850 0.000 10081.901 0.068 /Users/lukp/.pyenv/versions/3.8.0/lib/python3.8/copy.py:128(deepcopy)
# 269189355/147547 1024.742 0.000 10081.119 0.068 /Users/lukp/.pyenv/versions/3.8.0/lib/python3.8/copy.py:225(_deepcopy_dict)
# 537608476/573712 1916.648 0.000 10069.452 0.018 /Users/lukp/.pyenv/versions/3.8.0/lib/python3.8/copy.py:199(_deepcopy_list)
# 32787 76.144 0.002 7181.414 0.219 /Users/lukp/tuf/tuf/tuf/roledb.py:319(update_roleinfo)
# 81949 19.111 0.000 5053.164 0.062 /Users/lukp/tuf/tuf/tuf/roledb.py:683(get_roleinfo)
# 134799701/245935 385.711 0.000 2067.982 0.008 /Users/lukp/.ve/tuf-dev/lib/python3.8/site-packages/securesystemslib/schema.py:754(check_match)
# 403661172/786789 164.846 0.000 2063.374 0.003 /Users/lukp/.ve/tuf-dev/lib/python3.8/site-packages/securesystemslib/schema.py:701(check_match)
# 269172973/540895 471.488 0.000 2062.263 0.004 /Users/lukp/.ve/tuf-dev/lib/python3.8/site-packages/securesystemslib/schema.py:523(check_match)
# 671981865 328.549 0.000 742.563 0.000 /Users/lukp/.ve/tuf-dev/lib/python3.8/site-packages/securesystemslib/schema.py:960(check_match)
# 806797831 418.819 0.000 606.990 0.000 /Users/lukp/.pyenv/versions/3.8.0/lib/python3.8/copy.py:241(_keep_alive)
# 3493660202 386.532 0.000 386.532 0.000 /Users/lukp/.pyenv/versions/3.8.0/lib/python3.8/copy.py:182(_deepcopy_atomic)
# 134652058 80.054 0.000 114.281 0.000 /Users/lukp/.ve/tuf-dev/lib/python3.8/site-packages/securesystemslib/schema.py:590(check_match)
# 1 0.377 0.377 65.143 65.143 /Users/lukp/tuf/tuf/tuf/repository_tool.py:220(writeall)
# 16389 0.485 0.000 63.732 0.004 /Users/lukp/tuf/tuf/tuf/repository_lib.py:94(_generate_and_write_metadata)
# 135203458 40.774 0.000 54.002 0.000 /Users/lukp/.ve/tuf-dev/lib/python3.8/site-packages/securesystemslib/schema.py:202(check_match)
# 134455360 33.074 0.000 46.199 0.000 /Users/lukp/.ve/tuf-dev/lib/python3.8/site-packages/securesystemslib/schema.py:455(check_match)
# 16389 0.934 0.000 39.196 0.002 /Users/lukp/tuf/tuf/tuf/repository_lib.py:1643(write_metadata_file)
# |
Thanks a ton for digging into this! |
Due to the performance overhead of deepcopy(), as used extensively in roledb, the delegate function is rather slow. This is especially noticeable when we have a large number_of_bins when calling delegate_hashed_bins. In order to be able to easily reduce the number of deepcopy() operations we remove direct calls to delegate() and instead use the newly added helper functions to replicate the behaviour, only with a single call update to the roledb. This improves the performance of a 16k bins delegation from a 1hr 24min operation on my laptop to 33s. Ideally once Issue theupdateframework#1005 has been properly fixed this commit can be reverted and we can once again just call delegate() here. Signed-off-by: Joshua Lock <[email protected]>
Due to the performance overhead of deepcopy(), as used extensively in roledb, the delegate function is rather slow. This is especially noticeable when we have a large number_of_bins when calling delegate_hashed_bins. In order to be able to easily reduce the number of deepcopy() operations we remove direct calls to delegate() and instead use the newly added helper functions to replicate the behaviour, only with a single call update to the roledb. This improves the performance of a 16k bins delegation from a 1hr 24min operation on my laptop to 33s. Ideally once Issue theupdateframework#1005 has been properly fixed this commit can be reverted and we can once again just call delegate() here. Signed-off-by: Joshua Lock <[email protected]>
Closing this issue as it was filed against (what is now known as) the legacy codebase: issue seems to not be relevant anymore. Please re-open or file a new issue if you feel that the issue is revelant to current python-tuf. More detailsCurrent source code (and upcoming 1.0 release) only contains the modern components
Legacy components (e.g. tuf.client, tuf.repository_tool, tuf.repository_lib as well as the repo and client scripts) are no longer included. See announcement and API reference for more details. |
Update on 2020-04-08:
The particular performance issue described below as Problem 1 has been resolved in #1012 by applying the below proposed Quick fix (a), however, the underlying Problem 2 remains.
I suggest to keep this issue around for context, and proceed with below outlined Long-term fixes (c) and (d).
As reported by @woodruffw (thanks!) and reproduced in the profiling experiment below, repository tool's
delegate_hashed_bins
method has massive performance issues when delegating to many bins (e.g. 16K), for the following reasons:Problem 1:
delegate_hashed_bins
callsdelegate
for each new bin.delegate
then adds the new delegated role to roledb, but also adds delegation infos to the steadily growing delegating role in roledb in each iteration.Problem 2:
Operations on
roledb
are extremely expensive, because they perform twodeepcopy
s of aroleinfo
dictionary, like so:Furthermore,
update_roleinfo
callstuf.formats.ROLEDB_SCHEMA.check_match
for the passed roleinfo, which recursively iterates over all its elements to check for some criteria.I propose the following fixes:
Change
delegate_hashed_bins
to update the delegating role only once in the roledb, instead of once for every bin. This means that we can't calldelegate
as it is right now, but would need to replicate or, ideally, factor out common functionality.Stop deepcopying and instead update roledb by reference. The deepcopys are an unnecessary overhead throughout the codebase (see profiling experiment). This seems quite feasible in most cases, but it is also risky. It might be worth skipping the intermediate fix, and move from quick fix directly to the long-term fix.
Kill roledb with fire. There is absolutely no reason to have different ways of representing tuf metadata in memory, and constantly sync them. Right now:
metadata
dict storeAlso see secondary objective "Has only one internal structure ..." of #846 ")
Stop distrusting your own code. There is really no need to call
check_match
in the internal API on data that we constructed ourselves, especially if it is that costly (see profiling results below and Revise schema and formats facility secure-systems-lab/securesystemslib#183).The text was updated successfully, but these errors were encountered: