Investigation: Why do some benchmarks have significantly increased memory usage with the new incremental GC? #668

mdboom opened this issue Mar 22, 2024 · 1 comment

mdboom commented Mar 22, 2024

Looking at the new incremental GC and its subsequent bugfix together, it's interesting that some benchmarks use significantly more memory, in particular mypy2, pycparser, and docutils.

Data


benchmark base (bytes) head (bytes) diff (bytes)
2to3 23,279,323 23,574,625 295,302
aiohttp 30,047,865 29,992,862 -55,003
async_generators 36,232,825 36,214,101 -18,724
async_tree_none 80,446,220 80,228,937 -217,283
async_tree_cpu_io_mixed 83,602,090 83,824,640 222,550
async_tree_cpu_io_mixed_tg 92,690,529 93,186,925 496,396
async_tree_io 136,706,145 135,888,700 -817,445
async_tree_io_tg 145,445,644 146,071,747 626,103
async_tree_memoization 86,078,610 85,938,371 -140,239
async_tree_memoization_tg 95,327,768 95,374,189 46,421
async_tree_none_tg 89,782,369 89,443,961 -338,408
asyncio_tcp 50,583,454 51,326,390 742,936
asyncio_tcp_ssl 76,057,648 76,106,020 48,372
asyncio_websockets 65,070,811 64,648,728 -422,083
chameleon 29,368,905 29,676,300 307,395
chaos 17,821,110 17,829,107 7,997
comprehensions 19,322,782 19,364,717 41,935
bench_mp_pool 20,654,762 20,830,890 176,128
coroutines 17,117,379 17,019,270 -98,109
crypto_pyaes 17,455,981 17,514,691 58,710
dask 88,305,273 88,550,448 245,175
deepcopy 18,866,956 18,866,176 -780
deepcopy_reduce 18,834,578 18,855,643 21,065
deepcopy_memo 18,891,727 18,836,918 -54,809
deltablue 19,162,648 21,385,216 2,222,568
django_template 43,785,069 43,736,697 -48,372
djangocms 30,294,016 30,260,467 -33,549
docutils 64,943,835 85,793,840 20,850,005
dulwich_log 26,350,348 25,903,299 -447,049
fannkuch 17,007,957 16,951,393 -56,564
float 34,400,158 34,406,009 5,851
create_gc_cycles 17,081,685 17,093,973 12,288
gc_traversal 20,627,651 20,528,566 -99,085
generators 29,539,571 29,482,227 -57,344
genshi_text 20,681,094 21,016,576 335,482
genshi_xml 21,943,832 22,484,114 540,282
go 20,013,641 18,889,191 -1,124,450
gunicorn 30,099,553 29,992,667 -106,886
hexiom 18,173,561 18,145,475 -28,086
html5lib 37,427,687 41,330,590 3,902,903
json 18,928,006 18,950,241 22,235
json_dumps 17,250,986 17,288,630 37,644
json_loads 17,275,172 17,305,600 30,428
logging_format 24,717,994 24,727,942 9,948
logging_silent 17,715,395 17,760,256 44,861
logging_simple 23,280,688 23,272,691 -7,997
mako 26,486,881 26,608,396 121,515
mdp 25,583,225 25,669,827 86,602
meteor_contest 19,434,739 19,383,442 -51,297
mypy2 167,246,116 242,955,605 75,709,489
nbody 17,175,308 17,166,336 -8,972
nqueens 17,238,308 17,207,491 -30,817
pathlib 18,659,035 18,834,383 175,348
pickle 17,929,362 18,064,140 134,778
pickle_dict 17,877,869 17,895,228 17,359
pickle_list 17,998,799 17,963,885 -34,914
pickle_pure_python 17,842,371 17,920,975 78,604
pidigits 17,161,264 17,136,883 -24,381
pprint_safe_repr 37,872,591 37,859,718 -12,873
pprint_pformat 37,840,798 37,846,454 5,656
pycparser 39,668,784 46,512,420 6,843,636
pyflate 60,085,199 59,582,366 -502,833
pylint 92,368,310 95,753,752 3,385,442
python_startup 13,590,332 13,637,729 47,397
python_startup_no_site 13,584,871 13,620,175 35,304
raytrace 17,892,108 17,909,662 17,554
regex_dna 25,022,854 25,080,393 57,539
regex_effbot 17,865,971 17,899,520 33,549
regex_v8 23,517,086 23,460,327 -56,759
richards 17,694,720 17,702,912 8,192
richards_super 17,723,392 17,667,413 -55,979
scimark_fft 18,076,428 18,055,168 -21,260
scimark_lu 17,692,769 17,755,769 63,000
scimark_monte_carlo 18,058,873 18,048,926 -9,947
scimark_sparse_mat_mult 18,079,353 18,216,667 137,314
spectral_norm 16,931,693 17,008,932 77,239
sqlglot_normalize 22,803,407 22,787,218 -16,189
sqlglot_optimize 23,573,845 23,881,045 307,200
sqlglot_parse 22,958,665 25,704,350 2,745,685
sqlglot_transpile 23,045,461 24,079,993 1,034,532
sqlite_synth 19,792,457 19,782,509 -9,948
sympy_expand 59,473,139 59,392,975 -80,164
sympy_integrate 60,309,504 60,277,126 -32,378
sympy_sum 72,773,632 72,791,771 18,139
sympy_str 60,266,203 60,242,602 -23,601
telco 17,062,180 17,114,453 52,273
thrift 18,817,609 18,836,528 18,919
tomli_loads 180,438,162 177,827,254 -2,610,908
tornado_http 41,430,649 41,497,356 66,707
typing_runtime_protocols 19,174,156 19,157,772 -16,384
unpack_sequence 18,567,753 18,615,929 48,176
unpickle 18,123,239 18,082,474 -40,765
unpickle_list 18,007,186 18,036,248 29,062
unpickle_pure_python 17,914,343 17,938,529 24,186
xml_etree_parse 22,025,362 22,069,443 44,081
xml_etree_iterparse 22,826,227 22,949,107 122,880
xml_etree_generate 23,017,569 23,610,514 592,945
xml_etree_process 22,720,121 22,722,462 2,341

Looking at the mypy2 benchmark alone, I measured how memory usage increases as you increase the size of the input data (I basically feed it N copies of the input file rather than a single file).
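
A minimal sketch of how this kind of scaling measurement could be set up (not the actual harness; `input.py` and the elided workload inside `CHILD` are hypothetical stand-ins) looks something like the following, with each run in a fresh child interpreter and peak memory taken from the child's own `ru_maxrss` (Unix-only):

```python
import subprocess
import sys
import tempfile

# Child snippet: run the workload on the generated input, then report the
# child's own peak RSS (KiB on Linux, bytes on macOS).
CHILD = """
import resource, sys
# ... run the actual benchmark body (e.g. mypy) on sys.argv[1] here ...
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
"""

def peak_rss(n_copies, source="input.py"):
    # Build an input made of N concatenated copies of the original file.
    with open(source) as f:
        text = f.read()
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
        tmp.write(text * n_copies)
        path = tmp.name
    # Use a fresh interpreter per run so each measurement starts from a clean heap.
    out = subprocess.run([sys.executable, "-c", CHILD, path],
                         capture_output=True, text=True, check=True)
    return int(out.stdout.split()[-1])

for n in (1, 2, 5, 10):
    print(n, peak_rss(n))
```

Running each measurement in a separate process keeps it independent of allocator state left over from previous runs.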

With the legacy GC, memory usage scales pretty linearly with the size of the input. With the incremental GC, memory usage appears to grow faster than linearly, and certainly at a much faster rate than with the legacy GC. The two lines in the chart are normalized as (memory with N files) / (memory with 1 file).

[chart: memory usage vs. number of input files, normalized to the 1-file case, incremental vs. legacy GC]

#files incremental gc legacy gc
1 144 136
2 149 137
5 155 138
10 192 142

mdboom commented Mar 28, 2024

I ran this again now that some follow-on PRs have been merged, and with more data points. Here I am comparing a recent commit, python/cpython@c1712ef, to the commit just prior to the incremental GC, python/cpython@d5ebf8b.

Now, there is almost no difference at smaller data sizes. As the data size increases, however, you can see that memory under the legacy GC grows at a much smaller rate than under the incremental GC. In fact, I think the "stair-stepped" appearance of the legacy GC line probably supports @markshannon's suggestion that the legacy GC did "too much" work. More importantly, though, the incremental GC doesn't appear to increase memory usage at any sort of faster-than-linear rate as the input size increases, which suggests there isn't an issue with cycles that are never found ("leaked") or anything of that sort. So this does seem like a reasonable tradeoff to make, rather than a sign of a bug causing runaway memory usage.

[chart: memory usage vs. data size, incremental vs. legacy GC]

Data size (# input files) Incremental GC Legacy GC
1 109 MB 109 MB
2 118 MB 118 MB
4 126 MB 125 MB
8 187 MB 139 MB
16 244 MB 193 MB
24 335 MB 267 MB
32 435 MB 270 MB
48 619 MB 451 MB
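
One way to dig further into the "too much work" question would be to watch the collection counters CPython already exposes. A rough sketch (not something run here; `run_workload` is just a hypothetical stand-in that creates some cyclic garbage) using `gc.callbacks` and `gc.get_stats()`:

```python
import gc

def run_workload():
    # Hypothetical stand-in for the benchmark body: create some cyclic garbage.
    for _ in range(100_000):
        a = []
        a.append(a)

collections = []

def on_gc(phase, info):
    # gc.callbacks passes phase ("start"/"stop") and an info dict with the
    # generation collected and object counts.
    if phase == "stop":
        collections.append((info["generation"], info["collected"]))

gc.callbacks.append(on_gc)
try:
    run_workload()
finally:
    gc.callbacks.remove(on_gc)

print("collections observed during workload:", len(collections))
# Cumulative per-generation counters since interpreter start.
for gen, stats in enumerate(gc.get_stats()):
    print(gen, stats["collections"], stats["collected"], stats["uncollectable"])
```

Comparing the per-generation `collections` counts between the two interpreters would show whether the legacy GC collects much more often for the same workload, which would be consistent with the lower peak memory seen here.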
