Looking at the new incremental GC and its subsequent bugfix together, it's interesting that some benchmarks use significantly more memory, in particular mypy2, pycparser, and docutils.
Data
| benchmark | base | head | diff |
|---|---:|---:|---:|
| 2to3 | 23,279,323 | 23,574,625 | 295,302 |
| aiohttp | 30,047,865 | 29,992,862 | -55,003 |
| async_generators | 36,232,825 | 36,214,101 | -18,724 |
| async_tree_none | 80,446,220 | 80,228,937 | -217,283 |
| async_tree_cpu_io_mixed | 83,602,090 | 83,824,640 | 222,550 |
| async_tree_cpu_io_mixed_tg | 92,690,529 | 93,186,925 | 496,396 |
| async_tree_io | 136,706,145 | 135,888,700 | -817,445 |
| async_tree_io_tg | 145,445,644 | 146,071,747 | 626,103 |
| async_tree_memoization | 86,078,610 | 85,938,371 | -140,239 |
| async_tree_memoization_tg | 95,327,768 | 95,374,189 | 46,421 |
| async_tree_none_tg | 89,782,369 | 89,443,961 | -338,408 |
| asyncio_tcp | 50,583,454 | 51,326,390 | 742,936 |
| asyncio_tcp_ssl | 76,057,648 | 76,106,020 | 48,372 |
| asyncio_websockets | 65,070,811 | 64,648,728 | -422,083 |
| chameleon | 29,368,905 | 29,676,300 | 307,395 |
| chaos | 17,821,110 | 17,829,107 | 7,997 |
| comprehensions | 19,322,782 | 19,364,717 | 41,935 |
| bench_mp_pool | 20,654,762 | 20,830,890 | 176,128 |
| coroutines | 17,117,379 | 17,019,270 | -98,109 |
| crypto_pyaes | 17,455,981 | 17,514,691 | 58,710 |
| dask | 88,305,273 | 88,550,448 | 245,175 |
| deepcopy | 18,866,956 | 18,866,176 | -780 |
| deepcopy_reduce | 18,834,578 | 18,855,643 | 21,065 |
| deepcopy_memo | 18,891,727 | 18,836,918 | -54,809 |
| deltablue | 19,162,648 | 21,385,216 | 2,222,568 |
| django_template | 43,785,069 | 43,736,697 | -48,372 |
| djangocms | 30,294,016 | 30,260,467 | -33,549 |
| docutils | 64,943,835 | 85,793,840 | 20,850,005 |
| dulwich_log | 26,350,348 | 25,903,299 | -447,049 |
| fannkuch | 17,007,957 | 16,951,393 | -56,564 |
| float | 34,400,158 | 34,406,009 | 5,851 |
| create_gc_cycles | 17,081,685 | 17,093,973 | 12,288 |
| gc_traversal | 20,627,651 | 20,528,566 | -99,085 |
| generators | 29,539,571 | 29,482,227 | -57,344 |
| genshi_text | 20,681,094 | 21,016,576 | 335,482 |
| genshi_xml | 21,943,832 | 22,484,114 | 540,282 |
| go | 20,013,641 | 18,889,191 | -1,124,450 |
| gunicorn | 30,099,553 | 29,992,667 | -106,886 |
| hexiom | 18,173,561 | 18,145,475 | -28,086 |
| html5lib | 37,427,687 | 41,330,590 | 3,902,903 |
| json | 18,928,006 | 18,950,241 | 22,235 |
| json_dumps | 17,250,986 | 17,288,630 | 37,644 |
| json_loads | 17,275,172 | 17,305,600 | 30,428 |
| logging_format | 24,717,994 | 24,727,942 | 9,948 |
| logging_silent | 17,715,395 | 17,760,256 | 44,861 |
| logging_simple | 23,280,688 | 23,272,691 | -7,997 |
| mako | 26,486,881 | 26,608,396 | 121,515 |
| mdp | 25,583,225 | 25,669,827 | 86,602 |
| meteor_contest | 19,434,739 | 19,383,442 | -51,297 |
| mypy2 | 167,246,116 | 242,955,605 | 75,709,489 |
| nbody | 17,175,308 | 17,166,336 | -8,972 |
| nqueens | 17,238,308 | 17,207,491 | -30,817 |
| pathlib | 18,659,035 | 18,834,383 | 175,348 |
| pickle | 17,929,362 | 18,064,140 | 134,778 |
| pickle_dict | 17,877,869 | 17,895,228 | 17,359 |
| pickle_list | 17,998,799 | 17,963,885 | -34,914 |
| pickle_pure_python | 17,842,371 | 17,920,975 | 78,604 |
| pidigits | 17,161,264 | 17,136,883 | -24,381 |
| pprint_safe_repr | 37,872,591 | 37,859,718 | -12,873 |
| pprint_pformat | 37,840,798 | 37,846,454 | 5,656 |
| pycparser | 39,668,784 | 46,512,420 | 6,843,636 |
| pyflate | 60,085,199 | 59,582,366 | -502,833 |
| pylint | 92,368,310 | 95,753,752 | 3,385,442 |
| python_startup | 13,590,332 | 13,637,729 | 47,397 |
| python_startup_no_site | 13,584,871 | 13,620,175 | 35,304 |
| raytrace | 17,892,108 | 17,909,662 | 17,554 |
| regex_dna | 25,022,854 | 25,080,393 | 57,539 |
| regex_effbot | 17,865,971 | 17,899,520 | 33,549 |
| regex_v8 | 23,517,086 | 23,460,327 | -56,759 |
| richards | 17,694,720 | 17,702,912 | 8,192 |
| richards_super | 17,723,392 | 17,667,413 | -55,979 |
| scimark_fft | 18,076,428 | 18,055,168 | -21,260 |
| scimark_lu | 17,692,769 | 17,755,769 | 63,000 |
| scimark_monte_carlo | 18,058,873 | 18,048,926 | -9,947 |
| scimark_sparse_mat_mult | 18,079,353 | 18,216,667 | 137,314 |
| spectral_norm | 16,931,693 | 17,008,932 | 77,239 |
| sqlglot_normalize | 22,803,407 | 22,787,218 | -16,189 |
| sqlglot_optimize | 23,573,845 | 23,881,045 | 307,200 |
| sqlglot_parse | 22,958,665 | 25,704,350 | 2,745,685 |
| sqlglot_transpile | 23,045,461 | 24,079,993 | 1,034,532 |
| sqlite_synth | 19,792,457 | 19,782,509 | -9,948 |
| sympy_expand | 59,473,139 | 59,392,975 | -80,164 |
| sympy_integrate | 60,309,504 | 60,277,126 | -32,378 |
| sympy_sum | 72,773,632 | 72,791,771 | 18,139 |
| sympy_str | 60,266,203 | 60,242,602 | -23,601 |
| telco | 17,062,180 | 17,114,453 | 52,273 |
| thrift | 18,817,609 | 18,836,528 | 18,919 |
| tomli_loads | 180,438,162 | 177,827,254 | -2,610,908 |
| tornado_http | 41,430,649 | 41,497,356 | 66,707 |
| typing_runtime_protocols | 19,174,156 | 19,157,772 | -16,384 |
| unpack_sequence | 18,567,753 | 18,615,929 | 48,176 |
| unpickle | 18,123,239 | 18,082,474 | -40,765 |
| unpickle_list | 18,007,186 | 18,036,248 | 29,062 |
| unpickle_pure_python | 17,914,343 | 17,938,529 | 24,186 |
| xml_etree_parse | 22,025,362 | 22,069,443 | 44,081 |
| xml_etree_iterparse | 22,826,227 | 22,949,107 | 122,880 |
| xml_etree_generate | 23,017,569 | 23,610,514 | 592,945 |
| xml_etree_process | 22,720,121 | 22,722,462 | 2,341 |
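
To put those three in perspective, here is a tiny script that recomputes the diff and relative change for them straight from the base/head columns above (the values are presumably peak memory in bytes, as reported by the benchmark harness):

```python
# Relative increases for the three benchmarks called out above, computed from the
# base/head columns of the table (values presumably peak memory in bytes).
rows = {
    "mypy2":     (167_246_116, 242_955_605),
    "pycparser": (39_668_784, 46_512_420),
    "docutils":  (64_943_835, 85_793_840),
}

for name, (base, head) in rows.items():
    print(f"{name:<10} {head - base:>12,} bytes  ({(head / base - 1) * 100:+.1f}%)")
```

That works out to roughly +45% for mypy2, +32% for docutils, and +17% for pycparser.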
Looking at the mypy2 benchmark alone, I measured how memory usage increases as you increase the size of the input data (I basically feed it N copies of the input file rather than a single file).
On the legacy GC, memory usage scales pretty linearly with the size of the input. With the incremental GC, memory usage appears to grow faster than linearly, and in any case at a much faster rate than with the legacy GC. The two lines are normalized, so each value is (memory of N files / memory of 1 file). A rough sketch of this measurement setup follows the table below.
| #files | incremental gc | legacy gc |
|---:|---:|---:|
| 1 | 144 | 136 |
| 2 | 149 | 137 |
| 5 | 155 | 138 |
| 10 | 192 | 142 |
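
For reference, a minimal sketch of this kind of experiment, assuming a placeholder input module (`target.py`), the mypy flags shown, and peak RSS of a fresh checker process as the memory metric (close to, but not exactly, what the benchmark harness reports):

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical input module; the real measurement used the mypy2 benchmark's own data.
INPUT_FILE = Path("target.py")

# Child script: type-check a directory in-process and report the process's peak RSS.
# (ru_maxrss units differ by platform, but they cancel out once we normalize below.)
CHILD = """
import resource, sys
from mypy import api
api.run(["--no-incremental", sys.argv[1]])
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
"""

def peak_rss(n_copies: int) -> int:
    """Check n_copies of INPUT_FILE in a fresh process and return its peak RSS."""
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(n_copies):
            shutil.copy(INPUT_FILE, Path(tmp) / f"copy_{i}.py")
        out = subprocess.run(
            [sys.executable, "-c", CHILD, tmp],
            capture_output=True, text=True, check=True,
        )
        return int(out.stdout.strip().splitlines()[-1])

if __name__ == "__main__":
    baseline = peak_rss(1)
    for n in (1, 2, 5, 10):
        print(f"{n:>2} files: {peak_rss(n) / baseline:.2f}x the single-file peak")
```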
I ran this again now that some follow-on PRs have been merged, and with more data points. Here I am comparing a recent commit, python/cpython@c1712ef, to the commit prior to the incremental GC, python/cpython@d5ebf8b.
Now, there is almost no difference at smaller data sizes. As the data size increases, however, you can see that the legacy GC's memory grows at a much slower rate than the incremental GC's. In fact, I think the "stair-stepped" appearance of the legacy GC probably supports @markshannon's suggestion that the legacy GC did "too much" work. More importantly, though, the incremental GC doesn't appear to increase memory usage at any faster-than-linear rate as the input size increases, suggesting there isn't any issue with cycles that are never found/"leaked" or anything of that sort. So this does seem like a tradeoff that is reasonable to make, rather than a sign of a bug causing runaway memory usage.
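
As a side note, one unit-agnostic way to eyeball "linear vs. faster than linear" from a handful of (N, memory) points is to check whether the marginal cost per extra file stays roughly constant. A quick sketch, using the incremental/legacy columns from the first table above as example input:

```python
# If memory is roughly a + b*N, then (mem[N] - mem[1]) / (N - 1) should stay about
# constant; a clearly increasing value as N grows points at faster-than-linear growth.
# Example input: the incremental/legacy GC columns from the first table above.
def marginal_growth(points: dict[int, float]) -> dict[int, float]:
    base = points[1]
    return {n: (m - base) / (n - 1) for n, m in sorted(points.items()) if n != 1}

incremental_gc = {1: 144, 2: 149, 5: 155, 10: 192}
legacy_gc = {1: 136, 2: 137, 5: 138, 10: 142}

print("incremental gc:", marginal_growth(incremental_gc))
print("legacy gc:     ", marginal_growth(legacy_gc))
```

On those coarse early points the marginal cost is noisy but not obviously accelerating, which is consistent with the conclusion above from the denser data.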