Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GC page profiler #52567

Merged
merged 1 commit into from
Dec 23, 2023
Merged

GC page profiler #52567

merged 1 commit into from
Dec 23, 2023

Conversation

d-netto
Copy link
Member

@d-netto d-netto commented Dec 17, 2023

Piggybacks in the sweeping phase of the GC to pretty-print a JSON representation of every page in the pool allocator.

Usage:

julia> using Profile

julia> Profile.take_page_profile("/tmp/test-profile.out")
"/tmp/test-profile.out"

Output (truncated to one page & after pretty printing):

{
	"address": "0x109dd0000",
	"object_size": 400,
	"objects": [
		"Task",
		"Task",
		"Task",
		"garbage",
		"GenericMemory",
		"garbage",
		"GenericMemory",
		"GenericMemory",
		"garbage",
		"GenericMemory",
		"GenericMemory",
		"Task",
		"Task",
		"Task",
		"garbage",
		"garbage",
		"garbage",
		"String",
		"garbage",
		"garbage",
		"String",
		"GenericMemory",
		"GenericMemory",
		"garbage",
		"garbage",
		"garbage",
		"garbage",
		"garbage",
		"garbage",
		"garbage",
		"garbage",
		"garbage",
		"garbage",
		"garbage",
		"garbage",
		"garbage",
		"garbage",
		"garbage",
		"garbage",
		"garbage",
	]
}

This proved particularly useful for us when studying fragmentation in some of our workloads, though this profiler may possibly have some repeated functionality compared to the other profilers we already have in stdlib.

@d-netto d-netto added GC Garbage collector profiler labels Dec 17, 2023
@d-netto d-netto force-pushed the dcn-page-profile branch 2 times, most recently from 7b153c0 to 01465a9 Compare December 17, 2023 18:26
@d-netto d-netto force-pushed the dcn-page-profile branch 8 times, most recently from 5214188 to 4c78f4f Compare December 17, 2023 22:03
@d-netto
Copy link
Member Author

d-netto commented Dec 17, 2023

In particular, we've been using this profile to compute page utilization per object type, as done in https://github.com/d-netto/page-profile-parser:

profile = PageProfile(947 pages, 14.80 MiB)
  SimpleVector: 4.74% utilization 395 pages (6.17 MiB)
  GenericMemory: 35.51% utilization 303 pages (4.73 MiB)
  DataType: 27.53% utilization 94 pages (1.47 MiB)
  String: 16.89% utilization 75 pages (1.17 MiB)
  MethodInstance: 40.96% utilization 67 pages (1.05 MiB)
  TypeMapEntry: 77.40% utilization 56 pages (0.88 MiB)
  Array: 34.06% utilization 52 pages (0.81 MiB)
  TypeofVararg: 0.27% utilization 21 pages (0.33 MiB)
  TypeMapLevel: 0.99% utilization 19 pages (0.30 MiB)
  IntrusiveLinkedList: 1.18% utilization 15 pages (0.23 MiB)
...

@d-netto d-netto mentioned this pull request Dec 17, 2023
3 tasks
@d-netto d-netto force-pushed the dcn-page-profile branch 6 times, most recently from 195f86b to 6393ca1 Compare December 21, 2023 23:39
@d-netto
Copy link
Member Author

d-netto commented Dec 21, 2023

Instrumentation above doesn't seem to introduce significant regressions in GC time:

  • before:
category = "TimeZones"
bench = "TimeZones.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │        572 │     111 │       100 │         11 │          111 │                 9 │     4645 │         19 │
│  median │        585 │     118 │       105 │         13 │          118 │                10 │     4648 │         20 │
│ maximum │        601 │     128 │       111 │         20 │          128 │                12 │     4651 │         21 │
│   stdev │         11 │       7 │         4 │          3 │            7 │                 1 │        2 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
category = "append"
bench = "append.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │        866 │     246 │       158 │         88 │           57 │                84 │     4031 │         28 │
│  median │        872 │     247 │       158 │         90 │           58 │                92 │     4031 │         28 │
│ maximum │        906 │     262 │       165 │         97 │           61 │               103 │     4031 │         29 │
│   stdev │         16 │       7 │         3 │          4 │            2 │                 8 │        0 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
category = "bigint"
bench = "pollard.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │        490 │     159 │       142 │         17 │           50 │                42 │      134 │         32 │
│  median │        511 │     170 │       151 │         18 │           52 │                58 │      135 │         33 │
│ maximum │        558 │     186 │       166 │         20 │           58 │                65 │      141 │         33 │
│   stdev │         27 │      11 │        10 │          1 │            3 │                10 │        3 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
category = "linked"
bench = "list.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       2808 │    2153 │      1955 │        199 │          906 │                47 │     2962 │         76 │
│  median │       2834 │    2164 │      1965 │        200 │          911 │                50 │     2962 │         77 │
│ maximum │       2923 │    2234 │      2011 │        223 │          931 │                82 │     2964 │         77 │
│   stdev │         49 │      36 │        25 │         11 │           10 │                16 │        1 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
bench = "tree.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       2181 │     113 │       109 │          3 │           74 │                12 │      100 │          5 │
│  median │       2808 │     142 │       139 │          3 │          102 │                14 │      101 │          5 │
│ maximum │       2919 │     167 │       163 │          4 │          128 │                16 │      103 │          6 │
│   stdev │        346 │      25 │        25 │          0 │           25 │                 2 │        1 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
category = "obj_arrays"
bench = "many_refs.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       1173 │     844 │       787 │         57 │          277 │                 9 │      869 │         72 │
│  median │       1193 │     855 │       795 │         57 │          282 │                10 │      870 │         72 │
│ maximum │       1196 │     863 │       805 │         60 │          286 │                25 │      872 │         72 │
│   stdev │         10 │       7 │         7 │          1 │            4 │                 7 │        1 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
bench = "single_ref.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │        465 │     335 │       325 │         10 │          119 │                11 │      870 │         72 │
│  median │        466 │     338 │       328 │         10 │          122 │                16 │      873 │         73 │
│ maximum │        483 │     352 │       341 │         11 │          129 │                21 │      873 │         73 │
│   stdev │          8 │       8 │         7 │          0 │            4 │                 4 │        1 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
category = "strings"
bench = "strings.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      14304 │    1880 │      1681 │        199 │          587 │                75 │      782 │         13 │
│  median │      14966 │    2008 │      1796 │        212 │          595 │                84 │      801 │         13 │
│ maximum │      15996 │    2119 │      1894 │        225 │          647 │                91 │      814 │         14 │
│   stdev │        701 │      95 │        85 │         11 │           29 │                 6 │       13 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
  • after (edited to include the latest commit):
category = "TimeZones"
bench = "TimeZones.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │        471 │       0 │         0 │          0 │           51 │                 0 │     4647 │          0 │
│  median │        490 │       0 │         0 │          0 │           54 │                 0 │     4649 │          0 │
│ maximum │        594 │     128 │       116 │         12 │          128 │                10 │     4649 │         22 │
│   stdev │         62 │      68 │        62 │          7 │           40 │                 5 │        1 │         12 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
category = "append"
bench = "append.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │        878 │     244 │       156 │         85 │           56 │                74 │     4031 │         28 │
│  median │        889 │     255 │       162 │         89 │           57 │                95 │     4031 │         28 │
│ maximum │        927 │     262 │       170 │        100 │           61 │               109 │     4031 │         29 │
│   stdev │         21 │       7 │         5 │          6 │            2 │                13 │        0 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
category = "bigint"
bench = "pollard.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │        482 │     156 │       139 │         16 │           47 │                37 │      132 │         32 │
│  median │        485 │     157 │       141 │         16 │           49 │                42 │      140 │         32 │
│ maximum │        515 │     163 │       146 │         18 │           50 │                65 │      140 │         33 │
│   stdev │         14 │       3 │         2 │          1 │            2 │                11 │        4 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
category = "linked"
bench = "list.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       2862 │    2187 │      1981 │        206 │          910 │                56 │     2962 │         76 │
│  median │       2895 │    2219 │      2000 │        210 │          921 │                61 │     2966 │         77 │
│ maximum │       2911 │    2232 │      2021 │        218 │          924 │                69 │     2967 │         77 │
│   stdev │         23 │      20 │        19 │          5 │            6 │                 5 │        2 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
bench = "tree.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       2342 │     113 │       109 │          3 │           74 │                10 │      101 │          5 │
│  median │       2480 │     131 │       127 │          3 │           92 │                15 │      102 │          5 │
│ maximum │       2617 │     158 │       154 │          4 │          119 │                18 │      104 │          6 │
│   stdev │        100 │      18 │        18 │          0 │           20 │                 3 │        1 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
category = "obj_arrays"
bench = "many_refs.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       1189 │     854 │       797 │         57 │          278 │                11 │      870 │         71 │
│  median │       1200 │     860 │       802 │         57 │          279 │                14 │      871 │         72 │
│ maximum │       1221 │     876 │       818 │         58 │          292 │                22 │      874 │         72 │
│   stdev │         12 │      10 │        10 │          0 │            6 │                 4 │        2 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
bench = "single_ref.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │        469 │     332 │       322 │         10 │          118 │                13 │      869 │         70 │
│  median │        476 │     338 │       328 │         10 │          121 │                16 │      873 │         71 │
│ maximum │        481 │     340 │       330 │         11 │          121 │                19 │      873 │         71 │
│   stdev │          4 │       3 │         3 │          0 │            1 │                 3 │        2 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
category = "strings"
bench = "strings.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      14860 │    1971 │      1758 │        210 │          600 │                70 │      794 │         13 │
│  median │      14900 │    1984 │      1774 │        213 │          621 │                88 │      799 │         13 │
│ maximum │      15195 │    2099 │      1878 │        221 │          623 │                93 │      814 │         14 │
│   stdev │        151 │      55 │        51 │          5 │           10 │                 9 │        8 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘

@d-netto d-netto force-pushed the dcn-page-profile branch 6 times, most recently from 417ee2b to 81bdfa5 Compare December 22, 2023 01:52
@d-netto d-netto force-pushed the dcn-page-profile branch 2 times, most recently from deab1fa to 2deb6cf Compare December 22, 2023 02:18
Copy link
Member

@vchuravy vchuravy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine generally, but I do wonder if we really need to run GC to gather these.
In some sense we could just scan the pages without running GC first.

@d-netto
Copy link
Member Author

d-netto commented Dec 22, 2023

I think this is fine generally, but I do wonder if we really need to run GC to gather these.
In some sense we could just scan the pages without running GC first.

Examining the pages while the mutators are running is likely not thread-safe.

In principle we could use an alternative stop-the-world mechanism to make sure all threads are in a safe state while we're examining their pages, but piggybacking on the GC stop-the-world phase to do so is more convenient from an implementation perspective.

Copy link
Contributor

@kpamnany kpamnany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me and the benchmarks suggest there's no problem, but I'd still prefer to have all this under a build-time setting.

@vtjnash
Copy link
Member

vtjnash commented Dec 22, 2023

I already added a stop-a-thread mechanism recently (piggy backing on the safepoint code as you guessed), so you can do that part in a simple round-robin style if you wanted

@d-netto d-netto merged commit 4975a78 into master Dec 23, 2023
5 of 7 checks passed
@d-netto d-netto deleted the dcn-page-profile branch December 23, 2023 00:11
@vchuravy
Copy link
Member

vchuravy commented Feb 9, 2024

Would have been great to also add docs

@NHDaly
Copy link
Member

NHDaly commented Apr 16, 2024

@d-netto can you please come back and add documentation for this? I agree w/ Valentin: A feature is not done until it is well tested and well documented.

(IMO The best approach is to merge in smaller changes at a time, with documentation and tests coming along with each part, so that even if you run out of time and have to set it down, at least what you've merged is high-quality.)

@d-netto
Copy link
Member Author

d-netto commented Apr 22, 2024

Will add some docs once #53256 gets merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GC Garbage collector profiler
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants