Comparison of branch: backports-release-1.10+RAI #46

Closed
wants to merge 141 commits into from


@quinnj commented Sep 11, 2023

Corresponding RAICode PR that tracks this: https://github.com/RelationalAI/raicode/pull/15320

@nickrobinson251 marked this pull request as draft September 14, 2023 09:19
@nickrobinson251 changed the base branch from v1.10.0-beta1+RAI to backports-release-1.9 September 15, 2023 10:45
@nickrobinson251 changed the title from "Comparison of release branch: v1.10.0-beta1+RAI and target branch: backports-release-1.10+RAI" to "Comparison of branch: backports-release-1.10+RAI" Sep 15, 2023
@nickrobinson251 changed the base branch from backports-release-1.9 to backports-release-1.10 September 15, 2023 16:30
@DelveCI force-pushed the backports-release-1.10+RAI branch 2 times, most recently from 3ff2ddc to 6afc404 September 18, 2023 16:30
@DelveCI force-pushed the backports-release-1.10+RAI branch 4 times, most recently from 2e51295 to b20dc01 October 12, 2023 00:13
@DelveCI force-pushed the backports-release-1.10+RAI branch 2 times, most recently from 719e684 to e1562a3 October 21, 2023 00:14
@kpamnany force-pushed the backports-release-1.10+RAI branch 2 times, most recently from 636d6a5 to 458ca53 October 21, 2023 21:01
@DelveCI force-pushed the backports-release-1.10+RAI branch 6 times, most recently from a5e5ab1 to 71d3477 October 29, 2023 00:16
@DelveCI force-pushed the backports-release-1.10+RAI branch 3 times, most recently from c14f6de to 29221fe November 1, 2023 00:16
li1 and others added 26 commits May 29, 2024 00:17
* Add GC metric `last_incremental_sweep`

* Update gc.c

* Update gc.c
Prevent transparent huge pages (THP) from overallocating physical memory.

Co-authored-by: Adnan Alhomssi <[email protected]>
Pass the types to the allocator functions.

-------

Before this PR, we were missing the types for allocations in two cases:

1. allocations from codegen
2. allocations in `gc_managed_realloc_`

The second one is easy: those are always used for buffers, right?

For the first one: we extend the allocation functions called from
codegen, to take the type as a parameter, and set the tag there.

I kept the old interfaces around, since I think that they cannot be
removed due to supporting legacy code?

------

An example of the generated code:
```llvm
  %ptls_field6 = getelementptr inbounds {}**, {}*** %4, i64 2
  %13 = bitcast {}*** %ptls_field6 to i8**
  %ptls_load78 = load i8*, i8** %13, align 8
  %box = call noalias nonnull dereferenceable(32) {}* @ijl_gc_pool_alloc_typed(i8* %ptls_load78, i32 1184, i32 32, i64 4366152144) #7
```

Fixes JuliaLang#43688.
Fixes JuliaLang#45268.

Co-authored-by: Valentin Churavy <[email protected]>
Sweeping of object pools will either construct a free list through dead objects (if there is at least one live object in a given page) or return the page to the OS (if there are no live objects whatsoever). With this PR, the free lists for each GC page are constructed in parallel.
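The per-page decision described above can be sketched with a toy model. It assumes each page is just a vector of mark bits, one per fixed-size slot; `ToyPage`, `sweep_page` and `sweep_all` are hypothetical names, and the actual runtime code (in C) threads the free list through the dead objects themselves rather than returning slot indices. Because no page shares state with another, the threaded loop needs no locking.

```julia
using Base.Threads

# Hypothetical model of a GC page: one mark bit per fixed-size slot.
struct ToyPage
    marked::Vector{Bool}        # true = a live object occupies this slot
end

# Sweep one page: return a free list of dead slot indices, or `nothing`
# when the page has no live objects and can be returned to the OS.
function sweep_page(p::ToyPage)::Union{Vector{Int},Nothing}
    any(p.marked) || return nothing
    # The "free list" here is simply the indices of the dead slots.
    return [i for (i, live) in enumerate(p.marked) if !live]
end

# Pages are independent, so each one can be swept on a different thread.
function sweep_all(pages::Vector{ToyPage})
    results = Vector{Union{Vector{Int},Nothing}}(undef, length(pages))
    @threads for k in eachindex(pages)
        results[k] = sweep_page(pages[k])
    end
    return results
end
```

For example, `sweep_all([ToyPage([true, false, false, true]), ToyPage([false, false])])` yields the free list `[2, 3]` for the first page and `nothing` (release to the OS) for the second.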
GC threads don't have tasks associated with them.
Heartbeat support is controlled by a build-time option. Start a separate
thread which simply sleeps. When heartbeats are enabled, this
thread wakes up at the specified interval to verify that user code
is heartbeating as requested and, if it is not, prints task backtraces.

Also fixes the call to `maxthreadid` in `generate_precompile.jl`.
When enabling heartbeats, the user must specify:
- heartbeat_s: jl_heartbeat() must be called at least once every heartbeat_s seconds; if it
  isn't, a one-line heartbeat loss report is printed
- show_tasks_after_n: after this many heartbeat_s intervals have passed without jl_heartbeat()
  being called, print task backtraces and stop all reporting
- reset_after_n: after this many heartbeat_s intervals have passed with jl_heartbeat()
  being called, print a heartbeats-recovered message and reset reporting
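The interplay of the three parameters above can be modeled as a small state machine evaluated once per `heartbeat_s` interval. This is a sketch under assumptions, not the runtime's monitor thread: `heartbeat_events` is a hypothetical name, its input is one `Bool` per interval (whether `jl_heartbeat()` was called during it), and details such as emitting the backtrace event instead of a loss line on the triggering interval are guesses.

```julia
# Hypothetical model: one entry of `beats` per heartbeat_s interval,
# `true` if jl_heartbeat() was called during that interval.
function heartbeat_events(beats::AbstractVector{Bool};
                          show_tasks_after_n::Int, reset_after_n::Int)
    events = Symbol[]
    missed = 0          # consecutive intervals without a heartbeat
    recovered = 0       # consecutive intervals with a heartbeat since a loss
    reporting = true    # becomes false once task backtraces have been shown
    in_loss = false     # has a loss been observed since the last reset?
    for beat in beats
        if beat
            missed = 0
            if in_loss
                recovered += 1
                if recovered >= reset_after_n
                    push!(events, :recovered)   # "heartbeats recovered" message
                    in_loss = false; reporting = true; recovered = 0
                end
            end
        else
            missed += 1
            recovered = 0
            in_loss = true
            if reporting
                if missed >= show_tasks_after_n
                    push!(events, :show_tasks)  # print backtraces, then go quiet
                    reporting = false
                else
                    push!(events, :loss)        # one-line loss report
                end
            end
        end
    end
    return events
end
```

With `show_tasks_after_n = 3` and `reset_after_n = 2`, the pattern `[true, false, false, false, false, true, true]` produces `[:loss, :loss, :show_tasks, :recovered]`; the fourth missed interval is silent because reporting has stopped.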
`pool_live_bytes` was previously lazily updated during the GC, meaning
it was only accurate right after a GC.

Make this metric accurate if gathered after a GC has happened.
Otherwise we may just observe `gc_n_threads = 0` (`jl_gc_collect` sets
it to 0 in the very end of its body) and this function becomes a no-op.
…uliaLang#52164)

One of the limitations is that it's only accurate right after the GC.
Still might be helpful for observability purposes.
We're suffering from heavy fragmentation in some of our workloads.

Add a build-time option to enable 4k pages (instead of 16k) in the GC,
since that improves memory utilization considerably for us.

The drawback is that this may increase the number of `madvise` system calls
in the sweeping phase by a factor of 4, but concurrent page sweeping
should offset some of that cost.
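A rough way to see why smaller pages help with fragmentation: assume, consistent with the sweeping description earlier in this log, that a page holding at least one live object stays resident while a fully empty page can be returned to the OS. The hypothetical helpers below count resident bytes for a set of live-object offsets (ignoring objects that straddle page boundaries) and the number of per-page release calls needed to free a region.

```julia
# Resident bytes if every page containing at least one live object is kept.
function resident_bytes(live_offsets::AbstractVector{<:Integer}, page_size::Integer)
    pages = Set(off ÷ page_size for off in live_offsets)   # page index of each live object
    return length(pages) * page_size
end

# Number of per-page release calls (e.g. madvise) needed to free `nbytes`.
release_calls(nbytes::Integer, page_size::Integer) = cld(nbytes, page_size)

# Four small live objects scattered across a 64 KiB region.
offsets = [0, 20_000, 40_000, 60_000]
resident_bytes(offsets, 16 * 1024)   # 65536: all four 16k pages stay pinned
resident_bytes(offsets, 4 * 1024)    # 16384: 12 of the 16 4k pages can be released
release_calls(64 * 1024, 16 * 1024), release_calls(64 * 1024, 4 * 1024)  # (4, 16)
```

The factor-of-4 increase in release calls is the `madvise` cost mentioned above; in this toy model the trade pays off when live objects are sparse, as in fragmented workloads.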
…uliaLang#52943)

**EDIT**: fixes JuliaLang#52937 by
decreasing the contention on the page lists and only waking GC threads
up if we have a sufficiently large number of pages.
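The rule of only waking GC threads when there are enough pages can be illustrated with a simple threshold function. The PR text does not state the actual criterion, so `MIN_PAGES_PER_HELPER`, its value, and `helpers_to_wake` are all hypothetical.

```julia
# Hypothetical threshold: below this many pages per helper, waking a thread
# is assumed to cost more (wake-up latency, page-list contention) than it saves.
const MIN_PAGES_PER_HELPER = 64

# How many helper GC threads to wake for a sweep over `n_pages` pages.
# Small heaps get 0 helpers, so the sweep stays on the calling thread.
function helpers_to_wake(n_pages::Integer, max_helpers::Integer)
    return min(max_helpers, n_pages ÷ MIN_PAGES_PER_HELPER)
end
```

Under these assumptions `helpers_to_wake(10, 7)` returns 0 while `helpers_to_wake(100_000, 7)` returns 7.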

Seems to address the regression from the MWE of
JuliaLang#52937:

- master:
```
../julia-master/julia --project=. run_benchmarks.jl serial obj_arrays issue-52937 -n5 --gcthreads=1
bench = "issue-52937.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      24841 │     818 │        78 │        740 │           44 │             10088 │       96 │          3 │
│  median │      24881 │     834 │        83 │        751 │           45 │             10738 │       97 │          3 │
│ maximum │      25002 │     891 │        87 │        803 │           48 │             11074 │      112 │          4 │
│   stdev │         78 │      29 │         4 │         26 │            1 │               393 │        7 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
 ../julia-master/julia --project=. run_benchmarks.jl serial obj_arrays issue-52937 -n5 --gcthreads=8
bench = "issue-52937.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      29113 │    5200 │        68 │       5130 │           12 │              9724 │       95 │         18 │
│  median │      29354 │    5274 │        69 │       5204 │           12 │             10456 │       96 │         18 │
│ maximum │      29472 │    5333 │        70 │       5264 │           14 │             11913 │       97 │         18 │
│   stdev │        138 │      54 │         1 │         55 │            1 │               937 │        1 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
```

- PR:
```
../julia-master/julia --project=. run_benchmarks.jl serial obj_arrays issue-52937 -n5 --gcthreads=1
bench = "issue-52937.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      24475 │     761 │        77 │        681 │           40 │              9499 │       94 │          3 │
│  median │      24845 │     775 │        80 │        698 │           43 │             10793 │       97 │          3 │
│ maximum │      25128 │     811 │        85 │        726 │           47 │             12820 │      113 │          3 │
│   stdev │        240 │      22 │         3 │         21 │            3 │              1236 │        8 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
../julia-master/julia --project=. run_benchmarks.jl serial obj_arrays issue-52937 -n5 --gcthreads=8
bench = "issue-52937.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      24709 │     679 │        70 │        609 │           11 │              9981 │       95 │          3 │
│  median │      24869 │     702 │        70 │        631 │           12 │             10705 │       96 │          3 │
│ maximum │      24911 │     708 │        72 │        638 │           13 │             10820 │       98 │          3 │
│   stdev │         79 │      12 │         1 │         12 │            1 │               401 │        1 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
```

Also, performance on `objarray.jl` (an example of a benchmark in which
sweeping parallelizes well with the current implementation) seems fine:

- master:
```
../julia-master/julia --project=. run_benchmarks.jl multithreaded bigarrays -n5 --gcthreads=1      
bench = "objarray.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      19301 │   10792 │      7485 │       3307 │         1651 │               196 │     4519 │         56 │
│  median │      21415 │   12646 │      9094 │       3551 │         1985 │               241 │     6576 │         59 │
│ maximum │      21873 │   13118 │      9353 │       3765 │         2781 │               330 │     8793 │         60 │
│   stdev │       1009 │     932 │       757 │        190 │          449 │                50 │     1537 │          2 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
../julia-master/julia --project=. run_benchmarks.jl multithreaded bigarrays -n5 --gcthreads=8
bench = "objarray.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      13135 │    4377 │      3350 │       1007 │          491 │               231 │     6062 │         33 │
│  median │      13164 │    4540 │      3370 │       1177 │          669 │               256 │     6383 │         35 │
│ maximum │      13525 │    4859 │      3675 │       1184 │          748 │               320 │     7528 │         36 │
│   stdev │        183 │     189 │       146 │         77 │          129 │                42 │      584 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
```

- PR:
```
../julia-master/julia --project=. run_benchmarks.jl multithreaded bigarrays -n5 --gcthreads=1    
bench = "objarray.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      19642 │   10931 │      7566 │       3365 │         1653 │               204 │     5688 │         56 │
│  median │      21441 │   12717 │      8948 │       3770 │         1796 │               217 │     6972 │         59 │
│ maximum │      23494 │   14643 │     10576 │       4067 │         2513 │               248 │     8229 │         62 │
│   stdev │       1408 │    1339 │      1079 │        267 │          393 │                19 │      965 │          2 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
../julia-master/julia --project=. run_benchmarks.jl multithreaded bigarrays -n5 --gcthreads=8
bench = "objarray.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      13365 │    4544 │      3389 │       1104 │          516 │               255 │     6349 │         34 │
│  median │      13445 │    4624 │      3404 │       1233 │          578 │               275 │     6385 │         34 │
│ maximum │      14413 │    5278 │      3837 │       1441 │          753 │               300 │     7547 │         37 │
│   stdev │        442 │     303 │       194 │        121 │           89 │                18 │      522 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
```
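The tables can be condensed into one number per configuration: the ratio of median `sweep time` with `--gcthreads=1` to median `sweep time` with `--gcthreads=8`. The values below are copied from the tables above; the snippet only does arithmetic on them.

```julia
# Median sweep times (ms) copied from the benchmark tables above.
sweep = Dict(
    ("issue-52937", "master", 1) => 751,  ("issue-52937", "master", 8) => 5204,
    ("issue-52937", "PR", 1)     => 698,  ("issue-52937", "PR", 8)     => 631,
    ("objarray", "master", 1)    => 3551, ("objarray", "master", 8)    => 1177,
    ("objarray", "PR", 1)        => 3770, ("objarray", "PR", 8)        => 1233,
)

# Speedup from 1 to 8 GC threads (> 1 means 8 threads sweep faster).
speedup(bench, branch) = sweep[(bench, branch, 1)] / sweep[(bench, branch, 8)]

for bench in ("issue-52937", "objarray"), branch in ("master", "PR")
    println(rpad(bench, 12), rpad(branch, 7), round(speedup(bench, branch); digits = 2))
end
```

This gives roughly 0.14 (master) versus 1.11 (PR) for `issue-52937`, meaning that on master eight GC threads swept about 7x slower than one, while `objarray` stays at about 3.0 on both branches.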
This PR continues the work from the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here is the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
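The reordering problem noted in the commit history ("the edges have to be reordered according to the node order") can be sketched as a stable counting sort. This assumes the Chrome DevTools `.heapsnapshot` layout, in which each node carries an `edge_count` and all of a node's edges appear contiguously, in node order, while streamed edges arrive in arbitrary order. `ToyEdge` and `group_edges_by_node` are hypothetical and much simpler than the real `.nodes`/`.edges` binary records.

```julia
# Hypothetical edge record as streamed to disk.
struct ToyEdge
    from_node::Int   # 1-based index into the node list
    to_node::Int
    name::String
end

# Reorder edges so that node 1's edges come first, then node 2's, and so on,
# preserving arrival order within a node (stable counting sort by `from_node`).
# Also returns the per-node edge counts the snapshot format needs.
function group_edges_by_node(edges::Vector{ToyEdge}, n_nodes::Int)
    counts = zeros(Int, n_nodes)
    for e in edges
        counts[e.from_node] += 1
    end
    # starts[i] is the first output slot for node i's edges.
    starts = cumsum([1; counts[1:end-1]])
    out = Vector{ToyEdge}(undef, length(edges))
    slot = copy(starts)
    for e in edges
        out[slot[e.from_node]] = e
        slot[e.from_node] += 1
    end
    return out, counts
end
```

For edges arriving as `a` (2 to 1), `b` (1 to 3), `c` (2 to 3), `d` (3 to 1) with three nodes, the result is the order `b, a, c, d` with counts `[1, 2, 1]`.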
… (#134)

Fixes JuliaLang#52262.

`checked_pow(x, y)` computes `^(x, y)` but throws an `OverflowError` on overflow.

Example:
```julia
julia> 2^62
4611686018427387904

julia> 2^63
-9223372036854775808

julia> using Base.Checked: checked_pow

julia> checked_pow(2, 63)
ERROR: OverflowError: 2147483648 * 4294967296 overflowed for type Int64
```

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Jameson Nash <[email protected]>
Co-authored-by: Shuhei Kadowaki <[email protected]>
Co-authored-by: Tomáš Drvoštěp <[email protected]>
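One common way to implement a checked power is exponentiation by squaring with every multiplication done through `Base.Checked.checked_mul`, so the first product that leaves the type's range throws. The sketch below is a hypothetical `toy_checked_pow`, not necessarily how Base's `checked_pow` is written, but for `2^63` it fails on the same product, `2147483648 * 4294967296`, shown in the example above.

```julia
using Base.Checked: checked_mul

# Hypothetical exponentiation by squaring with overflow checks on every product.
function toy_checked_pow(x::T, p::Integer) where {T<:Integer}
    p < 0 && throw(DomainError(p, "negative exponent"))
    result = one(T)
    base = x
    while p > 0
        if isodd(p)
            result = checked_mul(result, base)   # throws OverflowError on overflow
        end
        p >>= 1
        # Only square when another bit remains, to avoid a spurious overflow.
        p > 0 && (base = checked_mul(base, base))
    end
    return result
end
```

`toy_checked_pow(2, 62)` returns `4611686018427387904`, while `toy_checked_pow(2, 63)` throws an `OverflowError`.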
* --safe-crash-log-file flag

* Update init.c

* json escape jl_safe_printf when safe crash log file

* add timestamp to json logs

* port it to aarch64 darwin

* fix minor warning

* missing double quote

* Suggestion from code review: make sig_stack_size a const in signals-unix.c

Co-authored-by: Kiran Pamnany <[email protected]>

* Suggestion from code review: make sig_stack_size a const in signals-win.c

Co-authored-by: Kiran Pamnany <[email protected]>

* more suggestions from Kiran's review

* more suggestions from review

---------

Co-authored-by: Malte Sandstede <[email protected]>
Co-authored-by: Adnan Alhomssi <[email protected]>
Co-authored-by: Kiran Pamnany <[email protected]>
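For the JSON-escaped crash log, the core rule (from RFC 8259) is that the quotation mark, the backslash, and control characters below U+0020 must be escaped inside a JSON string. The sketch below shows only that rule; the runtime does this in C in its signal-safe printing path, and `json_escape`, `crash_log_line` and the record layout (including the `time` field name) are hypothetical.

```julia
# Escape a string for embedding in a JSON string literal (RFC 8259).
function json_escape(s::AbstractString)
    io = IOBuffer()
    for c in s
        if c == '"'
            print(io, "\\\"")
        elseif c == '\\'
            print(io, "\\\\")
        elseif c == '\n'
            print(io, "\\n")
        elseif c == '\t'
            print(io, "\\t")
        elseif c < ' '
            # Other control characters use the \uXXXX form.
            print(io, "\\u", string(UInt32(c); base = 16, pad = 4))
        else
            print(io, c)
        end
    end
    return String(take!(io))
end

# Hypothetical one-line record with a timestamp (seconds since the epoch).
crash_log_line(msg; ts = time()) =
    string("{\"time\": ", ts, ", \"message\": \"", json_escape(msg), "\"}")
```

Printed, `crash_log_line("line1\nline2"; ts = 1.5)` gives `{"time": 1.5, "message": "line1\nline2"}` on a single line, which keeps each record independently parseable.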
@DelveCI force-pushed the backports-release-1.10+RAI branch from aa81c3f to 7d490ea May 29, 2024 00:17

This PR is stale because it has been open 30 days with no activity. Comment or remove stale label, or this PR will be closed in 5 days.

@github-actions bot added the stale label ("This pull request is inactive") Jun 29, 2024
@github-actions bot closed this Jul 4, 2024