Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change to streaming out the heap snapshot data #52854

Merged

Conversation

JianFangAtRai
Copy link
Contributor

This PR is to continue the work on the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot data (#51518 )

Here are the commit history:

* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic

* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading,
or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before
downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the
node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an
optimization.... It seems like we should probably prefer to inline the
Edges right into the vector, rather than having to do another random
lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
@inkydragon inkydragon added stdlib Julia's standard library profiler labels Jan 11, 2024
@inkydragon
Copy link
Member

Since the current changes do not involve documentation, CI errors are irrelevant.
You may want to rebase based on the master branch.

┌ Error: no doc found for reference '[`Profile.HeapSnapshot.assemble_snapshot(filepath)`](@ref)' in src\stdlib\Profile.md.
└ @ Documenter C:\workdir\doc\deps\packages\Documenter\1HwWe\src\utilities\utilities.jl:44
[ Info: CheckDocument: running document checks.
[ Info: Populate: populating indices.
ERROR: LoadError: `makedocs` encountered an error [:cross_references] -- terminating build before rendering.

@giordano giordano changed the title Change to streaming out the heap snapshot data (#1) Change to streaming out the heap snapshot data Jan 17, 2024
@JianFangAtRai JianFangAtRai marked this pull request as ready for review January 17, 2024 16:35
@NHDaly NHDaly requested a review from gbaraldi January 24, 2024 18:57
@gbaraldi
Copy link
Member

I'm not super familiar with this code, I've only modified it a bit once, but the changes seems pretty reasonable. Did you folks do some testing to see if chrome is still able to process the file that gets regenerated from the json?

@JianFangAtRai
Copy link
Contributor Author

I'm not super familiar with this code, I've only modified it a bit once, but the changes seems pretty reasonable. Did you folks do some testing to see if chrome is still able to process the file that gets regenerated from the json?

Yes, we tested the snapshot with different sizes from hundreds of MBs to couple GBs, chrome was still able to process the assembled files.

src/gc-heap-snapshot.cpp Outdated Show resolved Hide resolved
stdlib/Profile/src/heapsnapshot_reassemble.jl Outdated Show resolved Hide resolved
stdlib/Profile/src/heapsnapshot_reassemble.jl Outdated Show resolved Hide resolved
@d-netto d-netto merged commit c16472b into JuliaLang:master Feb 1, 2024
7 checks passed
Drvi added a commit to RelationalAI/julia that referenced this pull request Feb 1, 2024
This PR is to continue the work on the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here are the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request Feb 6, 2024
This PR is to continue the work on the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here are the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request Feb 7, 2024
This PR is to continue the work on the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here are the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request Feb 14, 2024
This PR is to continue the work on the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here are the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request Feb 21, 2024
This PR is to continue the work on the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here are the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request Feb 22, 2024
This PR is to continue the work on the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here are the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
Drvi added a commit to RelationalAI/julia that referenced this pull request Feb 28, 2024
This PR is to continue the work on the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here are the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request Mar 1, 2024
This PR is to continue the work on the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here are the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request Mar 13, 2024
This PR is to continue the work on the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here are the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
Drvi added a commit to RelationalAI/julia that referenced this pull request Apr 3, 2024
This PR is to continue the work on the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here are the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
d-netto pushed a commit to RelationalAI/julia that referenced this pull request Apr 16, 2024
This PR is to continue the work on the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here are the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request Apr 23, 2024
This PR is to continue the work on the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here are the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request Apr 24, 2024
This PR is to continue the work on the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here are the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request Apr 30, 2024
This PR is to continue the work on the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here are the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request Apr 30, 2024
This PR is to continue the work on the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here are the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request May 2, 2024
This PR is to continue the work on the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here are the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
@IanButterworth IanButterworth added the backport 1.10 Change should be backported to the 1.10 release label May 8, 2024
@IanButterworth IanButterworth mentioned this pull request May 8, 2024
23 tasks
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request May 9, 2024
This PR is to continue the work on the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here are the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request May 19, 2024
This PR is to continue the work on the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here are the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request May 26, 2024
This PR is to continue the work on the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here are the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request May 28, 2024
This PR is to continue the work on the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here are the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request May 29, 2024
This PR is to continue the work on the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here are the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
KristofferC added a commit that referenced this pull request May 30, 2024
Backported PRs:
- [x] #54010 <!-- Overload `Base.literal_pow` for `AbstractQ` -->
- [x] #54143 <!-- Fix `make install` from tarballs -->
- [x] #54151 <!-- LinearAlgebra: Correct zero element in
`_generic_matvecmul!` for block adj/trans -->
- [x] #54233 <!-- set MAX_OS_WRITE on unix -->
- [x] #54251 <!-- fix typo in gc_mark_memory8 when chunking a large
array -->
- [x] #54363 <!-- typeintersect: fix another stack overflow caused by
circular constraints -->
- [x] #54497 <!-- Make TestLogger thread-safe (introduce a lock) -->
- [x] #53796 <!-- Add a missing doc -->
- [x] #54465 <!-- typeintersect: conservative typevar subtitution during
`finish_unionall` -->
- [x] #54514 <!-- typeintersect: followup cleanup for the nothrow path
of type instantiation -->

Need manual backport:
- [ ] #52505 <!-- fix alignment of emit_unbox_store copy -->
- [ ] #53373 <!-- fix sysimage-native-code=no option with pkgimages -->
- [ ] #53815 <!-- create phantom task for GC threads -->
- [ ] #53984 <!-- Profile: fix heap snapshot is valid char check -->
- [ ] #54276 <!-- Fix solve for complex `Hermitian` with non-vanishing
imaginary part on diagonal -->

Contains multiple commits, manual intervention needed:
- [ ] #52854 <!-- Change to streaming out the heap snapshot data -->
- [ ] #53218 <!-- Fix interpreter_exec.jl test -->
- [ ] #53833 <!-- Profile: make heap snapshots viewable in vscode viewer
-->
- [ ] #54303 <!-- LinearAlgebra: improve type-inference in
Symmetric/Hermitian matmul -->
- [ ] #52694 <!-- Reinstate similar for AbstractQ for backward
compatibility -->

Non-merged PRs with backport label:
- [ ] #54471 <!-- Actually setup jit targets when compiling
packageimages instead of targeting only one -->
- [ ] #53452 <!-- RFC: allow Tuple{Union{}}, returning Union{} -->
- [ ] #51479 <!-- prevent code loading from lookin in the versioned
environment when building Julia -->
Drvi added a commit to RelationalAI/julia that referenced this pull request Jun 7, 2024
This PR is to continue the work on the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here are the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
@KristofferC KristofferC mentioned this pull request Jun 19, 2024
46 tasks
KristofferC added a commit that referenced this pull request Aug 13, 2024
Backported PRs:
- [x] #51351 <!-- Remove boxing in pinv -->
- [x] #52678 <!-- Profile: Improve module docstring -->
- [x] #54201 <!-- Fix generic triangular solves with empty matrices -->
- [x] #54605 <!-- Allow libquadmath to also fail as it is not available
on all systems -->
- [x] #54634 <!-- Fix trampoline assembly for build on clang 18 on apple
silicon -->
- [x] #54635 <!-- Aggressive constprop in trevc! to stabilize triangular
eigvec -->
- [x] #54645 <!-- ensure we set the right value to gc_first_tid -->
- [x] #54671 <!-- Add boundscheck in bindingkey_eq to avoid OOB access
due to data race -->
- [x] #54672 <!-- make: Fix `sed` command for LLVM libraries with no
symbol versioning -->
- [x] #54704 <!-- LazyString in reinterpretarray error messages -->
- [x] #54713 <!-- make: use `readelf` for LLVM symbol version detection
-->
- [x] #54781 <!-- [LinearAlgebra] Improve resilience to unknown
libblastrampoline flags -->
- [x] #54837 <!-- Do not add type tag size to the `alloc_typed` lowering
for GC allocations -->
- [x] #54815 <!-- add sticky task warning to `@task` and `schedule` -->
- [x] #55141 <!-- Update the aarch64 devdocs to reflect the current
state of its support -->
- [x] #55178 <!-- Compat for `Base.@nospecializeinfer` -->
- [x] #55197 <!-- compat notice for a[begin] indexing -->
- [x] #55209 <!-- correction to compat notice for a[begin] -->
- [x] #55203 <!-- document mutable struct const fields -->
- [x] #54769 <!-- add missing compat entry to edit -->
- [x] #54791 <!-- Bump libblastrampoline to v5.10.1 -->
- [x] #55070 <!-- LinearAlgebra: LazyString in error messages for
Diagonal/Bidiagonal -->
- [x] #54624 <!-- more precise aliasing checks for SubArray -->
- [x] #54690 <!-- Fix assertion/crash when optimizing function with dead
basic block -->
- [x] #55084 <!-- Use triple quotes in TOML.print when string contains
newline -->


Need manual backport:
- [ ] #52505 <!-- fix alignment of emit_unbox_store copy -->
- [ ] #53373 <!-- fix sysimage-native-code=no option with pkgimages -->
- [ ] #53984 <!-- Profile: fix heap snapshot is valid char check -->
- [ ] #54276 <!-- Fix solve for complex `Hermitian` with non-vanishing
imaginary part on diagonal -->
- [ ] #54669 <!-- Improve error message in inplace transpose -->
- [ ] #54871 <!-- Make warn missed transformations pass optional -->

Contains multiple commits, manual intervention needed:
- [ ] #52854 <!-- Change to streaming out the heap snapshot data -->
- [ ] #53218 <!-- Fix interpreter_exec.jl test -->
- [ ] #53833 <!-- Profile: make heap snapshots viewable in vscode viewer
-->
- [ ] #54303 <!-- LinearAlgebra: improve type-inference in
Symmetric/Hermitian matmul -->
- [ ] #52694 <!-- Reinstate similar for AbstractQ for backward
compatibility -->
- [ ] #54737 <!-- LazyString in interpolated error messages involving
types -->
- [ ] #54738 <!-- serialization: fix relocatability bug -->
- [ ] #55052 <!-- Fix `(l/r)mul!` with `Diagonal`/`Bidiagonal` -->

Non-merged PRs with backport label:
- [ ] #55220 <!-- `isfile_casesensitive` fixes on Windows -->
- [ ] #55169 <!-- `propertynames` for SVD respects private argument -->
- [ ] #55013 <!-- [docs] change docstring to match code -->
- [ ] #51479 <!-- prevent code loading from lookin in the versioned
environment when building Julia -->
- [ ] #50813 <!-- More doctests for Sockets and capitalization fix -->
- [ ] #50157 <!-- improve docs for `@inbounds` and
`Base.@propagate_inbounds` -->
- [ ] #41244 <!-- Fix shell `cd` error when working dir has been deleted
-->
@KristofferC KristofferC mentioned this pull request Sep 12, 2024
63 tasks
@KristofferC KristofferC mentioned this pull request Oct 29, 2024
47 tasks
@KristofferC KristofferC removed the backport 1.10 Change should be backported to the 1.10 release label Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
profiler stdlib Julia's standard library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants