Change precompilation format to be more amenable to copying #32705
Conversation
This is designed to eliminate confusion in the serialization by ensuring that offsets are relative to a "private" roots table. This may allow more extensive caching of inference results, because it should eliminate root-indexing conflicts between different instances of the same method.
Having learned a bunch while doing this, I now have to ask: would a better approach be to go back to a single `roots` table, but make it an ordered collection that supports indexing? Ordering is distinct from indexing, and our canonical Julia OrderedSet doesn't support indexing. But that's because it has decided to support deletion, and indexes (if we want them to always increase by 1) are not stable under deletion. This is a case where we never want to delete anything, so indexing seems justifiable. There would be an extra layer of indirection for each lookup, but you'd only hit that when serializing (not deserializing). In any case, it's my understanding that this is not the performance bottleneck for loading & compiling code.

However, maybe this isn't any better: is the fundamental problem that the indexing is still order-dependent? That is, if we first execute one specialization and then another, the indices that get assigned depend on that order.
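For concreteness, here is a minimal sketch of the kind of insertion-only, indexable set described above. The type and function names (`IndexedSet`, `getindex!`) are hypothetical, not an existing Julia API; it just shows why stable indices are easy once deletion is off the table.

```julia
# Hypothetical sketch (not an existing Julia type): an insertion-only set whose
# elements get stable, monotonically increasing indices. With no deletion,
# an item's index never changes once assigned.
struct IndexedSet{T}
    items::Vector{T}      # index -> item, in insertion order
    index::Dict{T,Int}    # item -> index
end
IndexedSet{T}() where {T} = IndexedSet{T}(T[], Dict{T,Int}())

# Return the index of `x`, inserting it first if it isn't present yet.
function getindex!(s::IndexedSet, x)
    get!(s.index, x) do
        push!(s.items, x)
        length(s.items)
    end
end

Base.getindex(s::IndexedSet, i::Integer) = s.items[i]   # cheap lookup for deserialization

# Usage: indices only grow and never shift.
s = IndexedSet{Symbol}()
@assert getindex!(s, :a) == 1
@assert getindex!(s, :b) == 2
@assert getindex!(s, :a) == 1    # re-insertion keeps the old index
@assert s[2] === :b
```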
cross-ref #14556
I added a couple of tags. What's holding this up is the question: is this even going in the right direction, and if not, would the strategy in #32705 (comment) be more viable? (I tend to suspect the answers to these two questions are "no" and "yes," respectively, but I need guidance, since the effort is not small and my time & expertise on this problem are both limited.) I am willing to keep pushing on this, but from a pragmatic standpoint I will need a collaborator, or at least a reviewer making suggestions about next steps, who knows the compile chain better than I do.
Bump @JeffBezanson. At JuliaCon you raised some concerns about the approach Jameson & I took here; might #32705 (comment) be a better strategy?
Hi, most of the technical details here are still over my head, but I just wanted to ask about the general architecture of this solution. Have you considered implementing this with a write-ahead log? That is, appending every new signature to a list (i.e. appending to the end of the file), so it can be replayed on the next load. I imagine it would be reasonably efficient, and much easier to implement than a dynamically updating file. A second, unifying pass can occur at any desired interval, so its performance won't matter. It's optional, but it would result in even faster module loads. This approach might even help with multiple processes of the same module, in some distant future. (You'll have to forgive me, I'm not even sure how Julia does multiprocessing today.)
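For illustration only, here is a rough sketch of the append-and-replay idea under the assumptions above. The entry format, file layout, and function names (`wal_append`, `wal_replay`, `wal_compact`) are made up for the example, not Julia's actual cache format.

```julia
using Serialization

# Append one new entry (e.g. a freshly inferred signature) to the end of the log.
function wal_append(path::AbstractString, entry)
    open(path, append=true) do io
        serialize(io, entry)
    end
end

# Replay the whole log on the next load, in the order entries were written.
function wal_replay(path::AbstractString)
    entries = Any[]
    isfile(path) || return entries
    open(path, "r") do io
        while !eof(io)
            push!(entries, deserialize(io))
        end
    end
    return entries
end

# Optional, occasional compaction pass: rewrite the log without duplicates.
function wal_compact(path::AbstractString)
    entries = unique(wal_replay(path))
    tmp = path * ".tmp"
    open(tmp, "w") do io
        foreach(e -> serialize(io, e), entries)
    end
    mv(tmp, path; force=true)
end
```

The appeal of this layout is that writes are cheap and never invalidate earlier entries; all the cost of de-duplication is deferred to the optional compaction step.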
A lot of it is unfamiliar to me too. It's less a question of how you write the file, and more about what you put into it. How do you represent code in binary format? Pointers are obviously off the table, so you have to come up with a serialization.

Having put this together, I'm not sure #32705 (comment) is a panacea. The original strategy suggested by Jameson is more workable, although the "merge" step will be harder, and the sheer amount of duplicated information is troubling. Now I'm wondering about saving the roots table for each method in a .ji file, comparing it against the master copy, and renumbering things as you deserialize the file. #32705 (comment) might be useful to prevent problems from rehashing, though.
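To make the "compare and renumber" idea concrete, here is a hypothetical sketch (invented names, not the actual loader): the .ji file carries the roots table the method had when it was serialized, and on load each file-local index is translated into the index of the same object in the live table, appending any roots the live table doesn't yet have.

```julia
# Build a translation table from file-local root indices to live root indices.
function build_root_translation(file_roots::Vector{Any}, live_roots::Vector{Any})
    translation = Vector{Int}(undef, length(file_roots))
    for (i, r) in enumerate(file_roots)
        j = findfirst(isequal(r), live_roots)
        if j === nothing
            push!(live_roots, r)     # a root the live table lacks: append it,
            j = length(live_roots)   # so existing indices stay untouched
        end
        translation[i] = j
    end
    return translation
end

# While deserializing a CodeInfo, each stored root reference `i` would be
# rewritten to `translation[i]` so it points at the right slot in the live table.
remap_root(i::Integer, translation::Vector{Int}) = translation[i]
```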
Why is that? If you append only, the old definitions don't lose their meaning. And it's possible to "merge" new definitions by translating their indices (to avoid duplicates). I still don't understand where the root of the problem is. When Julia compiles a new method, it adds the definition somewhere. If it's added to roots, we could just rewrite the whole file. But if not, why can't we take that definition and serialize/deserialize it on its own?
Most things are possible; it's just a question of doing it 😄. The strategies you're proposing about merging are the same ones I'm proposing. Whip out that editor and go for it!
Do I understand correctly that this PR hopes to make it possible to cache the native code compiled in a session, so that it can subsequently be re-used in a new session, thereby avoiding the JIT lag? That would be huge. Kinda like an on-the-fly PackageCompiler, but without modifying the sysimage?
It's not specifically aimed at native code; it's more about being able to pick up some stragglers in our current precompilation. However, with the invalidation work having recently achieved quite a lot of success, the biggest obstacle to working on native-code caching is basically gone.
Friendly bump; now, with so many other improvements to latency, I'm seeing lots of examples where inference time is a major part of getting real work done. Here's an example of profiling ProfileView on its first run (profile screenshot omitted): the first block (gray-dominated) is inference, the second (yellow-dominated) is LLVM, the third is mixed (including some computation), and the fourth is borked pointers. Despite my best efforts (I've added a ton of precompile statements), many of them don't "take," presumably for the reason described in the top post. It sure would be some nice icing on the latency cake if we could make progress on this. It's more than I'm prepared to handle solo, but I'm happy to help as I can.
For some reason (perhaps #32705?) most or all of these fail if they are emitted as precompile statements, so this moves them into Base itself. This drops the time for a revision down to 1.85s.
This is probably in the right direction, but it needs more work and is heavily conflicted now. Maybe next hackathon we'll be able to make more progress.
What about changing the format of the roots table to something like …
Bump
This is a hackathon project by @vtjnash (🧠) and myself (💪). This is partway towards the end goal (with the harder thinking still to go).
As I understand it, the problem is that inferred CodeInfo objects are serialized by putting all objects into the method's `roots` table, and then each specialization gets stored by referring to these objects via their index. There is one `roots` table for all the specializations, and this has an important consequence: when you compile for new input types, what happens frequently is that you need new objects added to `roots`, and this changes the numbering of items in `roots`. Consequently the lookup is state-dependent ("which specializations do you happen to have compiled?"), and thus we can't merge this info into other things. To avoid such problems, currently we refuse to save inference info for methods for which additional specializations would change the roots table. Since that's a large fraction of all methods, in practice we save relatively little. This PR is designed to eliminate such confusion by ensuring that each CodeInstance gets its own private roots table.

In the current state, we may have successfully added the new field and gotten Julia through bootstrap. If this were relatively neutral in terms of system-image size and build time, it could presumably be merged at any point despite not yet doing anything useful. However, the `.data` field of the `.so` is 125MB vs its expected 118MB, suggesting it should wait for more improvement.
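To illustrate the state-dependence problem and how private tables sidestep it, here is a toy sketch. The helper names (`add_root!`, `serialize_instance`) are invented for the example; this is not the actual serializer.

```julia
# Toy model: a "specialization" records its constants as indices into a roots table.
function add_root!(roots::Vector{Any}, x)
    i = findfirst(isequal(x), roots)
    i === nothing || return i
    push!(roots, x)
    return length(roots)
end

# Shared table: the indices a specialization gets depend on what was compiled before it.
shared = Any[]
g_refs = [add_root!(shared, :gconst)]                  # compiled first: :gconst -> 1
f_refs = [add_root!(shared, :fconst)]                  # compiled second: :fconst -> 2

shared2 = Any[]
f_refs_other_session = [add_root!(shared2, :fconst)]   # compiled first here: :fconst -> 1
@assert f_refs != f_refs_other_session                 # same code, different serialized indices

# Private per-CodeInstance table: indices depend only on the instance's own roots,
# so the serialized form is identical no matter what else was compiled earlier.
function serialize_instance(consts)
    roots = Any[]
    refs = [add_root!(roots, c) for c in consts]
    return refs, roots
end
@assert serialize_instance([:fconst]) == serialize_instance([:fconst])
```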