Change precompilation format to be more amenable to copying #32705
Conversation
This is designed to eliminate confusion in the serialization by ensuring that offsets are relative to a "private" roots table. This may allow more extensive caching of inference results, because it should eliminate root-indexing conflicts between different instances of the same method.
Having learned a bunch while doing this, I now have to ask: would a better approach be to go back to a single `roots` table, but make it an ordered collection that supports indexing? Ordering is distinct from indexing, and our canonical Julia OrderedSet doesn't support indexing. But that's because it has decided to support deletion, and indexes (if we want them to always increase by 1) are not stable under deletion. This is a case where we never want to delete anything, so indexing seems justifiable. There would be an extra layer of indirection for each lookup, but you'd only hit that when serializing (not deserializing). In any case, it's my understanding that this is not the performance bottleneck for loading & compiling code.

However, maybe this isn't any better: is the fundamental problem that the indexing is still order-dependent? That is, if we first execute one specialization and then another, the indices that get assigned depend on that order.
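For concreteness, here is a minimal sketch of the kind of insertion-only, indexable set described above. The type and function names (`IndexedSet`, `getindex!`) are hypothetical, not an existing Julia API; it just shows why stable indices are easy once deletion is off the table.

```julia
# Hypothetical sketch (not an existing Julia type): an insertion-only set whose
# elements get stable, monotonically increasing indices. With no deletion,
# an item's index never changes once assigned.
struct IndexedSet{T}
    items::Vector{T}      # index -> item, in insertion order
    index::Dict{T,Int}    # item -> index
end
IndexedSet{T}() where {T} = IndexedSet{T}(T[], Dict{T,Int}())

# Return the index of `x`, inserting it first if it isn't present yet.
function getindex!(s::IndexedSet, x)
    get!(s.index, x) do
        push!(s.items, x)
        length(s.items)
    end
end

Base.getindex(s::IndexedSet, i::Integer) = s.items[i]   # cheap lookup for deserialization

# Usage: indices only grow and never shift.
s = IndexedSet{Symbol}()
@assert getindex!(s, :a) == 1
@assert getindex!(s, :b) == 2
@assert getindex!(s, :a) == 1    # re-insertion keeps the old index
@assert s[2] === :b
```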
cross-ref #14556
I added a couple of tags. What's holding this up is the question: is this even going in the right direction, and if not, would the strategy in #32705 (comment) be more viable? (I tend to suspect the answers to these two questions are "no" and "yes," respectively, but I need guidance, since the effort is not small and my time & expertise on this problem are both limited.) I am willing to keep pushing on this, but from a pragmatic standpoint I will need a collaborator, or at least a reviewer making suggestions about next steps, who knows the compile chain better than I do.
Bump @JeffBezanson. At JuliaCon you raised some concerns about the approach Jameson & I took here; might #32705 (comment) be a better strategy?
Hi, most of the technical details here are still over my head, but I just wanted to ask about the general architecture of this solution. Have you considered implementing this with a write-ahead log? That is, appending every new signature to a list (i.e. appending to the end of the file), so it can be replayed on the next load. I imagine it would be reasonably efficient, and much easier to implement than a dynamically updating file. A second, unifying pass can occur at any desired interval, so its performance won't matter. It's optional, but it would result in even faster module loads. This approach might even help with multiple processes of the same module, in some distant future. (You'll have to forgive me, I'm not even sure how Julia does multiprocessing today.)
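For illustration only, here is a rough sketch of the append-and-replay idea under the assumptions above. The entry format, file layout, and function names (`wal_append`, `wal_replay`, `wal_compact`) are made up for the example, not Julia's actual cache format.

```julia
using Serialization

# Append one new entry (e.g. a freshly inferred signature) to the end of the log.
function wal_append(path::AbstractString, entry)
    open(path, append=true) do io
        serialize(io, entry)
    end
end

# Replay the whole log on the next load, in the order entries were written.
function wal_replay(path::AbstractString)
    entries = Any[]
    isfile(path) || return entries
    open(path, "r") do io
        while !eof(io)
            push!(entries, deserialize(io))
        end
    end
    return entries
end

# Optional, occasional compaction pass: rewrite the log without duplicates.
function wal_compact(path::AbstractString)
    entries = unique(wal_replay(path))
    tmp = path * ".tmp"
    open(tmp, "w") do io
        foreach(e -> serialize(io, e), entries)
    end
    mv(tmp, path; force=true)
end
```

The appeal of this layout is that writes are cheap and never invalidate earlier entries; all the cost of de-duplication is deferred to the optional compaction step.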
A lot of it is unfamiliar to me too. It's less a question of how you write the file, and more about what you put into it. How do you represent code in binary format? Pointers are obviously off the table, so you have to come up with a serialization.

Having put this together, I'm not sure #32705 (comment) is a panacea. The original strategy suggested by Jameson is more workable, although the "merge" step will be harder, and the sheer amount of duplicated information is troubling. Now I'm wondering about saving the roots table for each method in a .ji file, comparing it against the master copy, and renumbering things as you deserialize the file. #32705 (comment) might be useful to prevent problems from rehashing, though.
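To make the "compare and renumber" idea concrete, here is a hypothetical sketch (invented names, not the actual loader): the .ji file carries the roots table the method had when it was serialized, and on load each file-local index is translated into the index of the same object in the live table, appending any roots the live table doesn't yet have.

```julia
# Build a translation table from file-local root indices to live root indices.
function build_root_translation(file_roots::Vector{Any}, live_roots::Vector{Any})
    translation = Vector{Int}(undef, length(file_roots))
    for (i, r) in enumerate(file_roots)
        j = findfirst(isequal(r), live_roots)
        if j === nothing
            push!(live_roots, r)     # a root the live table lacks: append it,
            j = length(live_roots)   # so existing indices stay untouched
        end
        translation[i] = j
    end
    return translation
end

# While deserializing a CodeInfo, each stored root reference `i` would be
# rewritten to `translation[i]` so it points at the right slot in the live table.
remap_root(i::Integer, translation::Vector{Int}) = translation[i]
```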
Why is that? If you append only, the old definitions don't lose their meaning. And it's possible to "merge" new definitions by translating their indices (to avoid duplicates). I still don't understand where the root of the problem is. When Julia compiles a new method, it adds the definition somewhere. If it's added to roots, we could just rewrite the whole file. But if not, why can't we take that definition and serialize/deserialize it on its own?
Most things are possible; it's just a question of doing it 😄. The strategies you're proposing about merging are the same ones I'm proposing. Whip out that editor and go for it!
Do I understand correctly that this PR hopes to make it possible to cache the native code compiled in a session, so that it can subsequently be re-used in a new session, thereby avoiding the JIT lag? That would be huge. Kinda like an on-the-fly PackageCompiler, but without modifying the sysimage?
It's not specifically aimed at native code; it's more about being able to pick up some stragglers in our current precompilation. However, with the invalidation work having recently achieved quite a lot of success, the biggest obstacle to working on native-code caching is basically gone.
Friendly bump; now, with so many other improvements to latency, I'm seeing lots of examples where inference time is a major part of getting real work done. Here's an example of profiling ProfileView on its first run (profile screenshot omitted): the first block (gray-dominated) is inference, the second (yellow-dominated) is LLVM, the third is mixed (including some computation), and the fourth is borked pointers. Despite my best efforts (I've added a ton of precompile statements), many of them don't "take," presumably for the reason described in the top post. It sure would be some nice icing on the latency cake if we could make progress on this. It's more than I'm prepared to handle solo, but I'm happy to help as I can.
For some reason (perhaps #32705?) most or all of these fail if they are emitted as precompile statements, so this moves them into Base itself. This drops the time for a revision down to 1.85s.
This is probably in the right direction, but it needs more work and is heavily conflicted now. Maybe next hackathon we'll be able to make more progress.
What about changing the format of the roots table to something like …
Bump
This is a hackathon project by @vtjnash (🧠) and myself (💪). This is partway towards the end goal (with the harder thinking still to go).
As I understand it, the problem is that inferred CodeInfo objects are serialized by putting all objects into the method's `roots` table, and then each specialization gets stored by referring to these objects via their index. There is one `roots` table for all the specializations, and this has an important consequence: when you compile for new input types, what happens frequently is that you need new objects added to `roots`, and this changes the numbering of items in `roots`. Consequently the lookup is state-dependent ("which specializations do you happen to have compiled?"), and thus we can't merge this info into other things. To avoid such problems, currently we refuse to save inference info for methods for which additional specializations would change the roots table. Since that's a large fraction of all methods, in practice we save relatively little. This PR is designed to eliminate such confusion by ensuring that each CodeInstance gets its own private roots table.

In the current state, we may have successfully added the new field and gotten Julia through bootstrap. If this were relatively neutral in terms of system-image size and build time, it could presumably be merged at any point despite not yet doing anything useful. However, the `.data` field of the `.so` is 125MB vs its expected 118MB, suggesting it should wait for more improvement.
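To illustrate the state-dependence problem and how private tables sidestep it, here is a toy sketch. The helper names (`add_root!`, `serialize_instance`) are invented for the example; this is not the actual serializer.

```julia
# Toy model: a "specialization" records its constants as indices into a roots table.
function add_root!(roots::Vector{Any}, x)
    i = findfirst(isequal(x), roots)
    i === nothing || return i
    push!(roots, x)
    return length(roots)
end

# Shared table: the indices a specialization gets depend on what was compiled before it.
shared = Any[]
g_refs = [add_root!(shared, :gconst)]                  # compiled first: :gconst -> 1
f_refs = [add_root!(shared, :fconst)]                  # compiled second: :fconst -> 2

shared2 = Any[]
f_refs_other_session = [add_root!(shared2, :fconst)]   # compiled first here: :fconst -> 1
@assert f_refs != f_refs_other_session                 # same code, different serialized indices

# Private per-CodeInstance table: indices depend only on the instance's own roots,
# so the serialized form is identical no matter what else was compiled earlier.
function serialize_instance(consts)
    roots = Any[]
    refs = [add_root!(roots, c) for c in consts]
    return refs, roots
end
@assert serialize_instance([:fconst]) == serialize_instance([:fconst])
```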