How precompile files are loaded need to change if using multiple projects are going to be pleasant #27418

KristofferC · 2018-06-04T11:31:10Z

Precompile files are currently stored based only on the UUID of the package.
So if you change your project, you will likely have to recompile everything, and then again when you switch back, etc.
This will be very annoying for people trying to use multiple projects, and people will likely just use one mega project like before.
#26165 also removed any possibility for users to change the precompile path, so there is currently no way to work around this.

We should be smarter about how we save precompile files to reduce the amount of recompilation needed. A very simple system would be to use one precompile directory per project, but that might be somewhat wasteful, since it is theoretically possible to share compiled files between projects.
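Here is a minimal sketch of the "one precompile directory per project" idea. It assumes the directory name is derived from a digest of the project file path. project_cache_dir is a hypothetical helper, not a Base function, and Base's hash stands in for whatever stable digest a real implementation would use.

```julia
# Hypothetical helper (not a Base API): one compiled-cache directory per project.
# `hash` is only a stand-in for a stable digest of the project file path.
function project_cache_dir(depot::AbstractString, project_file::AbstractString)
    key = string(hash(abspath(project_file)); base = 62)
    ver = "v$(VERSION.major).$(VERSION.minor)"
    return joinpath(depot, "compiled", ver, key)
end

a = project_cache_dir("/tmp/depot", "/work/projA/Project.toml")
b = project_cache_dir("/tmp/depot", "/work/projB/Project.toml")
```

This sketch also shows the waste mentioned above. Two projects that pin identical versions of a package still get separate directories, so that package is compiled once per project.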

StefanKarpinski · 2018-06-04T19:59:56Z

We'll need advice and input from @vtjnash on this one.

tkf · 2018-09-01T02:21:13Z

Could you consider #28518 when fixing it?

Re: implementation, I suppose you could de-duplicate the precompile cache by using a hash tree? What I mean is to generate the path of each precompile file from a hash that depends on the package's own git-tree-sha1 (or version?) and on the hashes of all of its dependencies. What I suggested in #28518 was to also make it depend on the package options (JuliaLang/Juleps#38).

ref: JuliaPy/pyjulia#173

musm · 2018-10-25T19:21:58Z

Chiming in that for me this would be pretty useful.

Argument:
I have a shared dev environment.
What is annoying is that I have to recompile my 'clean' environment whenever I work on the development packages and then switch back to my clean environment, even though none of the packages in the default environment have been touched.

At the very least, an optional flag for new environments not to share the precompile cache would be awesome.

tkf · 2018-10-30T02:12:21Z

As different system images may contain different versions of packages, I suppose it makes sense for the cache path to depend on (say) the path of the system image as well? I think it also helps to decouple stdlib more from Julia core.

tkf · 2018-11-04T23:27:07Z

@StefanKarpinski I don't think implementing what I suggested above (#27418 (comment)) is difficult. Does this conceptually work?

```julia
using Pkg

# Cache-path slug that depends on the package's own UUID and git-tree-sha1
# and, recursively, on the slugs of all its dependencies (a hash tree).
function cache_path_slug(env::Pkg.Types.EnvCache, uuid::Base.UUID)
    info = Pkg.Types.manifest_info(env, uuid)
    crc = 0x00000000
    if haskey(info, "deps")
        # assumes "deps" is stored as a name => uuid table; manifests may also
        # list only dependency names
        for dep_uuid in sort(Base.UUID.(collect(values(info["deps"]))))
            slug = cache_path_slug(env, dep_uuid)
            crc = Base._crc32c(slug, crc)
        end
    end
    crc = Base._crc32c(uuid, crc)
    if haskey(info, "git-tree-sha1")
        crc = Base._crc32c(info["git-tree-sha1"], crc)
    end
    # crc = _crc32c(unsafe_string(JLOptions().image_file), crc)
    return Base.slug(crc, 5)
end

cache_path_slug(Pkg.Types.EnvCache(), Base.identify_package("Compat").uuid)
```

(By "conceptually", I mean that I'm glossing over the fact that Base probably shouldn't be using Pkg. Also, the function above as-is, without memoization, may be slow for large dependency trees.)

Some possible flaws I noticed:

- It requires GC, but I guess we can do that in Pkg.gc.
- It is supposed to share the common sub-trees of the projects. However, unless the projects are updated at the same time, they are unlikely to share substantial sub-trees (e.g., if they have different Compat.jl versions, they probably do not share anything). I'm not sure how problematic this can be.
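Below is a Pkg-free sketch of the hash-tree idea, including the memoization mentioned above. It assumes a toy manifest type. Entry, cache_key and cache_slug are hypothetical names. Base's hash replaces the CRC-32c calls used in the snippet above, so the slugs differ from anything Base would produce.

```julia
# Toy manifest entry (hypothetical stand-in for Pkg's manifest data).
struct Entry
    tree::String          # git-tree-sha1 of the package source
    deps::Vector{String}  # names of direct dependencies
end

# Hash-tree key: folds in the package name, the keys of all dependencies (in
# sorted order) and the package's own tree hash, so it changes whenever anything
# in the dependency subtree changes. `memo` avoids recomputing shared subtrees.
function cache_key(manifest::Dict{String,Entry}, name::String,
                   memo::Dict{String,UInt} = Dict{String,UInt}())
    haskey(memo, name) && return memo[name]
    entry = manifest[name]
    h = hash(name)
    for dep in sort(entry.deps)
        h = hash(cache_key(manifest, dep, memo), h)
    end
    h = hash(entry.tree, h)
    memo[name] = h
    return h
end

# Render the key as a short, file-name-friendly slug.
cache_slug(manifest, name) = string(cache_key(manifest, name) % UInt32; base = 62)

proj1 = Dict("C" => Entry("c-1.0", String[]), "A" => Entry("a-1", ["C"]))
proj2 = Dict("C" => Entry("c-1.0", String[]), "B" => Entry("b-1", ["C"]))
proj3 = Dict("C" => Entry("c-1.1", String[]), "A" => Entry("a-1", ["C"]))
```

In this toy example, proj1 and proj2 get the same key for C, so they could share its cache file. Bumping C to c-1.1 changes the key of every package above it, which is the second flaw listed above.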

timholy · 2018-12-03T16:21:33Z

Related: I was benchmarking julia master vs. a branch using two different directories & builds. The two compete against one another for the ownership of the compiled package files.

cstjean · 2018-12-03T16:32:01Z

FWIW, we found that using a different DEPOT_PATH for each frequently-used environment is a decent (if cumbersome) work-around until there's a fix.

timholy · 2018-12-03T20:59:48Z

That's what I was doing too but recently I ran into a case where, surprisingly, that didn't work. I was rushing and didn't have time to document it, but I will see if I can remember what was involved.

JeffreySarnoff · 2018-12-05T17:15:05Z

Tangentially adjacent or interwoven?
Every time I make a change to the Julia source code in ArbNumerics, Pkg insists on regenerating all the C library files, oblivious to the fact that nothing at all has occurred that would benefit from it.

timholy · 2018-12-30T10:54:09Z

This is also the cause of timholy/Revise.jl#205

bjarthur · 2019-01-14T19:26:55Z

yowza. could we please prioritize this with a milestone?

jpsamaroo · 2019-01-14T20:23:07Z

> We should be smarter about how we save precompile files to reduce the amount of recompilation needed. A very simple system would be to use one precompile directory per project, but that might be somewhat wasteful, since it is theoretically possible to share compiled files between projects.

Under what conditions can it be guaranteed that one or more precompile files are shareable? If we can nail down the varying inputs to precompilation, it should be possible to put in a hack that stops truly unnecessary precompilations, at least until a better mechanism is devised.

montyvesselinov · 2019-03-08T18:16:01Z

My home directory is typically shared by many different machines (of different OS/processor types). I would need a different build location for the *.ji files of each machine.

DEPOT_PATH defines the location /Users/monty/.julia

So I would need /Users/monty/.julia-redhat, /Users/monty/.julia-linux, /Users/monty/.julia-ubuntu14, /Users/monty/.julia-ubuntu16, etc.

Is there a better way?

tkf · 2019-03-26T02:44:44Z

I'm replying here to @jpsamaroo's comment in this Discourse thread, since this discussion belongs here rather than there. Please read my comment (and the follow-up) and @jpsamaroo's comment for the full context.

> Therefore, I think initially we should focus on just precompiling each project which is loaded in isolation, before any further activates occur.

I think it does not handle many common cases. For example, if you have using Revise in startup.jl, then you can't capture even the first activate in this scheme. Also, what do you do after the first activate? Switch to a --compiled-modules=no mode (I don't know if you can toggle this flag dynamically)? This approach also has to address a chicken-and-egg problem: getting the dependencies before locating the cache path requires adding a TOML parser to Base or a persistent cache (or something else), each of which is a hard problem on its own. Since we know that this cannot capture many use cases, I think it makes sense to implement the fully dynamic solution ("in-memory dependency tree") from the get-go.

But I actually don't know if it is such a bad idea for a first implementation. As switching projects triggers precompilation anyway ATM, it would be an improvement if julia automatically turned off recompilation. Also, people who care about reproducibility probably use --project/JULIA_PROJECT most of the time, in which case full dynamism may not be required for precompilation. Finally, a con of the fully dynamic solution is "GC" of *.ji files: it will create more precompilation files than the static solution, and it's hard to know which files are still needed.

jpsamaroo · 2019-03-26T11:44:13Z

I'd be interested in an elaboration on this "in-memory dependency tree" and how it can solve the issue of dynamic activations. I only consider my "solution" a temporary improvement for certain common cases anyway, but you're definitely right that it might make other common cases worse instead of better.

oxinabox · 2019-03-26T12:17:08Z

I don't see why we can't just have 1 compile cache directory per exact stack of environments.
At least as a short-term solution.
I feel like this would generally lead to fewer than 3 compile caches per environment.
And sure, it might duplicate a bit of compile time, but it would be less than we have now.

And sure, it would use more hard drive space, but hard drive space is cheap.
Cheaper than the time I spend waiting for compilation when I switch environments.
We would probably want some gc compilecache all to clear all compile caches,
and maybe gc compilecache dead to clear all compile caches for which we can no longer locate all the Manifest.tomls.
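Here is a rough sketch of both parts of this proposal. It assumes a registry that records which Manifest.toml files each cache was built from. stack_key, dead_caches and the registry itself are hypothetical; nothing like them exists in Base or Pkg as written here.

```julia
# Key for one compile-cache directory per exact, ordered stack of environments.
stack_key(stack::Vector{String}) =
    string(foldl((h, env) -> hash(env, h), stack; init = hash("env-stack")); base = 62)

# "gc compilecache dead": cache keys for which some recorded Manifest.toml is gone.
# `registry` (cache key => manifests it was built from) is a hypothetical record.
dead_caches(registry::Dict{String,Vector{String}}) =
    sort!([key for (key, manifests) in registry if !all(isfile, manifests)])

dir = mktempdir()
live = joinpath(dir, "Manifest.toml")
write(live, "")                                    # a manifest that still exists
gone = joinpath(dir, "deleted", "Manifest.toml")   # one that no longer does
registry = Dict("k1" => [live], "k2" => [live, gone])
```

Because the fold is order-sensitive, [A, B] and [B, A] get different keys. This matches "exact stack", since the first environment in the stack wins when loading.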

tkf · 2019-03-27T00:45:53Z

@oxinabox

> I don't see why we can't just have 1 compile cache directory per exact stack of environments.
> At least as a short-term solution.

I think it's not a crazy plan provided that there is a mechanism to switch to the mode that acts like --compiled-modules=no when precompilation does not work.

To illustrate what I mean by "precompilation does not work", consider the following setup:

Default (named) project v1.2 with packages:

- A
- C@1.0 (package C of version 1.0)

custom_project with packages:

- B
- C@1.1 (package C of version 1.1)

Further assume that packages A and B both only require C >= 1.0. (custom_project gets C@1.1, e.g., due to the timing of when it was created.)

If you do

```
julia> using A  # loads C@1.0

pkg> activate custom_project

julia> using B
```

this Julia session (hereafter Session 1) loads C@1.0, while if you do

```
pkg> activate custom_project

julia> using A  # loads C@1.1

julia> using B
```

then this Julia session (hereafter Session 2) loads C@1.1. Notice that at the point of using B, both sessions have exactly the same environment stack. However, if you want to precompile package B, you need to compile it against C@1.0 in Session 1 and against C@1.1 in Session 2.

@jpsamaroo This is what I meant by "in-memory dependency tree." The information that C@1.0 must be used in Session 1 and that C@1.1 must be used in Session 2 exists only in the memory of each session. This information has to be passed to the subprocess compiling package B. Actually, "in-memory dependency tree" is misleading; I should have called it an "in-memory manifest", which includes the list of exact package versions (or rather the file paths to the source code directories of the given versions, ~/.julia/packages/$package_name/$version_slug/).
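A toy model of the two sessions above may make the point concrete. It is not Base's loader. It assumes that every environment pins a version of C, that C is looked up in the first environment of the stack, and that once a version of C is loaded the session keeps using it. Env, Session and load_pkg! are hypothetical names.

```julia
# Toy model of the two-session example (hypothetical, not Base's loader).
struct Env
    name::String
    c_version::VersionNumber   # version of C pinned by this env's manifest
end

mutable struct Session
    stack::Vector{Env}                       # active environment stack, searched in order
    loaded_c::Union{Nothing,VersionNumber}   # version of C already loaded, if any
end

# `using A` or `using B`: both depend only on C. If C is not loaded yet, it is
# taken from the first environment of the current stack; otherwise the
# already-loaded version is reused. Returns the C the package is compiled against.
function load_pkg!(s::Session)
    if s.loaded_c === nothing
        s.loaded_c = first(s.stack).c_version
    end
    return s.loaded_c
end

default_env = Env("v1.2", v"1.0")
custom_env = Env("custom_project", v"1.1")

# Session 1: using A, then activate custom_project, then using B.
s1 = Session([default_env], nothing)
load_pkg!(s1)
s1.stack = [custom_env, default_env]
c_for_b_1 = load_pkg!(s1)

# Session 2: activate custom_project first, then using A and using B.
s2 = Session([custom_env, default_env], nothing)
load_pkg!(s2)
c_for_b_2 = load_pkg!(s2)
```

Both sessions end with the same stack, yet B has to be compiled against C 1.0 in one and C 1.1 in the other. A cache key built from the environment stack alone therefore cannot tell them apart; the versions actually loaded (the "in-memory manifest") would have to be part of the key.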

vtjnash · 2019-03-27T01:55:24Z

This is all great thinking. Unfortunately, the current issue is just so much more mundane than all that. We actually already have all of that great "in-memory dependency tree" logic and stacks of caches and more! So what's the problem, since that's clearly not working for the default user experience? Well, at the end of the precompile step, it goes and garbage collects the old files right away. So there's nary a chance for it to survive for even a brief moment to be found later and used. If it only could just stop doing that until some later explicit step (like the brand new Pkg.gc() operation), life would be much happier for everyone.
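Below is a minimal sketch of what deferring deletion could look like: every cache file is kept, and only a later explicit step removes the least recently used ones beyond a budget. CacheFile, lru_victims and max_files are hypothetical names. This issue was eventually closed by #32651 (add an LRU cache to precompile files), but this is not its code.

```julia
# Hypothetical record of one retained cache file.
struct CacheFile
    path::String
    last_used::Float64   # e.g. a timestamp of the last successful load
end

# Instead of deleting older cache files right after precompiling, keep them all
# and let an explicit gc step pick the least recently used ones beyond `max_files`.
function lru_victims(files::Vector{CacheFile}; max_files::Int = 10)
    length(files) <= max_files && return CacheFile[]
    sorted = sort(files; by = f -> f.last_used)   # least recently used first
    return sorted[1:length(files) - max_files]
end

files = [CacheFile("Foo_A.ji", 3.0), CacheFile("Foo_B.ji", 1.0), CacheFile("Foo_C.ji", 2.0)]
lru_victims(files; max_files = 2)   # only Foo_B.ji would be removed
```

A real gc step would still have to decide what last_used means (file mtime, a separate usage log, ...), which is the open question raised later in this thread.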

jpsamaroo · 2019-03-27T02:52:55Z

Right, that's a good point. But we do still need to ensure we know how to locate the previously-generated *.ji files deterministically in a manner that is guaranteed to load the correct ones. Currently it seems this issue is avoided by blowing everything away and starting from scratch the moment any little thing changes with respect to the conditions that generated the previous *.ji files.

tkf · 2019-03-28T02:15:09Z

> We actually already have all of that great "in-memory dependency tree" logic and stacks of caches and more!

@vtjnash Would you mind letting us know where it is implemented? The closest thing I could find was Base._concrete_dependencies, but it only records pairs of PkgId and build_id. IIUC, the actual dependencies are still recorded in the header of the cache file (together with their build_ids). That's great for integrity checks, but it looks to me like no dependency information (the list of upstream package UUIDs and versions for each package) is stored in memory.

staticfloat · 2019-05-16T06:46:32Z

@vtjnash It would be great if you could elucidate a little more concretely what needs to change inside of base; I don't quite follow precisely what needs to change. Clearly the naming of precompile files needs to change, and I think what you're saying is that we need a way to determine which precompile files are used and which are not used so that we don't just slowly fill up a disk with stale precompile caches?

staticfloat · 2019-06-05T17:00:16Z

Another perspective; there are situations where having user-control over which precompile file gets loaded is desirable. Let us imagine a user wanting to distribute a docker container with Julia GPU packages pre-installed; the Julia GPU packages need to do some setup when they see a new generation of GPU hardware attached, and so right now in the docker container we are forced to set JULIA_DEPOT_PATH=~/.julia_for_hardware_x, precompile for all different configurations in a for loop (with different hardware attached each time), then ship the whole thing to the user. (This is to avoid needing to precompile every time you launch the docker container)

It would be much preferable if there were some kind of mechanism that allowed packages to expose a user-defined function that gets called to add some salt into the hash. An extremely coarse-grained version could be an environment variable JULIA_CODELOAD_SALT=hardware-x, which would then shift ALL precompile files by the hash of that string (thereby saving space compared with having multiple depots), but I could imagine finer-grained versions as well.

Of course, the problem of how to intelligently garbage collect these files remains.

tkf · 2019-06-06T00:42:43Z

Yes, it would be nice to integrate this with package options JuliaLang/Juleps#38

Meanwhile, you can build a patched system image that lets you add an arbitrary salt via an environment variable. This works because child processes (which precompile Julia packages) inherit environment variables. More precisely, here is the code snippet that does this (used in jlm; a similar trick is also used in PyJulia):

```julia
Base.eval(Base, quote
    function package_slug(uuid::UUID, p::Int=5)
        crc = _crc32c(uuid)
        crc = _crc32c(unsafe_string(JLOptions().image_file), crc)
        crc = _crc32c(get(ENV, "JLM_PRECOMPILE_KEY", ""), crc)
        return slug(crc, p)
    end
end)
```

(You can get this system image by running JuliaManager.compile_patched_sysimage("PATH/TO/NEW/sys.so").)

AndersBlomdell · 2020-09-04T17:23:50Z

I would very much like functionality like this for our lab computers, since it would make it possible to have multiple precompiled versions of commonly used libraries. Attached is a simplistic patch that adds the same version slug that is used in
./packages/<name>/<slug>/

loading.jl.patch.txt

StefanKarpinski · 2020-09-09T17:33:30Z

This was already implemented in 2019.

denizyuret · 2020-09-10T05:34:53Z

Stefan: is there documentation that specifies exactly what parts of the system should be placed where, or at least an example configuration for such a centralized read-only setup? Where does the compiled directory go, what about packages that the user explicitly wants to override etc.


AndersBlomdell · 2020-09-10T08:45:39Z

> This was already implemented in 2019.

Actually, no; loading.jl uses three different slugs:

- PKG-SLUG: function package_slug(uuid::UUID, p::Int=5), used to determine what goes in the cache_file_entry
- VER-SLUG: function version_slug(uuid::UUID, sha1::SHA1, p::Int=5), based on the package UUID and directory hash, used to locate the requested package in explicit_manifest_uuid_path
- PRJ-SLUG: project_precompile_slug, as defined in function compilecache_path(pkg::PkgId)::String, used to determine where precompiled code lives:

```julia
crc = _crc32c(something(Base.active_project(), ""))
crc = _crc32c(unsafe_string(JLOptions().image_file), crc)
crc = _crc32c(unsafe_string(JLOptions().julia_bin), crc)
project_precompile_slug = slug(crc, 5)
```

These parts are then used to place package source code in packages/<name>/<VER-SLUG>/ and
precompiled code in compiled/v<MAJOR>.<MINOR>/<name>/<PKG-SLUG>_<PRJ-SLUG>.ji [the validity of the precompiled
code is checked in _require_from_serialized].
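The current layout can be sketched as follows, under stated assumptions. toy_slug imitates the base-62 style of Base.slug, mix uses Base's hash truncated to 32 bits instead of Base._crc32c, and precompile_path is a hypothetical helper. The resulting names will not match real cache files; only the structure is meant to match.

```julia
# Base-62 slug in the style of Base.slug (least significant digit first).
function toy_slug(x::UInt32, p::Int = 5)
    alphabet = ['A':'Z'; 'a':'z'; '0':'9']
    n = UInt32(length(alphabet))
    out = Char[]
    for _ in 1:p
        x, d = divrem(x, n)
        push!(out, alphabet[d + 1])
    end
    return String(out)
end

# Stand-in for Base._crc32c: Base.hash truncated to 32 bits, chainable via `h`.
mix(s, h::UInt32 = 0x00000000) = hash(s, UInt(h)) % UInt32

# compiled/v<MAJOR>.<MINOR>/<name>/<PKG-SLUG>_<PRJ-SLUG>.ji
function precompile_path(depot, name, uuid, project, image_file, julia_bin)
    pkg_slug = toy_slug(mix(uuid))   # PKG-SLUG: package UUID only
    crc = mix(project)               # PRJ-SLUG: active project,
    crc = mix(image_file, crc)       # system image path,
    crc = mix(julia_bin, crc)        # and julia binary path
    prj_slug = toy_slug(crc)
    dir = joinpath(depot, "compiled", "v$(VERSION.major).$(VERSION.minor)", name)
    return joinpath(dir, "$(pkg_slug)_$(prj_slug).ji")
end

p1 = precompile_path("/depot", "Foo", "uuid-of-Foo", "/projA/Project.toml", "/opt/sys.so", "/opt/bin/julia")
p2 = precompile_path("/depot", "Foo", "uuid-of-Foo", "/projB/Project.toml", "/opt/sys.so", "/opt/bin/julia")
```

Only PRJ-SLUG depends on the active project, so two projects produce sibling files with the same <PKG-SLUG>_ prefix. That shared prefix is what lets the loader check all of them before precompiling again.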

With this scheme, the number of precompiled files is kept low, since a new version of a precompiled package overwrites the old one. There is also sharing of compatible precompiled code between projects that use the same packages, since all precompiled files starting with <PKG-SLUG>_ are checked before a new precompilation is done. It is not a good scheme for a shared environment, though; I would rather suggest:

- introduce a new compiled slug, CMP-SLUG, based on the data checked in _require_from_serialized
- place package source code in packages/<name>/<VER-SLUG>/ [i.e. no change]
- place precompiled code in compiled/v<MAJOR>.<MINOR>/<name>/<VER-SLUG>_<CMP-SLUG>.ji

Maybe all this should be controlled by some flag, to keep the pressure on the filesystem low for systems used by a single individual?
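The proposed layout can be sketched under the same kind of assumptions. toy_slug and mix are stand-ins for Base.slug and Base._crc32c, and proposed_path is hypothetical. What exactly _require_from_serialized checks is not spelled out here, so the validity inputs are simply passed in as strings.

```julia
# Base-62 slug in the style of Base.slug (least significant digit first).
function toy_slug(x::UInt32, p::Int = 5)
    alphabet = ['A':'Z'; 'a':'z'; '0':'9']
    n = UInt32(length(alphabet))
    out = Char[]
    for _ in 1:p
        x, d = divrem(x, n)
        push!(out, alphabet[d + 1])
    end
    return String(out)
end

# Stand-in for Base._crc32c: Base.hash truncated to 32 bits, chainable via `h`.
mix(s, h::UInt32 = 0x00000000) = hash(s, UInt(h)) % UInt32

# compiled/v<MAJOR>.<MINOR>/<name>/<VER-SLUG>_<CMP-SLUG>.ji
function proposed_path(depot, name, uuid, tree_sha1, validity_inputs::Vector{String})
    ver_slug = toy_slug(mix(tree_sha1, mix(uuid)))   # VER-SLUG: UUID + source tree hash
    cmp_crc = foldl((h, s) -> mix(s, h), validity_inputs; init = 0x00000000)
    cmp_slug = toy_slug(cmp_crc)                     # CMP-SLUG: what validation depends on
    dir = joinpath(depot, "compiled", "v$(VERSION.major).$(VERSION.minor)", name)
    return joinpath(dir, "$(ver_slug)_$(cmp_slug).ji")
end

inputs = ["hypothetical dependency build ids", "hypothetical sysimage id"]
a = proposed_path("/depot", "Foo", "uuid-of-Foo", "tree-1", inputs)
b = proposed_path("/depot", "Foo", "uuid-of-Foo", "tree-1", ["other dependency build ids", "hypothetical sysimage id"])
c = proposed_path("/depot", "Foo", "uuid-of-Foo", "tree-2", inputs)
```

Nothing in this path depends on the active project. Every project using the same version with the same validity inputs would therefore resolve to one shared file, which is what a shared, read-only depot needs.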

BTW: the previous loading.jl.patch contained some bugs, so here we go again
julia-loading.jl.patch.txt

KristofferC added the packages Package management and loading label Jun 4, 2018

This was referenced Sep 13, 2018:
- Julia 1.0 Error : julia.core.JuliaError: Exception 'UndefVarError' occurred while calling julia code: JuliaPy/pyjulia#196 (Closed)
- PyJulia does not work with Julia 1.0 & 0.7 with Python installed via Conda/Ubuntu/(what else?) JuliaPy/pyjulia#185 (Open)

tkf mentioned this issue Oct 30, 2018:
- Idea: use PackageCompiler.jl to avoid the precompilation cache nightmare? JuliaPy/pyjulia#217 (Closed)

tkf mentioned this issue Nov 3, 2018:
- Suggestion: Use different precompilation cache path for different system image #29914 (Merged)

timholy mentioned this issue Dec 30, 2018:
- Error in multiple environments timholy/Revise.jl#205 (Closed)

fredrikekre mentioned this issue Jan 14, 2019:
- environments not isolated JuliaLang/Pkg.jl#994 (Closed)

tkf mentioned this issue Apr 7, 2019:
- Suggestion: Do not mark BUILTIN_FILE as a precompilation dependency JuliaDebug/JuliaInterpreter.jl#271 (Closed)

tkf mentioned this issue May 16, 2019:
- Reduce cache clobbering within cache_file_entry() #32042 (Closed)

tkf mentioned this issue Jun 25, 2019:
- [WIP/RFC] Conditional dependencies and package features JuliaLang/Pkg.jl#977 (Closed)

tkf mentioned this issue Jul 19, 2019:
- Support for custom sysimg julia-vscode/julia-vscode#761 (Merged)

KristofferC mentioned this issue Jul 22, 2019:
- add an LRU cache to precompile files #32651 (Merged)

andreasnoack closed this as completed in #32651 Aug 16, 2019
