Repeated calls to isfile to look for identical ji files in precompile causes high disk pressure #56366

Closed
KristofferC opened this issue Oct 28, 2024 · 0 comments · Fixed by #56369
Labels
packages Package management and loading performance Must go faster

Comments

@KristofferC
Member

Running something like:

@eval Base begin
    const isfile_cnt = Dict{String, Int64}()
    function Base.Filesystem.isfile(st::StatStruct)
        if haskey(isfile_cnt, st.desc)
            isfile_cnt[st.desc] += 1
        else
            isfile_cnt[st.desc] = 1
        end
        filemode(st) & 0xf000 == 0x8000
    end
end

empty!(Base.isfile_cnt)

using Pkg
Pkg.precompile()

@show sort(collect(Base.isfile_cnt); by=x->x[2], rev=true)

gives for a big environment (e.g. DifferentialEquations.jl):

"~/.julia/compiled/v1.10/Preferences/pWSk8_J52EP.ji"         => 152
"~/.julia/compiled/v1.10/Preferences/pWSk8_bmGsQ.ji"         => 152
"~/julia/compiled/v1.10/SuiteSparse_jll/ME9At_BncKj.ji"      => 124
"~/julia/compiled/v1.10/SuiteSparse_jll/ME9At_rJPx9.ji"      => 124
"~/julia/compiled/v1.10/SuiteSparse_jll/ME9At_9T2BZ.ji"      => 124
"~/.julia/compiled/v1.10/PrecompileTools/AQ9Mk_bmGsQ.ji"     => 123
"~/.julia/compiled/v1.10/PrecompileTools/AQ9Mk_J52EP.ji"     => 123
"~/julia/compiled/v1.10/SparseArrays/P9ieR_eNkqo.ji"         => 122
"~/julia/compiled/v1.10/SparseArrays/P9ieR_BncKj.ji"         => 122
"~/julia/compiled/v1.10/SparseArrays/P9ieR_rJPx9.ji"         => 122
"~/.julia/compiled/v1.10/ArrayInterface/7bROb_EHPmX.ji"      => 112
"~/.julia/compiled/v1.10/DocStringExtensions/KRdZs_bmGsQ.ji" => 111

where we are checking the same file over and over.

This seems to come from

modpaths = find_all_in_cache_path(modkey)

Noticed by @topolarity when a no-op instantiate was slow on Windows.
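One way to avoid the repeated lookups described above is to memoize the result per key for the duration of a single precompile run. The following is only a minimal sketch of that idea, not the change made in #56369: `find_ji_files` is a hypothetical stand-in for `find_all_in_cache_path`, and `cached_find_ji_files`, `lookup_calls`, and the temporary directory layout are assumptions made purely for illustration.

```julia
# Counts how often the (hypothetical) uncached lookup actually runs.
const lookup_calls = Ref(0)

# Hypothetical stand-in for the real cache lookup: scans one directory and
# calls `isfile` on every entry, keeping only `.ji` files.
function find_ji_files(dir::AbstractString)
    lookup_calls[] += 1
    return filter(p -> isfile(p) && endswith(p, ".ji"), readdir(dir; join=true))
end

# Memoizing wrapper: each directory is scanned at most once per `cache`.
function cached_find_ji_files(cache::Dict{String,Vector{String}}, dir::AbstractString)
    return get!(cache, String(dir)) do
        find_ji_files(dir)
    end
end

# Demo on a throwaway directory with two fake cache files.
dir = mktempdir()
touch(joinpath(dir, "A_1.ji"))
touch(joinpath(dir, "A_2.ji"))
touch(joinpath(dir, "notes.txt"))

cache = Dict{String,Vector{String}}()
for _ in 1:100
    cached_find_ji_files(cache, dir)
end

@show lookup_calls[]      # the directory was only scanned once
@show length(cache[dir])  # two `.ji` files found
```

A cache like this is only safe while the set of files cannot change underneath it, so it would have to be scoped to one run or invalidated whenever a new cache file is written.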

@KristofferC KristofferC added packages Package management and loading performance Must go faster labels Oct 28, 2024
KristofferC added a commit that referenced this issue Nov 11, 2024
#56369)

Before (in an environment with DifferentialEquations.jl):

```julia
julia> @time Pkg.precompile()
  0.733576 seconds (3.44 M allocations: 283.676 MiB, 6.24% gc time)

julia> isfile_calls[1:10]
10-element Vector{Pair{String, Int64}}:
        "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/Printf/3FQLY_zHycD.ji" => 178
        "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/Printf/3FQLY_xxrt3.ji" => 178
         "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/Dates/p8See_xxrt3.ji" => 158
         "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/Dates/p8See_zHycD.ji" => 158
          "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/TOML/mjrwE_zHycD.ji" => 155
          "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/TOML/mjrwE_xxrt3.ji" => 155
                                     "/home/kc/.julia/compiled/v1.12/Preferences/pWSk8_4Qv86.ji" => 152
                                     "/home/kc/.julia/compiled/v1.12/Preferences/pWSk8_juhqb.ji" => 152
 "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/StyledStrings/UcVoM_zHycD.ji" => 144
 "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/StyledStrings/UcVoM_xxrt3.ji" => 144
 ```

After:

```julia
julia> @time Pkg.precompile()
  0.460077 seconds (877.59 k allocations: 108.075 MiB, 4.77% gc time)

julia> isfile_calls[1:10]
10-element Vector{Pair{String, Int64}}:
 "/tmp/jl_a5xFWK/Project.toml" => 15
 "/tmp/jl_a5xFWK/Manifest.toml" => 7
 "/home/kc/.julia/registries/General.toml" => 6
 "/home/kc/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Markdown/src/Markdown.jl" => 3
 "/home/kc/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Serialization/src/Serialization.jl" => 3
 "/home/kc/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Distributed/src/Distributed.jl" => 3
 "/home/kc/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/UUIDs/src/UUIDs.jl" => 3
 "/home/kc/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/LibCURL/src/LibCURL.jl" => 3
```

Performance is improved, and we no longer call `isfile` on the same ji files hundreds of times.

The benchmark was run on a Linux machine, so the improvement should be considerably larger on Windows, where the `isfile_casesensitive` call is much more expensive.

Fixes #56366

---------

Co-authored-by: KristofferC <[email protected]>
Co-authored-by: Ian Butterworth <[email protected]>
(cherry picked from commit 9850a38)