Pkg.add on repo w/ many branches can be extremely slow #2329

Open
quinnj opened this issue Jan 11, 2021 · 10 comments · May be fixed by #2330

Comments

@quinnj
Member

quinnj commented Jan 11, 2021

Trying to run:

Pkg.add(url="https://github.com/apache/arrow", subdir="julia/Arrow")

takes an extremely long time (though I'm told some people were patient enough to have it actually finish). It was suggested that it's probably due to the apache/arrow repo having a ton of branches and that changing this line to:

const refspecs = ["+refs/heads/*:refs/remotes/cache/*"]

might solve the problem.
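
To make the suggestion concrete, here is a minimal sketch of how a single-wildcard refspec selects and renames remote refs. The helper `map_ref` is hypothetical (it is not part of Pkg or LibGit2), it only handles refspecs whose source and destination both end in `*`, and the broader `+refs/*:...` pattern used for comparison is an assumption about what a less restrictive refspec could look like, not a quote of the current Pkg source.

```julia
# Hypothetical helper, not a Pkg/LibGit2 API: map a remote ref name through a
# refspec of the form "+<src-prefix>*:<dst-prefix>*".
# Returns the local ref name, or `nothing` if the refspec does not select `ref`.
function map_ref(refspec::AbstractString, ref::AbstractString)
    spec = lstrip(refspec, '+')              # leading '+' only means "allow forced updates"
    src, dst = split(spec, ":"; limit = 2)   # errors if the refspec has no ':'
    (endswith(src, "*") && endswith(dst, "*")) ||
        throw(ArgumentError("only trailing-wildcard refspecs are supported in this sketch"))
    prefix = chop(src)                       # source pattern without the '*'
    startswith(ref, prefix) || return nothing
    matched = ref[ncodeunits(prefix)+1:end]  # the part of `ref` that '*' stands for
    return chop(dst) * matched
end

heads_only = "+refs/heads/*:refs/remotes/cache/*"
everything = "+refs/*:refs/remotes/cache/*"   # assumed broader pattern, for comparison

for ref in ["refs/heads/master", "refs/tags/v1.0", "refs/pull/123/head"]
    println(rpad(ref, 20), " heads-only => ", map_ref(heads_only, ref),
            " | everything => ", map_ref(everything, ref))
end
```

With the heads-only refspec, tags and GitHub pull request refs (which live under `refs/pull/`) are not selected at all, so a fetch has fewer refs to transfer. Whether that accounts for the slowdown reported here is not established in this thread.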

@DilumAluthge
Member

DilumAluthge commented Jan 11, 2021

What if you pass the rev argument, e.g. subdir = "foo", rev = "master"?

Is it any faster that way?

@DilumAluthge
Member

On a separate note, how long does it take if you go to the command line, do a git clone of the repo, and then just run e.g. ] dev . on the clone? Is that faster or slower than the Pkg.add?

@quinnj
Member Author

quinnj commented Jan 11, 2021

Adding the ref = "master" didn't seem to make any difference; it was still at least minutes of waiting, but I never let it actually finish.

@DilumAluthge
Member

Just to double check, you did rev = "master", right?

@DilumAluthge
Member

Try doing just a regular command line git clone and see how long that takes.

@quinnj
Member Author

quinnj commented Jan 11, 2021

Hmmm, doing a plain git clone https://github.com/apache/arrow took 1 min 18 sec. I just pulled julia master, rebuilt, and ran the original command from above, and it took 118 seconds, which is much better than before (though obviously still pretty slow). Maybe something changed since the Julia version I was using? My previous build of julia master was probably a few weeks old.

@DilumAluthge
Member

Can you try this out? For me, it takes 22 seconds.

import Pkg

# Shallow-clone a single branch of `url` into `<parent_directory>/<name>`,
# then `Pkg.develop` the package at that path (or at `subdir` inside the clone).
function dev_single_branch(; name::AbstractString,
                             url::AbstractString,
                             branch::AbstractString,
                             force::Bool = false,
                             parent_directory::Union{AbstractString, Nothing} = nothing,
                             subdir::Union{AbstractString, Nothing} = nothing)
    # Default to the `dev` directory of the first depot.
    if parent_directory isa Nothing
        clone_directory = joinpath(DEPOT_PATH[1], "dev", name)
    else
        clone_directory = joinpath(parent_directory, name)
    end
    # Refuse to overwrite an existing path unless `force = true`.
    if force || !ispath(clone_directory)
        rm(clone_directory; force = true, recursive = true)
        mkpath(dirname(clone_directory))
    else
        msg = "The path \"$(clone_directory)\" already exists. Use `force = true` to overwrite the existing path."
        throw(ArgumentError(msg))
    end
    # Fetch only the tip commit of `branch` instead of every ref and its history.
    run(`git clone --depth=1 --single-branch --branch $(branch) $(url) $(clone_directory)`)
    # The package may live in a subdirectory of the repository.
    if subdir isa Nothing
        path = clone_directory
    else
        path = joinpath(clone_directory, subdir)
    end
    Pkg.develop(path = path)
end

julia> @time dev_single_branch(; name = "Arrow", url = "https://github.com/apache/arrow", branch = "master", subdir = "julia/Arrow")
Cloning into '/Users/dilum/.julia/dev/Arrow'...
remote: Enumerating objects: 5873, done.
remote: Counting objects: 100% (5873/5873), done.
remote: Compressing objects: 100% (4763/4763), done.
remote: Total 5873 (delta 2075), reused 2317 (delta 829), pack-reused 0
Receiving objects: 100% (5873/5873), 9.10 MiB | 10.08 MiB/s, done.
Resolving deltas: 100% (2075/2075), done.
Updating files: 100% (5221/5221), done.
Path `/Users/dilum/.julia/dev/Arrow/julia/Arrow` exists and looks like the correct package. Using existing path.
   Resolving package versions...
   Installed XML2_jll ──────────────────── v2.9.10+3
   Installed PooledArrays ──────────────── v0.5.3
   Installed Libiconv_jll ──────────────── v1.16.0+7
   Installed Lz4_jll ───────────────────── v1.9.2+2
   Installed IteratorInterfaceExtensions ─ v1.0.0
   Installed EzXML ─────────────────────── v1.1.0
   Installed RecipesBase ───────────────── v1.1.1
   Installed CodecLz4 ──────────────────── v0.4.0
   Installed TableTraits ───────────────── v1.0.0
   Installed Tables ────────────────────── v1.2.2
   Installed DataValueInterfaces ───────── v1.0.0
   Installed CodecZstd ─────────────────── v0.7.0
   Installed BitIntegers ───────────────── v0.2.4
   Installed SentinelArrays ────────────── v1.2.16
   Installed ExprTools ─────────────────── v0.1.3
   Installed JLLWrappers ───────────────── v1.2.0
   Installed Mocking ───────────────────── v0.7.1
   Installed TranscodingStreams ────────── v0.9.5
   Installed Zstd_jll ──────────────────── v1.4.5+2
   Installed DataAPI ───────────────────── v1.4.0
   Installed TimeZones ─────────────────── v1.5.3
  Downloaded artifact: Lz4
  Downloaded artifact: Libiconv
  Downloaded artifact: XML2
  Downloaded artifact: Zstd
Updating `~/.julia/environments/v1.7/Project.toml`
  [69666777] + Arrow v1.1.0 `~/.julia/dev/Arrow/julia/Arrow`
Updating `~/.julia/environments/v1.7/Manifest.toml`
  [69666777] + Arrow v1.1.0 `~/.julia/dev/Arrow/julia/Arrow`
  [c3b6d118] + BitIntegers v0.2.4
  [5ba52731] + CodecLz4 v0.4.0
  [6b39b394] + CodecZstd v0.7.0
  [9a962f9c] + DataAPI v1.4.0
  [e2d170a0] + DataValueInterfaces v1.0.0
  [e2ba6199] + ExprTools v0.1.3
  [8f5d6c58] + EzXML v1.1.0
  [82899510] + IteratorInterfaceExtensions v1.0.0
  [692b3bcd] + JLLWrappers v1.2.0
  [78c3b35d] + Mocking v0.7.1
  [2dfb63ee] + PooledArrays v0.5.3
  [3cdcf5f2] + RecipesBase v1.1.1
  [91c51154] + SentinelArrays v1.2.16
  [3783bdb8] + TableTraits v1.0.0
  [bd369af6] + Tables v1.2.2
  [f269a46b] + TimeZones v1.5.3
  [3bb67fe8] + TranscodingStreams v0.9.5
  [94ce4f54] + Libiconv_jll v1.16.0+7
  [5ced341a] + Lz4_jll v1.9.2+2
  [02c8fc9c] + XML2_jll v2.9.10+3
  [3161d3a3] + Zstd_jll v1.4.5+2
  [0dad84c5] + ArgTools
  [56f22d72] + Artifacts
  [2a0f44e3] + Base64
  [ade2ca70] + Dates
  [f43a241f] + Downloads
  [b77e0a4c] + InteractiveUtils
  [b27032c2] + LibCURL
  [76f85450] + LibGit2
  [8f399da3] + Libdl
  [37e2e46d] + LinearAlgebra
  [56ddb016] + Logging
  [d6f4376e] + Markdown
  [a63ad114] + Mmap
  [ca575930] + NetworkOptions
  [44cfe95a] + Pkg
  [de0858da] + Printf
  [3fa0cd96] + REPL
  [9a3f8284] + Random
  [ea8e919c] + SHA
  [9e88b42a] + Serialization
  [6462fe0b] + Sockets
  [fa267f1f] + TOML
  [a4e569a6] + Tar
  [8dfed614] + Test
  [cf7118a7] + UUIDs
  [4ec0a83e] + Unicode
  [deac9b47] + LibCURL_jll
  [29816b5a] + LibSSH2_jll
  [c8ffd9c3] + MbedTLS_jll
  [14a3606d] + MozillaCACerts_jll
  [83775a58] + Zlib_jll
  [8e850ede] + nghttp2_jll
    Building TimeZones  `~/.julia/scratchspaces/44cfe95a-1eb2-52ea-b672-e2afdf69b78f/4ba8a9579a243400db412b50300cd61d7447e583/build.log`
 21.459417 seconds (3.60 M allocations: 291.668 MiB, 1.70% gc time)

julia> import Arrow
[ Info: Precompiling Arrow [69666777-d1a9-59fb-9406-91d4454c9d45]
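
The workaround above relies on two `git clone` flags: `--depth=1` (fetch only the most recent commit) and `--single-branch` (fetch only the requested branch). As a rough sketch, the command can be built and inspected without touching the network. The helper `clone_cmd` and the `/tmp/Arrow` destination are hypothetical names used only for illustration.

```julia
# Hypothetical helper: build (but do not run) a `git clone` command,
# optionally restricted to the tip commit of a single branch.
function clone_cmd(url::AbstractString, branch::AbstractString, dest::AbstractString;
                   shallow::Bool = true)
    # Interpolating a vector into a command splices each element in as its
    # own argument; an empty vector contributes no arguments.
    flags = shallow ? ["--depth=1", "--single-branch"] : String[]
    return `git clone $flags --branch $branch $url $dest`
end

shallow_cmd = clone_cmd("https://github.com/apache/arrow", "master", "/tmp/Arrow")
full_cmd = clone_cmd("https://github.com/apache/arrow", "master", "/tmp/Arrow"; shallow = false)

println(shallow_cmd)
println(full_cmd)
```

Passing either command to `run` would perform the clone, so wrapping each call in `@time` gives a like-for-like comparison of shallow and full clones on a given connection.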

@quinnj
Member Author

quinnj commented Jan 14, 2021

Yes, that works and installs in 7 seconds for me (I think I already had several dependencies installed, so it wasn't a completely clean install).

@KristofferC
Member

I think this issue is still valid, isn't it?

@quinnj
Member Author

quinnj commented Nov 23, 2022

As far as I remember, it was more of a temporary thing, and then I just forgot about it for a long time. Either way, the arrow setup is not the same anymore, so I can't really follow up.
