git_add() slow #242

Open
krlmlr opened this issue Nov 16, 2024 · 3 comments

krlmlr commented Nov 16, 2024

These measurements surprised me:

dir <- withr::local_tempdir()
setwd(dir)

gert::git_init()
writeLines(character(), "a.txt")
system.time(gert::git_add("."))
#>    user  system elapsed 
#>   0.019   0.002   0.105
gert::git_commit(message = "Test")
#> [1] "b49672895dc220cb508a17b1b716558b9d11fb6a"

Created on 2024-11-16 with reprex v2.1.1

Running git as an external process is much faster:

dir <- withr::local_tempdir()
setwd(dir)

gert::git_init()
writeLines(character(), "a.txt")
system.time(system("git add a.txt"))
#>    user  system elapsed 
#>   0.003   0.005   0.037
gert::git_commit(message = "Test")
#> [1] "521a3b6ee3e59e6990b37e6f61da3ed17c0fb623"

Created on 2024-11-16 with reprex v2.1.1

git2r seems to have the same problem. What is libgit2 doing there?

The implementation discards the result of normalizePath(), and about 20 of the 100 milliseconds are spent in git_status(), regardless of whether the caller cares about the result. We still need to understand why R_git_repository_add is slow.

gert::git_add
#> function (files, force = FALSE, repo = ".") 
#> {
#>     repo <- git_open(repo)
#>     info <- git_info(repo)
#>     normalizePath(file.path(info$path, files), mustWork = FALSE)
#>     force <- as.logical(force)
#>     .Call(R_git_repository_add, repo, files, force)
#>     git_status(repo = repo)
#> }
#> <bytecode: 0x113181fa8>
#> <environment: namespace:gert>

Created on 2024-11-16 with reprex v2.1.1
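
To check how much of the elapsed time comes from the trailing git_status() call, the two calls can be timed separately. This is only a rough sketch: `time_steps()` is a hypothetical helper that is not part of gert, and it assumes that git_status() costs about the same when called on its own as when called from inside git_add().

```r
# Hypothetical helper (not part of gert): run each function once and
# return its elapsed wall-clock time in seconds.
time_steps <- function(steps) {
  vapply(steps, function(f) system.time(f())[["elapsed"]], numeric(1))
}

# Only runs when gert is installed; uses a throwaway repository.
if (requireNamespace("gert", quietly = TRUE)) {
  repo <- tempfile("gert-timing-")
  dir.create(repo)
  gert::git_init(repo)
  writeLines(character(), file.path(repo, "a.txt"))

  timings <- time_steps(list(
    add_with_status = function() gert::git_add(".", repo = repo),
    status_only     = function() gert::git_status(repo = repo)
  ))
  print(timings)

  # Rough estimate of the time spent outside git_status()
  print(timings[["add_with_status"]] - timings[["status_only"]])
}
```

A single run is noisy, so repeating the measurement a few times would give a more reliable picture.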

jeroen commented Nov 16, 2024

Does the overhead get bigger when adding many files at once? Or is it a fixed overhead of 0.1 s?

krlmlr commented Nov 16, 2024

There's a substantial cost per file. With 100 files, gert needs about 5.5 seconds:

N <- 100

dir <- withr::local_tempdir()
setwd(dir)

gert::git_init()
for (i in 1:N) {
  writeLines(character(), paste0(i, ".txt"))
}
system.time(gert::git_add("."))
#>    user  system elapsed 
#>   0.013   0.047   5.499
nrow(gert::git_status())
#> [1] 100
gert::git_commit(message = "Test")
#> [1] "05e7dd6d0077f78dada74864b44866ae8fb8e976"


dir <- withr::local_tempdir()
setwd(dir)

gert::git_init()
for (i in 1:N) {
  writeLines(character(), paste0(i, ".txt"))
}
system.time(system2("git", args = c("add", ".")))
#>    user  system elapsed 
#>   0.003   0.007   0.025
nrow(gert::git_status())
#> [1] 100
gert::git_commit(message = "Test")
#> [1] "05e7dd6d0077f78dada74864b44866ae8fb8e976"

Created on 2024-11-16 with reprex v2.1.1
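
For a back-of-the-envelope split into fixed and per-file cost, the two gert timings reported in this thread (1 file: 0.105 s, 100 files: 5.499 s) can be put into a two-point linear model. This is a sketch only: it assumes the cost grows linearly with the number of files, which two data points cannot confirm.

```r
# Elapsed times (seconds) reported in this thread for gert::git_add(".")
n_files <- c(1, 100)
elapsed <- c(0.105, 5.499)

# Two-point linear model: elapsed = fixed + per_file * n_files
per_file <- diff(elapsed) / diff(n_files)
fixed <- elapsed[1] - per_file * n_files[1]

# Under this assumption: roughly 0.05 s fixed and 0.054 s per file
round(c(fixed = fixed, per_file = per_file), 4)
```

Timings at a few more values of N would show whether the growth is really linear.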

krlmlr commented Nov 16, 2024

I wonder why the commit hash comes out the same for both runs in the previous example. Does the commit hash use the system time with an accuracy of one second only?

Same timing (but different commit hashes) if I swap the order of the two runs:

N <- 100

dir <- withr::local_tempdir()
setwd(dir)

gert::git_init()
for (i in 1:N) {
  writeLines(character(), paste0(i, ".txt"))
}
system.time(system2("git", args = c("add", ".")))
#>    user  system elapsed 
#>   0.004   0.007   0.039
nrow(gert::git_status())
#> [1] 100
gert::git_commit(message = "Test")
#> [1] "5eb64df18ee4bb3cc5371faf644833a23e04216c"

dir <- withr::local_tempdir()
setwd(dir)

gert::git_init()
for (i in 1:N) {
  writeLines(character(), paste0(i, ".txt"))
}
system.time(gert::git_add("."))
#>    user  system elapsed 
#>   0.011   0.043   5.014
nrow(gert::git_status())
#> [1] 100
gert::git_commit(message = "Test")
#> [1] "933c72ecc0ce1bd962fddbeb55b916e94abc1717"

Created on 2024-11-16 with reprex v2.1.1
