Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Programs installed by pkgx needs to be quicker #991

Closed
felipecrs opened this issue Apr 3, 2024 · 17 comments · Fixed by #1008
Closed

Programs installed by pkgx needs to be quicker #991

felipecrs opened this issue Apr 3, 2024 · 17 comments · Fixed by #1008

Comments

@felipecrs
Copy link
Contributor

felipecrs commented Apr 3, 2024

It would be great if the programs installed by pkgx were quicker. I use pkgx install in my devcontainers to ensure the same environment is used by both developers and CI/CD (Jenkins).

However, I'm noticing a very significant increase in the time it takes to execute the same scripts when using pkgx installed programs. One of my pipelines that usually takes 10 minutes started to take 15 minutes after switching to pkgx.

For a comparison:

> pkgx install [email protected] && ~/.local/bin/node --version

# using pkgx's shim to execute node
> time ~/.local/bin/node --version
v20.12.1
~/.local/bin/node --version  0.37s user 0.30s system 128% cpu 0.519 total

# pkgx's node without any shim
> env [email protected]

> time node --version
v20.12.1
node --version  0.00s user 0.00s system 59% cpu 0.006 total

> volta install [email protected] && ~/.volta/bin/node --version

# using volta's shim to execute node
> time ~/.volta/bin/node --version
v20.12.1
~/.volta/bin/node --version  0.00s user 0.00s system 30% cpu 0.006 total

# volta's node without any shim
> time $(volta which node) --version
v20.12.1
$(volta which node) --version  0.00s user 0.00s system 11% cpu 0.004 total

As you can see, pkgx's shim seems to add 0.5-1s on every invocation. Which is reasonable for day to day use but is unnaceptable for some scripts and environments.

Another comparison is when using aqua-proxy (made with Go), which seems to add around 50ms to each invocation.

Volta's shim is made with Rust, and it is extremely fast. It even looks for a package.json in the current and up folders to find out the proper node version to use. And still, extremely fast.

Maybe shipping some kind of very optimized pkgx-shim binary made with Rust would be the best option in this case.

Copy link

dosubot bot commented Apr 3, 2024

Hey @felipecrs,

Thank you for bringing this to our attention and for the detailed comparison! 🌟 Your insights are invaluable in helping us improve pkgx.

Based on your feedback, it's clear that the performance of pkgx's shim is not on par with your expectations, especially when compared to alternatives like Volta. The delay introduced by our shim, as you've rightly pointed out, can accumulate significantly in scripts and CI/CD environments, impacting overall efficiency.

We're taking your suggestion seriously and are considering the development of a more optimized pkgx-shim, potentially in Rust, to minimize the overhead. Rust's performance characteristics make it an excellent candidate for this kind of tool, as demonstrated by Volta's success.

In the meantime, here are a couple of workarounds that might help mitigate the issue:

  1. Direct Invocation: For critical scripts where performance is paramount, consider invoking the binary directly from its installed location, bypassing the shim. While this approach requires managing the path more explicitly, it can significantly reduce the overhead in time-sensitive operations.

  2. Batching Commands: When possible, batch operations to minimize the number of shim invocations. This approach can help reduce the cumulative delay in scenarios where multiple pkgx-managed tools are used in succession.

We appreciate your patience and understanding as we work on this. Your feedback is crucial in helping us make pkgx better for everyone. Please keep the suggestions coming, and let us know if there are any other ways we can assist!

Best,
Dosu

About Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@jhheider
Copy link
Contributor

jhheider commented Apr 3, 2024

I think the current optimization would be to add them to your environment:

% pkgx hyperfine "pkgx node --version"
Benchmark 1: pkgx node --version
  Time (mean ± σ):     283.2 ms ±   6.2 ms    [User: 212.4 ms, System: 36.7 ms]
  Range (min … max):   276.8 ms … 297.9 ms    10 runs

% dev
env +nodejs.org
% pkgx hyperfine "node --version"
Benchmark 1: node --version
  Time (mean ± σ):      18.6 ms ±   0.5 ms    [User: 16.2 ms, System: 1.6 ms]
  Range (min … max):    17.8 ms …  20.7 ms    143 runs

the reason is that the dev environment builds the path with your tools, so you don't get those execution lags. if you're using GitHub actions, the dev action, rather than the setup action handles this. In your Jenkins environment, you should be able to do similarly by invoking dev to bring in all the tools defined for your environment.

I agree that the installer shims add too much overhead for those kinds of environments. You can also do the same thing for a script (making it a 1-time cost, instead of a per-call cost) by using a pkgx shebang.

@felipecrs
Copy link
Contributor Author

My intention is to not have to refactor my scripts. They already execute in docker containers, which is supposed to provide them the programs in the environment.

And dev can't be used in containers.

> docker run --rm pkgxdev/pkgx dev
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "dev": executable file not found in $PATH: unknown.

Or any environment where shell is not used to invoke commands.

@jhheider
Copy link
Contributor

jhheider commented Apr 3, 2024

yes, that's a trickier one. max has a few tricks in https://docs.pkgx.sh/run-anywhere/docker, but possibly also something like

RUN pkgx +{list of tools} | sed 's/^/export /' >>.bashrc

could work. definitely a thorny problem.

@felipecrs
Copy link
Contributor Author

That only works if bash is used to invoke programs, which is not always the case for my pipelines. :(

@jhheider
Copy link
Contributor

jhheider commented Apr 3, 2024

yes, it might take some trickery for other shells. if it's just path modifications that are needed, you could probably manipulate that manually as well. just brainstorming paths forward.

@felipecrs
Copy link
Contributor Author

yes, it might take some trickery for other shells.

Especially if no shell is used (docker exec for example).

if it's just path modifications that are needed, you could probably manipulate that manually as well. just brainstorming paths forward.

I think this is impossible within a Dockerfile... You can't generate or change ENV instructions from commands. And will probably never be (moby/moby#29110).

But it is not only about changing PATH. Some pkgx programs has important environment variables like JAVA_HOME for openjdk.

@jhheider
Copy link
Contributor

jhheider commented Apr 3, 2024

yes, we try to avoid needing environment, but it's not perfectly possible.

of course you can always hard-code stuff, but it loses much of the interactive shell magic of pkgx:

RUN pkgx +llvm.org=17.0.1 && ln -s ~/.pkgx/llvm.org/v17.0.1/bin/* /usr/bin/

@felipecrs
Copy link
Contributor Author

felipecrs commented Apr 3, 2024

That's interesting. I built a script like like:

#!/usr/bin/env bash

set -euxo pipefail

pkgx_output=$(
  pkgx \
    +node@20 \
    +npm@10 \
    [email protected] \
    +yq@4 \
    +k3d@5 \
    +helm@3 \
    [email protected] \
    [email protected] \
    [email protected]
)
set +u
eval "${pkgx_output}"
set -u
unset pkgx_output

# create symbolic links for each pkgx binary in $HOME/.local/bin
# shellcheck disable=SC2016
echo "${PATH}" | tr ':' '\n' | grep "^${HOME}/.pkgx/" \
  | xargs -r -n1 bash -c 'ln -vfs "${1}"/* "${HOME}/.local/bin/"' --

# Test
node --version

But then:

+ node --version
node: error while loading shared libraries: libcrypto.so.1.1: cannot open shared object file: No such file or directory

Which probably means nodejs requires some of these extra environment variables as well.

Only some pkgs would be ok with this approach, mainly the ones that has no runtime dependencies.

@felipecrs
Copy link
Contributor Author

Without leaving JS' world:

hyperfine --warmup 10 --shell=none "./deno-test" "./bun-test" './llrt console.js'
Benchmark 1: ./deno-test
  Time (mean ± σ):      21.5 ms ±   1.1 ms    [User: 10.9 ms, System: 9.1 ms]
  Range (min … max):    19.2 ms …  27.8 ms    142 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: ./bun-test
  Time (mean ± σ):      11.7 ms ±   0.7 ms    [User: 4.3 ms, System: 7.1 ms]
  Range (min … max):    10.4 ms …  19.7 ms    277 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 3: ./llrt console.js
  Time (mean ± σ):       3.4 ms ±   0.2 ms    [User: 2.9 ms, System: 1.0 ms]
  Range (min … max):     3.0 ms …   4.1 ms    921 runs
 
Summary
  ./llrt console.js ran
    3.44 ± 0.27 times faster than ./bun-test
    6.34 ± 0.44 times faster than ./deno-test

This indicates that even deno takes 20ms to boot a simple js file. Considering that pkgx takes around 500ms to finally exec the program, it looks like some optimization can be done in pkgx itself as well.

@felipecrs
Copy link
Contributor Author

I was thinking about this and I came up with a very good idea, which I believe is easy enough to be implemented and used until we can make pkgx itself faster.

Basically, make the stubs like:

#!/bin/sh
if [ "$PKGX_UNINSTALL" != 1 ]; then
+ if [ -f "${PKGX_DIR:-"$HOME/.pkgx"}/binenv/node.sh" ]; then
+   set -a
+   . "${PKGX_DIR:-"$HOME/.pkgx"}/binenv/node.sh"
+   exec node "$@"
+ fi
  exec pkgx +nodejs.org -- node "$@"
else
  cd "$(dirname "$0")"
  rm node && echo "uninstalled: nodejs.org" >&2
fi

Then, we also need to ensure pkgx always generates and keeps these binenv files up to date when a sync operation is executed.

For PoC purposes:

#!/bin/sh
if [ "$PKGX_UNINSTALL" != 1 ]; then
  if [ -f "${PKGX_DIR:-"$HOME/.pkgx"}/binenv/node.sh" ]; then
    set -a
    . "${PKGX_DIR:-"$HOME/.pkgx"}/binenv/node.sh"
    exec node "$@"
  fi
+ mkdir -p "${PKGX_DIR:-"$HOME/.pkgx"}/binenv"
+ pkgx +nodejs.org | tee "${PKGX_DIR:-"$HOME/.pkgx"}/binenv/node.sh" >/dev/null

  exec pkgx +nodejs.org -- node "$@"
else
  cd "$(dirname "$0")"
  rm node && echo "uninstalled: nodejs.org" >&2
fi

This writes the binenv file if it hasn't been written yet.

Benchmarking:

image

The results are great. I'd say they pay off easily.

@mxcl what do you think? Would you accept a PR?

@shpoont
Copy link

shpoont commented Apr 13, 2024

I'd like to add that currently pkgx shim is slower than running docker container on Mac.

This:
time pkgx whoami > /dev/null

is slower than this:
time docker run --rm -it alpine whoami > /dev/null

@shpoont
Copy link

shpoont commented Jun 7, 2024

Another note - since pkgx install essentially creates a wrapper for pkgx <command>, it makes anything installed with pkgx also slow.

@felipecrs
Copy link
Contributor Author

@shpoont FYI this was the issue's point already, Programs installed by pkgx.

@yacoob
Copy link

yacoob commented Aug 8, 2024

Just bumped into this, and the slowdown is substantial:

Running rye installed via pkgx:

❯ head $(which rye)
#!/bin/sh
if [ "$PKGX_UNINSTALL" != 1 ]; then
  exec pkgx +rye.astral.sh -- rye "$@"
else
  cd "$(dirname "$0")"
  rm rye && echo "uninstalled: rye.astral.sh" >&2
fi%
❯ time rye --version
rye 0.38.0
commit: 0.38.0 (2024-08-02)
platform: linux (x86_64)
self-python: [email protected]
symlink support: true
uv enabled: true
rye --version  0.618 user 0.288 system 125% cpu (0.722 wasted time).

Running rye directly on the same host:

❯ file $(which rye)
/home/yacoob/.rye/shims/rye: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), static-pie linked, not stripped
❯ time rye --version
rye 0.38.0
commit: 0.38.0 (3e3c8540f 2024-08-02)
platform: linux (x86_64)
self-python: [email protected]
symlink support: true
uv enabled: true
rye --version  0.002 user 0.000 system 48% cpu (0.003 wasted time).

That's two orders of magnitude slower :[. And granted, it's only annoying in case of rye, which you don't run all that often. But then I've installed direnv via pkgx, which starship calls to construct the prompt, and suddenly every prompt takes more than half a second to appear, which feels sluggish.

@mxcl
Copy link
Member

mxcl commented Sep 5, 2024

I have patches to use an sqlite cache for pkg resolution which is as yet not added to any release because the mechanism by which we use sqlite is dodgy.

It would fix this tho.

Sorry for the slowness.

Copy link

dosubot bot commented Sep 5, 2024

I found a similar open issue that might be relevant to your concern:

Regarding the performance of the pkgx shim, there are no existing performance benchmarks or profiling data available in the repository [2]. However, there is a comment in the codebase suggesting a potential efficiency improvement by maintaining a separate write file-pointer [3].

Additionally, there are no current discussions or plans in the repository about optimizing the shim by rewriting it in a more performant language like Rust [2].

Given your observations and comparisons, it might be worth opening a new issue or contributing to the existing discussions to propose the idea of optimizing the pkgx shim, possibly by rewriting it in Rust for better performance.

To continue talking to Dosu, mention @dosu.

@mxcl mxcl closed this as completed in #1008 Sep 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants