Avoid redundant package upgrades #25

Open
nicwaller opened this issue Mar 17, 2015 · 2 comments

@nicwaller

Using the :upgrade action with the package provider in this cookbook results in excessive re-installation of packages because install_package is invoked unconditionally.

R will replace the installed package even when no newer version is available, which can break any job that tries to use the package while it is being re-installed. Since Chef runs every half hour, this can happen frequently on a busy server.

I've written a short Ruby method that generates a bash command which succeeds (exit status 0) if the package needs to be updated. It does this by comparing the version listed in the PACKAGES manifest against the Version field in the DESCRIPTION of the package in the site-library.

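def r_package_needs_update(package_name)
  # NOTE: both paths are hard-coded for one specific server layout.
  available_packages_file = "/opt/R/src/contrib/PACKAGES"
  installed_package_manifest = "/usr/local/lib/R/site-library/#{package_name}/DESCRIPTION"
  # Match the package name exactly so that e.g. "Rcpp" does not also match "RcppEigen".
  sh_available_version = "awk '$1 == \"Package:\" && $2 == \"#{package_name}\" {P=1} P==1 && /^Version:/ {print $2; exit}' #{available_packages_file}"
  sh_installed_version = "awk '/^Version:/ {print $2; exit}' #{installed_package_manifest}"
  # Quote both substitutions so the test stays valid when either lookup prints nothing.
  "[ \"$(#{sh_available_version})\" != \"$(#{sh_installed_version})\" ]"
end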

This seems to work, but would need to be generalized instead of hard-coding the paths. The check executes very quickly (~0.01s) compared to calling into R (~0.50s) but I'm not sure if this is a durable approach for checking package versions.
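One way the check could be generalized is to do the comparison in plain Ruby and pass the file contents in, rather than baking paths into a shell string. The following is only a sketch under assumptions: the helper names (`dcf_field`, `r_version_parts`, `r_package_needs_update?`) are hypothetical and not part of this cookbook, and it assumes R version strings are integers separated by `.` or `-`. Unlike the shell test, it reports an update only when the available version is strictly newer.

```ruby
# Return the value of `field` from DCF-style text (the format used by R's
# PACKAGES and DESCRIPTION files). If package_name is given, only the
# record whose Package field matches it exactly is considered.
def dcf_field(text, field, package_name = nil)
  text.split(/\n\s*\n/).each do |record|
    pairs = record.scan(/^([^\s:]+):[ \t]*(.*)$/)
    fields = pairs.map { |key, value| [key, value.strip] }.to_h
    next if package_name && fields["Package"] != package_name
    return fields[field] if fields.key?(field)
  end
  nil
end

# Split an R version such as "1.0-3" into integers: [1, 0, 3].
def r_version_parts(version)
  version.split(/[.-]/).map(&:to_i)
end

# True when the available version is strictly newer than the installed one,
# or when the package is not installed at all (description_text is nil).
def r_package_needs_update?(package_name, packages_text, description_text)
  available = dcf_field(packages_text, "Version", package_name)
  installed = description_text && dcf_field(description_text, "Version")
  return false if available.nil? # nothing known to upgrade to
  return true if installed.nil?  # not installed yet
  (r_version_parts(available) <=> r_version_parts(installed)) > 0
end
```

A caller would read the two files itself, for example with `File.read` on whatever PACKAGES and DESCRIPTION paths apply to the node, and pass `nil` for the description when the package directory does not exist.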

@stevendanna
Owner

@nicwaller Apologies that I never responded to this. Somehow notifications got turned off for this repository for me. I'll take a look at the function you've created here to see what we can do.

@nicwaller
Author

I had forgotten about this issue! Warning: This function doesn't actually work.

For our internal (non-CRAN) packages, we run a daily cron task that downloads the latest available packages from S3 into a subfolder under /opt. That local copy is what allows the rapid comparison of installed versus available versions shown in the original function.

But obviously that doesn't work for CRAN packages, because the latest version cannot be known without a lookup against the CRAN repository. If you want to look up the latest available version of a package in the CRAN repo, you can do something like this:

PACKAGE=Rcpp;
echo "available.packages()['$PACKAGE','Version']" |
  R --no-save --no-restore -q |
  awk -F\" '/^\[1\]/ {print $2}'

If this were used to guard package upgrades, I would expect it to slightly improve Chef run time, but more importantly it would avoid breaking running processes that depend on libraries which get re-installed every half hour.
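For anyone who wants to try that, here is a rough Ruby equivalent of the shell pipeline above, which could be called from a guard. Treat it as a sketch rather than a tested provider change: the method names and the `r_binary` parameter are hypothetical, it assumes an `R` executable on the PATH with a repository already configured for non-interactive use, and it assumes the package name comes from a trusted source since it is interpolated into R code.

```ruby
require "open3"

# Extract the version from R console output such as: [1] "1.0.10"
# Returns nil when no such line is present.
def parse_r_version_output(output)
  match = output.match(/^\[1\]\s+"([^"]+)"/)
  match && match[1]
end

# Ask R for the latest version of a package in the configured repository.
# Returns nil if R exits with an error (for example, unknown package).
def cran_available_version(package_name, r_binary = "R")
  script = "available.packages()['#{package_name}','Version']"
  output, status = Open3.capture2(
    r_binary, "--no-save", "--no-restore", "-q", stdin_data: script
  )
  status.success? ? parse_r_version_output(output) : nil
end
```

The result still has to be compared against the installed version before deciding to upgrade, and each call pays the R startup cost mentioned earlier (~0.50s), plus the network lookup.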

Our workaround is to just use the :install action all the time, then only temporarily set :upgrade when we know it is really required. This has been working well enough for us.
