support binary + source packages in Nexus repositories #1074

Closed
kevinushey opened this issue Sep 7, 2022 · 15 comments

@kevinushey
Collaborator

Many thanks @kevinushey for your very quick and positive answer!

To keep it simple, Nexus (aka Nexus Repository Manager, by Sonatype) is deployed by my IT teams on our company's servers.
Nexus serves two main purposes:

  • Proxying external repositories (such as CRAN)
  • Persistently storing packages downloaded from these repositories for future use (cache functionality).

In the Rprofile.site we deploy on our machines, the repository is set to the Nexus root URL for the R repository (a mirror of CRAN). Thanks to this configuration:

  • When I install an R package that nobody in my company has installed yet (same package, same version, same type, architecture if binary, etc.), Nexus downloads it from CRAN, serves it to me and stores it for future use
  • Afterwards, when a colleague (or I, from another machine) tries to install it, Nexus serves the package it has already stored internally.

The Nexus storage layout is the same as that of the original CRAN repository. For example:

  • R source packages will be stored in /src/contrib/<PACKAGE_NAME>_<PACKAGE_VERSION>.tar.gz
  • R windows binary packages will be stored in /bin/windows/contrib/<R_VERSION_X.Y>/<PACKAGE_NAME>_<PACKAGE_VERSION>.zip
  • and so on.

The main difference with CRAN is that older versions are kept, and still available thanks to Nexus internal storage functionality.

Binary packages

Here is an example for binary package rlang in my Nexus architecture:

___ bin
    |___ windows
         |___ contrib
              |___ 4.1
                   |___ rlang_1.0.3.zip
                   |___ rlang_1.0.4.zip
                   |___ rlang_1.0.5.zip
                   |___ PACKAGES
                   |___ PACKAGES.gz		 
                   |___ PACKAGES.rds

As said above, older versions are still available for download, even though they are not explicitly listed in PACKAGES.gz (which only includes the latest version, since it is the proxied copy of CRAN's PACKAGES.gz file):

Package: rlang
Version: 1.0.5
Depends: R (>= 3.4.0)
Imports: utils
Suggests: cli (>= 3.1.0), covr, crayon, fs, glue, knitr, magrittr,
        methods, pillar, rmarkdown, stats, testthat (>= 3.0.0), tibble,
        usethis, vctrs (>= 0.2.3), withr
License: MIT + file LICENSE

The idea would be to determine the theoretical URL and try it when restoring the project, if the requested package version is not explicitly listed in PACKAGES.gz.
In our example, if the renv.lock file contains the record rlang@1.0.4, we should try the URL bin/windows/contrib/4.1/rlang_1.0.4.zip.
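
As a rough illustration of the "theoretical URL" idea, the sketch below builds the binary URL from a lockfile-style record. The helper name nexus_binary_url and the example repository URL are assumptions made for illustration; they are not part of renv.

```r
# Hypothetical helper (not part of renv): build the URL at which a
# CRAN-like repository would serve one specific Windows binary version.
nexus_binary_url <- function(repo, package, version, r_version = "4.1") {
  repo <- sub("/+$", "", repo)                    # avoid a doubled slash
  file <- sprintf("%s_%s.zip", package, version)  # e.g. rlang_1.0.4.zip
  paste(repo, "bin/windows/contrib", r_version, file, sep = "/")
}

# The repository URL below is a placeholder, not a real server.
print(nexus_binary_url("https://nexus.example.org/repository/r", "rlang", "1.0.4"))
```

Whether the file actually exists at that URL can only be known by requesting it, which is why this is described as a fallback to try rather than a lookup.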

Source packages

The same idea could also be used for source packages, in order to take advantage of the Nexus storage functionality.
Here is an example for the source package rlang in my Nexus architecture:

___ src
    |___ contrib
         |___ rlang_1.0.3.tar.gz
         |___ rlang_1.0.4.tar.gz
         |___ rlang_1.0.5.tar.gz
         |___ PACKAGES
         |___ PACKAGES.gz		 
         |___ PACKAGES.rds

Again, older versions are still available for download, even though they are not explicitly listed in PACKAGES.gz (which only includes the latest version, since it is the proxied copy of CRAN's PACKAGES.gz file):

Package: rlang
Version: 1.0.5
Depends: R (>= 3.4.0)
Imports: utils
Suggests: cli (>= 3.1.0), covr, crayon, fs, glue, knitr, magrittr,
        methods, pillar, rmarkdown, stats, testthat (>= 3.0.0), tibble,
        usethis, vctrs (>= 0.2.3), withr
Enhances: winch
License: MIT + file LICENSE
MD5sum: 0419b9b94b400f3ec1f2792ab7f228e2
NeedsCompilation: yes

I saw the code you wrote in the retrieve.R file for the renv_retrieve_repos_archive_path() function.
I understand (but am not 100% sure) that when the requested package is in neither the binary PACKAGES.gz file nor the source PACKAGES.gz file, this function:

  • first tries to find the package in "src/contrib/Archive/<PACKAGE_NAME>/"
  • then in "src/contrib/" (to deal with the initial request above).

(I am deliberately ignoring the step related to issue #602, to keep this post as simple as possible!)

If the package has been moved to Archive subfolder in CRAN, Nexus will download it and store it with the same architecture, in folder "src/contrib/Archive/<PACKAGE_NAME>/".

In the end, the same source package will be duplicated in the Nexus storage: both in "src/contrib/" (initial download) and in "src/contrib/Archive/<PACKAGE_NAME>/" (second download, once the package has been moved to Archive on CRAN):

___ src
    |___ contrib
         |___ Archive
              |___ rlang
                   |___ rlang_1.0.3.tar.gz          # redundant storage, since already available
                   |___ rlang_1.0.4.tar.gz          # in Nexus outside of the Archive folder
         |___ rlang_1.0.3.tar.gz
         |___ rlang_1.0.4.tar.gz
         |___ rlang_1.0.5.tar.gz
         |___ PACKAGES
         |___ PACKAGES.gz		 
         |___ PACKAGES.rds

Avoiding this duplication would prevent an unnecessary increase in storage volume.
For Nexus-like configurations, I would suggest trying "src/contrib/" first, and then "src/contrib/Archive/<PACKAGE_NAME>/".
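
The suggested lookup order can be sketched as a function returning the candidate source URLs in the order they would be tried. The function name and the example repository URL are hypothetical and only serve the illustration.

```r
# Hypothetical helper (not part of renv): candidate source URLs for a
# Nexus-like repository, in the order suggested above.
source_candidate_urls <- function(repo, package, version) {
  repo <- sub("/+$", "", repo)
  file <- sprintf("%s_%s.tar.gz", package, version)
  c(
    paste(repo, "src/contrib", file, sep = "/"),                  # tried first
    paste(repo, "src/contrib/Archive", package, file, sep = "/")  # tried second
  )
}

# The repository URL below is a placeholder, not a real server.
print(source_candidate_urls("https://nexus.example.org/repository/r", "rlang", "1.0.4"))
```

Trying "src/contrib/" first means a copy that Nexus already holds is reused before a second copy is requested from the Archive path.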

Conclusion

In a nutshell, my suggested sequence would be:

  1. renv_retrieve_repos_binary (when binary explicitly requested by user)
  2. renv_retrieve_repos_binary_older (when binary explicitly requested by user) – new step
  3. renv_retrieve_repos_mran (when binary explicitly requested by user and MRAN enabled)
  4. renv_retrieve_repos_source
  5. renv_retrieve_repos_source_older – new step
  6. renv_retrieve_repos_archive

I don't know whether repository managers like Nexus are widely used by R users. If you don't want to systematically try older versions, a user-level setting to trigger steps 2 and 5 could make sense:
For instance, a new option renv.config.retrieve.try.older (or RENV_CONFIG_RETRIEVE_TRY_OLDER as an environment variable), defaulting to FALSE.
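
A minimal sketch of how such an opt-in could be read, assuming the R option takes precedence over the environment variable. Both names come from the proposal above and are not an existing renv setting; the function name is hypothetical.

```r
# Hypothetical reader for the proposed setting (illustration only).
try_older_enabled <- function() {
  opt <- getOption("renv.config.retrieve.try.older")
  if (!is.null(opt))
    return(isTRUE(as.logical(opt)))   # the R option wins when it is set
  env <- Sys.getenv("RENV_CONFIG_RETRIEVE_TRY_OLDER", unset = "FALSE")
  isTRUE(as.logical(env))             # anything not parseable as TRUE is FALSE
}

print(try_older_enabled())
```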

If you have any questions, please contact me.
And if you would prefer that I create a new issue on GitHub, please tell me.
Kind regards
Arnaud

Originally posted by @arnauddeblic in #595 (comment)

@kevinushey
Collaborator Author

@arnauddeblic: do you know if there's a way for me to determine whether a repository URL is associated with a Nexus repository? E.g. is there some file or header I can query at the repository URL to determine that?

@kevinushey
Collaborator Author

kevinushey commented Sep 16, 2022

I've tried making some changes to support this in a582231; if you want to test you can try something like:

options(renv.nexus.enabled = TRUE)
renv::install("<package>")

and see if renv is able to find a binary package at the Nexus "fallback" location.

If there's a way for me to query whether a repository is a Nexus repository, then I could eliminate the need to set an R option to opt-in to this behavior.

@arnauddeblic

Dear @kevinushey,

Many thanks for addressing this issue so quickly!

To answer both your questions:

  1. Concerning your implementation:

The fallback function is called and a URL is requested - this is a good start.
However, the requested URL is not correct. Based on the original example, with rlang@1.0.4,
instead of requesting <repo>/bin/windows/contrib/4.1/rlang_1.0.4.zip, the code requests <repo>/rlang_1.0.4.zip:
image

The same happens for source packages:
Instead of requesting <repo>/src/contrib/rlang_1.0.4.tar.gz, the code requests <repo>/rlang_1.0.4.tar.gz:
image

  2. Concerning the way of querying whether a repository is a Nexus repository:

Since Nexus is a proxy (and cache) system, I'm afraid there is no special file that could help.
The HTTP header could be a solution (at least from what I can observe using my company's installation of Nexus).
When I request the repo using Postman, I get in the response a Server header with value Nexus/3.25.1-04 (OSS):
image
In this configuration, checking whether the Server header contains nexus (case-insensitively) would provide you with the information.

Please note:

  • Although the Nexus documentation does not specify that the repo URL should end with a trailing slash, the HTTP status code will not be 200 if there is no trailing slash:
    image
  • The same applies to "subfolders": all of these URLs will return a 404 status code
    . /bin
    . /bin/windows
    . /bin/windows/contrib
    . /bin/windows/contrib/4.1
    . /src
    . /src/contrib
    whereas the same URLs with a trailing slash will return a 200 status code.
  • For files (PACKAGES and PACKAGES.gz for instance), the status code will be 200 as well.
  • => Make sure the URL you send ends with a trailing slash unless it is a file URL.

Since we cannot be sure Nexus will always send this header (companies sometimes change their name or their product names, you know it better than me ;) ), maybe you could back up this header check with your renv.nexus.enabled option:
if the Server header contains nexus or if the option renv.nexus.enabled is TRUE, then...
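
The rule above can be sketched as follows, assuming the response headers are already available as a character vector with one header per line. All three function names are hypothetical; only the renv.nexus.enabled option name comes from this thread.

```r
# Hypothetical helpers (not renv's implementation).

# TRUE when a "Server:" header mentions Nexus, ignoring case.
is_nexus_server <- function(header_lines) {
  server <- grep("^server:", header_lines, ignore.case = TRUE, value = TRUE)
  any(grepl("nexus", server, ignore.case = TRUE))
}

# The explicit option acts as a safety net if the header is ever missing.
nexus_enabled <- function(header_lines) {
  isTRUE(getOption("renv.nexus.enabled")) || is_nexus_server(header_lines)
}

# Directory-like URLs need a trailing slash; file URLs should not be passed here.
with_trailing_slash <- function(url) {
  if (grepl("/$", url)) url else paste0(url, "/")
}

headers <- c("HTTP/1.1 200 OK", "Server: Nexus/3.25.1-04 (OSS)")
print(is_nexus_server(headers))
```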

Other question
I have one last remark / question concerning part of the code I have just seen in this latest version of your retrieve.R file.
In the current CRAN version of renv, retrieving from source was always added to the methods list - unless pkgType option was not source. With such an algorithm, when the pkgType option was set to binary, retrieving from source was tried if no binary was found. I was quite comfortable with that implementation.
In this new version, I understand that retrieving from source is no longer added to the methods list if the pkgType option is set to binary: srcok <- pkgtype %in% c("both", "source").
I'm not sure I understand the reason for this change. From what I understand, the pkgType option is supposed to set the preferred installation method. Have you considered using the option install.packages.check.source? Maybe renv should add "retrieve from source" to the methods list, unless install.packages.check.source is explicitly set to no.

If you have any questions, please contact me.
Kind regards
Arnaud

kevinushey added a commit that referenced this issue Sep 16, 2022
@kevinushey
Collaborator Author

Thanks! I've made the changes required (I think) to support the Nexus URLs properly. It might take a bit more iteration to refine but I think we're getting there.

Re: your question on srcok <- pkgtype %in% c("both", "source"); in R, the pkgType option defaults to "both":

> getOption("pkgType")
[1] "both"

and renv tries to respect that choice. In this situation, R (and renv) prefer installing binaries if available, but will fall back to source packages if not.
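
As a simplified sketch of that behavior: the srcok condition is the one quoted earlier in this thread, while the binok condition and the function name are assumptions made for illustration and may not match renv's actual code.

```r
# Simplified, hypothetical view of which retrieval methods are attempted.
retrieval_methods <- function(pkgtype = getOption("pkgType")) {
  binok <- pkgtype %in% c("both", "binary")   # assumption for illustration
  srcok <- pkgtype %in% c("both", "source")   # condition quoted in this thread
  c(if (binok) "binary", if (srcok) "source") # binaries are preferred when allowed
}

print(retrieval_methods("both"))
```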

From what I can see in the R sources:

https://github.com/wch/r-source/blob/18d16095f36e28862d125d88659bda28d93d0269/src/library/utils/R/packages2.R#L547-L548

R uses the install.packages.check.source option to allow a fallback to the source repository even if a binary repository was explicitly requested.

@arnauddeblic

Many thanks @kevinushey for your support.

I tried your new implementation:

Diagnostics:

  • with pkgType = binary and the renv.nexus.enabled option set to TRUE: OK
    image

  • with pkgType = source and the renv.nexus.enabled option set to TRUE: OK
    image

  • with pkgType = binary and the renv.nexus.enabled option unset (defaults to FALSE): /!\ not OK (the package was retrieved from Archive)
    image

  • with pkgType = source and the renv.nexus.enabled option unset (defaults to FALSE): /!\ not OK (the package was retrieved from Archive)
    image

I spent some time debugging and found the problem:
The Nexus server returns a 404 status code when curl sends a HEAD request; see the renv-headers temp file:

HTTP/1.1 404 Not Found
Date: Fri, 16 Sep 2022 19:54:00 GMT
Server: Nexus/3.25.1-04 (OSS)
X-Content-Type-Options: nosniff
Content-Security-Policy: sandbox allow-forms allow-modals allow-popups allow-presentation allow-scripts allow-top-navigation
X-XSS-Protection: 1; mode=block
Pragma: no-cache
Cache-Control: no-cache, no-store, max-age=0, must-revalidate, post-check=0, pre-check=0
Expires: 0
X-Frame-Options: DENY
Content-Type: text/html
Content-Length: 2071
Set-Cookie: e1c2a849e31cf572844da4b9bd2d0f31=bd4f901c2a2c2c04b50478be164e36fa; path=/; HttpOnly

When the HEAD parameter is removed from the curl configuration file, the Nexus server returns a 200 status code.
The full page is served; this is very little data when the repo is Nexus, since the page basically tells you:

This r group repository is not directly browseable at this URL.

Please use the [browse] or [HTML index] views to inspect the contents of this repository.

HTTP/1.1 200 OK
Date: Fri, 16 Sep 2022 20:10:58 GMT
Server: Nexus/3.25.1-04 (OSS)
X-Content-Type-Options: nosniff
Content-Security-Policy: sandbox allow-forms allow-modals allow-popups allow-presentation allow-scripts allow-top-navigation
X-XSS-Protection: 1; mode=block
Content-Type: text/html
Content-Length: 2403
Set-Cookie: e1c2a849e31cf572844da4b9bd2d0f31=bd4f901c2a2c2c04b50478be164e36fa; path=/; HttpOnly
Cache-control: private


<!DOCTYPE html>
<html lang="en">
<head>
  <title>Repository - Nexus Repository Manager</title>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>


  <!--[if lt IE 9]>
  <script>(new Image).src="https://**********************.fr/favicon.ico?3.25.1-04"</script>
  <![endif]-->
  <link rel="icon" type="image/png" href="https://**********************.fr/favicon-32x32.png?3.25.1-04" sizes="32x32">
  <link rel="mask-icon" href="https://**********************.fr/safari-pinned-tab.svg?3.25.1-04" color="#5bbad5">
  <link rel="icon" type="image/png" href="https://**********************.fr/favicon-16x16.png?3.25.1-04" sizes="16x16">
  <link rel="shortcut icon" href="https://**********************.fr/favicon.ico?3.25.1-04">
  <meta name="msapplication-TileImage" content="https://**********************.fr/mstile-144x144.png?3.25.1-04">
  <meta name="msapplication-TileColor" content="#00a300">

  <link rel="stylesheet" type="text/css" href="https://**********************.fr/static/css/nexus-content.css?3.25.1-04"/>
</head>
<body>
<div class="nexus-header">
  <a href="https://**********************.fr">
    <div class="product-logo">
      <img src="https://**********************.fr/static/images/nexus.png?3.25.1-04" alt="Product logo"/>
    </div>
    <div class="product-id">
      <div class="product-id__line-1">
        <span class="product-name">Nexus Repository Manager</span>
      </div>
      <div class="product-id__line-2">
        <span class="product-spec">OSS 3.25.1-04</span>
      </div>
    </div>
  </a>
</div>

<div class="nexus-body">
  <div class="content-header">
    <img src="https://**********************.fr/static/rapture/resources/icons/x32/database.png?3.25.1-04" alt="Repository image"/>
    <span class="title">Repository</span>
    <span class="description">r</span>
  </div>

  <div class="content-body">
    <div class="content-section">
      <p>
        This r group repository is not directly browseable at this URL.
      </p>

      <p>
        Please use the <a href="https://**********************.fr/#browse/browse:r">browse</a>
        or <a href="https://**********************.fr/service/rest/repository/browse/r/">HTML index</a>
        views to inspect the contents of this repository.
      </p>
    </div>
  </div>
</body>
</html>

I don't know if there is an option other than HEAD to limit the amount of data served when requesting a URL. If there is none, maybe we could store the result of the renv_nexus_enabled() function for a given repo, so that the function is not triggered for every package to be restored.
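
The caching idea can be sketched with an environment keyed by repository URL. The names nexus_cache and cached_nexus_check are hypothetical, and check stands for whatever function performs the real detection request.

```r
# Hypothetical per-repository cache (not renv's implementation).
nexus_cache <- new.env(parent = emptyenv())

cached_nexus_check <- function(repo, check) {
  # Run the (possibly slow) check only the first time a repo is seen.
  if (!exists(repo, envir = nexus_cache, inherits = FALSE))
    assign(repo, check(repo), envir = nexus_cache)
  get(repo, envir = nexus_cache, inherits = FALSE)
}

# Usage with a stand-in check that just matches on the URL text.
fake_check <- function(repo) grepl("nexus", repo, ignore.case = TRUE)
print(cached_nexus_check("https://nexus.example.org/repository/r/", fake_check))
```

With this shape, restoring many packages from the same repository triggers a single detection request per session.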

Additional question concerning download method:

In your implementation, you seem to prefer the curl method for downloads (see function renv_repos_info_impl()).
On the Windows servers of my intensive computing grid, I had to set RENV_DOWNLOAD_FILE_METHOD = wininet in the Renviron.site, since I did not manage to renv::install packages with the default settings. Do you think this could be a problem for future renv users using wininet like me? As far as I am concerned, I will set the renv.nexus.enabled option to TRUE on these machines, so it will be OK ;)

Kind regards

Arnaud

@kevinushey
Collaborator Author

kevinushey commented Sep 16, 2022

Thanks! I think it should be okay to just perform a regular web request at that endpoint; it's unlikely the returned data would be large from any typical CRAN mirror. It would also allow us to use arbitrary downloaders as well (so no need to force the use of curl).

Loose ends should be tied up on the main branch now. Thanks for the feedback; fingers crossed that this gets us over the finish line!

@arnauddeblic

I will give you feedback as soon as I test.
Kind regards

@arnauddeblic

arnauddeblic commented Sep 19, 2022

Dear @kevinushey,

Thanks for your reply and for the new, improved implementation.

I tested the updated renv with 4 configurations:

  • pkgType option set to binary, renv.nexus.enabled option unset
  • pkgType option set to binary, renv.nexus.enabled option set to TRUE
  • pkgType option set to source, renv.nexus.enabled option unset
  • pkgType option set to source, renv.nexus.enabled option set to TRUE

on several Windows environments:

  • Laptop (Windows 10)
  • VDI (Windows 10)
  • Server (Windows Server 2016)

using 2 different methods:

  • from RStudio, within an RStudio project (on laptop and VDI)
  • from R.exe run from the command line, within a project directory (on laptop, VDI, and server)

Diagnostics

Results are OK everywhere, with all 4 configurations, apart from one strange behavior, see below:

Environment | Method                                                      | Status
Laptop      | RStudio, within an RStudio project                          | OK
Laptop      | R.exe run from the command line, within a project directory | OK
VDI         | RStudio, within an RStudio project                          | OK
VDI         | R.exe run from the command line, within a project directory | OK
Server      | R.exe run from the command line, within a project directory | OK - but strange behavior, see below

Strange behavior observed on Windows Server

When restoring a renv project on Windows Server (using R.exe run from command line):

  • everything works fine (my old package versions correctly restore from Nexus, whatever the configuration)
  • BUT a new empty directory called NULL is created in the project directory:
    image

To make sure, I tested with the renv CRAN version, and I confirm this strange behavior does not occur with the released version: the empty NULL directory is not created when restoring a renv project with it.
I don't think it is related to the recent developments dealing with the Nexus issue. Probably another development made between the CRAN release and the version I tested?

If you have any questions, please contact me.
Kind regards
Arnaud

P.S.:

  • Do you already have a rough idea of the next CRAN release date, including this new Nexus feature?
  • In the meantime, can I consider "renv.nexus.enabled" to be the definitive option name? (I'm currently preparing the Renviron.site and Rprofile.site configuration files for deployment in production in my company)

@kevinushey
Collaborator Author

Great news -- thanks for taking the time to test.

> Do you already have a rough idea of the next CRAN release date, including this new Nexus feature?

I'm hoping to prepare a new release in the coming weeks.

> In the meantime, can I consider "renv.nexus.enabled" to be the definitive option name? (I'm currently preparing the Renviron.site and Rprofile.site configuration files for deployment in production in my company)

Yes, we can consider the option here stable.

> I don't think it is related to the recent developments dealing with the Nexus issue. Probably another development made between the CRAN release and the version I tested?

Thanks for the heads up here -- I'll see if I can figure out where this is coming from.

@kevinushey
Collaborator Author

Regarding the NULL directory, it might be helpful if you could also test with code of the following form:

trace(dir.create, quote({
  if (grepl("NULL", path)) { print(rlang::trace_back()) }
}))

(please also make sure rlang is also installed)

That might give a hint as to where that directory is coming from.

@kevinushey
Collaborator Author

My only other guess is that this could be related to us setting R_LIBS_USER and R_LIBS_SITE here:

renv/R/r.R

Lines 10 to 13 in 630d5ef

# ensure R_LIBS is set; unset R_LIBS_USER and R_LIBS_SITE
# so that R_LIBS will always take precedence
rlibs <- paste(renv_libpaths_all(), collapse = .Platform$path.sep)
renv_scope_envvars(R_LIBS = rlibs, R_LIBS_USER = "NULL", R_LIBS_SITE = "NULL")

Maybe something is auto-creating those directories?

@arnauddeblic

arnauddeblic commented Sep 19, 2022

Dear @kevinushey,
Your last guess was the right one:
I did indeed deploy the following code on my server:

  • in the Renviron.site file:
R_LIBS_SITE = D:/R/R_LIBS_SITE/%p-library/%v
  • in the Rprofile.site file :
local({
  R_LIBS_SITE <- Sys.getenv("R_LIBS_SITE")
  if (!dir.exists(R_LIBS_SITE)) {
    dir.create(R_LIBS_SITE, recursive = TRUE)
  }
})

I added this code because, from what I understand, R otherwise ignores R_LIBS_SITE when the corresponding directory does not exist. And I really need to specify a site library.

To be sure this auto-creation is responsible for the NULL directory, I've just added the same kind of parameter in my VDI environment:

  • in the Renviron.site file:
R_LIBS_USER = U:/AppData/Local/R/%p-library/%v
  • in the Rprofile.site file :
local({
  R_LIBS_USER <- Sys.getenv("R_LIBS_USER")
  if (!dir.exists(R_LIBS_USER)) {
    dir.create(R_LIBS_USER, recursive = TRUE)
  }
})

This configuration now leads to the same strange behavior on my VDI: a NULL directory is created.

This configuration was not yet set on the VDI when I performed the tests this morning. I planned to add it because, from what I understand, R otherwise ignores R_LIBS_USER when the corresponding directory does not exist. And I really need to specify a user library. This is even more necessary in my VDI configuration, since I deploy hundreds of VDIs and need to store user data on a shared network location dedicated to each user (mapped to the U: drive) rather than in C:/Users/... on the VDI.

Do you know if there is another way to auto-create those directories?
Otherwise, do you think you can adjust renv's behavior so that it does not create the NULL directory?

Kind regards
Arnaud

@kevinushey
Collaborator Author

The R documentation suggests that R_LIBS_USER and R_LIBS_SITE can be set to NULL if you'd like them to be ignored or set as empty; e.g.

https://github.com/wch/r-source/blob/18d16095f36e28862d125d88659bda28d93d0269/src/library/base/man/libPaths.Rd#L58-L66

And those NULL values get handled by R's built-in base Rprofile, e.g. for Unix:

https://github.com/wch/r-source/blob/18d16095f36e28862d125d88659bda28d93d0269/src/library/profile/Rprofile.unix#L5-L15

In this case, I believe renv is doing the right thing; I think you need to validate that R_LIBS_USER and R_LIBS_SITE are not equivalent to NULL before choosing to create them.

@arnauddeblic

Dear @kevinushey,
Thanks to your advice, I adjusted my Rprofile.site files as below.
It's now OK: the unwanted directory creation no longer occurs.
I think we can consider this issue #1074 as ready to be closed.
Many thanks for your help - I really appreciated our collaboration !
Kind regards
Arnaud


On server:

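local({
  # renv sets this variable to "NULL" so that R ignores it
  R_LIBS_SITE <- Sys.getenv("R_LIBS_SITE", unset = "NULL")
  # scalar && skips the directory check when the value is "NULL"
  if (R_LIBS_SITE != "NULL" && !dir.exists(R_LIBS_SITE)) {
    dir.create(R_LIBS_SITE, recursive = TRUE)
  }
})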

On VDI:

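local({
  # renv sets this variable to "NULL" so that R ignores it
  R_LIBS_USER <- Sys.getenv("R_LIBS_USER", unset = "NULL")
  # scalar && skips the directory check when the value is "NULL"
  if (R_LIBS_USER != "NULL" && !dir.exists(R_LIBS_USER)) {
    dir.create(R_LIBS_USER, recursive = TRUE)
  }
})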
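
Both snippets could also be folded into one helper that takes the variable name, sketched below. The function name ensure_lib_dir is hypothetical, and treating an empty value like "NULL" is an added assumption rather than something discussed above.

```r
# Hypothetical helper: create the library directory named by an
# environment variable, unless the variable is unset, empty, or "NULL".
ensure_lib_dir <- function(envvar) {
  path <- Sys.getenv(envvar, unset = "NULL")
  if (path %in% c("", "NULL"))
    return(invisible(FALSE))           # nothing to create
  if (!dir.exists(path))
    dir.create(path, recursive = TRUE)
  invisible(dir.exists(path))          # TRUE when the directory is in place
}

# In Rprofile.site this might be called as:
# ensure_lib_dir("R_LIBS_SITE")
# ensure_lib_dir("R_LIBS_USER")
```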

@kevinushey
Collaborator Author

Great, I'm glad to hear it! Thanks for taking the time to report back.
