install texlive conditionally #750

Merged
cboettig merged 8 commits into master from patch/texlive
Jan 25, 2024

Conversation

cboettig
Member

@eitsupi my apologies, I didn't realize my local test was pulling the upstream image. The tex issue is specific to the changes in #743 -- specifically, the official ubuntu repo texlive binaries are installed and purged in the BUILDDEPS list. By not purging builddeps, we had conflicting versions of texlive.

This PR modifies install_texlive.sh to skip the manual install of tlmgr from CTAN if texlive has already been installed by apt-get (i.e. /usr/bin/latex already exists). I believe it should resolve the issue.
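To check an image for the conflicting installs described above, one option is to look for more than one `latex` executable on the `PATH`. The following is only a rough sketch: the function names are hypothetical and not part of the rocker scripts, and it does not deduplicate symlinks or repeated `PATH` entries.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from rocker_scripts).
# Print every executable named "latex" found on a PATH-style string.
# More than one hit suggests two TeX Live installs may shadow each other.
find_latex_binaries() {
    local search_path="${1:-$PATH}" dir
    local -a dirs
    IFS=':' read -r -a dirs <<< "$search_path"
    for dir in "${dirs[@]}"; do
        [[ -n "$dir" && -x "$dir/latex" && ! -d "$dir/latex" ]] && printf '%s\n' "$dir/latex"
    done
    return 0
}

# Count the hits so callers can compare against 1.
count_latex_binaries() {
    find_latex_binaries "$@" | wc -l | tr -d ' '
}
```

Inside a container, `count_latex_binaries` with no argument inspects the current `PATH`; a result greater than 1 would be worth investigating.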

@eitsupi
Member

eitsupi commented Jan 24, 2024

See #743 (comment)

I apologize for the poor quality of the review.
I think the last build didn't remove the build dependencies, which increased the image size of rocker/r-ver by more than 2GB. It needs to be fixed immediately.

https://github.com/rocker-org/rocker-versioned2/wiki/r-ver_4c21e4362fbf
vs.
https://github.com/rocker-org/rocker-versioned2/wiki/r-ver_05eb2a30305e
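A size regression like this can also be checked locally by comparing two image tags. This is a minimal sketch, assuming both images are already present in the local Docker store; the function names are hypothetical, while `docker image inspect --format '{{.Size}}'` reports the size in bytes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing image sizes.
# Size of a locally available image, in bytes.
image_size_bytes() {
    docker image inspect --format '{{.Size}}' "$1"
}

# Difference between two byte counts, in whole MiB (after minus before).
size_delta_mib() {
    local before="$1" after="$2"
    echo $(( (after - before) / 1024 / 1024 ))
}
```

For example, `size_delta_mib "$(image_size_bytes old:tag)" "$(image_size_bytes new:tag)"`, where `old:tag` and `new:tag` are placeholders for the two builds being compared.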

@eitsupi
Member

eitsupi commented Jan 24, 2024

See #751


if [[ -x "/usr/bin/latex" ]]; then
  echo "texlive already installed"
  exit 0
Member

Why skip all other processes? In that case, the package installation will not take place, right?

Member Author

I don't believe any of the processes listed inside install_texlive.sh ought to take place if we're using the ubuntu binaries which already supply these.
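For readers following the review, the early-exit behaviour under discussion can be sketched as a function. This is an illustration only, not the merged script: `LATEX_BIN` is a hypothetical override added so the check can be exercised without touching /usr/bin, and the manual install step is reduced to a placeholder message.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the check from the diff under review.
# Succeeds when a distro-provided latex binary is present.
texlive_from_apt() {
    local latex_bin="${LATEX_BIN:-/usr/bin/latex}"
    [[ -x "$latex_bin" ]]
}

install_texlive_if_needed() {
    if texlive_from_apt; then
        echo "texlive already installed"
        return 0
    fi
    # Placeholder: the manual tlmgr-based install would run here.
    echo "running manual texlive install"
}
```

Using `return` rather than `exit` keeps the guard safe to source from another script; whether the remaining steps should also be skipped is exactly the question raised in the review comment above.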

@eitsupi
Member

eitsupi commented Jan 24, 2024

It's really unfortunate that the regression test was working correctly and we ended up merging.

@eitsupi
Member

eitsupi commented Jan 24, 2024

> I didn't realize my local test was pulling the upstream image.

I don't think the current structure is really good. Ideally, all images should rely only on the ubuntu image and not on a rocker image that has already been pushed.
But I don't have the energy to do it because it takes a lot of effort to change the structure.

@cboettig
Member Author

I think the failing test will succeed once https://github.com/rocker-org/rocker-versioned2/actions/runs/7634476675/job/20798413557 is finished. I'll leave this as a draft until then.

The current test structure seems to depend on testing against the currently published version of the rocker/r-ver image, not the modified one, which is what confused me. In https://github.com/rocker-org/rocker-versioned2/actions/runs/7634519606/job/20799340714?pr=750#step:5:31, it appears to be pulling rocker/r-ver from Docker Hub, not using the one in the PR.

To test, I'm using a dummy Dockerfile that basically uses your proposal in #754 and adds the extra script tests:

FROM ubuntu:jammy

LABEL org.opencontainers.image.licenses="GPL-2.0-or-later" \
      org.opencontainers.image.source="https://github.com/rocker-org/rocker-versioned2" \
      org.opencontainers.image.vendor="Rocker Project" \
      org.opencontainers.image.authors="Carl Boettiger <[email protected]>"

ENV R_VERSION=4.3.2
ENV R_HOME=/usr/local/lib/R
ENV TZ=Etc/UTC

COPY scripts/install_R_source.sh /rocker_scripts/install_R_source.sh

RUN /rocker_scripts/install_R_source.sh

ENV CRAN=https://p3m.dev/cran/__linux__/jammy/latest
ENV LANG=en_US.UTF-8

COPY scripts /rocker_scripts

RUN /rocker_scripts/setup_R.sh

CMD ["R"]

# TESTING additional steps after r-ver steps:
RUN /rocker_scripts/install_jupyter.sh
RUN /rocker_scripts/install_verse.sh

Anyway, let's see if the above tests pass once the hotfixes you so nicely did reach DockerHub.

@eitsupi
Member

eitsupi commented Jan 24, 2024

@cboettig No, the Dockerfile used in the tests uses the latest scripts.

COPY tests/rocker_scripts/test.sh /test.sh
COPY scripts /rocker_scripts
ARG script_name=install_rstudio.sh
ARG script_arg=skip
RUN /test.sh ${script_name} ${script_arg}

The test case for not removing build deps is not covered here, so if CI goes green here, the rocker/ml-verse build will continue to fail.

@eitsupi
Member

eitsupi commented Jan 24, 2024

You should add the test case install_texlive.sh on rocker/cuda:latest to this file.

https://github.com/rocker-org/rocker-versioned2/blob/1828862223cd0a5357f2523c0dc85777c64fb9b7/tests/rocker_scripts/matrix.json
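To try such a case by hand before editing the matrix, the test Dockerfile could be built against a chosen base image. The sketch below only prints the command. The `script_name` and `script_arg` build args appear in the Dockerfile quoted earlier; the `base_image` build arg and the Dockerfile path are assumptions that should be checked against the repository.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints a docker build command, does not run it.
# Assumed (not confirmed by the thread): the Dockerfile path and a
# "base_image" build arg selecting the image under test.
test_build_command() {
    local base_image="$1" script_name="$2" script_arg="${3:-}"
    local dockerfile="tests/rocker_scripts/Dockerfile"
    local -a cmd=(docker build -f "$dockerfile"
        --build-arg "base_image=$base_image"
        --build-arg "script_name=$script_name")
    # Only pass script_arg when the caller supplied one.
    [[ -n "$script_arg" ]] && cmd+=(--build-arg "script_arg=$script_arg")
    cmd+=(.)
    printf '%s\n' "${cmd[*]}"
}
```

Calling `test_build_command rocker/cuda:latest install_texlive.sh` prints the command for the case suggested above, so it can be reviewed before running it.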

@cboettig
Member Author

@eitsupi thanks for all your help here, sorry I don't always understand things.

Yes, I see the test case copies the scripts, but it looks to me like it is based on pulling r-ver from Docker Hub, and is not running the install_R_source.sh script again. Thus it appears to me the test case is grabbing the version of r-ver on Docker Hub that does not have builddeps purged. We see here in the failing test that it pulls from the hub, copies the new scripts, and runs only install_verse.sh, right?: https://github.com/rocker-org/rocker-versioned2/actions/runs/7634519606/job/20799340714?pr=750#step:5:77

I agree we need additional tests as well. I think we actually have to test both install_texlive.sh and install_jupyter.sh in the cuda scenario (I think it is already covered in the r-ver case). I'll try to add that test now.

@eitsupi
Member

eitsupi commented Jan 24, 2024

> and is not running install_R_source.sh script again. Thus it appears to me the test case is grabbing the version of r-ver on Dockerhub that does not have builddeps purged. We see here in the failing test, it pull from hub, copy the new scripts, and run only the install_verse.sh, right?

Correct. As you know, install_R_source.sh is quite time-consuming to execute, so regression tests are rarely triggered. The test is here, but it is not triggered in this PR.

regression-test:
  needs: build
  runs-on: ubuntu-latest
  strategy:
    fail-fast: false
    matrix:
      tag:
        - "4.0.0"
      platforms:
        - linux/amd64
      script:
        - install_rstudio.sh
        - install_tidyverse.sh
        - install_verse.sh
        - install_shiny_server.sh
        - install_geospatial.sh
  steps:
    - uses: actions/checkout@v4
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v2
    - name: Expose GitHub Runtime
      uses: crazy-max/ghaction-github-runtime@v3
    - name: test build
      run: |
        docker buildx bake \
          -f bakefiles/"${{ matrix.tag }}".docker-bake.json \
          --set=*.platform="${{ matrix.platforms }}" \
          --set=*.cache-from=docker.io/rocker/r-ver:"${{ matrix.tag }}" \
          --set=*.cache-from=type=gha,scope=r-ver-"${{ matrix.tag }}" \
          --set=*.cache-to=type=gha,scope=r-ver-"${{ matrix.tag }}" \
          --set=r-ver.tags=r-ver-test-"${{ matrix.tag }}" \
          --load \
          r-ver
    - name: test run rocker scripts
      run: |
        docker run --rm r-ver-test-"${{ matrix.tag }}" "/rocker_scripts/${{ matrix.script }}"
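The last step of that job can be approximated locally once the image has been built. The following is a rough sketch of such a loop, assuming the image was tagged as in the workflow above; the function names and the `RUNNER` hook are hypothetical and exist only so the loop can be exercised without Docker.

```shell
#!/usr/bin/env bash
# Hypothetical local counterpart of the "test run rocker scripts" step.
# Run one rocker script in a fresh container of the given image.
run_script_in_image() {
    local image="$1" script="$2"
    docker run --rm "$image" "/rocker_scripts/$script"
}

# Run every listed script and report which ones failed.
# RUNNER (hypothetical) swaps in a different runner function for dry runs.
regression_check() {
    local image="$1"; shift
    local runner="${RUNNER:-run_script_in_image}" script
    local -a failed=()
    for script in "$@"; do
        if ! "$runner" "$image" "$script"; then
            failed+=("$script")
        fi
    done
    if (( ${#failed[@]} > 0 )); then
        printf 'failed: %s\n' "${failed[*]}"
        return 1
    fi
    echo "all scripts passed"
}
```

For example, `regression_check r-ver-test-4.0.0 install_verse.sh install_tidyverse.sh` runs both scripts and keeps going after a failure, much like `fail-fast: false` in the matrix.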

@cboettig
Member Author

Looks like the above test failures are network issues (HTTP 500 errors) on the manual install for texlive.

Given how temperamental the network is on https://mirror.ctan.org/ and how large the cuda images are to begin with, I'm not sad to be getting texlive from the much more reliable ubuntu repos. The changes in this PR should make the relevant scripts all more robust to working either with the hand-selected packages via tlmgr or the more standard (but larger) use of the official ubuntu apt binaries.
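Where the manual install is still used, transient mirror errors like these are often softened with a retry wrapper. The sketch below is a generic pattern, not something this PR adds: `retry` and the `RETRY_DELAY` variable are hypothetical names, and the delay doubles after each failed attempt.

```shell
#!/usr/bin/env bash
# Hypothetical retry helper (not part of the rocker scripts).
# Usage: retry MAX_ATTEMPTS command [args...]
# RETRY_DELAY sets the initial sleep in seconds (default 5).
retry() {
    local max_attempts="$1"; shift
    local delay="${RETRY_DELAY:-5}" attempt=1
    until "$@"; do
        if (( attempt >= max_attempts )); then
            echo "giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$(( delay * 2 ))
        attempt=$(( attempt + 1 ))
    done
}
```

A download step could then be wrapped as `retry 3 <download command>`, which gives the mirror a few chances before the build fails.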

For the reason we discussed above, these tests are still not 100% correct when running on the CI, because they are not running on the r-ver and cuda images that would be built here.

I think a more complete check, which I'm using locally, is the above Dockerfile that runs these scripts on top of an r-ver rebuilt from scratch, and the same thing but for the cuda side. I've tested both of those builds locally.

We can tickle those checks again on the CI as well, but as a hotfix to correct the previous broken builds I think this will now succeed, modulo the stability of CTAN.org servers....

@cboettig cboettig marked this pull request as ready for review January 24, 2024 05:40
@cboettig
Member Author

@eitsupi ok tests are green, including I think the additional test you mentioned. I think this is ready for review, thanks for all the help!

Resolved review threads: scripts/install_verse.sh (outdated), scripts/install_texlive.sh, tests/rocker_scripts/matrix.json (outdated, 2 threads)
@cboettig
Member Author

@eitsupi anything more on this? I don't mean to rush you, but I do want to finish off this hotfix for the broken builds created by #743. Also, current workflows won't run on some of the published images until this is merged.

@eitsupi
Member

eitsupi commented Jan 25, 2024

Thanks, but is this a hotfix?
The broken images have already been fixed, and I don't think we should rush the build after merging this (it is a waste of resources to build an almost identical one only one day apart).

@cboettig
Member Author

I have had several workflows that are broken until this is fixed. I am glad that we are respectful of the resources that Microsoft is providing for free, but I am also spending quite a bit of time working around the broken images at the moment and this probably impacts other users as well.

@cboettig cboettig merged commit 31b9272 into master Jan 25, 2024
41 checks passed
@cboettig cboettig deleted the patch/texlive branch January 25, 2024 16:34
@eddelbuettel
Member

eddelbuettel commented Jan 25, 2024

> respectful of the resources that Microsoft is providing for free

It's complicated, but I think "the community as a whole" is also overdoing it on builds, and CI runs, and whatnot. (This comment is not about Rocker, or a stack, but more generic. E.g. I have a beef with "kids these days" being led to cookie-cutter development that runs each commit over a matrix of five or more builds for really no apparent reason. My hill to die on, I guess.) Then again we are rounding error compared to, say, bitcoin mining and LLM training.

@cboettig
Member Author

@eddelbuettel yeah, I totally hear you on that. I think your point is that CI should be used thoughtfully, not mindlessly, otherwise developers' and contributors' time is not well spent. I don't think anyone is worried that one of the 'magnificent 7' most profitable companies needs our help to ensure it remains financially viable. There is of course a real carbon footprint to mindless CI (and needless use of chatbots etc), despite the aggressive carbon targets...

But I digress. I'm just hoping to get tensorflow working smoothly, along with other such bits of the python+gpu experience, on our image during this brief moment when I have a few cycles to invest in this before other things take over all my time again.

@eddelbuettel
Member

As an (index fund owning hence indirect) MSFT shareholder I am not too worried about their profitability. Free resources tend to be a cost of user acquisition that gets amortised in the long run over eventual fees.

It's the carbon footprint from endlessly running jobs just because we can...

But yep thanks for making it easy to deploy gpus. I appreciate it too per my recent bug report.
