Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feanil/ubuntu 24.04 #35713

Merged
merged 6 commits into from
Nov 7, 2024
Merged

feanil/ubuntu 24.04 #35713

merged 6 commits into from
Nov 7, 2024

Conversation

feanil
Copy link
Contributor

@feanil feanil commented Oct 23, 2024

On Oct 9th the version of ubuntu that ubuntu-latest pointed to in github action workflows changed from Ubuntu 22.04 to Ubuntu 24.04 which resulted in failures of the edx-platform CI jobs due to a couple of different issues. A timeline of the series of events and errors can be found here.

Issue 1: The mongo install/mirror URL was OS specific and did not get correctly updated to be resilient to the version of the OS changing.

Resolution: 5f2e853 updates the workflow to use a re-usable action which should prevent this issue from occuring in the future.

Issue 2: The version of libxmlsec and lxml that were constrained in edx-platform were not compatible with newer versions of the underlying libssl and libxmlsec C libraries that were available on the newer Ubuntu 24.04 images.

Resolution: The constraints were put in place due to incremental incompatabilities between the LXML and libxmlsec libraries but both have published new versions that are compatible with eachother and more resilient to the underlying C library versions. By un-pinning and upgrading, we've resolved all known errors with the ubuntu upgrade.

Upgrading to ubuntu-24.04 instead of ubuntu-latest

Given the complexities of edx-platform depedencies and the sheer number of them, I think it makes sense to explicitly test on a named version of ubuntu that doesn't slip out from under us, instead of a more dynamic pointer. Enough edx-platform libraries rely on the underlying systems C libraries that we'll want to explicitly roll that forward.

This does not mean that we should roll back the rest of the openedx ecosystem as we have not seen any issues that indicate any other repos have this level of deep dependencies on the underlying OS.

@feanil
Copy link
Contributor Author

feanil commented Oct 24, 2024

@dianakhuang with both xmslec and lxml upgraded it looks like we've got everything passing. Is there any further testing you want to do before we update the testing and merge this?

@dianakhuang
Copy link
Contributor

@feanil I think I'm okay with it as long as you're okay with me panic reverting it if it does break our deploys.

@feanil
Copy link
Contributor Author

feanil commented Oct 25, 2024

@dianakhuang any way to test prior to it going out?

Copy link
Member

@kdmccormick kdmccormick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI & requirement changes look good.

Could you add more context in your commit / PR description? There are a lot of pieces to this upgrade story which I don't have straight in my head but maybe you do. Reading through https://openedx.atlassian.net/wiki/spaces/COMM/pages/4546199561/Mini-RCA+doc+ubuntu-latest+issues+on+edx-platform, I have somewhat of a sense what happened, but it's not clear to me what caused 2U to revert before and what/whether anything is different now.

Comment on lines 49 to 54
# We only need to test older versions of Mongo with modules that directly interface with Mongo (that is: xmodule.modulestore)
# This code is left here as an example for future refernce in case we need to reduce the number of shards we're
# testing but still have good confidence with older versions of mongo. We use Mongo 4.4 in the example.
#
# exclude:
# - mongo-version: "4.4"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment needs updating

@feanil
Copy link
Contributor Author

feanil commented Nov 4, 2024

@kdmccormick this is ready for another review.

Rather than doing the install ourselves and needing to keep the debion
source URL up-to-date, use a community action to handle installing
mongo.
libxmlsec and lxml need to be updated in lockstep and the version we had
wouldn't work with Ubuntu 24.04 so unpinning lxml along with libxmlsec
to see if that resolves the build and dependency issues.
Run `make upgrade-package package='lxml[html_clean]'` to update lxml and
then `make compile-requirements`
Copy link
Member

@kdmccormick kdmccormick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TY

@feanil feanil merged commit 642dae6 into master Nov 7, 2024
51 checks passed
@feanil feanil deleted the feanil/ubuntu-24.04 branch November 7, 2024 16:20
@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production.

@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX production environment.

1 similar comment
@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX production environment.

rpenido added a commit to open-craft/edx-platform that referenced this pull request Nov 8, 2024
…4.04"

This reverts commit 642dae6, reversing
changes made to d82aada.
@bradenmacdonald
Copy link
Contributor

Hey, we are getting xmlsec.InternalError: (-1, 'lxml & xmlsec libxml2 library version mismatch') on our (tutor-based) sandboxes if they include this PR. Do you know off hand what update we need to make to avoid that?

@feanil
Copy link
Contributor Author

feanil commented Nov 8, 2024

@bradenmacdonald I was seeing that error on ubuntu 24.04 until I install both the newest version of lxml and xmlsec python packages and then I didn't see any issues. What version of ubuntu are you running for grove? And is the libxmlsec1-dev system package installed?

@bradenmacdonald
Copy link
Contributor

bradenmacdonald commented Nov 8, 2024

@feanil Ah, looks like we're on 20.04 because we pinned to an old tutor version to fix a mysql charset bug.

is the libxmlsec1-dev system package installed?

Yes, libxmlsec1-dev is already the newest version (1.2.28-2)

I assume if we upgrade to the latest Tutor nightly which is using 22.04 it may fix the issue. I will check.

Or should we be using 24.04?

@feanil
Copy link
Contributor Author

feanil commented Nov 10, 2024

22.04 should fix it, but as of the latest edx-platform version, 24.04 should work as well. Keep in mind that there are some changes to make codejail work correctly on the newer versions if you're running that service on the same boxes.

@cmltaWt0
Copy link
Contributor

22.04 should fix it, but as of the latest edx-platform version, 24.04 should work as well. Keep in mind that there are some changes to make codejail work correctly on the newer versions if you're running that service on the same boxes.

I'll double check, but for me the latest nightly having ubuntu 22.04 doesn't work.

cmltaWt0 added a commit to raccoongang/edx-platform that referenced this pull request Nov 11, 2024
…4.04"

This reverts commit 642dae6, reversing
changes made to d82aada.
@feanil
Copy link
Contributor Author

feanil commented Nov 12, 2024

Given that multiple people are seeing issues with it, perhaps it makes sense to revert this for now and I can try to re-produce the error locally rather than have everyone run into it.

@feanil
Copy link
Contributor Author

feanil commented Nov 12, 2024

Revert PR is here: #35826

@feanil
Copy link
Contributor Author

feanil commented Nov 12, 2024

@bradenmacdonald @cmltaWt0 can either of you provide a stack trace of where this failure is happening?

I'm trying to re-produce it on my local tutor-nightly and am unable to do it so far.

@feanil
Copy link
Contributor Author

feanil commented Nov 12, 2024

FYI, after some digging, it looks like this might be an issue with uwsgi, xmlsec/python-xmlsec#320 so an issue on the tutor side not on the edx-platform side. I'm gonna see if I can fix it there quickly rather than rollback edx-platform but if others are opposed to this idea, please say so.

@DawoudSheraz
Copy link
Contributor

@bradenmacdonald @cmltaWt0 can either of you provide a stack trace of where this failure is happening?

I'm trying to re-produce it on my local tutor-nightly and am unable to do it so far.

Can you try image build with --no-cache option and see if it is still reproducing? In some cases, maybe due to cache, I had seen the build fail on some machines but when they would re-build with --no-cache, it would fix the issue.

Second, the stack trace and the error location would certainly be helpful. During Ubuntu upgrade in Tutor, xmlsec error would come up during translation related commands execution.

@feanil
Copy link
Contributor Author

feanil commented Nov 12, 2024

FYI, the following change in tutor fixed the issue for me: overhangio/tutor#1157

Given that this is an upstream issue, I'm not going to revert the change from edx-platform and hopefully we can fix forward.

@DawoudSheraz I am building after a docker system prune and with no caches and am able to re-produce the error on the openedx image. After building the image if you try to run uwsgi /openedx/uwsgi.ini it throws the error noted in earlier posts for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants