Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disabling musl as it isn't capable to load dynamic library #3917

Merged
merged 21 commits into from
Dec 1, 2022

Conversation

JaroslavTulach
Copy link
Member

@JaroslavTulach JaroslavTulach commented Nov 25, 2022

Pull Request Description

Disabling musl as it isn't capable to load dynamic library.

Important Notes

With this change it is possible to:

$ sbt  bootstrap
$ sbt  engine-runner/buildNativeImage
$ ./runner --run ./engine/runner/src/test/resources/Factorial.enso 3
6
$ ./runner --run ./engine/runner/src/test/resources/Factorial.enso 4
24
$ ./runner --run ./engine/runner/src/test/resources/Factorial.enso 100
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000

Is it OK, @radeusgd to disable musl? If not, we would have to find a way to link the parser in statically, not dynamically.

Checklist

Please include the following checklist in your PR:

  • All code conforms to the style guides.
  • All code has been tested:
    • The ./runner is built and executed in the CI

@JaroslavTulach JaroslavTulach added the CI: No changelog needed Do not require a changelog entry for this PR. label Nov 25, 2022
@JaroslavTulach JaroslavTulach self-assigned this Nov 25, 2022
Copy link
Collaborator

@hubertp hubertp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The non-runner native images would need to be tested separately.
@radeusgd probably knows more about them

@radeusgd
Copy link
Member

radeusgd commented Nov 25, 2022

What is the alternative? Are we switching back to glibc?

Then I'm not sure if we can do that. If I think correctly, it would break the launcher project. See our docs explaining why we link musl: https://enso.org/docs/developer/enso/infrastructure/native-image.html#additional-linux-dependencies
It is related to a very old glibc bug which was segfaulting the launcher. I don't remember on what occasions, it was probably when downloading a file (which is one of the main purposes of the launcher). I'm not sure if we didn't switch backends from Apache Commons Http to Akka which may have solved the issue - I would need to double check it in a few ways. Please let me know if I should verify this.

Also, we may try to do a mixed configuration and stay on musl for the launcher (which is independent of the engine anyway) and switch to glibc for the engine, if you think that will be helpful.

However, let's double check if this will not make some bad stuff - for example, will the Enso Http library still work? If it relies on the same Java backend that was broken in the launcher which in the native build delegates to this glibc getaddrinfo call, it may also not work. So switching to glibc for native Enso engine builds may make the Http library unusable on such Linux builds. I'm not sure if it is a price we want to pay. But I'm not 100% certain we have to - I guess we would need to switch back and run some tests to see if it breaks, as I cannot with 100% certainty assure you that this call will be used. But I remember we were having problems with glibc in the past, hence the musl workaround, so I'm warning it may lead to trouble.

The developer documentation for the Enso compiler, runtime, and IDE..

@radeusgd
Copy link
Member

I'm afraid we may not have tests that will catch if the launcher is broken by this migration.

We have some NativeTest tests which run the native image. But I don't think any of them performs downloads. Maybe we should add such tests (I think we didn't back in the day as we wanted to avoid networking in tests thinking it was slowing CI down too much); it probably would need to call into some local server.

Still, I'm not certain such tests will catch this (potential) segfault. It may not be as deterministic as we want, unfortunately I don't remember the exact details as this was some time ago. I can try to look a bit deeper into it if necessary, just need a note to do so.

@Akirathan
Copy link
Member

Akirathan commented Nov 25, 2022

I have just tried to reproduce the bug mentioned in https://sourceware.org/bugzilla/show_bug.cgi?id=10652 on Ubuntu 22.04 with glibc version 2.35, and I cannot reproduce it. The last comment from that issue is from 2020-06-21 mentioned that the bug is reproducible with glibc 2.31. So maybe in 2.35 it is already fixed? Can someone else try that? cc @4e6

@radeusgd
Copy link
Member

I have just tried to reproduce the bug mentioned in sourceware.org/bugzilla/show_bug.cgi?id=10652 on Ubuntu 22.04 with glibc version 2.35, and I cannot reproduce it. The last comment from that issue is from 2020-06-21 mentioned that the bug is reproducible with glibc 2.31. So maybe in 2.35 it is already fixed? Can someone else try that? cc @4e6

Thanks for checking it! It may be a good find. Although I'm surprised as no one seems to have been working on this bug so it must have been an accidental fix? I can try to repro it too, but probably only next week.

Another question is - are we going to require glibc >= 2.35? Because I assume linking dynamically we'll rely on whatever glibc is available on the users OS, right? (Or am I horribly wrong here? Sorry I've not been looking into native distributions for some time) If then, this may have some implications on minimum supported OS versions we'll require. This may not be a blocker, but needs to be consciously considered.

@Akirathan
Copy link
Member

Although I'm surprised as no one seems to have been working on this bug so it must have been an accidental fix?

Maybe an accidental impact of other fix? I honestly don't know. That is why I suggested someone else to try.

Another question is - are we going to require glibc >= 2.35?

Not quite. More specifically, it depends on if we want to stick to statically linked executable. And I would say that we do want that, since it provides more portability. Then, the only question is which version of glibc we have on our GitHub runners, that will be building this stuff for the releases.

@4e6
Copy link
Contributor

4e6 commented Nov 25, 2022

@Akirathan I also have no problems building native images with glibc 2.36

Copy link
Contributor

@4e6 4e6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although, we might have problems with the distribution of the binaries

ldd built-distribution/enso-launcher-0.0.0-dev-linux-amd64/enso/bin/enso
	linux-vdso.so.1 (0x00007ffeaf3f9000)
	libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f7eea800000)
	libz.so.1 => /usr/lib/libz.so.1 (0x00007f7eee639000)
	libc.so.6 => /usr/lib/libc.so.6 (0x00007f7eea619000)
	libm.so.6 => /usr/lib/libm.so.6 (0x00007f7eee551000)
	/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f7eee66d000)
	libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f7eee531000)

@radeusgd
Copy link
Member

Indeed, I think the whole point of musl was to be able to link statically, making the build more portable. Can we link statically with gcc? (Both from the technical and legal perspective?)

@@ -1636,7 +1636,7 @@ lazy val `engine-runner` = project
rebuildNativeImage := NativeImage
.buildNativeImage(
"runner",
staticOnLinux = true,
staticOnLinux = false,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disabling musl only for the runner image.

@JaroslavTulach
Copy link
Member Author

Ideally we would link enso_parser in statically, but I am still not sure how to do that? Talking to Tomáš Zezula... Meanwhile we can use the dynamic linking for the runner image - it allows us to use the new parser in the Native Image mode and we don't distribute the runner image anyway - we just use it for testing purposes.

@JaroslavTulach JaroslavTulach added the CI: Ready to merge This PR is eligible for automatic merge label Nov 28, 2022
@JaroslavTulach JaroslavTulach requested a review from 4e6 November 28, 2022 09:58
@4e6
Copy link
Contributor

4e6 commented Nov 28, 2022

we don't distribute the runner image anyway

You're right, I thought it might affect the launcher, but it is not. LGTM from me then

@JaroslavTulach
Copy link
Member Author

JaroslavTulach commented Nov 28, 2022

ld -lz failed. Can we install the z library on the CI machine, @mwu-tow? I seem to have following two packages on my Ubuntu:

$ dpkg -l | grep zlib
ii  zlib1g:amd64                                  1:1.2.11.dfsg-2ubuntu9.2                    amd64        compression library - runtime
ii  zlib1g-dev:amd64                          1:1.2.11.dfsg-2ubuntu9.2                    amd64        compression library - development

and I am able to engine-runner/buildNativeImage fine. I took a look at scala-new.yml, but there are no apt get commands! Probably we want to stick to GitHub Actions. There is a github-actions-for-graalvm. I assume it would contain the necessary dependencies to produce dynamically linked native-image.

@JaroslavTulach JaroslavTulach removed the CI: Ready to merge This PR is eligible for automatic merge label Nov 30, 2022
@JaroslavTulach JaroslavTulach added the CI: Ready to merge This PR is eligible for automatic merge label Dec 1, 2022
@mergify mergify bot merged commit 030dbe4 into develop Dec 1, 2022
@mergify mergify bot deleted the wip/jtulach/NewParserInNativeImage_183802194 branch December 1, 2022 06:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR. CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants