fix: disable CGO when building on Fedora to avoid linking issues on the Ubuntu-based image (#2140) #2141
Cool, do we need CGO on non-Fedora systems?
Btw, the released kamel is usually built on my Fedora laptop (never at the latest version).
I believe this might be a problem when/if we start building Camel K on any system that uses glibc 2.32 or newer. The reason is that in 2.32 some of the pthread_*sigmask symbols were moved from libpthread to libc. Therefore, any binary built against 2.32 that uses those pthread_* symbols won't work on older versions. The opposite is not true, though. This is the released kamel:
This is the one built on Fedora 33:
So, I am guessing it may be a problem on any/most of the distros where the ABI is incompatible, so crossing these version boundaries may be problematic. So, yes, I think we might need to rely on disabling CGO to ensure we have statically-linked binaries for the time being (especially since it looks like more changes like this will happen in the future).
My understanding is that the Kamel binary added to the operator image should be built on the image OS. I think that the current approach, that is to build the It seems a possible solution would be to refactor the Dockerfile, and use a build stage to compile the binary on the same OS as the final image.
Compiling it on the same image could solve the problem, indeed. However, there's one thing that concerns me with this approach: it makes it harder for distribution packagers and downstream users to build and productize this. This is especially true in environments with restricted internet access, where the dependencies may not be available at build time. That forces users/packagers to copy previously cached dependencies into the build stage and to deal with all the downsides of doing that.
However it's done (multi-stage build, vendoring, Go proxy), I think packaging a binary built against the runtime image it will be executed on is more robust, and worth the possible extra effort.
I think it's worth mentioning that this affects not just the operator, but the CLI client as well. As I pointed out with the CLI client that is available on the release page: if you are building the kamel CLI on any system that is based on glibc 2.32 (or newer), you are eventually going to have this problem. It is not the case yet because @nicolaferraro is using an older Fedora version, but eventually it might be.
Right. I think that's a bit different for client binaries, as the Mac OS and Windows ones are probably cross-compiled, which disables CGO as far as I know. So it's probably fine to disable it for the Linux client binary. That leads to the question: why disable it only for Fedora? And does it necessarily imply it should be done for the operator binary, for which it's possible to guarantee compatibility with its container image?
I think it might be safe to disable it for any Linux system regardless of the distribution, and I can adjust the PR accordingly. Fedora was just the one on which I could reproduce the issue. I am not sure how many other distros have upgraded to 2.32 already, but I guess this is likely to become more common with newer distributions.
Yes if you are building on Linux: Lines 228 to 236 in 2ac21c4
…he Ubuntu-based image (apache#2140)
FYI, I modified it to disable CGO on any Linux system.
OK, hopefully the operator won't be affected by not using any of the
Hm, maybe let's wait, then? Although there is a problem that affects both the CLI and the operator, if the solution may cause unforeseen problems ... then that's not ideal. I think the CGO thing would work OK for the CLI, but the container may be a different problem indeed. I'll ping you to brainstorm this tomorrow.
Totally agree. I'd like to make sure we understand the impacts for the operator. That's the reason I proposed to consider building it against its container image.
@astefanutti and I talked today about this PR and it seems we have sufficient information to believe the change is safe. I am writing a summary of these, so we can understand the reasoning if the PR turns out to be broken.
3.1 The first one is os/user for which the documentation says:
" ... For most Unix systems, this package has two internal implementations of resolving user and group ids to names. One is written in pure Go and parses /etc/passwd and /etc/group. The other is cgo-based and relies on the standard C library (libc) routines such as getpwuid_r and getgrnam_r ... When cgo is available, cgo-based (libc-backed) code is used by default. This can be overridden by using osusergo build tag, which enforces the pure Go implementation ... "
3.2 The second is the resolver on the net package. In this package the documentation states:
"... On Unix systems, the resolver has two options for resolving names. It can use a pure Go resolver that sends DNS requests directly to the servers listed in /etc/resolv.conf, or it can use a cgo-based resolver that calls C library routines such as getaddrinfo and getnameinfo ... By default the pure Go resolver is used, because a blocked DNS request consumes only a goroutine, while a blocked C call consumes an operating system thread. When cgo is available, the cgo-based resolver is used instead under a variety of conditions: on systems that do not let programs make direct DNS requests (OS X), when the LOCALDOMAIN environment variable is present (even if empty), when the RES_OPTIONS or HOSTALIASES environment variable is non-empty, when the ASR_CONFIG environment variable is non-empty (OpenBSD only), when /etc/resolv.conf or /etc/nsswitch.conf specify the use of features that the Go resolver does not implement, and when the name being looked up ends in .local or is an mDNS name ..."
The binary does seem to be using the cgo implementation:
So, given these findings, we agreed it would be reasonably safe to do this change.
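To make the two code paths discussed above concrete, here is a rough Go sketch that exercises both packages. It is only an illustration, not code from the operator: `pureGoResolver` is a hypothetical helper, and looking up `localhost` is an arbitrary choice. With `CGO_ENABLED=0` (or with the `netgo` and `osusergo` build tags) the pure Go implementations are the ones compiled in; `net.Resolver.PreferGo` additionally asks for the Go resolver at run time.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"os/user"
	"time"
)

// pureGoResolver returns a resolver that prefers Go's built-in DNS
// client over the libc-based one. Hypothetical helper for illustration.
func pureGoResolver() *net.Resolver {
	return &net.Resolver{PreferGo: true}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Name resolution through the net package.
	addrs, err := pureGoResolver().LookupHost(ctx, "localhost")
	fmt.Println("lookup:", addrs, err)

	// User lookup through os/user; without cgo this parses /etc/passwd.
	u, err := user.Current()
	if err != nil {
		fmt.Println("user lookup failed:", err)
		return
	}
	fmt.Println("user:", u.Username)
}
```

If both calls behave the same in a cgo and a non-cgo build inside the operator image, that is a reasonable hint (not a proof) that the operator does not depend on the libc-backed variants.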
Thanks!
Disable CGO when building on Fedora to avoid linking issues on the Ubuntu based image (#2140)
Release Note