Segmentation Fault (Segfault) on the ruby gem #4460
Comments
Could you include all steps to reproduce the bug?
I can reproduce it in docker now.
@pherl, I can reproduce the bug with the gem downloaded from the RubyGems repository. However, I cannot reproduce the bug with the gem locally built in an alpine3.7 docker container. Is cross-compiling related to the issue?
@larribas Can you try installing a locally built protobuf gem? If you have difficulty building it, I can show you.
Just tested it. It does look like a cross-compilation bug. Here's what I did: Within
Let me know if I can help with anything else
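For reference, the locally-built-gem workaround discussed above can be sketched as a Dockerfile. This is a sketch under assumptions (a ruby:2.5-alpine base and build-base for the compiler toolchain); the exact steps from the comment above are not preserved in this thread:

```dockerfile
FROM ruby:2.5-alpine
# Toolchain needed to compile the gem's C extension from source
RUN apk add --no-cache build-base
# --platform=ruby skips the precompiled linux-x86_64 (glibc) binary gem
# and builds the extension against Alpine's musl libc instead
RUN gem install google-protobuf --platform=ruby
CMD ["ruby", "-e", "require 'google/protobuf'"]
```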
I am having a very similar issue:
I tested the 3.6.x branch. Unfortunately, the linux-x86_64 gem still doesn't work on alpine.
Feel free to reopen this issue if this workaround is unacceptable.
Hello, thanks for looking into this. Your workaround is not working for us. Same segfault error. Here is our Dockerfile:
Were you suggesting we try alpine 3.6.x? Anything else you can think of trying?
Did you try
Just tried it, same error :(
Does the segmentation fault happen while the Dockerfile is executed, or at the point of calling
The segmentation occurs after the build, when I try to run the Rails app itself (instantiate the class that requires google-cloud-pubsub). Thanks again for all your timely help!
@TeBoring - I've recreated our bug in minimal form, including an attempt to follow the workaround @larribas demonstrated: https://github.com/BenefitsDataTrust/tmp-protobuf-segfault To replicate, just clone, drop a creds.json file in the repo, and docker build .
(I added your workaround suggestion in the ./teboring subdir if you want to check that out as well.)
@jeffdeville, thanks, I'll take a look.
@TeBoring any updates on this issue?
Hi guys, any workaround for this issue?
Hi guys, facing the same issue. Any fix/workaround available? @TeBoring
Same issue here. Tried it in Docker, no luck :(
Here is an example to reproduce the bug: https://github.com/OpakAlex/reproduce-google-protobuf-gem-issue
Hi @OpakAlex have you found a solution in the meantime? Thanks
In my case the error occurs with the following Dockerfile:

    FROM ruby:2.5.3-alpine
    RUN gem install google-protobuf --version=3.8.0
    CMD ruby -e 'require "google/protobuf/any_pb"'

whereas with the following it doesn't occur:

    FROM ruby:2.5.3-alpine
    RUN gem install google-protobuf --version=3.11.4
    CMD ruby -e 'require "google/protobuf/any_pb"'

Unfortunately, [edit] Anyway
Try
I am facing the same issue, segmentation fault. I've tried @nhattan's solution with using
I've just tried it again with
@rnnds I'm not sure if you've seen my follow-up, but it ended up actually not working for us. Just to confirm this is without
@DawidJanczak sorry, you are right, I didn't notice your message, and I also got an error. I definitively solved it using a different image:
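The exact image from the comment above is not preserved in this thread; as a hedged sketch, any Debian-based official Ruby image (which ships glibc, the libc the prebuilt gem targets) avoids the segfault without shims. The tag below is illustrative:

```dockerfile
# Hypothetical tag; the original comment's exact image isn't preserved.
# Debian-based images ship glibc, which the prebuilt linux-x86_64 gem targets.
FROM ruby:2.5.3
RUN gem install google-protobuf --version=3.8.0
CMD ["ruby", "-e", "require 'google/protobuf/any_pb'"]
```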
This is not going to be a problem-solving comment, so if you're looking for solutions, please disregard. If you're disgruntled like me, maybe it'll confirm your own biases. Regardless, I'm putting it here in the hopes that someone reads it and maybe something changes.

TLDR: A protocol buffers dependency is absolutely hamstringing us and we will be leaving Google Cloud due to this issue.

The long version: protobuf is a dependency for background jobs on our Cloud Run instance (via Cloudtasker). I do understand that protocol buffers saves overhead in the long run vs. JSON. I know the benefits, and they sound absolutely wonderful for a massive system at scale. Considering, however, how much time we've spent chasing this issue and, when we do solve it, sitting around waiting for Docker images to rebuild gRPC and protobuf from scratch, we are not seeing any benefit.

Serializing is a solved problem, and protocol buffers should not be an exception. It needs an implementation that is lightweight and performant; this does not seem like a heavy lift if Google expects us to use it in order to use their cloud offerings. The Rust implementation might be a good place to start. Or why not distribute a pre-built Go version? That would be more in Google's wheelhouse.

Whatever the case, stuff like this is going to cost Google customers. And we are one of those customers. And that's a real bummer - both for us in time and money wasted on this, and for Google in customers lost.
I've also left GCP for this and a few other reasons. It's a pretty ridiculous reason to lose customers.
Unlike Python, which allows us to build different wheels for different platforms, gem only allows us to upload a single version to support all platforms. For that reason, we have no way to provide a gem which supports alpine but also supports the other platforms. So, for now, the only solution is to build the gem locally on alpine from our source code. rake-compiler/rake-compiler-dock#20 (comment) Hopefully, gem could do a similar thing to wheel in Python (might raise an issue in gem instead :)).
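Building locally can also be driven through Bundler rather than a manual gem build. A sketch, assuming a Bundler version (roughly 2.1 or later) that supports the `bundle config set` syntax; the config key itself is real:

```shell
# Tell Bundler to ignore precompiled platform gems and build
# native extensions from source (so they link against musl on Alpine):
bundle config set force_ruby_platform true
bundle install
```

Note this applies to every gem with native extensions, not just google-protobuf, so installs get slower across the board.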
Thanks for the note, @TeBoring - I appreciate your position here, and totally understand there are technical limitations at the Ruby layer. This is probably a more appropriate discussion to have about the protobuf library itself, since it is what is segfaulting. While Rubygems forcing a re-build slows builds down, it's by no means a dealbreaker; the library being built but not functioning, on the other hand, is.

I do want to say that I don't post comments like the above lightly. I am grateful daily for anyone's open-source contributions, and don't want to come across as a "choosing beggar." If I wasn't paying my hosting provider to force the technology on me, I would never dream of criticizing, even implicitly, the contributions of folks such as yourself. Thanks for everything you do!

Flip
@flipsasser (and others): In this case, the protobuf library is only surfacing a deeper issue, which is the platform ABI. The libstdc++ docs include an explanation of the API/ABI interplay which (IMO) is quite clear and well-written, at least relative to the topic, but also a bit C++-centric.

The ABI-related issue with Alpine is actually deeper than C++, though: Alpine uses musl libc instead of glibc. In fact, this is one of the key selling points on the Alpine Linux homepage. However, despite the benefit to the Alpine ecosystem, this also means glibc-based binaries (and libraries) are generally not drop-in compatible with Alpine Linux. For prebuilt protobuf libraries targeting x86_64 CPUs and the Linux kernel, protobuf prebuilts further use a GNU-based target -- i.e., one where the library named

Ultimately, for prebuilt binary libraries, we must choose an implementation and version of libc consistent with all other runtime libraries and the running process itself. Unfortunately, this means that protobuf prebuilts targeting glibc are rendered incompatible with Alpine Linux, because they disagree at a minimum on the lowest-level [2] runtime library.

As @TeBoring pointed out, building from source does not tie you to the target OS of our prebuilt binary libraries. The protobuf sources are source-compatible with musl, and hence Alpine, similar to how they are source-compatible with macOS and Windows targets. (I do, however, appreciate that Docker surfaces the build overhead cost quickly and often.) Alternatively, as @nhattan and @rnnds point out, there are excellent binary compatibility shims you can use to run glibc-targeted binaries under musl; and as @rnnds also points out, switching away from an Alpine-based image altogether to a Debian-based image also eliminates the issue.

At the risk of droning on, I would also point out that Alpine+musl's differing ABI is not particularly new, nor is it unique to protobuf, Google Cloud, Ruby, or even C/C++... a quick web search shows other projects which have tripped over various ABI aspects when targeting Alpine and/or musl. It is certainly a tricky topic, though, and one of my least favorite parts of Linux build engineering.

[1] The
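A hedged way to observe the mismatch described above on an Alpine box (the commands are standard, but paths and exact output vary by system, so this is a diagnostic sketch rather than a recipe):

```shell
# A glibc-targeted shared object records the GNU dynamic loader as a
# dependency; musl systems ship /lib/ld-musl-x86_64.so.1 instead.
SO=$(gem contents google-protobuf | grep -m1 '\.so$')
file "$SO"    # reports the ELF target the extension was built for
ldd "$SO"     # under musl, glibc-only loaders/symbols surface as errors
```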
Thank you for the deeper dive, @dlj-NaN! This is super helpful for understanding the scope of the challenge in distributing protobuf on as many systems as possible. I'll be the first to say that, at a technical level, protobuf is well above my paygrade to criticize, as well as a wonderful and essential piece of engineering. Or rather, I assume it's wonderful; Clarke's 3rd law applies for me in this case! 😃

I readily concede that Alpine's choice of musl is causing the headache and that the fault lies there (otherwise we wouldn't have added gcompat to our

Having considered your perspective on this, I'm hoping you'll be willing to consider mine. As a level-set, it's worth clarifying that we do not approach this issue from a purely technical perspective. My goal is not to argue that "Alpine is right" or "musl is the best." I'm also not attempting to compare it to other serialization technologies like JSON. I don't know enough about C++ or streaming deserialization or whatever API/ABI interplay even is to make a big, technical case for doing anything in any way differently.

Instead I ask you to consider that we, as a small, open-source project funded by grants, must contend with a fiscal reality in addition to a technical one. From where we stand, we pay (from a limited budget) for the use of GCP, and GCP, in turn, forces the technical decision to use protobuf on us. Our expectation is therefore that protobuf not throw a wrench into our process. We consciously chose Alpine because it is widely used and extremely lightweight, two benefits which reduce costs and speed up build processes. And for most of our project's life, that reasoning has proved sound. In fact, right up until we added a dependency on protobuf (which was well after we cut over to GCP from Heroku), we experienced exactly zero Alpine-specific problems (in fairness, we have since experienced one issue related to timezone data).

We therefore view adopting a different OS (and re-writing and re-testing all of our Docker configuration) as a non-starter. From our perspective, we are being asked to invalidate primary technical decisions on behalf of a tertiary dependency that we are required to use in exchange for paying Google. That the technology causing this dissonance solves what feels, to us, like a solved problem (in this case serialization) is all the more frustrating. That frustration is compounded by the fact that there are other quick-compiling, light-weight, portable dependencies all over our application. By way of comparison, here are some benchmarks for building fairly high-performance and complicated pieces of software on my 4-year-old iMac:

Redis: builds from source in 13 seconds
PostgreSQL: builds from source in 2 minutes and 4 seconds
Protobuf: builds from source in 6 minutes and 31 seconds (3x as long as PostgreSQL and 30x as long as Redis)
Say what you will about the value of comparing benchmarks, this illustrates my point: protobuf appears to take a very long time to build, and (as the ongoing conversation in this GitHub issue clearly illustrates) still frequently segfaults after it has been built, on one of the most popular operating systems in the virtualization ecosystem. And so, to your customers, and regardless of Alpine's C++ standard library choices, protobuf feels like a difficult-to-use dependency, compounded by frustration at the fact that it feels unnecessary in all but the highest-volume use cases. Paying a vendor, only to be forced into making big technical decisions for the wrong reasons in order to use a serialization technology that feels redundant, overpowered, or both, is a huge, huge, huge bummer.

Please, please, please do not read this as an attack. I'm confident there are perfectly valid reasons protobuf is written in C++ and takes a very long time to compile, and you've exhaustively convinced me that protobuf and Alpine are not destined to play nicely any time soon, for understandable, fair, and valid reasons. I am merely trying to illustrate what it feels like being forced to use it, and again, I would never dare offer even comparisons like the above if I weren't paying money to do so.

Ultimately it's our fault for selecting GCP without planning far enough down the road. Had we considered that we'd eventually need a background worker solution, we could have selected a provider who offers such a feature without workarounds involving disparate services like Cloud Tasks and long-running HTTP requests. And we're willing to pay the price for that mistake, but for our money, that price is not rebuilding our infrastructure to use a different OS because Google said we should; it's rebuilding our infrastructure with a different hosting provider. And that is specifically because Google made it too difficult to use theirs and we no longer felt we could trust them not to do it again.
Thank you for all of your contributions to the cause, and I don't take lightly that protobuf is probably at the core of many of the systems I take for granted on a daily basis. It's simply not a good solution for us. I'd be happy to chat more about this over email or on the phone; my contact information can be found on my website, which is listed in my GitHub profile.

Flip
Hi all, it took me a long time to fix this on our project, but I ended up with this:
No segmentation fault and everything running OK.
@ltchurik can confirm it solves the issue
Although forcing the ruby platform works, all gems with compiled native extensions are recompiled during bundle install, taking far more time to install than normal. Do we have an alternative?
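One narrower option (a sketch, assuming a Bundler version recent enough to support the per-gem flag, which a later comment in this thread also uses) is to force source compilation only for the affected gem, so other native gems keep their precompiled versions:

```ruby
# Gemfile: only google-protobuf is rebuilt from source;
# other gems with native extensions still install prebuilt binaries.
gem "google-protobuf", force_ruby_platform: true
```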
…uby/docker/alpine It now builds and runs without any issues with the google/protobuf gem. See also, protocolbuffers/protobuf#4460, protocolbuffers/protobuf#3509.
Thanks for this @ilyutov. Worked for me.
This worked for me too, after two days of pain; my server almost died :(. But I would be very grateful if the author @nplusp could give some explanation of it :-)
Thank you for your information. In my environment, I was able to fix it as follows:

    gem "google-protobuf", "3.25.0", force_ruby_platform: true
Hello there,

The following segmentation fault pops up upon running

    ruby -e 'require "google/cloud/trace"'

inside of a Docker container based on ruby:2.5-alpine (alpine 3.7.0).

Gemfile.lock (partial)

Ruby Segfault Dump