-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zlib-ng: Illegal instruction on riscv64gc-unknown-linux-gnu #200
Comments
Thanks for the detailed report and for providing enough details to reproduce the issue. To me it seems that Lastly, |
Yes. Assuming this is zlib-ng/zlib-ng#1705, which I believe to be the case, there is a patch that has been reported to work, though as far as I know it has not yet been offered as a PR. This is as detailed in zlib-ng/zlib-ng#1705 (comment) and preceding comments.
When building this crate, is there a way to pass configuration variables to This would provide a way to make this crate work while neither changing the source code of If this is not feasible, or if it is feasible but not easy, then should a feature be added here for it, or something? (But maybe this really is quite easy and I am just not aware of it.)
Achieving the effect of |
A great idea, this is absolutely possible!
The
Now I have hopes that this can be fixed here, and if you say it will remain useful even if Thanks again for all your great work! |
When compiling binaries to be distributed or otherwise to be run on another system, one may want to insist that RVV instructions be emitted, or insist that they not be emitted. Especially since an upstream fix should eventually take care of the auto-detection case, I think it would be best for a change here to allow it to be manually overridden. This would not necessarily preclude implementing auto-detection here as well. But the appraisal of "easy enough" might be an underestimate, considering that the difficulty and subtleties involved in checking this are the cause of the upstream bug. (Detection is attempted, but the result is not always correct.) Furthermore, manually overriding it is really the capability that would remain useful here even after the upstream bug is fixed. Since arbitrary logic, including logic specific to building particular architectures, could go in Is it acceptable if I add (Auto-detection done here could then be by an |
Thanks for the update.
Yes, I'd also do it with a cargo feature, which would also have to be additive. Of course it's possible to add logic to allow |
I've noticed that this project introduces the nonstandard configuration name Why was this done for that, rather than using a feature? Does it mean I should consider introducing other such nonstandard configuration names for overriding RVV detection, rather than using features? That seems like a wrong thing to do, but since I am not clear on why it was done for
Do you mean that the added features should be minimal and thus not try to support I think that, outside of architectures where it makes a difference, the new features could be no-ops and supplying them together could be permitted, while prohibiting it on RISC-V. Whether that's the best way to do it, I am not sure. |
I'd hope Git has information on this, as I myself joined late enough to not know anything on how things came to be, unfortunately.
All cargo-features should be additive so that |
I'll look into it and let you know what I find.
I'll make just a feature to force RVV to be turned off, since that's the specific problem right now. |
I've opened #218 to add the |
zlib-ng 2.2.2, which includes the changes from zlib-ng/zlib-ng#1770 that attempt to fix or mitigate zlib-ng/zlib-ng#1705, has been released. 2.2.2 also carries other improvements, including zlib-ng/zlib-ng#1773 (for zlib-ng/zlib-ng#1772). So I'll open a PR to bump the submodule version shortly. [Edit: I have opened #219 for this.] Because the zlib-ng submodule of this project is currently at 2.2.1 (since #211), and going from 2.2.1 to 2.2.2 is a patch change, I expect upgrading to be straightforward, and building locally suggests that no further changes may be needed. To be clear, I am not sure whether new versions of this project's crates should be released at this time, which I think could be decided separately. It seems to me that bumping the submodule to the 2.2.2 tag may be useful either way. But it doesn't look like that fully fixes zlib-ng/zlib-ng#1705, and in particular it does not appear to fix this issue on the RISC-V test machine I am using. It does change the problem though. With zlib-ng 2.2.1, the crash occurs in With zlib-ng 2.2.1:
With zlib-ng 2.2.2:
When I wrote this issue description, I showed a To test 2.2.2, I used the same procedure as in #218--in particular, I used the modified Full backtrace with zlib-ng 2.2.1:
Full backtrace with zlib-ng 2.2.2:
|
I've posted zlib-ng/zlib-ng#1705 (comment) to inform the upstream project of the SIGILL during auto-detection (which can also be observed when running its test suite on the test machine I have been using). |
Note that, while I wrote the description in #218 in such a way as to cause this to be closed as completed when it was merged, that is really a workaround. The problem described here is really the upstream bug zlib-ng/zlib-ng#1705, which remains open. |
This strongly resembles #148 (GitoxideLabs/gitoxide#955) but affects riscv64 rather than x86_64, seems to happen every time rather than sporadically, and may be a case of a known zlib-ng bug, zlib-ng/zlib-ng#1705.
What happens
Both
dev
andrelease
builds are affected. On a 64-bit RISC-V machine running Ubuntu 24.04 LTS, runninggix clone
begins the download but is always terminated withSIGILL
(as indicated by the message text and as can be inferred from the exit status):Passing
--trace
does not help, since it doesn't get far enough to print the traced output, and since the program is immediately being terminated by the system, rather than hitting a panic and being able to unwind. The prompt printed afterwards is interleaved with the displayed progress due to the sudden way the program was terminated. (The displayed exit status, here 132, is part of the prompt.)Getting a backtrace using
gdb
Running it in a debugger and printing a backtrace after the crash provides far more information:
This shows that the problem happens in
zlib-ng
and more specifically inlibz-ng-sys-1.1.15/src/zlib-ng/arch/riscv/adler32_rvv.c
, which through thezlib-ng
repository resolves through the git submodule to this file.Likely explanation
This looks a lot like a case of zlib-ng/zlib-ng#1705. My kernel is old enough to trigger that bug:
Furthermore, that issue mentions the problem happening in starship, and I wonder if starship might even be triggering the crash in the same way through one of its
gix-*
dependencies.Perhaps this issue is a victim of its own success and should be closed, to become (and be referenced by) a new comment on zlib-ng/zlib-ng#1705. But I figured I'd start by opening this, in case more is known or can be discerned here, in case it might somehow help with #148, and in case there's anything to be done about it anywhere else (such as gitoxide).
Simplified reproduction
To check if the problem is occurring, and to check that it happens in
ein
as well asgix
, thegix status
andein t h
commands can be used:The text was updated successfully, but these errors were encountered: