-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pyarrow Segfault Fix #7568
Pyarrow Segfault Fix #7568
Conversation
Can one of the admins verify this patch? |
The test script might be a little bit too clever (calling itself), have you considered doing We should also change the comment that contain Other than that it looks good to me! @raulchen Can you check if there is any impact on the streaming system? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
python/ray/__init__.py
Outdated
@@ -37,7 +37,7 @@ | |||
if os.path.exists(so_path): | |||
import ctypes | |||
from ctypes import CDLL | |||
CDLL(so_path, ctypes.RTLD_GLOBAL) | |||
CDLL(so_path, ctypes.RTLD_LOCAL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably leave a comment here pointing to the issue this fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely will add!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ijrsvt Could you give an example which symbols conflict? We used linker script ray_exported_symbols.lds
and ray_version_script.lds
to limit exported symbols. And using RTLD_GLOBAL
by purpose so that _streaming.so can using symbols in _raylet.so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conflicting symbols were in the GRPC library. The problem only showed up on Linux, and when pyarrow
was imported before ray
. The specific symbol that was segfaulting was: google::protobuf::internal::AssignDescriptors
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give more complete detailed log for this symbol segfault. I tink we didn't expose protobuf symbols in ray_exported_symbols.lds/ray_version_script.lds
. And this import order issue exists before RTLD_GLOBAL
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chaokunyang Can you look more into this and why the symbols are conflicting despite the version linker script? This is blocking the current release I believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pcmoritz I'm looking into it. There may be some symbols exported unexpectedly despite of the version linker script, like template. If that's the reason, maybe we can use __attribute__((visibility(....)))
. If not, we'll need more time for this.
Test FAILed. |
Test FAILed. |
Test FAILed. |
Test FAILed. |
Test FAILed. |
Test PASSed. |
Test FAILed. |
Hi @ijrsvt , I used Python 3.7.5, ray 0.8.2, pyarrow 0.16.0 in a Ubuntu 19.10 docker container. I can't reproduce your issue. I also searched symbols in _raylet.so using I test ray 0.82 with pyarrow 0.16.0/0.15.0. Both of all works well. But when I use pyarrow 0.14.0, here is a segment fault. I think it's a issue of pyarrow? |
@chaokunyang I can't reproduce it either (and nor can the original bug reporter). I reverted the |
Test PASSed. |
@ijrsvt After merging the PR it is causing an error, can you look at this? |
""" | ||
import subprocess | ||
|
||
TESTED_LIBRARIES = ["pyarrow"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm mildly surprised that this test passes in our CI. Is pyarrow installed in our CI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't fail if pyarrow
is installed, only if there is some strange symbol collision. There was initially a fix to limit the scope of exported symbols, but that was removed after it became un-reproducable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I agree it doesn't fail if pyarrow
is installed. I just thought that pyarrow
isn't installed in our Travis tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh okay! Let me check that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added Pyarrow to the dependencies. It looks like the error is a segfault again :/
https://travis-ci.com/github/ray-project/ray/jobs/308220125
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this pyarrow version 0.14?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it is 0.16. I added the version to the debug info, so it shows up in this test.
|
||
|
||
def test_imports(): | ||
def try_imports(library1, library2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ijrsvt slightly simpler way to do this (and I think will give a better error message)
subprocess.check_output(["python", "-c", "import {}; import {}".format(library1, library2)])
subprocess.check_output(["python", "-c", "import {}; import {}".format(library2, library1)])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh okay. I'll switch to this!
…ay-project#7805)" This reverts commit eb61036.
Why are these changes needed?
Ray will segfault if pyarrow is imported before it because exported symbol are colliding . This fixes that bug and adds a test to ensure that future changes do not reintroduce it.
Related issue number
Fixes Issue #7393
Checks
scripts/format.sh
to lint the changes in this PR.