pydrake: Permit granular Bazel targets (linking analysis, or providing granular shared libs) #6760
I've been tinkering with a more modular Python setup to avoid the present monolith workaround. @jwnimmer-tri @jamiesnape Just to check if I'm on the right track: behind bazelbuild/bazel#492, is this also one of the constraints that makes developing with more modular shared libraries a headache? (I faintly remember a mention of shared libraries on Mac being an issue, and see the mention in ….)

I ask because I was able to get some toy Python pybind libraries to play more nicely within Bazel on Linux by just using …. It seems that Tensorflow's solution is to isolate it in a container, handcraft your Python setup, or disable SIP (which does not seem like a great solution unless you're air-gapped...).

Of course, that's contingent upon how 492 also affects this. (On Linux, at least for Bazel 0.6.1-produced libraries, it seems fine, so perhaps externals can be rewritten to support this workflow?)
Unfortunately, I don't have any details on what those challenges were. I just remember David fighting with it for days, from which my lesson was "test all shared-library-related PRs on Mac before merging".
Honestly, FYI @fbudin69500.
Gotcha, thank y'all for the explanation!
At least for Python / …, I did try ….
Does this mean we'd run into the same SIP-related error for …?
This is with the Homebrew Python?
I guess not. Doing …
EDIT: That worked! Thanks!
It must be new in Bazel 0.6.1 that … (prepending the Homebrew Python to the PATH) actually works: #6686
Ah, that makes more sense now, thank you for pointing that out!

Also, looking more at the Bazel issue on transitive linking, it looks like one option for development would be to always use shared linking, but once we want to start installing artifacts, we force …. For Python, in … The main caveat is installed binary dependencies, but meh.
I have prototyped two things: (a) adding some sort of …; (b) …. I will test this on Mac to ensure that it does not mess anything up there.

Prototype WIP branch: https://github.com/RobotLocomotion/drake/compare/master...EricCousineau-TRI:feature/py_system_bindings-wip?expand=1
After tuning some of the Bazel-built / -linked external libraries to use more consistent linking options, this approach now works on Ubuntu and on Mac in non-devel (including installation) and devel mode. My last item is to ensure that this does not violate ODR in either development or installation mode, and I will test this using ….

If this is successful, I would like to start using this setup as soon as possible in ….

EDIT: The ODR test passes and works as expected. When using "development mode" but using static libraries (the present state), the ODR test fails: creating two Variables from two separate modules increments two separate instances of ….
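To make the ODR check concrete, here is a rough sketch (not the actual test from the WIP branch) of what such a check could look like from Python, assuming the elided global above is something like a Variable ID counter and that `module_a` / `module_b` are two hypothetical, separately-linked extension modules:

```python
# Hypothetical ODR smoke test: `module_a` and `module_b` stand in for two
# separately-built extension modules that both link the symbolic library.
# With a single shared definition, both modules advance one global ID counter;
# duplicated static state would instead hand out overlapping IDs from two
# independent counters.
import module_a
import module_b

v1 = module_a.Variable("x")
v2 = module_b.Variable("y")

# Distinct IDs imply one shared counter; equal IDs suggest each module
# incremented its own copy of the counter (an ODR violation).
assert v1.get_id() != v2.get_id(), "Duplicate static state across modules?"
```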
Proposed PRs (numbers are separate steps; letters indicate possible parallelism):

1.a. Merge …
2. Separate …
3. Teach …
@EricCousineau-TRI Concerning the SIP error caused by the relative path, I am about to submit a patch which I think will solve this issue. It will remove the relative rpaths that were added at build time and are therefore irrelevant after install. I'll add the PR# below.
Awesome, thanks!
Here is the PR I was talking about: #7301. It may not solve all your problems, but it may help.
FYI, I think the problem on Mac is fixed. See bazelbuild/bazel#507 and bazelbuild/bazel#3450. I think the patch was applied to …. I've made the following changes to test it on my Mac, and I've checked that ….

```diff
diff --git a/tools/skylark/drake_cc.bzl b/tools/skylark/drake_cc.bzl
index e42302a8e..1c45948a9 100644
--- a/tools/skylark/drake_cc.bzl
+++ b/tools/skylark/drake_cc.bzl
@@ -310,7 +310,7 @@ def drake_cc_library(
deps = [],
copts = [],
gcc_copts = [],
- linkstatic = 1,
+ linkstatic = 0,
install_hdrs_exclude = [],
**kwargs):
"""Creates a rule to declare a C++ library.
diff --git a/tools/workspace/gtest/gtest.BUILD.bazel b/tools/workspace/gtest/gtest.BUILD.bazel
index 4cf13b543..239567be7 100644
--- a/tools/workspace/gtest/gtest.BUILD.bazel
+++ b/tools/workspace/gtest/gtest.BUILD.bazel
@@ -42,7 +42,7 @@ cc_library(
# This is a bazel-default rule, and does not need @drake//
"@//conditions:default": [],
}),
- linkstatic = 1,
+ # linkstatic = 1,
visibility = ["//visibility:public"],
)
```

P.S. I used …
I've been writing code that uses Drake as an external, and I have been creating pybindings alongside the C++ code. I keep running into weird bugs when building the project due to linking with libdrake.so.
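As an aside (not from this thread): one way to see whether a process ended up with more than one copy of libdrake.so is a small Linux-only check over /proc/self/maps after importing both pydrake and the downstream binding; `my_project_bindings` below is a hypothetical module name.

```python
# Rough diagnostic sketch (Linux-only; `my_project_bindings` is a hypothetical
# downstream pybind11 module). After importing both modules, list every mapped
# libdrake.so to see whether the process pulled in more than one copy (e.g.
# from different install prefixes), which duplicates globals and invites ODR bugs.
import pydrake  # noqa: F401 -- loads the monolithic libdrake.so
# import my_project_bindings  # noqa: F401 -- downstream module under test

drake_libs = set()
with open("/proc/self/maps") as maps:
    for line in maps:
        fields = line.split()
        if fields and fields[-1].endswith(".so") and "drake" in fields[-1]:
            drake_libs.add(fields[-1])

print("\n".join(sorted(drake_libs)))
# One path is the healthy outcome; multiple distinct libdrake.so paths (or none,
# if Drake code was linked statically into the extension) point at the mixed
# static/shared setup discussed in this issue.
```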
Perhaps to tackle this from a different angle: @jwnimmer-tri had suggested a ways back that we let Bazel analyze what would need to be stripped from linking to avoid the dreaded diamond dependency problem. I may file another issue as an alternative route, or rename this to be more general, like ….

I'm motivated by this because it's painful in Anzu to have C++ code that could be bound in Python quite easily, but we don't because of the overhead of having to avoid ODR violations from linking both static and shared lib portions of Drake.

\cc @kunimatsu-tri @thduynguyen @siyuanfeng-tri

EDIT: Yeah, I'm just gonna rename this issue.
EDIT 2: For diamond deps, we could also do whatever is necessary to make a re-compiled version of the library that is for shared linking (cc sources recompilation).
I would like to see https://github.com/bazelbuild/rules_cc/blob/master/examples/experimental_cc_shared_library.bzl become supported (non-experimental) before we delve too deeply into more than one shared library inside Drake.
Can I ask what the technical benefits are of waiting on `experimental_cc_shared_library`? But if it's just a moderate improvement in semantics (e.g. fewer wrapper rules), could we perhaps timebox how long we wait? Ideally, I'd like to start carving up our shared library parts starting at the end of January / start of Feb.
Looking at the code, it seems like it does solve the diamond dep problem? (Or at least, fail fast if there's overlap?) I would like to start tinkering with it, even if it's experimental, starting at that time. (Assuming that it works, of course.)
Correct, it (I think) deals with diamond deps / ODR. I am loath to have us try to deal with ODR bespokely -- we need Starlark tooling for that, and I'm hoping we can just use upstream tooling instead of writing our own. We are also going to have to do something about circular package dependencies inside libdrake before we can finish carving it up.
As an alternative to solving ODR, we should consider defining the public API for our Bazel targets and stop using a default visibility of public (if that is possible).
One thing we could consider is trying to accelerate local development -- that during ….

The other related idea is that we could try to modularize Drake itself (by having a few large C++ shared libraries, and having pydrake then align with those). The discussions in #15735 relate to this issue somewhat. I'd rather just leave that to whatever is best for C++, though, and then Python can ride that wave if we like. I don't think the Python build system alone is enough to justify a C++ library rework.

Given all of that, I'd like to advocate that we close this issue. I don't see any good bang for the buck on the horizon here. WDYT?
Yup, SGTM to close it - thanks for the analysis and related issues!
UPDATE (2019/08/22):
Per this comment below, I've renamed this to address the real problem, which is that `//bindings/pydrake` is (properly) encapsulated due to the linking of the monolithic `libdrake.so`.

At some point, I (and others) want to consume parts of Drake for Python (e.g. just `//math`). I hate having to build all of `libdrake.so` just so I can import `RigidTransform`. This also makes ODR issues a pain when doing work in Anzu or other downstream projects (like what Mark mentioned below).

OLD Discussion:

The goal of PR #6465 was to mitigate duplicate global variable definitions by relying on compile-time linking to resolve duplicates and consolidating it to link everything to `libdrake.so`. (The original issue was caused by the original `pydrake` bindings, which dynamically loaded individual `*.so`s -- which would not see each other's internal symbol linkages -- that had redundant dependencies.)

There may be a viable workaround, using `dlopen`'s `RTLD_GLOBAL` flag to prevent duplicate symbols (only for the `drake` libraries, and possibly the upstream dependencies); a minimal sketch of that idea follows this description.

Proofs of concept:
- `-rdynamic`, `RTLD_GLOBAL`, and potentially reloading a `*.so` to be global (if we kludge up too much by switching Python's import mechanisms)
- `pybind11` case: example output in code directory

NOTE: Producers are singletons; consumers are, uh, things calling the producers from different `*.so` files.

Motivation: I'd like to be able to test out Python bindings (for development and algorithm development) without having to recompile the entirety of `libdrake.so`, especially if tinkering with core components with tons of downstream dependencies :(

Potential caveats:
- … when `RTLD_GLOBAL` is enabled, possibly due to symbol collision. This may cause those errors if we are not conservative with how we load this.

\cc @soonho-tri @jwnimmer-tri
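For reference, a minimal sketch of the `RTLD_GLOBAL` idea mentioned above, in Python. It is illustrative only: `module_math` and `module_symbolic` are hypothetical granular binding modules, and whether this is safe depends on the symbol-collision caveat noted above.

```python
# Sketch of the dlopen(RTLD_GLOBAL) workaround: expose symbols from the first
# imported extension globally so that later-loaded *.so files resolve shared
# internals to a single definition instead of private duplicates.
# `module_math` and `module_symbolic` are hypothetical granular bindings.
import os
import sys

_default_flags = sys.getdlopenflags()
try:
    # RTLD_GLOBAL makes each imported extension's symbols visible to subsequent
    # dlopen() calls; RTLD_NOW resolves eagerly so collisions surface at import
    # time rather than at first use.
    sys.setdlopenflags(os.RTLD_GLOBAL | os.RTLD_NOW)
    import module_math      # e.g. bindings for just //math
    import module_symbolic  # a second module with overlapping C++ dependencies
finally:
    # Restore the default flags so unrelated imports keep their usual behavior.
    sys.setdlopenflags(_default_flags)
```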