[FEA] RAFT should ensure all its symbols are hidden from shared object libraries #1722

jrhemstad · 2023-08-07T14:41:00Z

Is your feature request related to a problem? Please describe.

As a user of RAFT, I would like to build a shared object library, libA.so, that internally uses RAFT function templates, including __global__ function templates.

Today, RAFT does nothing to hide the visibility of its __global__ function templates or any other host template functions, and by default these symbols have weak visibility. In short, this means if I link two dynamic libraries A.so and B.so into my application that both contain identical instantiations of a RAFT template, then the linker will discard one of the two instantiations and use only one of them. This can lead to disastrous and insidious issues like spurious silent failures.

This issue is true of any header-only, C++ template library, but is particularly bad for CUDA C++ libraries that ship __global__ function templates. Consider this trivial example of one of many ways things can go wrong

The following code has two TUs:

volta.cu compiled with sm_70
pascal.cu compiled with sm_60

Each TU has a single function ( volta() or pascal() respectively) and this function queries and prints the ptxVersion of a kernel<void> using cudaFuncGetAttributes.

These TUs are linked into a program that determines the compute capability of device 0 and invokes volta() or pascal() accordingly.

One would expect that invoking volta() would always print 70 and invoking pascal() would print 60.

However, this is not the case. As described above, the kernel template has weak linkage, and so when linking the volta.o and pascal.o TUs together, the linker selects one of the instantiations of kernel<void> and discards the other.

The end result is that the program will randomly print 60 or 70 depending on which instantiation the linker picked.

// kernel.cuh
template <typename T>
__global__ void kernel(){}

// volta.cu
#include "kernel.cuh"
void volta(){
   cudaFuncAttributes attrs;
   cudaFuncGetAttributes(&attrs, (void *)(kernel<void>));
   printf("%d\n, attrs.ptxVersion*10);
}

// pascal.cu
#include "kernel.cuh"
void pascal(){

   cudaFuncAttributes attrs;
   cudaFuncGetAttributes(&attrs, (void *)(kernel<void>));
   printf("%d\n, attrs.ptxVersion*10);

}

// main.cpp
void volta();
void pascal();
int main(){
   int compute_capability;
   cudaDeviceGetAttribute(&compute_capability, cudaDevAttrComputeCapabilityMajor, 0);
   if(compute_capability >= 70)
      volta();
   else
      pascal();
}

nvcc -c -arch=sm_70 volta.cu
nvcc -c -arch=sm_60 pascal.cu
nvcc -c main.cpp
nvcc -o test volta.o pascal.o main.o
./test

TL;DR:

Given a __global__ function, kernel, with weak linkage (like a template)
Instantiate kernel in separate TUs compiled with different PTX architectures
Link the separate TUs into a single program
The linker will see the two instantiations of kernel as identical and discard one
Invoking kernel() results in potentially executing code you did not expect

Describe the solution you'd like

Luckily the solution is quite simple.

Every __global__ function should be annotated as static
Every host template function should be annotated as __attribute__((visibility("hidden"))).

This makes the symbol hidden in any resulting dynamic library.

For convenience, you'd likely want to use a macro to wrap these like this:

#if defined(__CUDACC__)
#       define RAFT_KERNEL \
            static __global__
#else
#   define CUB_KERNEL
#endif

Additional Context

We've been bitten by this in Thrust/CUB several times over the years.

Thrust/CUB also have the ability to allow users to customize the namespace in order to differentiate the symbols and avoid this problem. However, this solution is not robust. First of all, it requires every user to remember to customize the namespace. Secondly, it's possible for users to properly customize the namespace and still run afoul of the issues that can result.

See:

The text was updated successfully, but these errors were encountered:

jrhemstad · 2023-08-08T15:45:30Z

@cjnolet note that there are a few alternatives to explicitly annotating every function with __attribute__((visibility("hidden"))).

gcc also provides a pragma based solution that allows modifying the visibility of many symbols at once:

#pragma GCC visibility push(hidden)
extern void someprivatefunct(int);
#pragma GCC visibility pop

You can also do the inverse of annotating non-binary APIs with __attribute__((visibility("hidden"))), you could instead annotate your public binary APIs with __attribute__((visibility("default"))) and instead compile with fvisibility=hidden to implicitly make the visibility of every function that is not otherwise annotated be hidden.

There's pros and cons among all these options, so it'll come down to what makes the most sense for RAFT.

See https://gcc.gnu.org/wiki/Visibility

ahendriksen · 2023-08-08T15:53:24Z

Hi Jake, thanks for the excellent write up. It's really clear.

What potential ramifications could it have if "Invoking kernel() results in potentially executing code you did not expect". What I can think of is:

Two libraries use different versions of RAFT.
They are linked into the same program.
The host code of one version of RAFT calls the device code of another version.

Is this a concern?

The failure case you mentioned where within one version of RAFT the cudaPTXversion of a kernel could be different is something that I already expected could happen. I think our dispatch mechanism should continue to work in this case (because it calls cudaFuncGetAttributes on the kernel that it is going to run, not on another kernel).

jrhemstad · 2023-08-08T16:06:48Z

Is this a concern?

Yes. Getting the different PTX versions is just one example. In general, I think "running code that you did not except" is not good.

The most trivial example would be if there was a bug in the older version of RAFT that was fixed in the newer version, and the library using the newer version inadvertently picks up the kernel with the bug.

Another other example I've seen where this causes problems is if in one library invokes runtime APIs to configure attributes of a kernel, but instead the kernel from the other library gets invoked and the kernel configuration is ignored.

Related to issues rapidsai#1511 and rapidsai#1490. Should perhaps make this static (can remove weak linkage) or hidden (should keep weak linkage). (See issue rapidsai#1722)

Fixes issue #1511. Make get_cache_idx a weak symbol (to allow linking multiple symbols) without marking it inline (to avoid compilation warnings that are promoted to errors in nvcc 12). Related issues: - #1490 - #1722 Related PRs: - #1732 - #1492 Authors: - Allard Hendriksen (https://github.com/ahendriksen) - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Artem M. Chirkin (https://github.com/achirkin) - Corey J. Nolet (https://github.com/cjnolet) URL: #1733

This marks all kernels in CUCO as `static` so that they have internal linkage and won't conflict when used by multiple DSOs. I didn't see a single shared/common header in cuco where I could place a `CUCO_KERNEL` macro so I modified each instance instead. While `cccl` went with a `__attribute__ ((visibility ("hidden")))` approach to help reduce RDC size, this approach seemed very invasive for cuco. This is due to the fact that we would need to pragma push and pop both gcc warnings and nvcc warnings in each cuco header so that we don't introduce any warnings. This is needed as the compiler incorrectly state that the `__attribute__ ((visibility ("hidden")))` has no side-effect. Context: rapidsai/cudf#14726 NVIDIA/cccl#166 rapidsai/raft#1722 --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yunsong Wang <[email protected]>

jrhemstad added the feature request New feature or request label Aug 7, 2023

cjnolet mentioned this issue Aug 7, 2023

Hiding symbols from shared object libraries #1723

Closed

ahendriksen mentioned this issue Aug 8, 2023

Add dispatch based on compute architecture #1295

Closed

ahendriksen mentioned this issue Aug 10, 2023

Make get_cache_idx a weak symbol with dummy template #1733

Merged

cjnolet added this to VS/ML/DM Primitives Release Board Aug 15, 2023

jrhemstad mentioned this issue Aug 16, 2023

[FEA]: Create tooling to detect improper symbol visibility NVIDIA/cccl#343

Open

1 task

cjnolet mentioned this issue Aug 16, 2023

[FEA] Build static link library artifact #1741

Closed

This was referenced Aug 17, 2023

[Experiment] Compile libraft.so with separable compilation #1747

Closed

[FEA] RAFT should allow customizing its namespace #1766

Open

cjnolet moved this to Todo in VS/ML/DM Primitives Release Board Sep 26, 2023

cjnolet moved this from Todo to In Progress in VS/ML/DM Primitives Release Board Sep 26, 2023

robertmaynard mentioned this issue Oct 12, 2023

Make all cuda kernels have hidden visibility #1898

Merged

robertmaynard mentioned this issue Jan 9, 2024

Mark all cuco kernels as static so they have hidden visibility NVIDIA/cuCollections#422

Merged

vyasr mentioned this issue May 10, 2024

Support dynamic linking between RAPIDS wheels rapidsai/build-planning#33

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEA] RAFT should ensure all its symbols are hidden from shared object libraries #1722

[FEA] RAFT should ensure all its symbols are hidden from shared object libraries #1722

jrhemstad commented Aug 7, 2023 •

edited

Loading

jrhemstad commented Aug 8, 2023

ahendriksen commented Aug 8, 2023

jrhemstad commented Aug 8, 2023

[FEA] RAFT should ensure all its symbols are hidden from shared object libraries #1722

[FEA] RAFT should ensure all its symbols are hidden from shared object libraries #1722

Comments

jrhemstad commented Aug 7, 2023 • edited Loading

jrhemstad commented Aug 8, 2023

ahendriksen commented Aug 8, 2023

jrhemstad commented Aug 8, 2023

jrhemstad commented Aug 7, 2023 •

edited

Loading