Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

False positive when calling a function using dlsym #911

Open
rnburn opened this issue Feb 2, 2018 · 12 comments
Open

False positive when calling a function using dlsym #911

rnburn opened this issue Feb 2, 2018 · 12 comments

Comments

@rnburn
Copy link

rnburn commented Feb 2, 2018

Code like this

extern "C" {
int __attribute((weak))
		OpenTracingMakeTracerFactory(const char* opentracing_version,
                                             const void** error_category,
                                             void** tracer_factory);
}  // extern "C"

...

  int (*make_tracer_factory)(const char*, const void**, void**) =
      reinterpret_cast<decltype(OpenTracingMakeTracerFactory)*>(
          dlsym(handle, "OpenTracingMakeTracerFactory"));

...

  const void* error_category = nullptr;
  void* tracer_factory = nullptr;
  const auto rcode = make_tracer_factory("1.0.0", &error_category,
                                         &tracer_factory);

gives the error


main.cpp:36:22: runtime error: call to function (unknown) through pointer to incorrect function type 'int (*)(const char *, const void **, void **)'
(liba.so+0x980): note: (unknown) defined here

even though it's the standard way of invoking a function from dlopen.

I put together a more complete example here.

@kcc
Copy link
Contributor

kcc commented Feb 2, 2018

Off the top of my head I don't know if this is an expected/desired behavior of ubsan or not.
We don't use -fsanitize=function widely.
To unblock yourself you can use https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#disabling-instrumentation-with-attribute-no-sanitize-undefined

@rnburn
Copy link
Author

rnburn commented Feb 2, 2018

Undefined behavior sanitizer claims not to produce false positives, yet this is a legitimate use of casts.

When I asked about this on LLVM's IRC channel, they recommended putting in a report for it.

@kcc
Copy link
Contributor

kcc commented Feb 2, 2018

I bet they actually meant to report at bugs.llvm.org (not sure if its easy to register a new account )
Anyway, @zygoloid, WDYT about -fsanitize=function barking on the pointer returned by dlsym?

@rnburn
Copy link
Author

rnburn commented Feb 2, 2018

I can put it there (I already have an account), but I didn't see any category for sanitizer bugs... and isn't this project meant to track these issues with sanitizers?

@kcc
Copy link
Contributor

kcc commented Feb 2, 2018

Yea, we kind of use both trackers interchangeably. This tracker is not-llvm-specific, e.g. gcc issues are also discussed here.

@morehouse
Copy link
Contributor

@zygoloid: Ping.

@zygoloid
Copy link

Sorry I missed this before. I would imagine the problem is that use of RTLD_LOCAL is causing the type_infos for the function types to not match across DSOs. This is likely not just an fsan problem; I'd expect (eg) a function pointer thrown from the DSO to not be caught outside it either.

Maybe Clang needs to check whether a type_info object's type involves any user-defined types and force the use of string comparison for equality if not. Yuck.

We should discuss this on the clang lists, maybe @rjmccall has a better idea.

@rjmccall
Copy link

Just to state Richard's basic point explicitly: there's nothing wrong with this use of dlsym.

I think there are four key points in the analysis of this problem:

  1. dlopen is arguably wrong to interpret RTLD_LOCAL as permitting it to avoid resolving weak symbols in the DSO to entities already in the program. (The Darwin loader seems to do this resolution in my testing, FWIW.) If dlopen did this resolution correctly, this bug would be limited to calls between late-loaded DSOs.

  2. If we take the behavior of dlopen as a given, RTLD_LOCAL can break C++ semantics in a number of different ways, and I don't think it's the ABI's responsibility to try to compensate for that. There is certainly no way to compensate for RTLD_LOCAL's impact on static variables in inline functions or static data members of class templates.

  3. I don't think there's an ABI-compatible way to compensate for this anyway. There is no bit we can set in the standard ABI that will force the use of string comparison for type_info equality, and there is no space to add such a bit that wouldn't break existing compiled code that manipulates type_info objects. If you're considering breaking ABI to try to solve this, you should break ABI to adopt a Darwin-ARM64-like approach that does not rely on vague linkage across DSOs.

  4. UBSan can fix this bug without requiring any ABI changes by simply not using RTTI anymore for the type matching. Just make a string literal with the mangled function type in it or something, and you will magically reduce the binary-size and load-time impact of this feature while avoiding a long chain of headaches.

@pitrou
Copy link

pitrou commented May 20, 2019

This does not seem to happen only with dlsym, but even with std::function (or at least some uses thereof?).

@Flamefire
Copy link

I ran into the same issue, but without the weak attribute: Simply an extern "C" function in a shared library later loaded with dlopen(..., RTLD_LOCAL). In this case it is a simple function of type unsigned int (*)().

Note that this is extremely common for plugin systems where every plugin must provide some common functions which are then dlopen'ed later. Obviously RTLD_GLOBAL must not be used or symbols from different plugins would clash.

@ezyang
Copy link

ezyang commented Dec 13, 2019

We at the PyTorch project also ran into this. More context at pytorch/pytorch#28536 I'm planning to work around it by LD_PRELOAD ing the library in question into Python at startup when I want to run under UBSAN (I have to do this already anyway for the ASAN library, so it's no big deal.)

@chrstphrchvz
Copy link

4. UBSan can fix this bug without requiring any ABI changes by simply not using RTTI anymore for the type matching.

Has this effectively been accomplished in LLVM Clang 17 by llvm/llvm-project@46f366494f3c ?

copybara-service bot pushed a commit to google/dawn that referenced this issue Jun 19, 2024
Workaround for known false-positive: google/sanitizers#911

Bug: chromium:348087176
Bug: chromium:42251292
Change-Id: I9c41b4608f079bd5eef9373e76c1e5ac218fffc1
Reviewed-on: https://dawn-review.googlesource.com/c/dawn/+/194343
Reviewed-by: dan sinclair <[email protected]>
Commit-Queue: Antonio Maiorano <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants