Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid native function pointer (Polyglot::ForeignException) with latest libxml2 2.11.0 #3031

Closed
flavorjones opened this issue May 2, 2023 · 9 comments

Comments

@flavorjones
Copy link
Contributor

I need some assistance understanding an issue I'm seeing with nokogiri and libxml2 2.11.0 (just released).

The error I'm seeing with truffleruby 23.1.0-dev-17ec84b1, like ruby 3.1.3, GraalVM CE Native [x86_64-linux] is

/__w/nokogiri/nokogiri/tmp/x86_64-linux/nokogiri/3.1.3/tmp/x86_64-unknown-linux-gnu/ports/libxml2/2.11.1/libxml2-2.11.1/threads.c:143:in `xmlInitMutex': Invalid native function pointer (Polyglot::ForeignException)
	from /__w/nokogiri/nokogiri/tmp/x86_64-linux/nokogiri/3.1.3/tmp/x86_64-unknown-linux-gnu/ports/libxml2/2.11.1/libxml2-2.11.1/globals.c:5[9](https://github.com/sparklemotion/nokogiri/actions/runs/4863778294/jobs/8671986447#step:7:10):in `xmlInitGlobalsInternal'
	from /__w/nokogiri/nokogiri/tmp/x86_64-linux/nokogiri/3.1.3/tmp/x86_64-unknown-linux-gnu/ports/libxml2/2.11.1/libxml2-2.11.1/parser.c:14292:in `xmlInitParser'
	from /__w/nokogiri/nokogiri/ext/nokogiri/nokogiri.c:183:in `Init_nokogiri'

Link to CI failure

I can easily reproduce this using the container image ghcr.io/sparklemotion/nokogiri-test:truffle-nightly but can't reproduce it on my dev host machine with the same version of truffle (installed via ruby-build).

The code in question isn't obviously passing around an invalid function pointer:

void
xmlInitMutex(xmlMutexPtr mutex)
{
#ifdef HAVE_POSIX_THREADS
    pthread_mutex_init(&mutex->lock, NULL);
#elif defined HAVE_WIN32_THREADS
    InitializeCriticalSection(&mutex->cs);
#else
    (void) mutex;
#endif
}
static xmlMutex xmlThrDefMutex;

/* ... */

void xmlInitGlobalsInternal(void) {
    xmlInitMutex(&xmlThrDefMutex);
}
struct _xmlMutex {
#ifdef HAVE_POSIX_THREADS
    pthread_mutex_t lock;
#elif defined HAVE_WIN32_THREADS
    CRITICAL_SECTION cs;
#else
    int empty;
#endif
};

typedef struct _xmlMutex xmlMutex;
typedef xmlMutex *xmlMutexPtr;

Valgrind doesn't catch anything in native CRuby, and I can't actually reproduce this outside of the test container, so it's not reproducing in all TruffleRuby versions, either.

I'd love if someone could help me understand if there's a subtle bug in libxml2 that needs to be fixed, or whether this is something to do with the CI container environment, or if this is perhaps a Sulong issue?

@eregon
Copy link
Member

eregon commented May 2, 2023

It seems Invalid native function pointer can be thrown when the function pointer is null, but also for some cases which are less clear to me.

The line in the stacktrace is https://github.com/GNOME/libxml2/blob/v2.11.1/threads.c#L143
I suspect the pthread_mutex_init is problematic, together with https://github.com/GNOME/libxml2/blob/v2.11.1/threads.c#L66

And https://github.com/GNOME/libxml2/blob/v2.11.1/threads.c#L28-L30 seems a likely difference between host and container.

The nokogiri-test:truffle-nightly container has GLIBC 2.31, so it doesn't use __libc_single_threaded but the more hacky logic with weak symbols:

root@14974608ec32:/# ldd --version
ldd (Debian GLIBC 2.31-13+deb11u6) 2.31

I suspect Sulong doesn't handle these weak symbols correctly.
Maybe it detects pthread_mutex_init is available but somehow when reading the function pointer it gets 0 or fails to read it?


Is there maybe a way to configure libxml2 so it links to pthread or considers it loaded?
We'll always have pthreads loaded for TruffleRuby and for any Ruby implementation (on non-Windows).

@flavorjones
Copy link
Contributor Author

Great catch. I'll explore configuring (or patching) libxml2.

For posterity, the change to libxml2 around thread detection is described here: https://gitlab.gnome.org/GNOME/libxml2/-/issues/427

flavorjones added a commit to sparklemotion/nokogiri that referenced this issue May 2, 2023
Because libxml2 2.11 changed how it does thread support detection and
we're about to monkey with it in a patch.

For context, see:

- https://gitlab.gnome.org/GNOME/libxml2/-/issues/427
- oracle/truffleruby#3031
flavorjones added a commit to sparklemotion/nokogiri that referenced this issue May 2, 2023
because the weak symbols won't work with truffleruby, for more context
see:

- oracle/truffleruby#3031
@flavorjones
Copy link
Contributor Author

Attempt 1 to work around this was to patch threads.c to remove the weak symbol bit and set XML_IS_THREADED to 1 ... but saw:

ld.lld: error: undefined reference due to --no-allow-shlib-undefined: pthread_getspecific
>>> referenced by /nokogiri/ports/x86_64-linux/libxml2/2.11.1/lib/libxml2.so

Attempt 2, then, was to add -lpthread to the linkline in Nokogiri's extconf.rb when running TruffleRuby.

sparklemotion/nokogiri@4d5486d

That seems to do the trick. This is running through CI now at sparklemotion/nokogiri#2866

@flavorjones
Copy link
Contributor Author

flavorjones commented May 3, 2023

Unfortunately, that change broke most of the linux CRuby builds and did not fix the --enable-static TruffleRuby build (though it did fix the --disable-static build). I'll try to find another workaround tomorrow.

@eregon
Copy link
Member

eregon commented May 3, 2023

Sorry for the complications, libxml2 does way too many hacky things there, and unfortunately that seems to result in hitting a bug in Sulong.

It seems CRuby and --enable-static TruffleRuby fail in the same way, probably because it's not linking to libpthread for the static build:

CRuby:
  CCLD     libxml2.la
  CCLD     xmllint
/usr/bin/ld: ./.libs/libxml2.a(libxml2_la-threads.o): undefined reference to symbol 'pthread_getspecific@@GLIBC_2.2.5'
/usr/bin/ld: /lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
TruffleRuby --enable-static:
  CCLD     libxml2.la
  CCLD     xmllint
ld.lld: error: undefined symbol: pthread_getspecific
>>> referenced by threads.c:566 (/__w/nokogiri/nokogiri/tmp/x86_64-linux/nokogiri/3.1.3/tmp/x86_64-unknown-linux-gnu/ports/libxml2/2.11.1/libxml2-2.11.1/threads.c:566)
>>>               lto.tmp:(xmlGetGlobalState)

Maybe the append_ldflags("-lpthread") doesn't affect static libraries?

Also it's under if truffle?, so of course it does not apply on CRuby. It should be safe to do it on CRuby too, and just skip it on Windows.

Instead of the && false you could use && !defined(TRUFFLERUBY) if you want to change it only for TruffleRuby.

@eregon
Copy link
Member

eregon commented May 3, 2023

Maybe the append_ldflags("-lpthread") doesn't affect static libraries?

I have found that for some libraries built as part of TruffleRuby, I pass -pthread in both cflags and ldflags, maybe that helps.

@eregon
Copy link
Member

eregon commented May 3, 2023

In #3031 (comment) I think the issue is it fails on the CCLD xmllint step, NOT on creating the static libxml2 (which worked fine, I think).
And that makes sense since a static library can't have dependencies or link to another library or so (AFAIK).

So either skipping building xmllint or passing -lpthread to it should work.
The latter might be possible by setting BASE_THREAD_LIBS or so, as mentioned in https://github.com/GNOME/libxml2/blob/v2.11.1/threads.c#L52.

@flavorjones
Copy link
Contributor Author

flavorjones commented May 3, 2023

OK, I reset and discovered that there are two things going on:

  1. libxml2's threads.c in v2.11.0+ is buggy and doesn't always check for weak symbol resolution before calling the pthread methods. This is why we're seeing a crash now but never did before.
  2. with previous versions of libxml2, nokogiri on truffleruby has never had thread support enabled (because -pthread isn't passed in CFLAGS or LDFLAGS)

For (1) I've submitted a fix upstream at https://gitlab.gnome.org/GNOME/libxml2/-/merge_requests/215

For (2) I'm considering making sure that pthreads are always compiled and linked in libxml2.so, including in TruffleRuby.

@flavorjones
Copy link
Contributor Author

See conversation with libxml2 maintainer at https://gitlab.gnome.org/GNOME/libxml2/-/merge_requests/215 but the TLDR is

  • it doesn't seem possible to do this in a way that works across platforms, and the maintainer is considering removing it
  • I'm removing it from the patched Nokogiri distribution
  • and I'm updating extconf to compile and link with -pthread under TruffleRuby when the library is available (posix platforms) and needed (static linking)

Thanks for your help with this, you definitely helped me arrive at these decisions more quickly.

@flavorjones flavorjones closed this as not planned Won't fix, can't repro, duplicate, stale May 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants