csighandler3: try a fallback interpreter if none in TLS #22530

tonycoz · 2024-08-22T00:42:16Z

If an external library, possibly loaded by an XS module, creates a thread, perl has no control over this thread, nor does it create an interpreter for that thread, so if a signal is delivered to that thread my_perl would be NULL, and we would crash trying to safely deliver the signal.

To avoid that, use the saved main interpreter pointer instead.

Since trying to deliver the signal unsafely to the handler from the wrong thread would result in races on the interpreter's data structures, I chose not to directly deliver unsafe signals.

Accessing PL_psig_pend[] from the wrong thread isn't great, but to ensure safe handling of that we'd need lockless atomics, which we don't have access to from C99 (and aren't required in C11).

Fixes #22487
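
As a rough illustration of the approach described above (not the actual patch), here is a minimal sketch using a toy interpreter struct. The names `toy_interp`, `saved_main`, `interp_key`, `toy_csighandler`, `library_thread` and `run_fallback_demo` are all hypothetical stand-ins, and a pthread key plays the role of the TLS slot that holds my_perl. A thread created behind perl's back has nothing in that slot, so the handler falls back to the saved main pointer and only records the signal as pending.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>

#define TOY_NSIG 65  /* assumption: large enough for the signals used here */

/* Hypothetical stand-in for an interpreter: only the pending-signal state. */
typedef struct {
    volatile sig_atomic_t sig_pending;
    volatile sig_atomic_t psig_pend[TOY_NSIG];
} toy_interp;

static toy_interp main_interp;
static toy_interp *saved_main;   /* plays the role of the saved main interpreter */
static pthread_key_t interp_key; /* plays the role of the TLS slot for my_perl */

static void toy_csighandler(int sig)
{
    toy_interp *interp = (toy_interp *)pthread_getspecific(interp_key);

    if (interp == NULL)          /* thread not created by "perl" */
        interp = saved_main;     /* fall back to the main interpreter */
    if (interp == NULL || sig <= 0 || sig >= TOY_NSIG)
        return;

    /* Safe-signal style: only mark the signal pending, never run a handler.
       As in the description, this write from the wrong thread is not atomic. */
    interp->psig_pend[sig]++;
    interp->sig_pending = 1;
}

/* Simulates a thread started by an external library: no interpreter in TLS. */
static void *library_thread(void *arg)
{
    (void)arg;
    raise(SIGUSR1);              /* delivered to this thread */
    return NULL;
}

/* Returns how many SIGUSR1 deliveries were recorded on the main interpreter. */
static int run_fallback_demo(void)
{
    struct sigaction sa;
    pthread_t tid;
    int rc;

    rc = pthread_key_create(&interp_key, NULL);
    assert(rc == 0);
    rc = pthread_setspecific(interp_key, &main_interp);
    assert(rc == 0);
    saved_main = &main_interp;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = toy_csighandler;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);

    rc = pthread_create(&tid, NULL, library_thread, NULL);
    assert(rc == 0);
    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    (void)rc;
    return (int)main_interp.psig_pend[SIGUSR1];
}
```

Calling `run_fallback_demo()` once from `main()` should leave one pending SIGUSR1 recorded on `main_interp`. The demo only reads the counter after `pthread_join()`, which sidesteps the cross-thread race that the description flags as unresolved.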


Leont · 2024-08-25T09:00:05Z

I'm wondering if a pthread_kill based approach wouldn't be better.

tonycoz · 2024-08-28T01:04:40Z

> I'm wondering if a pthread_kill based approach wouldn't be better.

Maybe, but Win32 already does something like my patch: Ctrl-C events are delivered in a separate thread, and PERL_GET_SIG_CONTEXT, aka win32_signal_context(), sets the context for the current thread from PL_curinterp and expects Perl_csighandler3 to deal with it.

Right now I don't see a direct way for us to get the main thread ID/handle. That's something I'd need to add to intrpvar.h to use pthread_kill().

bulk88 · 2024-10-17T19:16:59Z

@tonycoz Has anyone ever tried taking a shot at implementing https://en.wikipedia.org/wiki/Dekker%27s_algorithm / spin locks?

CTRL-C still crashes on blead to this day, rarely but at random. It is usually in the vmem.h locks or something HeapAlloc related. Either some linked list controlled by P5P is half spliced in physical memory, or a piece of a P5P linked list or P5P struct is sitting in a CPU register on OS thread #1 for 1.0 nanoseconds while OS thread #2 tries a read from physical memory and sees an old value. I have some code I'm preparing to clean up CTRL-C on Win32, and to let $SIG{'ALRM'} on Win32 break every frozen I/O call, whether PP, XS, OS native C code/DLLs, etc.

bulk88 · 2024-10-18T13:46:37Z

CTRL-C on windows perl blead still often crashes, but I have plans to fix it. Makes me wonder if Perl_croak_no_mem() should be PP trapable or not? I can think of reasons for it being PP trapable, but in this call stack it only lets more corruption happen in the thread race:

 	[Inline Frame] perl541.dll!Perl_cx_popsub_common(interpreter *) Line 3394	C
 	perl541.dll!Perl_cx_popsub(interpreter * my_perl, context * cx) Line 3438	C
 	perl541.dll!Perl_dounwind(interpreter * my_perl, long cxix) Line 1740	C
 	perl541.dll!S_my_exit_jump(interpreter * my_perl) Line 5526	C
 	perl541.dll!Perl_my_exit(interpreter * my_perl, unsigned long status) Line 5406	C
	perl541.dll!Perl_croak_no_mem_ext(const char * context, unsigned int len) Line 1988	C
 	perl541.dll!Perl_safesysrealloc(void * where, unsigned int size) Line 360	C
 	perl541.dll!Perl_sv_grow(interpreter * my_perl, sv * const sv, unsigned int newlen) Line 1425	C
 	perl541.dll!S_sv_catpvn_simple(interpreter * my_perl, sv * const sv, const char * const buf, const unsigned int len) Line 11418	C
 	perl541.dll!Perl_sv_vcatpvfn_flags(interpreter * my_perl, sv * const sv, const char * const pat, const unsigned int patlen, char * * const args, sv * * const svargs, const unsigned int sv_count, bool * const maybe_tainted, const unsigned long flags) Line 12426	C
 	perl541.dll!Perl_sv_vsetpvfn(interpreter * my_perl, sv * const sv, const char * const pat, const unsigned int patlen, char * * const args, sv * * const svargs, const unsigned int sv_count, bool * const maybe_tainted) Line 11400	C
 	perl541.dll!Perl_vmess(interpreter * my_perl, const char * pat, char * * args) Line 1673	C
 	perl541.dll!Perl_vwarn(interpreter * my_perl, const char * pat, char * * args) Line 2214	C
 	perl541.dll!Perl_warn(interpreter * my_perl, const char * pat, ...) Line 2260	C
 	perl541.dll!sig_terminate(interpreter * my_perl, int sig) Line 2748	C
 	perl541.dll!win32_ctrlhandler(unsigned long dwCtrlType) Line 5142	C
 	kernel32.dll!_CtrlRoutine@4()	Unknown
 	kernel32.dll!@BaseThreadInitThunk@12()	Unknown
 	ntdll.dll!___RtlUserThreadStart@8()	Unknown
 	ntdll.dll!__RtlUserThreadStart@8()	Unknown

iabyn · 2024-10-21T09:21:58Z

On Fri, Oct 18, 2024 at 06:47:08AM -0700, bulk88 wrote:
> Makes me wonder if `Perl_croak_no_mem()` should be PP trapable or not?

What do you mean by "PP trapable"?

…

-- A power surge on the Bridge is rapidly and correctly diagnosed as a faulty capacitor by the highly-trained and competent engineering staff. -- Things That Never Happen in "Star Trek" #9

tonycoz · 2024-10-22T04:27:20Z

Was that backtrace with a perl signal handler set for SIGINT?

That looks like it's calling the emulation done when no signal handler is set, which while a problem, is a different problem. (sig_terminate() should probably be calling _write() or maybe WriteFile(), not Perl_warn()).

bulk88 · 2024-10-22T06:53:02Z

> On Fri, Oct 18, 2024 at 06:47:08AM -0700, bulk88 wrote:
> > Makes me wonder if Perl_croak_no_mem() should be PP trapable or not?
>
> What do you mean by "PP trapable"?

Here is the file "c.pl". Please RUN ON 32-bit perl ONLY, so that it hits the 2GB limit quickly.

use Inline 'C';
package sumMod;
sub DESTROY {
  print "my dtor\n";
}
sub new {
  my $s = 1;
  $o = \$s;
  return bless($o,'sumMod')
}
package main;

my $o = sumMod->new();
my $str = "ABCD";
diemem($str);

__END__
__C__

void diemem(SV* sv) {
  while(1) {
    SvGROW(sv, (SvLEN(sv)+0)*2);
  }
}

C:\sources\plxs>perl c.pl
Out of memory!
my dtor

C:\sources\plxs>

Why did my PP DESTROY dtor execute?

I did try the above code with a PP eval {} block (block, not string), and NO, YOU CAN NOT trap perl's OOM C/XS croak with PP eval {} and resume execution. But you can run PLENTY of P5 PP source code, with unlimited wall time and unlimited CPU, AFTER perl's OOM C/XS croak function call has executed, as I showed above. And while that DESTROY sub is simple, a production app/module could try writing JSON to disk in a DESTROY sub after Perl_croak_no_mem_ext() executes, or decide to defragment a SQL DB on "closure"/dtor/DESTROY of the SQL DB "meta" object/handle/reference, or a server type PP app could keep the I/O loop pumping data to clients in a DESTROY() for a few more seconds, until all previous clients get full HTTP responses, before returning from DESTROY.

It doesn't matter whether the SQL "meta handle" is implemented in pure perl, in XS/C, or as an IPC IP socket: massive I/O and a massive number of pure perl statements (PP op count) will now execute in that DESTROY {} sub, maybe even calling require in DESTROY. Now that Perl core friendly "soft error" has become a hard error: a SEGV and the Win32 OS SEGV popup box.

 	perl541.dll!Perl_my_exit(interpreter * my_perl, unsigned long status) Line 5406	C
	perl541.dll!Perl_croak_no_mem_ext(const char * context, unsigned int len) Line 1988	C
 	perl541.dll!Perl_safesysrealloc(void * where, unsigned int size) Line 360	C
 	perl541.dll!Perl_sv_grow(interpreter * my_perl, sv * const sv, unsigned int newlen) Line 1425

For linux people, assume libc_realloc() returned NULL because libc_realloc() detected heap corruption and will keep returning NULL randomly and intermittently, or malloc will return NULL even for malloc(1), for the rest of the process lifespan. This was NOT an actual OOM in terms of ps or burning up GBs of page file space.

bulk88 · 2024-10-22T07:52:29Z

I see only 1.5 solutions to the problem of a signal / Win32 CTRL-C / signal callback firing on a random no-`my_perl` thread from the OS thread pool.

Step 1: from the random thread, call SuspendThread() against the official my_perl tid.
Step 2: then ask the OS for a CPU register context dump of the my_perl tid thread.
Step 3: if the official thread is stuck in kernel mode on a syscall, we can safely seize my_perl from the old official tid, reregister my_perl into TLS (aka PERL_SET_CONTEXT), and dispatch the signal/event into the interp.

Step 3A: if the CPU execution pointer of the suspended official my_perl tid is not at the kernel mode call gate, we could, with medium danger, reregister my_perl into TLS (aka PERL_SET_CONTEXT) and dispatch the signal/event into the interp. But what if the old official TID was suspended inside malloc with a half spliced linked list? malloc will never ever work again.

Weird proposal: check the my_perl->Isig_pending semi-flag inside each Newx/Safefree, as a "risk signals" API, as compared to the "safe signals" API (only between each PL OP)?

Other idea: the random OS thread pool signal/event callback thread sets my_perl->Isig_pending, then that thread pool thread starts a 15 or 30 ms timer (Win32 kernel timeslicer unit/NUMA weirdness). The official my_perl TID has 15/30 ms to talk back (mutex/semaphore/event trip) to the random OS thread, saying that the official my_perl TID will execute the signal (but with PP/XS/p5p C re-hang risk), or the official my_perl TID has 15/30 ms to signal that it finished dispatching the signal and is 1 statement away from libc exit().

If the 15/30 ms timer expires in the thread pool thread with no answer from my_perl in the official TID, do a libc-only (no PerlIO) write or printf and call exit. Game over, interp lost: it is stuck in a PP or XS or P5P infinite loop.

Another step 3A: if the timer expires in the thread pool thread, ask the OS for the CPU usage time of the official my_perl TID. If it didn't change, SuspendThread and seize my_perl, do PERL_SET_CONTEXT, and dispatch the signal into the interp.

The Win NT kernel after Vista, or after Win 8, or Win 10, has some drama with CPU usage that I've read about, and I think my unsubmitted code on another FOSS project hit it. Basically, different NUMA clusters cache or keep thread CPU time per core, or per NUMA cluster, or per 64 core unit, and don't update the root CPU core's copy (aka the kernel's global struct) for multiple 15 ms quantums, or like 100 ms, when the granularity is supposed to be 15 ms, to avoid lock contention. Plus, with a couple of kernel optimizations, or certain driver and I/O calls (since those drivers "burn CPU" or force assign CPU time to the user mode thread for accounting reasons), those kernel calls don't update the thread CPU usage; only actually switching threads and swapping page tables to another process triggers an update of the global struct with thread CPU time. This might be an edge case that doesn't matter for perl signal purposes, though: 15 ms or 100 ms, both are instant to the human eye.

(legacy Win NT Kernel API problems from 1980s, thread to core pinning API, accepts a 64 bit MASK, not an var len array of core

tonycoz · 2024-10-28T05:09:37Z

> CTRL-C on windows perl blead still often crashes,

From what I can tell that's the default handler that emulates default SIGINT behaviour.

The problem is we call Perl_warn() from what is effectively a signal thrown in a fresh thread. It's even less safe than Perl's old unsafe signals on POSIX-like systems (i.e. very not safe).

It populates THX with the main thread's interpreter, which is still running at this point, and unlike this PR, does some slightly to very complex things with the state of the interpreter (slightly: allocates new SVs; very: if STDERR is tied, invokes its PRINT callback, or if __WARN__ is hooked, invokes that hook).

If sig_terminate is going to do more than set a flag or send a window message it needs to work without touching the interpreter state.
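
For illustration, a minimal sketch of what "without touching the interpreter state" could look like. It uses POSIX write() as a stand-in for the _write()/WriteFile() calls suggested earlier in the thread, and the helper names `format_sig_msg`, `report_signal` and `demo_report_signal`, as well as the message text, are hypothetical. The message is built in a stack buffer with no allocation and no stdio.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Builds "Terminating on signal N\n" in buf without malloc or stdio.
   Returns the number of bytes written to buf (truncated to buflen). */
static size_t format_sig_msg(char *buf, size_t buflen, int sig)
{
    static const char prefix[] = "Terminating on signal ";
    char digits[12];
    size_t ndigits = 0, len = 0, i;
    unsigned int n = sig < 0 ? 0u : (unsigned int)sig;

    /* Collect decimal digits in reverse order. */
    do {
        digits[ndigits++] = (char)('0' + n % 10);
        n /= 10;
    } while (n != 0);

    for (i = 0; i < sizeof(prefix) - 1 && len < buflen; i++)
        buf[len++] = prefix[i];
    while (ndigits > 0 && len < buflen)
        buf[len++] = digits[--ndigits];
    if (len < buflen)
        buf[len++] = '\n';
    return len;
}

/* Reports the signal on fd using only a stack buffer and write(). */
static ssize_t report_signal(int fd, int sig)
{
    char buf[64];
    size_t len = format_sig_msg(buf, sizeof buf, sig);

    return write(fd, buf, len);
}

/* Demo: sends the report through a pipe and reads it back into out.
   Returns the number of bytes read back, or a negative value on error. */
static int demo_report_signal(char *out, size_t outlen)
{
    int fds[2];
    ssize_t n;
    int rc = pipe(fds);

    assert(rc == 0);
    (void)rc;
    n = report_signal(fds[1], 2);
    if (n > 0)
        n = read(fds[0], out, outlen);
    close(fds[0]);
    close(fds[1]);
    return (int)n;
}
```

In a real handler the descriptor would presumably be the process's stderr; the pipe in `demo_report_signal()` is only there so the output can be checked.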

Leont · 2024-10-31T19:49:08Z

> Right now I don't see a direct way for us to get the main thread ID/handle. That's something I'd need to add to intrpvar.h to use pthread_kill().

That could be set during initialization, right? That doesn't sound like much of a problem?
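
A minimal sketch of that idea, assuming the main thread ID is simply stored in a global at startup (in perl it would presumably live in the interpreter struct). The names `main_tid`, `forwarding_handler`, `foreign_thread` and `run_forward_demo` are hypothetical. A handler that finds itself on any other thread re-sends the signal to the main thread with pthread_kill() and does nothing else.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>

static pthread_t main_tid;  /* recorded once during initialization */
static volatile sig_atomic_t handled_on_main;
static volatile sig_atomic_t forwarded;

static void forwarding_handler(int sig)
{
    if (!pthread_equal(pthread_self(), main_tid)) {
        /* Wrong thread: hand the signal to the main thread and return. */
        forwarded = 1;
        pthread_kill(main_tid, sig);
        return;
    }
    /* Main thread: the normal (safe) signal bookkeeping would go here. */
    handled_on_main = handled_on_main + 1;
}

/* Simulates a thread that perl did not create receiving the signal. */
static void *foreign_thread(void *arg)
{
    (void)arg;
    raise(SIGUSR2);
    return NULL;
}

/* Returns 1 if the signal was forwarded and handled once on the main thread. */
static int run_forward_demo(void)
{
    struct sigaction sa;
    struct timespec tick = { 0, 1000000L };
    pthread_t tid;
    int rc, i;

    main_tid = pthread_self();  /* "set during initialization" */

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = forwarding_handler;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR2, &sa, NULL);
    assert(rc == 0);

    rc = pthread_create(&tid, NULL, foreign_thread, NULL);
    assert(rc == 0);
    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    (void)rc;

    /* Bounded wait in case the forwarded signal has not been delivered yet. */
    for (i = 0; i < 1000 && handled_on_main == 0; i++)
        nanosleep(&tick, NULL);

    return forwarded == 1 && handled_on_main == 1;
}
```

Calling `run_forward_demo()` from the main thread should return 1: the foreign thread's handler forwards, and the main thread's handler does the work. Whether this fits alongside perl's existing per-interpreter thread handling is not something the sketch tries to answer.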

tonycoz · 2024-11-05T02:59:43Z

> CTRL-C on windows perl blead still often crashes,

For anyone wondering, this is #13596

This is only done for pthreads, Win32 already uses something like my suggestion from Perl#22530 and unlike POSIX doesn't have a way to asynchronously interrupt a thread that I'm aware of. It's also complicated by pseudo-processes. Fixes Perl#22487


tonycoz requested a review from Leont August 22, 2024 00:42

tonycoz mentioned this pull request Nov 18, 2024

csighandler3: forward signals to the main thread if not a perl thread #22758

Merged

github-actions bot added the hasConflicts label Nov 19, 2024
