From 04aa9b8324b03a6326a8ca89fefa71e723b0d66b Mon Sep 17 00:00:00 2001 From: Timothy Elgersma Date: Fri, 15 Dec 2023 11:33:25 -0500 Subject: [PATCH] [BACKPORT 2.18.5][#20309] YSQL: Fix segmentation fault in webserver sighup handler at cleanup Summary: Original commit: D31071 / af01dc070fbf166d059f090af8366f17fa922305 Short version: `MyLatch` is safe to use at anytime, but `MyProc->procLatch` refers to a shared latch that may not exist at startup and shut down. We don't worry about the startup case, because exit handlers have not been registered yet. = Background = === Latch Context === ```lang=c, name=latch.h /* ... * There are two kinds of latches: local and shared. A local latch is * initialized by InitLatch, and can only be set from the same process. * A local latch can be used to wait for a signal to arrive, by calling * SetLatch in the signal handler. A shared latch resides in shared memory, * and must be initialized at postmaster startup by InitSharedLatch. Before * a shared latch can be waited on, it must be associated with a process * with OwnLatch. Only the process owning the latch can wait on it, but any * process can set it. ... ``` === Process Life Cycle === A process is forked by the postmaster. Inside the new process, we set the values in the following order: 1. `InitPostmasterChild` sets `MyLatch` to a local latch object. 2. `InitProcess` sets `MyProc` to a non-null value from the free proc list 3. `InitProcess` calls `OwnLatch(MyProc->procLatch)`, allowing itself to wait on the latch of the PGPROC struct. 4. `InitProcess` switches the value of`MyLatch` to `MyProc->procLatch` from the local latch. Since the PGPROC list is shared, other processes can send signals to this process now. 5. The process sets up its signal handlers. In most cases (including the webserver), these involve setting the process's own latch to cause it to enter normal execution again. 6. **The process runs normally, waiting on the latch for new events. ** 7. When the process begins to shut down, we enter the cleanup phase in `ProcKill`. (This runs under normal circumstances, even though the name is aggressive) 8. The process sets its `MyLatch` to the local latch, because its about to release it's shared latch. 9. The process sets its `MyProc` to NULL. 10. The process disowns the shared latch. = Problem = If the process receives a SIGHUP signal... * before (5) above, nothing happens - no signal handlers are registered. * during normal execution, `MyProc->procLatch` and `MyLatch` both point to the latch of the PGPROC struct. Both work identically. * during cleanup, after `MyProc` was set to null: `MyLatch` is still valid to set and wait on (points to its local latch) but `MyProc->procLatch` causes a segmentation fault. The solution is to use `MyLatch` in all circumstances, so that we allow the process life cycle to control whether we use the shared latch or the local latch. PG16 has very few references to `MyProc->procLatch`. Test Plan: 1. Apply the patch ```lang=diff diff --git a/src/postgres/src/backend/storage/lmgr/proc.c b/src/postgres/src/backend/storage/lmgr/proc.c index eedb2057ea..c4529af546 100644 --- a/src/postgres/src/backend/storage/lmgr/proc.c +++ b/src/postgres/src/backend/storage/lmgr/proc.c @@ -885,6 +885,7 @@ ProcKill(int code, Datum arg) SwitchBackToLocalLatch(); proc = MyProc; MyProc = NULL; + pg_usleep(10 * 1000 * 1000); DisownLatch(&proc->procLatch); ReleaseProcToFreeList(proc); @@ -995,6 +996,7 @@ AuxiliaryProcKill(int code, Datum arg) * latch. */ SwitchBackToLocalLatch(); proc = MyProc; MyProc = NULL; + pg_usleep(10 * 1000 * 1000); DisownLatch(&proc->procLatch); ``` 2. Get the webserver PID `WEBSERVER_PID=$(pgrep -f "YSQL webserver")` 3. Send a SIGTERM to the webserver - this will cause it to shut itself down, but sleep for 10s during shut down. `kill -SIGTERM $WEBSERVER_PID` 4. While it's sleeping, send a SIGHUP `kill -SIGHUP $WEBSERVER_PID` 5. Before this change, a segmentation fault would occur - MyProc was null. After this change, we use `MyLatch` which has been switched to use a local latch instead of the shared `PGPROC` `procLatch`, so the SIGHUP completes successfully. Reviewers: amartsinchyk, kfranz, smishra Reviewed By: smishra Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D31073 --- src/postgres/contrib/yb_pg_metrics/yb_pg_metrics.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/postgres/contrib/yb_pg_metrics/yb_pg_metrics.c b/src/postgres/contrib/yb_pg_metrics/yb_pg_metrics.c index 57230508006..1aec9da6fc3 100644 --- a/src/postgres/contrib/yb_pg_metrics/yb_pg_metrics.c +++ b/src/postgres/contrib/yb_pg_metrics/yb_pg_metrics.c @@ -342,7 +342,7 @@ ws_sighup_handler(SIGNAL_ARGS) { int save_errno = errno; got_SIGHUP = true; - SetLatch(&MyProc->procLatch); + SetLatch(MyLatch); errno = save_errno; } @@ -416,8 +416,8 @@ webserver_worker_main(Datum unused) int rc; while (!got_SIGTERM) { - rc = WaitLatch(&MyProc->procLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1, PG_WAIT_EXTENSION); - ResetLatch(&MyProc->procLatch); + rc = WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1, PG_WAIT_EXTENSION); + ResetLatch(MyLatch); if (rc & WL_POSTMASTER_DEATH) break;