Skip to content

Commit

Permalink
[BACKPORT 2.18.5][#20309] YSQL: Fix segmentation fault in webserver s…
Browse files Browse the repository at this point in the history
…ighup handler at cleanup

Summary:
Original commit: D31071 / af01dc0

Short version: `MyLatch` is safe to use at anytime, but `MyProc->procLatch` refers to a shared latch that may not exist at startup and shut down. We don't worry about the startup case, because exit handlers have not been registered yet.

= Background =
=== Latch Context ===
```lang=c, name=latch.h
/*
...
 * There are two kinds of latches: local and shared. A local latch is
 * initialized by InitLatch, and can only be set from the same process.
 * A local latch can be used to wait for a signal to arrive, by calling
 * SetLatch in the signal handler. A shared latch resides in shared memory,
 * and must be initialized at postmaster startup by InitSharedLatch. Before
 * a shared latch can be waited on, it must be associated with a process
 * with OwnLatch. Only the process owning the latch can wait on it, but any
 * process can set it.
...
```

=== Process Life Cycle ===
A process is forked by the postmaster. Inside the new process, we set the values in the following order:
1. `InitPostmasterChild` sets `MyLatch` to a local latch object.
2. `InitProcess` sets `MyProc` to a non-null value from the free proc list
3. `InitProcess` calls `OwnLatch(MyProc->procLatch)`, allowing itself to wait on the latch of the PGPROC struct.
4. `InitProcess` switches the value of`MyLatch` to `MyProc->procLatch` from the local latch. Since the PGPROC list is shared, other processes can send signals to this process now.
5. The process sets up its signal handlers. In most cases (including the webserver), these involve setting the process's own latch to cause it to enter normal execution again.
6. **The process runs normally, waiting on the latch for new events. **
7. When the process begins to shut down, we enter the cleanup phase in `ProcKill`. (This runs under normal circumstances, even though the name is aggressive)
8. The process sets its `MyLatch` to the local latch, because its about to release it's shared latch.
9. The process sets its `MyProc` to NULL.
10. The process disowns the shared latch.

= Problem =
If the process receives a SIGHUP signal...
* before (5) above, nothing happens - no signal handlers are registered.
* during normal execution, `MyProc->procLatch` and `MyLatch` both point to the latch of the PGPROC struct. Both work identically.
* during cleanup, after `MyProc` was set to null: `MyLatch` is still valid to set and wait on (points to its local latch) but `MyProc->procLatch` causes a segmentation fault.

The solution is to use `MyLatch` in all circumstances, so that we allow the process life cycle to control whether we use the shared latch or the local latch. PG16 has very few references to `MyProc->procLatch`.

Test Plan:
1. Apply the patch
```lang=diff
diff --git a/src/postgres/src/backend/storage/lmgr/proc.c b/src/postgres/src/backend/storage/lmgr/proc.c
index eedb2057ea..c4529af546 100644
--- a/src/postgres/src/backend/storage/lmgr/proc.c
+++ b/src/postgres/src/backend/storage/lmgr/proc.c
@@ -885,6 +885,7 @@ ProcKill(int code, Datum arg)
        SwitchBackToLocalLatch();
        proc = MyProc;
        MyProc = NULL;
+       pg_usleep(10 * 1000 * 1000);
        DisownLatch(&proc->procLatch);

        ReleaseProcToFreeList(proc);
@@ -995,6 +996,7 @@ AuxiliaryProcKill(int code, Datum arg)
         * latch.
         */
        SwitchBackToLocalLatch();
        proc = MyProc;
        MyProc = NULL;
+       pg_usleep(10 * 1000 * 1000);
        DisownLatch(&proc->procLatch);
```
2. Get the webserver PID `WEBSERVER_PID=$(pgrep -f "YSQL webserver")`
3. Send a SIGTERM to the webserver - this will cause it to shut itself down, but sleep for 10s during shut down. `kill -SIGTERM $WEBSERVER_PID`
4. While it's sleeping, send a SIGHUP `kill -SIGHUP $WEBSERVER_PID`
5. Before this change, a segmentation fault would occur - MyProc was null. After this change, we use `MyLatch` which has been switched to use a local latch instead of the shared `PGPROC` `procLatch`, so the SIGHUP completes successfully.

Reviewers: amartsinchyk, kfranz, smishra

Reviewed By: smishra

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D31073
  • Loading branch information
timothy-e committed Dec 15, 2023
1 parent 36eca16 commit 04aa9b8
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions src/postgres/contrib/yb_pg_metrics/yb_pg_metrics.c
Original file line number Diff line number Diff line change
Expand Up @@ -342,7 +342,7 @@ ws_sighup_handler(SIGNAL_ARGS) {
int save_errno = errno;

got_SIGHUP = true;
SetLatch(&MyProc->procLatch);
SetLatch(MyLatch);

errno = save_errno;
}
Expand Down Expand Up @@ -416,8 +416,8 @@ webserver_worker_main(Datum unused)
int rc;
while (!got_SIGTERM)
{
rc = WaitLatch(&MyProc->procLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1, PG_WAIT_EXTENSION);
ResetLatch(&MyProc->procLatch);
rc = WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1, PG_WAIT_EXTENSION);
ResetLatch(MyLatch);

if (rc & WL_POSTMASTER_DEATH)
break;
Expand Down

0 comments on commit 04aa9b8

Please sign in to comment.