pgstat: store statistics in shared memory.
Previously the statistics collector received statistics updates via UDP and
shared statistics data by writing them out to temporary files regularly. These
files can reach tens of megabytes and are written out up to twice a
second. This has repeatedly prevented us from adding additional useful
statistics.

Now statistics are stored in shared memory. Statistics for variable-numbered
objects are stored in a dshash hashtable (backed by dynamic shared
memory). Fixed-numbered stats are stored in plain shared memory.
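The storage split described above can be sketched in C roughly as follows. This is an illustration only, not code from pgstat.c: the names `StatsArea`, `StatsKey`, `StatsEntry`, and `stats_get_entry`, the fixed-capacity open-addressing table, and the FNV-style hash are all assumptions made for this sketch, standing in for the dshash table and shared memory the commit actually uses. It shows fixed-numbered stats as a plain struct field and variable-numbered stats as entries found by key.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fixed-numbered stats: exactly one instance, so a plain struct suffices. */
typedef struct WalCounters {
    uint64_t wal_write;
    uint64_t wal_sync;
} WalCounters;

/* Variable-numbered stats: one entry per object, identified by a key. */
typedef struct StatsKey {
    uint32_t kind;    /* hypothetical: which kind of object */
    uint32_t dboid;   /* hypothetical: owning database */
    uint32_t objoid;  /* hypothetical: the object itself */
} StatsKey;

typedef struct StatsEntry {
    int      used;
    StatsKey key;
    uint64_t tuples_inserted;
} StatsEntry;

#define TABLE_SIZE 64 /* assumption: fixed capacity; a real dshash can grow */

typedef struct StatsArea {
    WalCounters wal;               /* fixed-numbered part */
    StatsEntry  table[TABLE_SIZE]; /* variable-numbered part */
} StatsArea;

static void stats_area_init(StatsArea *area)
{
    assert(area != NULL);
    memset(area, 0, sizeof(*area));
}

/* FNV-style mixing of the three key fields. */
static uint32_t hash_key(const StatsKey *k)
{
    uint32_t h = 2166136261u;
    h = (h ^ k->kind) * 16777619u;
    h = (h ^ k->dboid) * 16777619u;
    h = (h ^ k->objoid) * 16777619u;
    return h;
}

static int key_equal(const StatsKey *a, const StatsKey *b)
{
    return a->kind == b->kind && a->dboid == b->dboid && a->objoid == b->objoid;
}

/*
 * Look up an entry with linear probing; optionally create it.
 * Returns NULL if the entry is absent (create == 0) or the table is full.
 * Entry removal is deliberately left out of this sketch.
 */
static StatsEntry *stats_get_entry(StatsArea *area, StatsKey key, int create)
{
    uint32_t start = hash_key(&key) % TABLE_SIZE;

    for (uint32_t i = 0; i < TABLE_SIZE; i++) {
        StatsEntry *e = &area->table[(start + i) % TABLE_SIZE];

        if (e->used && key_equal(&e->key, &key))
            return e;
        if (!e->used) {
            if (!create)
                return NULL;
            e->used = 1;
            e->key = key;
            e->tuples_inserted = 0;
            return e;
        }
    }
    return NULL;
}
```

A caller would bump `area.wal.wal_write` directly for the fixed-numbered counters, and call `stats_get_entry(&area, key, 1)` before updating a per-object counter.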

The header for pgstat.c contains an overview of the architecture.

The stats collector is not needed anymore, so remove it.

By utilizing the transactional statistics drop infrastructure introduced in a
prior commit, statistics entries cannot "leak" anymore. Previously, leaked
statistics were dropped by pgstat_vacuum_stat(), called from [auto-]vacuum. On
systems with many small relations, pgstat_vacuum_stat() could be quite
expensive.

Now that replicas drop statistics entries for dropped objects, it is not
necessary anymore to reset stats when starting from a cleanly shut down
replica.

Subsequent commits will perform some further code cleanup, adapt docs and add
tests.

Bumps PGSTAT_FILE_FORMAT_ID.

Author: Kyotaro Horiguchi <[email protected]>
Author: Andres Freund <[email protected]>
Author: Melanie Plageman <[email protected]>
Reviewed-By: Andres Freund <[email protected]>
Reviewed-By: Thomas Munro <[email protected]>
Reviewed-By: Justin Pryzby <[email protected]>
Reviewed-By: "David G. Johnston" <[email protected]>
Reviewed-By: Tomas Vondra <[email protected]> (in a much earlier version)
Reviewed-By: Arthur Zakirov <[email protected]> (in a much earlier version)
Reviewed-By: Antonin Houska <[email protected]> (in a much earlier version)
Discussion: https://postgr.es/m/[email protected]
Discussion: https://postgr.es/m/[email protected]
Discussion: https://postgr.es/m/[email protected]
anarazel committed Apr 7, 2022
1 parent be902e2 commit 5891c7a
Showing 50 changed files with 4,253 additions and 5,343 deletions.
19 changes: 14 additions & 5 deletions doc/src/sgml/monitoring.sgml
@@ -1110,10 +1110,6 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry><literal>LogicalLauncherMain</literal></entry>
<entry>Waiting in main loop of logical replication launcher process.</entry>
</row>
-<row>
-<entry><literal>PgStatMain</literal></entry>
-<entry>Waiting in main loop of statistics collector process.</entry>
-</row>
<row>
<entry><literal>RecoveryWalStream</literal></entry>
<entry>Waiting in main loop of startup process for WAL to arrive, during
@@ -2115,6 +2111,18 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
<entry>Waiting to access the list of predicate locks held by
serializable transactions.</entry>
</row>
+<row>
+<entry><literal>PgStatsDSA</literal></entry>
+<entry>Waiting for stats dynamic shared memory allocator access</entry>
+</row>
+<row>
+<entry><literal>PgStatsHash</literal></entry>
+<entry>Waiting for stats shared memory hash table access</entry>
+</row>
+<row>
+<entry><literal>PgStatsData</literal></entry>
+<entry>Waiting for shared memory stats data access</entry>
+</row>
<row>
<entry><literal>SerializableXactHash</literal></entry>
<entry>Waiting to read or update information about serializable
@@ -5142,7 +5150,8 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<returnvalue>timestamp with time zone</returnvalue>
</para>
<para>
-Returns the timestamp of the current statistics snapshot.
+Returns the timestamp of the current statistics snapshot, or NULL if
+no statistics snapshot has been taken.
</para></entry>
</row>

39 changes: 27 additions & 12 deletions src/backend/access/transam/xlog.c
@@ -1842,7 +1842,7 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
WriteRqst.Flush = 0;
XLogWrite(WriteRqst, tli, false);
LWLockRelease(WALWriteLock);
-WalStats.m_wal_buffers_full++;
+PendingWalStats.wal_buffers_full++;
TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
}
/* Re-acquire WALBufMappingLock and retry */
@@ -2200,10 +2200,10 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)

INSTR_TIME_SET_CURRENT(duration);
INSTR_TIME_SUBTRACT(duration, start);
-WalStats.m_wal_write_time += INSTR_TIME_GET_MICROSEC(duration);
+PendingWalStats.wal_write_time += INSTR_TIME_GET_MICROSEC(duration);
}

-WalStats.m_wal_write++;
+PendingWalStats.wal_write++;

if (written <= 0)
{
@@ -4877,6 +4877,7 @@ StartupXLOG(void)
XLogCtlInsert *Insert;
CheckPoint checkPoint;
bool wasShutdown;
+bool didCrash;
bool haveTblspcMap;
bool haveBackupLabel;
XLogRecPtr EndOfLog;
@@ -4994,7 +4995,10 @@ StartupXLOG(void)
{
RemoveTempXlogFiles();
SyncDataDirectory();
+didCrash = true;
}
+else
+didCrash = false;

/*
* Prepare for WAL recovery if needed.
@@ -5106,6 +5110,22 @@ StartupXLOG(void)
*/
restoreTwoPhaseData();

+/*
+* When starting with crash recovery, reset pgstat data - it might not be
+* valid. Otherwise restore pgstat data. It's safe to do this here,
+* because postmaster will not yet have started any other processes.
+*
+* NB: Restoring replication slot stats relies on slot state to have
+* already been restored from disk.
+*
+* TODO: With a bit of extra work we could just start with a pgstat file
+* associated with the checkpoint redo location we're starting from.
+*/
+if (didCrash)
+pgstat_discard_stats();
+else
+pgstat_restore_stats();

lastFullPageWrites = checkPoint.fullPageWrites;

RedoRecPtr = XLogCtl->RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo;
@@ -5180,11 +5200,6 @@ StartupXLOG(void)
LocalMinRecoveryPointTLI = 0;
}

-/*
-* Reset pgstat data, because it may be invalid after recovery.
-*/
-pgstat_reset_all();

/* Check that the GUCs used to generate the WAL allow recovery */
CheckRequiredParameterValues();

@@ -6081,8 +6096,8 @@ LogCheckpointEnd(bool restartpoint)
CheckpointStats.ckpt_sync_end_t);

/* Accumulate checkpoint timing summary data, in milliseconds. */
-PendingCheckpointerStats.m_checkpoint_write_time += write_msecs;
-PendingCheckpointerStats.m_checkpoint_sync_time += sync_msecs;
+PendingCheckpointerStats.checkpoint_write_time += write_msecs;
+PendingCheckpointerStats.checkpoint_sync_time += sync_msecs;

/*
* All of the published timing statistics are accounted for. Only
@@ -8009,10 +8024,10 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)

INSTR_TIME_SET_CURRENT(duration);
INSTR_TIME_SUBTRACT(duration, start);
-WalStats.m_wal_sync_time += INSTR_TIME_GET_MICROSEC(duration);
+PendingWalStats.wal_sync_time += INSTR_TIME_GET_MICROSEC(duration);
}

-WalStats.m_wal_sync++;
+PendingWalStats.wal_sync++;
}

/*
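The renames in this file, from `WalStats.m_*` to `PendingWalStats.*`, suggest an accumulate-then-flush pattern: counters are bumped in a process-local struct and merged into shared state later. The following C sketch illustrates that pattern under stated assumptions. The names `WalUsageCounters` and `flush_pending_wal` are hypothetical, an ordinary struct stands in for shared memory, and the locking a real implementation needs around the shared update is only noted in a comment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counter set; field names mirror those seen in the diff. */
typedef struct WalUsageCounters {
    uint64_t wal_write;
    uint64_t wal_sync;
    uint64_t wal_buffers_full;
} WalUsageCounters;

/*
 * Merge process-local pending counters into the shared totals, then
 * zero the pending struct so nothing is counted twice.
 *
 * Assumption: a real implementation would hold a lock while updating
 * the shared struct; this single-process sketch omits it.
 */
static void flush_pending_wal(WalUsageCounters *pending,
                              WalUsageCounters *shared)
{
    assert(pending != NULL);
    assert(shared != NULL);

    shared->wal_write += pending->wal_write;
    shared->wal_sync += pending->wal_sync;
    shared->wal_buffers_full += pending->wal_buffers_full;

    memset(pending, 0, sizeof(*pending));
}
```

Hot paths then only touch the local struct, for example `pending.wal_write++`, which needs no synchronization. The cost of coordinating with other processes is paid once per flush rather than once per event.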
7 changes: 0 additions & 7 deletions src/backend/commands/vacuum.c
@@ -351,13 +351,6 @@ vacuum(List *relations, VacuumParams *params,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("PROCESS_TOAST required with VACUUM FULL")));

-/*
-* Send info about dead objects to the cumulative stats system, unless
-* we are in autovacuum --- autovacuum.c does this for itself.
-*/
-if ((params->options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
-pgstat_vacuum_stat();

/*
* Create special memory context for cross-transaction storage.
*
2 changes: 2 additions & 0 deletions src/backend/commands/vacuumparallel.c
@@ -28,13 +28,15 @@

#include "access/amapi.h"
#include "access/table.h"
#include "access/xact.h"
#include "catalog/index.h"
#include "commands/vacuum.h"
#include "optimizer/paths.h"
#include "pgstat.h"
#include "storage/bufmgr.h"
#include "tcop/tcopprot.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"

/*
* DSM keys for parallel vacuum. Unlike other parallel execution code, since