This extension for PostgreSQL helps you find the most resource-intensive activities in your PostgreSQL databases.
This extension is based on the standard statistics views of PostgreSQL. It is written in PL/pgSQL and needs no external libraries or software, only the PostgreSQL database itself and a cron-like tool for performing periodic tasks. It was initially developed and tested on PostgreSQL 9.6 and may be incompatible with earlier releases.
This extension creates a historical repository in your database. The repository holds statistics "snapshots" of your databases, much as Oracle AWR does. A snapshot is taken by calling the snapshot() function. PostgreSQL has no built-in job scheduler, so you'll need to use cron.
Periodic snapshots help you find the most resource-intensive statements in the past. Suppose you were told about a performance degradation that happened several hours ago. No problem: you can build a report between two snapshots to see the load profile of your database during that interval. It's worth using a monitoring tool such as Zabbix to pinpoint the exact time when the performance issue happened.
Of course, you can take an explicit snapshot before running any batch processing, and another one after it ends.
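For example, a batch job can be bracketed with explicit snapshots like this (the script name is hypothetical, and the profile schema assumes the dedicated-schema installation described below):
$ psql -c 'SELECT profile.snapshot()'   # snapshot before the batch
$ ./nightly_batch.sh                    # your batch job
$ psql -c 'SELECT profile.snapshot()'   # snapshot after it ends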
Every time you take a snapshot, pg_stat_statements_reset() is called, ensuring that you will not lose statements due to reaching pg_stat_statements.max. Also, the report contains a section warning you if the number of statements captured in any snapshot reached 90% of pg_stat_statements.max.
Since version 0.0.6, the extension, installed in one cluster, is able to collect statistics from several clusters, called nodes. You just need to define the nodes, providing names and connection strings, and make sure the connections can be established. You can then track statistics on your standbys from the master, or from any other node. While collecting snapshot statistics, the extension will connect to all databases on all nodes.
Extensions you'll need:
- pg_stat_statements (for collecting statement statistics) - see the PostgreSQL documentation for setup instructions. Optional, but highly recommended on monitored nodes; without this extension, statement statistics will be unavailable.
- dblink (for collecting object stats from nodes and databases)
Ensure the statistics collection parameters are set:
track_activities = on
track_counts = on
track_io_timing = on
track_functions = on
And the pg_stat_statements parameters (your values may differ):
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 1000
pg_stat_statements.track = 'top'
pg_stat_statements.save = off
You must install and use the pg_profile extension as a cluster superuser (for example, postgres), because only superusers can see all statements in the pg_stat_statements view. Also, the user taking snapshots must be able to log in to every database of your cluster without providing a password, as dblink is used for collecting object statistics. Peer authentication is preferred - make sure your pg_hba.conf allows it.
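A minimal pg_hba.conf sketch allowing this, assuming snapshots are taken by the postgres OS user over local (socket) connections:
# TYPE  DATABASE  USER      METHOD
local   all       postgres  peer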
There are two ways to install the extension files:
- A Makefile is provided, so you can use
# make install
For this to work, you'll need the PostgreSQL development packages.
- Or you can just manually copy the extension files (pg_profile*) to the PostgreSQL extensions location:
# cp pg_profile* `pg_config --sharedir`/extension
The easiest way is to install everything in the public schema of the postgres database:
postgres=# CREATE EXTENSION dblink;
postgres=# CREATE EXTENSION pg_stat_statements;
postgres=# CREATE EXTENSION pg_profile;
If you want to install pg_profile in another schema, just create it and install the extension in that schema (pg_stat_statements is recommended to stay in the public schema):
postgres=# CREATE EXTENSION dblink;
postgres=# CREATE EXTENSION pg_stat_statements;
postgres=# CREATE SCHEMA profile;
postgres=# CREATE EXTENSION pg_profile SCHEMA profile;
All objects will be created in the schema defined by the SCHEMA clause. Installation in a dedicated schema is the recommended way: the extension will create its own tables, views, sequences, and functions, and it is a good idea to keep them separate. If you do not want to specify a schema qualifier when using the module, consider changing the search_path setting.
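For example, assuming pg_profile was installed in the profile schema as above, you could append that schema to the database-level search_path:
postgres=# ALTER DATABASE postgres SET search_path TO "$user", public, profile;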
New versions of pg_profile will contain everything necessary to update from any previous version. So, in case of an update, you will only need to install the extension files (see Step 1) and update the extension, like this:
postgres=# ALTER EXTENSION pg_profile UPDATE;
All your historical data will remain unchanged, if possible.
You can define extension parameters like any other parameter in postgresql.conf. The default values are shown in the following list:
- pg_profile.topn = 20 - The number of top objects (statements, relations, etc.) to report in each sorted report table. This parameter also affects the size of a snapshot: the more objects you want to appear in your report, the more objects must be kept in a snapshot.
- pg_profile.retention = 7 - Snapshot retention time in days. Snapshots aged pg_profile.retention days or more will be automatically deleted on the next snapshot() call.
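For example, to report 30 top objects and keep two weeks of snapshots, you might put the following in postgresql.conf (the values are illustrative):
pg_profile.topn = 30
pg_profile.retention = 14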
Once installed, the extension creates a "local" node - this is the cluster where the extension is installed. Node management functions:
- node_new(node, connstr, node_enabled = true) Creates a new node with a node name (must be unique) and a connection string. If node_enabled is set, the node will be included in the common snapshot() call.
- node_drop(node) Drops a node and all its snapshots.
- node_enable(node) Includes the node in the common snapshot() call.
- node_disable(node) Excludes the node from the common snapshot() call.
- node_rename(node, new_name) Renames a node.
- node_connstr(node, new_connstr) Defines a new connection string for a node.
- node_show() Displays existing nodes.
Node creation example:
SELECT profile.node_new('omega','host=name_or_ip dbname=postgres port=5432');
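A node can later be temporarily excluded from the common snapshot() call, and returned to it, without dropping its history:
SELECT profile.node_disable('omega');
SELECT profile.node_enable('omega');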
You must create at least 3 snapshots to be able to build your first report, between the 2nd and the 3rd snapshot. Snapshots for all enabled nodes are taken by calling the snapshot() function. If you want to take a snapshot of only one node, call snapshot(node_name). I'm using the crontab of the postgres user, like this:
*/30 * * * * psql -c 'SELECT profile.snapshot()' > /dev/null 2>&1
The schema qualifier can be omitted if the extension is installed in the public schema:
*/30 * * * * psql -c 'SELECT snapshot()' > /dev/null 2>&1
The function will return 'OK' if snapshots for all enabled nodes were created:
$ psql -c 'SELECT profile.snapshot()'
snapshot
----------
OK
(1 row)
If one or several nodes fail, the function will inform you; snapshots for all other nodes will still be created. To display existing snapshots of the local node, call the function snapshot_show(days integer); for any other node, provide its name: snapshot_show(node name, days integer). The days parameter defines the interval, in days into the past, for which snapshots are displayed. This parameter may be omitted (or set to null) to display all snapshots:
postgres=# select * from profile.snapshot_show(1);
snapshot | date_time
----------+------------------------
...
480 | 2017-09-11 10:00:01+03
481 | 2017-09-11 10:30:01+03
482 | 2017-09-11 11:00:01+03
483 | 2017-09-11 11:30:02+03
484 | 2017-09-11 12:00:02+03
485 | 2017-09-11 12:30:01+03
486 | 2017-09-11 13:00:02+03
(48 rows)
You can build a report of the local node in HTML format using two snap_ids:
$ psql -qtc "SELECT profile.report(480,482)" -o report_480_482.html
For any other node, use its name:
$ psql -qtc "SELECT profile.report('omega',12,14)" -o report_omega_12_14.html
You can then view the report file in any web browser.
This report will contain several tables describing the database cluster load profile. Here you will find the queries with the most elapsed time, the most gets, the most reads, the most I/O wait, and so on. You will see per-database statistics, such as hit ratio, calls, and reads. There are statistics for database objects: the most DML-intensive tables, the most scanned tables, the fastest-growing tables, and so on. Finally, you will see the most-read tables and indexes.
The report will contain the following sections:
- Cluster statistics
  - Databases stats
  - Statements stats by database
  - Cluster stats
- SQL Query stats
  - Top SQL by elapsed time
  - Top SQL by executions
  - Top SQL by I/O wait time
  - Top SQL by gets
  - Top SQL by temp usage
  - Complete List of SQL Text
- Schema objects stats
  - Most scanned tables
  - Top DML tables
  - Top Delete/Update tables with vacuum run count
  - Top growing tables
  - Top growing indexes
  - Unused indexes
- I/O Schema objects stats
  - Top tables by I/O
  - Top indexes by I/O
- User function stats
  - Top functions by total time
  - Top functions by executions
- Vacuum related stats
  - Tables ordered by dead tuples ratio
  - Tables ordered by modified tuples ratio
You can see a simple report example in the report_examples folder of this repo.
A baseline is a snapshot sequence excluded from the default snapshot retention policy. Each baseline can have an individual retention period, or none at all - in that case the baseline won't be deleted automatically. Use baselines when you want to preserve information about the database workload during a certain time period. For example, you may want to keep snapshots gathered during load testing, or during regular load on your system. Since version 0.0.7 it is possible to build a compare report for two time intervals, and you can use a baseline as an interval. Baseline functions:
- baseline_new(node name, baseline_name varchar(25), start_id integer, end_id integer, days integer) - create a baseline
  - node - node name.
  - name - baseline name. Each baseline must have a unique name.
  - start_id, end_id - the first and last snapshots included in the baseline.
  - days - baseline retention time, in days since now(). This parameter may be omitted (or set to null), meaning infinite retention.
- baseline_drop(node name, name varchar(25)) - drop a baseline
  - node - node name.
  - name - baseline name to drop. This parameter may be omitted (or set to null), in which case all existing baselines will be dropped. Dropping a baseline does not immediately drop its snapshots; they are just excluded from any baselines. Snapshots subject to deletion under the default retention policy will be deleted only on the next snapshot() call.
- baseline_keep(node name, name varchar(25), days integer) - change retention of baselines
  - node - node name.
  - name - baseline name. This parameter may be omitted (or set to null) to change the retention of all existing baselines.
  - days - retention time of a baseline, in days since now(). May also be omitted (or set to null) to set infinite retention.
- baseline_show(node name) - displays existing baselines. Call this function to get information about existing baselines (names, snapshot intervals, and retention times).
  - node - node name. The node parameter of any baseline management function may be omitted to work with the local node.
postgres=# SELECT * FROM profile.baseline_show('local');
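For example, to keep snapshots 480 through 489 (say, gathered during a load test) on the local node for 30 days (the baseline name is illustrative):
postgres=# SELECT profile.baseline_new('local','load_test',480,489,30);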
Compare reports are used to compare the workload of two time intervals. You can use them in many cases, such as comparing database workload before and after an application release, investigating sudden performance hits, or just seeing how the workload on your database changes. A compare report mostly contains the same tables as a regular report, but each object in each table contains statistics from both the first and the second time interval, making comparison easy. A compare report is built by the report_diff function:
- report_diff(node, start1_id, end1_id, start2_id, end2_id) - compare two intervals
- report_diff(node, baseline1_name, baseline2_name) - compare two baselines
- report_diff(node, baseline1_name, start2_id, end2_id) - compare baseline to interval
- report_diff(node, start1_id, end1_id, baseline2_name) - and vice-versa
Example:
$ psql -qtc "SELECT profile.report_diff('omega',12,14,20,22)" \
-o report_omega_12_14_20_22.html
Omit the node parameter to build a compare report for the local node:
$ psql -qtc "SELECT profile.report_diff(12,14,20,22)" \
-o report_12_14_20_22.html
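Baselines can be used in place of interval bounds; for example, to compare two baselines by name (the names here are illustrative):
$ psql -qtc "SELECT profile.report_diff('omega','before_release','after_release')" \
-o report_omega_baselines.html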
- PostgreSQL collects execution statistics only after execution completes. If a single execution of a statement spans several snapshots, it will affect the statistics of only the last snapshot (the one in which it completed). And you cannot get statistics on still-running statements.
- Resetting any PostgreSQL statistics may affect the accuracy of the next snapshot.
- Exclusive locks on relations conflict with relation size calculation. So, long transactions performing DDL on tables will "hang" the snapshot operation until the end of the transaction. lock_timeout is set to 3000 ms, so if the snapshot() function is unable to acquire a lock within 3 seconds, it will fail, and no snapshot will be generated.