Table of Contents
- What we will cover
- Good Performance
- Principles
- Agenda
- Configurations & Tooling
- Table Schemas
- Queries
- Indices
- Monitoring
- To be Extreme
- Reference
Purpose
- OLTP (Online Transaction Processing) - store and query the transactions
- OLAP (Online Analytical Processing) - report data, for planning and management
Vendors
- Postgres
- MySQL
- Oracle DB
- SQL Server
Good Enough
- Number of Concurrent Connections
- No Always-Slow Queries
- Minimal Randomly-Slow Queries
It is IMPOSSIBLE to avoid Randomly-Slow Queries
Not all factors are under our control:
- networking
- connection pools
- read-write bottlenecks from storage
Transactional queries should be FAST (<500ms)
The number of concurrent queries from the same session should be less than 5
The number of queries inside a transaction should be constant in most cases
Locks that block other queries should be avoided (especially Table-level Locking)
- optimistic lock - concurrent reads are allowed; writes are not
- pessimistic lock - neither concurrent reads nor writes are allowed
Advanced Topic: Locks, Latches, Enqueues and Mutex
- Configurations & Tooling
- Table Schemas
- Queries
- Indices
- Monitoring
- To be Extreme
- Manage connection pool efficiently
- Azure Flexible Server provides PgBouncer integration as an add-on
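A quick sanity check on pool sizing (a minimal sketch using the standard pg_stat_activity view):
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;
SHOW max_connections; -- compare the counts against the server limit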
- Optimize the Postgres configuration file according to the hardware specification
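For example, the memory-related settings can be inspected via pg_settings before tuning them to the machine (a sketch; the right values depend on the hardware):
SELECT name, setting, unit FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'effective_cache_size', 'maintenance_work_mem');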
- All query statistics are stored in schema azure_sys - check the query below
SELECT max(datname), query_id, query_sql_text,
       SUM(calls) total_calls,
       SUM(total_time) total_time,
       SUM(total_time) * 1.0 / SUM(calls) avg_time,
       SUM(rows) * 1.0 / SUM(calls) avg_rows
FROM query_store.qs_view q
JOIN pg_database d ON q.db_id = d.oid
GROUP BY query_id, query_sql_text
ORDER BY avg_rows DESC
LIMIT 10; -- change top N based on preference
- Reference is here
Normalization
- design the schema so that any change to one field only affects one record
De-normalization
- store the module key on tables under the module
- store the count / sum / average of a child table in its parent table (a sketch follows this list)
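A minimal sketch of the second point, reusing the classes/students tables from the example later in this document (student_count is a hypothetical denormalized column):
ALTER TABLE classes ADD COLUMN IF NOT EXISTS student_count integer NOT NULL DEFAULT 0; -- hypothetical column
-- back-fill once; keep it updated from application code or a trigger afterwards
UPDATE classes c
SET student_count = (SELECT count(*) FROM students s WHERE s.class_id = c.id);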
- Check whether the query plan has been cached; if yes, reuse the cached plan
- What developers can do: Make sure the query plans can be re-used as much as possible
- Analyze the query
- What developers can do: Avoid writing complicated queries
- Decide the query plan (Relational Algebra)
- The Execution Plan Optimizer of Postgres is Cost-based, NOT Rule-based
- What developers can do: Design the queries so that they can use the indices
- Execute
- What developers can do: Do not fetch too much data per query
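A minimal sketch of plan reuse from the first step above, using prepared statements (classes/students tables as in the later example):
PREPARE students_by_class (text) AS
  SELECT id, "name" FROM students WHERE class_id = $1;
EXECUTE students_by_class('200101'); -- the statement is parsed once and its plan can be reused
EXECUTE students_by_class('200102');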
Include non-DML (Data Manipulation Language) work in a single Transaction
- Always-Slow Queries on itself, Randomly-Slow Queries
- Including queries that purely fetch data
- Including long-running function calls NOT RELATED to the database (see the sketch after this list)
- send emails
- send to service bus
- communicate with external APIs
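A hedged sketch of the anti-pattern (orders is a hypothetical table):
BEGIN;
UPDATE orders SET status = 'confirmed' WHERE id = 42; -- hypothetical table
-- BAD: sending an email or calling an external API here keeps the row lock
-- held for the whole network round-trip, blocking concurrent transactions
COMMIT;
-- GOOD: COMMIT first, then send the email / service-bus message outside the transaction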
Get too many records from one query
- Always-Slow queries on itself, Randomly-Slow Queries
- growing tables
- with OR and UNION
- module tables without LIMIT
Send too many queries (N+1)
- Affects Concurrent Connections, causes Randomly-Slow Queries
- while fetching data from the grand-children tables (companies -> projects -> records), as in the example below; a JOIN-based fix follows it
SELECT id, "name" FROM classes WHERE school_code = 'LSC'; -- | id | name | -- | 200101 | 1A | -- | 200102 | 1B | -- | 200103 | 1C | -- | 200104 | 1D | -- | 200105 | 1E | -- | 200106 | 1F | -- ... SELECT id, "name" FROM students WHERE class_id = '200101'; SELECT id, "name" FROM students WHERE class_id = '200102'; SELECT id, "name" FROM students WHERE class_id = '200103'; SELECT id, "name" FROM students WHERE class_id = '200104'; SELECT id, "name" FROM students WHERE class_id = '200105'; SELECT id, "name" FROM students WHERE class_id = '200106'; -- ...
Store huge records (>500KB each)
- Always-Slow Queries on itself, Randomly-Slow Queries
- Unstructured JSON
- no schema
- schema with recursive nature
- schema with dynamic keys
- Store encoded images
- Store binary data
Illegitimate Use of TEXT filter with LIKE
- Always-Slow Queries on itself, Randomly-Slow Queries
- prefix search is acceptable
- infix search is slow
- postfix search is slow
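For illustration (students table as above; only the first form can use a plain B-tree index, typically with text_pattern_ops under non-C collations):
SELECT * FROM students WHERE "name" LIKE 'Chan%';  -- prefix: index-friendly
SELECT * FROM students WHERE "name" LIKE '%Chan%'; -- infix: full scan
SELECT * FROM students WHERE "name" LIKE '%Chan';  -- postfix: full scan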
- Query Plan
- Avoid querying with complicated conditions that require a Full Table Scan on several tables
- Be aware of queries on Critical Tables (User Table, Modules Tables)
- For each query, indices should be used for tables > 1k records
Indices are a separate store that can help queries run faster (no clustered index support in PG)
Good - Makes some read queries faster, while making a small portion of queries slower
Bad - Requires more storage and makes DML queries (writes) slower
- If you are unsure how a query performs, try EXPLAIN or EXPLAIN ANALYZE
Advanced Topic(s): What does "Recheck Cond" in Explain result mean? "Recheck Cond:" line in query plans with a bitmap index scan
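A minimal example (any table works; students as above):
EXPLAIN ANALYZE SELECT id, "name" FROM students WHERE class_id = '200101';
-- EXPLAIN shows the estimated plan; EXPLAIN ANALYZE actually runs the query and adds real timings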
- Selectivity - the fraction of rows a condition matches; highly selective conditions benefit most from an index
- Cardinality - the number of distinct values in a column
- B-tree - used by default - O(log n)
- Hash Index - could be a good choice UNDER a few circumstances - O(1) - REQUIREMENT: Postgres Version > 11, not us
- GiST & GIN - for JSON data
Advanced Topic(s): How is it possible for Hash Index not to be faster than Btree for equality lookups? POSTGRESQL HASH INDEX PERFORMANCE
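Sketches of creating each index type above (index and column names are hypothetical; the GIN example assumes a jsonb column):
CREATE INDEX IF NOT EXISTS students_class_id_idx ON students (class_id); -- B-tree by default
CREATE INDEX IF NOT EXISTS students_email_hash_idx ON students USING HASH (email); -- equality lookups only; see version requirement above
CREATE INDEX IF NOT EXISTS projects_metadata_gin_idx ON projects USING GIN (metadata); -- metadata assumed jsonb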
- Good for frequently used queries
- Partial Index
- Index on Expressions
-- Can only use inefficient PREFIX search
SELECT * FROM projects WHERE directory LIKE '00123>%';
-- Can use an Index on Expressions, IF the length of the SEARCH token is FIXED
-- (SUBSTRING is 1-based; the query expression must match the index expression exactly)
SELECT * FROM projects WHERE SUBSTRING(directory, 1, 6) = '00123>';
CREATE INDEX IF NOT EXISTS project_directory_idx ON "projects"(SUBSTRING(directory, 1, 6));
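A Partial Index sketch for the other bullet above (archived and updated_at are hypothetical columns); indexing only the rows that frequent queries actually touch keeps the index small:
CREATE INDEX IF NOT EXISTS projects_active_idx ON "projects"(updated_at) WHERE NOT archived; -- hypothetical columns
SELECT * FROM projects WHERE NOT archived ORDER BY updated_at DESC LIMIT 20; -- can use the partial index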
- Alerts on Database Server Resource Usage
- Alerts on slow queries (a starting query is sketched below)
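A query such an alert can be built on (pg_stat_activity is standard; the 500 ms threshold matches the target above):
SELECT pid, now() - query_start AS duration, state, query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND now() - query_start > interval '500 milliseconds'
ORDER BY duration DESC;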
No code is the best way to write secure and reliable applications. Write nothing; deploy nowhere.
No queries is the best way to make the database perform well.
- Query the replica
- Cache the result in application
- Do not store static configuration data
- store it in services that provide strong consistency
Only necessary queries to the Master Database is the best way to make the database perform well