
Think about partitioning the log table #15

Open
mreithub opened this issue Mar 12, 2016 · 2 comments

Comments

@mreithub
Owner

For large log tables it might make sense to partition them transparently (partitioning speeds up both lookups and cleanup).

The recall_enable() function could then get an additional parameter specifying the number of partitions you want (for the given logInterval). If the partition count is set to 1 (the default), partitioning won't be enabled for that log table.
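A rough sketch of what the extended call could look like (the extra parameter and its position are just an assumption, not a decided API):

```sql
-- Hypothetical extended signature: recall_enable(table, logInterval, partitionCount)
SELECT recall_enable('accounts', '6 months', 4);  -- split each log interval into 4 partitions
SELECT recall_enable('accounts', '6 months', 1);  -- default: partitioning stays disabled
```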

The trigger function has to create new partitions dynamically when needed (and could at the same time easily drop old ones, making manual cleanup() calls practically obsolete).

Special care has to be taken to decide on good partition start times as well as a good naming scheme for each of the partitioned tables.
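A minimal sketch of the dynamic-creation part (inheritance-based, with CHECK constraints; the `<table>_log_pYYYYMMDD` naming scheme, the week-based granularity and the `_log_start` column name are all assumptions, not a decided design):

```sql
-- Sketch: make sure the partition covering `ts` exists; return its name.
CREATE OR REPLACE FUNCTION _recall_ensure_partition(tbl regclass, ts timestamptz)
RETURNS text AS $$
DECLARE
  part_start timestamptz := date_trunc('week', ts);
  part_name  text := tbl::text || '_log_p' || to_char(part_start, 'YYYYMMDD');
BEGIN
  IF to_regclass(part_name) IS NULL THEN
    EXECUTE format(
      'CREATE TABLE %I (CHECK (_log_start >= %L AND _log_start < %L)) INHERITS (%I)',
      part_name, part_start, part_start + interval '1 week', tbl::text || '_log');
  END IF;
  RETURN part_name;
END;
$$ LANGUAGE plpgsql;
```

The row trigger would then INSERT into the returned partition instead of the parent table.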

@mreithub
Owner Author

This feature, though, is something for the future (>1.0).

Also, it might be a good idea (for performance reasons) to have separate trigger functions: one with partitioning, one without. Maybe the partitioning one can call the simple one to avoid duplicated code.

@mreithub
Owner Author

I noticed that there are actually two ways to partition the log data (since we store two timestamps for each entry):

  • by start time:
    That one's the more straightforward one, but it doesn't necessarily help us with cleaning up the data (cleanup is done using the end timestamp, for good reasons), so after each cleanup call we'd have to check whether any partitions are empty (and remove them). The worst case would be a huge number of almost-empty partitions (though a more advanced cleanup function might move those entries to a fallback partition).
  • by end time:
    With that one, the cleanup process is simple (just drop outdated partitions), but log entries need to be moved to another partition when they become outdated.
    There has to be a special partition where end-ts is NULL (but guess what, that's exactly the contents of the data table - plus information on when each entry was created).
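The two options, sketched as inheritance partitions with CHECK constraints (the `accounts_log` parent and the `_log_start`/`_log_end` column names are assumptions for illustration):

```sql
-- (a) by start time: every entry stays in the partition it was inserted into
CREATE TABLE accounts_log_2016q1 (
  CHECK (_log_start >= '2016-01-01' AND _log_start < '2016-04-01')
) INHERITS (accounts_log);

-- (b) by end time: outdated entries can be dropped partition-wise ...
CREATE TABLE accounts_log_ended_2016q1 (
  CHECK (_log_end >= '2016-01-01' AND _log_end < '2016-04-01')
) INHERITS (accounts_log);
-- ... but entries have to be moved there once they become outdated,
-- and live entries need the special NULL partition:
CREATE TABLE accounts_log_live (
  CHECK (_log_end IS NULL)
) INHERITS (accounts_log);
```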

Also: PostgreSQL honours check constraints on partitions to quickly discard the ones that can't possibly contain any data matching the query. If we're partitioning by end time, queries selecting by start time (which I guess will come up more often than those for the end timestamp) might not be able to take advantage of that.
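To illustrate (same assumed schema as above): with end-time partitions, only queries filtering on the end timestamp give the planner something to prune by.

```sql
-- With constraint_exclusion = partition (the default), this query can skip
-- every end-time partition whose CHECK range excludes the condition:
SELECT * FROM accounts_log WHERE _log_end < '2016-01-01';

-- ... whereas a start-time filter matches no partition's CHECK constraint,
-- so all end-time partitions get scanned:
SELECT * FROM accounts_log WHERE _log_start >= '2016-03-01';
```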

To sum up, there's no one-size-fits-all solution for partitioning our log data. With a naive version of start-time partitioning, dropping old partitions might lose log entries that still exist (but at least the start time never changes). With end-time partitioning, there's no way to avoid moving entries between partitions when their end timestamp changes.
