
Think about partitioning the log table #15

Open
mreithub opened this issue Mar 12, 2016 · 2 comments

Comments

@mreithub
Owner

For large log tables it might make sense to partition them transparently (partitioning speeds up both lookups and cleanup).

The recall_enable() function could then get an additional parameter specifying the number of partitions you want (for the given logInterval). If the partition count is set to 1 (the default), partitioning won't be enabled for that log table.
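A rough sketch of what the extended call could look like (the extra parameter and its position are just an assumption, not a decided API):

```sql
-- Hypothetical extended signature: recall_enable(table, logInterval, partitionCount)
SELECT recall_enable('accounts', '6 months', 4);  -- split each log interval into 4 partitions
SELECT recall_enable('accounts', '6 months', 1);  -- default: partitioning stays disabled
```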

The trigger function has to create new partitions dynamically when needed (and could at the same time easily drop old ones, making manual cleanup() calls practically obsolete).

Special care has to be taken to decide on good partition start times as well as a good naming scheme for each of the partitioned tables.
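A minimal sketch of the dynamic-creation part (inheritance-based, with CHECK constraints; the `<table>_log_pYYYYMMDD` naming scheme, the week-based granularity and the `_log_start` column name are all assumptions, not a decided design):

```sql
-- Sketch: make sure the partition covering `ts` exists; return its name.
CREATE OR REPLACE FUNCTION _recall_ensure_partition(tbl regclass, ts timestamptz)
RETURNS text AS $$
DECLARE
  part_start timestamptz := date_trunc('week', ts);
  part_name  text := tbl::text || '_log_p' || to_char(part_start, 'YYYYMMDD');
BEGIN
  IF to_regclass(part_name) IS NULL THEN
    EXECUTE format(
      'CREATE TABLE %I (CHECK (_log_start >= %L AND _log_start < %L)) INHERITS (%I)',
      part_name, part_start, part_start + interval '1 week', tbl::text || '_log');
  END IF;
  RETURN part_name;
END;
$$ LANGUAGE plpgsql;
```

The row trigger would then INSERT into the returned partition instead of the parent table.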

@mreithub
Owner Author

This feature, though, is something for the future (>1.0).

Also, it might be a good idea (for performance reasons) to have separate trigger functions: one with partitioning, one without. Maybe the partitioning one can call the simple one to avoid duplicated code.

@mreithub
Owner Author

I noticed that there are actually two ways to partition the log data (since we store two timestamps for each entry):

  • by start time:
    That one's the more straightforward one, but it doesn't necessarily help us with cleaning up the data (cleanup is done using the end timestamp, for good reasons), so after each cleanup call we'd have to check whether any partitions are empty (and remove them). The worst case would be a huge number of almost-empty partitions (though a more advanced cleanup function might move those entries to a fallback partition).
  • by end time:
    With that one, the cleanup process is simple (just drop outdated partitions), but log entries need to be moved to another partition when they become outdated.
    There has to be a special partition where end-ts is NULL (but guess what, that's exactly the contents of the data table - plus information on when each entry was created).
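The two options, sketched as inheritance partitions with CHECK constraints (the `accounts_log` parent and the `_log_start`/`_log_end` column names are assumptions for illustration):

```sql
-- (a) by start time: every entry stays in the partition it was inserted into
CREATE TABLE accounts_log_2016q1 (
  CHECK (_log_start >= '2016-01-01' AND _log_start < '2016-04-01')
) INHERITS (accounts_log);

-- (b) by end time: outdated entries can be dropped partition-wise ...
CREATE TABLE accounts_log_ended_2016q1 (
  CHECK (_log_end >= '2016-01-01' AND _log_end < '2016-04-01')
) INHERITS (accounts_log);
-- ... but entries have to be moved there once they become outdated,
-- and live entries need the special NULL partition:
CREATE TABLE accounts_log_live (
  CHECK (_log_end IS NULL)
) INHERITS (accounts_log);
```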

Also: PostgreSQL honours check constraints on partitions to quickly discard the ones that can't possibly contain any data matching the query. If we're partitioning by end time, queries selecting by start time (which I guess will come up more often than those for the end timestamp) might not be able to take advantage of that.
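To illustrate (same assumed schema as above): with end-time partitions, only queries filtering on the end timestamp give the planner something to prune by.

```sql
-- With constraint_exclusion = partition (the default), this query can skip
-- every end-time partition whose CHECK range excludes the condition:
SELECT * FROM accounts_log WHERE _log_end < '2016-01-01';

-- ... whereas a start-time filter matches no partition's CHECK constraint,
-- so all end-time partitions get scanned:
SELECT * FROM accounts_log WHERE _log_start >= '2016-03-01';
```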

To sum up, there's no one-size-fits-all solution for partitioning our log data. With a naive version of start-time partitioning, dropping old partitions might lose log entries that still exist (but at least the start time never changes). With end-time partitioning, there's no way to avoid moving entries between partitions when their end timestamp changes.
