Performance best practice suggestions #32
Hey there, @dino2gnt… The query performed by […] That said, PL/pgSQL is not capable of streaming rows to a caller. As such, the result set it builds here does reside in memory before being scanned by the parent query.
That all said, your feedback and use case are important… PL/pgSQL is always just a first step, and it would be possible to reimplement this function as a custom scan node in C, which would have the side effect of cleaning up the usage syntax and would also make the more stream/scan-like approach possible. I'll consider this a feature request. What is it you need to accomplish with this query? Maybe there's some workaround I can help you with in the meantime?
In the application I am developing this plugin for, the time-series request comes into the plugin containing the start and end times, the key, the aggregation (if any), the metric type, and the stride (or step, in seconds, dynamically calculated by something higher up in the stack). This lent itself to jamming everything into a query that took advantage of […].

I spent some time this afternoon working on this, and came up with this CTE: […]. In my testing environment, this returns numerically equal results to […].
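The CTE mentioned in this comment did not survive the page capture. As a sketch only, one general shape such a query might take uses PostgreSQL's built-in `date_bin()` to bucket a single key's rows before aggregating; the table, column, and key names here are illustrative, not the poster's actual query:

```sql
-- Illustrative only: bucket one key's samples into fixed strides with
-- date_bin(), then aggregate per bucket. The WHERE clause restricts the
-- scan to one key and one time range before any bucketing happens, so an
-- index on (key, ts) can be used.
WITH binned AS (
    SELECT date_bin('300 seconds', ts, TIMESTAMPTZ '2001-01-01') AS bucket,
           value
    FROM   metrics
    WHERE  key = 'ifInOctets'                            -- hypothetical key
      AND  ts BETWEEN now() - interval '1 day' AND now()
)
SELECT bucket,
       avg(value) AS avg_value                           -- the requested aggregation
FROM   binned
GROUP  BY bucket
ORDER  BY bucket;
```

The stride interval and the aggregate function would be substituted per request by the plugin.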
Hi,
Let me preface this by stating that I am not a PostgreSQL expert, so if there's something obvious I'm not understanding I apologize.
I'm writing a plugin that leverages pg_timeseries in an open source enterprise network monitoring platform. I'm using a simple time series schema:
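The schema snippet itself did not survive the page capture. A minimal sketch of the kind of layout described (one row per key, timestamp, and value, registered with pg_timeseries), with all names illustrative, might look like:

```sql
-- Hypothetical schema; table and column names are illustrative, not the
-- poster's actual DDL. pg_timeseries expects the table to be range-
-- partitioned on the time column before enable_ts_table() is called.
CREATE TABLE metrics (
    ts    TIMESTAMPTZ NOT NULL,
    key   TEXT        NOT NULL,
    value DOUBLE PRECISION
) PARTITION BY RANGE (ts);

-- Hand partition management over to pg_timeseries
SELECT enable_ts_table('metrics');
```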
There are currently several hundred unique keys, but this could grow to tens of thousands for busy instances, making a table per data source untenable.
Samples are collected every five minutes, but the interval is configurable per data source so it could be more or less frequently.
When using `date_bin_table` for time bucketing, I'm getting query plan results showing a lot of temp IO: ~a million rows are selected and then filtered during the function scan, resulting in several hundred MB of read and write IO per query. I assume this is caused by the `date_bin_table` function selecting everything between the start time and end time, then filtering out rows that don't match the key?

See also: https://explain.depesz.com/s/G4gq
Granted, this development instance is quite under-resourced and the clock time here is illustrative of that, but as the total number of keys grows the absolute number of rows between a given start and end will grow, creating even more IO as more rows are selected and discarded.
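One general PostgreSQL mitigation for this access pattern (an assumption on my part, not something the pg_timeseries docs state `date_bin_table` requires) is a composite index that leads with the key column, so an index scan can jump directly to one key's rows within the time range instead of scanning every key's rows and discarding the misses:

```sql
-- Assumes an illustrative table metrics(ts timestamptz, key text, value
-- double precision). Leading with key lets the planner skip other keys'
-- rows entirely; ts second keeps each key's matching rows in time order.
CREATE INDEX metrics_key_ts_idx ON metrics (key, ts);
```

Whether the query inside `date_bin_table` can actually use such an index depends on whether it applies the key predicate during the scan or only afterward, which is exactly the question raised above.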
Is there something in my schema design or query that violates the expected best practice for `pg_timeseries` that I should change to mitigate this? Are there indexes that `date_bin_table` expects, or could take advantage of, that I am missing?