-
Notifications
You must be signed in to change notification settings - Fork 882
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature]: Right-closed time_bucket
#5806
Comments
In the meantime, is there a way to circumvent this issue? I tried adding intervals to my time bucket labels, however this does not work with |
you can imitate a right labelled behavior with: SELECT
time_bucket('5 minutes', time - '1 second'::INTERVAL) + '5 minutes' AS five_min,
avg(cpu)
FROM metrics
GROUP BY five_min
ORDER BY five_min DESC LIMIT 10; (the time you subtract needs to be smaller than the smallest interval your data has in the table Maybe you could achieve the same using the |
Thanks for the tip, Jonan. The downside of this approach is that you cannot create a continuous aggregate of this query: you need a time_bucket of an untransformed column from the underlying table.
…________________________________
Van: Jonas Kittner ***@***.***>
Verzonden: Sunday, June 25, 2023 1:52:13 AM
Aan: timescale/timescaledb ***@***.***>
CC: Maurits van Altvorst ***@***.***>; Author ***@***.***>
Onderwerp: Re: [timescale/timescaledb] [Feature]: Right-closed `time_bucket` (Issue #5806)
you can imitate a right labelled behavior with:
SELECT
time_bucket('5 minutes', time - '1 second'::INTERVAL) + '5 minutes' AS five_min,
avg(cpu)
FROM metrics
GROUP BY five_min
ORDER BY five_min DESC LIMIT 10;
(the time you subtract needs to be smaller than the smallest interval your data has in the table metrics (e.g. if it's 1 minute, 1 second is fine)
but that unfortunately does not work with MATERIALIZED VIEW.
Maybe you could achieve the same using the offset argument in the time_bucket function instead of the subtraction
—
Reply to this email directly, view it on GitHub<#5806 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AKFCP32KI737NTCW5AEV6ZDXM6KU3ANCNFSM6AAAAAAZOW6BWI>.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
An alternative is is to expose the function list of "approved" functions for continuous aggregates as a catalog table and allow any functions to be registered there and and store the information we have in the internal cache that marks functions as "ok". You still have to know what you do when you write something like this, but at least it would make it possible to register any "cagg-friendly" function to be used with a continuous aggregate. |
Hi @mkindahl Can you elaborate a bit? So simulating a right-closed time-bucket for continuous aggregates is not possible right now, is that right? Or is this some kind of workaround you posted? |
For reference, pd.DataFrame.resample() has optional |
Hi @melicheradam ! It is possible to write your own version of a right-closed bucketing function (for example, as done in #5864 (comment)) that is straightforward to use from a normal query. However, since this function will not be marked as "approved", it cannot be used in a continuous aggregate. There is a hard-coded list of approved functions that is used when deciding what can be allowed in continuous aggregates and the internal function cache is populated with it on startup. Instead of using a hard-coded list, code could be re-written to store this information in the catalog, which would allow new functions to be added there and "approved" for use in continuous aggregates as bucketing functions. It does place some requirements on any authors of such functions, and it is probably necessary to add some checks to make sure that these functions do not do anything wildly out of the expected, but it would allow users to solve this problem. |
I think a dynamic list of approved continuous aggregate functions would add a lot of complexity if the only goal is to create simple right-closed time buckets, are there other current Github issues such an implementation would solve? |
There was one linked above that was closed because it is related to this one. I have found some issues that are about the offsets, which may be in effort to create right-closed right-labeled time_bucket. In the issue linked above, there is already an implementation for such time_bucket ready, but it cannot be used in continuous aggregate. If someone could include it in the approved list of functions it would be great. I would prepare a PR myself but I am lacking the required experience for this. |
Also for |
Are there any updates regarding this? Sorry, but all our customers don't like to see aggregated points on the graph which represent the values in the future and we really look stupid because of that and always try to explain such behavior but no one cares about our explanations. We found that the Timescale extension is pretty compatible for us in the aggregation and in the performance. It covers most of our needs. In case we don't find in near time the solution to be able to use right-closed time buckets we must leave this extension and rebuild our project using something else. We don't like it because some of the resources were already put into implementing integration with the Timescale extension. |
Hey @Sanprof , currently we are using a hack that does not degrade performance like this, which simulates right-closed timebuckets (aggregations) correctly SELECT time_bucket('5min', timestamp, "offset" => '1ms') + INTERVAL '5min' - INTERVAL '1ms', avg(value) .... But you cannot use this for continuous aggregates, those can only be left-labeled. |
Hey @melicheradam, Thanks for your suggestion, but as you rightly mentioned we need the right-closed time bucket function for continuous aggregates. |
@Sanprof oh sorry, have not realized you were the one originally posted it :D , but you might wanna consider doing some performance tests of your custom function vs offset like i sent above, because we have seen some degradation in performance using a custom function very similar to yours. |
@melicheradam sure, we can make comparison tests for both approaches. I don't promise it will be near time but we will make it once we have time for that. A bit more info about our I can't remember exactly the result of the performance tests we made but when we tried aggregate data for ~100,000,000 records and used call of A long time ago we refused to use such small functions for querying big data tables. E.g. you can find such a function like |
What problem does the new feature solve?
In the current state of TimescaleDB, it is unclear from the docs whether
time_bucket
creates left- or right-closed intervals. Through experimentation I found thattime_bucket
provides left-closed, right-opened, left-labeled intervals.What does the feature do?
It would be desirable to be able to choose on which sides your intervals are open and close. It would also be a nice feature to be able to select which side of the interval should be taken as a label for the
time_bucket
.Implementation challenges
Subtle off-by-one errors for materialized views which might depend on the current behaviour of
time_bucket
.The text was updated successfully, but these errors were encountered: