[Feature]: Right-closed `time_bucket` #5806

mvanaltvorst · 2023-06-21T13:00:10Z

What problem does the new feature solve?

In the current state of TimescaleDB, it is unclear from the docs whether time_bucket creates left- or right-closed intervals. Through experimentation I found that time_bucket provides left-closed, right-opened, left-labeled intervals.

What does the feature do?

It would be desirable to be able to choose on which sides your intervals are open and close. It would also be a nice feature to be able to select which side of the interval should be taken as a label for the time_bucket.

Implementation challenges

Subtle off-by-one errors for materialized views which might depend on the current behaviour of time_bucket.

The text was updated successfully, but these errors were encountered:

mvanaltvorst · 2023-06-21T13:11:00Z

In the meantime, is there a way to circumvent this issue? I tried adding intervals to my time bucket labels, however this does not work with MATERIALIZED VIEW's.

jkittner · 2023-06-25T01:52:03Z

you can imitate a right labelled behavior with:

SELECT
  time_bucket('5 minutes', time - '1 second'::INTERVAL) + '5 minutes' AS five_min,
  avg(cpu)
FROM metrics
GROUP BY five_min
ORDER BY five_min DESC LIMIT 10;

(the time you subtract needs to be smaller than the smallest interval your data has in the table metrics (e.g. if it's 1 minute, 1 second is fine)
but that unfortunately does not work with MATERIALIZED VIEW.

Maybe you could achieve the same using the offset argument in the time_bucket function instead of the subtraction

mvanaltvorst · 2023-06-25T07:10:32Z

Thanks for the tip, Jonan. The downside of this approach is that you cannot create a continuous aggregate of this query: you need a time_bucket of an untransformed column from the underlying table.

…

________________________________ Van: Jonas Kittner ***@***.***> Verzonden: Sunday, June 25, 2023 1:52:13 AM Aan: timescale/timescaledb ***@***.***> CC: Maurits van Altvorst ***@***.***>; Author ***@***.***> Onderwerp: Re: [timescale/timescaledb] [Feature]: Right-closed `time_bucket` (Issue #5806) you can imitate a right labelled behavior with: SELECT time_bucket('5 minutes', time - '1 second'::INTERVAL) + '5 minutes' AS five_min, avg(cpu) FROM metrics GROUP BY five_min ORDER BY five_min DESC LIMIT 10; (the time you subtract needs to be smaller than the smallest interval your data has in the table metrics (e.g. if it's 1 minute, 1 second is fine) but that unfortunately does not work with MATERIALIZED VIEW. Maybe you could achieve the same using the offset argument in the time_bucket function instead of the subtraction — Reply to this email directly, view it on GitHub<#5806 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AKFCP32KI737NTCW5AEV6ZDXM6KU3ANCNFSM6AAAAAAZOW6BWI>. You are receiving this because you authored the thread.Message ID: ***@***.***>

mkindahl · 2023-08-14T13:39:38Z

An alternative is is to expose the function list of "approved" functions for continuous aggregates as a catalog table and allow any functions to be registered there and and store the information we have in the internal cache that marks functions as "ok". You still have to know what you do when you write something like this, but at least it would make it possible to register any "cagg-friendly" function to be used with a continuous aggregate.

melicheradam · 2023-08-15T07:58:59Z

Hi @mkindahl

Can you elaborate a bit? So simulating a right-closed time-bucket for continuous aggregates is not possible right now, is that right? Or is this some kind of workaround you posted?

mvanaltvorst · 2023-08-15T08:26:21Z

For reference, pd.DataFrame.resample() has optional closed and label arguments for this. I think Pandas' API works quite well, is it fundamentally difficult to add these optional arguments to the time_bucket function?

mkindahl · 2023-08-16T06:33:05Z

Hi @mkindahl

Can you elaborate a bit? So simulating a right-closed time-bucket for continuous aggregates is not possible right now, is that right? Or is this some kind of workaround you posted?

Hi @melicheradam !

It is possible to write your own version of a right-closed bucketing function (for example, as done in #5864 (comment)) that is straightforward to use from a normal query. However, since this function will not be marked as "approved", it cannot be used in a continuous aggregate.

There is a hard-coded list of approved functions that is used when deciding what can be allowed in continuous aggregates and the internal function cache is populated with it on startup. Instead of using a hard-coded list, code could be re-written to store this information in the catalog, which would allow new functions to be added there and "approved" for use in continuous aggregates as bucketing functions.

It does place some requirements on any authors of such functions, and it is probably necessary to add some checks to make sure that these functions do not do anything wildly out of the expected, but it would allow users to solve this problem.

mvanaltvorst · 2023-08-16T09:36:16Z

I think a dynamic list of approved continuous aggregate functions would add a lot of complexity if the only goal is to create simple right-closed time buckets, are there other current Github issues such an implementation would solve?

melicheradam · 2023-08-16T10:29:02Z

@mvanaltvorst

There was one linked above that was closed because it is related to this one. I have found some issues that are about the offsets, which may be in effort to create right-closed right-labeled time_bucket.

In the issue linked above, there is already an implementation for such time_bucket ready, but it cannot be used in continuous aggregate. If someone could include it in the approved list of functions it would be great. I would prepare a PR myself but I am lacking the required experience for this.

melicheradam · 2023-08-21T11:11:03Z

Also for time_bucket_gapfill

Sanprof · 2023-09-22T09:00:42Z

Are there any updates regarding this?

Sorry, but all our customers don't like to see aggregated points on the graph which represent the values in the future and we really look stupid because of that and always try to explain such behavior but no one cares about our explanations.

We found that the Timescale extension is pretty compatible for us in the aggregation and in the performance. It covers most of our needs.

In case we don't find in near time the solution to be able to use right-closed time buckets we must leave this extension and rebuild our project using something else. We don't like it because some of the resources were already put into implementing integration with the Timescale extension.

melicheradam · 2023-09-22T09:45:05Z

Hey @Sanprof ,

currently we are using a hack that does not degrade performance like this, which simulates right-closed timebuckets (aggregations) correctly

SELECT time_bucket('5min', timestamp, "offset" => '1ms') + INTERVAL '5min' - INTERVAL '1ms', avg(value) ....

But you cannot use this for continuous aggregates, those can only be left-labeled.

Sanprof · 2023-09-22T11:17:59Z

Hey @melicheradam,

Thanks for your suggestion, but as you rightly mentioned we need the right-closed time bucket function for continuous aggregates.
For our regular selects, we are using our own time_window function which covers all our needs.

melicheradam · 2023-09-22T11:37:36Z

@Sanprof oh sorry, have not realized you were the one originally posted it :D , but you might wanna consider doing some performance tests of your custom function vs offset like i sent above, because we have seen some degradation in performance using a custom function very similar to yours.

Sanprof · 2023-09-22T12:38:21Z

@melicheradam sure, we can make comparison tests for both approaches. I don't promise it will be near time but we will make it once we have time for that.

A bit more info about our time_window function)
Normally, we don't call time_window function but use its body directly in the select query (copy/paste stuff :D) when we are querying big data tables. For the small statistical data, we use time_window function directly in the select query because we don't care about milliseconds of execution.

I can't remember exactly the result of the performance tests we made but when we tried aggregate data for ~100,000,000 records and used call of time_window directly in the select query the performance was really low compared with using a body of this function inside the select query.

A long time ago we refused to use such small functions for querying big data tables. E.g. you can find such a function like iif (I guess it came from the excel) with the simple body (CASE WHEN ..) which some database developers like to use to simplify writing queries (and it actually good) but it stops working as expected when you query big data tables - timings for selects straightaway up.

mvanaltvorst added the feature-request Feature proposal label Jun 21, 2023

melicheradam mentioned this issue Aug 2, 2023

TimescaleDB continuous aggregation wrong time buckets #5864

Closed

guiofranca mentioned this issue Mar 22, 2024

Support for CAgg with time_bucket origin/offset parameter #6382

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature]: Right-closed `time_bucket` #5806

[Feature]: Right-closed `time_bucket` #5806

mvanaltvorst commented Jun 21, 2023 •

edited

Loading

mvanaltvorst commented Jun 21, 2023

jkittner commented Jun 25, 2023

mvanaltvorst commented Jun 25, 2023 via email

mkindahl commented Aug 14, 2023 •

edited

Loading

melicheradam commented Aug 15, 2023

mvanaltvorst commented Aug 15, 2023

mkindahl commented Aug 16, 2023

mvanaltvorst commented Aug 16, 2023

melicheradam commented Aug 16, 2023

melicheradam commented Aug 21, 2023

Sanprof commented Sep 22, 2023

melicheradam commented Sep 22, 2023

Sanprof commented Sep 22, 2023

melicheradam commented Sep 22, 2023

Sanprof commented Sep 22, 2023 •

edited

Loading

[Feature]: Right-closed time_bucket #5806

[Feature]: Right-closed time_bucket #5806

Comments

mvanaltvorst commented Jun 21, 2023 • edited Loading

What problem does the new feature solve?

What does the feature do?

Implementation challenges

mvanaltvorst commented Jun 21, 2023

jkittner commented Jun 25, 2023

mvanaltvorst commented Jun 25, 2023 via email

mkindahl commented Aug 14, 2023 • edited Loading

melicheradam commented Aug 15, 2023

mvanaltvorst commented Aug 15, 2023

mkindahl commented Aug 16, 2023

mvanaltvorst commented Aug 16, 2023

melicheradam commented Aug 16, 2023

melicheradam commented Aug 21, 2023

Sanprof commented Sep 22, 2023

melicheradam commented Sep 22, 2023

Sanprof commented Sep 22, 2023

melicheradam commented Sep 22, 2023

Sanprof commented Sep 22, 2023 • edited Loading

[Feature]: Right-closed `time_bucket` #5806

[Feature]: Right-closed `time_bucket` #5806

mvanaltvorst commented Jun 21, 2023 •

edited

Loading

mkindahl commented Aug 14, 2023 •

edited

Loading

Sanprof commented Sep 22, 2023 •

edited

Loading