implemented sqlite__datediff macro using epoch deltas #56

tom-juntunen · 2024-04-21T09:51:31Z

I needed to make some empirical adjustments; added comments for discussion
I referenced this as a resource for handling partial seconds when datepart is in ['second', 'millisecond']. It didn't make enough of a difference to matter at coarser dateparts.
I looked into using Julian days, also referenced in the article above, but they turned out to be less effective than epoch time for measuring single-millisecond differences: a Julian-day delta is orders of magnitude smaller than the smallest epoch-time delta, so the empirical values end up on a smaller scale, which doesn't make the comparison any easier.
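To put rough numbers on that scale difference, here is a small sketch using Python's built-in sqlite3 module. It is not code from this PR, and the two sample timestamps are made up for illustration. It measures a one-millisecond gap in Julian days and then rescales the same gap to seconds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

first = "2020-01-01 00:00:00.000"
second = "2020-01-01 00:00:00.001"  # hypothetical sample: one millisecond later

# julianday() returns fractional days, so a 1 ms gap is roughly 1/86,400,000 of a day.
julian_delta = conn.execute(
    "select julianday(?) - julianday(?)", (second, first)
).fetchone()[0]

# The same gap rescaled to seconds (86,400 seconds per day).
seconds_delta = julian_delta * 86400.0

print(f"julian-day delta: {julian_delta:.3e}")
print(f"seconds delta:    {seconds_delta:.3e}")
```

The Julian-day figure is smaller by a factor of 86,400, so any empirical cutoff tuned against it shrinks by the same factor without becoming easier to reason about.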

Overall I think it feels a bit hacky and I'd like to do more thorough testing against a full "solved" calendar table to see if the empirical factors need refinement, or if this implementation needs to be re-thought.

tom-juntunen · 2024-04-21T10:10:06Z

dbt/include/sqlite/macros/utils/datediff.sql

-        end)
+        CASE
+            WHEN 
+                ((strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 604800.0) >= 0.285715


This empirical value of 0.285715 (just above 2/7 ≈ 0.285714) marks the dividing line between two scenarios: the two dates being measured for their weekly difference are either more than two days apart, or two days apart or less.

This is only here because one of the test scenarios says that these two dates below have a week difference of 0:

first_date,second_date,datepart,result
2019-12-31 00:00:00,2020-01-02 00:00:00,week,0

If the dates are more than two days apart, we use CEIL to ensure the week difference is 1.
If they are two days apart or less, we use FLOOR to ensure the week difference is 0.
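The arithmetic behind that cutoff can be checked with a short sketch. This assumes the THEN/ELSE branches (not shown in the excerpt above) apply CEIL and FLOOR as described in this comment; the helper names `week_fraction` and `week_diff_threshold` are hypothetical, and rounding is done in Python because CEIL/FLOOR are not available in every SQLite build. Only non-negative deltas are considered.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")

THRESHOLD = 0.285715  # empirical cutoff from the diff, just above 2/7


def week_fraction(first_date, second_date):
    """Epoch delta expressed as a fraction of a 7-day week (604800 seconds)."""
    sql = "select (strftime('%s', ?) - strftime('%s', ?)) / 604800.0"
    return conn.execute(sql, (second_date, first_date)).fetchone()[0]


def week_diff_threshold(first_date, second_date):
    """CEIL at or above the cutoff, FLOOR below it (assumed from the prose)."""
    frac = week_fraction(first_date, second_date)
    return math.ceil(frac) if frac >= THRESHOLD else math.floor(frac)


# The test scenario: exactly two days apart, so the fraction is 2/7.
frac = week_fraction("2019-12-31 00:00:00", "2020-01-02 00:00:00")
two_days = week_diff_threshold("2019-12-31 00:00:00", "2020-01-02 00:00:00")
three_days = week_diff_threshold("2019-12-31 00:00:00", "2020-01-03 00:00:00")
print(frac, two_days, three_days)
```

An exact two-day gap lands just under the cutoff and floors to 0, while a three-day gap lands above it and rounds up to 1.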

It does feel arbitrary to say that 2 days is the magical cutoff between a week difference of 0 and 1. Maybe there is a precedent for this in other RDBMSs. I'll take a look tomorrow.

codeforkjeff · 2024-04-21

If I'm reading it correctly, the implementation for the postgres adapter considers whether the second date "rolls over" into the following week, as determined by whether the day of week for second_date is less than that for first_date. So it doesn't have to do with a two-day threshold; it has to do with the fact that 2019-12-31 is a Tuesday, so with day of week counted from Sunday = 0, anything from 2020-01-02 through 2020-01-04 (the Saturday of that week) would be considered to have a week difference of 0.

https://github.com/dbt-labs/dbt-postgres/blob/118fbbdc26376dd13cf09b43b31e5eba4235bbe8/dbt/include/postgres/macros/utils/datediff.sql#L11-L17
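As a rough illustration of that rollover rule, here is a plain-Python sketch of the logic as I understand it from the linked macro. It is a paraphrase, not the macro itself: the function names `dow` and `week_diff_rollover` are hypothetical, and only forward (first <= second) cases are exercised below.

```python
from datetime import date


def dow(d):
    """Day of week with Sunday = 0, matching Postgres date_part('dow', ...)."""
    return d.isoweekday() % 7


def week_diff_rollover(first, second):
    """Whole weeks between two dates, plus an adjustment for crossing a week boundary."""
    days = (second - first).days
    weeks = int(days / 7)  # truncate toward zero, like SQL integer division
    if dow(first) <= dow(second):
        adjust = 0 if first <= second else -1
    else:
        adjust = 1 if first <= second else 0
    return weeks + adjust


tuesday = date(2019, 12, 31)
for second in (date(2020, 1, 2), date(2020, 1, 4), date(2020, 1, 5)):
    print(second, week_diff_rollover(tuesday, second))
```

Under this reading, the result flips from 0 to 1 on the first Sunday after the Tuesday, regardless of how many days have elapsed.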

It's tricky for sqlite because there's no easy way to do a "day of week" calculation, at least with the built-in functions. I wonder if we could find a date library to do this. sqlean, which we're already using, doesn't include any date functions, unfortunately.

tom-juntunen · 2024-04-22

> So it doesn't have to do with a two-day threshold, it has to do with the fact that 2019-12-31 is a Tuesday, so anything from 2020-01-02 through 2020-01-04 would be considered to have a week difference of 0.

That makes perfect sense now. Thank you for this explanation!

Also thanks for linking the dbt-postgres implementation. It's good that we reference these existing implementations: the edge cases exemplified by the unit tests aren't capturing everything we need yet, and seeing their implementation helps close the gap.

Regarding sqlite and "day of week", we don't have access to postgres' dow, but we can use sqlite's "%w" string format option: https://www.sqlite.org/lang_datefunc.html

select cast(strftime('%w', datetime('now')) as integer);
(1,)
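Since 'now' changes from day to day, a reproducible variant with fixed dates may be easier to discuss. This is a small sketch using Python's built-in sqlite3 module; the helper name `sqlite_dow` is hypothetical. Per the SQLite docs, `%w` yields 0-6 with Sunday = 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def sqlite_dow(value):
    """Day of week as an integer via SQLite's %w format (0-6, Sunday = 0)."""
    sql = "select cast(strftime('%w', ?) as integer)"
    return conn.execute(sql, (value,)).fetchone()[0]


# Dates taken from the week test scenario discussed in this thread, plus the following Sunday.
dows = {d: sqlite_dow(d) for d in ("2019-12-31", "2020-01-02", "2020-01-05")}
print(dows)
```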

Using this we could update our week part to something like:

{% elif datepart == 'week' %}
    -- Calculate the initial week difference by finding the difference in days and dividing by 7
    (
        ((strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 86400) / 7 +
        -- Adjust based on the days of the week to ensure correct week boundaries
        CASE
            WHEN strftime('%w', {{ first_date }}) <= strftime('%w', {{ second_date }}) THEN
                CASE WHEN strftime('%s', {{ first_date }}) <= strftime('%s', {{ second_date }}) THEN 0 ELSE -1 END
            ELSE
                CASE WHEN strftime('%s', {{ first_date }}) <= strftime('%s', {{ second_date }}) THEN 1 ELSE 0 END
        END
    )
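A quick way to sanity-check this expression outside dbt is to run it directly against SQLite. The sketch below swaps the Jinja placeholders for named parameters (`:first`, `:second`) and uses Python's built-in sqlite3 module; the `week_diff` helper and the choice of sample dates are assumptions for illustration, and only forward (first <= second) cases are covered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The proposed week expression with the Jinja placeholders replaced by named parameters.
WEEK_DIFF_SQL = """
select
    ((strftime('%s', :second) - strftime('%s', :first)) / 86400) / 7 +
    case
        when strftime('%w', :first) <= strftime('%w', :second) then
            case when strftime('%s', :first) <= strftime('%s', :second) then 0 else -1 end
        else
            case when strftime('%s', :first) <= strftime('%s', :second) then 1 else 0 end
    end
"""


def week_diff(first, second):
    """Evaluate the proposed expression for one pair of timestamps."""
    params = {"first": first, "second": second}
    return conn.execute(WEEK_DIFF_SQL, params).fetchone()[0]


results = {
    second: week_diff("2019-12-31 00:00:00", second)
    for second in ("2020-01-02 00:00:00", "2020-01-04 00:00:00", "2020-01-05 00:00:00")
}
print(results)
```

One thing to keep in mind: strftime returns text, so the `<=` comparisons on `%s` values are lexical. That works when both epoch strings have the same number of digits, but wrapping them in cast(... as integer) would likely be safer for dates far enough apart to differ in length.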

Though for other parts of the macro, I am now questioning the use of CEIL vs. FLOOR, knowing it could likely go wrong somewhere. I'll look into the postgres native implementation of datediff and see if I can learn a bit more from it to refine this PR.

I also think I should consider using the code already adapted from postgresql (the left side of this diff), because it might be more performant in some areas and reads more like the postgresql version, which would be a common reference point.

One example of the performance benefit in the postgresql adapter version is that the CASE statement is applied only to the second operand, the adjustment, rather than to the whole expression. This avoids calling the same functions (cast and floor/ceil) multiple times on the same expression.

implemented sqlite__datediff macro using epoch deltas

f52f881

- needed to make some empirical adjustments; added comments for discussion

tom-juntunen requested a review from codeforkjeff April 21, 2024 09:51

tom-juntunen mentioned this pull request Apr 21, 2024

implement datediff macro #26

Open
