-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implemented sqlite__datediff macro using epoch deltas #56
base: main
Are you sure you want to change the base?
Conversation
- needed to make some empirical adjustments; added comments for discussion
end) | ||
CASE | ||
WHEN | ||
((strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 604800.0) >= 0.285715 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This empirical value of 0.285715 (~2/7) reflects the dividing line between two scenarios: when the two dates being measured for their weekly difference are greater than two days apart from each other, or within two days.
This is only here because one of the test scenarios says that these two dates below have a week difference of 0:
first_date,second_date,datepart,result
2019-12-31 00:00:00,2020-01-02 00:00:00,week,0
If greater than two days apart, we use CEIL to ensure the week difference is 1
If less than two days apart, we use FLOOR to ensure the week difference is 0.
It does feels arbitrary to say that 2 days is the magical cut off between when a week difference is seen as either 0 or 1. Maybe there is a precedent for this in other RDBMS systems. I'll take a look tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm reading it correctly, the implementation for the postgres adapter considers whether the 2nd date "rolls over" into the following week, as determined by whether day of week for second_date is less than that for first_date. So it doesn't have to do with a two-day threshold, it has to do with the fact that 2019-12-31 is a Tuesday, so anything from 2020-01-02 through 2020-01-06 would be considered to have a week difference of 0.
It's tricky for sqlite because there's no easy way to do a "day of week" calculation, at least with the built-in functions. I wonder if we could find a date library to do this. sqlean, which we're already using, doesn't include any date functions, unfortunately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it doesn't have to do with a two-day threshold, it has to do with the fact that 2019-12-31 is a Tuesday, so anything from 2020-01-02 through 2020-01-06 would be considered to have a week difference of 0.
That makes perfect sense now. Thank you for this explanation!
Also thanks for linking the dbt-postgres implementation, it's good that we reference these existing implementations because I find the edge cases exemplified by the unit tests aren't capturing everything we need yet, but seeing their implementation helps close the gap and shed light.
Regarding sqlite and "day of week", we don't have access to postgres' dow
, but we can use sqlite's "%w" string format option: https://www.sqlite.org/lang_datefunc.html
select cast(strftime('%w', datetime('now')) as integer);
(1,)
Using this we could update our week
part to something like:
{% elif datepart == 'week' %}
-- Calculate the initial week difference by finding the difference in days and dividing by 7
(
((strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 86400) / 7 +
-- Adjust based on the days of the week to ensure correct week boundaries
CASE
WHEN strftime('%w', {{ first_date }}) <= strftime('%w', {{ second_date }})
THEN CASE
WHEN strftime('%s', {{ first_date }}) <= strftime('%s', {{ second_date }})
THEN 0
ELSE -1
END
ELSE CASE
WHEN strftime('%s', {{ first_date }}) <= strftime('%s', {{ second_date }})
THEN 1
ELSE 0
END
END
)
Though for other parts of the macro, I am now questioning the use of CEIL v FLOOR knowing it likely could go wrong somewhere. I'll look into the postgres native implementation for datediff and see if I can learn a bit more from that to refine this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think I should consider using the code that was already adapted from postgresql to the left because it might be more performant in some areas, and also reads more like the postgresql one which would be a common reference point.
One example of the performance benefit in the postgresql adapter version is the case statement is only being applied on the second operand used for adjustment, not on the whole expression. This avoids calling the same functions (cast and floor/ceil) multiple times on the same expression.
Overall I think it feels a bit hacky and I'd like to do more thorough testing against a full "solved" calendar table to see if the empirical factors need refinement, or if this implementation needs to be re-thought.