`Date_histogram` doesn't work too well with timezones #307
github-merge-queue bot pushed a commit that referenced this issue on Sep 10, 2024.
#307 edit: one case is still wrong. The note below, `(that's also why we don't need to worry about the field being in a different timezone in ClickHouse, it's translated to UTC anyway)`, holds in the second case; in the first one we need to handle it ourselves.

For `date_histogram` we generate SQL in two ways:

a) For longer intervals (e.g. 1 month), we group by `toStartOfMonth(timestamp)`. Taking timezones into consideration, this becomes `toStartOfMonth(toTimeZone(timestamp, "timezone_name"))`.

Example: we have this simple table:

![Screenshot 2024-09-09 at 19 00 43](https://github.com/user-attachments/assets/204c0810-a30d-4df6-9ff6-2caf2eb174da)

Before, the request below returned what it now returns when the `time_zone` parameter is omitted, i.e. `"key_as_string": "2023-12-01T00:00:00.000"`. But in Warsaw time that timestamp is `2024-01-01 00:05`, so we now correctly return the new-year bucket.

![Screenshot 2024-09-09 at 18 59 27](https://github.com/user-attachments/assets/928c2276-fff1-4137-8025-5460160bd4b4)

b) For shorter intervals, like the one in this issue (#307), `toUnixTimestamp64Milli("@timestamp") / 3600000` becomes `(toUnixTimestamp64Milli("@timestamp") + timeZoneOffset(toTimezone("@timestamp", 'Europe/Warsaw')) * 1000) / 3600000`.

Here the SQL is a bit more complex, but I think it needs to be: `toUnixTimestamp` always operates in UTC and translates other timezones to UTC, so we need to add the offset ourselves. (That's also why we don't need to worry about the field being in a different timezone in ClickHouse; it's translated to UTC anyway.)

You can see below that the answers now match those from Elastic in #307: the ratio of `doc_count` between the days is 1:4, not 2:3 as before this fix.
![Screenshot 2024-09-09 at 19 12 20](https://github.com/user-attachments/assets/02294d24-53bb-43c9-8bd4-89a469950869)

It works in pancakes too:

![Screenshot 2024-09-09 at 19 16 43](https://github.com/user-attachments/assets/4cf537e5-6af4-47f4-8072-3431e86fb27e)
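The two cases above can be sketched in plain Python (this is an illustration of the bucketing arithmetic, not the project's actual code; the example timestamp and the `Europe/Warsaw` zone come from the comment above):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # stdlib since Python 3.9

HOUR_MS = 3_600_000

# 2023-12-31 23:05 UTC is 2024-01-01 00:05 in Europe/Warsaw (UTC+1 in winter).
ts = datetime(2023, 12, 31, 23, 5, tzinfo=timezone.utc)
millis = int(ts.timestamp() * 1000)

# Case (b), before the fix: bucket by UTC hour only.
utc_bucket = millis // HOUR_MS

# Case (b), after the fix: add the zone offset in milliseconds before dividing,
# mirroring `+ timeZoneOffset(...) * 1000` in the generated ClickHouse SQL.
local = ts.astimezone(ZoneInfo("Europe/Warsaw"))
offset_ms = int(local.utcoffset().total_seconds()) * 1000
local_bucket = (millis + offset_ms) // HOUR_MS

# The offset pushes the event into the next hour bucket (local midnight),
# so the two keys differ by exactly one hour bucket.
print(local_bucket - utc_bucket)  # -> 1

# Case (a): for month granularity, convert to local time first, the analogue
# of toStartOfMonth(toTimeZone(...)): the event lands in January, not December.
print(local.strftime("%Y-%m"))  # -> 2024-01
```

In both cases the fix is the same idea: translate to the requested zone before truncating to the bucket boundary, rather than truncating in UTC.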
Fixed.
We'll need to think about how to tackle time zones, in both Elastic and ClickHouse.
How to reproduce:
I changed the timestamp code in `docker/log-generator/logger` to:

Then there's a discrepancy between our responses and Elastic's for the query at the bottom. The ratio between hits in those 2 buckets is very different.
Elastic returns
And we do:
Query: