Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider timezone when calculate timekey #2054

Merged
merged 8 commits into from
Jul 23, 2018

Conversation

okkez
Copy link
Contributor

@okkez okkez commented Jul 5, 2018

Because the timestamp of the midnight cannot be divisible by 86400 in
localtime (JST). The timestamp of the midnight is divisible by 86400
only in UTC. Therefore we must consider offset from UTC in localtime.

For example, at 2018-07-04 01:23:23 +0900:

If timekey is 86400 and path template is /log/%Y%m%d.log.
In previous version, extract path template to /log/20180703.log.
In this version, extract path template to /log/20180704.log.

Fix #1986

Because the timestamp of the midnight cannot be divisible by 86400 in
localtime (JST). The timestamp of the midnight is divisible by 86400
only in UTC. Therefore we must consider offset from UTC in localtime.

For example, at `2018-07-04 01:23:23 +0900`:

If timekey is 86400 and path template is `/log/%Y%m%d.log`.
In previous version, extract path template to `/log/20180703.log`.
In this version, extract path template to `/log/20180704.log`.

Fix fluent#1986

Signed-off-by: Kenji Okimoto <[email protected]>
@okkez okkez mentioned this pull request Jul 5, 2018
@repeatedly
Copy link
Member

Test failed. Could you check it?

Signed-off-by: Kenji Okimoto <[email protected]>
append true
<buffer>
timekey_use_utc false
timekey_zone Asia/Tokyo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use "+0900" like value?
I'm not sure but test on Windows fails.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just commit the fix :)

if @buffer_config.timekey_use_utc
(time_int - (time_int % @buffer_config.timekey)).to_i
else
offset = Time.at(time_int).utc_offset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should consider timekey_zone value instead of utc_offset?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe yes. How about the following function?

diff --git a/lib/fluent/timezone.rb b/lib/fluent/timezone.rb
index bb6fa868..0e79dd03 100644
--- a/lib/fluent/timezone.rb
+++ b/lib/fluent/timezone.rb
@@ -139,5 +139,17 @@ module Fluent
 
       return nil
     end
+
+    def self.utc_offset(time, timezone)
+      return 0 if timezone.nil?
+
+      case timezone
+      when NUMERIC_PATTERN
+        Time.zone_offset(timezone)
+      when NAME_PATTERN
+        tz = TZInfo::Timezone.get(timezone)
+        tz.period_for_utc(time).utc_total_offset
+      end
+    end
   end
 end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@@ -825,6 +822,16 @@ def metadata(tag, time, record)
end
end

def calculate_timekey(time)
time_int = time.to_i
if @buffer_config.timekey_use_utc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accessing object member is slower than accessing instance variable.
@timekey_use_utc = @buffer_config.timekey_use_utc in configure is better.

if @timekey_use_utc
(time_int - (time_int % @timekey)).to_i
else
offset = Fluent::Timezone.utc_offset(time, @timekey_zone)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check @timekey_zone format everytime is slower.
This code is called in each emit so need to avoid extra cost.
Cache the result is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we should consider DST. The naive cache is not good for DST.
What should we do?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.
One way is generating calculate_timekey method in configure.
For numeric pattern, the offset value can be embbed directly.
For name pattern, embed getting tz data in the body.
How about this?

@@ -557,6 +557,42 @@ def parse_system(text)
check_gzipped_result(path, formatted_lines * 3)
end

test 'append when JST' do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add more test with different time and timezone, e.g. timekey_zone is +09:00 and actual time is +02:00

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will add more test.

For `NUMERIC_PATTERN`, just return static value.
For `NAME_PATTERN`, return lambda to calculate offset later.

This version is 5-10% faster than the previous version.

Signed-off-by: Kenji Okimoto <[email protected]>
@repeatedly repeatedly merged commit 1c55789 into fluent:master Jul 23, 2018
@repeatedly
Copy link
Member

Thanks!

@repeatedly
Copy link
Member

I released v1.2.4.rc1 for testing this patch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants