-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support nanosecond #653
Support nanosecond #653
Conversation
@@ -2,6 +2,8 @@ source 'https://rubygems.org/' | |||
|
|||
gemspec | |||
|
|||
gem 'msgpack', github: 'msgpack/msgpack-ruby' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which target version of msgpack-ruby?
0.6.0 or later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This patch uses msgpack ext type. msgpack-ruby implemented it in master branch but is not released yet. So this patch will be marged after msgpack-ruby is released.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the information! I got it.
Generally speaking, using |
@mururu Did you check https://github.com/nurse/strptime for TimeParser |
@repeatedly Thanks. Not yet, I'll check soon. |
After supported nano-second, hard to cache parsed time so we should use faster time parsing way. |
I have performed benchmark for strptime version in_tail plugin. But strptime gem is not stable and can't cover the case of 'Time.parse', so I will work around them. |
Yes. |
@mr-salty How about this patch? It has nano-second resolution. |
This looks good in general, thanks - we have no special requirement for timestamp to be int or ruby Time, I can update it to use this when it is ready. I made a comment elsewhere about parsing JSON timestamps as float losing precision. We have existing users who have the timestamp 'split' in the JSON to avoid this issue, i.e. "time": { "sec": "123456", "nsec": "7890123456" } - could we add that to the parser? |
Exactly. I think that a user should not use floating point numbers if they want to keep precision. It sounds good that we have another JSON timestamp which enable users to aboid using something like strptime.
Looks good to me. /cc @repeatedly |
|
||
def self.from_time(time) | ||
Fluent::NanoTime.new(time.to_i, time.nsec) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: How about adding those methods for convenience of plugin development?:
def to_time
Time.at(@sec, Rational(@nsec, 1000))
end
def self.now
from_time Time.now
end
def self.parse(*args)
from_time Time.parse(*args)
end
Fluentd core core will also be simple with those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to_time
I would recommend to use Time.at(time)
because we can use same code for NanoTime and Integer.
self.now
,self.parse(*args)
Ah, at first I implemented this methods, but I removed them later to keep simple API.
Now, I agree because I feel Fluent::NanoTime.from_time(...)
is too verbose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add NanoTime.strptime
too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Time.at(time)
I see. It makes sense.
Hm...but it looses millisec precisions. It may not work for plugins which want to use Time with millisec but also want to keep backward compatibility with all input plugins. How about adding NanoTime.to_time(value)
instead (like eq?)?
NanoTime.strptime
I think it's optional. I don't have use cases in my mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm...but it looses millisec precisions.
Really? I read time.c, it seems not to loose precisions. It seems to simply call to_r
.
results on my environment
irb(main):007:0> Time.at(Fluent::NanoTime.from_time(Time.now)).nsec
=> 283749448
NanoTime.strptime
I think it's optional. I don't have use cases in my mind.
Okey.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, I didn't know that! I thought Time.now uses value.to_i. Nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean Time.at
? More preceisely, if the object have both of to_int
and to_r
, Time.at
uses to_r
. If the object have only to_int
, Time.at
uses to_int
. Otherwise raises TypeError.
https://github.com/mururu/fluentd/tree/strptime-gem I am trying to improve the performance of TimeParser by strptime gem on that branch. |
|
👍 > EventTime |
👍 for EventTime |
Done > EventTime And new TimeParser using strptime gem works fine with Ruby 2.0 or over. |
@mururu Could you remove 1.9.3 test from |
8562caf
to
e9523ed
Compare
Thank you for the many works on nanosecond support. It looks great because EventTime instance could be handled seemlessly by most of existing plugins working in the fluentd process. I'd like to ask about the msgpack representation of interprocess event stream on the other hand. It seems that it uses an ext format with type 0 and 8 bytes payload. It this correct? @msgpack_factory = MessagePack::Factory.new
@msgpack_factory.register_type(Fluent::EventTime::TYPE, Fluent::EventTime) module Fluent
class EventTime
TYPE = 0
def to_msgpack_ext
[@sec, @nsec].pack('NN')
end
def self.from_msgpack_ext(data)
new(*data.unpack('NN'))
end There are a pair of topics as below about the msgpack representation using the ext format. [1] It costs additional 5 bytes for every single event, even when nanosecond is not used. [2] It's a breaking change of msgpack representation of interprocess event stream. This means every out_forward protocol listernes or msgpack stream inspectors have to support the ext format of type 0. Counterproposals: [A] Using plain array format rather than the ext format ["tag", [
[[1441337812, 123], {"message": "event1"}],
[[1441337813, 456], {"message": "event2"}],
[[1441337814, 789], {"message": "event3"}],
:
]] Pros: It costs only 2 bytes when nanosecond is not used. [B] Using the third parameter for nanosecond value. ["tag", [
[1441337812, {"message": "event1"}, 123],
[1441337813, {"message": "event2"}, 456],
[1441337814, {"message": "event3"}, 789],
:
]] Pros: It costs only 1 byte when nanosecond is not used. EDITED [C] Using float64 per default. Upgradable to switch to use the ext format with an option at fluentd.conf ["tag", [
[1441337812.000000123, {"message": "event1"}],
[1441337813.000000456, {"message": "event2"}],
[1441337814.000000789, {"message": "event3"}],
:
]] Pros: Most of other products would accept float64 value without any changes. |
@kawanet Thank you for your feedback! [A] [B] If users want to force timestamp to be Integer, they can use If users want subsecond resolution, the cost of ext type format is less than or equal to [A] and [B] in most cases.
[C] |
Ah, "Other out_forward protocol listeners are allowed just to ignore nanosecond value." is attractive. But then it may break other plugins when in_forward receives PackedForward. I will investigate it. |
For the backward compatibility, I don't still think using a fixed type of ext format is a good design as a standard interprocess communication protocol. Not only the The standard protocol should use only standard formats, in my opinion. I imagine that the protocol specificication could be published as a RFC or something, in the future. If we don't do that, somebody may propose it as an IETF standard... Sorry just kidding ;) [A] plain array formatCanceled. It may cost one byte longer than the ext format. [B] nanosecond as the third parameter
[C] float 64 format
[D] nanosecond Date ext format as the new msgpack stanrad.The msgpack specification reserves 128 ext formats for the future use. Why don't you ask msgpack maintainers to add nanosecond Date ext format as the msgpack standard. For instance, ext type [E] msgpack representation optionIn a future, msgpack may add the standard Date ext format rather than type 0. The following option allows users to switch the type to follow the standard.
|
From the side of msgpack core maintainer: From sight of msgpack-ruby maintainer: |
@tagomoris I'm happy to hear that msgpack will have an official ext format for nanosecond precision time, yay! I'm looking forward to seeing it soon. I don't have enough knowlege how it'll cost to transform time between Ruby's Time class and EventTime representation, however. |
@mururu Could you rebase the commits? I merged windows support PR and it caused a conflict. |
@mururu ah, 0.1.2 has correctly binaries but there's no code to load ;-( |
@nurse It works fine. Thanks! |
All tests are passed. If there is no concerns, I will merge it tomorrow and release v0.14.0.dev1. |
@mururu @repeatedly Could you use strptime v0.1.4? |
@nurse Do you mean we should update gem dependency to |
@repeatedly ah, current code, |
@frsyuki @tagomoris This PR uses msgpack v0.7.0.dev. Do you have a plan to release a stable version of v0.7.0? |
@frsyuki @tagomoris ping? This is a blocker for merging this PR. |
Wait some more... I'll proceed it as soon as possible from now on. |
@tagomoris @frsyuki So if release msgpack v0.7.0, we should update this patch with timestamp handling? |
Application defined EventTime is needed. So release v0.7.0 seems enough for merging the patch. |
@frsyuki @tagomoris When do you release stable version of msgpack-ruby v0.7.0? |
Right after msgpack/msgpack-ruby#91 is merged. |
msgpack |
@mururu Could you update gem dependency? |
@repeatedly Done |
@mururu Thanks! @frsyuki @tagomoris @sonots Do you have any concerns? If not, I will merge this PR today. |
👍 |
Thanks! |
Fluent::EventTime
This patch introduces new fluentd timestamp,
Fluent::EventTime
, which can have nanosecond granularity and is compatible with currentInteger
timestamp.Implementaion
Fluent::EventTime
is a class which hold@sec
(epoch timestamp) and@nsec
(representing nano second).It behaves like
Integer
(@sec) in most cases. But if plugin authors want to handle subsecond, they need to use its API explicitly.Now,
Engine.now
returnsFluent::EventTime
.Additinal API
new(sec, nsec = 0)
.eq?(a, b)
a
andb
isFluent::EventTime
, check the both of@sec
and@nsec
, otherwise same with==
..from_time(time)
Time
instance toEventTime
#sec
@sec
#nsec
@nsec
#to_r
EventTime
toRational
considering@nsec
too. This is used at the internal ofTime.at
..from_msgpack_ext(data)
Fluent::Engine.msgpack_factory.unpacker
#to_msgpack_ext
Fluent::Engine.msgpack_factory.packer
It seems that
Fluent::EventTime
behaves likeInteger
enough to be used in existing plugins, but maybe we should makeFluent::EventTime
behave more likeInteger
, for exampleObject#is_a?
.Fluent::Engine.msgpack_factory.(un)packer
Fluent::Engine.msgpack_factory.(un)packer
is for serializing and desirializingFluent::EventTime
as msgpack ext type.It seems to be verbose, so maybe we should make another API easy to use.
For exmaple,
ObjectBufferdOutput
If the plugin is a subclass of
ObjectBufferdOutput
, it automatically hastime_as_integer
option. If the option istrue
, timestamps will be converted toInteger
. By default it istrue
for backward compability.Plugins which use
ObjectBufferdOutput
is following. I searched at GitHub.Backword Imcompatibility
fluent-plugin-map
useseval
for raw timestamp, so if you do complicated operation fortime
, it may be broken. ButFluent::EventTime
behaves likeInteger
in almost all cases.Except it, as far as I have confirmed, there were no plugins which are broken by this feature.
My investigation results: https://gist.github.com/mururu/56430620b084eccedf03
Performance Concerns
time_as_integer
istrue
andObjectBufferdOutput
receive PackedForward, it is desirialized and serialized for convertingFluent::EventTime
toInteger
. But by keeping source nodes old or keepingtime_as_integer
true
on them, we can setfalse
totime_as_integer
on relay nodes.TimeParser
. If time log format including subsecond, it is difficult to cache results of parsing it.strptime
.TimeFormatter
. If time format needs sebsecond, it is difficult to cache results of formatting.For Users
This feature let you get nanosecond granularity. But if any plugins which you are using cannot handle it correctly, you may get only second granularity which is same with current timestamp. So please check and wait until all plugins which you are using become be able to handle it.
Forward
Added
time_as_integer
option to forward Output Plugin. When this option istrue
, timestamps which are sent to another node, for example forward Input Plugin orfluentd_forwarder
, is serialized as Integer. By default,time_as_integer
istrue
for backward compatibility. If nodes which you want send to become be able to handleFluent::EventTime
, you can set this optiontrue
.For Plugin Authors
Output Plugin
If the storage which the plugin send logs to can handle subsecond but the plugin send timestamp as integer, you can make it send timestamps including subsecond.
For example: mururu/fluent-plugin-elasticsearch@e671e36
Input Plugin
If the plugin emits a return value of
Engine.now
, there are nothing to do.If the plugin use build-in parser(
TimeParser
), there are nothing to do.If the plugin emits Integer explicitly as timestamp even though it can get subsecond from its source now, it should return
Fluent::EventTime
.Filter Plugin
If the plugin implements
#filter
, there are nothing to do.If the plugin implements
#filter_stream
and replacetime
with Integer, you should convert it toFluent::EventTime
.Parser Plugin
If the plugin uses
TimeParser.parse
, there are nothing to do.If time parsing is implemented by hand, you can return
Fluent::EventTime
by usingFluent::EventTime.from_time
orFlent::EventTime.new
Formatter Plugin
If the plugin uses
TimeFormatter.format
,Time.at
,to_s
,"#{}"
,to_json
,to_msgpack
(almost all cases), there nothing to do.TODO