Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support nanosecond #653

Merged
merged 63 commits into from
Oct 26, 2015
Merged

Support nanosecond #653

merged 63 commits into from
Oct 26, 2015

Conversation

mururu
Copy link
Member

@mururu mururu commented Aug 13, 2015

Fluent::EventTime

This patch introduces new fluentd timestamp, Fluent::EventTime, which can have nanosecond granularity and is compatible with current Integer timestamp.

Implementaion

Fluent::EventTime is a class which hold @sec(epoch timestamp) and @nsec(representing nano second).

It behaves like Integer(@sec) in most cases. But if plugin authors want to handle subsecond, they need to use its API explicitly.

Now, Engine.now returns Fluent::EventTime.

Additinal API

  • new(sec, nsec = 0)
    • constructor
  • .eq?(a, b)
    • If both of a and b is Fluent::EventTime, check the both of@sec and @nsec, otherwise same with ==.
  • .from_time(time)
    • Convert Time instance to EventTime
  • #sec
    • Return @sec
  • #nsec
    • Return @nsec
  • #to_r
    • Convert EventTime to Rational considering @nsec too. This is used at the internal of Time.at.
  • .from_msgpack_ext(data)
    • This is used by Fluent::Engine.msgpack_factory.unpacker
  • #to_msgpack_ext
    • This is used by Fluent::Engine.msgpack_factory.packer

It seems that Fluent::EventTime behaves like Integer enough to be used in existing plugins, but maybe we should make Fluent::EventTime behave more like Integer, for example Object#is_a?.

Fluent::Engine.msgpack_factory.(un)packer

Fluent::Engine.msgpack_factory.(un)packer is for serializing and desirializing Fluent::EventTime as msgpack ext type.

packer = Fluent::Engine.msgpack_factory.packer
packer.write([tag, Engine.now, record])
packer.to_s

It seems to be verbose, so maybe we should make another API easy to use.
For exmaple,

def format(tag, time, record)
  Fluent::Engine.to_msgpack([tag, time, record])
end

ObjectBufferdOutput

If the plugin is a subclass of ObjectBufferdOutput, it automatically has time_as_integer option. If the option is true, timestamps will be converted to Integer. By default it is true for backward compability.

Plugins which use ObjectBufferdOutput is following. I searched at GitHub.

Backword Imcompatibility

fluent-plugin-map uses eval for raw timestamp, so if you do complicated operation for time, it may be broken. But Fluent::EventTime behaves like Integer in almost all cases.

Except it, as far as I have confirmed, there were no plugins which are broken by this feature.

My investigation results: https://gist.github.com/mururu/56430620b084eccedf03

Performance Concerns

  • When time_as_integer is true and ObjectBufferdOutput receive PackedForward, it is desirialized and serialized for converting Fluent::EventTime to Integer. But by keeping source nodes old or keeping time_as_integer true on them, we can set false to time_as_integer on relay nodes.
  • TimeParser. If time log format including subsecond, it is difficult to cache results of parsing it.
  • TimeFormatter. If time format needs sebsecond, it is difficult to cache results of formatting.

For Users

This feature let you get nanosecond granularity. But if any plugins which you are using cannot handle it correctly, you may get only second granularity which is same with current timestamp. So please check and wait until all plugins which you are using become be able to handle it.

Forward

Added time_as_integer option to forward Output Plugin. When this option is true, timestamps which are sent to another node, for example forward Input Plugin or fluentd_forwarder, is serialized as Integer. By default, time_as_integer is true for backward compatibility. If nodes which you want send to become be able to handle Fluent::EventTime, you can set this option true.

For Plugin Authors

Output Plugin

If the storage which the plugin send logs to can handle subsecond but the plugin send timestamp as integer, you can make it send timestamps including subsecond.
For example: mururu/fluent-plugin-elasticsearch@e671e36

Input Plugin

If the plugin emits a return value of Engine.now, there are nothing to do.
If the plugin use build-in parser(TimeParser), there are nothing to do.
If the plugin emits Integer explicitly as timestamp even though it can get subsecond from its source now, it should return Fluent::EventTime.

Filter Plugin

If the plugin implements #filter, there are nothing to do.
If the plugin implements #filter_stream and replace time with Integer, you should convert it to Fluent::EventTime.

Parser Plugin

If the plugin uses TimeParser.parse, there are nothing to do.
If time parsing is implemented by hand, you can return Fluent::EventTime by using Fluent::EventTime.from_time or Flent::EventTime.new

Formatter Plugin

If the plugin uses TimeFormatter.format, Time.at, to_s, "#{}", to_json, to_msgpack (almost all cases), there nothing to do.

TODO

  • Add tests
  • Documentation
  • Implement new TimeParser using strptime gem

@@ -2,6 +2,8 @@ source 'https://rubygems.org/'

gemspec

gem 'msgpack', github: 'msgpack/msgpack-ruby'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which target version of msgpack-ruby?
0.6.0 or later?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch uses msgpack ext type. msgpack-ruby implemented it in master branch but is not released yet. So this patch will be marged after msgpack-ruby is released.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the information! I got it.

@tagomoris
Copy link
Member

Generally speaking, using include Fluentd just to save types is bad idea.
I know that it's done in current Fluentd code, but please don't add more cases and use fully-qualified specification as Fluent::NanoTime.

@repeatedly
Copy link
Member

@mururu Did you check https://github.com/nurse/strptime for TimeParser

@mururu
Copy link
Member Author

mururu commented Aug 17, 2015

@repeatedly Thanks. Not yet, I'll check soon.

@repeatedly
Copy link
Member

After supported nano-second, hard to cache parsed time so we should use faster time parsing way.

@mururu
Copy link
Member Author

mururu commented Aug 17, 2015

I have performed benchmark for strptime version in_tail plugin.
The result looks fast enough.
https://github.com/mururu/fluentd-benchmark/tree/master/tail

But strptime gem is not stable and can't cover the case of 'Time.parse', so I will work around them.

@repeatedly
Copy link
Member

can't cover the case of 'Time.parse', so I will work around them.

Yes. strptime gem is used when time_format is specified.
Almost cases, users specify time_format so we benefit from this gem.

@repeatedly
Copy link
Member

@mr-salty How about this patch? It has nano-second resolution.
Does your plugin need 64bit integer for nano-second or Ruby's Time object?

@mr-salty
Copy link
Contributor

This looks good in general, thanks - we have no special requirement for timestamp to be int or ruby Time, I can update it to use this when it is ready.

I made a comment elsewhere about parsing JSON timestamps as float losing precision. We have existing users who have the timestamp 'split' in the JSON to avoid this issue, i.e. "time": { "sec": "123456", "nsec": "7890123456" } - could we add that to the parser?

@mururu
Copy link
Member Author

mururu commented Aug 18, 2015

I don't think this is avoidable using floating point numbers; theoretically something like strptime("...%S.%9N") could parse it but we also have to consider that whatever wrote the timestamp may have lost precision before writing it.

Exactly. I think that a user should not use floating point numbers if they want to keep precision. It sounds good that we have another JSON timestamp which enable users to aboid using something like strptime.

"time": { "sec": "123456", "nsec": "7890123456" }

Looks good to me.

/cc @repeatedly


def self.from_time(time)
Fluent::NanoTime.new(time.to_i, time.nsec)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: How about adding those methods for convenience of plugin development?:

def to_time
  Time.at(@sec, Rational(@nsec, 1000))
end

def self.now
  from_time Time.now
end

def self.parse(*args)
  from_time Time.parse(*args)
end

Fluentd core core will also be simple with those.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to_time

I would recommend to use Time.at(time) because we can use same code for NanoTime and Integer.

self.now, self.parse(*args)

Ah, at first I implemented this methods, but I removed them later to keep simple API.
Now, I agree because I feel Fluent::NanoTime.from_time(...) is too verbose.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add NanoTime.strptime too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time.at(time)

I see. It makes sense.

Hm...but it looses millisec precisions. It may not work for plugins which want to use Time with millisec but also want to keep backward compatibility with all input plugins. How about adding NanoTime.to_time(value) instead (like eq?)?

NanoTime.strptime

I think it's optional. I don't have use cases in my mind.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm...but it looses millisec precisions.

Really? I read time.c, it seems not to loose precisions. It seems to simply call to_r.

results on my environment

irb(main):007:0> Time.at(Fluent::NanoTime.from_time(Time.now)).nsec
=> 283749448

NanoTime.strptime
I think it's optional. I don't have use cases in my mind.

Okey.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I didn't know that! I thought Time.now uses value.to_i. Nice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean Time.at? More preceisely, if the object have both of to_int and to_r, Time.at uses to_r. If the object have only to_int, Time.at uses to_int. Otherwise raises TypeError.

@mururu
Copy link
Member Author

mururu commented Aug 21, 2015

https://github.com/mururu/fluentd/tree/strptime-gem

I am trying to improve the performance of TimeParser by strptime gem on that branch.
It looks good performance.
When it become compatible with the current TimeParser, I will merge it into this branch.

@mururu
Copy link
Member Author

mururu commented Aug 26, 2015

NanoTime too represents its imprementaion, so I think it is good to change its name to EventTime. ( Thanks @repeatedly )
Any thoughts?
/cc @tagomoris @sonots @frsyuki

@sonots
Copy link
Member

sonots commented Aug 26, 2015

👍 > EventTime

@repeatedly repeatedly changed the title [WIP] Support nanosecond Support nanosecond Aug 27, 2015
@tagomoris
Copy link
Member

👍 for EventTime

@mururu
Copy link
Member Author

mururu commented Aug 28, 2015

Done > EventTime

And new TimeParser using strptime gem works fine with Ruby 2.0 or over.

@repeatedly
Copy link
Member

@mururu Could you remove 1.9.3 test from .travis.yml?

@kawanet
Copy link
Contributor

kawanet commented Sep 4, 2015

Thank you for the many works on nanosecond support. It looks great because EventTime instance could be handled seemlessly by most of existing plugins working in the fluentd process.

I'd like to ask about the msgpack representation of interprocess event stream on the other hand. It seems that it uses an ext format with type 0 and 8 bytes payload. It this correct?

      @msgpack_factory = MessagePack::Factory.new
      @msgpack_factory.register_type(Fluent::EventTime::TYPE, Fluent::EventTime)
module Fluent
  class EventTime
    TYPE = 0

    def to_msgpack_ext
      [@sec, @nsec].pack('NN')
    end

    def self.from_msgpack_ext(data)
      new(*data.unpack('NN'))
    end

There are a pair of topics as below about the msgpack representation using the ext format.

[1] It costs additional 5 bytes for every single event, even when nanosecond is not used.

[2] It's a breaking change of msgpack representation of interprocess event stream. This means every out_forward protocol listernes or msgpack stream inspectors have to support the ext format of type 0.

Counterproposals:

[A] Using plain array format rather than the ext format

["tag", [
  [[1441337812, 123], {"message": "event1"}],
  [[1441337813, 456], {"message": "event2"}],
  [[1441337814, 789], {"message": "event3"}],
  :
]]

Pros: It costs only 2 bytes when nanosecond is not used.
Pros: The stream could be easily inspected as a JSON representation as above.
Cons: It's still breaking the good old out_forward protocol.

[B] Using the third parameter for nanosecond value.

["tag", [
  [1441337812, {"message": "event1"}, 123],
  [1441337813, {"message": "event2"}, 456],
  [1441337814, {"message": "event3"}, 789],
  :
]]

Pros: It costs only 1 byte when nanosecond is not used.
Pros: Other out_forward protocol listeners are allowed just to ignore nanosecond value.
Cons: Ugly order. (Who cares?)

EDITED

[C] Using float64 per default. Upgradable to switch to use the ext format with an option at fluentd.conf

["tag", [
  [1441337812.000000123, {"message": "event1"}],
  [1441337813.000000456, {"message": "event2"}],
  [1441337814.000000789, {"message": "event3"}],
  :
]]

Pros: Most of other products would accept float64 value without any changes.
Pros: It could carry almost millisecond places. milliseconds is enough for popular use-cases.
Cons: It looses smaller places, micro and nano second) as float64 has a coefficient of 53 bits.

@mururu
Copy link
Member Author

mururu commented Sep 4, 2015

@kawanet Thank you for your feedback!

[A] [B]

If users want to force timestamp to be Integer, they can use time_as_integer option. In other words, they should not change their configuration.
Then the costs are the same.

If users want subsecond resolution, the cost of ext type format is less than or equal to [A] and [B] in most cases.

irb> Fluent::Engine.msgpack_factory.packer.write(Fluent::EventTime.new(2**32-1, 2**32-1)).to_s.size
=> 10
irb> Fluent::Engine.msgpack_factory.packer.write([2**32-1, 2**32-1]).to_s.size
=> 11
irb> Fluent::Engine.msgpack_factory.packer.write([Fluent::EventTime.new(2**32-1, 2**32-1), {}]).to_s.size
=> 12
irb> Fluent::Engine.msgpack_factory.packer.write([2**32-1, {}, 2**32-1]).to_s.size
=> 12

[C]
You mean that we should provide somthing like time_as_float option instead of time_as_integer option? It makes sense. But it seems to me that the situation of mixing Integer, Float and Fluent::EventTime is so complicated that it doesn't pay. I think that if users want subsecond resolution, they should just set time_as_integer as true explicitly.

@mururu
Copy link
Member Author

mururu commented Sep 4, 2015

Ah, "Other out_forward protocol listeners are allowed just to ignore nanosecond value." is attractive. But then it may break other plugins when in_forward receives PackedForward. I will investigate it.

@kawanet
Copy link
Contributor

kawanet commented Sep 4, 2015

[1] It costs additional 5 bytes for every single event, even when nanosecond is not used.
[2] It's a breaking change of msgpack representation of interprocess event stream.

For the backward compatibility, time_as_integer's default value is true which does NOT change out_forward msgpack representation. I understood that there are no problems on the fluentd and it plugins. Thank you for your explanation.

I don't still think using a fixed type of ext format is a good design as a standard interprocess communication protocol. Not only the fluentd application itself, numbers of applications are already talking the out_forward's protocol.

The standard protocol should use only standard formats, in my opinion. I imagine that the protocol specificication could be published as a RFC or something, in the future. If we don't do that, somebody may propose it as an IETF standard... Sorry just kidding ;)


[A] plain array format

Canceled. It may cost one byte longer than the ext format.

[B] nanosecond as the third parameter

  • Pros: It uses plain msgpack formats only.
  • Pros: It costs as same as the ext format
  • Pros: Other applications could ignore it.
  • Cons: It looks ugly. [1441337812, {"message": "event1"}, 123]

[C] float 64 format

  • Pros: It uses plain msgpack formats only.
  • Pros: It costs one byte less than the ext format
  • Pros: Other applications would accept it without changes.
  • Cons: It only carries almost miliseconds and loose rest.

[D] nanosecond Date ext format as the new msgpack stanrad.

The msgpack specification reserves 128 ext formats for the future use. Why don't you ask msgpack maintainers to add nanosecond Date ext format as the msgpack standard. For instance, ext type 0xDA looks good to represent a DAte value. I'm not sure they would accept it.

[E] msgpack representation option

In a future, msgpack may add the standard Date ext format rather than type 0. The following option allows users to switch the type to follow the standard.

  • time_msgpack integer (default when time_as_integer true)
  • time_msgpack float (this works for most cases)
  • time_msgpack 0 (ext type 0x00. default when time_as_integer false)
  • time_msgpack -38 (ext type 0xDA prospected in the future standard)

@tagomoris
Copy link
Member

From the side of msgpack core maintainer:
We are planning to add the first built-in ext type (type id -1) for Timestamp to the spec of MessagePack, which represents time with nanosecond precision. It'll be done as soon as possible for me to create pull-request for the spec.
Using custom ext type id (0) is not bad idea to implement custom timestamp class until many msgpack implementation will support it.

From sight of msgpack-ruby maintainer:
Built-in Timestamp ext type will be associated with Ruby's Time class. But on Fluentd with nanosecond support, objects with ext type -1 will be deserialized into Fluent::EventTime, and Fluent::EventTime will be serialized with -1 type.
(Of course, any other scripts can get Time object from byte stream from out_forward.)

@kawanet
Copy link
Contributor

kawanet commented Sep 7, 2015

@tagomoris I'm happy to hear that msgpack will have an official ext format for nanosecond precision time, yay!

I'm looking forward to seeing it soon. I don't have enough knowlege how it'll cost to transform time between Ruby's Time class and EventTime representation, however.

@repeatedly
Copy link
Member

@mururu Could you rebase the commits? I merged windows support PR and it caused a conflict.

@nurse
Copy link
Contributor

nurse commented Oct 9, 2015

@mururu ah, 0.1.2 has correctly binaries but there's no code to load ;-(
I released v0.1.3; could you re-update?

@mururu
Copy link
Member Author

mururu commented Oct 9, 2015

@nurse It works fine. Thanks!

@repeatedly
Copy link
Member

All tests are passed.

If there is no concerns, I will merge it tomorrow and release v0.14.0.dev1.

@nurse
Copy link
Contributor

nurse commented Oct 13, 2015

@mururu @repeatedly Could you use strptime v0.1.4?

@repeatedly
Copy link
Member

@nurse Do you mean we should update gem dependency to >= 0.1.4?

@nurse
Copy link
Contributor

nurse commented Oct 13, 2015

@repeatedly ah, current code, >= 0.1.3 is also ok.

@repeatedly
Copy link
Member

@frsyuki @tagomoris This PR uses msgpack v0.7.0.dev. Do you have a plan to release a stable version of v0.7.0?

@repeatedly
Copy link
Member

@frsyuki @tagomoris ping? This is a blocker for merging this PR.

@tagomoris
Copy link
Member

Wait some more... I'll proceed it as soon as possible from now on.
MessagePack-Ruby v0.7.0 will be released without Timestamp ext type support.

@repeatedly
Copy link
Member

@tagomoris @frsyuki So if release msgpack v0.7.0, we should update this patch with timestamp handling?

@repeatedly
Copy link
Member

Application defined EventTime is needed. So release v0.7.0 seems enough for merging the patch.

@repeatedly
Copy link
Member

@frsyuki @tagomoris When do you release stable version of msgpack-ruby v0.7.0?

@tagomoris
Copy link
Member

Right after msgpack/msgpack-ruby#91 is merged.

@tagomoris
Copy link
Member

msgpack v0.7.0 is now available.

@repeatedly
Copy link
Member

@mururu Could you update gem dependency?

@mururu
Copy link
Member Author

mururu commented Oct 24, 2015

@repeatedly Done

@repeatedly
Copy link
Member

@mururu Thanks!

@frsyuki @tagomoris @sonots Do you have any concerns? If not, I will merge this PR today.

@tagomoris
Copy link
Member

👍

repeatedly added a commit that referenced this pull request Oct 26, 2015
@repeatedly repeatedly merged commit 29b0d94 into fluent:master Oct 26, 2015
@repeatedly
Copy link
Member

Merged! Thanks to @mururu and @nurse for working this feature!

@mururu
Copy link
Member Author

mururu commented Oct 26, 2015

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants