Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add null_value_pattern and null_empty_string option for LabeledTSVParser, CSVParser, TSVParser. #657

Merged
merged 7 commits into from
Aug 27, 2015

Conversation

toyama0919
Copy link
Contributor

Hello.

Cannot express NULL in LTSV.

ltsv library for Ruby blank to NULL.
https://github.com/condor/ltsv/blob/master/lib/ltsv.rb#L97

I use LTSV with apache.
Apache log is If value not exists with hyphen('-').

I think hyphens to null, any of the following methods.

  1. LabeledTSVParser add parameter with @null_value
    ※this PR

settings example.

<source>
  type tail
  path /var/log/httpd/access_log
  format ltsv
  time_key time
  null_value -
  time_format %d/%b/%Y:%H:%M:%S %z
  tag td.apache.access
  pos_file /var/log/td-agent/apache_access.pos
</source>
  1. specifications similar to the Ruby library(blank to null)
  2. Make your own filter plugin.

Which one is good?

@frsyuki
Copy link
Member

frsyuki commented Aug 17, 2015

I think 1. is good and it is good enough for now.

I may want to use using 1. and 2. together occasionally. So, alternative idea is to add null_value_pattern: pattern option instead of null_value to accept a regexp, or add null_empty_string: true option in addition to null_value.

@frsyuki
Copy link
Member

frsyuki commented Aug 17, 2015

I think this PR is simple enough and good to merge. Any concerns, @repeatedly?

@toyama0919
Copy link
Contributor Author

@frsyuki
Thank you for your reply!

I may want to use using 1. and 2. together occasionally. So, alternative idea is to add null_value_pattern: pattern option instead of null_value to accept a regexp, or add null_empty_string: true option in addition to null_value.

I see.

Please wait a moment...

@toyama0919 toyama0919 changed the title Add null_value option for LabeledTSVParser. Add null_value_pattern and null_empty_string option for LabeledTSVParser. Aug 17, 2015
@toyama0919
Copy link
Contributor Author

Thank you for advice!

I add null_value_pattern and null_empty_string option for LabeledTSVParser.

settings example.

<source>
  type tail
  path /var/log/httpd/access_log
  format ltsv
  time_key time
  null_value_pattern ^(-|null|NULL)$
  null_empty_string true
  time_format %d/%b/%Y:%H:%M:%S %z
  tag td.apache.access
  pos_file /var/log/td-agent/apache_access.pos
</source>

null_value_pattern

  • default is nil.
  • regular expression syntax.

null_empty_string

  • default is false.
  • empty string to null.
  • this is same behavior.

value = (value == '') ? nil : value
end
if null_value_pattern
value = (value =~ /#{null_value_pattern}/) ? nil : value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating regexp object at each record is high cost.
It should be done in configure method.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you!
I'll fix it.

@sonots
Copy link
Member

sonots commented Aug 18, 2015

Why don't we add these options for super class (ValuesParser)?

@ghost
Copy link

ghost commented Aug 18, 2015

@sonots
thank you!
Oh, ValuesParser definition, affects CSVParser and TSVParser.
Null in regular expression definition, I think Good idea!
But empty to NULL casting is LTSV specification.

@repeatedly
I think no problem, Is it OK?

@ghost
Copy link

ghost commented Aug 18, 2015

Certainly, in the format like below with MySQL tsv restore

"1"¥t"2"¥t"3"¥t"4"¥tNULL

@sonots
Copy link
Member

sonots commented Aug 18, 2015

empty to NULL casting is LTSV specification.

Is it official LTSV specification? I thought it is just a custom.

@toyama0919
Copy link
Contributor Author

Is it official LTSV specification? I thought it is just a custom.

That's right!
No such process in other languages.

java

https://github.com/making/ltsv4j/blob/master/src/main/java/am/ik/ltsv4j/LTSVParser.java

python

https://github.com/hekyou/python-ltsv/blob/master/ltsv/_reader.py

@toyama0919
Copy link
Contributor Author

Why don't we add these options for super class (ValuesParser)?

I'll fix in this way.
Please wait a moment.

@sonots @repeatedly @frsyuki

move this option LabeledTSVParser to ValuesParser.
@toyama0919 toyama0919 changed the title Add null_value_pattern and null_empty_string option for LabeledTSVParser. Add null_value_pattern and null_empty_string option for ValuesParser. Aug 18, 2015
@toyama0919 toyama0919 changed the title Add null_value_pattern and null_empty_string option for ValuesParser. Add null_value_pattern and null_empty_string option for LabeledTSVParser, CSVParser, TSVParser. Aug 18, 2015
@ghost
Copy link

ghost commented Aug 19, 2015

fixed.
add null_value_pattern and null_empty_string option, support format is tsv,csv,ltsv.

@@ -308,11 +310,16 @@ def configure(conf)
end

@time_parser = TimeParser.new(@time_format)

if @null_value_pattern
@null_value_pattern = /#{@null_value_pattern}/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regexp.new is better because users do not need to care of escaping / with it.

@sonots
Copy link
Member

sonots commented Aug 19, 2015

reviewed

@ghost
Copy link

ghost commented Aug 19, 2015

thank you for review.
I 'll fix it.

@toyama0919
Copy link
Contributor Author

Apply this review.
scrub Referenced filter-grep-plugin.

thanks.

@toyama0919
Copy link
Contributor Author

added Fluent::StringUtil, filter-grep-plugin and this feature common procesing.
thanks.

@sonots
Copy link
Member

sonots commented Aug 19, 2015

Looks good to me. Any concerns? > @repeatedly

@repeatedly
Copy link
Member

Nothing.

sonots added a commit that referenced this pull request Aug 27, 2015
Add null_value_pattern and null_empty_string option for LabeledTSVParser, CSVParser, TSVParser.
@sonots sonots merged commit c529038 into fluent:master Aug 27, 2015
@sonots
Copy link
Member

sonots commented Aug 27, 2015

Merged!

@ghost
Copy link

ghost commented Aug 27, 2015

@repeatedly @sonots @frsyuki thank you for many advise! 😊

@repeatedly
Copy link
Member

Released v0.12.16!

@ghost
Copy link

ghost commented Sep 30, 2015

😊

@tagomoris
Copy link
Member

I found that log is not defined in Fluent::StringUtil.match_regexp. So this rescue block will raise NameError every time.

@toyama0919
Copy link
Contributor Author

@tagomoris Sorry and Thanks. I will check it.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 0ec1c53 on toyama0919:master into ** on fluent:master**.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants