Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add null_value_pattern and null_empty_string option for LabeledTSVParser, CSVParser, TSVParser. #657

Merged
merged 7 commits into from
Aug 27, 2015
19 changes: 18 additions & 1 deletion lib/fluent/parser.rb
Original file line number Diff line number Diff line change
Expand Up @@ -295,6 +295,8 @@ class ValuesParser < Parser
end
config_param :time_key, :string, :default => nil
config_param :time_format, :string, :default => nil
config_param :null_value_pattern, :string, :default => nil
config_param :null_empty_string, :bool, :default => false

def configure(conf)
super
Expand All @@ -308,11 +310,16 @@ def configure(conf)
end

@time_parser = TimeParser.new(@time_format)

if @null_value_pattern
@null_value_pattern = /#{@null_value_pattern}/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regexp.new is better because users do not need to care of escaping / with it.

end

@mutex = Mutex.new
end

def values_map(values)
record = Hash[keys.zip(values)]
record = Hash[keys.zip(values.map { |value| convert_value_to_nil(value) })]

if @time_key
value = @keep_time_key ? record[@time_key] : record.delete(@time_key)
Expand Down Expand Up @@ -345,6 +352,16 @@ def convert_field_type!(record)
end
}
end

def convert_value_to_nil(value)
if @null_empty_string
value = (value == '') ? nil : value
end
if @null_value_pattern
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if value and @null_value_pattern is more clear to tell returning nil when nil =~ @null_value_pattern

value = (value =~ @null_value_pattern) ? nil : value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually does String#scrub in my plugin as https://github.com/fluent/fluentd/blob/master/lib/fluent/plugin/filter_grep.rb#L73-L83.
I am not sure for the case of fluentd itself. What do you think? > @repeatedly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry.

I think this private method add ValuesParser, Is it OK?

      def match(regexp, string)
        begin
          return regexp.match(string)
        rescue ArgumentError => e
          raise e unless e.message.index("invalid byte sequence in".freeze).zero?
          log.info "invalid byte sequence is replaced in `#{string}`"
          string = string.scrub('?')
          retry
        end
        return true
      end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating Fluent::StringUtil should be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK!
thanks.

end
value
end
end

class TSVParser < ValuesParser
Expand Down
90 changes: 90 additions & 0 deletions test/test_parser.rb
Original file line number Diff line number Diff line change
Expand Up @@ -529,6 +529,38 @@ def test_parse_with_keep_time_key
assert_equal text, record['time']
end
end

data('array param' => '["a","b","c","d","e","f"]', 'string param' => 'a,b,c,d,e,f')
def test_parse_with_null_value_pattern
parser = TextParser::TSVParser.new
parser.configure(
'keys'=>param,
'time_key'=>'time',
'null_value_pattern'=>'^(-|null|NULL)$'
)
parser.parse("-\tnull\tNULL\t\t--\tnuLL") do |time, record|
assert_nil record['a']
assert_nil record['b']
assert_nil record['c']
assert_equal record['d'], ''
assert_equal record['e'], '--'
assert_equal record['f'], 'nuLL'
end
end

data('array param' => '["a","b"]', 'string param' => 'a,b')
def test_parse_with_null_empty_string
parser = TextParser::TSVParser.new
parser.configure(
'keys'=>param,
'time_key'=>'time',
'null_empty_string'=>true
)
parser.parse("\t ") do |time, record|
assert_nil record['a']
assert_equal record['b'], ' '
end
end
end

class CSVParserTest < ::Test::Unit::TestCase
Expand Down Expand Up @@ -586,6 +618,38 @@ def test_parse_with_keep_time_key
assert_equal text, record['time']
end
end

data('array param' => '["a","b","c","d","e","f"]', 'string param' => 'a,b,c,d,e,f')
def test_parse_with_null_value_pattern
parser = TextParser::CSVParser.new
parser.configure(
'keys'=>param,
'time_key'=>'time',
'null_value_pattern'=>'^(-|null|NULL)$'
)
parser.parse("-,null,NULL,,--,nuLL") do |time, record|
assert_nil record['a']
assert_nil record['b']
assert_nil record['c']
assert_equal record['d'], ''
assert_equal record['e'], '--'
assert_equal record['f'], 'nuLL'
end
end

data('array param' => '["a","b"]', 'string param' => 'a,b')
def test_parse_with_null_empty_string
parser = TextParser::CSVParser.new
parser.configure(
'keys'=>param,
'time_key'=>'time',
'null_empty_string'=>true
)
parser.parse(", ") do |time, record|
assert_nil record['a']
assert_equal record['b'], ' '
end
end
end

class LabeledTSVParserTest < ::Test::Unit::TestCase
Expand Down Expand Up @@ -684,6 +748,32 @@ def test_parse_with_keep_time_key
assert_equal text, record['time']
end
end

def test_parse_with_null_value_pattern
parser = TextParser::LabeledTSVParser.new
parser.configure(
'null_value_pattern'=>'^(-|null|NULL)$'
)
parser.parse("a:-\tb:null\tc:NULL\td:\te:--\tf:nuLL") do |time, record|
assert_nil record['a']
assert_nil record['b']
assert_nil record['c']
assert_equal record['d'], ''
assert_equal record['e'], '--'
assert_equal record['f'], 'nuLL'
end
end

def test_parse_with_null_empty_string
parser = TextParser::LabeledTSVParser.new
parser.configure(
'null_empty_string'=>true
)
parser.parse("a:\tb: ") do |time, record|
assert_nil record['a']
assert_equal record['b'], ' '
end
end
end

class NoneParserTest < ::Test::Unit::TestCase
Expand Down