Feat: added parser_options for more control over XML parsing #68
Changes from 2 commits
1aedffb
12151c6
0af5f5a
998048a
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -34,6 +34,7 @@ This plugin supports the following configuration options plus the <<plugins-{typ | |||||
| <<plugins-{type}s-{plugin}-force_array>> |<<boolean,boolean>>|No | ||||||
| <<plugins-{type}s-{plugin}-force_content>> |<<boolean,boolean>>|No | ||||||
| <<plugins-{type}s-{plugin}-namespaces>> |<<hash,hash>>|No | ||||||
| <<plugins-{type}s-{plugin}-parser_options>> |<<string,string>>|No | ||||||
| <<plugins-{type}s-{plugin}-remove_namespaces>> |<<boolean,boolean>>|No | ||||||
| <<plugins-{type}s-{plugin}-source>> |<<string,string>>|Yes | ||||||
| <<plugins-{type}s-{plugin}-store_xml>> |<<boolean,boolean>>|No | ||||||
|
@@ -87,6 +88,19 @@ filter { | |||||
} | ||||||
} | ||||||
|
||||||
[id="plugins-{type}s-{plugin}-parser_options"] | ||||||
===== `parser_options` | ||||||
|
||||||
* Value type is <<string,string>> | ||||||
|
||||||
Setting XML parser options allows for more control of the parsing process. | ||||||
By default the parser is non-strict and thus accepts some invalid content. | ||||||
Multiple options are separated by a comma (e.g. `'strict,no_warning'`). | ||||||
Currently supported options are: | ||||||
|
||||||
- _strict_ - forces the parser to fail early when content is not valid xml | ||||||
@karenzone could you please check these for me.
Suggested change
Maybe include the alternative. Something like "forces the parser to fail early instead of accumulating errors when content is not valid xml."
||||||
- _no_warning_ - allows content to be parsed when there are only warnings | ||||||
- _no_error_ - allows content to be parsed despite non-fatal parser errors | ||||||
|
||||||
[id="plugins-{type}s-{plugin}-remove_namespaces"] | ||||||
===== `remove_namespaces` | ||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -58,6 +58,13 @@ class LogStash::Filters::Xml < LogStash::Filters::Base | |
# | ||
config :xpath, :validate => :hash, :default => {} | ||
|
||
# Supported XML parsing options are 'strict', 'no_error' and 'no_warning'. | ||
# - strict mode turns on strict parsing rules (non-compliant xml will fail) | ||
# - no_error and no_warning can be used to suppress errors/warnings | ||
config :parse_options, :validate => :string | ||
# NOTE: technically we support more but we purposefully do not document those. | ||
# e.g. setting "strict|recover" will not turn on strict as they're conflicting | ||
|
||
# By default the filter will store the whole parsed XML in the destination | ||
# field as described above. Setting this to false will prevent that. | ||
config :store_xml, :validate => :boolean, :default => true | ||
|
@@ -110,6 +117,7 @@ def register | |
:error => "When the 'store_xml' configuration option is true, 'target' must also be set" | ||
) | ||
end | ||
xml_parse_options # validates parse_options => ... | ||
end | ||
|
||
def filter(event) | ||
|
@@ -141,11 +149,13 @@ def filter(event) | |
|
||
if @xpath | ||
begin | ||
doc = Nokogiri::XML(value, nil, value.encoding.to_s) | ||
doc = Nokogiri::XML::Document.parse(value, nil, value.encoding.to_s, xml_parse_options) | ||
rescue => e | ||
event.tag(XMLPARSEFAILURE_TAG) | ||
@logger.warn("Error parsing xml", :source => @source, :value => value, :exception => e, :backtrace => e.backtrace) | ||
return | ||
else | ||
doc.errors.any? && @logger.debug? && @logger.debug("Parsed xml with #{doc.errors.size} errors") | ||
end | ||
doc.remove_namespaces! if @remove_namespaces | ||
|
||
|
@@ -194,4 +204,26 @@ def filter(event) | |
filter_matched(event) if matched | ||
@logger.debug? && @logger.debug("Event after xml filter", :event => event) | ||
end | ||
|
||
private | ||
|
||
def xml_parse_options | ||
return Nokogiri::XML::ParseOptions::DEFAULT_XML unless @parse_options # (RECOVER | NONET) | ||
@xml_parse_options ||= begin | ||
Only a question @kares, today in the docs we say that supported options are:
here's the reasoning: initially I thought about only having so there's more parser options we can set, I purposefully only documented these few. being able to (unofficially) set all of the internal ones is the reason why I decided not to validate against a fixed set of options (that would be enumerated here). going to remove no_error and no_warning from the docs and only document strict for now. does that sound okay or do you have concerns?
It's ok, go with the
||
parse_options = @parse_options.split(/,|\|/).map do |opt| | ||
name = opt.strip.tr('_', '').upcase | ||
if name.empty? | ||
nil | ||
else | ||
begin | ||
Nokogiri::XML::ParseOptions.const_get(name) | ||
rescue NameError | ||
raise LogStash::ConfigurationError, "unsupported parse option: #{opt.inspect}" | ||
end | ||
end | ||
end | ||
parse_options.compact.inject(0, :|) # e.g. NOERROR | NOWARNING | ||
end | ||
end | ||
|
||
end |
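The option handling in `xml_parse_options` above can be sketched in isolation. This is a minimal sketch, not the plugin's code: `FakeParseOptions` and `combine_parse_options` are hypothetical names, the module is a stand-in so the snippet runs without the Nokogiri gem, and its flag values are assumed to mirror `Nokogiri::XML::ParseOptions` (in particular, `STRICT` being `0`). Unlike the diff, it uses `delete('_')` and passes `false` to `const_get` so that only constants defined on the stand-in module itself are accepted.

```ruby
# Stand-in for Nokogiri::XML::ParseOptions (hypothetical; values are assumed
# to match Nokogiri's: STRICT is 0, every other flag is a single bit).
module FakeParseOptions
  STRICT    = 0
  RECOVER   = 1 << 0
  NOERROR   = 1 << 5
  NOWARNING = 1 << 6
  NONET     = 1 << 11
end

# Hypothetical helper mirroring the logic of xml_parse_options in the diff:
# split on ',' or '|', normalize each name, look up the flag, OR them together.
def combine_parse_options(setting, flags = FakeParseOptions)
  pairs = setting.split(/,|\|/).map { |opt| [opt, opt.strip.delete('_').upcase] }
  pairs.reject { |_, name| name.empty? }.inject(0) do |mask, (opt, name)|
    begin
      # false: do not fall back to top-level constants such as ::String
      mask | flags.const_get(name, false)
    rescue NameError
      raise ArgumentError, "unsupported parse option: #{opt.inspect}"
    end
  end
end

combined = combine_parse_options('no_error,NOWARNING')
puts combined == (FakeParseOptions::NOERROR | FakeParseOptions::NOWARNING)

# STRICT contributes no bits, so OR-ing it with RECOVER leaves only RECOVER set.
puts combine_parse_options('strict|recover') == FakeParseOptions::RECOVER
```

Under these assumed values, the second check illustrates the NOTE in the diff: `"strict|recover"` cannot turn strict parsing on, because a zero-valued flag disappears once any other bit is OR-ed in.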
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -418,4 +418,53 @@ | |
end | ||
end | ||
end | ||
|
||
describe "parsing invalid xml" do | ||
subject { described_class.new(options) } | ||
let(:options) { ({ 'source' => 'xmldata', 'store_xml' => false }) } | ||
let(:xmldata) { "<xml> <sample attr='foo' attr=\"bar\"> <invalid> </sample> </xml>" } | ||
let(:event) { LogStash::Event.new(data) } | ||
let(:data) { { "xmldata" => xmldata } } | ||
|
||
before { subject.register } | ||
after { subject.close } | ||
|
||
it 'does not fail (by default)' do | ||
subject.filter(event) | ||
expect( event.get("tags") ).to be nil | ||
end | ||
|
||
context 'strict option' do | ||
let(:options) { super.merge({ 'parse_options' => 'strict' }) } | ||
|
||
it 'does fail parsing' do | ||
subject.filter(event) | ||
expect( event.get("tags") ).to_not be nil | ||
expect( event.get("tags") ).to include '_xmlparsefailure' | ||
Do we document this
according to the docs the tag isn't documented.
||
end | ||
end | ||
end | ||
|
||
describe "parse_options" do | ||
subject { described_class.new(options) } | ||
let(:options) { ({ 'source' => 'xmldata', 'store_xml' => false, 'parse_options' => parse_options }) } | ||
|
||
context 'valid' do | ||
let(:parse_options) { 'no_error,NOWARNING' } | ||
|
||
it 'registers filter' do | ||
subject.register | ||
expect( subject.send(:xml_parse_options) ). | ||
to eql Nokogiri::XML::ParseOptions::NOERROR | Nokogiri::XML::ParseOptions::NOWARNING | ||
end | ||
end | ||
|
||
context 'invalid' do | ||
let(:parse_options) { 'strict,invalid0' } | ||
|
||
it 'fails to register' do | ||
expect { subject.register }.to raise_error(LogStash::ConfigurationError, 'unsupported parse option: "invalid0"') | ||
end | ||
end | ||
end | ||
end |
Consider adding default value here as a bullet, since it sounds like `strict` is not enabled by default. I'm not sure how to notate that. Would something like `* Default value is []` work?

thanks, did the same as with rest ... to mention there's no default

We documented `parser_options`, but the actual parameter is `parse_options` in the code. I can submit a PR to fix the doc but is it ok to keep `parse_options`?