V3/improves parser performance #412
@@ -342,17 +342,15 @@ func ParseRule(options RuleOptions) (*coraza.Rule, error) {
 	actions := ""

 	if options.WithOperator {
-		matches := ruleTokenRegex.FindAllString(options.Data, -1)
+		matches := ruleTokenRegex.FindAllString(options.Data, 3) // we use at most second match
Yes, 1 matches the whole thing, plus the first operator block, then actions block.
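For readers unfamiliar with the limit argument, here is a minimal sketch of how the second parameter of FindAllString caps the number of matches returned. The pattern tokenRegex and the helper firstTokens are hypothetical stand-ins, not the actual ruleTokenRegex from the parser.

```go
package main

import (
	"fmt"
	"regexp"
)

// tokenRegex is a hypothetical stand-in for ruleTokenRegex: it matches
// either a double-quoted block (with backslash escapes) or a run of
// non-space characters.
var tokenRegex = regexp.MustCompile(`"(?:[^"\\]|\\.)*"|\S+`)

// firstTokens returns at most n leading tokens of data.
// A negative n means "no limit", as in regexp.FindAllString.
func firstTokens(data string, n int) []string {
	return tokenRegex.FindAllString(data, n)
}

func main() {
	data := `ARGS "@rx attack" "id:1,phase:2,deny" trailing tokens`

	// With -1 the whole input is scanned and every token is collected.
	fmt.Println(len(firstTokens(data, -1)))

	// With 3 the scan stops as soon as three matches have been found.
	fmt.Println(len(firstTokens(data, 3)))
}
```

If only the first few matches are ever read, passing a small limit avoids scanning the rest of the input and allocating slots for matches that are never used.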
if !pattern.MatchString(line) && !inQuotes {
err := p.evaluate(linebuffer)
if line[lineLen-1] == '\\' {
linebuffer.WriteString(strings.TrimSuffix(line, "\\"))
Use a slice instead of TrimSuffix, since we already know the index.
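A minimal sketch of the slicing suggestion, assuming the caller has already checked (or checks here) that the last byte is a backslash. The helper name stripContinuation is hypothetical. Note that both slicing and strings.TrimSuffix remove exactly one trailing backslash, so they behave the same on lines that end in several.

```go
package main

import (
	"fmt"
	"strings"
)

// stripContinuation (hypothetical helper) removes exactly one trailing
// backslash, if present, and reports whether the line continues on the
// next one.
func stripContinuation(line string) (string, bool) {
	n := len(line)
	if n == 0 || line[n-1] != '\\' {
		return line, false
	}
	// The index of the backslash is known, so a slice is enough.
	return line[:n-1], true
}

func main() {
	for _, line := range []string{`phase:2,\`, `"id:921120,\\\`, `block`} {
		sliced, cont := stripContinuation(line)
		// The last column compares the result with strings.TrimSuffix.
		fmt.Printf("%q %v %v\n", sliced, cont, sliced == strings.TrimSuffix(line, `\`))
	}
}
```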
I was inclined to do that but I wondered if something like
SecRule REQUEST_COOKIES|!REQUEST_COOKIES:/__utm/|REQUEST_COOKIES_NAMES|ARGS_NAMES|ARGS|XML:/* "@rx [\r\n]\W*?(?:content-(?:type|length)|set-cookie|location):\s*\w" \
"id:921120,\\\
phase:2,\
block,\
capture,\
is possible, where you have more than one \ at the end of the line. cc @fzipi
Ah good example. I think if it's supported, having spaces inside is also supposed to be supported, e.g. \\ \ \
I don't think either the old or the new code handles this.
Is it possible to define a custom split function for the bufio.Scanner that treats newlines and \ the same, removing the need to handle it here?
Sorry, my idea about customizing the scanner doesn't work for \, I think: we want it to combine on that, not separate 😣
continue
}

if !inQuotes && line[lineLen-1] == '`' {
Still hoping for an example on this one :)
It was implemented for SecDataset:
- !inQuotes ensures we are not inside an action list
- When we are reading the directive and we declare a list using "`", the value of the whole closing line will be "`"
- We can only close an opened "`" with a single "`"
It's ugly code; if you have a better idea, please go ahead.
Ah, thanks for the info! IIUC, we could then replace it with just line == "`", which would clear up the confusion I had nicely.
OK, so I think we should then call it inBackticks instead of inQuotes.
And then, what we are trying to support is something like:
SecDataset test `
123
456
`
So the first condition matches the last backtick in SecDataset test `
And as for the last, we can simply match it with line == "`", as rag suggested.
I wonder if there is a case (cc @fzipi @M4tteoP @piyushroshan) where a backtick is at the beginning or at the end and it is not inside a SecDataset or a similar construct.
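To make the proposal concrete, here is a rough sketch of the inBackticks state machine described above. It is not the actual parser code: joinBacktickBlocks is a hypothetical function that assumes a block opens on any line ending in a backtick and closes only on a line that is exactly "`".

```go
package main

import (
	"fmt"
	"strings"
)

// joinBacktickBlocks is a hypothetical sketch, not Coraza's parser. It folds
// a multi-line backtick block into the directive line that opened it.
// An unterminated block is silently dropped in this sketch.
func joinBacktickBlocks(lines []string) []string {
	var out []string
	var buf strings.Builder
	inBackticks := false
	for _, line := range lines {
		switch {
		case inBackticks && line == "`":
			// Only a line consisting of a single backtick closes the block.
			buf.WriteString(line)
			out = append(out, buf.String())
			buf.Reset()
			inBackticks = false
		case inBackticks:
			// Everything inside the block is kept verbatim, including '#'.
			buf.WriteString(line)
			buf.WriteByte('\n')
		case strings.HasSuffix(line, "`"):
			// A directive line ending in a backtick opens a block.
			buf.WriteString(line)
			buf.WriteByte('\n')
			inBackticks = true
		default:
			out = append(out, line)
		}
	}
	return out
}

func main() {
	lines := []string{"SecDataset test `", "123", "456", "`", "SecRuleEngine On"}
	for _, directive := range joinBacktickBlocks(lines) {
		fmt.Printf("%q\n", directive)
	}
}
```

Using strings.HasSuffix rather than indexing line[len(line)-1] also sidesteps the empty-line case, where the index would be out of range.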
I see where the len-1 comes from now. I think a \ or a # may be able to break that assumption, though. Let's add test cases for both of these, on the first quote and on the last.
But ok with filing an issue and handling in a separate PR since it's not related to performance which this PR is handling.
I think the backtick is a black hole in that sense: whatever you add inside, even something that is a keyword elsewhere (e.g. # for comments), loses its special meaning inside backticks.
Speaking of SecDataset, comments (#) are evaluated and stripped later on; that is delegated to directiveSecDataset and not to the initial parser.
Yeah, I was thinking mostly about trailing comments, especially on the ending quote. But actually I guess we might not handle them at all right now. Basically a line should be a trim(line)[0:LastIndexByte('#')] type of thing.
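A literal sketch of that idea, with the obvious guard for lines that contain no # at all. The helper stripTrailingComment is hypothetical, and the approach is deliberately naive: it would also cut a # that appears inside a quoted value or a backtick block, so it is a starting point for test cases rather than a proposed fix.

```go
package main

import (
	"fmt"
	"strings"
)

// stripTrailingComment (hypothetical helper) trims the line and cuts it at
// the last '#'. It does not understand quoting, so a '#' inside a quoted
// operator or action list would be cut as well.
func stripTrailingComment(line string) string {
	line = strings.TrimSpace(line)
	// LastIndexByte returns -1 when there is no '#'; slicing with -1
	// would panic, so the cut is applied only when a '#' was found.
	if i := strings.LastIndexByte(line, '#'); i >= 0 {
		line = line[:i]
	}
	return strings.TrimSpace(line)
}

func main() {
	fmt.Printf("%q\n", stripTrailingComment("` # end of dataset"))
	fmt.Printf("%q\n", stripTrailingComment("  SecRuleEngine On  "))
}
```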
Codecov Report
Base: 77.00% // Head: 76.95% // Decreases project coverage by
Additional details and impacted files
@@            Coverage Diff             @@
##           v3/dev     #412      +/-   ##
==========================================
- Coverage   77.00%   76.95%   -0.05%
==========================================
  Files         136      136
  Lines        5975     5976       +1
==========================================
- Hits         4601     4599       -2
- Misses       1106     1108       +2
- Partials      268      269       +1
This PR attempts to improve the parser performance.
Before:
After: