Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V3/improves parser performance #412

Merged
merged 7 commits into from
Sep 12, 2022
Merged

Conversation

jcchavezs
Copy link
Member

@jcchavezs jcchavezs commented Sep 8, 2022

This PR attempts to improve the parser performance.

Before:

➜  coraza git:(v3/improves_parser_performance) go test -benchmem -run=\^$ -bench \^BenchmarkParseFromString$ github.com/corazawaf/coraza/v3/seclang -benchtime=10s
goos: darwin
goarch: arm64
pkg: github.com/corazawaf/coraza/v3/seclang
BenchmarkParseFromString-8   	  894100	     11872 ns/op	   12525 B/op	     240 allocs/op
PASS
ok  	github.com/corazawaf/coraza/v3/seclang	11.088s

After:

➜  coraza git:(v3/improves_parser_performance) go test -benchmem -run=\^$ -bench \^BenchmarkParseFromString$ github.com/corazawaf/coraza/v3/seclang -benchtime=10s
goos: darwin
goarch: arm64
pkg: github.com/corazawaf/coraza/v3/seclang
BenchmarkParseFromString-8   	 1215052	      9866 ns/op	   10072 B/op	     216 allocs/op
PASS
ok  	github.com/corazawaf/coraza/v3/seclang	22.258s

Thank you for contributing to Coraza WAF, your effort is greatly appreciated
Before submitting check if what you want to add to coraza list meets quality standards before sending pull request. Thanks!

Make sure that you've checked the boxes below before you submit PR:

Thanks for your contribution ❤️

@jcchavezs jcchavezs requested review from anuraaga and jptosso and removed request for anuraaga September 8, 2022 09:18
@jcchavezs jcchavezs mentioned this pull request Sep 8, 2022
@@ -342,17 +342,15 @@ func ParseRule(options RuleOptions) (*coraza.Rule, error) {
actions := ""

if options.WithOperator {
matches := ruleTokenRegex.FindAllString(options.Data, -1)
matches := ruleTokenRegex.FindAllString(options.Data, 3) // we use at most second match
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do confirm @jptosso @fzipi

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, 1 matches the whole thing, plus the first operator block, then actions block.

if !pattern.MatchString(line) && !inQuotes {
err := p.evaluate(linebuffer)
if line[lineLen-1] == '\\' {
linebuffer.WriteString(strings.TrimSuffix(line, "\\"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use slice instead of TrimSuffix since we already know the index

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was inclined to do that but I wondered if something like

SecRule REQUEST_COOKIES|!REQUEST_COOKIES:/__utm/|REQUEST_COOKIES_NAMES|ARGS_NAMES|ARGS|XML:/* "@rx [\r\n]\W*?(?:content-(?:type|length)|set-cookie|location):\s*\w" \
    "id:921120,\\\
    phase:2,\
    block,\
    capture,\

is possible where you have more than one \ at the end of the line cc @fzipi

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good example. I think if it's supported, having spaces inside is also supposed to be supported, e.g. \\ \ \

I don't think either the old or new code handle this.

Is it possible to define a custom split function for the bufio.Scanner that treats newlines and \ the same, removing the need to handle it here?

https://pkg.go.dev/bufio#example-Scanner-Custom

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am up to work this out if people can provide examples to build the test cases @fzipi @M4tteoP

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry my idea about customizing the scanner doesn't work for \ I think - we want it to combine on that, not separate 😣

continue
}

if !inQuotes && line[lineLen-1] == '`' {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still hoping for an example on this one :)

#398 (comment)

Copy link
Member

@jptosso jptosso Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was implemented for SecDataset:

  • !inQuote ensures we are not inside an action list
  • When we are reading the directive, and we declare a list using "`", the value of the whole line will be "`"
  • We can only close opened "`" with a single "`"

It's an ugly code, please if you have a better idea go ahead

Copy link
Contributor

@anuraaga anuraaga Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks for the info! IIUC, then we could replace to just be line == "`" - that would clear up the confusion I had nicely

Copy link
Member Author

@jcchavezs jcchavezs Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so I think then we should call it inBackticks instead of inQuotes and then, what we are trying to support is something like:

SecDataset test `
  123
  456
`

So the first condition matches the last backtick in SecDataset test `

And as for the last we can simply match it with line == "`" as rag suggested.

I wonder if there is a case (cc @fzipi @M4tteoP @piyushroshan) where a backtick is at the beginning or at the end and it is not inside a SectDataset or a similar construct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see where the len-1 comes from now. I think a \ or a # may be able to break that assumption though. Let's add test cases for these both on first quote and last.

But ok with filing an issue and handling in a separate PR since it's not related to performance which this PR is handling.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the backtick is a blackhole in that sense, whatever you add inside despite a keyword somewhere else (e.g. # for comments) lost its ability inside backticks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking about SecDataset, comments (#) are evaluated and stripped later on, demanding it to directiveSecDataset and not to the initial parser.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was thinking mostly about trailing comments, especially of the ending quote. But actually I guess might not handle them at all right now. Basically a line should be

trim(line)[0:LastIndexByte('#')]

type of thing

@codecov-commenter
Copy link

codecov-commenter commented Sep 12, 2022

Codecov Report

Base: 77.00% // Head: 76.95% // Decreases project coverage by -0.04% ⚠️

Coverage data is based on head (42243da) compared to base (53ed3d0).
Patch coverage: 95.65% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           v3/dev     #412      +/-   ##
==========================================
- Coverage   77.00%   76.95%   -0.05%     
==========================================
  Files         136      136              
  Lines        5975     5976       +1     
==========================================
- Hits         4601     4599       -2     
- Misses       1106     1108       +2     
- Partials      268      269       +1     
Flag Coverage Δ
no-tinygo 77.08% <95.65%> (-0.05%) ⬇️
overall 76.95% <95.65%> (-0.05%) ⬇️
tinygo 75.37% <95.65%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
seclang/parser.go 89.91% <95.00%> (-2.40%) ⬇️
seclang/rule_parser.go 82.35% <100.00%> (-0.05%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@jptosso jptosso merged commit 91a2918 into v3/dev Sep 12, 2022
@jptosso jptosso deleted the v3/improves_parser_performance branch September 12, 2022 19:09
@M4tteoP M4tteoP mentioned this pull request Sep 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants