V3/improves parser performance #412

jcchavezs · 2022-09-08T08:10:17Z

This PR attempts to improve the parser performance.

Before:

➜  coraza git:(v3/improves_parser_performance) go test -benchmem -run=\^$ -bench \^BenchmarkParseFromString$ github.com/corazawaf/coraza/v3/seclang -benchtime=10s
goos: darwin
goarch: arm64
pkg: github.com/corazawaf/coraza/v3/seclang
BenchmarkParseFromString-8   	  894100	     11872 ns/op	   12525 B/op	     240 allocs/op
PASS
ok  	github.com/corazawaf/coraza/v3/seclang	11.088s

After:

➜  coraza git:(v3/improves_parser_performance) go test -benchmem -run=\^$ -bench \^BenchmarkParseFromString$ github.com/corazawaf/coraza/v3/seclang -benchtime=10s
goos: darwin
goarch: arm64
pkg: github.com/corazawaf/coraza/v3/seclang
BenchmarkParseFromString-8   	 1215052	      9866 ns/op	   10072 B/op	     216 allocs/op
PASS
ok  	github.com/corazawaf/coraza/v3/seclang	22.258s

Thank you for contributing to Coraza WAF, your effort is greatly appreciated
Before submitting check if what you want to add to coraza list meets quality standards before sending pull request. Thanks!

Make sure that you've checked the boxes below before you submit PR:

My code includes positive and negative tests.
I have an appropriate description with correct grammar.
I have read Contribution guidelines, maintainers note and Quality standard.
My code is properly linted and passes pre-commit tests.

Thanks for your contribution ❤️

jcchavezs · 2022-09-08T12:56:58Z

seclang/rule_parser.go

@@ -342,17 +342,15 @@ func ParseRule(options RuleOptions) (*coraza.Rule, error) {
 	actions := ""

 	if options.WithOperator {
-		matches := ruleTokenRegex.FindAllString(options.Data, -1)
+		matches := ruleTokenRegex.FindAllString(options.Data, 3) // we use at most second match


Please do confirm @jptosso @fzipi

Yes, 1 matches the whole thing, plus the first operator block, then actions block.

anuraaga · 2022-09-09T01:00:12Z

seclang/parser.go

-		if !pattern.MatchString(line) && !inQuotes {
-			err := p.evaluate(linebuffer)
+		if line[lineLen-1] == '\\' {
+			linebuffer.WriteString(strings.TrimSuffix(line, "\\"))


Use slice instead of TrimSuffix since we already know the index

I was inclined to do that but I wondered if something like

SecRule REQUEST_COOKIES|!REQUEST_COOKIES:/__utm/|REQUEST_COOKIES_NAMES|ARGS_NAMES|ARGS|XML:/* "@rx [\r\n]\W*?(?:content-(?:type|length)|set-cookie|location):\s*\w" \ "id:921120,\\\ phase:2,\ block,\ capture,\

is possible where you have more than one \ at the end of the line cc @fzipi

Ah good example. I think if it's supported, having spaces inside is also supposed to be supported, e.g. \\ \ \

I don't think either the old or new code handle this.

Is it possible to define a custom split function for the bufio.Scanner that treats newlines and \ the same, removing the need to handle it here?

https://pkg.go.dev/bufio#example-Scanner-Custom

I am up to work this out if people can provide examples to build the test cases @fzipi @M4tteoP

Sorry my idea about customizing the scanner doesn't work for \ I think - we want it to combine on that, not separate 😣

anuraaga · 2022-09-09T01:00:49Z

seclang/parser.go

+			continue
+		}
+
+		if !inQuotes && line[lineLen-1] == '`' {


Still hoping for an example on this one :)

#398 (comment)

It was implemented for SecDataset:

!inQuote ensures we are not inside an action list

When we are reading the directive, and we declare a list using "`", the value of the whole line will be "`"

We can only close opened "`" with a single "`"

It's an ugly code, please if you have a better idea go ahead

Ah thanks for the info! IIUC, then we could replace to just be line == "`" - that would clear up the confusion I had nicely

Ok so I think then we should call it inBackticks instead of inQuotes and then, what we are trying to support is something like:

SecDataset test ` 123 456 `

So the first condition matches the last backtick in SecDataset test `

And as for the last we can simply match it with line == "`" as rag suggested.

I wonder if there is a case (cc @fzipi @M4tteoP @piyushroshan) where a backtick is at the beginning or at the end and it is not inside a SectDataset or a similar construct.

I see where the len-1 comes from now. I think a \ or a # may be able to break that assumption though. Let's add test cases for these both on first quote and last.

But ok with filing an issue and handling in a separate PR since it's not related to performance which this PR is handling.

I think the backtick is a blackhole in that sense, whatever you add inside despite a keyword somewhere else (e.g. # for comments) lost its ability inside backticks.

Speaking about SecDataset, comments (#) are evaluated and stripped later on, demanding it to directiveSecDataset and not to the initial parser.

Yeah I was thinking mostly about trailing comments, especially of the ending quote. But actually I guess might not handle them at all right now. Basically a line should be

trim(line)[0:LastIndexByte('#')]

type of thing

codecov-commenter · 2022-09-12T16:42:29Z

Codecov Report

Base: 77.00% // Head: 76.95% // Decreases project coverage by -0.04% ⚠️

Coverage data is based on head (42243da) compared to base (53ed3d0).
Patch coverage: 95.65% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##           v3/dev     #412      +/-   ##
==========================================
- Coverage   77.00%   76.95%   -0.05%     
==========================================
  Files         136      136              
  Lines        5975     5976       +1     
==========================================
- Hits         4601     4599       -2     
- Misses       1106     1108       +2     
- Partials      268      269       +1

Flag	Coverage Δ
no-tinygo	`77.08% <95.65%> (-0.05%)`	⬇️
overall	`76.95% <95.65%> (-0.05%)`	⬇️
tinygo	`75.37% <95.65%> (-0.05%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
seclang/parser.go	`89.91% <95.00%> (-2.40%)`	⬇️
seclang/rule_parser.go	`82.35% <100.00%> (-0.05%)`	⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

jcchavezs added 3 commits September 8, 2022 10:07

chore: adds parser benchmark.

f358e81

chore: adds parser improvements.

2156019

fixes logic for comments.

717e8c0

jcchavezs requested review from anuraaga and jptosso and removed request for anuraaga September 8, 2022 09:18

jcchavezs mentioned this pull request Sep 8, 2022

V3/parser refactor #398

Closed

chore: improves allocations for rule parser.

525e9b3

jcchavezs commented Sep 8, 2022

View reviewed changes

anuraaga reviewed Sep 9, 2022

View reviewed changes

jcchavezs and others added 2 commits September 9, 2022 11:08

Merge branch 'v3/dev' into v3/improves_parser_performance

ba4e4e4

Merge branch 'v3/dev' into v3/improves_parser_performance

42243da

jptosso approved these changes Sep 12, 2022

View reviewed changes

Merge branch 'v3/dev' into v3/improves_parser_performance

934e655

jptosso merged commit 91a2918 into v3/dev Sep 12, 2022

jptosso deleted the v3/improves_parser_performance branch September 12, 2022 19:09

M4tteoP mentioned this pull request Sep 23, 2022

Fix parser backticks #433

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

V3/improves parser performance #412

V3/improves parser performance #412

jcchavezs commented Sep 8, 2022 •

edited

Loading

jcchavezs Sep 8, 2022

jptosso Sep 9, 2022

anuraaga Sep 9, 2022

jcchavezs Sep 9, 2022

anuraaga Sep 9, 2022

jcchavezs Sep 9, 2022

anuraaga Sep 9, 2022

anuraaga Sep 9, 2022

jptosso Sep 9, 2022 •

edited

Loading

anuraaga Sep 9, 2022 •

edited

Loading

jcchavezs Sep 9, 2022 •

edited

Loading

anuraaga Sep 9, 2022

jcchavezs Sep 9, 2022

M4tteoP Sep 9, 2022

anuraaga Sep 9, 2022

codecov-commenter commented Sep 12, 2022 •

edited

Loading

V3/improves parser performance #412

V3/improves parser performance #412

Conversation

jcchavezs commented Sep 8, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jptosso Sep 9, 2022 • edited Loading

Choose a reason for hiding this comment

anuraaga Sep 9, 2022 • edited Loading

Choose a reason for hiding this comment

jcchavezs Sep 9, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov-commenter commented Sep 12, 2022 • edited Loading

Codecov Report

jcchavezs commented Sep 8, 2022 •

edited

Loading

jptosso Sep 9, 2022 •

edited

Loading

anuraaga Sep 9, 2022 •

edited

Loading

jcchavezs Sep 9, 2022 •

edited

Loading

codecov-commenter commented Sep 12, 2022 •

edited

Loading