Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V3/improves parser performance #412

Merged
merged 7 commits into from
Sep 12, 2022
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 25 additions & 17 deletions seclang/parser.go
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,6 @@ import (
"io/fs"
"os"
"path/filepath"
"regexp"
"strings"

"github.com/corazawaf/coraza/v3"
Expand Down Expand Up @@ -77,40 +76,50 @@ func (p *Parser) FromFile(profilePath string) error {
// or arguments are invalid
func (p *Parser) FromString(data string) error {
scanner := bufio.NewScanner(strings.NewReader(data))
var linebuffer = ""
pattern := regexp.MustCompile(`\\(\s+)?$`)
var linebuffer strings.Builder
inQuotes := false
for scanner.Scan() {
p.currentLine++
line := strings.TrimSpace(scanner.Text())
if !inQuotes && len(line) > 0 && line[len(line)-1] == '`' {
lineLen := len(line)
if lineLen == 0 {
continue
}

if !inQuotes && line[lineLen-1] == '`' {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still hoping for an example on this one :)

#398 (comment)

Copy link
Member

@jptosso jptosso Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was implemented for SecDataset:

  • !inQuote ensures we are not inside an action list
  • When we are reading the directive, and we declare a list using "`", the value of the whole line will be "`"
  • We can only close opened "`" with a single "`"

It's an ugly code, please if you have a better idea go ahead

Copy link
Contributor

@anuraaga anuraaga Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks for the info! IIUC, then we could replace to just be line == "`" - that would clear up the confusion I had nicely

Copy link
Member Author

@jcchavezs jcchavezs Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so I think then we should call it inBackticks instead of inQuotes and then, what we are trying to support is something like:

SecDataset test `
  123
  456
`

So the first condition matches the last backtick in SecDataset test `

And as for the last we can simply match it with line == "`" as rag suggested.

I wonder if there is a case (cc @fzipi @M4tteoP @piyushroshan) where a backtick is at the beginning or at the end and it is not inside a SectDataset or a similar construct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see where the len-1 comes from now. I think a \ or a # may be able to break that assumption though. Let's add test cases for these both on first quote and last.

But ok with filing an issue and handling in a separate PR since it's not related to performance which this PR is handling.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the backtick is a blackhole in that sense, whatever you add inside despite a keyword somewhere else (e.g. # for comments) lost its ability inside backticks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking about SecDataset, comments (#) are evaluated and stripped later on, demanding it to directiveSecDataset and not to the initial parser.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was thinking mostly about trailing comments, especially of the ending quote. But actually I guess might not handle them at all right now. Basically a line should be

trim(line)[0:LastIndexByte('#')]

type of thing

inQuotes = true
} else if inQuotes && len(line) > 0 && line[0] == '`' {
} else if inQuotes && line[0] == '`' {
inQuotes = false
}

if inQuotes {
linebuffer += line + "\n"
} else {
linebuffer += line
linebuffer.WriteString(line)
linebuffer.WriteString("\n")
continue
}

if line[0] == '#' {
continue
}

// Check if line ends with \
if !pattern.MatchString(line) && !inQuotes {
err := p.evaluate(linebuffer)
if line[lineLen-1] == '\\' {
linebuffer.WriteString(strings.TrimSuffix(line, "\\"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use slice instead of TrimSuffix since we already know the index

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was inclined to do that but I wondered if something like

SecRule REQUEST_COOKIES|!REQUEST_COOKIES:/__utm/|REQUEST_COOKIES_NAMES|ARGS_NAMES|ARGS|XML:/* "@rx [\r\n]\W*?(?:content-(?:type|length)|set-cookie|location):\s*\w" \
    "id:921120,\\\
    phase:2,\
    block,\
    capture,\

is possible where you have more than one \ at the end of the line cc @fzipi

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good example. I think if it's supported, having spaces inside is also supposed to be supported, e.g. \\ \ \

I don't think either the old or new code handle this.

Is it possible to define a custom split function for the bufio.Scanner that treats newlines and \ the same, removing the need to handle it here?

https://pkg.go.dev/bufio#example-Scanner-Custom

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am up to work this out if people can provide examples to build the test cases @fzipi @M4tteoP

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry my idea about customizing the scanner doesn't work for \ I think - we want it to combine on that, not separate 😣

} else {
linebuffer.WriteString(line)
err := p.evaluateLine(linebuffer.String())
if err != nil {
return err
}
linebuffer = ""
} else if !inQuotes {
linebuffer = strings.TrimSuffix(linebuffer, "\\")
linebuffer.Reset()
}
}
return nil
}

func (p *Parser) evaluate(data string) error {
func (p *Parser) evaluateLine(data string) error {
if data == "" || data[0] == '#' {
return nil
return errors.New("invalid lines")
}
// first we get the directive
spl := strings.SplitN(data, " ", 2)
Expand All @@ -119,12 +128,11 @@ func (p *Parser) evaluate(data string) error {
opts = spl[1]
}
p.options.WAF.Logger.Debug("parsing directive %q", data)
directive := spl[0]
directive := strings.ToLower(spl[0])

if len(opts) >= 3 && opts[0] == '"' && opts[len(opts)-1] == '"' {
opts = strings.Trim(opts, `"`)
}
directive = strings.ToLower(directive)
if directive == "include" {
// this is a special hardcoded case
// we cannot add it as a directive type because there are recursion issues
Expand Down
19 changes: 15 additions & 4 deletions seclang/parser_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -14,14 +14,13 @@ import (
"testing"

"github.com/corazawaf/coraza/v3"
engine "github.com/corazawaf/coraza/v3"
)

//go:embed testdata
var testdata embed.FS

func TestInterruption(t *testing.T) {
waf := engine.NewWAF()
waf := coraza.NewWAF()
p := NewParser(waf)
if err := p.FromString(`SecAction "id:1,deny,log,phase:1"`); err != nil {
t.Error("Could not create from string")
Expand All @@ -33,7 +32,7 @@ func TestInterruption(t *testing.T) {
}

func TestDirectivesCaseInsensitive(t *testing.T) {
waf := engine.NewWAF()
waf := coraza.NewWAF()
p := NewParser(waf)
err := p.FromString("seCwEbAppid 15")
if err != nil {
Expand All @@ -42,7 +41,7 @@ func TestDirectivesCaseInsensitive(t *testing.T) {
}

func TestDefaultConfigurationFile(t *testing.T) {
waf := engine.NewWAF()
waf := coraza.NewWAF()
p := NewParser(waf)
err := p.FromFile("../coraza.conf-recommended")
if err != nil {
Expand Down Expand Up @@ -170,3 +169,15 @@ func TestEmbedFS(t *testing.T) {
t.Error("Expected 4 rules loaded using include directive. Found: ", waf.Rules.Count())
}
}

//go:embed testdata/parserbenchmark.conf
var parsingRule string

func BenchmarkParseFromString(b *testing.B) {
waf := coraza.NewWAF()
parser := NewParser(waf)
b.ResetTimer()
for i := 0; i < b.N; i++ {
_ = parser.FromString(parsingRule)
}
}
8 changes: 3 additions & 5 deletions seclang/rule_parser.go
Original file line number Diff line number Diff line change
Expand Up @@ -317,7 +317,7 @@ var ruleTokenRegex = regexp.MustCompile(`"(?:[^"\\]|\\.)*"`)
// In case WithOperator is false, the rule will be parsed without operator
// This function is created for external plugin directives
func ParseRule(options RuleOptions) (*coraza.Rule, error) {
if strings.Trim(options.Data, " ") == "" {
if strings.TrimSpace(options.Data) == "" {
return nil, errors.New("empty rule")
}

Expand All @@ -342,17 +342,15 @@ func ParseRule(options RuleOptions) (*coraza.Rule, error) {
actions := ""

if options.WithOperator {
matches := ruleTokenRegex.FindAllString(options.Data, -1)
matches := ruleTokenRegex.FindAllString(options.Data, 3) // we use at most second match
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do confirm @jptosso @fzipi

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, 1 matches the whole thing, plus the first operator block, then actions block.

if len(matches) == 0 {
return nil, fmt.Errorf("invalid rule with no transformation matches: %q", options.Data)
}
operator := utils.RemoveQuotes(matches[0])
if utils.InSlice(operator, disabledRuleOperators) {
return nil, fmt.Errorf("%s rule operator is disabled", operator)
}

rulePieces := strings.SplitN(options.Data, " ", 2)
vars := rulePieces[0]
vars, _, _ := strings.Cut(options.Data, " ")
err = rp.ParseVariables(vars)
if err != nil {
return nil, err
Expand Down
Loading