Ingest Pipeline tester using Simulate API #6261

Merged on Feb 19, 2018: 14 commits into elastic:master from filebeat/feature/pipeline-tester

@kvch kvch commented Feb 1, 2018

This script tests a single log line or multiple lines against an Ingest pipeline using the Simulate API. It builds a request and displays the response from Elasticsearch.
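For readers unfamiliar with the Simulate API, here is a minimal sketch of how such a request could be assembled. It is not the code from this PR: `buildSimulateBody` and `simulateURL` are hypothetical helper names, and the sketch assumes each log line is sent as the `message` field of a document. The endpoint `_ingest/pipeline/_simulate` and its `verbose` query parameter are part of Elasticsearch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// simulateDoc is one document passed to the Simulate API.
type simulateDoc struct {
	Source map[string]interface{} `json:"_source"`
}

// simulateRequest mirrors the request body: a pipeline definition plus docs.
type simulateRequest struct {
	Pipeline json.RawMessage `json:"pipeline"`
	Docs     []simulateDoc   `json:"docs"`
}

// buildSimulateBody (hypothetical helper) wraps every log line in a document
// whose "message" field holds the line, and embeds the pipeline JSON as-is.
func buildSimulateBody(pipeline []byte, logs []string) ([]byte, error) {
	if !json.Valid(pipeline) {
		return nil, fmt.Errorf("pipeline is not valid JSON")
	}
	req := simulateRequest{Pipeline: json.RawMessage(pipeline)}
	for _, line := range logs {
		doc := simulateDoc{Source: map[string]interface{}{"message": line}}
		req.Docs = append(req.Docs, doc)
	}
	return json.Marshal(req)
}

// simulateURL (hypothetical helper) returns the endpoint to POST the body to.
func simulateURL(esURL string, verbose bool) string {
	u := strings.TrimRight(esURL, "/") + "/_ingest/pipeline/_simulate"
	if verbose {
		u += "?verbose"
	}
	return u
}

func main() {
	pipeline := []byte(`{"description":"example","processors":[]}`)
	body, err := buildSimulateBody(pipeline, []string{"a log line"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(simulateURL("http://localhost:9200", true))
	fmt.Println(string(body))
}
```

The body would then be sent with an HTTP POST and a `Content-Type: application/json` header; the sketch stops short of the network call so it can run without a cluster.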

Usage

Usage of /tmp/go-build407861954/b001/exe/main:
  -elasticsearch string
        Elasticsearch URL (default "http://localhost:9200")
  -log string
        Single log line to test
  -logfile string
        Path to log file
  -maxbytes int
        Number of max bytes to be read (default 10485760)
  -modules string
        Path to modules (default "./modules")
  -multiline.mode string
        Multiline mode (default "before")
  -multiline.negate
        Multiline negate
  -multiline.pattern string
        Multiline pattern
  -pipeline string
        Path to pipeline
  -simulate.verbose
        Call Simulate API with verbose option
  -strict.perms
        Strict permission checking on config files (default true)
  -verbose
        Print full output of Simulate API

Finding pipelines

  1. specify path to one pipeline
  2. specify a directory to look for pipelines
  3. specify a module/fileset pair and find its pipeline
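The three lookup modes above can be sketched roughly as follows. This is an illustration, not the PR's implementation: `findPipelines` and `filesetIngestDir` are hypothetical names, and the `<modules>/<module>/<fileset>/ingest/*.json` layout is an assumption inferred from the example path `module/postgresql/log/ingest/pipeline.json` used below.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// filesetIngestDir (hypothetical helper) maps a module/fileset pair to the
// directory assumed to hold its pipeline definitions.
func filesetIngestDir(modulesPath, module, fileset string) string {
	return filepath.Join(modulesPath, module, fileset, "ingest")
}

// findPipelines (hypothetical helper) accepts either a single pipeline file
// or a directory. For a directory it collects every .json file that sits
// directly inside a directory named "ingest".
func findPipelines(path string) ([]string, error) {
	info, err := os.Stat(path)
	if err != nil {
		return nil, fmt.Errorf("cannot access %q: %v", path, err)
	}
	if !info.IsDir() {
		return []string{path}, nil
	}

	var found []string
	err = filepath.Walk(path, func(p string, fi os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		inIngest := filepath.Base(filepath.Dir(p)) == "ingest"
		if !fi.IsDir() && inIngest && filepath.Ext(p) == ".json" {
			found = append(found, p)
		}
		return nil
	})
	if err != nil {
		return nil, fmt.Errorf("cannot walk %q: %v", path, err)
	}
	return found, nil
}

func main() {
	dir := filesetIngestDir("./modules", "postgresql", "log")
	pipelines, err := findPipelines(dir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, p := range pipelines {
		fmt.Println(p)
	}
}
```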

Examples

Correct pipeline

Input pipeline: PostgreSQL/log
Input log: 2017-04-03 22:32:14.322 CEST [31225] postgres@mydb LOG: could not receive data from client: Connection reset by peer

$ go run main.go -pipeline "../../module/postgresql/log/ingest/pipeline.json" -log "2017-04-03 22:32:14.322 CEST [31225] postgres@mydb LOG:  could not receive data from client: Connection reset by peer" -verbose
{
  "docs": [
    {
      "doc": {
        "_id": "id",
        "_index": "index",
        "_ingest": {
          "timestamp": "2018-02-01T18:54:00.450Z"
        },
        "_source": {
          "@timestamp": "2017-04-03T22:32:14.322Z",
          "message": "2017-04-03 22:32:14.322 CEST [31225] postgres@mydb LOG:  could not receive data from client: Connection reset by peer",
          "postgresql": {
            "log": {
              "database": "mydb",
              "level": "LOG",
              "message": "could not receive data from client: Connection reset by peer",
              "thread_id": "31225",
              "timestamp": "2017-04-03 22:32:14.322",
              "timezone": "CEST",
              "user": "postgres"
            }
          }
        },
        "_type": "doc"
      }
    }
  ]
}

Incorrect pipeline

Input pipeline: PostgreSQL/log
Input log: 2017-04-03 22:32:14.322 [31225] postgres@mydb LOG: could not receive data from client: Connection reset by peer

$ go run main.go -pipeline "../../module/postgresql/log/ingest/pipeline.json" -log "2017-04-03 22:32:14.322 [31225] postgres@mydb LOG:  could not receive data from client: Connection reset by peer"
{
  "docs": [
    {
      "doc": {
        "_id": "id",
        "_index": "index",
        "_ingest": {
          "timestamp": "2018-02-01T18:56:47.338Z"
        },
        "_source": {
          "error": {
            "message": "Provided Grok expressions do not match field value: [2017-04-03 22:32:14.322 [31225] postgres@mydb LOG:  could not receive data from client: Connection reset by peer]"
          },
          "message": "2017-04-03 22:32:14.322 [31225] postgres@mydb LOG:  could not receive data from client: Connection reset by peer"
        },
        "_type": "doc"
      }
    }
  ]
}

TODO

  • test the tester
  • test a single line
  • test multiple lines from a file
  • multiline handling
    • negate
    • before/after
  • handle panics in case of invalid multiline pattern

Questions

  • What do you think about the interface?
  • Do you need more features?

@kvch added the "in progress", "discuss", "review" and "Filebeat" labels Feb 1, 2018
@kvch force-pushed the filebeat/feature/pipeline-tester branch from 633aa23 to 4effa77 February 2, 2018 09:01
@ruflin ruflin left a comment

This is really cool as it should make testing a pipeline much easier. Having this code in Golang means it could probably also replace parts of our slow "system module tests".

For the naming of the variables / options, I suggest keeping them identical to what we have in Filebeat. For multiline, for example, that would be -multiline.match.

The PR reminded me also of #5028 from @urso

func getMultiline(s *bufio.Scanner, line string, multiNegate bool, regex *regexp.Regexp) []string {
matches := regex.MatchString(line)
fullLine := line
if matches || !matches && multiNegate {

Could we reuse the logic of our multiline reader here to make sure we get exactly the same behaviour?

@kvch (author) replied:

Done.

urso commented Feb 5, 2018

💯

The difference between this and #5028 is that I wanted to run tests based on a user's module/prospector configuration, right from the config file. This PR requires you to name the pipeline.json file under test. The two have slightly different use cases: user vs. module developer. Still, users can use this command for very custom pipeline configurations.

A mix of both would be nice. E.g.:

  • if the input is a pipeline JSON file -> use it
  • if I pass a directory -> look up the JSON files in it:
    • if multiple files exist, check all and output which ones fail and which ones succeed
  • if I pass a module name only -> resolve the module directory and treat it like a directory

As one can push multiple events via the Simulate API at once, it might be somewhat daunting to read/interpret the output. It's too easy to miss an error. Plus, for debugging purposes it's helpful to add the verbose flag to the API call (in order to collect per-processor input/output). E.g. how about:

  • only show errored events by default (only failed documents, not all)
  • have a verbose flag to show all contents
  • have a double-verbose flag to set the Simulate API verbose flag as well
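A rough sketch of the "only show errored events by default" idea, assuming the non-verbose response shape shown in the PR description, where a failed document carries an `error` object inside `_source`. `failedMessages` is a hypothetical name, and responses that report failures in another shape would need extra handling.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// simulateResponse models only the parts of the response used here.
type simulateResponse struct {
	Docs []struct {
		Doc struct {
			Source map[string]interface{} `json:"_source"`
		} `json:"doc"`
	} `json:"docs"`
}

// failedMessages (hypothetical helper) returns the original "message" of
// every document whose _source contains an "error" key.
func failedMessages(resp []byte) ([]string, error) {
	var r simulateResponse
	if err := json.Unmarshal(resp, &r); err != nil {
		return nil, fmt.Errorf("cannot decode Simulate API response: %v", err)
	}
	var failed []string
	for _, d := range r.Docs {
		if _, hasErr := d.Doc.Source["error"]; hasErr {
			msg, _ := d.Doc.Source["message"].(string)
			failed = append(failed, msg)
		}
	}
	return failed, nil
}

func main() {
	resp := []byte(`{"docs":[
		{"doc":{"_source":{"message":"ok line"}}},
		{"doc":{"_source":{"error":{"message":"no match"},"message":"bad line"}}}
	]}`)
	failed, err := failedMessages(resp)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range failed {
		fmt.Println("failed:", m)
	}
}
```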

r = reader.NewStripNewline(r)

if multiPattern != "" {
p := match.MustCompile(multiPattern)

this panics on invalid pattern. Let's create a somewhat user-friendly error message.
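One way to get an error instead of a panic, sketched here with the standard library's `regexp.Compile` as a stand-in for the libbeat `match` package used in the diff. `compilePattern` is a hypothetical name and the wording of the message is only a suggestion.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// compilePattern (hypothetical helper) compiles a multiline pattern and
// returns a readable error instead of panicking on invalid input.
func compilePattern(pattern string) (*regexp.Regexp, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("invalid multiline pattern %q: %v", pattern, err)
	}
	return re, nil
}

func main() {
	// "[" is an unterminated character class, so compilation fails.
	if _, err := compilePattern("["); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```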

return nil, err
}
}
r = reader.NewLimit(r, 10485760)

@urso urso Feb 5, 2018

The limit should be configurable. Think of users with very big XML documents :/
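A small sketch of wiring the limit to a flag. The flag name `-maxbytes` and its default of 10485760 come from the usage text in the PR description; `limitLine` is a hypothetical stand-in for the libbeat limit reader and assumes the limit truncates each message to at most that many bytes.

```go
package main

import (
	"flag"
	"fmt"
)

// limitLine (hypothetical helper) cuts a message down to maxBytes bytes.
// Note: slicing by bytes can split a multi-byte UTF-8 character.
func limitLine(line string, maxBytes int) string {
	if maxBytes >= 0 && len(line) > maxBytes {
		return line[:maxBytes]
	}
	return line
}

func main() {
	maxBytes := flag.Int("maxbytes", 10485760, "Number of max bytes to be read")
	flag.Parse()

	fmt.Println(limitLine("a very long log message", *maxBytes))
}
```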

@kvch force-pushed the filebeat/feature/pipeline-tester branch from 651c301 to ece6d39 February 9, 2018 12:48
@kvch removed the "discuss" and "in progress" labels Feb 9, 2018
}
}
}
func getLogsFromFile(logfile, multiPattern string, multiNegate bool, matchMode string, maxBytes int) ([]string, error) {

Kinda scary considering users potentially using a 1GB test file :)

}
defer f.Close()

encFactory, ok := encoding.FindEncoding("utf8")

The encoding should be configurable. Windows users in particular will often have UTF-16.

@kvch force-pushed the filebeat/feature/pipeline-tester branch from 4f8464f to c12449e February 13, 2018 15:44
kvch commented Feb 13, 2018

@urso I addressed your review notes.
The CI jobs fail due to ES problems.


encFactory, ok := encoding.FindEncoding(conf.encoding)
if !ok {
return nil, fmt.Errorf("unable to find 'utf8' encoding")

error message should not say utf8
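A sketch of a lookup whose error names the encoding that was actually requested. It uses only the standard library rather than the libbeat `encoding` package from the diff; `findDecoder`, the `decoder` type and the two registered names (`utf8`, `utf-16le`) are hypothetical and do not reflect Filebeat's real encoding list.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strings"
	"unicode/utf16"
	"unicode/utf8"
)

// decoder turns raw bytes in some encoding into a Go (UTF-8) string.
type decoder func([]byte) (string, error)

// decoders is a tiny illustrative registry, keyed by lowercase name.
var decoders = map[string]decoder{
	"utf8":     decodeUTF8,
	"utf-16le": decodeUTF16LE,
}

func decodeUTF8(b []byte) (string, error) {
	if !utf8.Valid(b) {
		return "", fmt.Errorf("input is not valid UTF-8")
	}
	return string(b), nil
}

func decodeUTF16LE(b []byte) (string, error) {
	if len(b)%2 != 0 {
		return "", fmt.Errorf("odd number of bytes (%d) in UTF-16 input", len(b))
	}
	units := make([]uint16, len(b)/2)
	for i := range units {
		units[i] = binary.LittleEndian.Uint16(b[2*i:])
	}
	return string(utf16.Decode(units)), nil
}

// findDecoder (hypothetical helper) reports the requested name on failure
// instead of a hard-coded one.
func findDecoder(name string) (decoder, error) {
	d, ok := decoders[strings.ToLower(name)]
	if !ok {
		return nil, fmt.Errorf("unable to find '%s' encoding", name)
	}
	return d, nil
}

func main() {
	if _, err := findDecoder("latin-42"); err != nil {
		fmt.Println(err)
	}
}
```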

@kvch force-pushed the filebeat/feature/pipeline-tester branch from 3343785 to 79ddb1a February 19, 2018 12:25
@tsg tsg merged commit 1d7e325 into elastic:master Feb 19, 2018