
[Filebeat] Add basic json support into s3 input #15370

Merged
merged 14 commits into from
Jan 10, 2020

kaiyan-sheng
Contributor

@kaiyan-sheng kaiyan-sheng commented Jan 7, 2020

This PR adds basic JSON support to the s3 input so that work on the CloudTrail fileset can continue. It adds an expand_event_list_from_field option to the s3 input config, which lets users specify the top-level key in the JSON object that holds the list of events. For example, for the CloudTrail log below, expand_event_list_from_field should be set to Records so that the s3 input parses the log correctly.

This PR only adds expand_event_list_from_field, to implement #15357.

CloudTrail log example:

{"Records": [{
    "eventVersion": "1.0",
    "userIdentity": {
        "type": "IAMUser",
        "principalId": "EX_PRINCIPAL_ID",
        "arn": "arn:aws:iam::123456789012:user/Alice",
        "accessKeyId": "EXAMPLE_KEY_ID",
        "accountId": "123456789012",
        "userName": "Alice"
    },
    "eventTime": "2014-03-06T21:22:54Z",
    "eventSource": "ec2.amazonaws.com",
    "eventName": "StartInstances",
    "awsRegion": "us-east-2",
    "sourceIPAddress": "205.251.233.176",
    "userAgent": "ec2-api-tools 1.6.12.2",
    "requestParameters": {"instancesSet": {"items": [{"instanceId": "i-ebeaf9e2"}]}},
    "responseElements": {"instancesSet": {"items": [{
        "instanceId": "i-ebeaf9e2",
        "currentState": {
            "code": 0,
            "name": "pending"
        },
        "previousState": {
            "code": 80,
            "name": "stopped"
        }
    }]}}
}]}
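The behaviour described above (pick the array under a configured top-level key and emit one event per element) can be sketched in Go as follows. This is a minimal illustration under our own assumptions, not the s3 input's actual code: the function name expandEventList is hypothetical, and it simply re-encodes each element as a JSON string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// expandEventList decodes a JSON document and returns one JSON-encoded
// message per element of the array stored under the top-level key `field`.
// Hypothetical helper illustrating the idea behind expand_event_list_from_field.
func expandEventList(doc []byte, field string) ([]string, error) {
	var top map[string]interface{}
	if err := json.Unmarshal(doc, &top); err != nil {
		return nil, fmt.Errorf("decoding JSON: %w", err)
	}
	raw, ok := top[field]
	if !ok {
		// Report a missing key as an error instead of dereferencing nothing.
		return nil, fmt.Errorf("key %q not found in JSON object", field)
	}
	list, ok := raw.([]interface{})
	if !ok {
		return nil, fmt.Errorf("value of %q is not a JSON array", field)
	}
	messages := make([]string, 0, len(list))
	for _, elem := range list {
		// Re-encode each element; json.Marshal writes map keys in sorted order.
		b, err := json.Marshal(elem)
		if err != nil {
			return nil, err
		}
		messages = append(messages, string(b))
	}
	return messages, nil
}

func main() {
	doc := []byte(`{"Records": [{"eventName": "StartInstances", "awsRegion": "us-east-2"}, {"eventName": "PutObject"}]}`)
	msgs, err := expandEventList(doc, "Records")
	if err != nil {
		panic(err)
	}
	for _, m := range msgs {
		fmt.Println(m)
	}
}
```

Because Go's json.Marshal sorts map keys, each message comes out with alphabetically ordered keys, which is consistent with the key order visible in the message field of the output further down.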

How to test this

To get JSON log files into an S3 bucket, you can either upload a JSON file manually or enable CloudTrail logging in AWS to deliver logs to a specific S3 bucket.
You also need to create an SQS queue and set up S3 event notifications so that every new object created in the bucket sends a message to the queue.
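The queue matters because the s3 input is configured with a queue_url and uses the notifications it receives there to find new objects. As a rough sketch, assuming the standard S3 event notification JSON format as the SQS message body, extracting the bucket and object key could look like this; the names s3Notification, objectRef, and parseNotification are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// s3Notification models only the parts of an S3 event notification
// (the JSON body of each SQS message) that this sketch reads.
type s3Notification struct {
	Records []struct {
		EventSource string `json:"eventSource"`
		EventName   string `json:"eventName"`
		S3          struct {
			Bucket struct {
				Name string `json:"name"`
			} `json:"bucket"`
			Object struct {
				Key string `json:"key"`
			} `json:"object"`
		} `json:"s3"`
	} `json:"Records"`
}

// objectRef identifies one S3 object that should be downloaded and read.
type objectRef struct {
	Bucket string
	Key    string
}

// parseNotification extracts bucket/key pairs from an SQS message body.
func parseNotification(body string) ([]objectRef, error) {
	var n s3Notification
	if err := json.Unmarshal([]byte(body), &n); err != nil {
		return nil, fmt.Errorf("decoding notification: %w", err)
	}
	var refs []objectRef
	for _, r := range n.Records {
		if r.EventSource != "aws:s3" {
			continue // ignore records that did not come from S3
		}
		// Object keys in S3 event notifications are URL-encoded.
		key, err := url.QueryUnescape(r.S3.Object.Key)
		if err != nil {
			return nil, fmt.Errorf("decoding object key: %w", err)
		}
		refs = append(refs, objectRef{Bucket: r.S3.Bucket.Name, Key: key})
	}
	return refs, nil
}

func main() {
	// Hypothetical message body for a newly created object.
	body := `{"Records":[{"eventSource":"aws:s3","eventName":"ObjectCreated:Put",` +
		`"s3":{"bucket":{"name":"test-fb-ks"},"object":{"key":"AWSLogs/example+file.json.gz"}}}]}`
	refs, err := parseNotification(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(refs) // [{test-fb-ks AWSLogs/example file.json.gz}]
}
```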
Then run Filebeat with the s3 input enabled in filebeat.yml:

filebeat.inputs:
- type: s3
  queue_url:   https://sqs.us-east-1.amazonaws.com/428152502467/test-fb-ks
  credential_profile_name: elastic-beats
  expand_event_list_from_field: Records

With expand_event_list_from_field set, the s3 input parses the JSON log and generates one event per element of the Records array.

Output

{
  "_index": "filebeat-8.0.0-2019.12.23-000001",
  "_type": "_doc",
  "_id": "52a0364d21-000000001506",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2020-01-07T23:31:18.846Z",
    "cloud": {
      "region": "us-east-1",
      "provider": "aws"
    },
    "agent": {
      "hostname": "KaiyanMacBookPro",
      "id": "7578d49c-6588-4843-85cc-ad3859f99ed1",
      "version": "8.0.0",
      "type": "filebeat",
      "ephemeral_id": "16bf080d-b115-46b4-8caa-c272fb7f4f88"
    },
    "ecs": {
      "version": "1.2.0"
    },
    "container": {
      "id": "CloudTrail"
    },
    "message": "{\"additionalEventData\":{\"AuthenticationMethod\":\"AuthHeader\",\"CipherSuite\":\"ECDHE-RSA-AES128-SHA\",\"SSEApplied\":\"SSE_S3\",\"SignatureVersion\":\"SigV4\",\"bytesTransferredIn\":712,\"bytesTransferredOut\":0,\"x-amz-id-2\":\"9n9oPkU2VnSa3NBi095vtVm0yQeTtVdCLbUlmjN9W5vYuyhn/FyLlNzhga3YEtwz7qQiEBcYoWs=\"},\"awsRegion\":\"us-east-1\",\"eventID\":\"dd7d6dd1-e0fb-4dcc-959a-394980524786\",\"eventName\":\"PutObject\",\"eventSource\":\"s3.amazonaws.com\",\"eventTime\":\"2020-01-07T23:18:03Z\",\"eventType\":\"AwsApiCall\",\"eventVersion\":\"1.05\",\"readOnly\":false,\"recipientAccountId\":\"428152502467\",\"requestID\":\"4D8B8F6053BDBA1E\",\"requestParameters\":{\"Host\":\"test-fb-ks.s3.amazonaws.com\",\"bucketName\":\"test-fb-ks\",\"key\":\"AWSLogs/428152502467/CloudTrail-Digest/ap-south-1/2020/01/07/428152502467_CloudTrail-Digest_ap-south-1_test-cloudtrail-ks_us-east-1_20200107T222151Z.json.gz\",\"x-amz-acl\":\"bucket-owner-full-control\",\"x-amz-server-side-encryption\":\"AES256\"},\"resources\":[{\"ARN\":\"arn:aws:s3:::test-fb-ks/AWSLogs/428152502467/CloudTrail-Digest/ap-south-1/2020/01/07/428152502467_CloudTrail-Digest_ap-south-1_test-cloudtrail-ks_us-east-1_20200107T222151Z.json.gz\",\"type\":\"AWS::S3::Object\"},{\"ARN\":\"arn:aws:s3:::test-fb-ks\",\"accountId\":\"428152502467\",\"type\":\"AWS::S3::Bucket\"}],\"responseElements\":{\"x-amz-server-side-encryption\":\"AES256\"},\"sharedEventID\":\"370d7d2e-10a1-4a29-994b-c51add403ad2\",\"sourceIPAddress\":\"cloudtrail.amazonaws.com\",\"userAgent\":\"cloudtrail.amazonaws.com\",\"userIdentity\":{\"invokedBy\":\"cloudtrail.amazonaws.com\",\"type\":\"AWSService\"}}",
    "log": {
      "offset": 1506,
      "file.path": "https://test-fb-ks.s3-us-east-1.amazonaws.com/AWSLogs/428152502467/CloudTrail/us-east-1/2020/01/07/428152502467_CloudTrail_us-east-1_20200107T2320Z_KRw46eDNsf54qTpu.json.gz"
    },
    "aws": {
      "s3": {
        "object.key": "AWSLogs/428152502467/CloudTrail/us-east-1/2020/01/07/428152502467_CloudTrail_us-east-1_20200107T2320Z_KRw46eDNsf54qTpu.json.gz",
        "bucket": {
          "name": "test-fb-ks",
          "arn": "arn:aws:s3:::test-fb-ks"
        }
      }
    },
    "input": {
      "type": "s3"
    },
    "host": {
      "name": "KaiyanMacBookPro",
      "hostname": "KaiyanMacBookPro",
      "architecture": "x86_64",
      "os": {
        "build": "17G10021",
        "platform": "darwin",
        "version": "10.13.6",
        "family": "darwin",
        "name": "Mac OS X",
        "kernel": "17.7.0"
      },
      "id": "9C7FAB7B-29D1-5926-8E84-158A9CA3E25D"
    }
  },
  "fields": {
    "suricata.eve.timestamp": [
      "2020-01-07T23:31:18.846Z"
    ],
    "@timestamp": [
      "2020-01-07T23:31:18.846Z"
    ]
  },
  "sort": [
    1578439878846
  ]
}
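In this output each CloudTrail record is stored as a JSON string in the message field rather than as structured fields, so extracting fields such as eventName is presumably left to a later step (for example the CloudTrail fileset mentioned above). A minimal sketch of decoding that string, assuming only the record keys visible in the message above; cloudTrailRecord and parseMessage are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cloudTrailRecord picks a few fields from a CloudTrail record; the JSON
// names match keys visible in the example message field.
type cloudTrailRecord struct {
	EventName       string `json:"eventName"`
	EventSource     string `json:"eventSource"`
	AWSRegion       string `json:"awsRegion"`
	SourceIPAddress string `json:"sourceIPAddress"`
	UserIdentity    struct {
		Type      string `json:"type"`
		InvokedBy string `json:"invokedBy"`
	} `json:"userIdentity"`
}

// parseMessage decodes the JSON string held in an event's message field.
func parseMessage(message string) (cloudTrailRecord, error) {
	var rec cloudTrailRecord
	err := json.Unmarshal([]byte(message), &rec)
	return rec, err
}

func main() {
	// Shortened copy of the message value from the output above.
	msg := `{"awsRegion":"us-east-1","eventName":"PutObject","eventSource":"s3.amazonaws.com",` +
		`"sourceIPAddress":"cloudtrail.amazonaws.com",` +
		`"userIdentity":{"invokedBy":"cloudtrail.amazonaws.com","type":"AWSService"}}`
	rec, err := parseMessage(msg)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %s from %s\n", rec.EventName, rec.EventSource, rec.UserIdentity.InvokedBy)
	// prints: PutObject s3.amazonaws.com from cloudtrail.amazonaws.com
}
```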

@kaiyan-sheng kaiyan-sheng self-assigned this Jan 7, 2020
@kaiyan-sheng kaiyan-sheng added Filebeat, in progress, needs_backport, Team:Integrations and test-plan labels Jan 7, 2020
@andrewkroh
Member

We need a way to tell it to iterate over nested objects and create one document per object. For example, if the message_key were Records[], it would iterate over each element in Records and generate one event per element. (I borrowed that syntax from http://jmespath.org/.)

Contributor

@leehinman leehinman left a comment

I tested with CloudTrail, and it worked both for a single event and for multiple events in the Records array. :-)

I did see a panic when json.message_key: Records wasn't set for the CloudTrail S3 input.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x571a5e8]

goroutine 259 [running]:
github.com/elastic/beats/x-pack/filebeat/input/s3.(*s3Input).handleS3Objects(0xc000118000, 0x618c820, 0xc0007f6010, 0xc0007e5e80, 0x1, 0x1, 0xc00005a420, 0x0, 0x0)
	/go/src/github.com/elastic/beats/x-pack/filebeat/input/s3/input.go:384 +0xa98
github.com/elastic/beats/x-pack/filebeat/input/s3.(*s3Input).processMessage(0xc000118000, 0x618c820, 0xc0007f6010, 0x0, 0xc000946020, 0xc000946050, 0x0, 0x0, 0xc000946080, 0xc0009460c0, ...)
	/go/src/github.com/elastic/beats/x-pack/filebeat/input/s3/input.go:257 +0x2f6
created by github.com/elastic/beats/x-pack/filebeat/input/s3.(*s3Input).processor
	/go/src/github.com/elastic/beats/x-pack/filebeat/input/s3/input.go:240 +0x19c

@kaiyan-sheng
Contributor Author

@andrewkroh Thanks for the review. With this PR, if json.message_key is set, the input iterates over the elements under that key (such as Records) and generates one event per element. Is there a use case where we would specify json.message_key but still want to report a single event for all elements under the key?

@kaiyan-sheng
Contributor Author

@leehinman Thanks for testing it! The panic is fixed now.

@kaiyan-sheng kaiyan-sheng added review and removed in progress labels Jan 8, 2020
Review thread on x-pack/filebeat/input/s3/config.go (outdated, resolved)
@kaiyan-sheng
Contributor Author

I don't think the CI failures are related. Merging this PR.

@kaiyan-sheng kaiyan-sheng merged commit 8962224 into elastic:master Jan 10, 2020
@kaiyan-sheng kaiyan-sheng deleted the add_json_s3 branch January 10, 2020 17:09
@kaiyan-sheng kaiyan-sheng added v7.6.0 and removed needs_backport labels Jan 10, 2020
kaiyan-sheng added a commit that referenced this pull request Jan 13, 2020
…input (#15477)

* [Filebeat] Add basic json support into s3 input (#15370)

* Add basic support for json format logs with message_key
* Change to use expand_event_list_from_field

(cherry picked from commit 8962224)

* update changelog
@kaiyan-sheng kaiyan-sheng removed their assignment Jan 14, 2020
@jsoriano jsoriano self-assigned this Jan 16, 2020
@jsoriano jsoriano added the test-plan-ok label Jan 16, 2020
@jsoriano jsoriano assigned jsoriano and unassigned jsoriano Jan 16, 2020
Labels
Filebeat, review, Team:Integrations, test-plan, test-plan-ok, v7.6.0
5 participants