diff --git a/README.md b/README.md index 5055d87..498bf2c 100644 --- a/README.md +++ b/README.md @@ -38,6 +38,8 @@ $ s3-object-router \ Usage of s3-object-router: -bucket string destination S3 bucket name + -format string + convert the s3 object format. choices are json|none (default "none") -gzip compress destination object by gzip (default true) -keep-original-name @@ -48,6 +50,8 @@ Usage of s3-object-router: set time zone to localtime for parsed time -no-put do not put to s3 + -parser string + object record parser. choices are json|cloudfront (default "json") -replacer string wildcard string replacer JSON. e.g. {"foo.bar.*":"foo"} -time-format string @@ -128,6 +132,49 @@ The first line will be routed to `path/to/app.normal/`, the second and third lin `-replacer` takes a definition as JSON string. The key defines matcher(may includes wildcard `*` and `?`) and the value defines replacement. The matchers works with an order that appears in JSON. When a matcher matches to a string, replace it to replacement and breaks (will not try other matchers). +### record parser + +`-parser` specifies the parser applied to each record of the S3 object. By default, `json` is selected, and each line of the S3 object is parsed as one JSON object. + +#### `cloudfront` + +If `cloudfront` is selected, the S3 object is parsed as a CloudFront standard log file. +(cf. 
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/AccessLogs.html#AccessLogsFileNaming) + +For example, + +- time-parse: true +- time-key: `datetime` +- time-format: `2006-01-02T15:04:05Z` +- key-prefix: `path/to/{{ .x_edge_location }}/{{ .datetime.Format "2006-01-02-15" }}` +- parser: `cloudfront` +- Source S3 object + ```tsv + #Version: 1.0 + #Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type cs-protocol-version fle-status fle-encrypted-fields c-port time-to-first-byte x-edge-detailed-result-type sc-content-type sc-content-len sc-range-start sc-range-end + 2019-12-04 21:02:31 LAX1 392 192.0.2.100 GET d111111abcdef8.cloudfront.net /index.html 200 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Hit SOX4xwn4XV6Q4rgb7XiVGOHms_BGlTAC4KyHmureZmBNrjGdRLiNIQ== d111111abcdef8.cloudfront.net https 23 0.001 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/2.0 - - 11040 0.001 Hit text/html 78 - - + 2019-12-04 21:02:31 LAX1 392 192.0.2.100 GET d111111abcdef8.cloudfront.net /index.html 200 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Hit k6WGMNkEzR5BEM_SaF47gjtX9zBDO2m349OY2an0QPEaUum1ZOLrow== d111111abcdef8.cloudfront.net https 23 0.000 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/2.0 - - 11040 0.000 Hit text/html 78 - - + 2019-12-04 21:02:31 LAX1 392 192.0.2.100 GET d111111abcdef8.cloudfront.net /index.html 200 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Hit f37nTMVvnKvV2ZSvEsivup_c2kZ7VXzYdjC-GUQZ5qNs-89BlWazbw== 
d111111abcdef8.cloudfront.net https 23 0.001 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/2.0 - - 11040 0.001 Hit text/html 78 - - + 2019-12-13 22:36:27 SEA19-C1 900 192.0.2.200 GET d111111abcdef8.cloudfront.net /favicon.ico 502 http://www.example.com/ Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Error 1pkpNfBQ39sYMnjjUQjmH2w1wdJnbHYTbag21o_3OfcQgPzdL2RSSQ== www.example.com http 675 0.102 - - - Error HTTP/1.1 - - 25260 0.102 OriginDnsError text/html 507 - - + 2019-12-13 22:36:26 SEA19-C1 900 192.0.2.200 GET d111111abcdef8.cloudfront.net / 502 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Error 3AqrZGCnF_g0-5KOvfA7c9XLcf4YGvMFSeFdIetR1N_2y8jSis8Zxg== www.example.com http 735 0.107 - - - Error HTTP/1.1 - - 3802 0.107 OriginDnsError text/html 507 - - + 2019-12-13 22:37:02 SEA19-C2 900 192.0.2.200 GET d111111abcdef8.cloudfront.net / 502 - curl/7.55.1 - - Error kBkDzGnceVtWHqSCqBUqtA_cEs2T3tFUBbnBNkB9El_uVRhHgcZfcw== www.example.com http 387 0.103 - - - Error HTTP/1.1 - - 12644 0.103 OriginDnsError text/html 507 - - + ``` + +Lines 3 to 5 are routed to `path/to/LAX1/2019-12-04-21/`, lines 6 and 7 to `path/to/SEA19-C1/2019-12-13-22/`, and line 8 to `path/to/SEA19-C2/2019-12-13-22/`. The two header lines are not routed. + +The `cloudfront` parser reads the two header lines (`#Version` and `#Fields`) of the S3 object. +Each field name listed in `#Fields` is converted according to the following rules: + + 1. Replace `(`, `)` and `-` with `_` + 1. Convert to lowercase + 1. Trim trailing `_` + +This lets any field be referenced in the key prefix; for example, `cs(User-Agent)` is available as `cs_user_agent`. + + +The parser also adds a `datetime` field in RFC 3339 format (UTC, e.g. `2019-12-04T21:02:31Z`) that combines the `date` and `time` fields of the CloudFront standard log. Use it with `-time-parse`, `-time-key` and `-time-format`. + +To convert the routed S3 objects to JSON (one JSON object per line), use `-format json`. 
## LICENSE MIT diff --git a/cmd/s3-object-router/main.go b/cmd/s3-object-router/main.go index d2edade..7252d12 100644 --- a/cmd/s3-object-router/main.go +++ b/cmd/s3-object-router/main.go @@ -50,20 +50,22 @@ func lambdaHandler(r *router.Router) func(context.Context, events.S3Event) error func setup() (*router.Router, error) { var ( - bucket, keyPrefix, replacer string - timeKey, timeFormat string - gzip, timeParse, localTime, noPut, keep bool + bucket, keyPrefix, replacer, parser, objFormat string + timeKey, timeFormat string + gzip, timeParse, localTime, noPut, keep bool ) flag.StringVar(&bucket, "bucket", "", "destination S3 bucket name") flag.StringVar(&keyPrefix, "key-prefix", "", "prefix of S3 key") flag.BoolVar(&gzip, "gzip", true, "compress destination object by gzip") flag.StringVar(&replacer, "replacer", "", `wildcard string replacer JSON. e.g. {"foo.bar.*":"foo"}`) + flag.StringVar(&parser, "parser", "json", "object record parser. choices are json|cloudfront") flag.BoolVar(&timeParse, "time-parse", false, "parse record value as time.Time with -time-format") flag.StringVar(&timeFormat, "time-format", time.RFC3339Nano, "format of time-parse") flag.StringVar(&timeKey, "time-key", router.DefaultTimeKey, "record key name for time-parse") flag.BoolVar(&localTime, "local-time", false, "set time zone to localtime for parsed time") flag.BoolVar(&noPut, "no-put", false, "do not put to s3") flag.BoolVar(&keep, "keep-original-name", false, "keep original object base name") + flag.StringVar(&objFormat, "format", "none", `convert the s3 object format. choices are json|none`) flag.VisitAll(envToFlag) flag.Parse() @@ -72,12 +74,14 @@ func setup() (*router.Router, error) { KeyPrefix: keyPrefix, Gzip: gzip, Replacer: replacer, + Parser: parser, TimeParse: timeParse, TimeKey: timeKey, TimeFormat: timeFormat, LocalTime: localTime, PutS3: !noPut, KeepOriginalName: keep, + ObjectFormat: objFormat, } log.Printf("[debug] option: %#v", opt) return router.New(&opt)
diff --git a/encoder.go b/encoder.go new file mode 100644 index 0000000..e2d30dc --- /dev/null +++ b/encoder.go @@ -0,0 +1,59 @@ +package router + +import ( + "encoding/json" + "log" +) + +// LF represents LineFeed \n +var LF = []byte("\n") + +type noneEncoder struct { + body buffer +} + +func newNoneEncoder(body buffer) encoder { + return &noneEncoder{ + body: body, + } +} + +func (e *noneEncoder) Encode(_ record, recordBytes []byte) error { + if _, err := e.body.Write(recordBytes); err != nil { + return err + } + _, err := e.body.Write(LF) + return err +} + +func (e *noneEncoder) Buffer() buffer { + return e.body +} + +type jsonEncoder struct { + body buffer +} + +func newJSONEncoder(body buffer) encoder { + return &jsonEncoder{ + body: body, + } +} + +func (e *jsonEncoder) Encode(rec record, _ []byte) error { + + bytes, err := json.Marshal(rec) + if err != nil { + log.Println("[warn] failed to generate json record", err) + return err + } + if _, err := e.body.Write(bytes); err != nil { + return err + } + _, err = e.body.Write(LF) + return err +} + +func (e *jsonEncoder) Buffer() buffer { + return e.body +} diff --git a/option.go b/option.go index b75e9cb..ef22282 100644 --- a/option.go +++ b/option.go @@ -14,25 +14,32 @@ var DefaultTimeKey = "time" // Option represents option values of router type Option struct { - Bucket string - KeyPrefix string - TimeParse bool - TimeKey string - TimeFormat string - LocalTime bool - Gzip bool - Replacer string - PutS3 bool - KeepOriginalName bool + Bucket string `json:"bucket,omitempty"` + KeyPrefix string 
`json:"key_prefix,omitempty"` + TimeParse bool `json:"time_parse,omitempty"` + TimeKey string `json:"time_key,omitempty"` + TimeFormat string `json:"time_format,omitempty"` + LocalTime bool `json:"local_time,omitempty"` + Gzip bool `json:"gzip,omitempty"` + Replacer string `json:"replacer,omitempty"` + Parser string `json:"parser,omitempty"` + PutS3 bool `json:"put_s3,omitempty"` + ObjectFormat string `json:"object_format,omitempty"` + KeepOriginalName bool `json:"keep_original_name,omitempty"` - replacer replacer - timeParser timeParser + replacer replacer + recordParser recordParser + newEncoder func(buffer) encoder + timeParser timeParser } type replacer interface { Replace(string) string } +type recordParser interface { + Parse([]byte, *record) error +} type timeParser struct { layout string loc *time.Location @@ -76,6 +83,16 @@ func (opt *Option) Init() error { } else { opt.replacer = strings.NewReplacer() // nop replacer } + switch opt.Parser { + case "", "json": + opt.recordParser = recordParserFunc(func(b []byte, r *record) error { + return json.Unmarshal(b, r) + }) + case "cloudfront": + opt.recordParser = &cloudfrontParser{} + default: + return errors.New("parser must be one of json|cloudfront") + } if opt.TimeParse { p := timeParser{layout: opt.TimeFormat} if opt.LocalTime { @@ -88,5 +105,13 @@ func (opt *Option) Init() error { if opt.TimeKey == "" { opt.TimeKey = DefaultTimeKey } + switch opt.ObjectFormat { + case "", "none": + opt.newEncoder = newNoneEncoder + case "json": + opt.newEncoder = newJSONEncoder + default: + return errors.New("format must be one of json|none") + } return nil } diff --git a/parser.go b/parser.go new file mode 100644 index 0000000..afdbb49 --- /dev/null +++ b/parser.go @@ -0,0 +1,78 @@ +package router + +import ( + "fmt" + "strings" + + "github.com/pkg/errors" +) + +const ( + defaultCloudFrontNumColumns = 33 +) + +// SkipLine is returned by a recordParser for lines that carry no record, such as header lines. +var ( + SkipLine = errors.New("skip this line") +) + +type 
recordParserFunc func([]byte, *record) error + +func (p recordParserFunc) Parse(bs []byte, r *record) error { + return p(bs, r) +} + +type cloudfrontParser struct { + version string + fields []string +} + +func (p *cloudfrontParser) Parse(bs []byte, r *record) error { + str := string(bs) + rec := make(record, defaultCloudFrontNumColumns) + *r = rec + if strings.HasPrefix(str, "#") { + part := strings.SplitN(str[1:], ":", 2) + if len(part) != 2 { + return SkipLine + } + key := strings.TrimSpace(part[0]) + value := strings.TrimSpace(part[1]) + switch key { + case "Version": + p.version = value + case "Fields": + rawFields := strings.Split(value, " ") + // convert to snake case + fields := make([]string, 0, len(rawFields)) + replaceTargets := []string{"(", ")", "-"} + for _, rawField := range rawFields { + field := rawField + for _, target := range replaceTargets { + field = strings.ReplaceAll(field, target, "_") + } + field = strings.ToLower(field) + field = strings.TrimRight(field, "_") + fields = append(fields, field) + } + p.fields = fields + } + return SkipLine + } + values := strings.Split(str, "\t") + if len(values) != len(p.fields) { + return fmt.Errorf("number of values (%d) does not match number of fields (%d)", len(values), len(p.fields)) + } + var dateValue, timeValue string + for i, field := range p.fields { + rec[field] = values[i] + if field == "date" { + dateValue = values[i] + } + if field == "time" { + timeValue = values[i] + } + } + rec["datetime"] = dateValue + "T" + timeValue + "Z" + return nil +} diff --git a/parser_test.go b/parser_test.go new file mode 100644 index 0000000..138931d --- /dev/null +++ b/parser_test.go @@ -0,0 +1,148 @@ +package router_test + +import ( + "encoding/json" + "io" + "io/ioutil" + "mime/multipart" + "os" + "path/filepath" + "reflect" + "testing" + + router "github.com/kayac/s3-object-router" +) + +type testParserConfig struct { + router.Option + Sources []string `json:"sources"` +} + +func TestParser(t *testing.T) 
{ + cases, err := ioutil.ReadDir("testdata") + if err != nil { + t.Logf("can not read testdata:%s", err) + t.FailNow() + } + for _, c := range cases { + if !c.IsDir() { + continue + } + t.Run(c.Name(), func(t *testing.T) { + testParser(t, c.Name()) + }) + } +} + +func testParser(t *testing.T, caseDirName string) { + fp, err := os.Open(filepath.Join("testdata", caseDirName, "config.json")) + if err != nil { + t.Logf("can not open test config:%s", err) + t.FailNow() + } + defer fp.Close() + decoder := json.NewDecoder(fp) + var config testParserConfig + if err := decoder.Decode(&config); err != nil { + t.Logf("can not parse test config:%s", err) + t.FailNow() + } + + r, err := router.New(&config.Option) + if err != nil { + t.Error(err) + t.FailNow() + } + sfps := make(map[string]io.ReadCloser, len(config.Sources)) + defer func() { + for _, sfp := range sfps { + sfp.Close() + } + }() + for _, src := range config.Sources { + path := filepath.Join("testdata", src) + sfp, err := os.Open(path) + if err != nil { + t.Error(err) + t.FailNow() + } + sfps[path] = sfp + } + for path, sfp := range sfps { + res, err := router.DoTestRoute(r, sfp, "s3://example-bucket/path/to/example-object") + if err != nil { + t.Error(err) + continue + } + goldenFile := filepath.Join("testdata", caseDirName, filepath.Base(path)+".golden") + if *updateFlag { + writeParserGolden(t, goldenFile, res) + } + expected := readParserGolden(t, goldenFile) + if !reflect.DeepEqual(expected, res) { + t.Error("unexpected routed data") + for u, expectedContent := range expected { + if expectedContent != res[u] { + t.Errorf("expected %s got %s", expectedContent, res[u]) + } + } + } + } +} + +const ( + parserGoldenBoundary = "----s3-object-router-parser-test----" +) + +func writeParserGolden(t *testing.T, goldenFile string, res map[string]string) { + t.Helper() + fp, err := os.OpenFile( + goldenFile, + os.O_CREATE|os.O_WRONLY|os.O_TRUNC, + 0644, + ) + if err != nil { + t.Logf("can not create golden file: %s", err) 
+ t.FailNow() + } + defer fp.Close() + w := multipart.NewWriter(fp) + w.SetBoundary(parserGoldenBoundary) + for dest, content := range res { + if err := w.WriteField(dest, content); err != nil { + t.Logf("can not write golden data: %s", err) + t.FailNow() + } + } + w.Close() +} + +func readParserGolden(t *testing.T, goldenFile string) map[string]string { + t.Helper() + fp, err := os.Open(goldenFile) + res := map[string]string{} + if err != nil { + t.Logf("can not open golden file: %s", err) + t.FailNow() + } + defer fp.Close() + r := multipart.NewReader(fp, parserGoldenBoundary) + for { + part, err := r.NextPart() + if err == io.EOF { + break + } + if err != nil { + t.Logf("can not get golden next part: %s", err) + t.FailNow() + } + content, err := ioutil.ReadAll(part) + if err != nil { + t.Logf("can not read golden part: %s", err) + t.FailNow() + } + res[part.FormName()] = string(content) + part.Close() + } + return res +} diff --git a/router.go b/router.go index fbdfb61..f17ca23 100644 --- a/router.go +++ b/router.go @@ -6,7 +6,6 @@ import ( "compress/gzip" "context" "crypto/sha256" - "encoding/json" "errors" "fmt" "html/template" @@ -23,9 +22,6 @@ import ( "golang.org/x/sync/semaphore" ) -// LF represents LineFeed \n -var LF = []byte("\n") - // MaxConcurrency represents maximum concurrency for uploading to S3 var MaxConcurrency = 10 @@ -146,16 +142,19 @@ func (r *Router) route(src io.Reader, s3url string) (map[destination]buffer, err return nil, err } scanner := bufio.NewScanner(src) + recordParser := r.option.recordParser buf := make([]byte, initialBufSize) scanner.Buffer(buf, maxBufSize) - dests := make(map[destination]buffer) + encs := make(map[destination]encoder) for scanner.Scan() { recordBytes := scanner.Bytes() var rec record - if err := json.Unmarshal(recordBytes, &rec); err != nil { - log.Println("[warn] failed to parse record", err) + if err := recordParser.Parse(recordBytes, &rec); err != nil { + if err != SkipLine { + log.Println("[warn] failed to 
parse record", err) + } continue } if r.option.TimeParse { @@ -171,21 +170,26 @@ func (r *Router) route(src io.Reader, s3url string) (map[destination]buffer, err log.Println("[warn] failed to generate destination", err) continue } - body := dests[d] - if body == nil { + enc := encs[d] + if enc == nil { + var body buffer if r.option.Gzip { body = newGzipBuffer() } else { body = new(bytes.Buffer) } + enc = r.option.newEncoder(body) } - body.Write(recordBytes) - body.Write(LF) - dests[d] = body + enc.Encode(rec, recordBytes) + encs[d] = enc } if err := scanner.Err(); err != nil { return nil, err } + dests := make(map[destination]buffer, len(encs)) + for d, enc := range encs { + dests[d] = enc.Buffer() + } return dests, nil } @@ -256,6 +260,11 @@ type buffer interface { Bytes() []byte } +type encoder interface { + Encode(record, []byte) error + Buffer() buffer +} + type gzBuffer struct { bytes.Buffer gz *gzip.Writer diff --git a/router_test.go b/router_test.go index 38bd6ff..5d47d78 100644 --- a/router_test.go +++ b/router_test.go @@ -3,6 +3,7 @@ package router_test import ( "bytes" "compress/gzip" + "flag" "io" "strings" "testing" @@ -39,7 +40,10 @@ func concat(strs ...string) string { return b.String() } +var updateFlag = flag.Bool("update", false, "update golden files") + func TestMain(t *testing.T) { + flag.Parse() var b bytes.Buffer w := gzip.NewWriter(&b) w.Write(testSrcBytes) diff --git a/testdata/cloudfront/config.json b/testdata/cloudfront/config.json new file mode 100644 index 0000000..f44fdab --- /dev/null +++ b/testdata/cloudfront/config.json @@ -0,0 +1,15 @@ +{ + "bucket": "dummy", + "key_prefix": "foo/{{ replace .x_edge_location }}/{{ .datetime.Format `2006-01-02` }}/", + "gzip": false, + "parser": "cloudfront", + "time_parse": true, + "time_key": "datetime", + "time_format": "2006-01-02T15:04:05Z07:00", + "put_s3": false, + "keep_original_name": false, + "object_format": "none", + "sources":[ + "cloudfront/example" + ] +} diff --git 
a/testdata/cloudfront/example b/testdata/cloudfront/example new file mode 100644 index 0000000..ecbed6c --- /dev/null +++ b/testdata/cloudfront/example @@ -0,0 +1,9 @@ +#from: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/AccessLogs.html#AccessLogsFileNaming +#Version: 1.0 +#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type cs-protocol-version fle-status fle-encrypted-fields c-port time-to-first-byte x-edge-detailed-result-type sc-content-type sc-content-len sc-range-start sc-range-end +2019-12-04 21:02:31 LAX1 392 192.0.2.100 GET d111111abcdef8.cloudfront.net /index.html 200 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Hit SOX4xwn4XV6Q4rgb7XiVGOHms_BGlTAC4KyHmureZmBNrjGdRLiNIQ== d111111abcdef8.cloudfront.net https 23 0.001 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/2.0 - - 11040 0.001 Hit text/html 78 - - +2019-12-04 21:02:31 LAX1 392 192.0.2.100 GET d111111abcdef8.cloudfront.net /index.html 200 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Hit k6WGMNkEzR5BEM_SaF47gjtX9zBDO2m349OY2an0QPEaUum1ZOLrow== d111111abcdef8.cloudfront.net https 23 0.000 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/2.0 - - 11040 0.000 Hit text/html 78 - - +2019-12-04 21:02:31 LAX1 392 192.0.2.100 GET d111111abcdef8.cloudfront.net /index.html 200 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Hit f37nTMVvnKvV2ZSvEsivup_c2kZ7VXzYdjC-GUQZ5qNs-89BlWazbw== d111111abcdef8.cloudfront.net https 23 0.001 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit 
HTTP/2.0 - - 11040 0.001 Hit text/html 78 - - +2019-12-13 22:36:27 SEA19-C1 900 192.0.2.200 GET d111111abcdef8.cloudfront.net /favicon.ico 502 http://www.example.com/ Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Error 1pkpNfBQ39sYMnjjUQjmH2w1wdJnbHYTbag21o_3OfcQgPzdL2RSSQ== www.example.com http 675 0.102 - - - Error HTTP/1.1 - - 25260 0.102 OriginDnsError text/html 507 - - +2019-12-13 22:36:26 SEA19-C1 900 192.0.2.200 GET d111111abcdef8.cloudfront.net / 502 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Error 3AqrZGCnF_g0-5KOvfA7c9XLcf4YGvMFSeFdIetR1N_2y8jSis8Zxg== www.example.com http 735 0.107 - - - Error HTTP/1.1 - - 3802 0.107 OriginDnsError text/html 507 - - +2019-12-13 22:37:02 SEA19-C2 900 192.0.2.200 GET d111111abcdef8.cloudfront.net / 502 - curl/7.55.1 - - Error kBkDzGnceVtWHqSCqBUqtA_cEs2T3tFUBbnBNkB9El_uVRhHgcZfcw== www.example.com http 387 0.103 - - - Error HTTP/1.1 - - 12644 0.103 OriginDnsError text/html 507 - - diff --git a/testdata/cloudfront/example.golden b/testdata/cloudfront/example.golden new file mode 100644 index 0000000..c6680cc --- /dev/null +++ b/testdata/cloudfront/example.golden @@ -0,0 +1,19 @@ +------s3-object-router-parser-test---- +Content-Disposition: form-data; name="s3://dummy/foo/SEA19-C1/2019-12-13/f7ec2b7eb299d99468ff797fba836fa6cfc4389e21562f50a7d41ddcf43bfd01" + +2019-12-13 22:36:27 SEA19-C1 900 192.0.2.200 GET d111111abcdef8.cloudfront.net /favicon.ico 502 http://www.example.com/ Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Error 1pkpNfBQ39sYMnjjUQjmH2w1wdJnbHYTbag21o_3OfcQgPzdL2RSSQ== www.example.com http 675 0.102 - - - Error HTTP/1.1 - - 25260 0.102 OriginDnsError text/html 507 - - +2019-12-13 22:36:26 SEA19-C1 900 192.0.2.200 GET 
d111111abcdef8.cloudfront.net / 502 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Error 3AqrZGCnF_g0-5KOvfA7c9XLcf4YGvMFSeFdIetR1N_2y8jSis8Zxg== www.example.com http 735 0.107 - - - Error HTTP/1.1 - - 3802 0.107 OriginDnsError text/html 507 - - + +------s3-object-router-parser-test---- +Content-Disposition: form-data; name="s3://dummy/foo/SEA19-C2/2019-12-13/f7ec2b7eb299d99468ff797fba836fa6cfc4389e21562f50a7d41ddcf43bfd01" + +2019-12-13 22:37:02 SEA19-C2 900 192.0.2.200 GET d111111abcdef8.cloudfront.net / 502 - curl/7.55.1 - - Error kBkDzGnceVtWHqSCqBUqtA_cEs2T3tFUBbnBNkB9El_uVRhHgcZfcw== www.example.com http 387 0.103 - - - Error HTTP/1.1 - - 12644 0.103 OriginDnsError text/html 507 - - + +------s3-object-router-parser-test---- +Content-Disposition: form-data; name="s3://dummy/foo/LAX1/2019-12-04/f7ec2b7eb299d99468ff797fba836fa6cfc4389e21562f50a7d41ddcf43bfd01" + +2019-12-04 21:02:31 LAX1 392 192.0.2.100 GET d111111abcdef8.cloudfront.net /index.html 200 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Hit SOX4xwn4XV6Q4rgb7XiVGOHms_BGlTAC4KyHmureZmBNrjGdRLiNIQ== d111111abcdef8.cloudfront.net https 23 0.001 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/2.0 - - 11040 0.001 Hit text/html 78 - - +2019-12-04 21:02:31 LAX1 392 192.0.2.100 GET d111111abcdef8.cloudfront.net /index.html 200 - Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Hit k6WGMNkEzR5BEM_SaF47gjtX9zBDO2m349OY2an0QPEaUum1ZOLrow== d111111abcdef8.cloudfront.net https 23 0.000 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/2.0 - - 11040 0.000 Hit text/html 78 - - +2019-12-04 21:02:31 LAX1 392 192.0.2.100 GET d111111abcdef8.cloudfront.net /index.html 200 - 
Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36 - - Hit f37nTMVvnKvV2ZSvEsivup_c2kZ7VXzYdjC-GUQZ5qNs-89BlWazbw== d111111abcdef8.cloudfront.net https 23 0.001 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Hit HTTP/2.0 - - 11040 0.001 Hit text/html 78 - - + +------s3-object-router-parser-test------ diff --git a/testdata/cloudfront_json/config.json b/testdata/cloudfront_json/config.json new file mode 100644 index 0000000..7a84bf6 --- /dev/null +++ b/testdata/cloudfront_json/config.json @@ -0,0 +1,15 @@ +{ + "bucket": "dummy", + "key_prefix": "foo/{{ replace .x_edge_location }}/{{ .datetime.Format `2006-01-02` }}/", + "gzip": false, + "parser": "cloudfront", + "time_parse": true, + "time_key": "datetime", + "time_format": "2006-01-02T15:04:05Z07:00", + "put_s3": false, + "keep_original_name": false, + "object_format": "json", + "sources":[ + "cloudfront/example" + ] +} diff --git a/testdata/cloudfront_json/example.golden b/testdata/cloudfront_json/example.golden new file mode 100644 index 0000000..1d8eb90 --- /dev/null +++ b/testdata/cloudfront_json/example.golden @@ -0,0 +1,19 @@ +------s3-object-router-parser-test---- +Content-Disposition: form-data; name="s3://dummy/foo/LAX1/2019-12-04/f7ec2b7eb299d99468ff797fba836fa6cfc4389e21562f50a7d41ddcf43bfd01" + 
+{"c_ip":"192.0.2.100","c_port":"11040","cs_bytes":"23","cs_cookie":"-","cs_host":"d111111abcdef8.cloudfront.net","cs_method":"GET","cs_protocol":"https","cs_protocol_version":"HTTP/2.0","cs_referer":"-","cs_uri_query":"-","cs_uri_stem":"/index.html","cs_user_agent":"Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36","date":"2019-12-04","datetime":"2019-12-04T21:02:31Z","fle_encrypted_fields":"-","fle_status":"-","sc_bytes":"392","sc_content_len":"78","sc_content_type":"text/html","sc_range_end":"-","sc_range_start":"-","sc_status":"200","ssl_cipher":"ECDHE-RSA-AES128-GCM-SHA256","ssl_protocol":"TLSv1.2","time":"21:02:31","time_taken":"0.001","time_to_first_byte":"0.001","x_edge_detailed_result_type":"Hit","x_edge_location":"LAX1","x_edge_request_id":"SOX4xwn4XV6Q4rgb7XiVGOHms_BGlTAC4KyHmureZmBNrjGdRLiNIQ==","x_edge_response_result_type":"Hit","x_edge_result_type":"Hit","x_forwarded_for":"-","x_host_header":"d111111abcdef8.cloudfront.net"} 
+{"c_ip":"192.0.2.100","c_port":"11040","cs_bytes":"23","cs_cookie":"-","cs_host":"d111111abcdef8.cloudfront.net","cs_method":"GET","cs_protocol":"https","cs_protocol_version":"HTTP/2.0","cs_referer":"-","cs_uri_query":"-","cs_uri_stem":"/index.html","cs_user_agent":"Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36","date":"2019-12-04","datetime":"2019-12-04T21:02:31Z","fle_encrypted_fields":"-","fle_status":"-","sc_bytes":"392","sc_content_len":"78","sc_content_type":"text/html","sc_range_end":"-","sc_range_start":"-","sc_status":"200","ssl_cipher":"ECDHE-RSA-AES128-GCM-SHA256","ssl_protocol":"TLSv1.2","time":"21:02:31","time_taken":"0.000","time_to_first_byte":"0.000","x_edge_detailed_result_type":"Hit","x_edge_location":"LAX1","x_edge_request_id":"k6WGMNkEzR5BEM_SaF47gjtX9zBDO2m349OY2an0QPEaUum1ZOLrow==","x_edge_response_result_type":"Hit","x_edge_result_type":"Hit","x_forwarded_for":"-","x_host_header":"d111111abcdef8.cloudfront.net"} 
+{"c_ip":"192.0.2.100","c_port":"11040","cs_bytes":"23","cs_cookie":"-","cs_host":"d111111abcdef8.cloudfront.net","cs_method":"GET","cs_protocol":"https","cs_protocol_version":"HTTP/2.0","cs_referer":"-","cs_uri_query":"-","cs_uri_stem":"/index.html","cs_user_agent":"Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36","date":"2019-12-04","datetime":"2019-12-04T21:02:31Z","fle_encrypted_fields":"-","fle_status":"-","sc_bytes":"392","sc_content_len":"78","sc_content_type":"text/html","sc_range_end":"-","sc_range_start":"-","sc_status":"200","ssl_cipher":"ECDHE-RSA-AES128-GCM-SHA256","ssl_protocol":"TLSv1.2","time":"21:02:31","time_taken":"0.001","time_to_first_byte":"0.001","x_edge_detailed_result_type":"Hit","x_edge_location":"LAX1","x_edge_request_id":"f37nTMVvnKvV2ZSvEsivup_c2kZ7VXzYdjC-GUQZ5qNs-89BlWazbw==","x_edge_response_result_type":"Hit","x_edge_result_type":"Hit","x_forwarded_for":"-","x_host_header":"d111111abcdef8.cloudfront.net"} + +------s3-object-router-parser-test---- +Content-Disposition: form-data; name="s3://dummy/foo/SEA19-C1/2019-12-13/f7ec2b7eb299d99468ff797fba836fa6cfc4389e21562f50a7d41ddcf43bfd01" + 
+{"c_ip":"192.0.2.200","c_port":"25260","cs_bytes":"675","cs_cookie":"-","cs_host":"d111111abcdef8.cloudfront.net","cs_method":"GET","cs_protocol":"http","cs_protocol_version":"HTTP/1.1","cs_referer":"http://www.example.com/","cs_uri_query":"-","cs_uri_stem":"/favicon.ico","cs_user_agent":"Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36","date":"2019-12-13","datetime":"2019-12-13T22:36:27Z","fle_encrypted_fields":"-","fle_status":"-","sc_bytes":"900","sc_content_len":"507","sc_content_type":"text/html","sc_range_end":"-","sc_range_start":"-","sc_status":"502","ssl_cipher":"-","ssl_protocol":"-","time":"22:36:27","time_taken":"0.102","time_to_first_byte":"0.102","x_edge_detailed_result_type":"OriginDnsError","x_edge_location":"SEA19-C1","x_edge_request_id":"1pkpNfBQ39sYMnjjUQjmH2w1wdJnbHYTbag21o_3OfcQgPzdL2RSSQ==","x_edge_response_result_type":"Error","x_edge_result_type":"Error","x_forwarded_for":"-","x_host_header":"www.example.com"} 
+{"c_ip":"192.0.2.200","c_port":"3802","cs_bytes":"735","cs_cookie":"-","cs_host":"d111111abcdef8.cloudfront.net","cs_method":"GET","cs_protocol":"http","cs_protocol_version":"HTTP/1.1","cs_referer":"-","cs_uri_query":"-","cs_uri_stem":"/","cs_user_agent":"Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36","date":"2019-12-13","datetime":"2019-12-13T22:36:26Z","fle_encrypted_fields":"-","fle_status":"-","sc_bytes":"900","sc_content_len":"507","sc_content_type":"text/html","sc_range_end":"-","sc_range_start":"-","sc_status":"502","ssl_cipher":"-","ssl_protocol":"-","time":"22:36:26","time_taken":"0.107","time_to_first_byte":"0.107","x_edge_detailed_result_type":"OriginDnsError","x_edge_location":"SEA19-C1","x_edge_request_id":"3AqrZGCnF_g0-5KOvfA7c9XLcf4YGvMFSeFdIetR1N_2y8jSis8Zxg==","x_edge_response_result_type":"Error","x_edge_result_type":"Error","x_forwarded_for":"-","x_host_header":"www.example.com"} + +------s3-object-router-parser-test---- +Content-Disposition: form-data; name="s3://dummy/foo/SEA19-C2/2019-12-13/f7ec2b7eb299d99468ff797fba836fa6cfc4389e21562f50a7d41ddcf43bfd01" + 
+{"c_ip":"192.0.2.200","c_port":"12644","cs_bytes":"387","cs_cookie":"-","cs_host":"d111111abcdef8.cloudfront.net","cs_method":"GET","cs_protocol":"http","cs_protocol_version":"HTTP/1.1","cs_referer":"-","cs_uri_query":"-","cs_uri_stem":"/","cs_user_agent":"curl/7.55.1","date":"2019-12-13","datetime":"2019-12-13T22:37:02Z","fle_encrypted_fields":"-","fle_status":"-","sc_bytes":"900","sc_content_len":"507","sc_content_type":"text/html","sc_range_end":"-","sc_range_start":"-","sc_status":"502","ssl_cipher":"-","ssl_protocol":"-","time":"22:37:02","time_taken":"0.103","time_to_first_byte":"0.103","x_edge_detailed_result_type":"OriginDnsError","x_edge_location":"SEA19-C2","x_edge_request_id":"kBkDzGnceVtWHqSCqBUqtA_cEs2T3tFUBbnBNkB9El_uVRhHgcZfcw==","x_edge_response_result_type":"Error","x_edge_result_type":"Error","x_forwarded_for":"-","x_host_header":"www.example.com"} + +------s3-object-router-parser-test------
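The CloudFront handling described in the README part of this PR (field-name normalization, the synthetic `datetime` field, and key-prefix rendering) can be illustrated with a standalone Go program. This is a minimal sketch with several assumptions. `normalizeField`, `parseRow` and `routePrefix` are hypothetical helpers and are not the router package's API. Records are modeled as `map[string]interface{}`. The header is split with `strings.Fields`. The prefix is rendered with `text/template`, whereas the router itself imports `html/template`. `parseRow` rejects rows whose value count differs from the field count in either direction, so a short row returns an error instead of indexing past the end of the slice.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"text/template"
	"time"
)

// normalizeField (hypothetical helper) applies the README's renaming rules to a
// raw #Fields name: "(", ")" and "-" become "_", the name is lowercased, and
// trailing "_" characters are trimmed.
func normalizeField(raw string) string {
	f := raw
	for _, target := range []string{"(", ")", "-"} {
		f = strings.ReplaceAll(f, target, "_")
	}
	return strings.TrimRight(strings.ToLower(f), "_")
}

// parseRow (hypothetical helper) maps one tab-separated data row onto the
// normalized field names and adds a "datetime" field built from "date" and "time".
// Rows with too few or too many values are rejected with an error.
func parseRow(fields []string, row string) (map[string]interface{}, error) {
	values := strings.Split(row, "\t")
	if len(values) != len(fields) {
		return nil, fmt.Errorf("number of values (%d) does not match number of fields (%d)",
			len(values), len(fields))
	}
	rec := make(map[string]interface{}, len(fields)+1)
	for i, f := range fields {
		rec[f] = values[i]
	}
	rec["datetime"] = fmt.Sprintf("%sT%sZ", rec["date"], rec["time"])
	return rec, nil
}

// routePrefix (hypothetical helper) imitates -time-parse with -time-key datetime:
// it replaces the "datetime" string with a time.Time, then renders the prefix template.
func routePrefix(rec map[string]interface{}, layout, prefix string) (string, error) {
	s, ok := rec["datetime"].(string)
	if !ok {
		return "", fmt.Errorf("datetime is missing or not a string")
	}
	t, err := time.Parse(layout, s)
	if err != nil {
		return "", err
	}
	rec["datetime"] = t
	tmpl, err := template.New("prefix").Parse(prefix)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := tmpl.Execute(&b, rec); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	var fields []string
	for _, raw := range strings.Fields("date time x-edge-location cs(User-Agent)") {
		fields = append(fields, normalizeField(raw))
	}
	rec, err := parseRow(fields, "2019-12-04\t21:02:31\tLAX1\tcurl/7.55.1")
	if err != nil {
		panic(err)
	}
	prefix, err := routePrefix(rec, "2006-01-02T15:04:05Z07:00",
		`path/to/{{ .x_edge_location }}/{{ .datetime.Format "2006-01-02-15" }}/`)
	if err != nil {
		panic(err)
	}
	// json.Marshal sorts map keys and renders time.Time as RFC 3339.
	line, err := json.Marshal(rec)
	if err != nil {
		panic(err)
	}
	fmt.Println(prefix)       // path/to/LAX1/2019-12-04-21/
	fmt.Println(string(line)) // {"cs_user_agent":"curl/7.55.1","date":"2019-12-04","datetime":"2019-12-04T21:02:31Z","time":"21:02:31","x_edge_location":"LAX1"}
}
```

The JSON line has the same shape as the records in `testdata/cloudfront_json/example.golden`: keys are sorted and `datetime` is an RFC 3339 string. The sketch reaches that result through `encoding/json` alone; it does not claim to reproduce the router's exact code path.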