Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/ottl] Allow replace_all_patterns function to accept not only StringGetter interface function but also StringLikeGetter #32895

Closed
krokwen opened this issue May 7, 2024 · 6 comments

Comments

@krokwen
Copy link

krokwen commented May 7, 2024

Component(s)

pkg/ottl

What happened?

Description

Is want just to replace value in matching keys with 'redacted' string (sanitizing logs), But i can only replace it with hashsum of the replacement string. I want to use the 'String' function instead 'SHA1' to just implement this operation with one string without wasting time on hash calculation

Steps to Reproduce

using replace_all_patterns(attributes, "key", ".*token.*", "redacted", String) with transform processor crashes the collector when operation executes (nested bug: such things have to be caught on config validation step) that it's not possible to use StringLikeGetter interface function and it's have to be StringGetter interface

Expected Result

replace_all_patterns accepts StringLikeGetter interface function (String function) and just replaces all values in matching keys

Actual Result

panic: reflect.Set: value of type ottl.StringGetter[github.com/open-telemetry/opentelemetry-collector-contrib/pkg/ottl/contexts/ottllog.TransformContext] is not assignable to type ottl.StringLikeGetter[github.com/open-telemetry/opentelemetry-collector-contrib/pkg/ottl/contexts/ottllog.TransformContext]

Collector version

v0.99.0

Environment information

Environment

OS: Ubuntu 22.04.3

OpenTelemetry Collector configuration

receivers:
  filelog/nginx_access_json_logs_mywebsite:
    resource:
      service: nginx
      kind: access-log
      app_name: mywebsite
    include:
      - /var/log/nginx/mywebsite.access.log.json
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.time_local
          layout: '%d/%b/%Y:%H:%M:%S %z'
      # try to parse json request body
      - type: json_parser
        parse_from: attributes.request_body
        parse_to: attributes.request_body
        on_error: send_quiet
      # try to parse url-encoded request body
      - type: key_value_parser
        parse_from: attributes.request_body
        parse_to: attributes.request_body
        delimiter: "="
        pair_delimiter: "&"
        on_error: send_quiet
      - type: uri_parser
        parse_from: attributes.request_url
        parse_to: attributes.request_args
      - type: move
        from: attributes.request_args.path
        to: attributes.request_uri
      - type: move
        from: attributes["log.file.name"]
        to: resource.log-filename
      - type: remove
        field: attributes.time_local

processors:
  transform/nginx_json_cleanup:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - flatten(attributes["request_body"], "request_body") where IsMap(attributes["request_body"])
          # cleanup sensitive data from request body
          - replace_all_patterns(attributes["request_body"], "key", ".*token.*", "redacted", String) where IsMap(attributes["request_body"])
          # these two lines actually work:
          # - replace_all_patterns(attributes["request_body"], "key", ".*token.*", "redacted", SHA1, "redacted %s") where IsMap(attributes["request_body"])
          # - replace_all_patterns(attributes["request_body"], "value", "^redacted.*", "<redacted>") where IsMap(attributes["request_body"])
          - merge_maps(attributes, attributes["request_body"], "insert") where IsMap(attributes["request_body"])
          - delete_key(attributes, "request_body")

          # merge request_args.query into attributes
          - flatten(attributes["request_args"]["query"], "request_args")
          # cleanup sensitive data from request args
          - replace_all_patterns(attributes["request_args"]["query"], "key", ".*token.*", "redacted", String) where IsMap(attributes["request_args"]["query"])
          # these two lines actually work:
          # - replace_all_patterns(attributes["request_args"]["query"], "key", ".*token.*", "redacted", SHA1, "redacted %s") where IsMap(attributes["request_args"]["query"])
          # - replace_all_patterns(attributes["request_args"]["query"], "value", "^redacted.*", "<redacted>") where IsMap(attributes["request_args"]["query"])
          - merge_maps(attributes, attributes["request_args"]["query"], "insert")
          - delete_key(attributes, "request_args")

          # replace patterns in urls
          - replace_pattern(attributes["request_uri"], "/[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}", "/{uuid}")
          - replace_pattern(attributes["request_uri"], "/\\d+/", "/{id}/")
          - replace_pattern(attributes["request_uri"], "/\\d+$", "/{id}")
          - replace_pattern(attributes["request_uri"], "/[a-z0-9]{64}", "/{hexid64}")
          - replace_pattern(attributes["request_uri"], "/[a-z0-9]{48}", "/{hexid48}")
          - replace_pattern(attributes["request_uri"], "/[a-z0-9]{32}", "/{hexid32}")
          - replace_pattern(attributes["request_uri"], "/[a-z0-9]{24}", "/{hexid24}")
          - replace_pattern(attributes["request_uri"], "/[a-z0-9]{16}", "/{hexid16}")
          - replace_pattern(attributes["request_uri"], "^/[a-z]{2}(|-[a-z]{2})/", "/{lang}/")

          # cleanup sensitive data from request url
          - replace_pattern(attributes["request_url"], "token=[a-f0-9]{32}", "token=<redacted>")
          - replace_pattern(attributes["request_url"], "trade_token=[a-zA-Z0-9]+", "trade_token=<redacted>")

          # set compacted human-readable body
          - set(body, Concat([attributes["remote_addr"], "Status:", attributes["status"], attributes["request_method"], attributes["request_url"], attributes["request_time"]], " "))

extensions: {}


exporters:
  otlp:
    endpoint: "141.95.152.98:4317"
    tls:
      insecure: true
    compression: snappy
    retry_on_failure:
      enabled: True
      initial_interval: 5s
      max_interval: 5s
      max_elapsed_time: 12s
    sending_queue:
      queue_size: 100

service:
  telemetry:
    metrics:
      level: none
    logs:
      level: error
  pipelines:
    logs/filelog_nginx_access_json_logs:
      receivers:
        - filelog/nginx_access_json_logs_mywebsite
      processors:
        - transform/nginx_json_cleanup
      exporters:
        - otlp

Log output

otelcol-contrib[1934773]: panic: reflect.Set: value of type ottl.StringGetter[github.com/open-telemetry/opentelemetry-collector-contrib/pkg/ottl/contexts/ottllog.TransformContext] is not assignable to type ottl.StringLikeGetter[github.com/open-telemetry/opentelemetry-collector-contrib/pkg/ottl/contexts/ottllog.TransformContext]
otelcol-contrib[1934773]: goroutine 200 [running]:
otelcol-contrib[1934773]: reflect.Value.assignTo({0x75cfc60?, 0xc0021a6370?, 0x411c6c?}, {0x86859bd, 0xb}, 0x75cfe60, 0xc00202f310)
otelcol-contrib[1934773]:         reflect/value.go:3356 +0x299
otelcol-contrib[1934773]: reflect.Value.Set({0x75cfe60?, 0xc00202f310?, 0xc0019f6460?}, {0x75cfc60?, 0xc0021a6370?, 0x8?})
otelcol-contrib[1934773]:         reflect/value.go:2325 +0xe6
otelcol-contrib[1934773]: github.com/open-telemetry/opentelemetry-collector-contrib/pkg/ottl.StandardFunctionGetter[...].Get(0x7f15aa92ea38?, {0x6ccf440, 0xc0021a6370})
otelcol-contrib[1934773]:         github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/expression.go:325 +0x47c
otelcol-contrib[1934773]: github.com/open-telemetry/opentelemetry-collector-contrib/pkg/ottl/ottlfuncs.applyOptReplaceFunction[...]({0x9779280, 0xc001e45680}, {{0xc003916000, 0xc0021899d0}, {0xc001aa1f10, 0xc0021899d0}, {0xc004253db8, 0xc0021899d0}, {0xc00373bc08, 0xc0021899d4}}, ...)
otelcol-contrib[1934773]:         github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/ottlfuncs/func_replace_pattern.go:79 +0x32e
otelcol-contrib[1934773]: github.com/open-telemetry/opentelemetry-collector-contrib/pkg/ottl/ottlfuncs.replaceAllPatterns[...].func1.1({0xc0025575d0, 0xc0021899d0})
otelcol-contrib[1934773]:         github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/ottlfuncs/func_replace_all_patterns.go:85 +0x23d
otelcol-contrib[1934773]: go.opentelemetry.io/collector/pdata/pcommon.Map.Range({0xc00373bbd8?, 0xc0021899d0?}, 0xc002200e80)
otelcol-contrib[1934773]:         go.opentelemetry.io/collector/[email protected]/pcommon/map.go:222 +0x97
otelcol-contrib[1934773]: github.com/open-telemetry/opentelemetry-collector-contrib/pkg/ottl/ottlfuncs.replaceAllPatterns[...].func1({{0xc003916000, 0xc0021899d0}, {0xc001aa1f10, 0xc0021899d0}, {0xc004253db8, 0xc0021899d0}, {0xc00373bc08, 0xc0021899d4}})
otelcol-contrib[1934773]:         github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/ottlfuncs/func_replace_all_patterns.go:65 +0x3be
otelcol-contrib[1934773]: github.com/open-telemetry/opentelemetry-collector-contrib/pkg/ottl.Expr[...].Eval(...)
otelcol-contrib[1934773]:         github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/expression.go:27
otelcol-contrib[1934773]: github.com/open-telemetry/opentelemetry-collector-contrib/pkg/ottl.(*Statement[...]).Execute(0x0, {0x9779280?, 0xc001e45680}, {{0xc003916000, 0xc0021899d0}, {0xc001aa1f10, 0xc0021899d0}, {0xc004253db8, 0xc0021899d0}, {0xc00373bc08, ...}})
otelcol-contrib[1934773]:         github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/parser.go:35 +0xcc
otelcol-contrib[1934773]: github.com/open-telemetry/opentelemetry-collector-contrib/pkg/ottl.(*StatementSequence[...]).Execute(0x97271c0, {0x9779280, 0xc001e45680}, {{0xc003916000, 0xc0021899d0}, {0xc001aa1f10, 0xc0021899d0}, {0xc004253db8, 0xc0021899d0}, {0xc00373bc08, ...}})
otelcol-contrib[1934773]:         github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/parser.go:266 +0xdb
otelcol-contrib[1934773]: github.com/open-telemetry/opentelemetry-collector-contrib/processor/transformprocessor/internal/common.logStatements.ConsumeLogs({{{0xc0035b4b00, 0x16, 0x16}, {0xc00349a3f8, 0x6}, {0xc003750f00, {0x9725cc8, 0xc003466d80}, {0x9722348, 0xc00361a7b0}, ...}}}, ...)
otelcol-contrib[1934773]:         github.com/open-telemetry/opentelemetry-collector-contrib/processor/[email protected]/internal/common/logs.go:39 +0x252
otelcol-contrib[1934773]: github.com/open-telemetry/opentelemetry-collector-contrib/processor/transformprocessor/internal/logs.(*Processor).ProcessLogs(0xc001dbb080, {0x9779280, 0xc001e45680}, {0xc00373bba8?, 0xc0021899d0?})
otelcol-contrib[1934773]:         github.com/open-telemetry/opentelemetry-collector-contrib/processor/[email protected]/internal/logs/processor.go:52 +0xc4
otelcol-contrib[1934773]: go.opentelemetry.io/collector/processor/processorhelper.NewLogsProcessor.func1({0x9779280, 0xc001e45680}, {0xc00373bba8?, 0xc0021899d0?})
otelcol-contrib[1934773]:         go.opentelemetry.io/collector/[email protected]/processorhelper/logs.go:48 +0x10b
otelcol-contrib[1934773]: go.opentelemetry.io/collector/consumer.ConsumeLogsFunc.ConsumeLogs(...)
otelcol-contrib[1934773]:         go.opentelemetry.io/collector/[email protected]/logs.go:25
otelcol-contrib[1934773]: go.opentelemetry.io/collector/consumer.ConsumeLogsFunc.ConsumeLogs(...)
otelcol-contrib[1934773]:         go.opentelemetry.io/collector/[email protected]/logs.go:25
otelcol-contrib[1934773]: go.opentelemetry.io/collector/internal/fanoutconsumer.(*logsConsumer).ConsumeLogs(0xc0025d7e60, {0x9779280, 0xc001e45680}, {0xc00373bba8?, 0xc0021899d0?})
otelcol-contrib[1934773]:         go.opentelemetry.io/[email protected]/internal/fanoutconsumer/logs.go:62 +0x1e7
otelcol-contrib[1934773]: github.com/open-telemetry/opentelemetry-collector-contrib/internal/coreinternal/consumerretry.(*logsConsumer).ConsumeLogs(0xc001a4bd40, {0x9779280?, 0xc001e45680?}, {0xc00373bba8?, 0xc0021899d0?})
otelcol-contrib[1934773]:         github.com/open-telemetry/opentelemetry-collector-contrib/internal/[email protected]/consumerretry/logs.go:37 +0x1a3
otelcol-contrib[1934773]: github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/adapter.(*receiver).consumerLoop(0xc00264b440, {0x9779280, 0xc001e45680})
otelcol-contrib[1934773]:         github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/adapter/receiver.go:125 +0x1d5
otelcol-contrib[1934773]: created by github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/adapter.(*receiver).Start in goroutine 1
otelcol-contrib[1934773]:         github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/adapter/receiver.go:68 +0x256
systemd[1]: otelcol-contrib.service: Main process exited, code=exited, status=2/INVALIDARGUMENT

Additional context

I have to use the converting function in replace_all_patterns due to it's behavior - it just renaming keys if function is not specified, but with function specified it replacing value in the matching keys. See #32896

@krokwen krokwen added bug Something isn't working needs triage New item requiring triage labels May 7, 2024
Copy link
Contributor

github-actions bot commented May 7, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@krokwen krokwen changed the title Allow replace_all_patterns function to accept not only StringGetter interface function but also StringLikeGetter [pkg/ottl] Allow replace_all_patterns function to accept not only StringGetter interface function but also StringLikeGetter May 7, 2024
@TylerHelmuth
Copy link
Member

This issue highlights a downside to our current type system. While it makes it really easy for functions to enforce specific types are passed in via Getters and error appropriately, it is difficult to chain converters if the return types don't match of, or in this case where we strictly expect a StringGetter.

If we update

to be a StringLikeGetter then we have to update the hashing functions to use StringLikeGetter which isn't great either.

We need a way for StandardStringLikeGetter to implement StringGetter so it can be used in both places since it returns a string.

@TylerHelmuth
Copy link
Member

But also I would consider the way you're using function a bug. I think we should patch that once we have a real solution for #32896. In my opinion the expected outcome of using function or replacePattern in replace_all_patterns with mode=key should be to run the new key value through the function.

@TylerHelmuth TylerHelmuth added priority:p2 Medium and removed needs triage New item requiring triage labels May 7, 2024
@TylerHelmuth
Copy link
Member

@rnishtala-sumo curious on your thoughts.

Copy link
Contributor

github-actions bot commented Jul 8, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jul 8, 2024
Copy link
Contributor

github-actions bot commented Sep 6, 2024

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants