Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

contrib/go-redis/redis.v8: Allow Option To Remove Raw Tag #1022

Closed
sjmtan opened this issue Sep 29, 2021 · 16 comments · Fixed by #1091
Closed

contrib/go-redis/redis.v8: Allow Option To Remove Raw Tag #1022

sjmtan opened this issue Sep 29, 2021 · 16 comments · Fixed by #1091

Comments

@sjmtan
Copy link
Contributor

sjmtan commented Sep 29, 2021

I believe there's a case where if the command is sufficiently large, this will cause serious memory issues on the client. Not only that, I believe this should be optional because in certain cases, this could be a security issue. Therefore, we should allow an option to remove the raw tag.

To refresh, see snippet from the contrib hook. See the raw_command part.

	raw := cmd.String()
	length := strings.Count(raw, " ")
	p := ddh.params
	opts := make([]ddtrace.StartSpanOption, 0, 5+len(ddh.additionalTags)+1) // 5 options below + for additionalTags + for analyticsRate
	opts = append(opts,
		tracer.SpanType(ext.SpanTypeRedis),
		tracer.ServiceName(p.config.serviceName),
		tracer.ResourceName(raw[:strings.IndexByte(raw, ' ')]),
		tracer.Tag("redis.raw_command", raw),
		tracer.Tag("redis.args_length", strconv.Itoa(length)),
	)

The implementation of cmd.String() (I believe) boils down to this. You can see that it contains args.

func cmdString(cmd Cmder, val interface{}) string {
	b := make([]byte, 0, 64)

	for i, arg := range cmd.Args() {
		if i > 0 {
			b = append(b, ' ')
		}
		b = internal.AppendArg(b, arg)
	}

	if err := cmd.Err(); err != nil {
		b = append(b, ": "...)
		b = append(b, err.Error()...)
	} else if val != nil {
		b = append(b, ": "...)
		b = internal.AppendArg(b, val)
	}

	return internal.String(b)
}

Interestingly, there's been a similar feature request and subsequent revert for memcache:

We've been able to isolate the problem to tracing by removing Redis tracing from the codepath that OOM'ed our servers without issue. But we haven't been able to dive into it in a more granular fashion than that. Happy to be told I'm wrong.

@sjmtan
Copy link
Contributor Author

sjmtan commented Oct 7, 2021

@knusbaum, @felixge - only raising because this actually could produce security issues for customers.

@gbbr
Copy link
Contributor

gbbr commented Nov 5, 2021

Sorry for the late reply. This is fine. Thanks for opening the PR.

@gbbr
Copy link
Contributor

gbbr commented Nov 5, 2021

Can you explain in more detail how this causes problems? The agent quantizes and obfuscates these values, so ultimately a command will look exactly like this (including the ellipsis): SET .... Everything is truncated in the resource, and the redis.raw_command tag is obfuscated to eliminate all sensitive information. Is this not the case for you? What is the security issue you are seeing?

Regarding long commands being a problem: the only place where a "long" command would travel is locally between the tracing client (app) and the agent. Is that an issue? Or, is the problem memory usage in the application because of this string? Can you share a snippet of code? It might be that we can obtain the span from context and change it's raw command or resource today, programmatically, without introducing new options and having you wait for a new release.

Regardless, it is fine if you want to skip the setting of this tag by means of an option, but I don't think we should skip the resource name. If we do that, everything will just be called redis.command and I'm unsure how useful that will be for you when looking at data.

@gbbr
Copy link
Contributor

gbbr commented Nov 8, 2021

From @shannontan-bolt in #1026:

The real issue for us was that we were trying to save an object of data in a Redis cache and that meant this large blob was putting put as a tag with tracing.

Let's solve this issue then, if this is the real issue. First, I'd like to understand what about it is the issue:

  • Is it your concern about sensitive data? In which case, it shouldn't be because this blob should get obfuscated by the agent. You should've already noticed if you looked at this data in Datadog. Is this not the case? If not, that's a bug in the agent and we can fix it after some debugging. Basically, a command like SET key 123 should show up as SET key ? in the UI.
  • Is your concern that this tag name is too big? Does it cause any performance problems like memory usage?

Let's better define the issue and then we can find the right solution.

@sjmtan
Copy link
Contributor Author

sjmtan commented Nov 8, 2021

From @shannontan-bolt in #1026:

The real issue for us was that we were trying to save an object of data in a Redis cache and that meant this large blob was putting put as a tag with tracing.

Let's solve this issue then, if this is the real issue. First, I'd like to understand what about it is the issue:

  • Is it your concern about sensitive data, in which case, it shouldn't be because this blob should get obfuscated by the agent. You should've already noticed if you looked at this data in Datadog. Is this not the case? If not, that's a bug in the agent and we can fix it after some debugging.

  • Is your concern that this tag name is too big? Does it cause any performance problems like memory usage?

Let's better define the issue and then we can find the right solution.

Both are/were issues.

I'm on my phone, apologies for the terseness. To be clear, if it is obfuscated for resource name, that's fine. I'm not sure how it would intelligently obfuscate based on the raw command, but if it does great.

  1. The current release took down multiple instances with tracing on, and there were no memory issues once we disabled this specifically redis tracing.

  2. I saw the presence of IP addresses within the Datadog UI because we use Redis for rate limiting by IP. I can imagine that if we were to rate limit by more sensitive data, this would be problematic.

So both are very real issues to me, and I've both disabled the tracing library and filtered out ingested spans to avoid any security issues. I only included the resource name change because it uses the same full command whereas I believe the operation name just uses the type of Redis command.

@sjmtan
Copy link
Contributor Author

sjmtan commented Nov 8, 2021

I didn't get notified for your comment a few days ago.

Within the UI, I suppose it is possible that the resource name is fine with the raw command, but I'm not certain now since I've had to disable the Redis trace. Maybe it was Redis.raw_command that was the source of both memory issues and presenting sensitive data.

I also can't provide a snippet but I would believe that a sufficiently large JSON should trigger this issue. Our code caches responses of variable size from an HTTP call we make to a vendor, so not really much to share other than these fit just fine within our Redis instance.

@gbbr
Copy link
Contributor

gbbr commented Nov 8, 2021

Ok. Thanks for explaining. I am also sorry for taking so long to respond. I can assure you it was not intentional.

I'd really appreciate it if we could try a few things first so we can clearly define the issue before we start thinking about the right solution.

The resource name should be quantized in the Datadog Agent, basically if you have something like SET key value, the UI will only show SET .... Regardless of what the command is, you will only see the command name, followed by ellipsis (...). This works by default.

However, for the raw command (I was wrong here), it only gets obfuscated if this setting is enabled in the Datadog Agent. The config is:

apm_config:
  obfuscation:
    redis:
      enabled: true

Can you give this a try? Unfortunately this setting can not be applied via an env. var. right now, so I'm hoping you can alter the datadog.yaml setting of your Datadog Agent. If not, and you need an env. var., we could potentially include it into the next release which goes into RC testing next week.

That should resolve the security issue.

The current release took down multiple instances with tracing on, and there were no memory issues once we disabled this specifically redis tracing.

This is the concerning part that I'd like to better understand. I can only imagine this could've been due to memory usage. In which case, we could introduce something like you already have in your PR: DisableRawCommand setting which causes that tag to not be set, to resolve the redis.raw_command issue.

As for the resource, we could try something like WithResourceNamer, where there would be a callback intercepting the command and allowing it to be renamed before it gets assigned as a resource. There, you could do something like (naive example): strings.Split(cmd, " ")[0], getting only the command name. This will still allow you to get separation between various commands and also skip the long parts.

WDYT?

@sjmtan
Copy link
Contributor Author

sjmtan commented Nov 8, 2021

@gbbr, I can't test the resource name part without the change to the DisableRawCommand, at least not in any live env. I suppose I can make it work in a testing/local env but that requires work, so first I'd like to ask the following:

tracer.ResourceName(raw[:strings.IndexByte(raw, ' ')]), is already in the code snippet I linked above and is unchanged in my PR. Why do these two have different behaviors? They seem to update the same tag too.

@gbbr
Copy link
Contributor

gbbr commented Nov 9, 2021

@gbbr, I can't test the resource name part without the change to the DisableRawCommand, at least not in any live env. I suppose I can make it work in a testing/local env but that requires work, so first I'd like to ask the following:

What I'm proposing is that your PR adds two one options:

  • DisableRawCommand which completely removes the setting of redis.raw_command
  • WithResourceNamer which takes a callback allowing to modify the resource that gets set, before it is set. Similar to the ones we already have in our other integrations (you can get inspired from any of them). Sorry, I just realised what you said. It seems like you don't need this, because only the first token (up to the first space) is taken.

I suspect this would solve the issue for you. What do you think?

Why do these two have different behaviors? They seem to update the same tag too.

They evolved differently as Datadog grew and developed as a product. We use the resource name as an aggregate as you may have already seen. To reduce cardinality and to make more sense, it was quantized by truncating everything after the command and adding the ellipsis (...). Later on, the need came for the raw command to be obfuscated due to security reasons, so we introduced that too, along with the intention to reveal more than just the command itself and only hide user data. This behaviour was not reproduced in the resource name because it would be considered a breaking change and we would have risked breaking time series and stats continuity for existing customers using these integrations. I hope this makes sense.

@gbbr
Copy link
Contributor

gbbr commented Nov 9, 2021

Sorry if my previous answer is confusing, to summarise: I'm proposing to change your PR to only remove the raw command, but leave the resource as it is. Would that be fine for you?

@sjmtan
Copy link
Contributor Author

sjmtan commented Nov 26, 2021

This is still confusing :)

ResourceName tag appears to be set twice.

  1. https://github.com/DataDog/dd-trace-go/blob/v1/contrib/go-redis/redis.v8/redis.go#L141
  2. https://github.com/DataDog/dd-trace-go/blob/v1/contrib/go-redis/redis.v8/redis.go#L144

tracer.ResourceName is:

// ResourceName sets the given resource name on the started span. A resource could
// be an SQL query, a URL, an RPC method or something else.
func ResourceName(name string) StartSpanOption {
	return Tag(ext.ResourceName, name)
}

I think my end state here would be:

  1. Only include the resource set in L141 since that only includes up to the first space. L144 already doesn't make sense as is.
  2. Have an option to exclude the raw command

@sjmtan
Copy link
Contributor Author

sjmtan commented Dec 8, 2021

@gbbr, I reread your comment - maybe I'm confused because you edited your comment. Are you saying that we should still remove the second ResourceName set completely and use the flag as originally proposed on whether to send the raw command?

@gbbr
Copy link
Contributor

gbbr commented Dec 9, 2021

ResourceName tag appears to be set twice.

Oh my! I think we've likely missed this and it is most definitely a (dangerous) error! I agree with your end state proposal. I think we're on the same page. To reiterate, what I'm understanding:

  • We remove L144
  • We add DisableRawCommand

IIUC, let's do it 🚀

@sjmtan
Copy link
Contributor Author

sjmtan commented Dec 10, 2021

I agree - would it be possible if you could put the PR together or edit my PR? I still find it a bit of struggle to get my local system working and it keeps insisting on a ton of changes to go.mod/sum + not running tests. I'm not sure why but don't quite have the time to address it whereas I'm sure it is trivial for you.

gbbr added a commit that referenced this issue Dec 14, 2021
…urce

This change adds a new option called `WithSkippedRawCommand` which
determines the instrumentation code to skip adding the
`redis.raw_command` tag to spans. This is useful in cases when the
Datadog Agent Redis obfuscation is disabled (which it is, by default)
and commands may contain sensitive information.

Additionally, duplicate entries and an incosistency was found when
setting the resource name for Redis spans, which was fixed as part of
this PR.

Closes #1022
@gbbr
Copy link
Contributor

gbbr commented Dec 14, 2021

@shannontan-bolt please review it: #1091

gbbr added a commit that referenced this issue Dec 15, 2021
…urce

This change adds a new option called `WithSkippedRawCommand` which
determines the instrumentation code to skip adding the
`redis.raw_command` tag to spans. This is useful in cases when the
Datadog Agent Redis obfuscation is disabled (which it is, by default)
and commands may contain sensitive information.

Additionally, duplicate entries and an incosistency was found when
setting the resource name for Redis spans, which was fixed as part of
this PR.

Closes #1022
gbbr added a commit that referenced this issue Dec 16, 2021
…urce (#1091)

This change adds a new option called `WithSkippedRawCommand` which
determines the instrumentation code to skip adding the
`redis.raw_command` tag to spans. This is useful in cases when the
Datadog Agent Redis obfuscation is disabled (which it is, by default)
and commands may contain sensitive information.

Additionally, duplicate entries and an incosistency was found when
setting the resource name for Redis spans, which was fixed as part of
this PR.

Closes #1022
@zball
Copy link

zball commented Feb 8, 2023

@gbbr per this comment above: #1022 (comment)

Unfortunately this setting can not be applied via an env. var. right now, so I'm hoping you can alter the datadog.yaml setting of your Datadog Agent. If not, and you need an env. var., we could potentially include it into the next release which goes into RC testing next week.

Has an env var been added yet for this? I haven't yet been able to find one. If it has been added is it possible to point me in the direction for it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants