-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pkg/ottl] Allow users to specify the format of the hashed replacement string in the replace_pattern* editors #27820
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
This isn't the most efficient solution, but I believe it works - merge_maps(cache, ExtractPatterns(message, "passwd\=(?P<password>[^\s]*)"), "upsert")
- set(cache["password"], SHA256(cache["password"]))
- set(cache["password"], Concat(["passwd=", cache["password"]], ""))
- replace_pattern(message, passwd\=([^\s]*), cache["password"]) I am a little worried about the complexity we're adding to the replace pattern funcs, but I am willing to discuss further. |
Outside of a defining an additional optional argument for replace_pattern, the proposed solution would only need to check for a prefix as seen in this code block With this code change the replace_pattern func would have the following arguments
A total of 5 arguments including two optional arguments. Doing this allows us to accomplish hashing with a prefix in one statement as below
IMO if we're supporting hashing with We already merged the changes that allow this today
where the hash function is optional. But if users decide to use this, they would lose the "passwd=" prefix which seems valuable. Which is where the replacement prefix comes in. |
If we're going to add an option for providing a prefix, I think we should also allow specifying a suffix. In my opinion the easiest way to handle this would be to add an option to provide a format string that handles both at once. That said, I share the concern @TylerHelmuth brought up around complexity. I wonder if there's a pattern or mechanism we could use here so this could be generalized to other functions. |
Posting comments shared via the Collector SIG meeting today: This is a valid usability concern and use case that users will experience. There is concern about the complexity and edge cases it introduces into the already complex landscape of OTTL replace functions. A couple questions I'd like to think about:
|
As a user, I think this kind of solution would be something I'd understand:
I'm not sure it's possible to implement though. 😅 |
yes I agree that the using the same approach, another way this could look like is
so the |
supporting something like this will need changes to the ottl grammar to use Converters inside a string which seems like a complex thing to do. |
I went ahead and changed the replacement prefix to be a format string in this commit and added tests for it. It includes some basic validation that the argument |
I think an optional |
yes I can make this update to the PR |
@rnishtala-sumo we should also validate as early as possible that format can only be set if function is set. I am hoping we can do that during startup and not during hot-path execution. |
From #27686 (review) by @MovieStoreGuy:
This is a pretty interesting idea that I'd like to consider because it keeps the function arguments simpler. Are there any use cases/capabilities we would be losing if we forced the optional function to only be applied on a capture group? |
It seems like a good idea, will need understand how the match group is evaluated in |
I think we should be fine with limiting the optional function to a capture group, as one mostly tries to identify sensitive data using a prefix. |
@TylerHelmuth I don't have any real-world cases to point to here, but I'm a little worried we would lose too much control over how the hash is created if we do this. For example if I'm trying to create a hash that is based on a username and client, I may want to combine the values in a certain way before hashing them so its consistent throughout my system. So I might have |
It would require a lot of work in the language, but I had an idea that would allow us to do this in a way that I think would provide a consistent user experience at the cost of being more technically sophisticated. If we change the optional function parameter of the replace_pattern functions to be a list of functions that are applied in sequence, and add support for currying with arbitrary parameters to OTTL (indicated by a character like
So the resulting set of calls to get the replacement string would look something like
This would require adding support for curried functions in OTTL. So any function call with an I think an advantage of this approach is that it would let us format the string as we wish, support more sophisticated replacements because of the list of functions, and wouldn't require us to have an optional argument that depends on another. Support for currying would also allow users to massage existing OTTL functions into new ones that have similar signatures. On the other hand, providing a |
To summarize here, I think we're considering the following options
At this point, I'm leaning towards the first option for its simplicity/UX. We could also potentially hash a multiple capture group combo if the user enters something like |
@TylerHelmuth @evan-bradley this is a draft PR that uses option 1. It doesn't cover multiple capture groups yet, but all the existing test cases pass and it supports adding a prefix to a hashed capture group. |
The following PR now uses Option 1 and is marked ready for review. New tests have been added for this and all the existing test cases pass too. Open to discussing other options. IMO this option offers UX consistent with masking. |
Hi all, I was asked by @TylerHelmuth to post my use-case after messaging in the otelcol slack. I am trying to rename all datapoint attributes from CamelCase to snake_case. My two approaches and neither one works. - context: datapoint
statements:
- replace_all_patterns(attributes, "key", "(.*)", ConvertCase("$$1", "snake")) This doesn't work because I think the $$1 is substituted after the ConvertCase function has already executed. - context: datapoint
statements:
- replace_all_patterns(attributes, "key", "(.*)", "$$1", ConvertCase)) This doesn't work because ConvertCase has multiple parameters, not just one. I believe we would need something like a partial function/currying in order to use this here. Thanks! |
Referencing the syntax explained here by @evan-bradley - #27820 (comment) This is what Function currying could potentially look like in the use cases below -
There is a caveat here in that, the first function used must only accept one input argument (SHA256, NoOp). For example, using the above approach one cannot use |
One small correction, after reading up on partial function application and currying, it turns out we're actually talking about partial function application here: taking a function of a given arity and returning a function of lesser arity by "fixing" some of the arguments. Currying is similar in concept but would technically produce a series of functions that still take all the arguments. Apparently this a common mistake, my bad.
@rnishtala-sumo This would actually be okay. The optional function passed to So the following would be possible:
Again, this is mostly to allow us to more concisely create single-use functions without requiring users to do something like this in their Collector config.
|
Closed by #30837. Any future |
Component(s)
pkg/ottl
Is your feature request related to a problem? Please describe.
This function/editor would be used to replace substrings (identified with a regex pattern) with a hash value/digest
Let's assume we have the following log message
"username = user, passwd = password"
To replace the password with its hash value, one would do the following
replace_pattern(message, passwd\=([^\s]*), "$$1", SHA256)
However the above ottl expression would replace the entire password substring with the hash value of "password" like so
"username = user, <hash digest of "password">"
However the end user might need it to be something like below:
"username = user, passwd = <hash digest of "password">"
Note the "passwd=" prefix
This isn't a problem with masking because one could do something like
replace_pattern(message, passwd\=([^\s]*), "passwd=*****")
This includes the prefix "passwd=" in the replacement string. This isn't viable for hashing since we hash the entire replacement string including any prefix.
Describe the solution you'd like
Proposing that we add another optional argument, which allows the end-user to specify a format string for the replacement
replace_pattern(message, passwd\=([^\s]*), "$$1", "passwd=%s", SHA256)
which results in adding a prefix to the hashed password
"username = user, passwd = <hash digest of "password">"
Describe alternatives you've considered
No response
Additional context
No response
The text was updated successfully, but these errors were encountered: