Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

logging: add a filter for query parameters #4424

Merged
merged 11 commits into from
Nov 23, 2021

Conversation

dunglas
Copy link
Collaborator

@dunglas dunglas commented Nov 17, 2021

Even if it's considered a bad security practice, it's common that query parameters contain sensitive information. Some examples:

This PR adds the possibility to redact such sensitive information from the logs.

Copy link
Member

@francislavoie francislavoie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just a couple comments

modules/logging/filters.go Outdated Show resolved Hide resolved
modules/logging/filters.go Outdated Show resolved Hide resolved
@francislavoie francislavoie added feature ⚙️ New feature or request under review 🧐 Review is pending before merging labels Nov 17, 2021
@francislavoie francislavoie added this to the v2.5.0 milestone Nov 17, 2021
@francislavoie francislavoie changed the title feat(logging): add a filter for query parameters logging: add a filter for query parameters Nov 17, 2021
Copy link
Contributor

@wiese wiese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cha­peau bas for the idea and initiative. I just saw you mention the PR on twitter, am not really involved with caddy – feel free to ignore my comment.

Maybe that's taking the second step before the first, but my first reaction when seeing the "REDACTED" value was thinking: this is going to be confusing sooner or later. Maybe the better way would be to anonymize/pseudonymize the data (i.e. not having the same log value for all incoming values, as seen in graylog, elastic). Does that resonate with you? Would you see this as an extra feature or an alternative, indeed?

modules/logging/filters_test.go Outdated Show resolved Hide resolved
@dunglas
Copy link
Collaborator Author

dunglas commented Nov 17, 2021

Maybe the better way would be to anonymize/pseudonymize the data

@wiese What would be the use case? Other similar software seems to redact and not to anonymize:

Also, it's how the current filters (replace and delete) work.
Maybe could we add more advanced anonymization/hashing features later if there is a use case?

@wiese
Copy link
Contributor

wiese commented Nov 18, 2021

What would be the use case? Other similar software seems to redact and not to anonymize

The use case, like for the current implementation, would be making sure the original values are not contained in the logs but (for pseudonymization) with the added benefit that

  • different queries don't look alike, so as to avoid that wrong conclusions are drawn from that (e.g. two queries looking the same in the logs even when they were not, or spotting an illusive hot spot when grouping them)
  • and vice versa, in queries which originally were the same, one can be found based on another and they can be grouped (think: by user JWT)

Of course, this functionality could totally exist in addition to redacting as is and added later if there is a demand. No worries, I just wanted to run the thought in case you say "oh, that's actually closer to what I want".

@francislavoie
Copy link
Member

We could do a thing where if the replacement value is the string HASHED then it would SHA-256 hash the existing value (maybe truncated to 8 chars? I dunno) to deal with anonymizing without making different requests look alike.

Obviously it should be noted that this isn't a good idea for passwords for example, i.e. anything where brute-forcing the value would be problematic. In which case, simply redacting is better.

@dunglas
Copy link
Collaborator Author

dunglas commented Nov 18, 2021

Or maybe could we just add an extra action called hash in addition to replace and delete? It would be very easy to implement.

@dunglas
Copy link
Collaborator Author

dunglas commented Nov 18, 2021

Maybe could we add this feature in a follow-up PR? (I will do it, but it will be easier to add hash support everywhere when the pending PRs will be merged).

Copy link
Member

@mholt mholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Kevin, thanks for the contribution! I like where this is going.

Just wondering... could both replace and delete actions be reduced to a single replace action? Since deleting could be replacing with an empty string, and you wouldn't need different action types and stuff. It'd be simpler. You could then just list the replacements you want to perform on the query string.

Edit: I submitted this as "request changes" but I guess I should have just posted this as a regular comment. Sorry.

modules/logging/filters.go Outdated Show resolved Hide resolved
modules/logging/filters.go Outdated Show resolved Hide resolved
@dunglas
Copy link
Collaborator Author

dunglas commented Nov 20, 2021

Hi Matt,

This is possible but it could get messy if we add more actions such as hash. On the other hand, we could use @francislavoie's suggestion of using special strings.

Currently:

format filter {
    fields {
	uri query {
		replace foo1 REDACTED
                replace foo2 FOO2
		delete bar
		hash baz
	    }
    }
}

With your suggestion:

format filter {
     fields {
         uri query foo1 REDACTED foo2 FOO bar `` baz HASH
     }
}

I find the first one more readable but I'm ok to update the PR to use the second option if you prefer.

Also, in we implement your solution, maybe should we also deprecate the existing delete filter for consistency (using an empty string as a parameter of the replace directive also works).

WDYT?

@mholt
Copy link
Member

mholt commented Nov 22, 2021

Thanks for the extra information. That's very interesting! As discussed in Slack, I think the current approach is fine, since it allows other actions to be configured like hashing. It's also easier to read. Thanks!

Copy link
Member

@mholt mholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one thing I noticed. Looking good though.

modules/logging/filters.go Outdated Show resolved Hide resolved
francislavoie
francislavoie previously approved these changes Nov 22, 2021
Copy link
Member

@francislavoie francislavoie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

mholt
mholt previously requested changes Nov 22, 2021
Copy link
Member

@mholt mholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! LGTM except a little nit on naming / exporting one identifier.

modules/logging/filters.go Outdated Show resolved Hide resolved
modules/logging/filters.go Outdated Show resolved Hide resolved
Copy link
Member

@francislavoie francislavoie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@francislavoie francislavoie merged commit bcac2be into caddyserver:master Nov 23, 2021
@francislavoie francislavoie removed the under review 🧐 Review is pending before merging label Nov 23, 2021
@dunglas dunglas deleted the feat/query-filter branch November 23, 2021 20:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature ⚙️ New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants