Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved PII stripping. #1

Merged
merged 4 commits into from
Nov 2, 2022

Conversation

antonpirker
Copy link

I have now created a function, that does nicer PII stripping, where most of the keys are still there, but the values are gone.

@Agalin
Copy link
Collaborator

Agalin commented Oct 28, 2022

TBH I'm not sure if it's safe to do it like that - those keys depend on the protocol version used by mongo client and can change in the future so PII may be leaked.

Need to check how does it look with older queries but probably won't be able to do it this week.

@antonpirker
Copy link
Author

Good point! I added this, because when we redact all the values then the resulting query does not show anything, and the developer can not figure out what query is the slow one. See this screenshot:

Screenshot 2022-10-28 at 16 18 50

Having more information is very helpful:

Screenshot 2022-10-28 at 16 20 38

But you are right about old versions. It could leak PII and that is not good.

So we could:
a) go back to redacting everything.
b) only keep the first element (to have at least the collection on that the query is performed)
c) go through some older version of MongoDB (so maybe the last 2 major version or such) and make proper stripping for them and limit the SDK to those versions.

(side note: this code will come in handy if we do a Elasticsearch integration in the future, because those queries are also json)

@Agalin
Copy link
Collaborator

Agalin commented Oct 31, 2022

I'm worried about future ones, not the past. Monitoring API was only introduced in 3.1 so there is not much to check in that part. And bounding upper version is a really bad idea.

There is also an option of adding an optional callback but with it being a kind of an integration that should enable itself by default (in the future) it's not a great solution either.

TBH maybe just adding a warning in docs "PII stripping tested up to Mongo x.y.z" is enough?

Copy link
Collaborator

@Agalin Agalin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also rerun your test code (that one spans from which you've included) with pymongo 3.1?

sentry_sdk/integrations/pymongo.py Outdated Show resolved Hide resolved
sentry_sdk/integrations/pymongo.py Show resolved Hide resolved
@antonpirker
Copy link
Author

TBH maybe just adding a warning in docs "PII stripping tested up to Mongo x.y.z" is enough?

Yes, I think this should be enough.
I have now changed the stripping code to make it easier to read and comprehend. And I guess it is good enough. If new fields are added to the command they will be stripped by default. One will have to add them to SAFE_COMMAND_ATTRIBUTES or to one of the special stripping ifs [1] for them not to be stripped away in their entirety.

1: https://github.com/antonpirker/sentry-python/blob/pymonog-strip-pii/sentry_sdk/integrations/pymongo.py#L58-L79

@antonpirker
Copy link
Author

antonpirker commented Nov 2, 2022

Could you please merge this into your branch @Agalin
We then can continue in your PR. (Fixing the broken tests in Python 2.7 and so on)
Thanks!

@Agalin Agalin merged commit 116c40a into operasoftware:pymongo Nov 2, 2022
@Agalin
Copy link
Collaborator

Agalin commented Nov 2, 2022

Yeah, LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants