This repository has been archived by the owner on Jul 16, 2020. It is now read-only.

Optionally obscure sensitive information in subset #1

Open
catherinedevlin opened this issue Oct 31, 2014 · 4 comments

Comments

@catherinedevlin
Contributor

For protecting PII and similar sensitive data. The subsetter should be able to integrate with an existing library to obscure data while preserving its overall "flavor".
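
As a rough illustration of what "preserving flavor" could mean, here is a minimal sketch that uses only the Python standard library rather than any particular obscuring library. The function name `scramble_preserving_shape` is hypothetical and is not part of rdbms-subsetter; it simply swaps each letter or digit for a random one of the same kind, so length, case pattern, and punctuation survive.

```python
import random
import string


def scramble_preserving_shape(value, rng=None):
    """Hypothetical helper: randomize letters and digits, keep everything else.

    Case, length, whitespace, and punctuation are preserved, so a phone
    number still looks like a phone number and a name still looks like a name.
    """
    rng = rng or random.Random()
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isupper():
            out.append(rng.choice(string.ascii_uppercase))
        elif ch.islower():
            out.append(rng.choice(string.ascii_lowercase))
        else:
            # Spaces, commas, dashes, etc. pass through unchanged.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(scramble_preserving_shape("Jane Doe, 555-1234"))
```

This is only character-level scrambling; a real integration would probably want values that are also plausible (real-looking names, valid dates), which is where an existing library would come in.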

@twekberg

This would be useful in my group (the Laboratory Medicine department at the University of Washington Medical Center), which has databases containing PHI (http://en.wikipedia.org/wiki/Protected_health_information). After test data is extracted from such a database, the PHI must be mangled before the data is stored in a repository.

Perhaps by specifying the PHI columns and their data types, the program could generate random data for that data type. This could work well for scalar types.
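
A minimal sketch of that idea, assuming rows are plain dicts and the PHI columns are given as a column-to-type mapping. Everything here (`GENERATORS`, `mangle_row`, the type names `"text"`, `"integer"`, `"date"`) is hypothetical and not an existing rdbms-subsetter interface.

```python
import datetime
import random
import string


def random_text(rng, length=10):
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def random_integer(rng):
    return rng.randint(0, 999999)


def random_date(rng):
    # Any date within roughly 68 years after 1950-01-01 (an arbitrary range).
    return datetime.date(1950, 1, 1) + datetime.timedelta(days=rng.randrange(25000))


# Hypothetical mapping from a declared column type to a random-value generator.
GENERATORS = {
    "text": random_text,
    "integer": random_integer,
    "date": random_date,
}


def mangle_row(row, phi_columns, rng=None):
    """Return a copy of `row` with each PHI column replaced by random data.

    `phi_columns` maps column name -> type name. NULLs (None) are left alone
    so that nullability patterns in the test data are preserved.
    """
    rng = rng or random.Random()
    mangled = dict(row)
    for column, type_name in phi_columns.items():
        if mangled.get(column) is not None:
            mangled[column] = GENERATORS[type_name](rng)
    return mangled


if __name__ == "__main__":
    row = {"id": 7, "name": "Jane Doe", "dob": datetime.date(1980, 1, 2), "mrn": 12345}
    print(mangle_row(row, {"name": "text", "dob": "date", "mrn": "integer"}))
```

Non-scalar or constrained columns (foreign keys, unique keys, free-text notes that embed names) would need more care than this sketch gives them.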

@dstufft

dstufft commented Apr 16, 2015

I would find this useful too.

@dstufft

dstufft commented Apr 16, 2015

To be specific: I'm the primary developer of Warehouse, which will replace the software that powers PyPI. One of the challenges (Warehouse being an OSS project itself) is how to create a public dataset that is representative of the real data without being the entire set of real data and without exposing anything sensitive. Currently I do this by manually copying some data and then manually sanitizing it to remove anything sensitive. It would be great to be able to rely on rdbms-subsetter to automate this for me.

@brki
Contributor

brki commented Nov 12, 2015

I agree that this would be a great feature.

For the moment, I'm first extracting a subset, then using another tool to do the anonymization.
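
For anyone wanting to try the same two-step approach, the second step can be sketched roughly as below. This assumes the subset has already been written to a SQLite database; `anonymize_columns` is a hypothetical stand-in for whatever anonymization tool is actually used, and the table and column names are assumed to be trusted, since they are interpolated into the SQL.

```python
import sqlite3


def anonymize_columns(conn, table, columns, transform):
    """Overwrite the given columns of every row with transform(old_value).

    Hypothetical post-processing step run on the already-extracted subset.
    NULLs are left as NULL. Relies on SQLite's implicit rowid, so it will
    not work on WITHOUT ROWID tables.
    """
    quoted_cols = ", ".join('"%s"' % c for c in columns)
    assignments = ", ".join('"%s" = ?' % c for c in columns)
    rows = conn.execute(
        'SELECT rowid, %s FROM "%s"' % (quoted_cols, table)
    ).fetchall()
    for rowid, *values in rows:
        new_values = [None if v is None else transform(v) for v in values]
        conn.execute(
            'UPDATE "%s" SET %s WHERE rowid = ?' % (table, assignments),
            new_values + [rowid],
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, bio TEXT)")
    conn.execute("INSERT INTO users (email, bio) VALUES ('a@example.com', 'hi')")
    anonymize_columns(conn, "users", ["email"], lambda v: "x" * len(v))
    print(conn.execute("SELECT email, bio FROM users").fetchall())
```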

catherinedevlin pushed a commit that referenced this issue Dec 13, 2015
Add `--table` argument + misc bug fixes and improvements