Optionally obscure sensitive information in subset #1

catherinedevlin · 2014-10-31T03:30:18Z

For protecting PII and similar data. The tool should be able to integrate with an existing library to obscure data while preserving its overall "flavor".

twekberg · 2015-01-14T16:01:08Z

This would be useful in my group (Laboratory Medicine department in the University of Washington Medical Center), which has databases containing PHI (http://en.wikipedia.org/wiki/Protected_health_information). After test data is extracted from such a database, the PHI must be mangled before it is stored in a repository.

Perhaps by specifying the PHI columns and their data types, the program could generate random data for that data type. This could work well for scalar types.
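The idea above could be sketched roughly as follows. This is only an illustration, not anything rdbms-subsetter provides: the `PHI_COLUMNS` mapping, the type labels, and the `random_value` and `mangle_row` helpers are all hypothetical names, and rows are assumed to be plain dicts.

```python
import datetime
import random
import string

# Hypothetical mapping of PHI column names to coarse data types.
# Neither the column names nor the type labels come from rdbms-subsetter.
PHI_COLUMNS = {
    "patient_name": "text",
    "birth_date": "date",
    "mrn": "integer",
}


def random_value(data_type, rng):
    """Return a random replacement value for one scalar data type."""
    if data_type == "text":
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(10))
    if data_type == "integer":
        return rng.randint(0, 999999)
    if data_type == "date":
        start = datetime.date(1900, 1, 1)
        return start + datetime.timedelta(days=rng.randint(0, 36500))
    raise ValueError("no generator for data type %r" % data_type)


def mangle_row(row, phi_columns, rng):
    """Copy a row dict, replacing non-NULL PHI columns with random data."""
    mangled = dict(row)
    for column, data_type in phi_columns.items():
        # Leave NULLs (and absent columns) alone so nullability is preserved.
        if mangled.get(column) is not None:
            mangled[column] = random_value(data_type, rng)
    return mangled
```

Columns not listed in the mapping pass through untouched. Purely random values like these would break uniqueness constraints or foreign keys if applied to key columns, so the sketch only suits non-key scalar columns.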

dstufft · 2015-04-16T19:43:19Z

I would find this useful too.

dstufft · 2015-04-16T20:40:21Z

To be specific: I'm the primary developer of Warehouse, which will replace the software that powers PyPI. One of the challenges (Warehouse being an OSS project itself) is how to create a public dataset that is representative of the real data without being the entire set of real data and without exposing anything sensitive. Currently I do this by manually copying some data and then going in and manually sanitizing it to remove anything sensitive. It would be great to be able to rely on rdbms-subsetter to automate this for me.

brki · 2015-11-12T10:04:04Z

I agree that this would be a great feature.

For the moment, I'm first extracting a subset, then using another tool to do the anonymization.
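A two-step workflow like this might look roughly like the sketch below, assuming the subset has already been written to a SQLite database. The `scramble` and `anonymize_column` helpers are hypothetical, not part of rdbms-subsetter or of any particular anonymization tool. Each letter or digit is swapped for a random one of the same kind, so values keep their length and punctuation.

```python
import random
import sqlite3
import string


def scramble(value, rng):
    """Swap each letter or digit for a random one of the same kind,
    keeping length, case, and punctuation."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isupper():
            out.append(rng.choice(string.ascii_uppercase))
        elif ch.islower():
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)
    return "".join(out)


def anonymize_column(conn, table, column, rng):
    """Overwrite every non-NULL value in table.column with a scrambled copy.

    The table and column names are interpolated into the SQL text, so they
    must come from trusted configuration, never from user input.
    """
    select_sql = 'SELECT rowid, "{0}" FROM "{1}" WHERE "{0}" IS NOT NULL'
    update_sql = 'UPDATE "{1}" SET "{0}" = ? WHERE rowid = ?'
    rows = conn.execute(select_sql.format(column, table)).fetchall()
    for rowid, value in rows:
        conn.execute(
            update_sql.format(column, table),
            (scramble(str(value), rng), rowid),
        )
    conn.commit()
```

For example, `anonymize_column(conn, "users", "email", random.Random())` would rewrite a hypothetical `users.email` column in place. The sketch relies on SQLite's implicit `rowid`, so it would not work on `WITHOUT ROWID` tables, and scrambled values can collide, which matters for columns with a unique constraint.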

catherinedevlin added the enhancement label Oct 31, 2014

catherinedevlin pushed a commit that referenced this issue Dec 13, 2015

Merge pull request #1 from birdonfire/table_arg_and_schema_support

ecb1549

Add `--table` argument + misc bug fixes and improvements
