Anonymization

Text anonymization in many languages for Python 3.6+ using Faker.

Install

pip install anonymization

Example

Replace emails and named entities in English

This example uses NamedEntitiesAnonymizer, which requires spaCy and a spaCy model.

pip install spacy
python -m spacy download en_core_web_lg

>>> from anonymization import Anonymization, AnonymizerChain, EmailAnonymizer, NamedEntitiesAnonymizer

>>> text = "Hi John,\nthanks for you for subscribing to Superprogram, feel free to ask me any question at [email protected] \n Superprogram the best program!"
>>> anon = AnonymizerChain(Anonymization('en_US'))
>>> anon.add_anonymizers(EmailAnonymizer, NamedEntitiesAnonymizer('en_core_web_lg'))
>>> anon.anonymize(text)
'Hi Holly,\nthanks for you for subscribing to Ariel, feel free to ask me any question at [email protected] \n Ariel the best program!'
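In the output above, both occurrences of "Superprogram" become the same fake value ("Ariel"). The sketch below illustrates one way such consistent replacement could work, using only the standard library. It is not the library's implementation: `ConsistentReplacer`, its fixed pool of fake values, and the regex are assumptions made for illustration, whereas the library itself draws values from Faker.

```python
import itertools
import re


class ConsistentReplacer:
    """Maps each distinct original value to one stable replacement (hypothetical helper)."""

    def __init__(self, fake_values):
        # Cycle through a fixed pool; a real setup would draw values from Faker instead.
        self._pool = itertools.cycle(fake_values)
        self._seen = {}

    def fake_for(self, original):
        # Reuse the earlier replacement if this value was already seen.
        if original not in self._seen:
            self._seen[original] = next(self._pool)
        return self._seen[original]

    def replace(self, text, pattern):
        # Substitute every match of the pattern with its stable replacement.
        return re.sub(pattern, lambda m: self.fake_for(m.group(0)), text)


replacer = ConsistentReplacer(["Ariel", "Holly"])
text = "Thanks for subscribing to Superprogram. Superprogram, the best program!"
result = replacer.replace(text, r"\bSuperprogram\b")
print(result)
```

Keeping the mapping in one place is what lets repeated mentions of an entity stay recognizable as the same entity after anonymization.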

Or make it reversible with pseudonymize:

>>> from anonymization import Anonymization, AnonymizerChain, EmailAnonymizer, NamedEntitiesAnonymizer

>>> text = "Hi John,\nthanks for you for subscribing to Superprogram, feel free to ask me any question at [email protected] \n Superprogram the best program!"
>>> anon = AnonymizerChain(Anonymization('en_US'))
>>> anon.add_anonymizers(EmailAnonymizer, NamedEntitiesAnonymizer('en_core_web_lg'))
>>> clean_text, patch = anon.pseudonymize(text)

>>> clean_text
'Christopher, \nthanks for you for subscribing to Audrey, feel free to ask me any question at [email protected] \n Audrey the best program!'

>>> revert_text = anon.revert(clean_text, patch)

>>> print(text == revert_text)
True
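To make the idea of reversible replacement concrete, here is a minimal standard-library sketch. It is only an illustration of the principle: the README does not describe the format of the `patch` object, so the plain reverse dictionary used here, the function names, and the regex are all assumptions and should not be read as the library's actual mechanism.

```python
import re


def pseudonymize(text, pattern, fake_values):
    """Replace each distinct match and return the new text plus a reverse mapping."""
    forward = {}
    fakes = iter(fake_values)  # must hold at least one value per distinct match

    def substitute(match):
        original = match.group(0)
        if original not in forward:
            forward[original] = next(fakes)
        return forward[original]

    clean_text = re.sub(pattern, substitute, text)
    # In this sketch the reverse mapping plays the role of the "patch".
    reverse = {fake: original for original, fake in forward.items()}
    return clean_text, reverse


def revert(clean_text, reverse):
    """Restore original values; assumes the fakes do not occur elsewhere in the text."""
    if not reverse:
        return clean_text
    # Try longer fakes first so one fake cannot shadow another that contains it.
    ordered = sorted(reverse, key=len, reverse=True)
    pattern = "|".join(re.escape(fake) for fake in ordered)
    return re.sub(pattern, lambda m: reverse[m.group(0)], clean_text)


text = "Hi John, thanks for subscribing to Superprogram. Superprogram, the best program!"
clean_text, patch = pseudonymize(text, r"\b(?:John|Superprogram)\b", ["Christopher", "Audrey"])
reverted = revert(clean_text, patch)
print(clean_text)
print(reverted == text)
```

A mapping-based approach like this only round-trips safely when the fake values are unique and absent from the original text, which is one reason a real implementation may prefer a position-based patch.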

Replace a French phone number with a fake one

The library supports many languages along with their locale-specific data formats.

For example, we can replace a French phone number with a generated one:

>>> from anonymization import Anonymization, PhoneNumberAnonymizer
>>>
>>> text = "C'est bien le 0611223344 ton numéro ?"
>>> anon = Anonymization('fr_FR')
>>> phoneAnonymizer = PhoneNumberAnonymizer(anon)
>>> phoneAnonymizer.anonymize(text)
"C'est bien le 0144939332 ton numéro ?"

More examples are available in /examples.

Included anonymizers

Files

name	lang
FilePathAnonymizer	-

Internet

name	lang
EmailAnonymizer	-
UriAnonymizer	-
MacAddressAnonymizer	-
Ipv4Anonymizer	-
Ipv6Anonymizer	-

Phone numbers

name	lang
PhoneNumberAnonymizer	47+
msisdnAnonymizer	47+

Date

name	lang
DateAnonymizer	-

Other

name	lang
NamedEntitiesAnonymizer	7+
DictionaryAnonymizer	-
SignatureAnonymizer	7+

Custom anonymizers

Custom anonymizers can be easily created to fit your needs:

from anonymization import Anonymization

class CustomAnonymizer:
    def __init__(self, anonymization: Anonymization):
        self.anonymization = anonymization

    def anonymize(self, text: str) -> str:
        # Option 1: build the modified text yourself and return it
        return modified_text
        # Option 2: replace regex pattern matches in text using a faker provider
        # return self.anonymization.regex_anonymizer(text, pattern, provider)
        # Option 3: replace all occurrences of the given matches using a faker provider
        # return self.anonymization.replace_all(text, matchs, provider)

You may also add a new Faker provider with the helper Anonymization.add_provider(FakerProvider), or access the Faker instance directly through Anonymization.faker.
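As a concrete illustration of the class shape described above, the sketch below defines a custom anonymizer for ticket references. So that it runs without the package installed, it uses `StubAnonymization`, a hypothetical stand-in that mimics only the `regex_anonymizer(text, pattern, provider)` call and returns numbered placeholders rather than Faker output. The `TCK-1234` format and the `ticket_id` provider name are also assumptions; with the real `Anonymization` object, the provider would need to be one that Faker knows about, for instance one registered through `add_provider`.

```python
import re


class StubAnonymization:
    """Hypothetical stand-in for Anonymization so the sketch runs on its own."""

    def __init__(self):
        self._fakes = {}

    def regex_anonymizer(self, text, pattern, provider):
        # Replace each distinct match with a stable placeholder named after the provider.
        def substitute(match):
            original = match.group(0)
            if original not in self._fakes:
                self._fakes[original] = f"<{provider}_{len(self._fakes) + 1}>"
            return self._fakes[original]

        return re.sub(pattern, substitute, text)


class TicketIdAnonymizer:
    """Custom anonymizer for ticket references such as TCK-1234 (hypothetical format)."""

    PATTERN = re.compile(r"\bTCK-\d{4}\b")

    def __init__(self, anonymization):
        self.anonymization = anonymization

    def anonymize(self, text: str) -> str:
        # Delegate the replacement to the shared anonymization object.
        return self.anonymization.regex_anonymizer(text, self.PATTERN, "ticket_id")


anonymizer = TicketIdAnonymizer(StubAnonymization())
result = anonymizer.anonymize("See TCK-1234 and TCK-9876; TCK-1234 is urgent.")
print(result)
```

Because the anonymizer only depends on the object passed to its constructor, swapping the stub for a real `Anonymization('en_US')` instance should require no change to the class itself.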

Benchmark

This module is benchmarked on synth_dataset from presidio-research, where it reaches a higher accuracy (0.79) than Microsoft's solution (0.75).

You can run the benchmark using docker:

docker build . -f ./benchmark/dockerfile -t anonbench
docker run -it --rm --name anonbench anonbench

License

MIT
