scrubadub_stanford

scrubadub removes personally identifiable information (PII) from text. scrubadub_stanford is an extension that adds detectors built on Stanford's NER models, so that names, locations and organizations can be removed as well.

This package provides three detectors, each of which interfaces with Stanford's NER models in a different way:

scrubadub_stanford.detectors.StanfordEntityDetector - A detector that uses the Stanford NER model to find locations, names and organizations. Download size circa 250MB.
scrubadub_stanford.detectors.CoreNlpEntityDetector - The same interface as the StanfordEntityDetector, but using Stanza's CoreNLPClient to interface with the CoreNLP Java Server. Download size circa 510MB.
scrubadub_stanford.detectors.StanzaEntityDetector - Similar to the above but using Stanza's native Python pipelines. Download size circa 210MB. No Java required. This is the recommended detector for speed and footprint.
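As a rough illustration of how one of the detectors listed above might be attached to a scrubber, here is a minimal sketch. It assumes that both scrubadub and scrubadub_stanford are installed, that the three detector classes are reachable as attributes of scrubadub_stanford.detectors (as their names above suggest), and that Scrubber.add_detector accepts a detector class. The helper build_scrubber and the constant DETECTOR_NAMES are hypothetical names introduced only for this example.

```python
import importlib

# Detector class names as listed in this README.
DETECTOR_NAMES = (
    "StanfordEntityDetector",
    "CoreNlpEntityDetector",
    "StanzaEntityDetector",
)


def build_scrubber(detector_name="StanzaEntityDetector"):
    """Return a scrubadub Scrubber with one scrubadub_stanford detector added.

    build_scrubber is a hypothetical helper, not part of either package.
    """
    if detector_name not in DETECTOR_NAMES:
        raise ValueError(f"unknown detector: {detector_name!r}")

    # Imported lazily so this module can be loaded before the packages
    # (and, for two of the detectors, Java) are installed.
    scrubadub = importlib.import_module("scrubadub")
    detectors = importlib.import_module("scrubadub_stanford.detectors")

    scrubber = scrubadub.Scrubber()
    # Assumption: add_detector accepts a detector class.
    scrubber.add_detector(getattr(detectors, detector_name))
    return scrubber
```

With the packages and models available, something like build_scrubber("StanzaEntityDetector").clean("My name is John") would then return the text with detected entities replaced. The first call may trigger the model download mentioned above.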

Prerequisites

StanfordEntityDetector and CoreNlpEntityDetector require Java Runtime Environment 8 or later. Check which version is installed by running:

$ java -version

The reported version should be at least 1.8 (that is, Java 8). If it is lower, or if Java is not installed, run the following commands:

Linux (Debian or Ubuntu, using apt):

$ sudo apt update
$ sudo apt install openjdk-8-jre

macOS (using Homebrew):

$ brew tap adoptopenjdk/openjdk
$ brew install adoptopenjdk8-jre
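The Java version check described above can also be scripted, for example before choosing a detector at runtime. The following is a minimal sketch using only the Python standard library. The function names parse_java_major and installed_java_major are hypothetical and not part of scrubadub_stanford. It accounts for the fact that older releases report themselves as "1.8" while newer ones report "11", "17" and so on.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output.

    Handles the legacy scheme ("1.8.0_292" means Java 8) and the
    modern one ("11.0.2" means Java 11). Returns None if no version
    string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    minor = match.group(2)
    if major == 1 and minor is not None:
        # Legacy numbering: the real major version is the second field.
        return int(minor)
    return major


def installed_java_major():
    """Run `java -version` and return the major version, or None."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        # No `java` executable on the PATH.
        return None
    # `java -version` writes its report to stderr, so check both streams.
    return parse_java_major(result.stderr + result.stdout)
```

A caller could then treat a result of None or anything below 8 as a sign that only StanzaEntityDetector is usable on the current machine.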

For more information on how to use this package see the scrubadub stanford documentation and the scrubadub repository.

New maintainers

LeapBeyond are excited to be supporting scrubadub with ongoing maintenance and development. Thanks to all of the contributors who made this package a success, but especially @deanmalmgren, IDEO and Datascope.

