Insert new logo here
Working on an NLP project and tired of always looking for the same silly preprocessing functions on the web?

Need to efficiently extract email addresses from a document? Hashtags from tweets? Remove accents from a French post?

Nautilus NLP has you covered!

Nautilus NLP gathers in a single library all the text preprocessing functions you need to ease your NLP project.

Quickly explore our function reference below.

Can't find the function you need? Feel free to open an issue.
This package has been tested on Python 3.7.
To install this library, first clone the repository:

```shell
git clone git@github.com:artefactory/nautilus-nlp.git && cd nautilus-nlp/
```
We strongly advise you to do the remaining steps in a virtual environment.
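For example, with Python's standard `venv` module (the environment name `nautilus-env` is just an illustration):

```shell
# Create an isolated environment for the project
python3 -m venv nautilus-env

# Activate it (on Windows, run nautilus-env\Scripts\activate instead)
. nautilus-env/bin/activate
```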
First, install the required packages:

```shell
pip install -r requirements.txt
```

Then install the library with pip:

```shell
pip install -e .
```
This library uses spaCy as its tokenizer. Currently supported models are `en_core_web_sm` and `fr_core_news_sm`.
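If these models are not already installed, they can be fetched with spaCy's standard download command (this is plain spaCy usage, not a Nautilus-specific step):

```shell
# Download the supported spaCy models (requires spacy to be installed)
python -m spacy download en_core_web_sm
python -m spacy download fr_core_news_sm
```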
```python
example = "I have forwarded this email to obama@whitehouse.gov"
# Pass the text to process as the first argument
example = replace_emails(example, replace_with="*EMAIL*")
print(example)
# "I have forwarded this email to *EMAIL*"
```
Insert example here
Insert example here
Insert example here
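Until the examples above are filled in, here is a plain-Python sketch of the kind of helpers involved, covering hashtag extraction and accent removal with only the standard library; the function names are illustrative, not the actual Nautilus NLP API:

```python
import re
import unicodedata

def extract_hashtags(text):
    # Collect '#' tokens followed by word characters
    return re.findall(r"#\w+", text)

def remove_accents(text):
    # Decompose accented characters, then drop the combining marks
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(char for char in decomposed if not unicodedata.combining(char))

print(extract_hashtags("Loving this #NLP library! #opensource"))
# ['#NLP', '#opensource']
print(remove_accents("Où est passé l'été ?"))
# Ou est passe l'ete ?
```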
To be updated.
In order to build the HTML Sphinx documentation, you need to run the following at the nautilus_nlp root path:

```shell
sphinx-apidoc -f nautilus_nlp -o docs/
```
This will generate the .rst files.
You can then generate the documentation with:

```shell
cd docs && make html
```

You can now open the file index.html located in the _build/html folder.
To be updated.
```
├── LICENSE
├── Makefile             <- Makefile with commands like `make data` or `make train`
├── README.md            <- The top-level README for developers using this project.
├── data                 <- Scripts & bits to download datasets to try nautilus
│   ├── external
│   ├── interim
│   ├── processed
│   └── raw
├── docker               <- Where to build a docker image using this lib
├── docs                 <- Sphinx HTML documentation
│   ├── _build
│   │   └── html
│   └── source
├── models
├── nautilus_nlp         <- Main Nautilus package. This is where the code lives.
│   ├── config
│   ├── data
│   ├── models
│   ├── preprocessing
│   ├── scripts
│   └── utils
├── notebooks            <- Various notebooks explaining how to use the Nautilus NLP library
├── tests                <- Where the tests live
│   └── testfolder_fileloader
├── wiki                 <- Where the Markdown for the Wiki lives
├── setup.py             <- Makes the project pip installable (pip install -e .) so nautilus_nlp can be imported
└── requirements.txt     <- The requirements file for reproducing the analysis environment,
                            e.g. generated with `pip freeze > requirements.txt`
```