Insert new logo here
😫 Working on an NLP project and tired of always looking for the same silly preprocessing functions on the web?
😥 Need to efficiently extract email addresses from a document? Hashtags from tweets? Remove accents from a French post?
Insert new lib name here got you covered! 🚀
Insert new lib name here gathers all the text preprocessing functions you need into a single library to ease your NLP project.
🔍 Quickly explore our function reference below.
Cannot find the one you need? Feel free to open an issue.
This package has been tested on Python 3.7.
To install this library, first clone the repository:

```bash
git clone git@github.com:artefactory/nautilus-nlp.git && cd nautilus-nlp/
```
We strongly advise you to do the remaining steps in a virtual environment.
First, install the required dependencies:

```bash
pip install -r requirements.txt
```

Then install the library with pip:

```bash
pip install -e .
```
This library uses spaCy as its tokenizer. Currently supported models are `en_core_web_sm` and `fr_core_news_sm`.
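spaCy models are distributed as ordinary Python packages, so you can check whether one is installed before trying to load it. A minimal helper sketch (the function name `spacy_model_available` is made up for this example; it is not part of nautilus-nlp):

```python
import importlib.util

def spacy_model_available(name: str) -> bool:
    """Return True if a spaCy model package (e.g. en_core_web_sm) is importable."""
    # spaCy models install as importable packages named after the model
    return importlib.util.find_spec(name) is not None

if spacy_model_available("en_core_web_sm"):
    import spacy
    nlp = spacy.load("en_core_web_sm")
else:
    print("Missing model; run: python -m spacy download en_core_web_sm")
```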
```python
example = "I have forwarded this email to john.doe@example.com"
# Replace every email address in the string with a placeholder token
example = replace_emails(example, replace_with="*EMAIL*")
print(example)
# "I have forwarded this email to *EMAIL*"
```
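Under the hood, such a function typically relies on a regular expression. A plain-Python sketch of the same behaviour (an illustration only, not nautilus-nlp's actual implementation; the pattern is deliberately simplified):

```python
import re

# Simplified email pattern: local part, "@", then a domain with at least one dot
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def replace_emails_sketch(text: str, replace_with: str = "*EMAIL*") -> str:
    """Replace every email-like substring in `text` with `replace_with`."""
    return EMAIL_RE.sub(replace_with, text)

print(replace_emails_sketch("Contact me at john.doe@example.com"))
# "Contact me at *EMAIL*"
```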
Insert example here
Insert example here
Insert example here
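To illustrate the kind of helpers mentioned above (hashtag extraction, accent removal), here is a plain-Python sketch; the function names are hypothetical and this is not nautilus-nlp's actual API or implementation:

```python
import re
import unicodedata

def extract_hashtags_sketch(text: str) -> list:
    """Return the hashtags found in `text` (simplified: word characters after '#')."""
    return re.findall(r"#\w+", text)

def remove_accents_sketch(text: str) -> str:
    """Strip accents by decomposing characters and dropping combining marks."""
    nfkd = unicodedata.normalize("NFKD", text)
    return "".join(c for c in nfkd if not unicodedata.combining(c))

print(extract_hashtags_sketch("Loving #NLP and #opensource!"))  # ['#NLP', '#opensource']
print(remove_accents_sketch("déjà vu"))  # "deja vu"
```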
To be updated.
In order to build the HTML Sphinx documentation, run the following at the nautilus_nlp root path:

```bash
sphinx-apidoc -f nautilus_nlp -o docs/
```

This will generate the .rst files. You can then build the documentation with:

```bash
cd docs && make html
```

You can now open the file `index.html` located in the `_build/html` folder.
To be updated.
```
├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data               <- Scripts & bits to download datasets to try nautilus
│   ├── external
│   ├── interim
│   ├── processed
│   └── raw
├── docker             <- Where to build a docker image using this lib
├── docs               <- Sphinx HTML documentation
│   ├── _build
│   │   └── html
│   └── source
├── models
├── nautilus_nlp       <- Main Nautilus package. This is where the code lives
│   ├── config
│   ├── data
│   ├── models
│   ├── preprocessing
│   ├── scripts
│   └── utils
├── notebooks          <- Various notebooks explaining how to use the Nautilus_NLP library
├── tests              <- Where the tests live
│   └── testfolder_fileloader
├── wiki               <- Where the Markdown for the Wiki lives
├── setup.py           <- Makes the project pip-installable (pip install -e .) so nautilus_nlp can be imported
└── requirements.txt   <- The requirements file for reproducing the analysis environment,
                          e.g. generated with `pip freeze > requirements.txt`
```