NLPretext

(repository: artefactory/NLPretext)

Insert new logo here

😫 Working on an NLP project and tired of always looking for the same silly preprocessing functions on the web?

😥 Need to efficiently extract email addresses from a document? Hashtags from tweets? Remove accents from a French post?

NLPretext has got you covered! 🚀

NLPretext packages into a single library all the text preprocessing functions you need to ease your NLP project.

πŸ” Quickly explore below our functions referential.

Can't find the function you need? Feel free to open an issue.

Installation

This package has been tested on Python 3.7.

To install this library you should first clone the repository:

git clone git@github.com:artefactory/nautilus-nlp.git && cd nautilus-nlp/

We strongly advise you to perform the remaining steps in a virtual environment.

First, install the required dependencies:

pip install -r requirements.txt

Then install the library with pip:

pip install -e .

This library uses spaCy as its tokenizer. Currently supported models are en_core_web_sm and fr_core_news_sm.
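These models are not installed automatically. Assuming spaCy itself was installed via the requirements step above, the supported models can be fetched with spaCy's standard download command:

```shell
# download the spaCy models used by the tokenizer
python -m spacy download en_core_web_sm   # English
python -m spacy download fr_core_news_sm  # French
```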

Functions

Replacing emails

example = "I have forwarded this email to obama@whitehouse.gov"
example = replace_emails(example, replace_with="*EMAIL*")
print(example)
# "I have forwarded this email to *EMAIL*"
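The library's actual implementation isn't shown in this README; as a rough, self-contained sketch of what such a function does (the regex below is a simplification and won't cover every valid email format):

```python
import re

def replace_emails(text, replace_with="*EMAIL*"):
    """Replace anything that looks like an email address with a token."""
    # naive pattern: local part, "@", domain with at least one dot
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", replace_with, text)

print(replace_emails("I have forwarded this email to obama@whitehouse.gov"))
# "I have forwarded this email to *EMAIL*"
```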

Replacing phone numbers

Insert example here
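The README does not yet show this example. As a placeholder, here is a minimal regex-based sketch of phone-number replacement; the function name mirrors the replace_emails naming above and is an assumption, not the library's confirmed API:

```python
import re

def replace_phone_numbers(text, replace_with="*PHONE*"):
    """Replace digit sequences that look like phone numbers with a token."""
    # naive pattern: optional "+", then 8+ digits possibly separated by
    # spaces, dots, dashes, or parentheses
    return re.sub(r"\+?\d[\d\s().-]{6,}\d", replace_with, text)

print(replace_phone_numbers("Call me at 555-123-4567 tomorrow."))
# "Call me at *PHONE* tomorrow."
```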

Removing Hashtags

Insert example here
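No example is given here yet. A self-contained sketch of hashtag removal (the function name is assumed by analogy with the other sections, not taken from the library):

```python
import re

def remove_hashtags(text):
    """Drop "#word" tokens, then collapse the leftover whitespace."""
    without_tags = re.sub(r"#\w+", "", text)
    return re.sub(r"\s+", " ", without_tags).strip()

print(remove_hashtags("Loving this library #NLP #python"))
# "Loving this library"
```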

Extracting emojis

Insert example here
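This example is also missing from the README. A minimal sketch of emoji extraction using a Unicode-range regex; the function name is an assumption, and the character ranges below only cover the main emoji blocks, so real-world emoji handling is messier:

```python
import re

# Misc Symbols / Dingbats (U+2600–U+27BF) plus the main emoji
# planes (U+1F300–U+1FAFF); not exhaustive
EMOJI_PATTERN = re.compile("[\u2600-\u27BF\U0001F300-\U0001FAFF]")

def extract_emojis(text):
    """Return the list of emoji characters found in the text."""
    return EMOJI_PATTERN.findall(text)

print(extract_emojis("Great job \U0001F680\U0001F600!"))
# ['🚀', '😀']
```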

Make HTML documentation

To be updated.

To build the HTML Sphinx documentation, run the following from the nautilus_nlp root path:

sphinx-apidoc -f nautilus_nlp -o docs/

This generates the .rst files. You can then build the documentation with:

cd docs && make html

You can now open the file index.html located in the build folder.

Project Organization


To be updated.

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data               <- Scripts & bits to download datasets to try nautilus
│   ├── external
│   ├── interim
│   ├── processed
│   └── raw
├── docker             <- Where to build a docker image using this lib
├── docs               <- Sphinx HTML documentation
│   ├── _build
│   │   └── html
│   └── source
├── models
├── nautilus_nlp       <- Main Nautilus package. This is where the code lives
│   ├── config
│   ├── data
│   ├── models
│   ├── preprocessing
│   ├── scripts
│   └── utils
├── notebooks          <- Various notebooks explaining how to use the Nautilus_NLP library
├── tests              <- Where the tests live
│   └── testfolder_fileloader
├── wiki               <- Where the Markdown for the Wiki lives
├── setup.py           <- Makes the project pip-installable (`pip install -e .`) so nautilus_nlp can be imported
└── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
                          generated with `pip freeze > requirements.txt`