Insert new lib name here

Insert new logo here

😫 Working on an NLP project and tired of always looking for the same silly preprocessing functions on the web?

😥 Need to efficiently extract email addresses from a document? Hashtags from tweets? Remove accents from a French post?

Insert new lib name here has got you covered! 🚀

Insert new lib name here bundles into a single library all the text preprocessing functions you need to ease your NLP project.

🔍 Quickly explore our function reference below.

Replacing emails
Replacing phone numbers
Removing hashtags
Extracting emojis

Cannot find the one you need? Feel free to open an issue.

Installation

This package has been tested on Python 3.7.

To install this library you should first clone the repository:

git clone git@github.com:artefactory/nautilus-nlp.git && cd nautilus-nlp/

We strongly advise you to do the remaining steps in a virtual environment.

First install the required dependencies:

pip install -r requirements.txt

Then install the library with pip:

pip install -e .

This library uses spaCy as its tokenizer. The currently supported models are en_core_web_sm and fr_core_news_sm.

Functions

Replacing emails

example = "I have forwarded this email to obama@whitehouse.gov"
example = replace_emails(example, replace_with="*EMAIL*")
print(example)
# "I have forwarded this email to *EMAIL*"
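To give an idea of what such a replacement involves, here is a minimal standalone sketch using only Python's `re` module. It is not this library's implementation: the name `replace_emails_sketch` and the pattern are assumptions made for illustration, and the pattern is deliberately simpler than full email syntax.

```python
import re

# Assumed, simplified pattern: local part, "@", then a dotted domain.
# It does not try to cover every address allowed by the email RFCs.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def replace_emails_sketch(text: str, replace_with: str = "*EMAIL*") -> str:
    """Replace every email-like substring of `text` with `replace_with`."""
    # A function is used as the replacement so that `replace_with`
    # is inserted literally, without backslash escapes being interpreted.
    return EMAIL_PATTERN.sub(lambda _match: replace_with, text)


example = "I have forwarded this email to obama@whitehouse.gov"
print(replace_emails_sketch(example))
# I have forwarded this email to *EMAIL*
```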

Replacing phone numbers

Insert example here
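Until the library example is added, the idea can be illustrated with a standalone sketch built on Python's `re` module. Everything here is an assumption for illustration: the name `replace_phone_numbers_sketch`, the `*PHONE*` default and the pattern itself, which is intentionally loose and may also match other long digit sequences.

```python
import re

# Assumed pattern, for illustration only: an optional "+", a digit,
# at least seven more digits or separators (space, dot, dash, parentheses),
# and a closing digit.
PHONE_PATTERN = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def replace_phone_numbers_sketch(text: str, replace_with: str = "*PHONE*") -> str:
    """Replace every phone-number-like substring of `text` with `replace_with`."""
    return PHONE_PATTERN.sub(lambda _match: replace_with, text)


print(replace_phone_numbers_sketch("Call me at +33 1 23 45 67 89 tomorrow"))
# Call me at *PHONE* tomorrow
```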

Removing hashtags

Insert example here
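As a rough illustration of hashtag handling, the sketch below uses only Python's `re` module. The function names `remove_hashtags_sketch` and `extract_hashtags_sketch` are hypothetical and do not describe this library's API. A hashtag is assumed to be `#` followed by word characters.

```python
import re
from typing import List

# Assumed definition of a hashtag: "#" followed by letters, digits or underscores.
HASHTAG_PATTERN = re.compile(r"#\w+")


def extract_hashtags_sketch(text: str) -> List[str]:
    """Return all hashtags found in `text`, in order of appearance."""
    return HASHTAG_PATTERN.findall(text)


def remove_hashtags_sketch(text: str) -> str:
    """Remove hashtags from `text` and collapse the whitespace left behind."""
    without_tags = HASHTAG_PATTERN.sub("", text)
    return " ".join(without_tags.split())


tweet = "New release is out #nlp #python"
print(extract_hashtags_sketch(tweet))  # ['#nlp', '#python']
print(remove_hashtags_sketch(tweet))   # New release is out
```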

Extracting emojis

Insert example here
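Emoji extraction can be approximated with a character class over a few Unicode blocks, as in the sketch below. The function name `extract_emojis_sketch` and the chosen ranges are assumptions for illustration, not this library's implementation. The ranges cover common pictographs but not every emoji, and multi-codepoint sequences such as flags or skin-tone variants are returned piece by piece.

```python
import re
from typing import List

# Assumed, partial set of Unicode ranges that contain most common emojis.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "]"
)


def extract_emojis_sketch(text: str) -> List[str]:
    """Return every character of `text` that falls in the assumed emoji ranges."""
    return EMOJI_PATTERN.findall(text)


print(extract_emojis_sketch("Shipping today 🚀 feeling great 😀"))
```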

Make HTML documentation

To be updated

To build the Sphinx HTML documentation, run the following at the nautilus_nlp root path:

sphinx-apidoc -f nautilus_nlp -o docs/

This generates the .rst files. You can then build the documentation with:

cd docs && make html

You can now open the file index.html located in the build folder.

Project Organization

To be updated

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data               <- Scripts & bits to download datasets to try nautilus
│   ├── external
│   ├── interim
│   ├── processed
│   └── raw
├── docker             <- Where to build a docker image using this lib
├── docs               <- Sphinx HTML documentation
│   ├── _build
│   │   └── html
│   └── source
├── models
├── nautilus_nlp       <- Main Nautilus Package. This is where the code lives
│   ├── config
│   ├── data
│   ├── models
│   ├── preprocessing
│   ├── scripts
│   └── utils
├── notebooks          <- Various notebooks explaining how to use Nautilus_NLP library
├── tests              <- Where the tests live
│   └── testfolder_fileloader
├── wiki               <- Where the Markdown for the Wiki lives
├── setup.py           <- makes project pip installable (pip install -e .) so nautilus_nlp can be imported
└── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
                          generated with `pip freeze > requirements.txt`
