
Spark NLP Workshop

Requirements: Docker version 18.09.2 or later.
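To confirm that the installed Docker meets the minimum version stated above, a check along these lines may help. It is a minimal sketch, not part of the workshop: `version_ge` is a hypothetical helper, and it assumes your `sort` supports the `-V` (version sort) option.

```shell
#!/bin/sh
# Minimum version, taken from the requirement above.
required="18.09.2"

# Hypothetical helper: succeeds when version $1 is at least version $2.
# Assumes `sort -V` (version sort) is available on the host.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v docker >/dev/null 2>&1; then
    # Ask the Docker CLI for its client version only.
    installed="$(docker version --format '{{.Client.Version}}' 2>/dev/null)"
    if version_ge "$installed" "$required"; then
        echo "Docker $installed satisfies the $required+ requirement"
    else
        echo "Docker version ${installed:-unknown} does not satisfy $required+"
    fi
else
    echo "docker not found on PATH"
fi
```

If the version cannot be read, the script reports it as unknown rather than guessing.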

git clone https://github.com/nikolaypavlov/spark-nlp-workshop.git
cd spark-nlp-workshop

Download the News Category Dataset from Kaggle and unzip it into the data directory.
Build the container and start it.

If you have the make utility in your environment:

make build
make run

If you don't have the make utility:

docker build -t spark-nlp-workshop .
docker run -it --rm -e LANG=C.UTF-8 -e LC_ALL=C.UTF-8 -v `pwd`/data:/app/data -p 8888:8888 spark-nlp-workshop

Note: on Windows, use %cd% or the full path to the data directory instead of `pwd` when starting the container.

Open the Jupyter notebook in the browser at http://localhost:8888 and paste the session token from the terminal into the login form.
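If the token is hard to spot in the startup output, it can be pulled out of the URL that Jupyter prints. The following is a rough sketch: `extract_token` is a hypothetical helper, the sample log line is made up, and it assumes the token appears as a hexadecimal string in a `?token=` URL parameter.

```shell
# Hypothetical helper: read Jupyter startup output on stdin and print the
# first session token found in a URL of the form ...?token=<hex>.
extract_token() {
    sed -n 's/.*[?&]token=\([0-9a-f][0-9a-f]*\).*/\1/p' | head -n 1
}

# Made-up log line for illustration; a real token is longer.
printf '%s\n' 'http://127.0.0.1:8888/?token=0123abcd' | extract_token
```

With the container running, the same helper could be fed from another terminal using `docker logs <container-id> 2>&1 | extract_token`, where the container ID comes from `docker ps`.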

Open the spark-spacy.ipynb notebook and run the code blocks to check that everything works.
