Merge pull request #163 from artefactory/fix/docker_and_publish
[FIX] Removed direct dependency and changed docker registry
Cedric-Magnan authored Sep 16, 2021
2 parents: 6426e2b + adb8f84; commit c538a8a
Showing 8 changed files with 20 additions and 74 deletions.
.github/workflows/cd.yml: 10 changes (5 additions, 5 deletions)
@@ -37,16 +37,16 @@ jobs:
           file: ./docker/Dockerfile
           push: true
           tags: |
-            ghcr.io/artefactory/NLPretext:${{ steps.tag.outputs.tag_name }}
-            ghcr.io/artefactory/NLPretext:latest
-          cache-from: type=registry,ref=ghcr.io/artefactory/NLPretext:latest
+            ghcr.io/artefactory/nlpretext:${{ steps.tag.outputs.tag_name }}
+            ghcr.io/artefactory/nlpretext:latest
+          cache-from: type=registry,ref=ghcr.io/artefactory/nlpretext:latest
           cache-to: type=inline
 
       - name: Scan image
         uses: anchore/scan-action@v2
         id: scan
         with:
-          image: "ghcr.io/artefactory/NLPretext:${{ steps.tag.outputs.tag_name }}"
+          image: "ghcr.io/artefactory/nlpretext:${{ steps.tag.outputs.tag_name }}"
           acs-report-enable: true
       - name: upload Anchore scan SARIF report
         uses: github/codeql-action/upload-sarif@v1
@@ -87,7 +87,7 @@ jobs:
       - name: Install dependencies
         run: |
-          poetry install
+          poetry install -E torch
       - name: Publish to PyPI
         env:
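The cd.yml change lowercases the image name from `NLPretext` to `nlpretext`. Docker image references only allow lowercase letters in repository path components, so a tag such as `ghcr.io/artefactory/NLPretext:latest` is rejected as an invalid reference; this is most likely what the "changed docker registry" part of the commit message refers to. The following is a simplified check of that rule. It assumes a `registry/namespace/name:tag` reference with no digest. The helpers `split_image_ref` and `has_valid_repository_path` are hypothetical and are not part of any Docker tooling.

```python
import re
from typing import Tuple

# Repository path component, simplified from the Docker distribution reference
# grammar: lowercase alphanumerics, optionally joined by ".", "_", "__" or runs of "-".
PATH_COMPONENT = re.compile(r"[a-z0-9]+(?:(?:[._]|__|-+)[a-z0-9]+)*")


def split_image_ref(ref: str) -> Tuple[str, str, str]:
    """Split 'registry/path[:tag]' into (registry, path, tag).

    Assumes the first component is the registry host and that the reference
    carries no '@sha256:...' digest. The tag defaults to 'latest'.
    """
    slash, colon = ref.rfind("/"), ref.rfind(":")
    if colon > slash:  # a ':' after the last '/' introduces the tag
        name, tag = ref[:colon], ref[colon + 1:]
    else:              # no tag; any ':' belongs to a registry port
        name, tag = ref, "latest"
    registry, _, path = name.partition("/")
    return registry, path, tag


def has_valid_repository_path(ref: str) -> bool:
    """True if every repository path component is lowercase and well formed."""
    _, path, _ = split_image_ref(ref)
    return all(PATH_COMPONENT.fullmatch(part) for part in path.split("/"))


for ref in (
    "ghcr.io/artefactory/NLPretext:latest",  # tag used before this commit
    "ghcr.io/artefactory/nlpretext:latest",  # tag used after this commit
):
    print(ref, has_valid_repository_path(ref))
```

The check only validates the repository path. Tags may legitimately contain uppercase letters, so blindly lowercasing a whole reference would not be equivalent.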
.github/workflows/ci.yml: 2 changes (1 addition, 1 deletion)
@@ -67,7 +67,7 @@ jobs:
       - name: Install dependencies
         run: |
           poetry run pip install --upgrade pip
-          poetry install -E spacy-tokenizer -E torch
+          poetry install -E torch
       - name: Run safety checks
         run: |
README.md: 9 changes (0 additions, 9 deletions)
@@ -71,15 +71,6 @@ or with `Poetry`
 poetry add nlpretext
 ```
 
-This library uses Spacy as tokenizer. Current models supported are `en_core_web_sm` and `fr_core_news_sm`. If not installed, run the following commands:
-```bash
-pip install nlpretext[spacy-tokenizer]
-```
-
-```bash
-poetry add nlpretext -E spacy-tokenizer
-```
-
 
 # Usage
 
docs/source/index.rst: 3 changes (0 additions, 3 deletions)
@@ -17,9 +17,6 @@ To install this library you should first clone the repository:
 
     pip install nlpretext
 
-This library uses Spacy as tokenizer. Current models supported are `en_core_web_sm` and `fr_core_news_sm`. If not installed, run the following commands:
-
-    pip install nlpretext[spacy-tokenizer]
 
 .. toctree::
    :maxdepth: 4
nlpretext/token/tokenizer.py: 16 changes (13 additions, 3 deletions)
@@ -17,10 +17,15 @@
 # Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 from typing import Any, List, Optional, Union
 
+import os
+import re
+
 import nltk
 import spacy
 from sacremoses import MosesDetokenizer, MosesTokenizer
 
+MODEL_REGEX = re.compile(r"^[a-z]{2}_(?:core|dep|ent|sent)_(?:web|news|wiki|ud)_(?:sm|md|lg|trf)$")
+
 
 class LanguageNotHandled(Exception):
     pass
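`MODEL_REGEX` acts as an allow-list for the automatic download that the next hunk adds to `_load_spacy_model`. Only names shaped like spaCy's packaged pipelines qualify: a two-letter language code, then pipeline type, genre and size. The snippet below runs the pattern exactly as written in the diff. The candidate names and the `is_auto_downloadable` helper are illustrative. One subtlety is worth knowing: in Python, `$` also matches just before a trailing newline, so `re.fullmatch` is the stricter test.

```python
import re

# Pattern copied verbatim from nlpretext/token/tokenizer.py in this commit.
MODEL_REGEX = re.compile(r"^[a-z]{2}_(?:core|dep|ent|sent)_(?:web|news|wiki|ud)_(?:sm|md|lg|trf)$")


def is_auto_downloadable(name: str) -> bool:
    """Mirror of the MODEL_REGEX.match(model) test in _load_spacy_model."""
    return MODEL_REGEX.match(name) is not None


candidates = [
    "en_core_web_sm",           # accepted: English model the README used to list
    "fr_core_news_sm",          # accepted: French model the README used to list
    "xx_ent_wiki_sm",           # accepted: "xx" satisfies [a-z]{2}
    "en_core_sci_sm",           # rejected: "sci" is not an allowed genre
    "EN_core_web_sm",           # rejected: uppercase language code
    "en_core_web_sm; echo hi",  # rejected: extra text after the size suffix
    "en_core_web_sm\n",         # accepted by match(), rejected by fullmatch()
]
for name in candidates:
    print(repr(name), is_auto_downloadable(name), MODEL_REGEX.fullmatch(name) is not None)
```

Because the accepted names are so tightly constrained, the regex also keeps shell metacharacters out of the `os.system` command string built in the next hunk.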
@@ -62,9 +67,14 @@ def _load_spacy_model(model: str) -> Any:
     try:
         return spacy.load(model)
     except OSError:
-        raise LanguageNotInstalledError(
-            f"Model {model} is not installed. " f"To install, run: python -m spacy download {model}"
-        )
+        if MODEL_REGEX.match(model):
+            os.system(f"python -m spacy download {model}")  # nosec
+            return spacy.load(model)
+        else:
+            raise LanguageNotInstalledError(
+                f"Model {model} is not installed. "
+                f"To install, run: python -m spacy download {model}"
+            )
 
 
 def _get_spacy_tokenizer(lang: str) -> Optional[spacy.tokenizer.Tokenizer]:
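The new except-branch downloads an allow-listed model with `os.system`, where the `# nosec` comment silences Bandit's warning about that call. It does not check the exit status, so a failed download only surfaces when the second `spacy.load` raises. Below is an alternative sketch of the same load, download, reload flow. It is not the library's code. It runs `sys.executable -m spacy download` through `subprocess.run` with an argument list, which has two effects: the download targets the interpreter that is actually running, and no shell parses the model name. `check=True` turns a failed download into an exception. The function `load_with_download_fallback` and its injectable `load`, `is_allowed` and `run` parameters are hypothetical; they let the logic be exercised without spaCy or network access. `LanguageNotInstalledError` is a local stand-in for the exception of that name in the diff.

```python
import subprocess
import sys
from typing import Any, Callable, List


class LanguageNotInstalledError(Exception):
    """Local stand-in for the exception of the same name in the tokenizer module."""


def download_command(model: str) -> List[str]:
    # Argument list (no shell) bound to the running interpreter, unlike
    # os.system("python -m spacy download ...") which uses whatever "python" is on PATH.
    return [sys.executable, "-m", "spacy", "download", model]


def _run_checked(cmd: List[str]) -> None:
    # check=True raises CalledProcessError if the download exits non-zero.
    subprocess.run(cmd, check=True)


def load_with_download_fallback(
    model: str,
    load: Callable[[str], Any],
    is_allowed: Callable[[str], bool],
    run: Callable[[List[str]], Any] = _run_checked,
) -> Any:
    """Try load(model); on OSError, download allow-listed models and retry once."""
    try:
        return load(model)
    except OSError:
        if not is_allowed(model):
            raise LanguageNotInstalledError(
                f"Model {model} is not installed. "
                f"To install, run: python -m spacy download {model}"
            )
        run(download_command(model))
        return load(model)
```

Inside the tokenizer module, a call might look like `load_with_download_fallback(model, spacy.load, lambda m: MODEL_REGEX.fullmatch(m) is not None)`. That wiring is illustrative only.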
poetry.lock: 33 changes (1 addition, 32 deletions)

Generated file; diff not rendered.

pyproject.toml: 3 changes (0 additions, 3 deletions)
@@ -58,8 +58,6 @@ fastparquet = ">=0.4.1"
 dask = {version = ">=2021.5.0", extras = ["complete"]}
 distributed = ">=2021.5.0"
 tornado = ">=6.0.3"
-fr-core-news-sm = {url = "https://github.com/explosion/spacy-models/releases/download/fr_core_news_sm-3.1.0/fr_core_news_sm-3.1.0.tar.gz", optional = true}
-en-core-web-sm = {url = "https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.1.0/en_core_web_sm-3.1.0.tar.gz", optional = true}
 torch = {version = "^1.9.0", optional = true}
 
 [tool.poetry.dev-dependencies]
@@ -99,7 +97,6 @@ types-chardet = ">=0.1.3"
 types-click = ">=7.1.2"
 
 [tool.poetry.extras]
-spacy-tokenizer = ["fr-core-news-sm", "en-core-web-sm"]
 torch = ["torch"]
 
 [tool.black]
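The pyproject.toml change drops the two spaCy model packages that were declared as direct URL dependencies, along with the `spacy-tokenizer` extra that exposed them. This is the "Removed direct dependency" part of the commit message. PyPI refuses uploads whose metadata contains direct references such as `name @ https://...`, which would plausibly have broken the publish job. With these entries gone, the tokenizer downloads the models at runtime instead. A hypothetical pre-publish check could flag such entries in a parsed `[tool.poetry.dependencies]` table. The sketch below only looks at Poetry's `url` and `git` keys; that list is an assumption and is not exhaustive.

```python
from typing import Any, Dict, List

# Poetry keys that point at a source location rather than an index version.
# Non-exhaustive; chosen for this sketch.
DIRECT_SOURCE_KEYS = ("url", "git")


def direct_dependencies(dependencies: Dict[str, Any]) -> List[str]:
    """Names in a [tool.poetry.dependencies]-style table declared as direct references."""
    flagged = []
    for name, spec in dependencies.items():
        # A dependency may be a version string, an inline table, or a list of
        # inline tables (Poetry's multiple-constraints form).
        specs = spec if isinstance(spec, list) else [spec]
        if any(isinstance(s, dict) and any(k in s for k in DIRECT_SOURCE_KEYS) for s in specs):
            flagged.append(name)
    return flagged


# Subset of the dependency table before this commit (URLs as in the diff).
before = {
    "tornado": ">=6.0.3",
    "fr-core-news-sm": {
        "url": "https://github.com/explosion/spacy-models/releases/download/fr_core_news_sm-3.1.0/fr_core_news_sm-3.1.0.tar.gz",
        "optional": True,
    },
    "en-core-web-sm": {
        "url": "https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.1.0/en_core_web_sm-3.1.0.tar.gz",
        "optional": True,
    },
    "torch": {"version": "^1.9.0", "optional": True},
}
print(direct_dependencies(before))
```

On Python 3.11+, the table could be read from a file opened in binary mode with `tomllib.load(f)["tool"]["poetry"]["dependencies"]`.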
tests/test_tokenizer.py: 18 changes (0 additions, 18 deletions)

This file was deleted.
