Commit

Download spacy model if not found
- add install info to README
- remove explicit spacy download step from CI
- uncomment data_manager_test cells
lkeegan committed Mar 20, 2023
1 parent 4f624b9 commit e56486a
Showing 4 changed files with 36 additions and 23 deletions.
5 changes: 2 additions & 3 deletions .github/workflows/ci.yml
@@ -20,20 +20,19 @@ jobs:
         uses: actions/checkout@v3

       - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v2
+        uses: actions/setup-python@v4
         with:
           python-version: ${{ matrix.python-version }}
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
           python -m pip install -e .
-          python -m spacy download de_core_news_sm
       - name: Run pytest
         run: |
           cd moralization
           python -m pytest -s --cov=. --cov-report=xml
       - name: Upload coverage
-        uses: codecov/codecov-action@v1
+        uses: codecov/codecov-action@v3
         with:
           fail_ci_if_error: true
           files: moralization/coverage.xml
8 changes: 7 additions & 1 deletion README.md
@@ -10,8 +10,14 @@ Automated identification of text structures related to moralization.

 **_This project is currently under development!_**

+## Installation
+
+```
+pip install git+https://github.com/ssciwr/moralization.git
+```
+
 ## Example notebooks
-Here you can find afew short introduction notebooks on google colab.
+Here you can find a few short introduction notebooks on google colab.
 - [DemoNotebook_statistics](https://colab.research.google.com/github/ssciwr/moralization/blob/main/notebooks/DemoNotebook_statistics.ipynb)

 - [DemoNotebook_interactive_plots](https://colab.research.google.com/github/ssciwr/moralization/blob/main/notebooks/DemoNotebook_interactive_plots.ipynb)
10 changes: 9 additions & 1 deletion moralization/input_data.py
@@ -9,6 +9,14 @@
 from lxml.etree import XMLSyntaxError
 import spacy

+try:
+    import de_core_news_sm
+except ImportError:
+    logging.warning(
+        "Required Spacy model 'de_core_news_sm' was not found. Attempting to download it.."
+    )
+    spacy.cli.download("de_core_news_sm")
+    import de_core_news_sm

 pkg = importlib_resources.files("moralization")

@@ -127,7 +135,7 @@ def cas_to_doc(cas, ts):
         # "KAT5Ausformulierung": "KAT5-Forderung implizit",
         # "Kommentar": "KOMMENTAR",
     }
-    nlp = spacy.load("de_core_news_sm")
+    nlp = de_core_news_sm.load()
     doc = nlp(cas.sofa_string)

     doc_train = nlp(cas.sofa_string)
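
The change to `moralization/input_data.py` is an instance of a general "import, otherwise fetch and retry" pattern. The sketch below illustrates that pattern in isolation; it is not code from this commit. The helper name `import_or_fetch` and its `fetch` parameter are hypothetical, introduced only for illustration, and the explicit cache invalidation is an extra precaution that the commit itself does not include.

```python
import importlib
import logging


def import_or_fetch(module_name, fetch):
    """Import `module_name`; if it is missing, call `fetch(module_name)` once and retry.

    `fetch` is any callable that makes the module importable, for example by
    installing a package. (Hypothetical helper, not part of the moralization package.)
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        logging.warning(
            "Required module %r was not found. Attempting to download it.", module_name
        )
        fetch(module_name)
        # A package installed while the interpreter is running may stay invisible
        # until the import system's finder caches are cleared.
        importlib.invalidate_caches()
        # If the fetch did not help, this second import raises ImportError again.
        return importlib.import_module(module_name)


# Assumed usage with spaCy, mirroring the calls that appear in the diff:
#
#     import spacy
#     model_pkg = import_or_fetch("de_core_news_sm", spacy.cli.download)
#     nlp = model_pkg.load()
```

One consequence of this design is that the first import on a fresh machine needs network access and can take noticeably longer; in exchange, the separate model download step in CI and in user instructions is no longer needed.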
36 changes: 18 additions & 18 deletions notebooks/data_manager_test.ipynb
@@ -3,11 +3,11 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "aab49754-7b6b-479d-aa49-ae2564199337",
+   "id": "65049175",
    "metadata": {},
    "outputs": [],
    "source": [
-    "from moralization.data_manager import DataManager\n"
+    "from moralization.data_manager import DataManager"
    ]
   },
   {
@@ -17,7 +17,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "data_manager = DataManager(\"./../moralization/data/\")\n"
+    "data_manager = DataManager(\"./../moralization/data/\")"
    ]
   },
   {
@@ -27,7 +27,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# data_manager.return_analyzer_result(\"frequency\")\n"
+    "data_manager.return_analyzer_result(\"frequency\")"
    ]
   },
   {
@@ -37,7 +37,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# data_manager.return_analyzer_result(\"length\")"
+    "data_manager.return_analyzer_result(\"length\")"
    ]
   },
   {
@@ -47,7 +47,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# data_manager.return_analyzer_result(\"span_distinctiveness\")\n"
+    "data_manager.return_analyzer_result(\"span_distinctiveness\")"
    ]
   },
   {
@@ -57,7 +57,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# data_manager.return_analyzer_result(\"boundary_distinctiveness\")\n"
+    "data_manager.return_analyzer_result(\"boundary_distinctiveness\")"
    ]
   },
   {
@@ -67,7 +67,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# data_manager.occurence_analysis(\"table\")"
+    "data_manager.occurence_analysis(\"table\")"
    ]
   },
   {
@@ -77,7 +77,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# data_manager.occurence_analysis(\"corr\")"
+    "data_manager.occurence_analysis(\"corr\")"
    ]
   },
   {
@@ -87,7 +87,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# data_manager.occurence_analysis(\"heatmap\")"
+    "data_manager.occurence_analysis(\"heatmap\")"
    ]
   },
   {
@@ -97,7 +97,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# data_manager.interactive_analysis().show()"
+    "data_manager.interactive_analysis().show()"
    ]
   },
   {
@@ -107,7 +107,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# data_manager.visualize_data(\"all\", spans_key=\"task1\")"
+    "data_manager.visualize_data(\"all\", spans_key=\"task1\")"
    ]
   },
   {
@@ -117,7 +117,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# data_manager.export_data_DocBin()"
+    "data_manager.export_data_DocBin()"
    ]
   },
   {
@@ -127,8 +127,8 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# data_manager.spacy_train(working_dir = \".\", config=\"../moralization/data/config.cfg\", n_epochs = 20)\n",
-    "# data_manager.spacy_import_model(\"./output/model-best\")"
+    "data_manager.spacy_train(working_dir=\".\", config=\"../moralization/data/config.cfg\", n_epochs=20)\n",
+    "data_manager.spacy_import_model(\"./output/model-best\")"
    ]
   },
   {
@@ -138,7 +138,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# data_manager.spacy_validation()"
+    "data_manager.spacy_validation()"
    ]
   },
   {
@@ -148,7 +148,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# data_manager.spacy_test_string(\"Dies ist ein toller test\")"
+    "data_manager.spacy_test_string(\"Dies ist ein toller test\")"
    ]
   },
   {
@@ -176,7 +176,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.4"
+   "version": "3.10.10"
   }
  },
  "nbformat": 4,
