spaCy 3.0 #7869

Merged
merged 80 commits into from
Mar 23, 2021
Changes from 50 commits
b39b116
spacy-version-upgrade
koaning Feb 2, 2021
1c07102
Merge branch 'main' into spacy-3-dot-0-support
koaning Feb 2, 2021
ef3d267
added-lock-file-change
koaning Feb 2, 2021
64d1e51
lock
koaning Feb 2, 2021
d1532b2
spacy-install-en-container
koaning Feb 2, 2021
880b84b
removed-more-link-moments
koaning Feb 2, 2021
45360a7
tokenizer-tests-pass
koaning Feb 2, 2021
2e967b1
touche-lint-touche
koaning Feb 2, 2021
06b0d13
crf-entity-extractor-tests-pass
koaning Feb 2, 2021
00c7857
test-train-spacy-tests-green
koaning Feb 2, 2021
a281c06
updated-conftest
koaning Feb 2, 2021
be612b4
add-base-setting-moodbot
koaning Feb 2, 2021
5407ef3
more-config-updates
koaning Feb 2, 2021
4da984f
added-german-model
koaning Feb 2, 2021
a80ca7f
one-more
koaning Feb 2, 2021
e9f9679
lint
koaning Feb 2, 2021
6f70d6a
Merge branch 'main' into spacy-3-dot-0-support
koaning Feb 2, 2021
68f5360
docstring
koaning Feb 2, 2021
72dd034
Merge branch 'spacy-3-dot-0-support' of github.com:RasaHQ/rasa into s…
koaning Feb 2, 2021
32ba805
no-longer-assume-link
koaning Feb 17, 2021
c54751a
Merge branch 'main' into spacy-3-dot-0-support
koaning Feb 26, 2021
f92970a
added spaCy config to example bots
koaning Mar 4, 2021
75b8d08
added warnings
koaning Mar 4, 2021
29d1a98
Merge branch 'spacy-3-dot-0-support' of github.com:RasaHQ/rasa into s…
koaning Mar 4, 2021
3e9815a
Merge branch 'main' into spacy-3-dot-0-support
koaning Mar 4, 2021
65473b2
using warnings.warn now
koaning Mar 4, 2021
55cf91a
Merge branch 'spacy-3-dot-0-support' of github.com:RasaHQ/rasa into s…
koaning Mar 4, 2021
3b3a782
docker-and-vincent-todo-fix
koaning Mar 4, 2021
7e37f02
remove **
koaning Mar 4, 2021
717196e
added missing docstrings
koaning Mar 4, 2021
904a077
tests added
koaning Mar 4, 2021
96166f7
first-few-doc-changes
koaning Mar 4, 2021
d1437e5
fix-documentation
koaning Mar 4, 2021
7ce6d5c
Added changelog as well as links to the docs
koaning Mar 5, 2021
7470824
expanded-changelog
koaning Mar 5, 2021
d1af3e6
Merge branch 'main' into spacy-3-dot-0-support
koaning Mar 5, 2021
63d90da
Update changelog/7869.feature.md
koaning Mar 9, 2021
4a526d0
Update changelog/7869.feature.md
koaning Mar 9, 2021
e93b899
Update changelog/7869.feature.md
koaning Mar 9, 2021
c7e7b0e
Apply suggestions from code review
koaning Mar 9, 2021
215f7c8
Update rasa/nlu/utils/spacy_utils.py
koaning Mar 15, 2021
3cfdbe6
remove-headers-changelog
koaning Mar 15, 2021
7b08794
Merge branch 'spacy-3-dot-0-support' of github.com:RasaHQ/rasa into s…
koaning Mar 15, 2021
4e0d227
added tobis comments
koaning Mar 15, 2021
d7c1bc0
added-migration-guide
koaning Mar 15, 2021
de84aed
added-migration
koaning Mar 15, 2021
05ca149
update-poetry
koaning Mar 15, 2021
6494d08
Following Tobias' advice. Delete and rebuild Poetry.lock
koaning Mar 15, 2021
c9eb5cd
Update docs/docs/migration-guide.mdx
koaning Mar 17, 2021
2166f10
Update docs/docs/components.mdx
koaning Mar 17, 2021
5cbe23c
Added Johannes' comments
koaning Mar 18, 2021
8b633b8
added-comments
koaning Mar 18, 2021
2846e1f
update-lock
koaning Mar 18, 2021
b5a5fb6
Merge branch 'main' into spacy-3-dot-0-support
koaning Mar 18, 2021
c72dcea
lock
koaning Mar 18, 2021
8d069bb
Update docs/docs/migration-guide.mdx
koaning Mar 18, 2021
b0ec44c
changelog
koaning Mar 18, 2021
af6c037
oh flake, oh you
koaning Mar 18, 2021
a0c05ae
shake shake, out out the flake flake
koaning Mar 18, 2021
55c5b2f
docstrings are silly things sometimes
koaning Mar 18, 2021
e663090
Apply suggestions from code review
koaning Mar 22, 2021
12899ad
class -> static method
koaning Mar 22, 2021
03125b7
deprecationwarning
koaning Mar 22, 2021
7776d34
added-link-to-docs
koaning Mar 22, 2021
3ee065a
added-another-doc-link
koaning Mar 22, 2021
19877c1
Merge branch 'main' into spacy-3-dot-0-support
koaning Mar 22, 2021
0eee985
updated-lock-file
koaning Mar 22, 2021
0bc3cee
Merge branch 'spacy-3-dot-0-support' of github.com:RasaHQ/rasa into s…
koaning Mar 22, 2021
8c00da0
tests
koaning Mar 22, 2021
499cdca
Merge branch 'main' into spacy-3-dot-0-support
koaning Mar 22, 2021
f053fce
Merge branch 'main' into spacy-3-dot-0-support
koaning Mar 22, 2021
99ab6b7
Update rasa/nlu/utils/spacy_utils.py
koaning Mar 22, 2021
4713e89
Update rasa/nlu/utils/spacy_utils.py
koaning Mar 22, 2021
cf56c96
Update rasa/nlu/utils/spacy_utils.py
koaning Mar 22, 2021
6267f60
Merge branch 'main' into spacy-3-dot-0-support
koaning Mar 22, 2021
5c3039e
could it be?
koaning Mar 22, 2021
107f9a2
Apply suggestions from code review
koaning Mar 23, 2021
67ae129
Merge branch 'main' into spacy-3-dot-0-support
koaning Mar 23, 2021
1febc48
Make imports pretty
koaning Mar 23, 2021
7376835
Merge branch 'main' into spacy-3-dot-0-support
koaning Mar 23, 2021
2 changes: 0 additions & 2 deletions Makefile
@@ -131,8 +131,6 @@ prepare-spacy:
poetry install -E spacy
poetry run python -m spacy download en_core_web_md
poetry run python -m spacy download de_core_news_sm
poetry run python -m spacy link en_core_web_md en --force
poetry run python -m spacy link de_core_news_sm de --force

prepare-mitie:
wget --progress=dot:giga -N -P data/ https://github.com/mit-nlp/MITIE/releases/download/v0.4/MITIE-models-v0.2.tar.bz2
41 changes: 41 additions & 0 deletions changelog/7869.feature.md
@@ -0,0 +1,41 @@
Upgraded Rasa to be compatible with spaCy 3.0.

This means that we can support more features for more languages but there are also a few changes.

SpaCy 3.0 removed the `spacy link <language model>` command, so from now on [the
full model name](https://spacy.io/models) needs to be used in the `config.yml` file.

**Before**

Previously, you could run `spacy link en_core_web_md en` and Rasa would
pick up the correct model from the `language` parameter.

```yaml
language: en

pipeline:
- name: SpacyNLP
```

**Now**

This behavior is deprecated; instead, name the model explicitly in `config.yml`.

```yaml
language: en

pipeline:
- name: SpacyNLP
model: en_core_web_md
```

**Fallback**

To make the transition easier, Rasa will try to fall back to a medium spaCy model whenever
a compatible language is configured for the entire pipeline in `config.yml`, even if you don't
specify a `model`. This fallback behavior is temporary and will be removed in Rasa Open Source 3.0.0.

We've updated our docs to reflect these changes. All examples now show a direct link to the
correct spaCy model. We've also added a warning to the [SpacyNLP](components.mdx#spacynlp)
docs that explains the fallback behavior.
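
The fallback described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from `rasa/nlu/utils/spacy_utils.py`: the function name `resolve_spacy_model`, the `FALLBACK_MODELS` table, and the choice of `FutureWarning` are all hypothetical, and the languages Rasa actually falls back for may differ.

```python
import warnings
from typing import Any, Dict, Optional

# Hypothetical fallback table; the languages Rasa really covers may differ.
FALLBACK_MODELS: Dict[str, str] = {
    "en": "en_core_web_md",
    "de": "de_core_news_md",
}


def resolve_spacy_model(
    component_config: Dict[str, Any], language: Optional[str]
) -> str:
    """Return the spaCy model name a SpacyNLP component should load."""
    model = component_config.get("model")
    if model:
        # An explicit `model` entry always takes precedence.
        return model

    if language in FALLBACK_MODELS:
        fallback = FALLBACK_MODELS[language]
        # Warn so that users add `model` before the fallback goes away.
        warnings.warn(
            f"SpacyNLP has no `model` configured; falling back to '{fallback}'. "
            "Please set `model` explicitly in config.yml.",
            FutureWarning,
        )
        return fallback

    raise ValueError(
        f"SpacyNLP has no `model` configured and there is no fallback "
        f"for language '{language}'."
    )
```

Raising for languages outside the table keeps the failure visible at configuration time instead of surfacing later as a model loading error.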

1 change: 1 addition & 0 deletions data/test_config/config_pretrained_embeddings_spacy_de.yml
@@ -2,6 +2,7 @@ language: "de"

pipeline:
- name: SpacyNLP
model: "de_core_news_sm"
- name: SpacyTokenizer
- name: SpacyFeaturizer
- name: RegexFeaturizer
1 change: 1 addition & 0 deletions data/test_config/config_spacy_entity_extractor.yml
@@ -1,6 +1,7 @@
language: en
pipeline:
- name: "SpacyNLP"
model: "en_core_web_md"
- name: "SpacyTokenizer"
- name: "SpacyFeaturizer"
- name: "RegexFeaturizer"
4 changes: 1 addition & 3 deletions docker/Dockerfile.pretrained_embeddings_spacy_de
@@ -23,9 +23,7 @@ RUN . /opt/venv/bin/activate && poetry build -f wheel -n && \
# make sure we use the virtualenv
ENV PATH="/opt/venv/bin:$PATH"

# spacy link
RUN python -m spacy download de_core_news_sm && \
python -m spacy link de_core_news_sm de
RUN python -m spacy download de_core_news_sm

# start a new build stage
FROM ${IMAGE_BASE_NAME}:base-${BASE_IMAGE_HASH} as runner
4 changes: 1 addition & 3 deletions docker/Dockerfile.pretrained_embeddings_spacy_en
@@ -23,9 +23,7 @@ RUN . /opt/venv/bin/activate && poetry build -f wheel -n && \
# make sure we use the virtualenv
ENV PATH="/opt/venv/bin:$PATH"

# spacy link
RUN python -m spacy download en_core_web_md && \
python -m spacy link en_core_web_md en
RUN python -m spacy download en_core_web_md

# start a new build stage
FROM ${IMAGE_BASE_NAME}:base-${BASE_IMAGE_HASH} as runner
1 change: 1 addition & 0 deletions docker/configs/config_pretrained_embeddings_spacy_de.yml
@@ -2,6 +2,7 @@ language: "de"

pipeline:
- name: SpacyNLP
model: "de_core_news_sm"
- name: SpacyTokenizer
- name: SpacyFeaturizer
- name: RegexFeaturizer
1 change: 1 addition & 0 deletions docker/configs/config_pretrained_embeddings_spacy_en.yml
@@ -2,6 +2,7 @@ language: "en"

pipeline:
- name: SpacyNLP
model: "en_core_web_md"
- name: SpacyTokenizer
- name: SpacyFeaturizer
- name: RegexFeaturizer
@@ -2,6 +2,7 @@ language: "en"

pipeline:
- name: SpacyNLP
model: "en_core_web_md"
- name: SpacyTokenizer
- name: SpacyFeaturizer
- name: RegexFeaturizer
17 changes: 7 additions & 10 deletions docs/docs/components.mdx
@@ -106,11 +106,8 @@ word vectors in your pipeline.

* **Configuration**

You need to specify the language model to use.
By default the language configured in the pipeline will be used as the language model name.
If the spaCy model to be used has a name that is different from the language tag (`"en"`, `"de"`, etc.),
the model name can be specified using the configuration variable `model`.
The name will be passed to `spacy.load(name)`.
You need to specify the language model to use. The name will be passed to `spacy.load(name)`.
You can find more information on the available models in the [spaCy documentation](https://spacy.io/usage/models).

```yaml-rasa
pipeline:
@@ -130,12 +127,12 @@ word vectors in your pipeline.
[installing SpaCy](./installation.mdx#dependencies-for-spacy).

In addition to SpaCy's pretrained language models, you can also use this component to
load fastText vectors, which are available for [hundreds of languages](https://github.com/facebookresearch/fastText/blob/master/docs/crawl-vectors.md).
If you want to incorporate a custom model you've found into spaCy, check out their page on
[adding languages](https://spacy.io/usage/adding-languages/). As described in the documentation, you need to
register your language model and link it to the language identifier, which will allow Rasa to load and use your new language
by passing in your language identifier as the `language` option.
attach spaCy models that you've trained yourself.

:::caution Fallback
Rasa Open Source will try to fall back to a common model on your behalf if you don't pass a `model` setting. This is a
temporary feature we've introduced as part of the spaCy 3.0 migration, but the fallback will be removed in Rasa Open Source 3.0.0.
:::

### HFTransformersNLP

5 changes: 2 additions & 3 deletions docs/docs/installation.mdx
@@ -223,12 +223,11 @@ You can install it with the following commands:
```bash
pip3 install rasa[spacy]
python3 -m spacy download en_core_web_md
python3 -m spacy link en_core_web_md en
```

This will install Rasa Open Source as well as spaCy and its language model
for the English language. We recommend using at least the
medium sized models (`_md`) instead of the spaCy's
for the English language, but there are many other languages available too.
We recommend using at least the "medium" sized models (`_md`) instead of spaCy's
default small `en_core_web_sm` model. Small models require less
memory to run, but will somewhat reduce intent classification performance.
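
Because spaCy models are installed as regular Python packages, one quick way to confirm that the download step worked is to look the package up before training. The following is a minimal sketch using only the standard library; the helper name `is_model_installed` is made up for this example and is not part of Rasa or spaCy.

```python
from importlib.util import find_spec


def is_model_installed(model_name: str) -> bool:
    """Return True if a top-level package with this name can be found."""
    # find_spec returns None when no package of that name is importable.
    return find_spec(model_name) is not None


if __name__ == "__main__":
    # Report the status of the models used in this project's examples.
    for name in ("en_core_web_md", "de_core_news_sm"):
        status = "installed" if is_model_installed(name) else "missing"
        print(f"{name}: {status}")
```

A check like this only tells you that the package exists, not that it is compatible with the installed spaCy version.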

45 changes: 45 additions & 0 deletions docs/docs/migration-guide.mdx
@@ -10,6 +10,51 @@ description: |
This page contains information about changes between major versions and
how you can migrate from one version to another.

## Rasa 2.4 to 2.5

### Machine Learning Components

#### SpaCy 3.0

Rasa now supports spaCy 3.0. This means that we can support more features for more
languages but this also introduced a breaking change. SpaCy 3.0 removed the
`spacy link <language model>` command, so from now on
[the full model name](https://spacy.io/models) needs to be used in the `config.yml` file.

**Before**

Previously, you could run `spacy link en_core_web_md en` and Rasa would
pick up the correct model from the `language` parameter.

```yaml
language: en

pipeline:
- name: SpacyNLP
```

**Now**

This behavior is deprecated; instead, name the model explicitly in `config.yml`.

```yaml
language: en

pipeline:
- name: SpacyNLP
model: en_core_web_md
```

**Fallback**

To make the transition easier, Rasa will try to fall back to a medium spaCy model whenever
a compatible language is configured for the entire pipeline in `config.yml`, even if you don't
specify a `model`. This fallback behavior is temporary and will be removed in Rasa Open Source 3.0.0.

We've updated our docs to reflect these changes. All examples now show a direct link to the
correct spaCy model. We've also added a warning to the [SpacyNLP](components.mdx#spacynlp)
docs that explains the fallback behavior.
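
When migrating several bots, it can help to scan their configurations for `SpacyNLP` entries that still depend on the old link names. The sketch below works on an already parsed `config.yml` (a plain dictionary). The function name `find_spacy_model_issues` is hypothetical, and the rule that a full model name contains an underscore is only a heuristic assumed for this example; custom model packages may not follow it.

```python
from typing import Any, Dict, List


def find_spacy_model_issues(config: Dict[str, Any]) -> List[str]:
    """List SpacyNLP pipeline entries that need attention after the upgrade."""
    issues: List[str] = []
    for index, component in enumerate(config.get("pipeline") or []):
        # Skip anything that is not a SpacyNLP component dictionary.
        if not isinstance(component, dict) or component.get("name") != "SpacyNLP":
            continue

        model = component.get("model")
        if model is None:
            issues.append(
                f"pipeline[{index}]: SpacyNLP has no `model`; "
                "it relies on the temporary fallback."
            )
        elif "_" not in model:
            # Heuristic: full names look like `en_core_web_md`, links like `en`.
            issues.append(
                f"pipeline[{index}]: `model: {model}` looks like a link name; "
                "use the full spaCy model name."
            )
    return issues
```

An empty list means every `SpacyNLP` entry names what looks like a full model, as in the **Now** example above.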

## Rasa 2.3.3 to Rasa 2.3.4

:::caution
1 change: 1 addition & 0 deletions docs/docs/reaching-out-to-user.mdx
@@ -283,6 +283,7 @@ names in your training data, since Spacy has a `PERSON` dimension:
pipeline:
# other components
- name: SpacyNLP
model: "en_core_web_md"
- name: SpacyEntityExtractor
dimensions: ["PERSON"]
```
3 changes: 1 addition & 2 deletions docs/docs/tuning-your-model.mdx
@@ -41,8 +41,7 @@ we recommend the following pipeline:
```

It uses the [SpacyFeaturizer](./components.mdx#spacyfeaturizer), which provides
pre-trained word embeddings from either GloVe or fastText in many different languages
(see [Language Models](./components.mdx#language-models)).
pre-trained word embeddings (see [Language Models](./components.mdx#language-models)).

If you don't use any pre-trained word embeddings inside your pipeline, you are not bound to a specific language
and can train your model to be more domain specific.
1 change: 1 addition & 0 deletions examples/moodbot/config.yml
@@ -2,6 +2,7 @@ language: en

pipeline:
- name: "SpacyNLP"
model: "en_core_web_md"
- name: "SpacyTokenizer"
- name: "SpacyFeaturizer"
- name: "DIETClassifier"
2 changes: 1 addition & 1 deletion examples/reminderbot/config.yml
@@ -11,7 +11,7 @@ pipeline:
- name: "DIETClassifier"
epochs: 100
- name: SpacyNLP
model: "en"
model: "en_core_web_md"
- name: SpacyEntityExtractor
dimensions: ["PERSON"]
- name: "EntitySynonymMapper"