Merge branch 'master' into docs-links-updates

RasaHQ · Sep 15, 2020 · 383cfda
2 parents 4f2ac2c + 570dea7
commit 383cfda

Showing 135 changed files with 5,488 additions and 3,250 deletions.
diff --git a/.github/workflows/continous-integration.yml b/.github/workflows/continous-integration.yml
@@ -16,6 +16,13 @@ on:
 #               RasaHQ/rasa on pypi (account credentials in 1password)
 # - DOCKERHUB_PASSWORD: password for an account with write access to the rasa
 #                       repo on hub.docker.com. used to pull and upload containers
+# - RASA_OSS_TELEMETRY_WRITE_KEY: key to write to segment. Used to report telemetry.
+#                                 The key will be added to the distributions
+# - RASA_OSS_EXCEPTION_WRITE_KEY: key to write to sentry. Used to report exceptions.
+#                                 The key will be added to the distributions.
+#                                 Key can be found at https://sentry.io/settings/rasahq/projects/rasa-open-source/install/python/
+# - SENTRY_AUTH_TOKEN: authentication used to tell Sentry about any new releases
+#                      created at https://sentry.io/settings/account/api/auth-tokens/
 
 env:
   # needed to fix issues with boto during testing:
@@ -210,6 +217,14 @@ jobs:
     - name: Pull latest${{ matrix.image.tag_ext }} Docker image for caching
       run: docker pull rasa/rasa:latest${{ matrix.image.tag_ext }} || true
 
+    - name: Copy Segment write key to the package
+      if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags') && github.repository == 'RasaHQ/rasa'
+      env:
+        RASA_TELEMETRY_WRITE_KEY: ${{ secrets.RASA_OSS_TELEMETRY_WRITE_KEY }}
+        RASA_EXCEPTION_WRITE_KEY: ${{ secrets.RASA_OSS_EXCEPTION_WRITE_KEY }}
+      run: |
+        ./scripts/write_keys_file.sh
+
     - name: Build latest${{ matrix.image.tag_ext }} Docker image
       run: docker build . --file ${{ matrix.image.file }} --tag rasa/rasa:latest${{ matrix.image.tag_ext }} --cache-from rasa/rasa:latest${{ matrix.image.tag_ext }}
 
@@ -253,6 +268,13 @@ jobs:
       with:
         poetry-version: ${{ env.POETRY_VERSION }}
 
+    - name: Copy Segment write key to the package
+      env:
+        RASA_TELEMETRY_WRITE_KEY: ${{ secrets.RASA_OSS_TELEMETRY_WRITE_KEY }}
+        RASA_EXCEPTION_WRITE_KEY: ${{ secrets.RASA_OSS_EXCEPTION_WRITE_KEY }}
+      run: |
+        ./scripts/write_keys_file.sh
+
     - name: Build ⚒️ Distributions
       run: poetry build
 
@@ -262,6 +284,17 @@ jobs:
         user: __token__
         password: ${{ secrets.PYPI_TOKEN }}
 
+    - name: Notify Sentry about the release
+      env:
+        GITHUB_TAG: ${{ github.ref }}
+        SENTRY_ORG: rasahq
+        SENTRY_AUTH_TOKEN: ${{ secrets.SENTRY_AUTH_TOKEN }}
+      run: |
+        GITHUB_TAG=${GITHUB_TAG/refs\/tags\//}
+        sentry-cli releases new -p rasa-open-source "rasa-$GITHUB_TAG"
+        sentry-cli releases set-commits --auto "rasa-$GITHUB_TAG"
+        sentry-cli releases finalize "rasa-$GITHUB_TAG"
+
     - name: Notify Slack & Publish Release Notes 🗞
       env:
         GH_RELEASE_NOTES_TOKEN: ${{ secrets.GH_RELEASE_NOTES_TOKEN }}

diff --git a/.gitignore b/.gitignore
@@ -86,6 +86,8 @@ docs/docs/variables.json
 docs/docs/sources/
 docs/docs/reference/
 docs/docs/changelog.mdx
+rasa/segment_key
+rasa/keys
 
 # Local Netlify folder
 .netlify
diff --git a/.typo-ci.yml b/.typo-ci.yml
@@ -94,6 +94,7 @@ excluded_words:
   - forni
   - gzip
   - gzipped
+  - hftransformersnlp
   - initializer
   - instaclient
   - jwt
@@ -111,6 +112,7 @@ excluded_words:
   - memoization
   - miniconda
   - mitie
+  - mitiefeaturizer
   - mitie's
   - mitienlp
   - dataset
@@ -148,6 +150,7 @@ excluded_words:
   - scipy
   - sklearn
   - spacy
+  - spacyfeaturizer
   - spacynlp
   - ish
   - spaCy

diff --git a/CHANGELOG.mdx b/CHANGELOG.mdx
@@ -186,7 +186,7 @@ https://github.com/RasaHQ/rasa/tree/master/changelog/ . -->
   `from_entity` (see [Forms](./forms.mdx)).
 
   :::note
-  Composite entities are currently just supported by the [DIETClassifier](./components/intent-classifiers.mdx#dietclassifier) and [CRFEntityExtractor](./components/entity-extractors.mdx#crfentityextractor).
+  Composite entities are currently just supported by the [DIETClassifier](./components.mdx#dietclassifier) and [CRFEntityExtractor](./components.mdx#crfentityextractor).
 
   :::
 
@@ -385,7 +385,7 @@ https://github.com/RasaHQ/rasa/tree/master/changelog/ . -->
 
 * [#5006](https://github.com/rasahq/rasa/issues/5006): Channel `hangouts` for Rasa integration with Google Hangouts Chat is now supported out-of-the-box.
 
-* [#5389](https://github.com/rasahq/rasa/issues/5389): Add an optional path to a specific directory to download and cache the pre-trained model weights for [HFTransformersNLP](./components/language-models.mdx#hftransformersnlp).
+* [#5389](https://github.com/rasahq/rasa/issues/5389): Add an optional path to a specific directory to download and cache the pre-trained model weights for [HFTransformersNLP](./components.mdx#hftransformersnlp).
 
 * [#5422](https://github.com/rasahq/rasa/issues/5422): Add options `tensorboard_log_directory` and `tensorboard_log_level` to `EmbeddingIntentClassifier`,
   `DIETClasifier`, `ResponseSelector`, `EmbeddingPolicy` and `TEDPolicy`.
@@ -544,18 +544,18 @@ https://github.com/RasaHQ/rasa/tree/master/changelog/ . -->
 
 * [#4088](https://github.com/rasahq/rasa/issues/4088): Add story structure validation functionality (e.g. rasa data validate stories –max-history 5).
 
-* [#5065](https://github.com/rasahq/rasa/issues/5065): Add [LexicalSyntacticFeaturizer](./components/featurizers.mdx#lexicalsyntacticfeaturizer) to sparse featurizers.
+* [#5065](https://github.com/rasahq/rasa/issues/5065): Add [LexicalSyntacticFeaturizer](./components.mdx#lexicalsyntacticfeaturizer) to sparse featurizers.
 
   `LexicalSyntacticFeaturizer` does the same featurization as the `CRFEntityExtractor`. We extracted the
   featurization into a separate component so that the features can be reused and featurization is independent from the
   entity extraction.
 
 * [#5187](https://github.com/rasahq/rasa/issues/5187): Integrate language models from HuggingFace's [Transformers](https://github.com/huggingface/transformers) Library.
 
-  Add a new NLP component [HFTransformersNLP](./components/language-models.mdx#hftransformersnlp) which tokenizes and featurizes incoming messages using a specified
+  Add a new NLP component [HFTransformersNLP](./components.mdx#hftransformersnlp) which tokenizes and featurizes incoming messages using a specified
   pre-trained model with the Transformers library as the backend.
-  Add [LanguageModelTokenizer](./components/tokenizers.mdx#languagemodeltokenizer) and [LanguageModelFeaturizer](./components/featurizers.mdx#languagemodelfeaturizer) which use the information from
-  [HFTransformersNLP](./components/language-models.mdx#hftransformersnlp) and sets them correctly for message object.
+  Add [LanguageModelTokenizer](./components.mdx#languagemodeltokenizer) and [LanguageModelFeaturizer](./components.mdx#languagemodelfeaturizer) which use the information from
+  [HFTransformersNLP](./components.mdx#hftransformersnlp) and sets them correctly for message object.
   Language models currently supported: BERT, OpenAIGPT, GPT-2, XLNet, DistilBert, RoBERTa.
 
 * [#5225](https://github.com/rasahq/rasa/issues/5225): Added a new CLI command `rasa export` to publish tracker events from a persistent
@@ -578,12 +578,12 @@ https://github.com/RasaHQ/rasa/tree/master/changelog/ . -->
   TF_INTRA_OP_PARALLELISM_THREADS="2"
   ```
 
-* [#5266](https://github.com/rasahq/rasa/issues/5266): Added a new NLU component [DIETClassifier](./components/intent-classifiers.mdx#dietclassifier) and a new policy [TEDPolicy](./policies.mdx#ted-policy).
+* [#5266](https://github.com/rasahq/rasa/issues/5266): Added a new NLU component [DIETClassifier](./components.mdx#dietclassifier) and a new policy [TEDPolicy](./policies.mdx#ted-policy).
 
   DIET (Dual Intent and Entity Transformer) is a multi-task architecture for intent classification and entity
-  recognition. You can read more about this component in our [documentation](./components/intent-classifiers.mdx#dietclassifier).
+  recognition. You can read more about this component in our [documentation](./components.mdx#dietclassifier).
   The new component will replace the `EmbeddingIntentClassifier` and the
-  [CRFEntityExtractor](./components/entity-extractors.mdx#crfentityextractor) in the future.
+  [CRFEntityExtractor](./components.mdx#crfentityextractor) in the future.
   Those two components are deprecated from now on.
   See [migration guide](./migration-guide.mdx#migration-to-rasa-1-8) for details on how to
   switch to the new component.
@@ -848,7 +848,7 @@ https://github.com/RasaHQ/rasa/tree/master/changelog/ . -->
     min_df: 5
   ```
 
-* [#4957](https://github.com/rasahq/rasa/issues/4957): To [use custom features in the `CRFEntityExtractor`](./components/entity-extractors.mdx#passing-custom-features-to-crfentityextractor)
+* [#4957](https://github.com/rasahq/rasa/issues/4957): To [use custom features in the `CRFEntityExtractor`](./components.mdx#passing-custom-features-to-crfentityextractor)
   use `text_dense_features` instead of `ner_features`. If
   `text_dense_features` are present in the feature set, the `CRFEntityExtractor` will automatically make use of
   them. Just make sure to add a dense featurizer in front of the `CRFEntityExtractor` in your pipeline and set the
@@ -899,13 +899,13 @@ https://github.com/RasaHQ/rasa/tree/master/changelog/ . -->
 
   Add option `return_sequence` to all featurizers. By default all featurizers return a matrix of size
   (1 x feature-dimension). If the option `return_sequence` is set to `True`, the corresponding featurizer will return
-  a matrix of size (token-length x feature-dimension). See [Text Featurizers](./components/featurizers.mdx).
+  a matrix of size (token-length x feature-dimension). See [Text Featurizers](./components.mdx#featurizers).
   Default value is set to `False`. However, you might want to set it to `True` if you want to use custom features
   in the `CRFEntityExtractor`.
-  See [passing custom features to the `CRFEntityExtractor`](./components/entity-extractors.mdx#passing-custom-features-to-crfentityextractor)
+  See [passing custom features to the `CRFEntityExtractor`](./components.mdx#passing-custom-features-to-crfentityextractor)
 
   Changed some featurizers to use sparse features, which should reduce memory usage with large amounts of training data significantly.
-  Read more: [Text Featurizers](./components/featurizers.mdx) .
+  Read more: [Text Featurizers](./components.mdx#featurizers) .
 
   :::caution
   These changes break model compatibility. You will need to retrain your old models!

diff --git a/changelog/5510.feature.md b/changelog/5510.feature.md
@@ -4,7 +4,7 @@ You can now define what kind of features should be used by what component
 You can set an alias via the option `alias` for every featurizer in your pipeline.
 The `alias` can be anything, by default it is set to the full featurizer class name.
 You can then specify, for example, on the
-[DIETClassifier](./components/intent-classifiers.mdx#diet-classifier) what features from which
+[DIETClassifier](./components.mdx#diet-classifier) what features from which
 featurizers should be used.
 If you don't set the option `featurizers` all available features will be used.
 This is also the default behavior.

diff --git a/changelog/5957.feature.md b/changelog/5957.feature.md
@@ -1,2 +1,2 @@
 Add new entity extractor `RegexEntityExtractor`. The entity extractor extracts entities using the lookup tables
-and regexes defined in the training data. For more information see [RegexEntityExtractor](./components/entity-extractors.mdx#regexentityextractor).
+and regexes defined in the training data. For more information see [RegexEntityExtractor](./components.mdx#regexentityextractor).
diff --git a/changelog/6088.feature.md b/changelog/6088.feature.md
@@ -6,15 +6,15 @@ policies [Mapping Policy](./policies.mdx#mapping-policy),
 deprecated and will be removed in the future. Please see the
 [rules documentation](./rules.mdx) for more information.
 
-Added new NLU component [FallbackClassifier](./components/intent-classifiers.mdx#fallbackclassifier) 
+Added new NLU component [FallbackClassifier](./components.mdx#fallbackclassifier) 
 which predicts an intent `nlu_fallback` in case the confidence was below a given
 threshold. The intent `nlu_fallback` may
 then be used to write stories / rules to handle the fallback in case of low NLU
 confidence.
 
-```python
+```yaml-rasa
 pipeline:
-- ... # Other NLU components
+- # Other NLU components ...
 - name: FallbackClassifier
   # If the highest ranked intent has a confidence lower than the threshold then
   # the NLU pipeline predicts an intent `nlu_fallback` which you can then be used in

diff --git a/changelog/6453.removal.md b/changelog/6453.removal.md
@@ -10,7 +10,7 @@ NLU `Component`:
 
 Removed `_guess_format()` utils method from `rasa.nlu.training_data.loading` (use `guess_format` instead).
 
-Removed several config options for [TED Policy](./policies#ted-policy), [DIETClassifier](./components/intent-classifiers#dietclassifier) and [ResponseSelector](./components/selectors#responseselector):
+Removed several config options for [TED Policy](./policies.mdx#ted-policy), [DIETClassifier](./components.mdx#dietclassifier) and [ResponseSelector](./components.mdx#responseselector):
 - `hidden_layers_sizes_pre_dial`
 - `hidden_layers_sizes_bot`
 - `droprate`

diff --git a/changelog/6601.bugfix.md b/changelog/6601.bugfix.md
@@ -0,0 +1,3 @@
+Fixed a bug in the featurization of the boolean slot type. Previously, to set a slot value to "true", 
+you had to set it to "1", which is in conflict with the documentation. In older versions `true` 
+(without quotes) was also possible, but now raised an error during yaml validation. 
diff --git a/changelog/6613.improvement.md b/changelog/6613.improvement.md
@@ -0,0 +1,4 @@
+Added telemetry reporting. Rasa uses telemetry to report anonymous usage information. 
+This information is essential to help improve Rasa Open Source for all users.
+Reporting will be opt-out. More information can be found in our 
+[telemetry documentation](./telemetry/telemetry.mdx).
diff --git a/changelog/6658.removal.md b/changelog/6658.removal.md
@@ -0,0 +1 @@
+`SklearnPolicy` was deprecated. `TEDPolicy` is the preferred machine-learning policy for dialogue models.
diff --git a/docs/.gitignore b/docs/.gitignore
@@ -1,2 +1,3 @@
 # Local Netlify folder
-.netlify
+.netlify
+docs/telemetry/reference.mdx
diff --git a/docs/docs/actions.mdx b/docs/docs/actions.mdx
@@ -2,6 +2,24 @@
 id: actions
 sidebar_label: Overview
 title: Actions
+abstract: After each user message, the model will predict an action that the assistant should perform next. This page gives you an overview of the different types of actions you can use.
 ---
 
-<!-- TODO: add an overview of the different types of actions -->
+## Responses
+A [response](./responses.mdx) is a message the assistant will send back to the user. This is
+the action you will use most often, when you want the assistant to send text, images, buttons
+or similar to the user.
+
+## Custom Actions
+A [custom action](./custom-actions.mdx) is an action that can run any code you want. This can be used to make an
+API call, or to query a database for example.
+
+## Forms
+[Forms](./forms.mdx) are a special type of custom action, designed to handle business logic. If you have
+any conversation designs where you expect the assistant to ask for a specific set of
+information, you should use forms.
+
+## Default Actions
+[Default actions](./default-actions.mdx) are actions that are built into the dialogue manager by default. Most of
+these are automatically predicted based on certain conversation situations. You may want to
+customize these to personalize your assistant.
diff --git a/docs/docs/business-logic.mdx b/docs/docs/business-logic.mdx
@@ -188,8 +188,8 @@ data to your NLU file:
 
 :::note
 Entities like `business_email` and `budget` would usually be handled by pretrained entity extractors
-(e.g. [DucklingHTTPExtractor](./components/entity-extractors.mdx#ducklinghttpextractor)
-or [SpacyEntityExtractor](./components/entity-extractors.mdx#spacyentityextractor)), but for this tutorial
+(e.g. [DucklingHTTPExtractor](./components.mdx#ducklinghttpextractor)
+or [SpacyEntityExtractor](./components.mdx#spacyentityextractor)), but for this tutorial
 we want to avoid any additional setup.
 
 :::

diff --git a/docs/docs/cdd.mdx b/docs/docs/cdd.mdx
diff --git a/docs/docs/chitchat-faqs.mdx b/docs/docs/chitchat-faqs.mdx
@@ -245,7 +245,7 @@ When you need to handle lots of different messages like FAQs or chitchat, the ab
 approach using the `MemoizationPolicy` will become cumbersome. You will need to write
 one story for each of the different intents.
 
-The [ResponseSelector](components/selectors.mdx#responseselector) is designed to
+The [ResponseSelector](components.mdx#responseselector) is designed to
 make it easier to handle conversation patterns like small talk and FAQ messages.
 When you use the `ResponseSelector`, you only need one story to handle all FAQs,
 instead of adding one story for each intent.