Response generator responses with multimedia elements #6323

Merged
tmbo merged 46 commits into master from responses-with-extras on Aug 17, 2020

Member

@tmbo tmbo commented Aug 3, 2020

Proposed changes:

  • added YAML support for responses. Example of a responses.yml in the format as it is currently implemented:
responses:
  chitchat/ask_weather:
  - text: Where do you want to check the weather?
    buttons:
    - title: Current location
      payload: here
    - title: Other place
      payload: other_location

  chitchat/ask_name:
  - text: my name is Sara, Rasa's documentation bot!
    image: "https://i.imgur.com/nGF1K8f.jpg"
  • responses model only gets trained on the text part of the first response.
  • allows users to use multimedia elements (images, buttons, ...) in response templates
  • the format (and the parser) is the same as we use for utter templates in the domain
  • this includes the changes to port the moodbot demo to the new file format (needed an example...)

Status (please check what you already did):

  • handle case where the response.text is empty -> use the intent name instead
  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

Member Author

tmbo commented Aug 3, 2020

@dakshvar22 are there any concerns regarding the data format?

@@ -233,7 +233,7 @@ def is_markdown_story_file(file_path: Text) -> bool:
     """
     suffix = PurePath(file_path).suffix

-    if suffix and suffix != MARKDOWN_FILE_EXTENSION:
+    if suffix not in MARKDOWN_FILE_EXTENSIONS:
Member Author

I think we should discuss if we keep the original behaviour: I changed it to keep it uniform across readers. The difference between the versions is that the original one would happily read files without an extension as story files, which seems arbitrary. Nevertheless, it is breaking backwards compatibility in an odd way, so kind of betting that no one relies on this odd behaviour 🤔
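
To make the behavioural difference concrete, here is a minimal sketch of the two suffix checks side by side. It assumes that MARKDOWN_FILE_EXTENSIONS contains only ".md" (its real value is not shown in this diff), and the helper names old_check and new_check are made up for illustration. The real function goes on to inspect the file contents, which is left out here.

```python
from pathlib import PurePath

# Assumed values: the real constants are defined elsewhere in the code base
# and are not visible in the diff above.
MARKDOWN_FILE_EXTENSION = ".md"
MARKDOWN_FILE_EXTENSIONS = {".md"}


def old_check(file_path: str) -> bool:
    """Original check: only a non-empty suffix other than .md is rejected."""
    suffix = PurePath(file_path).suffix
    if suffix and suffix != MARKDOWN_FILE_EXTENSION:
        return False
    return True


def new_check(file_path: str) -> bool:
    """New check: anything without a known markdown suffix is rejected."""
    suffix = PurePath(file_path).suffix
    if suffix not in MARKDOWN_FILE_EXTENSIONS:
        return False
    return True


# Print both verdicts for a .md file, a .yml file and a file without extension.
for path in ("data/stories.md", "data/stories.yml", "data/stories"):
    print(path, old_check(path), new_check(path))
```

Only the last path is treated differently: a file without an extension passes the old check and fails the new one.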

Contributor

Should we add this to the changelog? Something like `Training data in markdown format has to have the file suffix .md from now on`?

@@ -1098,7 +1104,7 @@ def _load_model(
             data_signature=model_data_example.get_signature(),
             label_data=label_data,
             entity_tag_specs=entity_tag_specs,
-            config=meta,
+            config=copy.deepcopy(meta),
Member Author

this fix was needed because this config gets passed to tensorflow, which converts all values of the dictionary in place to tensorflow types (e.g. it will replace an ordinary python string with a tensorflow-specific string). To keep tf from modifying the original dictionary, we need to pass in a copy.
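
The effect can be reproduced without tensorflow. In the sketch below, convert_in_place is a hypothetical stand-in for any library call that rewrites dictionary values in place (it is not a tensorflow function), and the meta dictionary is made up. Because the rewrite recurses into nested dictionaries, a shallow copy would not be enough, which is why a deep copy is used.

```python
import copy
from typing import Any, Dict


def convert_in_place(config: Dict[str, Any]) -> None:
    """Hypothetical stand-in: replaces every str value with bytes, in place."""
    for key, value in list(config.items()):
        if isinstance(value, dict):
            convert_in_place(value)  # nested dictionaries are rewritten too
        elif isinstance(value, str):
            config[key] = value.encode("utf-8")


# Made-up metadata with a nested dictionary.
meta = {"name": "ResponseSelector", "layers": {"text": "dense"}}

# Hand the mutating call a deep copy so the caller's dictionary stays intact.
safe_copy = copy.deepcopy(meta)
convert_in_place(safe_copy)

print(meta)       # still plain python strings
print(safe_copy)  # values converted by the stand-in
```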

Contributor

dakshvar22 commented Aug 3, 2020

@tmbo Data format looks good. Just two comments (maybe you have thought of these already):

  1. In contrast to usual response templates, there cannot be multiple text attributes for one retrieval intent. For example, this would be invalid:
responses:
  chitchat/ask_weather:
  - text: Where do you want to check the weather?
    buttons:
    - title: Current location
      payload: here
    - title: Other place
      payload: other_location
  - text: Do you want to know the weather for today?
    buttons:
    - title: Today
      payload: here
    - title: Whole week
      payload: other_location

This is primarily because the response selector can only be trained with one ground truth label. Supporting multiple responses should be left out of scope for now IMO.

  2. I am not sure of this but the text attribute of any response template is optional, right? For example, is this valid? -
responses:
  chitchat/ask_weather:
  - buttons:
    - title: Current location
      payload: here
    - title: Other place
      payload: other_location

If it is valid, then we should use the full name of the retrieval intent (e.g. chitchat/ask_weather) as the proxy for the text attribute (just for model training).
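
A minimal sketch of that fallback, assuming each response is available as a plain dictionary after parsing. The function name training_text is hypothetical and is not taken from the code in this PR.

```python
from typing import Any, Dict


def training_text(intent_name: str, response: Dict[str, Any]) -> str:
    """Return the text to train on for one response.

    Falls back to the full retrieval intent name when the response has no
    `text` attribute or the attribute is empty.
    """
    text = response.get("text")
    return text if text else intent_name


with_text = {"text": "Where do you want to check the weather?"}
buttons_only = {"buttons": [{"title": "Current location", "payload": "here"}]}

print(training_text("chitchat/ask_weather", with_text))
print(training_text("chitchat/ask_weather", buttons_only))
```

The second call has no text to work with, so it returns the intent name chitchat/ask_weather.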

Member Author

tmbo commented Aug 3, 2020

In contrast to usual response templates, there cannot be multiple text attributes for one retrieval intent. For example, this would be invalid:

just because the ML model can't be trained with multiple of these, it doesn't mean this needs to be invalid. e.g. we could do the same as we do for utterance templates and just select a random one.

Member Author

tmbo commented Aug 3, 2020

I am not sure of this but the text attribute of any response template is optional, right? For example, is this valid? -

Yes I think that is valid. I like the solution, let's use the intent name in that case

Contributor

dakshvar22 commented Aug 3, 2020

just because the ML model can't be trained with multiple of these, it doesn't mean this needs to be invalid. e.g. we could do the same as we do for utterance templates and just select a random one.

Yes, we can do that, but which one would the model be trained on? It has to be just one, and taking a random one for training doesn't seem like a good option. If we always take the first response as the training response, would it introduce any confusion/problems in the UX? (I am not very clear on this myself.) For example, the model would need to be retrained if the first response was swapped with the last.

Member Author

tmbo commented Aug 3, 2020

well, we can train the model on either the first, all of them or any of them: whatever you prefer.

I think from a UX perspective it is easier to explain that the format is exactly the same as the one for utterance templates. The model gets retrained on any change to the responses at the moment, so I think from the training perspective it is pretty consistent.
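
One way to combine the two positions in this thread is to train on the first variation and pick a random one when answering. This is only a sketch under that assumption: the function names are hypothetical, the data is a hand-written dictionary rather than a parsed file, and every variation is assumed to have a text attribute.

```python
import random
from typing import Any, Dict, List

Variation = Dict[str, Any]

# Hand-written stand-in for parsed responses with two variations.
responses: Dict[str, List[Variation]] = {
    "chitchat/ask_weather": [
        {"text": "Where do you want to check the weather?"},
        {"text": "Do you want to know the weather for today?"},
    ]
}


def training_label(variations: List[Variation]) -> str:
    """Train on the first variation only, so the label is deterministic."""
    return variations[0]["text"]


def pick_response(variations: List[Variation], rng: random.Random) -> Variation:
    """At prediction time any variation may be returned, chosen at random."""
    return rng.choice(variations)


variations = responses["chitchat/ask_weather"]
print(training_label(variations))
print(pick_response(variations, random.Random())["text"])
```

With this split, reordering the variations changes the training label, which is the retraining concern raised above.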

@tmbo tmbo added this to the 2.0a2 Rasa Open Source milestone Aug 10, 2020
Contributor

@wochinge wochinge left a comment

first part of the review. Looking good so far 💯

Contributor

@wochinge wochinge left a comment

nearly done now. Had to submit due to outdated diff

domain_file = stack.enter_context(open(default_domain_path))
config_file = stack.enter_context(open(default_stack_config))
nlu_file = stack.enter_context(open(default_nlu_data))
domain_data = rasa_utils.io.read_yaml_file(default_domain_path)
Contributor

How about using parametrize or adding another test to make sure it still works with markdown?

Member Author

I took a closer look and there are already a couple of other methods in there that test the same endpoint with markdown files. since this actually trains a model and takes time, I'd rather avoid adding more training runs. what do you think?

Contributor

Yes, but then I'd maybe rename the tests to make it clear that they are testing the MD support. Otherwise somebody might come, adapt the test, and suddenly all MD testing is gone.

Another quick way would be to move the payload extraction into a separate function and test that using parametrize.

Contributor

@wochinge wochinge left a comment

What a PR - great work! 👍

Member Author

tmbo commented Aug 14, 2020

thanks a lot @wochinge for the super quick and thorough review 🚀

@wochinge
Contributor

@tmbo During the review I noticed the close coupling of training data objects (Domain, TrainingData) with the different Writer classes.

What do you think about having a counterpart to TrainingDataImporter to

  • support other file format implementations (e.g. implementations of the community, adapters to other frameworks)
  • clear separation of concerns / classes don't do multiple things at the same time
  • easier testability
  • DRYer (each class currently does some sort of if endswith "json" elif endswith "md")
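
A rough sketch of what such a writer abstraction could look like. All names here (TrainingDataWriter, JsonWriter, MarkdownWriter, WRITERS, writer_for) are hypothetical and do not exist in the code base, and the data is simplified to a mapping from intent name to example strings. The point is only that the extension lookup lives in one place instead of in every data class.

```python
import json
from abc import ABC, abstractmethod
from pathlib import PurePath
from typing import Dict, List

IntentExamples = Dict[str, List[str]]  # simplified, hypothetical data shape


class TrainingDataWriter(ABC):
    """Hypothetical interface: one writer class per file format."""

    @abstractmethod
    def dumps(self, data: IntentExamples) -> str:
        """Serialise the data to a string in this writer's format."""


class JsonWriter(TrainingDataWriter):
    def dumps(self, data: IntentExamples) -> str:
        return json.dumps(data, indent=2)


class MarkdownWriter(TrainingDataWriter):
    def dumps(self, data: IntentExamples) -> str:
        lines: List[str] = []
        for intent, examples in data.items():
            lines.append(f"## intent:{intent}")
            lines.extend(f"- {example}" for example in examples)
        return "\n".join(lines)


# Single registry keyed by file suffix; other formats could register here.
WRITERS: Dict[str, TrainingDataWriter] = {
    ".json": JsonWriter(),
    ".md": MarkdownWriter(),
}


def writer_for(path: str) -> TrainingDataWriter:
    """Look up the writer for a path by its suffix."""
    suffix = PurePath(path).suffix
    if suffix not in WRITERS:
        raise ValueError(f"No writer registered for suffix '{suffix}'")
    return WRITERS[suffix]


print(writer_for("nlu.md").dumps({"greet": ["hi", "hello"]}))
```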

Member Author

tmbo commented Aug 17, 2020

yes @wochinge I think that is a good idea 👍

@tmbo tmbo merged commit da11cf9 into master Aug 17, 2020
@tmbo tmbo deleted the responses-with-extras branch August 17, 2020 08:04
@wochinge wochinge mentioned this pull request Aug 17, 2020
indam23 pushed a commit that referenced this pull request Jul 27, 2022
* Use rules for greet, goodbye & challenge

* Convert nlu & stories to yml

* Add '-' in front of examples

* Add 'y' , 'n' to affirm & deny intents

* Remove 'greet -> utter_greet' rule, and use in stories instead

* implement yaml response format as well as extended response format

extended response format adds the ability to add images, buttons, ... to responses generated from the response selector. the format is the same as we use for utter templates in the domain.

* code style improvement

* fixed linter error

* fixed some more test and renamed nlg_stories to responses

* fixed typing issue

* fixed more types

* fixed tests

* Update training_data.py

* fixed import errror

* fixed remaining tests

* fixed name error

* Update rules.yml

* added tests for responses

* added changelog entry

* updated documentation

* applied review suggestions

* integrated review comments

* fixed typing issue

Co-authored-by: Arjaan Buijk <[email protected]>
Co-authored-by: Arjaan Buijk <[email protected]>
Co-authored-by: Roberto <[email protected]>
6 participants