-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Response generator responses with multimedia elements #6323
Conversation
extended response format adds the ability to add images, buttons, ... to responses generated from the response selector. the format is the same as we use for utter templates in the domain.
@dakshvar22 are there any concerns regarding the data format? |
@@ -233,7 +233,7 @@ def is_markdown_story_file(file_path: Text) -> bool: | |||
""" | |||
suffix = PurePath(file_path).suffix | |||
|
|||
if suffix and suffix != MARKDOWN_FILE_EXTENSION: | |||
if suffix not in MARKDOWN_FILE_EXTENSIONS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should discuss if we keep the original behaviour: I changed it to keep it uniform across readers. The difference between the versions is that the original one would happily read files without an extension as story files, which seems arbitrary. Nevertheless, it is breaking backwards compatibility in an odd way, so kind of betting that no one relies on this odd behaviour 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add this to the changelog? Training data form in markdown format
has to have the file suffix .md
from now on`?
@@ -1098,7 +1104,7 @@ def _load_model( | |||
data_signature=model_data_example.get_signature(), | |||
label_data=label_data, | |||
entity_tag_specs=entity_tag_specs, | |||
config=meta, | |||
config=copy.deepcopy(meta), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this fix was needed as this config gets passed to tensorflow which in place convert all values of the dictionary to tensorflow types (e.g. it will replace an ordinary python string with a tensorflow specific string). to avoid tf modifying this original dictionary we need to pass in a copy...
@tmbo Data format looks good. Just two comments(maybe you have thought of these already):
This is primarily because the response selector can only be trained with one ground truth label. Supporting multiple responses should be left out of scope for now IMO.
If it is valid, then we should use the full name of the retrieval intent(for e.g. - |
just because the ML model can't be trained with multiple of these, it doesn't mean this needs to be invalid. e.g. we could do the same as we do for utterance templates and just select a random one. |
Yes I think that is valid. I like the solution, let's use the intent name in that case |
Yes, we can do that but which one would the model be trained on? It has to be just one and taking a random one for training, doesn't seem like a good option. If we always take the first response as the training response would it introduce any confusion/problems in the UX?(I am not very clear on this myself) For example, model would need to be retrained if the first response was swapped with the last. |
well, we can train the model on either the first, all of them or any of them: whatever you prefer. I think from a UX perspective it is easier to explain that the format is exactly the same as the one for utterance templates. The model gets retrained on any change to the responses at the moment, I think from the training perspective it is pretty consistent. |
…esponses-with-extras
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first part of the review. Looking good so far 💯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nearly done now. Had to submit due to outdated diff
domain_file = stack.enter_context(open(default_domain_path)) | ||
config_file = stack.enter_context(open(default_stack_config)) | ||
nlu_file = stack.enter_context(open(default_nlu_data)) | ||
domain_data = rasa_utils.io.read_yaml_file(default_domain_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using parametrize
or adding another test to make sure it still works with markdown?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a closer look and there are already a couple of other methods in there that test the same endpoint with markdown files. since this actually trains a model and takes time, I'd rather avoid adding more training runs. what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but then I'd maybe rename the tests to make it clear that this testing the MD support. Otherwise somebody might come, adapt the test, and suddenly all MD testing is gone.
Another quick way would be to extract the payload extraction to a separate function and test that using parametrize
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What a PR - great work! 👍
@@ -233,7 +233,7 @@ def is_markdown_story_file(file_path: Text) -> bool: | |||
""" | |||
suffix = PurePath(file_path).suffix | |||
|
|||
if suffix and suffix != MARKDOWN_FILE_EXTENSION: | |||
if suffix not in MARKDOWN_FILE_EXTENSIONS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add this to the changelog? Training data form in markdown format
has to have the file suffix .md
from now on`?
thanks a lot @wochinge fur the super quick and thorough review 🚀 |
@tmbo During the review I noticed the close coupling of training data objects ( What do think about having a pendant to
|
…esponses-with-extras
yes @wochinge I think that is a good idea 👍 |
* Use rules for greet, goodbye & challenge * Convert nlu & stories to yml * Add '-' in front of examples * Add 'y' , 'n' to affirm & deny intents * Remove 'greet -> utter_greet' rule, and use in stories instead * implement yaml response format as well as extended response format extended response format adds the ability to add images, buttons, ... to responses generated from the response selector. the format is the same as we use for utter templates in the domain. * code style improvement * fixed linter error * fixed some more test and renamed nlg_stories to responses * fixed typing issue * fixed more types * fixed tests * Update training_data.py * fixed import errror * fixed remaining tests * fixed name error * Update rules.yml * added tests for responses * added changelog entry * updated documentation * applied review suggestions * integrated review comments * fixed typing issue Co-authored-by: Arjaan Buijk <[email protected]> Co-authored-by: Arjaan Buijk <[email protected]> Co-authored-by: Roberto <[email protected]>
Proposed changes:
responses.yml
with the format as it is currently implemented:text
part of the firstresponse
.Status (please check what you already did):
response.text
is empty -> use the intent name insteadblack
(please check Readme for instructions)