Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Converter works unpredictibly with retrieval intents #7032

Closed
degiz opened this issue Oct 16, 2020 · 11 comments
Closed

Converter works unpredictibly with retrieval intents #7032

degiz opened this issue Oct 16, 2020 · 11 comments
Assignees
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework resolution:wontfix issue is acknowledged but we will not work on this (nor will we accept contributions) type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@degiz
Copy link
Contributor

degiz commented Oct 16, 2020

Rasa version: 2.0.0

Issue:
If you have an md e2e story that contains full retrieval intents or responses, the converter sometimes strips out yaml with only the base intent

This:

## id
* out_of_scope/other: hahaha
    - utter_out_of_scope/other

is converted to:

version: "2.0"
stories:
- story: id
  steps:
  - intent: 'out_of_scope/other: hahaha'
  - action: utter_out_of_scope/other

BUT this

## id

* out_of_scope/other: hahaha
    - utter_out_of_scope/other

gets converted into something completely different, with the retrieval intent removed:

version: "2.0"
stories:
- story: id
  steps:
  - intent: out_of_scope
    user: |-
      hahaha
  - action: utter_out_of_scope/other
@degiz degiz added type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors. area:rasa-oss 🎡 Anything related to the open source Rasa framework labels Oct 16, 2020
@degiz degiz added this to the 2.1 Rasa Open Source milestone Oct 16, 2020
@samsucik samsucik self-assigned this Oct 16, 2020
@samsucik
Copy link
Contributor

Looks like this is due to us currently not supporting retrieval intents in test conversations (which we should, see #7002 ).

The above markdown examples get treated differently -- the first one is not treated as a test story, the second is.

What happens is this chain of calls:

StoryMarkdownToYamlConverter.convert_and_write
	MarkdownStoryReader.is_test_stories_file
		rasa.shared.data.is_story_file
			MarkdownStoryReader.is_stories_file
				rasa.shared.data.is_nlu_file
					rasa.shared.nlu.training_data.loading.guess_format
						MarkdownReader.is_markdown_nlu_file
						NLGMarkdownReader.is_markdown_nlg_file

during which NLGMarkdownReader (wrongly) identifies the 2nd example as an NLG file and then overall the file is deemed by MarkdownStoryReader.is_test_stories_file() to not be a test story file because it is an NLU file (not a story file) in the first place.

Which component is to be blamed?

  • 1st possibility: MarkdownStoryReader.is_stories_file(). It's got the following condition here:
if not rasa.shared.data.is_likely_markdown_file(file_path) or rasa.shared.data.is_nlu_file(file_path):
    return False

Which ignores the fact that a file can be a story even if looks like a valid NLU (here NLG) file.

  • 2nd possibility: NLGMarkdownReader.is_markdown_nlg_file. It decides whether something is an NLG file based on the following regex: ##\s*.*\n\*[^:]*/.*\n\s*\t*\-.* which essentially checks if the file contains a section heading, a retrieval intent (possibly with message text) and a response:
## ab
* cd/ef[: text]
    - gh

This is tricky because the format looks exactly like a story with a retrieval intent! From this point of view, a markdown snippet should be able to be both a story and an NLG piece.

Note that this relates only to our markdown format -- in yaml, a response file looks different to a story file. However, the fact that we don't support retrieval intents in testing conversations affects both markdown (see the md->yml example in the issue description) and yaml (see #7002 ).

@samsucik
Copy link
Contributor

I've just realised that there is one element that can help us distinguish between markdown NLG files and e2e stories with retrieval intents. If we agree that the latter has to have message text for each intent, then that could be the difference. Then an NLG file would look like this:

## ab
* cd/ef
    - gh

whereas an e2e story file (with a retrieval intent) would look like this:

## ab
* cd/ef: text
    - gh

However, if we decide to change the NLG regex to require no message text (a very suggestive idea I have to say), I think this might create back-compatibility issues when/if we allow true e2e stories where there can be only the intent or the text (i.e. then an e2e story could still look like NLG). Cc @Ghostvv for this.

@kalkbrennerei
Copy link
Contributor

Maybe we can use the is_used_for_conversion flag of the MarkdownStoryReader to fix this? To me it seems the easiest way to avoid backwards compatibility issues

@samsucik
Copy link
Contributor

samsucik commented Oct 21, 2020

@kalkbrennerei sorry, I'm not sure if I fully get what you're saying. Do you mean fixing this bug but only when the reader is used in converting stories from markdown? If yes, then I'm honestly unsure and would like to hear another opinion on this.

Personally, I'd say that even in Rasa 1.x, e2e test stories with full retrieval intents should be possible to read and use regardless of whether there's an empty line after the ## or not, but I'd understand if we decided to assign this a low priority and instead focus on yaml (i.e. #7002). However, then we could still consider fixing #7034 so that markdown users can use test stories with full retrieval intents at all (provided they include an empty line after the ##).

Please, correct me if I misunderstood you :-)

@degiz
Copy link
Contributor Author

degiz commented Nov 11, 2020

@samsucik Doesn't seem I can reproduce that anymore 🤔

Update: I can 😞

@degiz
Copy link
Contributor Author

degiz commented Nov 11, 2020

@samsucik So from what I can see:

  • We've fixed the problem with YAML, and it doesn't split the retrieval intent
  • There's still a problem with MD stories and this empty line

We've agreed not to fix non-critical bugs for the MD format (let's not forget that we've switched from MD to YAML also because of difficulties with regexes), also the retrieval intents were an experimental feature in 1.x.

I'd say that the empty line problem is a won't fix situation.

@samsucik
Copy link
Contributor

@degiz I agree that having it fixed for YAML is the most important bit. For MD, though, we say in the 2.0.0 changelog:

Rasa Open Source can now read stories in both Markdown and YAML format.

which I read as "we currently support both". And since retrieval intents are no longer an experimental feature, not being able to use MD test stories with such intents might be close to critical? It's just my personal opinion though; and I know quite little about the decisions taken with regards to MD (and how to slowly move towards deprecating it)...

@degiz
Copy link
Contributor Author

degiz commented Nov 11, 2020

which I read as "we currently support both"

You are right, it's a wrong phrasing. MD is in legacy mode, we're vocal that we don't add features/fix non-critical bugs for it and encourage users to convert their existing projects to the YAML.

Please don't get me wrong, I'm not advocating my own laziness 😄 I have one more example - the Rule system. It's an official feature that won't have official support in MD.

@degiz
Copy link
Contributor Author

degiz commented Nov 11, 2020

@m-vdb what's your opinion on that?

@samsucik
Copy link
Contributor

Right, if you say it like this, then it makes complete sense to me to not fix this :-) Since test stories with retrieval intents previously didn't work for any format, it's unlikely that users will now have MD stories with retrieval intents that they'd like to convert to YAML.

@m-vdb
Copy link
Collaborator

m-vdb commented Nov 11, 2020

I agree with your recommendation @degiz 👍 I'd move this to "wont't fix"

@degiz degiz added the resolution:wontfix issue is acknowledged but we will not work on this (nor will we accept contributions) label Nov 11, 2020
@degiz degiz closed this as completed Nov 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework resolution:wontfix issue is acknowledged but we will not work on this (nor will we accept contributions) type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.
Projects
None yet
Development

No branches or pull requests

4 participants