Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

E2e story printing #7388

Merged
merged 35 commits into from
Dec 7, 2020
Merged

E2e story printing #7388

merged 35 commits into from
Dec 7, 2020

Conversation

wochinge
Copy link
Contributor

@wochinge wochinge commented Nov 26, 2020

Proposed changes:

  • fix Make sure the e2e stories are printed correctly #7130
  • fix printing of end-to-end bot actions in rasa interactive
  • fix failing tests due to the DEFAULT_STORIES_FILE changed to be yaml
  • fix reading and writing of yaml stories with end-to-end tests and end-to-end training data
  • raise exception if end-to-end UserUttered / ActionExecuted events are tried to be printed to markdown
  • change fingerprinting to use yaml to make sure it works with end-to-end training data (fingerprinting is still not stable for checkpoints - however, should be fixed on master and not here
  • change story structure validator to use str to represent actions instead of markdown representation which uses as_story_string

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

Returns:
Event as string.
"""
if self.use_text_for_featurization:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ghostvv I think printing an end-to-end UserUttered event in the conversation test format will just make things worse and more complicated. I'd vote to fail early. What do you think?
I checked the usage and we should be fine except some stuff for the story validation tool which for some reason uses this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it used only in markdown, then ok

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer yaml writer to also use method inside the event then everything is in one place

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Joe and I just had quite a session about it and the fact that as_story_string is used for varying use cases (fingerprinting, printing stories, story validation) actually makes changing code quite complicated. I prefer having the representation of an event in a specific format separated out. How an event is featurized / unique represent should be part of the Event , but it shouldn't be related to any markdown stuff.

rasa/shared/core/events.py Outdated Show resolved Hide resolved
@wochinge
Copy link
Contributor Author

wochinge commented Dec 2, 2020

Rasa X story printing will be broken

rasa/test.py Outdated Show resolved Hide resolved
@wochinge wochinge requested review from Ghostvv and joejuzl December 3, 2020 17:08
rasa/shared/core/events.py Outdated Show resolved Hide resolved
@@ -231,7 +232,7 @@ def parse_e2e_message(line: Text, is_used_for_training: bool = True) -> Message:
intent = match.group(2)
message = match.group(4)
example = entities_parser.parse_training_example(message, intent)
if not is_used_for_training:
if not is_used_for_training and not self.use_e2e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when is parse_e2e_message used but elf.use_e2e False?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when converting stories from Markdown to YAML core.training.converters.test_story_markdown_to_yaml_converter.test_test_stories was failing

@@ -389,7 +372,7 @@ def _user_intent_from_step(
) -> Tuple[Text, Optional[Text]]:
user_intent = step.get(KEY_USER_INTENT, "").strip()

if not user_intent:
if not user_intent and KEY_USER_MESSAGE not in step:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the rest of this method gracefully handle there being no intent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have tests for it and they pass 🤷‍♂️

rasa/shared/core/training_data/structures.py Show resolved Hide resolved
tests/shared/core/test_events.py Show resolved Hide resolved
@wochinge wochinge requested a review from joejuzl December 4, 2020 17:19
@joejuzl
Copy link
Contributor

joejuzl commented Dec 4, 2020

LGTM

@wochinge
Copy link
Contributor Author

wochinge commented Dec 7, 2020

@Ghostvv Would you mind reviewing?

@wochinge wochinge marked this pull request as ready for review December 7, 2020 10:17
@wochinge wochinge requested a review from a team December 7, 2020 10:17
Copy link
Contributor

@Ghostvv Ghostvv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, but it is better if someone who knows yaml writing will take a look


def __str__(self) -> Text:
"""Returns text representation of event."""
return (
f"UserUttered(text: {self.text}, intent: {self.intent}, "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be

Suggested change
f"UserUttered(text: {self.text}, intent: {self.intent}, "
f"UserUttered(text: {self.text}, intent: {self.intent_name}, "

otherwise you get all the confidences etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. We should revisit this on master once we decided on a Python code convention. Let's focus on getting e2e merged and not to fix all problems in the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean? I don't think its convention problem. If you use str as hash method, you'd get different events all the time, because confidences will be different

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use str as hash here. The hash actually uses the intent_name. __hash__ is currently not used for hashing on predictions / only on training data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then hash is wrong, because it contains self.entities which contain confidences and extractors etc, which I don't think should be a part of the hash

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed representation and hashing

rasa/shared/core/events.py Show resolved Hide resolved

def __str__(self) -> Text:
"""Returns text representation of event."""
return (
f"UserUttered(text: {self.text}, intent: {self.intent}, "
f"entities: {self.entities})"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that will contain a dict with all the confidences, extractors, etc, should we filter only entity names here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my answer to #7388 (comment)

@wochinge
Copy link
Contributor Author

wochinge commented Dec 7, 2020

looks good, but it is better if someone who knows yaml writing will take a look

@Ghostvv I think the changes re YAML are very small in this PR. I think it'd be better to merge this now and then have @degiz review the entire changes on the e2e branch. Ok?

Can I merge then?

rasa/shared/core/events.py Outdated Show resolved Hide resolved
@Ghostvv
Copy link
Contributor

Ghostvv commented Dec 7, 2020

I'd add a TODO/create separate issue about self.entities in hash and we can merge it

@@ -377,30 +409,44 @@ def _from_parse_data(
)

def __hash__(self) -> int:
return hash((self.text, self.intent_name, jsonpickle.encode(self.entities)))
"""Returns unique hash of object."""
return hash(json.dumps(self.as_sub_state()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@wochinge wochinge merged commit de9e17a into e2e Dec 7, 2020
@wochinge wochinge deleted the e2e-story-printing branch December 7, 2020 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants