Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Markdown training data format #9390

Merged
merged 29 commits into from
Aug 26, 2021
Merged

Remove Markdown training data format #9390

merged 29 commits into from
Aug 26, 2021

Conversation

alwx
Copy link
Contributor

@alwx alwx commented Aug 18, 2021

Proposed changes:

Scope/list of changes:

  • Remove MarkdownStoryReader and all the code that uses it
  • Remove StoryMarkdownToYamlConverter
  • Remove _convert_core_data function and the ability to convert data from markdown to yaml
  • Adapt tests in test_yaml_story_writer.py to work without a MarkdownStoryReader
  • Remove MarkdownReader
  • Remove MarkdownWriter
  • Update tests in test_extractor.py to use RasaYAMLReader instead of MarkdownReader
  • Remove NLGMarkdownReader
  • Remove _convert_nlg_data function and the ability to convert data from NLG markdown to yaml
  • Remove rasa.nlu.training_data.converters
  • Remove markdown story files in data directory
  • Drop use_e2e in Rasa Open Source 3.0.0 when removing Markdown support

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

@alwx alwx requested review from ancalita and removed request for ancalita August 20, 2021 15:02
@chdorner
Copy link
Contributor

One question I've had that isn't only related to this but also other backwards compatible changes in rasa.shared is that between these OSS changes merging into main and the work in Rasa X being done we cannot run Rasa X with the latest OSS main. Is that right?

As an example, Rasa X imports rasa.shared.core.training_data.story_reader.markdown_story_reader.MarkdownStoryReader in various places, this class and the module is being removed in this PR. Since we only support Rasa 3.0 with Rasa X/EE 1.1 and we're still working on 1.0 we cannot remove Markdown support for another few weeks if not a month or two. During this time we wouldn't be able to run Rasa X/EE with Rasa OSS from main.

@alwx
Copy link
Contributor Author

alwx commented Aug 24, 2021

@chdorner yes, that's correct — we're currently introducing a lot of architectural changes so it's not guaranteed that Rasa X will work with Rasa from main branch.

@alwx alwx requested a review from ancalita August 25, 2021 07:18
@alwx alwx marked this pull request as ready for review August 25, 2021 07:28
@alwx alwx requested a review from a team as a code owner August 25, 2021 07:28
@alwx alwx requested review from a team, alopez and wochinge and removed request for a team August 25, 2021 07:28
@alwx alwx removed the request for review from alopez August 25, 2021 07:30
Copy link
Contributor

@wochinge wochinge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow - this is big! Thanks for taking this huge thing on 💯

Some other MD things I found

  • comment here, here and some others
  • do a lot of the as_story_string methods actually still make sense? I think we need to keep the Event.as_story_string ones for rasa interactive but we might be able to delete them for StoryGraph, Story (Rasa X is using this here but this doesn't make sense any more after dropping MD in Rasa X as well). Might be worth dropping that in a separate PR
  • drop it here

@@ -0,0 +1,5 @@
Remove the support of Markdown training data format. This includes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add also an entry to the migration guide and maybe a link to the old migration guide where we tell them how to convert old MD files to YAML

rasa/server.py Outdated Show resolved Hide resolved
rasa/shared/core/trackers.py Show resolved Hide resolved
rasa/shared/data.py Outdated Show resolved Hide resolved
rasa/shared/importers/importer.py Outdated Show resolved Hide resolved
tests/utils/tensorflow/test_model_data_utils.py Outdated Show resolved Hide resolved
rasa/core/test.py Show resolved Hide resolved
Copy link
Member

@ancalita ancalita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow this PR is an amazing feat 🚀
Left a few small comments.

docs/docs/training-data-format.mdx Show resolved Hide resolved
rasa/cli/data.py Outdated Show resolved Hide resolved
rasa/core/test.py Outdated Show resolved Hide resolved
rasa/shared/nlu/training_data/training_data.py Outdated Show resolved Hide resolved
tests/cli/test_rasa_data.py Show resolved Hide resolved
@alwx alwx requested review from ancalita and wochinge August 25, 2021 13:31
@@ -62,18 +56,6 @@ def read_from_file(
"""
raise NotImplementedError

@staticmethod
def is_test_stories_file(filename: Text) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm slightly confused why this method was removed 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason for it to exist because it's just a static method that just proxies its arguments to another function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, you're referring to its implementation in YAMLStoryReader.

@alwx alwx requested a review from ancalita August 25, 2021 14:03
Copy link
Member

@ancalita ancalita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙌🏼

Copy link
Contributor

@wochinge wochinge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very close 🚀
Can we also remove this one ?

docs/docs/training-data-format.mdx Show resolved Hide resolved
test_file = test_dir / f"tests{suffix}"
test_file.write_bytes(request.body)
return str(test_file)
def _test_data_file_from_payload(request: Request, temporary_directory: Path) -> Text:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

training_payload = _training_payload_from_json(request, temporary_directory)
and _training_payload_from_json can also be removed, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alwx _training_payload_from_json is still there.

rasa/shared/importers/importer.py Show resolved Hide resolved
@alwx alwx requested a review from wochinge August 26, 2021 06:24
@@ -0,0 +1,2 @@
Removes `template_variables` argument from `get_stories` method of `TrainingDataImporter`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Removes `template_variables` argument from `get_stories` method of `TrainingDataImporter`.
Removes `template_variables` and `e2e` arguments from `get_stories` method of `TrainingDataImporter`.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also mention in the migration guide as it's an explicit public interface

docs/docs/training-data-importers.mdx Outdated Show resolved Hide resolved
test_file = test_dir / f"tests{suffix}"
test_file.write_bytes(request.body)
return str(test_file)
def _test_data_file_from_payload(request: Request, temporary_directory: Path) -> Text:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alwx _training_payload_from_json is still there.

@alwx alwx requested a review from wochinge August 26, 2021 09:55
Copy link
Contributor

@wochinge wochinge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wohoo 🚀

@wochinge wochinge merged commit 2e82c9c into main Aug 26, 2021
@wochinge wochinge deleted the no-markdown branch August 26, 2021 12:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove Markdown training data format
4 participants