Initial implementation of converter for training data files #6404

degiz · 2020-08-13T15:38:44Z

Closes #6402

Proposed changes:

CLI command rasa data convert {nlu|core} -f yaml to convert training data from MD to YAML

Status (please check what you already did):

added some tests for the functionality
updated the documentation
updated the changelog (please check changelog for instructions)
reformat files using black (please check Readme for instructions)

wochinge · 2020-08-13T20:16:52Z

does it make sense that it's rasa data convert ?

federicotdn

It's working nicely!
Two things I think we still need:

Docstrings on the public methods/functions.
Maybe one or two tests that run assertions on the contents of the converted files. But since the writers themselves are already tested maybe this isn't necessary.

(I've also added other comments)

does it make sense that it's rasa data convert ?

@wochinge Would there be other rasa data commands?

changelog/6404.feature.md

rasa/cli/arguments/convert.py

rasa/cli/convert.py

wochinge · 2020-08-14T07:55:53Z

@wochinge Would there be other rasa data commands?

there is already rasa data split and I believe rasa data validate

wochinge · 2020-08-14T07:57:41Z

rasa/cli/convert.py

+        for file in os.listdir(training_data_path):
+            source_path = Path(training_data_path) / file
+            output_path = Path(output) / f"{source_path.stem}{CONVERTED_FILE_POSTFIX}"


Does it make sense to use os.walk (or even completely re-use some parts of data.get_core_nlu_files)?

I chose os.listdir() over os.walk() to avoid possible confusion for users, in case they keep the files in different subfolders and want to experiment with the files one by one.
But I might be wrong!
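To make the trade-off concrete, here is a minimal sketch of what each approach would pick up. It is not the PR's code: the helper names, the sample files, and the value of CONVERTED_SUFFIX are assumptions for illustration (the real CONVERTED_FILE_POSTFIX value is not shown in this thread).

```python
import os
import tempfile
from pathlib import Path

# Hypothetical stand-in for CONVERTED_FILE_POSTFIX; the real value is not shown here.
CONVERTED_SUFFIX = "_converted.yml"


def top_level_files(directory):
    """Names of files directly inside `directory` (what an os.listdir loop sees)."""
    directory = Path(directory)
    return sorted(
        name for name in os.listdir(directory) if (directory / name).is_file()
    )


def all_files_recursive(directory):
    """POSIX-style relative paths of every file under `directory`, via os.walk."""
    directory = Path(directory)
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            found.append(Path(root, name).relative_to(directory).as_posix())
    return sorted(found)


def output_path_for(source_path, output_dir):
    """Mirror the diff above: output name is the source stem plus a suffix."""
    source_path = Path(source_path)
    return Path(output_dir) / f"{source_path.stem}{CONVERTED_SUFFIX}"


def demo():
    """Build a tiny tree with one nested file and list it both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "nlu.md").write_text("## intent:greet\n- hello\n")
        (root / "experiments").mkdir()
        (root / "experiments" / "draft.md").write_text("## intent:bye\n- bye\n")
        return top_level_files(root), all_files_recursive(root)
```

With os.listdir only nlu.md would be a candidate for conversion; with os.walk the file in the experiments subfolder would be picked up too. Note also that a stem-based output name drops the subfolder, so a recursive walk could map two files with the same name onto one output path.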

degiz · 2020-08-14T08:21:48Z

@wochinge @federicotdn The fun part is that we already have rasa data convert 😄

usage: rasa data convert [-h] [-v] [-vv] [--quiet] {nlu} ...

positional arguments:
  {nlu}
    nlu          Converts NLU data between Markdown and json formats.

Looks like we actually need to reuse it here.

What about rasa data convert -f yaml --nlu {DIR} --core {DIR} -output ?

degiz · 2020-08-14T08:27:48Z

Ok, actually even like this:

rasa data convert --nlu {DIR} --core {DIR} -f yaml -output {DIR}

And add a note that we currently convert to YAML only from MD.

degiz · 2020-08-14T09:07:33Z

Ok, another idea:

rasa data convert nlu --data {DIR} -f yaml -out {DIR}
rasa data convert core --data {DIR} -f yaml -out {DIR}

Then we're super consistent.
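A rough sketch of how that last layout could be wired with nested argparse subcommands. This is not Rasa's actual CLI code: build_parser is a hypothetical helper, and the double-dash spelling --out is an assumption (the proposal above writes -out).

```python
import argparse


def build_parser():
    """Hypothetical parser for: rasa data convert {nlu|core} --data DIR -f yaml --out DIR."""
    parser = argparse.ArgumentParser(prog="rasa")
    commands = parser.add_subparsers(dest="command", required=True)

    data = commands.add_parser("data")
    data_commands = data.add_subparsers(dest="data_command", required=True)

    convert = data_commands.add_parser("convert")
    kinds = convert.add_subparsers(dest="kind", required=True)

    # Both sub-commands take the same options, which is the consistency argued for above.
    for kind in ("nlu", "core"):
        sub = kinds.add_parser(kind)
        sub.add_argument("--data", required=True, help="directory with source files")
        sub.add_argument(
            "-f", "--format", choices=["yaml"], required=True, help="output format"
        )
        sub.add_argument("--out", required=True, help="directory for converted files")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["data", "convert", "nlu", "--data", "data/nlu", "-f", "yaml", "--out", "converted"]
    )
    print(args.kind, args.data, args.format, args.out)
```

Restricting choices to yaml matches the note above that only MD to YAML conversion is supported for now.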

tmbo · 2020-08-14T10:21:26Z

assuming we might also want to make changes to the domain / configuration that we might want to migrate, would it make sense to use something more general, e.g. rasa migrate?

degiz · 2020-08-14T10:22:55Z

rasa migrate

I like the idea! Maybe we should use it once we actually migrate something? Currently it's purely about converting training data from one format to the other.

That would be actually a great tool to migrate the whole project and make sure it's 2.0 compatible.

tmbo · 2020-08-14T10:22:01Z

changelog/6404.feature.md

@@ -0,0 +1 @@
+User can use ``rasa data convert {nlu|core} -f yaml`` command to convert training data from Markdown format to YAML format.


what do I need to do to make this support nlg (responses) as well?

NLGMarkdownReader::reads returns TrainingData, just as MarkdownReader does, so it's mostly about:

writing tests for RasaYamlWriter making sure the conversion is correct

extending the "convert_to_yaml" function from this review to support NLGMarkdownReader

tmbo · 2020-08-14T10:27:22Z

That would be actually a great tool to migrate the whole project and make sure it's 2.0 compatible.

yes that is a good point, let's separate that and allow data migration as a separate command 👍

In any case, I think it already makes sense to add some instructions to the documentation about how to use the data migration https://rasa.com/docs/rasa/next/migration-guide#rasa-110-to-rasa-20 (should be a new section, what does the user need to do?)

degiz · 2020-08-14T10:30:54Z

add some instructions to the documentation

That's true! I'm just waiting for this review to have an approval to be sure what the final cli syntax looks like 😄

federicotdn

Looks good!! Stuff we might want to add in a future PR: docstrings for the public methods/functions, and deeper testing for the conversion results (file contents).

changelog/6404.feature.md

rasa/cli/arguments/data.py

rasa/cli/data.py

federicotdn · 2020-08-14T10:48:39Z

rasa/cli/data.py

+        if MarkdownReader.is_markdown_nlu_file(source_path):
+            if not is_nlu:
+                continue
+            _write_nlu_yaml(source_path, output_path, source_path)
+            num_of_files_converted += 1
+        elif not is_nlu and MarkdownStoryReader.is_markdown_story_file(source_path):
+            _write_core_yaml(source_path, output_path, source_path)
+            num_of_files_converted += 1
+        else:
+            print_warning(f"Skipped file '{source_path}'")


The warning is not shown when running rasa data convert core while iterating over NLU files. Maybe the if structure can be changed like this:

Suggested change

-        if MarkdownReader.is_markdown_nlu_file(source_path):
-            if not is_nlu:
-                continue
-            _write_nlu_yaml(source_path, output_path, source_path)
-            num_of_files_converted += 1
-        elif not is_nlu and MarkdownStoryReader.is_markdown_story_file(source_path):
-            _write_core_yaml(source_path, output_path, source_path)
-            num_of_files_converted += 1
-        else:
-            print_warning(f"Skipped file '{source_path}'")
+        if is_nlu and MarkdownReader.is_markdown_nlu_file(source_path):
+            _write_nlu_yaml(source_path, output_path, source_path)
+            num_of_files_converted += 1
+            continue
+
+        if not is_nlu and MarkdownStoryReader.is_markdown_story_file(source_path):
+            _write_core_yaml(source_path, output_path, source_path)
+            num_of_files_converted += 1
+            continue
+
+        print_warning(f"Skipped file '{source_path}'")

It's a little bit trickier. MarkdownStoryReader.is_markdown_story_file returns true for the NLU files 🤦
So our rasa train and other commands work only because we first check if it's NLU before Core.
So this if condition is hacky but correct.
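The ordering issue can be reproduced with a small sketch. The two predicates below are simplified stand-ins, not Rasa's MarkdownReader.is_markdown_nlu_file or MarkdownStoryReader.is_markdown_story_file; the only property assumed is the one described above, namely that the story check also returns true for NLU files. The sample texts and return labels are likewise made up for illustration.

```python
def looks_like_nlu(text):
    """Stand-in NLU check: the file contains an intent section header."""
    return "## intent:" in text


def looks_like_story(text):
    """Stand-in story check, deliberately loose: any '## ' header matches,
    so NLU files pass it too (the behavior described in the comment above)."""
    return any(line.startswith("## ") for line in text.splitlines())


def classify_nested(text, is_nlu):
    """Same shape as the condition in the PR: the NLU check always runs first."""
    if looks_like_nlu(text):
        if not is_nlu:
            return "skip-silently"  # NLU file seen during a core conversion
        return "nlu"
    elif not is_nlu and looks_like_story(text):
        return "core"
    return "warn"


def classify_flat(text, is_nlu):
    """Same shape as the suggested change: each branch guarded by the mode flag."""
    if is_nlu and looks_like_nlu(text):
        return "nlu"
    if not is_nlu and looks_like_story(text):
        return "core"
    return "warn"


NLU_SAMPLE = "## intent:greet\n- hello\n"
STORY_SAMPLE = "## happy path\n* greet\n  - utter_greet\n"

if __name__ == "__main__":
    # In core mode the flat version would treat the NLU file as a story.
    print(classify_nested(NLU_SAMPLE, is_nlu=False))
    print(classify_flat(NLU_SAMPLE, is_nlu=False))
```

Under these assumptions the flat version sends an NLU file down the core branch when converting core data, while the nested version skips it. A warning for that case could still be added inside the nested branch without changing the order of the checks.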

degiz · 2020-08-14T11:33:49Z

docstrings for the public methods/functions

I agree, but none of the new methods are "public"; I've renamed them to have a "_" prefix.

deeper testing for the conversion results (file contents)

Do you think we should read the files and check the actual content?

federicotdn · 2020-08-14T11:58:54Z

Do you think we should read the files and check the actual content?

I think it's worth it, yes. But because the writers are already tested I wouldn't consider it a top priority.

m-vdb · 2020-08-14T17:12:04Z

using my admin rights to merge this - everything passes except the Windows tests, which we identified as not passing on any build at the moment

degiz force-pushed the 6402_cli_converter branch from 8fb9c97 to 28ce03f Compare August 13, 2020 15:40

degiz requested review from federicotdn and tmbo August 13, 2020 15:41

degiz force-pushed the 6402_cli_converter branch from 28ce03f to 1d36962 Compare August 13, 2020 15:58

federicotdn suggested changes Aug 14, 2020


wochinge reviewed Aug 14, 2020


degiz force-pushed the 6402_cli_converter branch 2 times, most recently from 86560b7 to 77b4ec5 Compare August 14, 2020 10:14

degiz requested a review from federicotdn August 14, 2020 10:16

degiz force-pushed the 6402_cli_converter branch from 77b4ec5 to 622e455 Compare August 14, 2020 10:18

tmbo reviewed Aug 14, 2020


degiz force-pushed the 6402_cli_converter branch from 622e455 to ee3c4c5 Compare August 14, 2020 10:37

federicotdn approved these changes Aug 14, 2020


degiz force-pushed the 6402_cli_converter branch 2 times, most recently from f55e8a2 to f2657db Compare August 14, 2020 11:33

degiz force-pushed the 6402_cli_converter branch from f2657db to 29907af Compare August 14, 2020 11:47

Initial implementation of converter for training data files

3ec8253

degiz force-pushed the 6402_cli_converter branch from 29907af to 3ec8253 Compare August 14, 2020 13:38

Merge branch 'master' into 6402_cli_converter

d015340

Added active_loop key to the Stories schema for a story step

a50e920

m-vdb merged commit 989e592 into master Aug 14, 2020

m-vdb deleted the 6402_cli_converter branch August 14, 2020 17:12
