Unescape tokens on md-json conversion #6308

AMR-KELEG · 2020-07-31T02:58:05Z

fixes #6087
On converting JSON NLU data into Markdown, tokens like "\n" are
escaped to "\\n".
However, on converting Markdown NLU data back into JSON,
the tokens are not unescaped.
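
To make the asymmetry concrete, here is a minimal sketch of an escape/unescape round trip. The names `escape_tokens` and `unescape_tokens`, and the set of characters in `ESCAPE_MAP`, are assumptions made for illustration; they are not the helpers used in the Rasa code base.

```python
import re

# Hypothetical escape table (an assumption for this sketch); the real
# project may cover a different set of characters.
ESCAPE_MAP = {"\\": "\\\\", "\n": "\\n", "\t": "\\t", "\r": "\\r"}
UNESCAPE_MAP = {escaped: raw for raw, escaped in ESCAPE_MAP.items()}

# Matches a single raw backslash, newline, tab, or carriage return.
_ESCAPE_RE = re.compile(r"[\\\n\t\r]")
# Matches a backslash followed by one of: backslash, n, t, r.
_UNESCAPE_RE = re.compile(r"\\[\\ntr]")


def escape_tokens(text: str) -> str:
    """Replace raw control characters with their escaped two-character form."""
    return _ESCAPE_RE.sub(lambda m: ESCAPE_MAP[m.group(0)], text)


def unescape_tokens(text: str) -> str:
    """Inverse of escape_tokens: turn escaped sequences back into raw characters."""
    return _UNESCAPE_RE.sub(lambda m: UNESCAPE_MAP[m.group(0)], text)


original = 'say "hello"\nand goodbye'
as_markdown = escape_tokens(original)   # what the json -> md step would write
restored = unescape_tokens(as_markdown)  # what the md -> json step should read back
```

Without the unescape step, the value read back from Markdown would still contain the two-character sequence backslash + `n` rather than a real newline, so a json → md → json round trip would not return the original text.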

Proposed changes:

...

Status (please check what you already did):

added some tests for the functionality
updated the documentation
updated the changelog (please check changelog for instructions)
reformat files using black (please check Readme for instructions)

Solve #6087. On converting JSON NLU data into Markdown, tokens like "\n" are escaped to "\\n". However, on converting Markdown NLU data back into JSON, the tokens are not unescaped.

sara-tagger · 2020-07-31T06:00:04Z

Thanks for submitting a pull request 🚀 @akelad will take a look at it as soon as possible ✨

tabergma

Thanks for taking this over!

Code-wise it looks good 🚀 However, could you please add a test here and also add a changelog entry? Thanks.

tabergma · 2020-08-10T07:08:50Z

rasa/nlu/training_data/formats/markdown.py

@@ -37,13 +37,15 @@
 AVAILABLE_SECTIONS = [INTENT, SYNONYM, REGEX, LOOKUP]
 MARKDOWN_SECTION_MARKERS = [f"## {s}:" for s in AVAILABLE_SECTIONS]

-item_regex = re.compile(r"\s*[-*+]\s*(.+)")
+item_regex = re.compile(r"\s*[-*+]\s*((?:.+\s*)*)")


Why did you change this?

AMR-KELEG · 2020-08-10

The old item_regex fails when the line contains whitespace tokens such as "\n", since they are no longer escaped once the line has been decoded:

    import re

    # decode_string is the unescaping helper added in this PR.
    item_regex = re.compile(r"\s*[-*+]\s*((?:.+\s*)*)")
    old_item_regex = re.compile(r"\s*[-*+]\s*(.*)")

    t = '- THIS IS A MD ENTRY WITH a "\\n" token'
    print(repr(t))
    # '- THIS IS A MD ENTRY WITH a "\\n" token'

    unescaped_t = decode_string(t)
    print(repr(unescaped_t))
    # '- THIS IS A MD ENTRY WITH a "\n" token'

    # The old pattern stops at the newline because "." does not match it.
    re.match(old_item_regex, unescaped_t).groups(0)[0]
    # 'THIS IS A MD ENTRY WITH a "'

    # The new pattern lets \s* consume the newline and keeps matching.
    re.match(item_regex, unescaped_t).groups(0)[0]
    # 'THIS IS A MD ENTRY WITH a "\n" token'

tabergma

Great work! 💯

rasabot · 2020-08-11T13:22:26Z

Could not update branch. Most likely this is due to a merge conflict. Please update the branch manually and fix any issues.

tabergma · 2020-08-14T09:48:54Z

@AMR-KELEG Can you please resolve the merge conflicts? master is getting updated quite frequently right now, will try to get it in as soon as the conflicts are resolved.

AMR-KELEG · 2020-08-14T13:02:35Z

@tabergma I have updated the files.
I think the commit history is a little messy.
Should I try to squash the commits,
or can you squash them when merging into master?
I believe the change is small enough that a single commit would still be meaningful.

tabergma · 2020-08-14T13:23:37Z

I can squash them on merging into master. Thanks for updating!

akelad · 2020-08-18T08:30:39Z

@tabergma can we just use the ready-to-merge label here?

tabergma · 2020-08-18T08:33:32Z

It did not work the last time I tried. But @tmbo mentioned that he could help with merging this.

tmbo · 2020-08-18T08:35:11Z

I don't think the label works for community PRs (not 100% sure though).

AMR-KELEG · 2020-08-19T19:12:36Z

Thanks all for your efforts merging this PR 😅🎉

Unescape tokens on md-json conversion

2035bac

Solve #6087. On converting JSON NLU data into Markdown, tokens like "\n" are escaped to "\\n". However, on converting Markdown NLU data back into JSON, the tokens are not unescaped.

sara-tagger requested a review from TyDunn July 31, 2020 06:00

TyDunn removed their request for review July 31, 2020 11:39

TyDunn assigned akelad Jul 31, 2020

akelad self-requested a review July 31, 2020 11:42

akelad removed their assignment Jul 31, 2020

Merge branch 'master' into unescape-chars-in-markdown-format

040c3e1

tabergma requested changes Aug 10, 2020

tabergma removed the request for review from akelad August 10, 2020 07:11

AMR-KELEG added 3 commits August 11, 2020 00:52

Add an entry in the changelog

2c24a91

Add test cases

f5839f2

Merge branch 'master' into unescape-chars-in-markdown-format

4adfa4a

tabergma approved these changes Aug 11, 2020

AMR-KELEG and others added 2 commits August 11, 2020 11:26

Merge branch 'master' into unescape-chars-in-markdown-format

8e397d6

Merge branch 'master' into unescape-chars-in-markdown-format

88682aa

tabergma added the status:ready-to-merge label Aug 11, 2020

rasabot removed the status:ready-to-merge label Aug 11, 2020

Merge branch 'master' into unescape-chars-in-markdown-format

63f494a

AMR-KELEG added 5 commits August 14, 2020 14:24

Merge branch 'origin/master' into unescape-chars-in-markdown-format

93cbfa1

Move the decode_string to rasa/utils/io.py

fa1f8c5

Merge branch 'master' into unescape-chars-in-markdown-format

2f256f4

Remove unnecessary list comprehension

6c8bf24

Merge branch 'unescape-chars-in-markdown-format' of github.com:AMR-KE…

1a183be

…LEG/rasa into unescape-chars-in-markdown-format

Merge branch 'master' into unescape-chars-in-markdown-format

656c1b3

Merge branch 'master' into unescape-chars-in-markdown-format

d468a28

tabergma added 3 commits August 17, 2020 09:18

Merge branch 'master' into unescape-chars-in-markdown-format

958c704

Merge branch 'master' into unescape-chars-in-markdown-format

1edb794

Merge branch 'master' into unescape-chars-in-markdown-format

931ec73

tmbo merged commit 46326e7 into RasaHQ:master Aug 18, 2020
