Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unescape tokens on md-json conversion #6308

Merged
merged 18 commits into from
Aug 18, 2020
Merged

Unescape tokens on md-json conversion #6308

merged 18 commits into from
Aug 18, 2020

Conversation

AMR-KELEG
Copy link
Contributor

@AMR-KELEG AMR-KELEG commented Jul 31, 2020

fixes #6087
On converting json nlu data into markdown, tokens like: "\n" are
espaced to "\n".
However, on converting markdown nlu data into json,
Unescaping isn't done

Proposed changes:

  • ...

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

Solve #6087
On converting json nlu data into markdown, tokens like: "\n" are
espaced to "\\n".
However, on converting markdown nlu data into json,
Unescaping isn't done
@sara-tagger sara-tagger requested a review from TyDunn July 31, 2020 06:00
@sara-tagger
Copy link
Collaborator

sara-tagger commented Jul 31, 2020

Thanks for submitting a pull request 🚀 @akelad will take a look at it as soon as possible ✨

@TyDunn TyDunn removed their request for review July 31, 2020 11:39
@akelad akelad self-requested a review July 31, 2020 11:42
@akelad akelad removed their assignment Jul 31, 2020
Copy link
Contributor

@tabergma tabergma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this over!

Code wise it looks good 🚀 However, could you please add a test here and also add a changelog entry. Thanks.

@@ -37,13 +37,15 @@
AVAILABLE_SECTIONS = [INTENT, SYNONYM, REGEX, LOOKUP]
MARKDOWN_SECTION_MARKERS = [f"## {s}:" for s in AVAILABLE_SECTIONS]

item_regex = re.compile(r"\s*[-*+]\s*(.+)")
item_regex = re.compile(r"\s*[-*+]\s*((?:.+\s*)*)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you changed this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The item_regex will fail in case of having whitespace tokens in the line since they aren't escaped anymore.

import re

item_regex = re.compile(r"\s*[-*+]\s*((?:.+\s*)*)")
old_item_regex = re.compile(r"\s*[-*+]\s*(.*)")

t = '- THIS IS A MD ENTRY WITH a "\\n" token'
print(repr(t)) 
# '- THIS IS A MD ENTRY WITH a "\\n" token'

unespaced_t = decode_string(t)
print(repr(unespaced_t)) 
# '- THIS IS A MD ENTRY WITH a "\n" token'

re.match(old_item_regex, unespaced_t).groups(0)[0] 
#'THIS IS A MD ENTRY WITH a "'

re.match(item_regex, unespaced_t).groups(0)[0] 
# 'THIS IS A MD ENTRY WITH a "\n" token'

@tabergma tabergma removed the request for review from akelad August 10, 2020 07:11
Copy link
Contributor

@tabergma tabergma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! 💯

@rasabot
Copy link
Collaborator

rasabot commented Aug 11, 2020

Could not update branch. Most likely this is due to a merge conflict. Please update the branch manually and fix any issues.

@tabergma
Copy link
Contributor

@AMR-KELEG Can you please resolve the merge conflicts? master is getting updated quite frequently right now, will try to get it in as soon as the conflicts are resolved.

@AMR-KELEG
Copy link
Contributor Author

@tabergma I have updated the files.
I think that the commit history is a little bit messy,
Should I try to squash the commits somehow?
Or can you at least squash them on merging to master?
I believe the change isn't that big so a single commit can still be meaningful.

@tabergma
Copy link
Contributor

I can squash them on merging into master. Thanks for updating!

@akelad
Copy link
Contributor

akelad commented Aug 18, 2020

@tabergma can we just use the ready-to-merge label here?

@tabergma
Copy link
Contributor

Did not worked the last time I tried. But @tmbo mentioned that he could help merging this.

@tmbo
Copy link
Member

tmbo commented Aug 18, 2020

I don't think the label works for community PRs (not a 100% sure though).

@tmbo tmbo merged commit 46326e7 into RasaHQ:master Aug 18, 2020
@AMR-KELEG
Copy link
Contributor Author

Thanks all for your efforts merging this PR 😅🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Newline token conversion between markdown and json formats
6 participants