Unescape tokens on md-json conversion #6308
Solves #6087. When converting JSON NLU data into markdown, tokens like "\n" are escaped to "\\n". However, when converting markdown NLU data back into JSON, they are not unescaped.
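A minimal sketch of the round trip this PR is about, assuming a simple table of control characters. `encode_string`/`decode_string` here are hypothetical stand-ins that mirror the helper names used in the review below; they are not Rasa's actual implementation. Without the decode step, text that went JSON to markdown comes back with literal "\\n" instead of a newline.

```python
import re

# Hypothetical escape table for illustration; the PR's real table may differ.
ESCAPES = {"\b": "\\b", "\f": "\\f", "\n": "\\n", "\r": "\\r", "\t": "\\t"}
# Reverse mapping: two-character escape sequence -> control character.
UNESCAPES = {escaped: char for char, escaped in ESCAPES.items()}
# Matches a backslash followed by one of the escape letters above.
UNESCAPE_PATTERN = re.compile(r"\\[bfnrt]")


def encode_string(text: str) -> str:
    """Replace control characters with backslash escapes (JSON -> markdown)."""
    return "".join(ESCAPES.get(char, char) for char in text)


def decode_string(text: str) -> str:
    """Turn backslash escapes back into control characters (markdown -> JSON)."""
    return UNESCAPE_PATTERN.sub(lambda match: UNESCAPES[match.group(0)], text)


example = 'a "\n" token'
encoded = encode_string(example)
print(repr(encoded))                      # 'a "\\n" token'
print(decode_string(encoded) == example)  # True
```

One caveat of this sketch: `encode_string` does not escape backslashes themselves, so text that already contains a literal backslash followed by `n` would be "decoded" into a newline. A fully reversible scheme would also escape `\` itself.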
Thanks for submitting a pull request 🚀 @akelad will take a look at it as soon as possible ✨
@@ -37,13 +37,15 @@
 AVAILABLE_SECTIONS = [INTENT, SYNONYM, REGEX, LOOKUP]
 MARKDOWN_SECTION_MARKERS = [f"## {s}:" for s in AVAILABLE_SECTIONS]


-item_regex = re.compile(r"\s*[-*+]\s*(.+)")
+item_regex = re.compile(r"\s*[-*+]\s*((?:.+\s*)*)")
Why did you change this?
The old item_regex fails when the line contains whitespace tokens such as "\n", since they are no longer escaped: "." does not match a newline, so the captured item is cut off at the first one.
import re

# decode_string is the unescaping helper added in this PR (module path assumed)
from rasa.nlu.training_data.util import decode_string

item_regex = re.compile(r"\s*[-*+]\s*((?:.+\s*)*)")
old_item_regex = re.compile(r"\s*[-*+]\s*(.+)")

t = '- THIS IS A MD ENTRY WITH a "\\n" token'
print(repr(t))
# '- THIS IS A MD ENTRY WITH a "\\n" token'
unescaped_t = decode_string(t)
print(repr(unescaped_t))
# '- THIS IS A MD ENTRY WITH a "\n" token'
# The old pattern stops at the newline:
print(repr(old_item_regex.match(unescaped_t).group(1)))
# 'THIS IS A MD ENTRY WITH a "'
# The new pattern captures the whole item:
print(repr(item_regex.match(unescaped_t).group(1)))
# 'THIS IS A MD ENTRY WITH a "\n" token'
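For comparison, a small sketch that runs the pattern from this PR next to a hypothetical alternative that is not what the PR uses: keeping the original `(.+)` pattern but compiling it with `re.DOTALL`, so that "." also matches "\n". The `extract` helper is made up for illustration.

```python
import re

# Original pattern: "." does not match "\n", so the capture stops there.
old_item_regex = re.compile(r"\s*[-*+]\s*(.+)")

# Pattern introduced by this PR: repeats "run of non-newline characters,
# then optional whitespace", so the capture continues past a newline.
pr_item_regex = re.compile(r"\s*[-*+]\s*((?:.+\s*)*)")

# Hypothetical alternative (not from the PR): original pattern plus re.DOTALL.
dotall_item_regex = re.compile(r"\s*[-*+]\s*(.+)", re.DOTALL)


def extract(regex: re.Pattern, text: str):
    """Return the captured item text, or None if the line is not a list item."""
    match = regex.match(text)
    return match.group(1) if match else None


line = '- THIS IS A MD ENTRY WITH a "\n" token'
for name, regex in [
    ("old", old_item_regex),
    ("pr", pr_item_regex),
    ("dotall", dotall_item_regex),
]:
    print(name, repr(extract(regex, line)))
```

On this input both the PR's pattern and the `re.DOTALL` variant capture the full item; the `re.DOTALL` form simply keeps the pattern closer to the original.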
Great work! 💯
@AMR-KELEG Can you please resolve the merge conflicts? Master is getting updated quite frequently right now; I will try to get it in as soon as the conflicts are resolved.
…LEG/rasa into unescape-chars-in-markdown-format
@tabergma I have updated the files.
I can squash the commits when merging into master. Thanks for updating!
@tabergma can we just use the
It did not work the last time I tried, but @tmbo mentioned that he could help merge this.
I don't think the label works for community PRs (not 100% sure, though).
Thanks, all, for your efforts in merging this PR 😅🎉
fixes #6087
When converting JSON NLU data into markdown, tokens like "\n" are
escaped to "\\n".
However, when converting markdown NLU data back into JSON,
they are not unescaped.
Proposed changes:
Status (please check what you already did):
- black (please check Readme for instructions)