Newline token conversion between markdown and json formats #6087
Comments
@tabergma sorry for the tag if it's somehow spammy, but can you help me with this issue?
@akelad Could you please check this issue?
It's been added to one of our team's inboxes - can I ask how come you're using JSON in the first place? I believe that format might be deprecated soon.
Well, I have just checked the rasa blog post for version 2.0 and noticed that yaml will be the format for data files.
yeah that makes sense - would using yaml once 2.0 is out be a good replacement option for you for json? Json will still be around for a while, but we will be encouraging users to switch to the new format. Also, since you already found the area of the code that causes this issue, would you be up for submitting a PR to fix it?
I have only used yaml for pipeline configurations so I am not sure how it's used for nlu data (will give it a try soon).
nice thanks!
O/ Akela, I am checking the live docs https://rasa.com/docs/rasa/nlu/training-data-format/#data-formats but it looks like the yaml format isn't part of it yet. Thanks 😄
It's still a work in progress, sorry! You can take a peek here: https://github.com/RasaHQ/rasa/pull/6297/files
@AMR-KELEG still working on the docs but we'll have an update soon. Once we've merged the PR it will be available at https://rasa.com/docs/rasa/next
No worries 😄
* Unescape tokens on md-json conversion

  Solve #6087. On converting json nlu data into markdown, tokens like "\n" are escaped to "\\n". However, on converting markdown nlu data into json, unescaping isn't done.
* Add an entry in the changelog
* Add test cases
* Move the decode_string to rasa/utils/io.py
* Remove unnecessary list comprehension

Co-authored-by: Akela Drissner-Schmid <[email protected]>
Co-authored-by: Tanja <[email protected]>
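The idea behind the fix described in the commit message above can be sketched as a pair of inverse helpers. This is only an illustration under stated assumptions: the name `decode_string` comes from the commit message, but the escape table, the `encode_string` name, and both function bodies here are hypothetical and not copied from Rasa's source.

```python
import re

# Hypothetical escape table; the actual set of tokens Rasa handles may differ.
ESCAPES = {"\n": "\\n", "\t": "\\t", "\r": "\\r"}
UNESCAPES = {escaped: raw for raw, escaped in ESCAPES.items()}

ESCAPE_RE = re.compile("|".join(re.escape(raw) for raw in ESCAPES))
UNESCAPE_RE = re.compile("|".join(re.escape(esc) for esc in UNESCAPES))


def encode_string(s: str) -> str:
    """Replace control characters with their two-character escaped form."""
    return ESCAPE_RE.sub(lambda m: ESCAPES[m.group(0)], s)


def decode_string(s: str) -> str:
    """Inverse of encode_string: turn escaped forms back into control characters."""
    return UNESCAPE_RE.sub(lambda m: UNESCAPES[m.group(0)], s)


text = "hello\nworld"
assert decode_string(encode_string(text)) == text
```

One limitation of this sketch: a literal backslash followed by `n` in the original text would also be decoded into a newline, because the encoder does not escape backslashes themselves.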
Rasa version: rasa==1.10.3
Rasa SDK version (if used & relevant): rasa-sdk==1.10.2
Rasa X version (if used & relevant):
Python version: Python 3.7.8
Operating system: Ubuntu 20
Issue:
My team has training datasets with newline tokens (\n) as part of the text field in json files. We generally use the markdown format for inspecting the data files before converting them back to json, so that we can easily manipulate them.
But converting the same json file to markdown and then back to json leaves the newline tokens escaped (\n becomes \\n), which isn't desirable.
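The round trip described here can be reproduced in isolation with a small sketch. It assumes the markdown writer escapes control characters and the markdown reader takes the line verbatim; the function names `to_markdown_text` and `from_markdown_text` and the escape table are hypothetical stand-ins, not Rasa's API.

```python
import re

# Hypothetical escape table, modeled on the behavior reported in this issue.
ESCAPES = {"\n": "\\n", "\t": "\\t", "\r": "\\r"}
ESCAPE_RE = re.compile("[\n\t\r]")


def to_markdown_text(text: str) -> str:
    """Escape control characters so an example fits on one markdown line."""
    return ESCAPE_RE.sub(lambda m: ESCAPES[m.group(0)], text)


def from_markdown_text(line: str) -> str:
    """Read the line back verbatim, with no unescaping (the reported behavior)."""
    return line


original = "hello\nworld"
round_tripped = from_markdown_text(to_markdown_text(original))

print(repr(original))       # 'hello\nworld'
print(repr(round_tripped))  # 'hello\\nworld'
```

The text that comes back contains a backslash followed by `n` rather than a newline character, so the json written after the round trip no longer matches the json that went in.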
Error (including full traceback):
Command or request that led to error:
Code responsible for the issue:
rasa/rasa/nlu/training_data/formats/markdown.py
Line 51 in 88ad06f
rasa/rasa/nlu/training_data/formats/markdown.py
Line 70 in 88ad06f