Fix JSON parser to allow control characters in JSON string input #11433

Open
wants to merge 1 commit into
base: main

PHILO-HE (Contributor) commented Nov 5, 2024

A JSON string in the input can contain unescaped control characters, e.g., a raw newline
character (\n). Presto handles such input correctly, but the simdjson library does not allow
it: when such a character is present, simdjson returns an UNESCAPED_CHARS error and Velox then
outputs null. simdjson only accepts the escaped form, i.e., the literal two-character sequence
\n (represented by \\n). See its code link.
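To make the distinction concrete, here is a minimal sketch of the rule a strict parser enforces: RFC 8259 requires control characters (U+0000 through U+001F) to be escaped inside JSON strings. The helper name `hasUnescapedControlChar` is hypothetical and is not part of simdjson or Velox; it only illustrates which inputs would be expected to trigger an error such as UNESCAPED_CHARS.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper (an assumption for illustration, not a simdjson or
// Velox API): returns true if `json` contains a raw control character
// (byte value below 0x20) inside a string literal.
bool hasUnescapedControlChar(std::string_view json) {
  bool inString = false;
  bool escaped = false; // True if the previous in-string char was a backslash.
  for (unsigned char c : json) {
    if (!inString) {
      // Outside strings, newlines and tabs are ordinary JSON whitespace.
      if (c == '"') {
        inString = true;
      }
      continue;
    }
    if (c < 0x20) {
      return true; // Raw control character inside a string literal.
    }
    if (escaped) {
      escaped = false; // This character completes an escape sequence.
    } else if (c == '\\') {
      escaped = true;
    } else if (c == '"') {
      inString = false;
    }
  }
  return false;
}

void runDetectionExamples() {
  // A real newline byte inside the string value: rejected by strict parsers.
  assert(hasUnescapedControlChar("{\"c1\":\"ab\ncd\"}"));
  // Backslash followed by 'n': the escaped form, which is valid JSON.
  assert(!hasUnescapedControlChar("{\"c1\":\"ab\\ncd\"}"));
  // A newline between tokens is whitespace, not part of a string.
  assert(!hasUnescapedControlChar("{\n\"c1\":1}"));
}
```

The scan tracks only whether it is inside a string literal, so it does not validate the rest of the document; it is meant to show the rule, not to replace a parser.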

Here is a test result from Presto:

SELECT json_extract('{"c1":"ab\ncd"}', '$.c1');
  _col0
----------
 "ab\ncd"
(1 row)

I have also opened a discussion thread in the simdjson community: simdjson/simdjson#2287.

facebook-github-bot added the CLA Signed label Nov 5, 2024

netlify bot commented Nov 5, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 960f35d
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6729afcfa84e260008816259

kgpai (Contributor) commented Nov 7, 2024

Hi @PHILO-HE, I am working on canonicalization of JSON values (#11284). Ideally, we aim to handle escaping during canonicalization (i.e., in json_parse or in a cast such as cast(x as json)).
Looking at the example you raised, I am wondering whether this is something we should also handle as part of canonicalization (cc: @gggrace14).
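
As a rough illustration of what handling escaping in a pre-processing or canonicalization step could look like, the sketch below rewrites raw control characters inside string literals into their escaped forms so that a strict parser would accept the result. The function name `escapeControlChars` is hypothetical; this is not the approach taken in this PR or in #11284, only one possible shape of the idea.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical helper (an assumption for illustration, not Velox code):
// replaces raw control characters found inside JSON string literals with
// escape sequences. Text outside string literals is copied unchanged.
std::string escapeControlChars(std::string_view json) {
  std::string out;
  out.reserve(json.size());
  bool inString = false;
  bool escaped = false; // True if the previous in-string char was a backslash.
  for (unsigned char c : json) {
    if (inString && c < 0x20) {
      switch (c) {
        case '\n': out += "\\n"; break;
        case '\r': out += "\\r"; break;
        case '\t': out += "\\t"; break;
        default: {
          // Other control characters use the generic \u00XX form.
          char buf[7];
          std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
          out += buf;
        }
      }
      escaped = false;
      continue;
    }
    out += static_cast<char>(c);
    if (!inString) {
      if (c == '"') {
        inString = true;
      }
    } else if (escaped) {
      escaped = false; // This character completes an escape sequence.
    } else if (c == '\\') {
      escaped = true;
    } else if (c == '"') {
      inString = false;
    }
  }
  return out;
}

void runEscapingExamples() {
  // The raw newline in the value becomes the two characters '\' and 'n'.
  assert(escapeControlChars("{\"c1\":\"ab\ncd\"}") == "{\"c1\":\"ab\\ncd\"}");
  // Whitespace between tokens is left alone.
  assert(escapeControlChars("{\n\"a\":\"x\"}") == "{\n\"a\":\"x\"}");
}
```

The trade-off is an extra pass and a copy of the input before parsing, and malformed input (for example, a backslash directly followed by a raw control character) is not repaired meaningfully.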
