[VL] Results are mismatch with vanilla Spark on release-1.1 when use get_json_object operator #5253
Comments
cc @kecookier
@NEUpanning, I tested both the main and 1.1 branches, but the result is NULL, consistent with Spark. Here is my test. Could you help check again?
@PHILO-HE Thanks for your reply. I can't reproduce it either, so other factors may be involved. I will take some time to figure it out later.
@NEUpanning Is the issue I'm experiencing the same as this one?
@wang-zhun Could you show the result of this SQL?
@wang-zhun They may be related.
@PHILO-HE Judging from the commit history, you have more expertise in this area. Could you help verify this issue?
A simple modification to simdjson can resolve the issue, but it is unclear whether it would have other impacts.
@wang-zhun, thanks for your investigation!
@PHILO-HE Thanks for your effort in #6661. I've cherry-picked this PR onto our branch to test whether this issue has been resolved, but the results still do not match vanilla Spark. The JSON value in this issue doesn't contain extended ASCII and isn't valid JSON. Therefore, this issue will remain open.
@NEUpanning, have you tried with branch-1.2 or main?
@PHILO-HE We are using v1.2.0-rc; higher versions are not supported for now.
@NEUpanning, can the case below reproduce this issue on your side? If not, could you provide a small reproducible case?
@PHILO-HE As in the earlier discussion, I still can't reproduce it without our table. When I find a small reproducible case, I'll get in touch with you ASAP. Thanks.
@PHILO-HE @NEUpanning @kecookier
Gluten returns But in |
@jiangjiangtian, after some investigation, I found that the simdjson On Demand API only validates the structural correctness of the JSON document and the result for the given JSON path. This is for performance reasons.
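To illustrate the difference described here, below is a rough Python sketch that contrasts a strict lookup (validate the whole document, then read the key) with a toy lazy lookup that skips over values without checking their contents. It is neither simdjson's algorithm nor the Velox implementation: `strict_get` and `lazy_get` are hypothetical helpers, the lazy version only handles a flat object whose values are all strings, and the input is a shortened form of the value reported in this issue.

```python
import json

def strict_get(text, key):
    """Validate the whole document first; return None if any part is invalid."""
    try:
        return json.loads(text).get(key)
    except json.JSONDecodeError:
        return None

def lazy_get(text, key):
    """Toy lookup over a flat object of string values; skipped values are not validated."""
    def read_string(i):
        # text[i] is the opening quote. Step over each backslash pair
        # without checking that the escape is legal JSON.
        j = i + 1
        while text[j] != '"':
            j += 2 if text[j] == "\\" else 1
        return text[i + 1:j], j + 1

    i = text.index("{") + 1
    while True:
        i = text.index('"', i)                    # opening quote of the field name
        name, i = read_string(i)
        i = text.index('"', text.index(":", i))   # opening quote of the value
        value, i = read_string(i)
        if name == key:
            return value
        if text[i:].lstrip().startswith("}"):
            return None                           # reached the end without a match

# Shortened form of the reported value; \a is not a legal JSON escape.
doc = r'{"142":"[{\"name\":\"a\a\"}]","11000022": "N"}'
print(strict_get(doc, "11000022"))
print(lazy_get(doc, "11000022"))
```

In this sketch the strict lookup yields `None` while the lazy one yields `N`, which mirrors the NULL versus N mismatch reported above. Whether that is exactly what happens inside Gluten for the original table is still an assumption.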
@PHILO-HE Thanks for your investigation. We need to discuss internally to determine the alignment requirement.
Backend
VL (Velox)
Bug description
The following SQL might lead to wrong results, but it's not yet certain whether other factors are involved. The vanilla Spark result is NULL, but the Gluten result is N.
select get_json_object(extend_attr,'$.11000022') from mart_catering.dim_deal_all_info_ss where mt_deal_id=922798418 and partition_date='2024-03-27';
The extend_attr field value is
{"142":"[{\"112\":{\"template\":{\"A\":\"a\",\"RS\":\"a\",\"NRS\":\"a\"},\"label\":{\"fromNumber\":\"\",\"rsToNumber\":\"\",\"rsFromNumber\":\"\",\"toNumber\":\"\"},\"key\":\"A\"},\"141\":[[{\"name\":\"a\a\"},{\"number\":\"1\"},{\"price\":\"218\"},{\"size\":\"6\"},{\"unit\":\"a\"},{\"form\":\"a\"},{\"type\":\"a\"},{\"thickness\":\"\"}]]}]","11000022": "N"}
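As a side note on why a strict parser would reject this value: the nested string under "142" contains the sequence `\a`, which is not one of the escapes JSON allows. The sketch below checks a shortened copy of the value with Python's standard `json` module, used here only as a stand-in for a strict parser. The helper name `find_json_error` is hypothetical, and the shortened value is an assumption that keeps just the offending fragment and the requested key.

```python
import json

# Shortened copy of the extend_attr value: the nested string under "142"
# still contains \a, which is not a legal JSON escape sequence.
raw = r'{"142":"[{\"name\":\"a\a\"}]","11000022": "N"}'

def find_json_error(text):
    """Return the parser's error message for invalid JSON, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err.msg
    return None

print(find_json_error(raw))                       # message about the invalid escape
print(find_json_error(raw.replace(r"\a", "a")))   # None once \a is removed
```

With the `\a` sequence replaced by a plain `a`, the same document parses, which suggests that this single escape is what makes the value invalid.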
Unfortunately, I cannot reproduce this issue using a simple SQL statement like
select get_json_object(col1,'$.11000022') from values('{"142":"[{\"112\":{\"template\":{\"A\":\"a\",\"RS\":\"a\",\"NRS\":\"a\"},\"label\":{\"fromNumber\":\"\",\"rsToNumber\":\"\",\"rsFromNumber\":\"\",\"toNumber\":\"\"},\"key\":\"A\"},\"141\":[[{\"name\":\"a\a\"},{\"number\":\"1\"},{\"price\":\"218\"},{\"size\":\"6\"},{\"unit\":\"a\"},{\"form\":\"a\"},{\"type\":\"a\"},{\"thickness\":\"\"}]]}]","11000022": "N"}')
and the result is NULL, the same as the vanilla Spark result.
Spark version
None
Spark configurations
No response
System information
No response
Relevant logs
No response