-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-34216: [Python] Support for reading JSON Datasets With Python #34586
Conversation
…into ARROW-34216 merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! I have a few questions / suggestions. Also, we will need to add some unit tests for this feature.
Co-authored-by: Weston Pace <[email protected]>
Co-authored-by: Weston Pace <[email protected]>
Hi, I have added some tests, which references the csv tests. Please review them when you are free |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests look good, thank you for adding them. One small suggestion I think to improve them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed some lint changes. Assuming CI passes I think this is good to go.
Benchmark runs are scheduled for baseline = 0434ab6 and contender = 4963105. 4963105 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
['Python', 'R'] benchmarks have high level of regressions. |
apache#34586) This PR supports for reading JSON Datasets With Python. As mentioned in [apache#34216](apache#34216), only the reading ability are supported. Please compare the difference between my implemenation of _json.pyx, _json.pyd and _csv.pyx _csv.pyd. Cause _csv.pyd utilize pointer for cpp class and my implementation doesn't. **What changes are included in this PR?** C++: add inclusion for file_json.h Python: reference C++ codes and support reading JSON Datasets **Are these changes tested?** Yes 6 test samples added in tests/test_dataset.py * Closes: apache#34216 Lead-authored-by: Junming Chen <[email protected]> Co-authored-by: Weston Pace <[email protected]> Signed-off-by: Weston Pace <[email protected]>
apache#34586) This PR supports for reading JSON Datasets With Python. As mentioned in [apache#34216](apache#34216), only the reading ability are supported. Please compare the difference between my implemenation of _json.pyx, _json.pyd and _csv.pyx _csv.pyd. Cause _csv.pyd utilize pointer for cpp class and my implementation doesn't. **What changes are included in this PR?** C++: add inclusion for file_json.h Python: reference C++ codes and support reading JSON Datasets **Are these changes tested?** Yes 6 test samples added in tests/test_dataset.py * Closes: apache#34216 Lead-authored-by: Junming Chen <[email protected]> Co-authored-by: Weston Pace <[email protected]> Signed-off-by: Weston Pace <[email protected]>
apache#34586) This PR supports for reading JSON Datasets With Python. As mentioned in [apache#34216](apache#34216), only the reading ability are supported. Please compare the difference between my implemenation of _json.pyx, _json.pyd and _csv.pyx _csv.pyd. Cause _csv.pyd utilize pointer for cpp class and my implementation doesn't. **What changes are included in this PR?** C++: add inclusion for file_json.h Python: reference C++ codes and support reading JSON Datasets **Are these changes tested?** Yes 6 test samples added in tests/test_dataset.py * Closes: apache#34216 Lead-authored-by: Junming Chen <[email protected]> Co-authored-by: Weston Pace <[email protected]> Signed-off-by: Weston Pace <[email protected]>
This PR supports for reading JSON Datasets With Python. As mentioned in #34216, only the reading ability are supported.
Please compare the difference between my implemenation of _json.pyx, _json.pyd and _csv.pyx _csv.pyd.
Cause _csv.pyd utilize pointer for cpp class and my implementation doesn't.
What changes are included in this PR?
C++: add inclusion for file_json.h
Python: reference C++ codes and support reading JSON Datasets
Are these changes tested?
Yes
6 test samples added in tests/test_dataset.py