-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP][DO-NOT-MERGE] Reference PR for flatMapGroupsWithState in PySpark #37863
Closed
HeartSaVioR
wants to merge
44
commits into
apache:master
from
HeartSaVioR:WIP-applyinpandaswithstate-jungtaek-pipeline-binpack
Closed
[WIP][DO-NOT-MERGE] Reference PR for flatMapGroupsWithState in PySpark #37863
HeartSaVioR
wants to merge
44
commits into
apache:master
from
HeartSaVioR:WIP-applyinpandaswithstate-jungtaek-pipeline-binpack
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
HeartSaVioR
added a commit
that referenced
this pull request
Sep 14, 2022
…PythonArrowOutput ### What changes were proposed in this pull request? This PR proposes to change PythonArrowInput and PythonArrowOutput to be more generic to cover the complex data type on both input and output. This is a baseline work for #37863. ### Why are the changes needed? The traits PythonArrowInput and PythonArrowOutput can be further generalized to cover complex data type on both input and output. E.g. Not all operators would have simple InternalRow as input data to pass to Python worker and vice versa for output data. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. Closes #37864 from HeartSaVioR/SPARK-40414. Authored-by: Jungtaek Lim <[email protected]> Signed-off-by: Jungtaek Lim <[email protected]>
…ded only when there is no data
…l tests are failing yet (and the existing test code is hard to make it work with multiple calls)
… as it bothers MiMa
HeartSaVioR
force-pushed
the
WIP-applyinpandaswithstate-jungtaek-pipeline-binpack
branch
from
September 14, 2022 05:52
ce85c20
to
d22d7db
Compare
HeartSaVioR
force-pushed
the
WIP-applyinpandaswithstate-jungtaek-pipeline-binpack
branch
from
September 14, 2022 05:56
de9636d
to
e60408f
Compare
HeartSaVioR
added a commit
that referenced
this pull request
Sep 15, 2022
…ickled PySpark Row to JVM Row ### What changes were proposed in this pull request? This PR adds toJVMRow in PythonSQLUtils to convert pickled PySpark Row to JVM Row. Co-authored with HyukjinKwon . This is a breakdown PR of #37863. ### Why are the changes needed? This change will be leveraged in [SPARK-40434](https://issues.apache.org/jira/browse/SPARK-40434). ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A. We will make sure test suites are constructed via E2E manner under [SPARK-40431](https://issues.apache.org/jira/browse/SPARK-40431). Closes #37891 from HeartSaVioR/SPARK-40433. Lead-authored-by: Jungtaek Lim <[email protected]> Co-authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Jungtaek Lim <[email protected]>
HeartSaVioR
added a commit
that referenced
this pull request
Sep 15, 2022
…out in PySpark ### What changes were proposed in this pull request? This PR introduces GroupStateImpl and GroupStateTimeout in PySpark, and updates Scala codebase to support convenient conversion between PySpark implementation and Scala implementation. Co-authored with HyukjinKwon . This is a breakdown PR of #37863. ### Why are the changes needed? This change will be leveraged in SPARK-40434. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A. We will make sure test suites are constructed via E2E manner under SPARK-40431. Closes #37889 from HeartSaVioR/SPARK-40432. Lead-authored-by: Jungtaek Lim <[email protected]> Co-authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Jungtaek Lim <[email protected]>
LuciferYang
pushed a commit
to LuciferYang/spark
that referenced
this pull request
Sep 20, 2022
…ickled PySpark Row to JVM Row ### What changes were proposed in this pull request? This PR adds toJVMRow in PythonSQLUtils to convert pickled PySpark Row to JVM Row. Co-authored with HyukjinKwon . This is a breakdown PR of apache#37863. ### Why are the changes needed? This change will be leveraged in [SPARK-40434](https://issues.apache.org/jira/browse/SPARK-40434). ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A. We will make sure test suites are constructed via E2E manner under [SPARK-40431](https://issues.apache.org/jira/browse/SPARK-40431). Closes apache#37891 from HeartSaVioR/SPARK-40433. Lead-authored-by: Jungtaek Lim <[email protected]> Co-authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Jungtaek Lim <[email protected]>
LuciferYang
pushed a commit
to LuciferYang/spark
that referenced
this pull request
Sep 20, 2022
…out in PySpark ### What changes were proposed in this pull request? This PR introduces GroupStateImpl and GroupStateTimeout in PySpark, and updates Scala codebase to support convenient conversion between PySpark implementation and Scala implementation. Co-authored with HyukjinKwon . This is a breakdown PR of apache#37863. ### Why are the changes needed? This change will be leveraged in SPARK-40434. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A. We will make sure test suites are constructed via E2E manner under SPARK-40431. Closes apache#37889 from HeartSaVioR/SPARK-40432. Lead-authored-by: Jungtaek Lim <[email protected]> Co-authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Jungtaek Lim <[email protected]>
Closing this since we only have test suite PR as remaining one now. |
dongjoon-hyun
pushed a commit
that referenced
this pull request
Nov 4, 2023
### What changes were proposed in this pull request? This pr upgrade Apache Arrow from 13.0.0 to 14.0.0. ### Why are the changes needed? The Apache Arrow 14.0.0 release brings a number of enhancements and bug fixes. In terms of bug fixes, the release addresses several critical issues that were causing failures in integration jobs with Spark([GH-36332](apache/arrow#36332)) and problems with importing empty data arrays([GH-37056](apache/arrow#37056)). It also optimizes the process of appending variable length vectors([GH-37829](apache/arrow#37829)) and includes C++ libraries for MacOS AARCH 64 in Java-Jars([GH-38076](apache/arrow#38076)). The new features and improvements focus on enhancing the handling and manipulation of data. This includes the introduction of DefaultVectorComparators for large types([GH-25659](apache/arrow#25659)), support for extended expressions in ScannerBuilder([GH-34252](apache/arrow#34252)), and the exposure of the VectorAppender class([GH-37246](apache/arrow#37246)). The release also brings enhancements to the development and testing process, with the CI environment now using JDK 21([GH-36994](apache/arrow#36994)). In addition, the release introduces vector validation consistent with C++, ensuring consistency across different languages([GH-37702](apache/arrow#37702)). Furthermore, the usability of VarChar writers and binary writers has been improved with the addition of extra input methods([GH-37705](apache/arrow#37705)), and VarCharWriter now supports writing from `Text` and `String`([GH-37706](apache/arrow#37706)). The release also adds typed getters for StructVector, improving the ease of accessing data([GH-37863](apache/arrow#37863)). The full release notes as follows: - https://arrow.apache.org/release/14.0.0.html ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #43650 from LuciferYang/arrow-14. Lead-authored-by: yangjie01 <[email protected]> Co-authored-by: YangJie <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
DO-NOT-MERGE! May need to split the PR to multiple reviewable size of PRs. Use this only for reference.
What changes were proposed in this pull request?
Why are the changes needed?
Does this PR introduce any user-facing change?
How was this patch tested?