
[SPARK-44259][CONNECT][TESTS] Make connect-client-jvm pass on Java 21 except RemoteSparkSession-based tests #41805

Closed

Conversation

@LuciferYang (Contributor) commented Jun 30, 2023

What changes were proposed in this pull request?

This PR ignores all tests that inherit RemoteSparkSession by default on Java 21, by overriding the test function in RemoteSparkSession. They are all Arrow-based tests, because Connect uses the Arrow data format for RPC communication, and they currently fail on Java 21 with the following error:

23/06/30 11:45:41 ERROR SparkConnectService: Error during: execute. UserId: . SessionId: e7479b73-d02c-47e9-85c8-40b3e9315561.
java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
	at org.apache.arrow.memory.util.MemoryUtil.directBuffer(MemoryUtil.java:174)
	at org.apache.arrow.memory.ArrowBuf.getDirectBuffer(ArrowBuf.java:229)
	at org.apache.arrow.memory.ArrowBuf.nioBuffer(ArrowBuf.java:224)
	at org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:133)
	at org.apache.arrow.vector.ipc.message.MessageSerializer.writeBatchBuffers(MessageSerializer.java:303)
	at org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:276)
	at org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:237)
	at org.apache.spark.sql.execution.arrow.ArrowConverters$ArrowBatchWithSchemaIterator.$anonfun$next$3(ArrowConverters.scala:174)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1487)
	at org.apache.spark.sql.execution.arrow.ArrowConverters$ArrowBatchWithSchemaIterator.next(ArrowConverters.scala:181)
	at org.apache.spark.sql.execution.arrow.ArrowConverters$ArrowBatchWithSchemaIterator.next(ArrowConverters.scala:128)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at org.apache.spark.sql.connect.service.SparkConnectStreamHandler$.processAsArrowBatches(SparkConnectStreamHandler.scala:178)
	at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handlePlan(SparkConnectStreamHandler.scala:104)
	at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$1(SparkConnectStreamHandler.scala:86)
	at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.$anonfun$handle$1$adapted(SparkConnectStreamHandler.scala:53)
	at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$3(SessionHolder.scala:152)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:857)
	at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$2(SessionHolder.scala:152)
	at org.apache.spark.JobArtifactSet$.withActive(JobArtifactSet.scala:109)
	at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withContext$1(SessionHolder.scala:122)
	at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:209)
	at org.apache.spark.sql.connect.service.SessionHolder.withContext(SessionHolder.scala:121)
	at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$1(SessionHolder.scala:151)
	at org.apache.spark.sql.connect.service.SessionHolder.withSessionBasedPythonPaths(SessionHolder.scala:137)
	at org.apache.spark.sql.connect.service.SessionHolder.withSession(SessionHolder.scala:150)
	at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handle(SparkConnectStreamHandler.scala:53)
	at org.apache.spark.sql.connect.service.SparkConnectService.executePlan(SparkConnectService.scala:166)
	at org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:584)
	at org.sparkproject.connect.grpc.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
	at org.sparkproject.connect.grpc.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:346)
	at org.sparkproject.connect.grpc.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:860)
	at org.sparkproject.connect.grpc.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at org.sparkproject.connect.grpc.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)

All ignored tests are related to apache/arrow#35053, so we should wait for an upgrade to a new Arrow version and then re-enable them for Java 21; a TODO JIRA has been created for that.
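For illustration, here is a minimal sketch of what the override can look like, assuming ScalaTest's AnyFunSuite and the commons-lang3 Java-version check; the exact contents of the merged change may differ:

import org.apache.commons.lang3.{JavaVersion, SystemUtils}
import org.scalactic.source.Position
import org.scalatest.Tag
import org.scalatest.funsuite.AnyFunSuite

trait RemoteSparkSession extends AnyFunSuite {
  // SPARK-44259: wrap every registered test so that assume() cancels it
  // on Java versions above 17 (i.e. Java 21), where Arrow IPC currently fails.
  override protected def test(testName: String, testTags: Tag*)(testFun: => Any)(
      implicit pos: Position): Unit = {
    super.test(testName, testTags: _*) {
      assume(SystemUtils.isJavaVersionAtMost(JavaVersion.JAVA_17))
      testFun
    }
  }
}

Tests canceled by assume() show up in the "canceled" column of the test summary rather than as failures, which matches the run output under "How was this patch tested?" below.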

Why are the changes needed?

This lets the Java 21 daily test monitor the other, non-Arrow-based tests.

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Pass GitHub Actions
  • Manual test with Java 21:
java -version
openjdk version "21-ea" 2023-09-19
OpenJDK Runtime Environment Zulu21+65-CA (build 21-ea+26)
OpenJDK 64-Bit Server VM Zulu21+65-CA (build 21-ea+26, mixed mode, sharing)
build/sbt "connect-client-jvm/test" -Phive
[info] Run completed in 4 seconds, 640 milliseconds.
[info] Total number of tests run: 846
[info] Suites: completed 22, aborted 0
[info] Tests: succeeded 846, failed 0, canceled 167, ignored 1, pending 0
[info] All tests passed.

@LuciferYang changed the title from "[SPARK-44259][CONNECT][TESTS] Ignore all Arrow-based connect tests for Java 21" to "[SPARK-44259][CONNECT][TESTS] Ignore all tests inherit RemoteSparkSession for Java 21" on Jun 30, 2023
@HyukjinKwon (Member) commented

Are there a lot of tests that fail with Java 21? I'm starting to feel like we should just fix them instead of skipping individual tests.

@LuciferYang (Contributor, Author) commented

> Are there a lot of tests that fail with Java 21? I'm starting to feel like we should just fix them instead of skipping individual tests.

We can't fix them now; we need to wait for a new Arrow version.

@LuciferYang (Contributor, Author) commented Jun 30, 2023

@HyukjinKwon All ignored tests are related to apache/arrow#35053, and @dongjoon-hyun just fixed it, but Arrow has not released a new version yet.

@LuciferYang (Contributor, Author) commented

also cc @dongjoon-hyun FYI

@dongjoon-hyun (Member) commented Jun 30, 2023

Yes, I fixed the root cause on the Arrow side. SPARK-44247 will upgrade to Apache Arrow 13.0.0, which will recover all ignored tests. The remaining problem is that the Apache Arrow 13.0.0 release is expected in August. We will see.

For this PR, I didn't choose this approach because it was a little too intrusive.

BTW, @LuciferYang, the connect module already passed after SPARK-44122. Could you remove the following from the PR description?

build/sbt "connect-/test" -Phive

@dongjoon-hyun changed the title from "[SPARK-44259][CONNECT][TESTS] Ignore all tests inherit RemoteSparkSession for Java 21" to "[SPARK-44259][CONNECT][TESTS] Make connect-client-jvm pass on Java 21 except RemoteSparkSession-based tests" on Jun 30, 2023
@LuciferYang (Contributor, Author) commented

> BTW, @LuciferYang, the connect module already passed after SPARK-44122. Could you remove the following from the PR description?

Already removed ~

@dongjoon-hyun (Member) commented Jun 30, 2023

@LuciferYang, for this one, could you add a new test tag, ExtendedArrowTest, and use it?
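For reference, defining such a ScalaTest tag would look roughly like the following; the ExtendedArrowTest name comes from the suggestion above and was never actually added:

import org.scalatest.Tag

// Hypothetical tag object: suites would mark Arrow-based tests with it,
// and the build could then exclude that tag when running on Java 21.
object ExtendedArrowTest extends Tag("org.apache.spark.tags.ExtendedArrowTest")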

@dongjoon-hyun (Member) commented

Never mind~

@dongjoon-hyun (Member) commented

+1 from my side. However, please follow @HyukjinKwon's opinion.

@LuciferYang (Contributor, Author) commented

> For this PR, I didn't choose this approach because it was a little too intrusive.

Please wait a moment; let me try a new way to reduce the code changes.

The code under review:

/**
 * SPARK-44259: override test function to skip `RemoteSparkSession-based` tests as default,
 * we should delete this function after SPARK-44121 is completed.
 */
override protected def test(testName: String, testTags: Tag*)(testFun: => Any)
@LuciferYang (Contributor, Author) commented Jun 30, 2023

@dongjoon-hyun @HyukjinKwon The new code overrides the test function, so all RemoteSparkSession-based tests are ignored by default on Java 21; there is no need to add an assume condition to each test case one by one.
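For contrast, without the override each Arrow-based suite would have needed a guard in every test, along these lines (hypothetical test name, same commons-lang3 check as in the sketch above):

test("arrow batch round trip") {
  // Guard repeated in every test case without the override:
  assume(SystemUtils.isJavaVersionAtMost(JavaVersion.JAVA_17))
  // ... actual test body ...
}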

@LuciferYang (Contributor, Author) commented Jun 30, 2023

Does this seem less intrusive?

@dongjoon-hyun (Member) left a comment

+1, LGTM. Yes, this looks much better.

@dongjoon-hyun (Member) commented

Merged to master for Apache Spark 3.5.0.
Thank you, @LuciferYang, @yaooqinn, @HyukjinKwon.

@HyukjinKwon (Member) left a comment

LGTM2

@Midhunpottammal commented Mar 5, 2024

> Merged to master for Apache Spark 3.5.0. Thank you, @LuciferYang, @yaooqinn, @HyukjinKwon.

With Java 21 (Java(TM) SE Runtime Environment, build 21.0.2+13-LTS-58), Spark 3.5.0 gives this error:

Previous exception in task: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
	org.apache.arrow.memory.util.MemoryUtil.directBuffer(MemoryUtil.java:174)
	org.apache.arrow.memory.ArrowBuf.getDirectBuffer(ArrowBuf.java:229)
	org.apache.arrow.memory.ArrowBuf.nioBuffer(ArrowBuf.java:224)
	org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:133)
	org.apache.arrow.vector.ipc.message.MessageSerializer.writeBatchBuffers(MessageSerializer.java:303)

What is the solution for this? Can Arrow work with Apache Spark 3.5.0 on Java 21?

Code:

from pyspark.sql import SparkSession
import pandas as pd

spark = SparkSession.builder \
    .appName("ArrowPySparkExample") \
    .getOrCreate()
spark.conf.set("Dio.netty.tryReflectionSetAccessible", "true")
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
pdf = pd.DataFrame(["midhun"])
df = spark.createDataFrame(pdf)
result_pdf = df.select("*").toPandas()

@LuciferYang (Contributor, Author) commented

@Midhunpottammal Spark 3.5 has not announced support for Java 21; this feature is likely to be released in Spark 4.0 :)

@dongjoon-hyun (Member) commented

Ya, @LuciferYang is right.

To @Midhunpottammal, you need SPARK-43831 for Java 21 support.

@Midhunpottammal commented

@dongjoon-hyun @LuciferYang Thank you.

I experimented with different versions of Java, Spark, and Arrow. I managed to get Arrow working with a lower version of Java on Spark 3.5.0. Here's my stack:

pyarrow==15.0.0
pyspark==3.5.0
java == Java(TM) SE Runtime Environment (build 17.0.10+11-LTS-240)

When I try to move to Java 21, I encounter the same error.

@LuciferYang (Contributor, Author) commented

> Ya, @LuciferYang is right.
>
> To @Midhunpottammal, you need SPARK-43831 for Java 21 support.

@Midhunpottammal As @dongjoon-hyun said, all the relevant patches under SPARK-43831 are needed for Java 21 support, so this is not something that can be accomplished with minor changes on Spark 3.5.
