
[GLUTEN-4241][VL] Add plan node to convert Vanilla spark columnar format data to Velox columnar format data #4818

Closed · wants to merge 5 commits

Conversation


@boneanxs (Contributor) commented Feb 29, 2024

What changes were proposed in this pull request?

Add plan node to convert Vanilla spark columnar format data to Velox columnar format data

This PR converts Spark columnar batches via Spark ColumnarBatch -> Arrow array -> Velox columnar batch. All primitive types (decimal types require facebookincubator/velox#8957) and Map/Array types are supported.

Users can enable it by setting spark.gluten.sql.columnar.vanillaColumnarToNativeColumnar, which defaults to false.
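For illustration, here is a minimal sketch of enabling the feature in a Spark session. Only the spark.gluten.sql.columnar.vanillaColumnarToNativeColumnar key comes from this PR; the plugin class name and memory settings are the usual Gluten setup and are assumed here.

import org.apache.spark.sql.SparkSession

// Assumed setup: a Gluten-enabled session with the new vanilla-columnar ->
// native-columnar conversion turned on (it defaults to false).
val spark = SparkSession
  .builder()
  .appName("vanilla-c2c-demo")
  .config("spark.plugins", "io.glutenproject.GlutenPlugin") // pre-rename class name
  .config("spark.memory.offHeap.enabled", "true")
  .config("spark.memory.offHeap.size", "2g")
  .config("spark.gluten.sql.columnar.vanillaColumnarToNativeColumnar", "true")
  .getOrCreate()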

(Fixes: #4241)

How was this patch tested?

Added tests

@boneanxs marked this pull request as draft February 29, 2024 12:43

@boneanxs marked this pull request as ready for review March 4, 2024 12:42

ArrowSchema cSchema;
ArrowArray arrowArray;
exportToArrow(vector, cSchema, ArrowUtils::getBridgeOptions());
boneanxs (Contributor, Author) commented on the hunk above:

If I understand correctly, there may be a bug where Velox doesn't handle short decimals properly when converting an ArrowArray to a RowVector.

In exportToArrow, we can see that each value is widened to int128_t, since the Arrow short decimal is 128-bit:

rows.apply([&](vector_size_t i) {
  int128_t value = buf.as<int64_t>()[i];
  memcpy(dst + (j++) * sizeof(int128_t), &value, sizeof(int128_t));
});

But in importFromArrow, Velox reads the buffer directly as int64_t, so this test won't pass:

values = wrapInBufferView(
    arrowArray.buffers[1], arrowArray.length * type->cppSizeInBytes());

cc @PHILO-HE @zhouyuan, please correct me if I'm wrong.
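To make the width mismatch concrete, here is a small, self-contained Scala illustration (not Velox code) of what happens when a buffer written with 16-byte decimal128 slots is read back with an 8-byte stride:

import java.nio.{ByteBuffer, ByteOrder}

// Write three "short decimal" values the way exportToArrow does: each value
// widened into a 16-byte (int128) slot, low word first, then sign extension.
val values = Seq(1L, 2L, 3L)
val buf = ByteBuffer.allocate(values.length * 16).order(ByteOrder.LITTLE_ENDIAN)
values.foreach { v =>
  buf.putLong(v)                      // low 64 bits of the int128 slot
  buf.putLong(if (v < 0) -1L else 0L) // high 64 bits (sign extension)
}

// Correct read: a 16-byte stride per value, as the decimal128 layout requires.
val ok = values.indices.map(i => buf.getLong(i * 16)) // Vector(1, 2, 3)
// Buggy read: an 8-byte stride, as the importFromArrow path above effectively
// assumes. The high words of each slot leak into the results.
val bad = values.indices.map(i => buf.getLong(i * 8)) // Vector(1, 0, 2)

assert(ok == values)
assert(bad != values)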

Contributor:

+1, looks like a bug.

boneanxs (Contributor, Author):

I see, let me try to fix Velox first.

@zhztheplayer (Member) left a comment:

Thanks! I haven't looked at the PR in detail, but overall the direction looks promising.

Also, would you like to add a bit more explanation to the PR description? For example: does it support all data types or just some of them, is it on or off by default, and what's the config key to toggle it? That will help future users get started with the feature quickly.

Comment on lines 42 to 52
new ValidatorApiImpl().doSchemaValidate(schema).foreach {
  reason =>
    throw new UnsupportedOperationException(
      s"Input schema contains unsupported type when performing columnar" +
        s" to columnar for $schema " + s"due to $reason")
}
zhztheplayer (Member):

It looks a little weird to do validation during execution. Would you like to share the reason for doing this?

boneanxs (Contributor, Author):

Yeah, I followed RowToVeloxColumnarExec when doing the validation here. You're right, it looks weird to do it during execution; I had the same doubt when reading RowToVeloxColumnarExec...

Let me fix it by overriding doValidateInternal instead.
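For reference, a hedged sketch of what that could look like, assuming Gluten's doValidateInternal hook and that doSchemaValidate returns an Option[String]; the ValidationResult helper names are assumed, not taken from the PR:

// Sketch only: run the schema check at validation time instead of execution
// time, so an unsupported schema triggers a fallback rather than a runtime
// failure. ValidationResult.ok / ValidationResult.notOk are assumed names.
override protected def doValidateInternal(): ValidationResult = {
  new ValidatorApiImpl().doSchemaValidate(schema) match {
    case Some(reason) =>
      ValidationResult.notOk(
        s"Input schema contains unsupported type when performing columnar" +
          s" to columnar for $schema due to $reason")
    case None => ValidationResult.ok
  }
}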

Comment on lines 552 to 555
case p: RowToColumnarExecBase if p.child.isInstanceOf[ColumnarToRowExec] =>
  val replacedChild = replaceWithColumnarToColumnar(
    p.child.asInstanceOf[ColumnarToRowExec].child)
  BackendsApiManager.getSparkPlanExecApiInstance.genColumnarToColumnarExec(replacedChild)
zhztheplayer (Member):

The case condition says it matches vanilla-columnar -> row -> gluten-columnar, but it calls a general interface to do columnar -> columnar. Do we need to specialize the interface name to make sure we exactly request the vanilla-columnar -> gluten-columnar transition here?

boneanxs (Contributor, Author):

The case condition here tries to match RowToColumnar -> ColumnarToRow, e.g.:

Filter
  RowToVeloxColumnar
    ColumnarToRow
       FileScan xx

We try to replace the RowToVeloxColumnar -> ColumnarToRow pair here with a single ColumnarToColumnar node.
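For comparison, after the rewrite the pair collapses into a single node; the exact node name depends on what the backend's genColumnarToColumnarExec returns, so the one below is illustrative:

Filter
  ColumnarToColumnar   (vanilla columnar -> Velox columnar, name illustrative)
    FileScan xx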

zhztheplayer (Member):

Thanks for the explanation.

If it's for ColumnarToRow + RowToVeloxColumnar, would it be better to rename the method genColumnarToColumnarExec to genVanillaColumnarToGlutenColumnarExec or something similar?

zhztheplayer (Member):

That way it's clearer that we are not doing a Gluten -> vanilla c2c transition here.

@FelixYBW (Contributor) commented Mar 5, 2024

Do you convert to the Velox format directly, or convert to Arrow and then to Velox?

@boneanxs (Contributor, Author) commented Mar 5, 2024

> Do you convert to the Velox format directly, or convert to Arrow and then to Velox?

We convert to Arrow first, then to Velox.

@FelixYBW (Contributor) commented Mar 5, 2024

> > Do you convert to the Velox format directly, or convert to Arrow and then to Velox?
>
> We convert to Arrow first, then to Velox.

Makes sense. We may upstream the Parquet columnar format => Arrow format conversion to Spark.


Comment on lines 123 to 128
if (arrowArray != null) {
  arrowArray.release()
  arrowArray.close()
  arrowConverter.reset()
  arrowArray = null
}
zhztheplayer (Member):

It seems the file includes a number of manual lifecycle-management calls like _.close, _.release, _.reset, etc. Would you like to review them to see if we can eliminate some? I recall that recycleIterator / recyclePayload can do some of the cleanup automatically.

This is a non-blocking comment, so feel free to do it in another PR. But either way we should try our best to make the code here easier for future developers to maintain.

boneanxs (Contributor, Author):

We need to reset arrowArray and arrowConverter when iterating over each item, so recyclePayload is a reasonable fit, but it only receives the result of next(), which is a ColumnarBatch; we cannot pass arrowArray in to close it.

So instead I added an extra method, releaseArrowBuf, to release them and reduce duplication.
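To illustrate the pattern under discussion, here is a small self-contained Scala sketch of per-element recycling plus end-of-iteration cleanup. The class and parameter names are illustrative, not the PR's actual code; in the PR this responsibility is split between recyclePayload (for each ColumnarBatch) and the releaseArrowBuf helper, with TaskResources as the safety net.

// Illustrative sketch: wrap an iterator so that each returned element is
// recycled before the next one is produced, and a final cleanup runs once
// the underlying iterator is exhausted (e.g. releasing cSchema). Callers
// must not touch the previous element after calling hasNext again.
class RecyclingIterator[T](
    underlying: Iterator[T],
    recyclePayload: T => Unit,    // e.g. release arrowArray, reset arrowConverter
    recycleIterator: () => Unit   // e.g. release cSchema when iteration ends
) extends Iterator[T] {
  private var pending: Option[T] = None
  private var finished = false

  override def hasNext: Boolean = {
    pending.foreach(recyclePayload) // recycle the previously returned element
    pending = None
    val more = underlying.hasNext
    if (!more && !finished) {
      recycleIterator()
      finished = true
    }
    more
  }

  override def next(): T = {
    val elem = underlying.next()
    pending = Some(elem)
    elem
  }
}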

@zhztheplayer (Member) previously approved these changes Mar 7, 2024 and left a comment:

Overall it looks good to me. Just one non-blocking comment. Thanks!


@FelixYBW (Contributor) commented Mar 7, 2024

@zhztheplayer can you check how the memory is allocated during the conversion? Where is the Arrow memory allocated? How many memcpys happen during the conversion? Is there an on-heap => off-heap copy?

@zhztheplayer (Member) commented Mar 7, 2024

> @zhztheplayer can you check how the memory is allocated during the conversion? Where is the Arrow memory allocated? How many memcpys happen during the conversion? Is there an on-heap => off-heap copy?

@boneanxs It would be great if you could address these questions too, thanks.

I believe the patch reuses our old ArrowWritableColumnarVector code to write Spark columnar data to native, so there should be a number of "on-heap => off-heap" copies. Ideally we should count exactly how many copies the implementation does. @boneanxs You can also check on this part.

What I'm worried about is that ArrowWritableColumnarVector hasn't been under active maintenance for a while, so we should have more tests here, especially for complex data types.

Also, it would be great if you could share your thoughts on the risk of memory leaks this approach may bring, @boneanxs. Overall the PR's code looks fine to me and we have removed most of the unsafe APIs, but there may still be some left. Let's check this part carefully too.

@FelixYBW (Contributor) commented Mar 8, 2024

Let's document the conversion clearly here. I have the impression that parquet-mr can use off-heap memory for columnar data. If so, the best case is that we avoid any memcpy during the conversion (not counting the Arrow => Velox conversion), though it will take some work to get there.

If the Parquet scan is on-heap, we can't avoid the on-heap -> off-heap copy, but then we should reuse the off-heap buffer.

If we can't determine how the conversion happens, let's use memcpy wherever needed and make sure there is absolutely no memory leak for now.

@boneanxs (Contributor, Author) commented Mar 8, 2024

> @zhztheplayer can you check how the memory is allocated during the conversion? Where is the Arrow memory allocated? How many memcpys happen during the conversion? Is there an on-heap => off-heap copy?

Hey @FelixYBW, for each columnar batch this PR allocates off-heap memory before the conversion to perform Spark ColumnarBatch -> Arrow array. The PR doesn't treat on-heap and off-heap Spark columnar batches differently; it uses ArrowFieldWriter (implemented by Spark) to do the transformation (see ArrowColumnarBatchConverter#write, which this PR adds).

So yes, there is one memcpy from Spark ColumnarBatch to Arrow array, no matter whether the Spark columnar batch is on heap or off heap.

On the native side, I simply use importFromArrow to convert the Arrow array to a Velox columnar batch; there will be some memcpy for String, timestamp, short decimal, etc. (I haven't fully checked).

> What I'm worried about is that ArrowWritableColumnarVector hasn't been under active maintenance for a while, so we should have more tests here, especially for complex data types.

Sorry @zhztheplayer, I might be missing something: do you mean ArrowFieldWriter? This PR uses ArrowFieldWriter to do the conversion. It's widely used by PySpark, so reusing it here should be safe.

> Also, it would be great if you could share your thoughts on the risk of memory leaks this approach may bring, @boneanxs.

The extra memory acquired here is for arrowArray, cSchema, and the Velox columnar batch, all of which are handled by TaskResources and recyclePayload: arrowArray and the Velox columnar batch are released on each iteration, and cSchema is released once the iterator ends. TaskResources provides the extra assurance that all allocated memory is released if the iterator is interrupted abnormally.

> I have the impression that parquet-mr can use off-heap memory for columnar data. If so, the best case is that we can avoid any memcpy during the conversion.

I'm not sure whether parquet-mr can directly use off-heap memory, but Spark can expose Parquet data as off-heap columnar batches by enabling spark.sql.columnVector.offheap.enabled. Directly reusing an off-heap columnar batch as an Arrow array, however, is not supported in this PR (by the way, can we do that? I'm not sure).
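For reference, a hedged sketch of trying that out; spark.sql.columnVector.offheap.enabled is a real Spark SQL conf, and whether additional off-heap memory settings are needed depends on the deployment:

// Ask vanilla Spark's vectorized Parquet/ORC readers to back their
// ColumnarBatches with off-heap column vectors. Being a SQL conf, it can
// be toggled per session.
spark.conf.set("spark.sql.columnVector.offheap.enabled", "true")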

I can also try benchmarks comparing against VanillaColumnarBatchToRow -> RowToVeloxColumnar if necessary. At least in terms of memcpy, RowToVeloxColumnar also does an extra copy even if the row comes from an off-heap ColumnarBatch.

@zhztheplayer (Member):

> This PR uses ArrowFieldWriter to do the conversion. It's widely used by PySpark, so reusing it here should be safe.

Ah, then it's fine. Some of the code is very similar to Gluten's ArrowWritableColumnarVector, which itself pasted some code from vanilla Spark, so I was misled by intuition. Sorry for the mistake.

@FelixYBW (Contributor) commented Mar 8, 2024

> RowToVeloxColumnar also does an extra copy even if the row comes from an off-heap ColumnarBatch

Thank you for the explanation. You may try enabling spark.sql.columnVector.offheap.enabled.

An on-heap to off-heap memcpy is more expensive than an off-heap to off-heap one. Do you know how ArrowFieldWriter allocates memory? Is it via the Unsafe API or direct memory?

@zhztheplayer Do you know?

@zhztheplayer (Member) commented Apr 10, 2024

I think it's OK to have it disabled by default.

@boneanxs Can you add a CI case for the feature that runs the TPC-H / TPC-DS tests?

Example:

run-tpc-test-ubuntu-randomkill:

You can use the gluten-it arg --extra-conf=... to enable C2C during testing.

Also, if -s=30.0 (SF 30) is too large, we can make it -s=1.0.

And sorry for the late response. Let's get this merged ASAP.
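For example, a hedged sketch of such a gluten-it invocation; only --extra-conf, -s, and the config key come from this conversation, and the subcommand and other flags are assumed:

gluten-it queries-compare --benchmark-type=h -s=1.0 \
  --extra-conf=spark.gluten.sql.columnar.vanillaColumnarToNativeColumnar=true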

@FelixYBW (Contributor):

Oh, I just noticed the PR is still open and has many conflicts. @boneanxs, would you like to continue?

@boneanxs (Contributor, Author):

Hey @FelixYBW @zhztheplayer, yeah, I'm willing to continue this PR. Since the last comment I have actually run some benchmarks in my local environment and found no obvious improvement over VanillaColumnar -> Row -> VeloxColumnar. (I also noticed a compatibility issue: this PR relies on Spark's Arrow version, which conflicts with Gluten's.) I'm still investigating why the performance didn't improve (though I haven't had time recently) and will update here once I've finished other things.

@zhouyuan (Contributor):

@boneanxs Hi, could you please rebase? There was a big commit (renaming io.glutenproject to org.apache.gluten) when we migrated to the Apache repo.

Thanks,
-yuan

github-actions bot commented Jul 4, 2024

This PR is stale because it has been open 45 days with no activity. Remove the stale label or comment, or this will be closed in 10 days.

github-actions bot added the "stale" label Jul 4, 2024
@zhztheplayer (Member):

Commenting to keep the PR open, as it could be a valuable topic.

github-actions bot removed the "stale" label Jul 26, 2024
github-actions bot commented Sep 9, 2024

This PR is stale because it has been open 45 days with no activity. Remove the stale label or comment, or this will be closed in 10 days.

github-actions bot added the "stale" label Sep 9, 2024
This PR was auto-closed because it has been stalled for 10 days with no activity. Please feel free to reopen if it is still valid. Thanks.

github-actions bot closed this Sep 19, 2024