
[VL] Use multiple threads in the same executor #7810

Open
FelixYBW opened this issue Nov 5, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@FelixYBW
Contributor

FelixYBW commented Nov 5, 2024

Description

Velox was initially designed to use multiple threads in each context. To fit Spark's working model, we limit each context to one thread; this is required for fallback.

Since the Velox backend can fully offload many queries, it would be interesting to investigate multiple threads per context. We can set task.cores to more than 1, pass that value to Velox's backend threads, then use larger partitions and see how the performance looks.
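As an illustrative sketch of the configuration being proposed (values are arbitrary, not tuned recommendations; spark.task.cpus is Spark's standard per-task core setting, which I assume is what "task.cores" refers to here):

```shell
# Sketch only: give each task multiple cores so the backend can run
# multiple threads per context, and use fewer, larger partitions.
--conf spark.task.cpus=4                  # cores available to one task
--conf spark.sql.shuffle.partitions=64    # fewer, larger partitions
```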

@zhztheplayer Is there any specific consideration on memory management side?

@FelixYBW FelixYBW added the enhancement New feature or request label Nov 5, 2024
@zhztheplayer zhztheplayer changed the title [VL] use multiple threads in the same executor [VL] Use multiple threads in the same executor Nov 5, 2024
@zhztheplayer
Member

@zhztheplayer Is there any specific consideration on memory management side?

There should be no hard limitation in Spark's / Gluten's memory management system against using multiple threads in the same Spark task. There were only a few lock issues, and those have been resolved already. The only rule is that we must report allocations to the task's memory manager, which works against the approach of running a background execution thread pool shared by all of an executor's tasks.
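A toy model of that rule, for illustration only (this is not Spark's or Gluten's actual API): every thread belonging to one task reports allocations to that task's own accountant, which is exactly what a pool of background threads shared across tasks would make hard to attribute.

```python
import threading

class TaskMemoryManager:
    """Toy per-task memory accountant (illustrative; not Spark's real class)."""
    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.used = 0
        self.lock = threading.Lock()  # many threads of ONE task report here

    def acquire(self, nbytes):
        with self.lock:
            if self.used + nbytes > self.limit:
                return False  # the caller would spill or fall back
            self.used += nbytes
            return True

    def release(self, nbytes):
        with self.lock:
            self.used -= nbytes

# All worker threads of the same task share the SAME manager, so the task's
# total footprint stays accountable to Spark's per-task memory model.
mgr = TaskMemoryManager(limit_bytes=1 << 20)

def worker():
    assert mgr.acquire(1024)
    mgr.release(1024)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(mgr.used)  # 0
```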

We can set task.cores to more than 1 then set to Velox's backend threads.

Velox's top-level threading model is designed for Presto's push model, which doesn't fit Spark's use case of creating an iterator over a Velox task. The parallelism we can easily adopt from Velox is the in-pipeline background thread pools, for example parallel join build or scan pre-fetch. But again, using those thread pools requires us to handle memory allocation carefully with respect to Spark's memory manager model.

Having said that, there could be a hacky way to combine Spark's pull model with Velox's push model: create a queue between them as a data broker. We can do some research and PoCs to see whether this really brings performance advantages; if it does, we could consider refactoring our memory management code and other architectural code to switch to it. But it's a big undertaking that more or less violates Spark's design, so we should think it through before coding.
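A minimal sketch of the queue-as-broker idea (names and structure are mine, purely illustrative): a producer thread stands in for a Velox task pushing batches, a bounded queue applies back-pressure, and a generator stands in for Spark's pull-based iterator.

```python
import queue
import threading

_SENTINEL = object()  # marks end-of-stream

def push_producer(out_queue, batches):
    """Stands in for a Velox task pushing output batches from its own thread."""
    for b in batches:
        out_queue.put(b)          # push model: the producer drives
    out_queue.put(_SENTINEL)

def pull_iterator(out_queue):
    """Stands in for Spark's pull-based iterator over the task's output."""
    while True:
        b = out_queue.get()       # pull model: the consumer drives
        if b is _SENTINEL:
            return
        yield b

q = queue.Queue(maxsize=8)        # bounded: back-pressure on the producer
t = threading.Thread(target=push_producer, args=(q, range(5)))
t.start()
result = list(pull_iterator(q))
t.join()
print(result)  # [0, 1, 2, 3, 4]
```

The bounded queue is the key design choice: without back-pressure, a push-side producer could run arbitrarily far ahead of the pull-side consumer and blow the task's memory budget.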

@Yohahaha
Contributor

Yohahaha commented Nov 5, 2024

I think we need to fix and re-enable spark.gluten.sql.columnar.backend.velox.IOThreads.

@zhztheplayer
Member

zhztheplayer commented Nov 5, 2024

I think we need to fix and re-enable spark.gluten.sql.columnar.backend.velox.IOThreads.

I guess the option can be enabled now, since we have solved the related lock problems in code (see #7243). But I may not be able to test it in production cases. Help is appreciated. cc @zhli1142015

@zhli1142015
Contributor

zhli1142015 commented Nov 5, 2024

I think we need to fix and re-enable spark.gluten.sql.columnar.backend.velox.IOThreads.

I guess the option can be enabled now, since we have solved the related lock problems in code.

I saw the related bug is still open, should we close it also?
#7161

But I may not be able to test it in production cases. Help is appreciated. cc @zhli1142015

I think we can enable it first if the lock issue is fixed.

@zhztheplayer
Member

I saw the related bug is still open, should we close it also?

Yes, I am closing it. We can reopen it if we find it still applies.

@FelixYBW
Contributor Author

FelixYBW commented Nov 5, 2024

I think we need to fix and re-enable spark.gluten.sql.columnar.backend.velox.IOThreads.

We should enable it anyway, even in the single-thread model. It's used to solve the IO bottleneck.
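For context, a sketch of why IO threads help even with a single execution thread (hypothetical helper names, illustrative only): reads of upcoming splits are kicked off on background threads so IO overlaps with compute instead of serializing with it.

```python
from concurrent.futures import ThreadPoolExecutor

def read_split(split_id):
    """Stands in for a blocking IO read of one file split (hypothetical helper)."""
    return f"data-{split_id}"

# Submit all reads eagerly to a small IO pool, then consume results in order.
# While split 0 is being processed, splits 1..3 are already being fetched.
splits = [0, 1, 2, 3]
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(read_split, s) for s in splits]
    results = [f.result() for f in futures]
print(results)  # ['data-0', 'data-1', 'data-2', 'data-3']
```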

@FelixYBW
Contributor Author

FelixYBW commented Nov 5, 2024

Having said that, there could be a hacky way to combine Spark's pull model with Velox's push model: create a queue between them as a data broker. We can do some research and PoCs to see whether this really brings performance advantages; if it does, we could consider refactoring our memory management code and other architectural code to switch to it. But it's a big undertaking that more or less violates Spark's design, so we should think it through before coding.

Let's plan it in 2025.

4 participants