Memory allocation in Native SQL engine

Java memory allocation

By default, Arrow columnar vector Java API is using netty pooledbytebuffer allocator, which will try to hold on the "free memory" by not returning back to System immediately for better performance. This will result big memory footprint on operators relying on this API, e.g., CoalesceBatches. We changed to use unsafe API since 1.2 release, which means the freed memory will be returned to system directly. Performance tests showed the performance of this change is negatable.

Native memory allocation

Modern memory allocators like jemalloc will not return the just freed memory to system to achieve better performance, also a set of memory allocation pools will be used to reduce the lock contention. Recent version of glibc also used similar design - by default it has 8 * cores pools, each with 64MB size. This will introduce a big memory footprint. Jemalloc's pool is smaller relatively - it has 4 * cores pools, each with 2MB size. Both glibc/jemalloc provides turning knobs to control these behaviors. Using lower number of pools can reduce the memory footprint, but also may impact the performance. It's a tradeoff on memory vs. performance.

Turnings to reduce memory footprint

Native SQL engine 1.2+
Using jemalloc in Arrow build

-DARROW_USEJEMALLOC=True

Build Native SQL engine with arrow-unsafe profile

mvn clean pacakge -DskipTests -Parrow-unsafe

LD_PRELOAD jemalloc in each Spark executor

--conf spark.executorEnv.LD_PRELOAD=/path/to/libjemalloc.so

Reduce number memory pools in jemalloc

--conf spark.executorEnv.MALLOC_CONF=narenas:2

Reduce number memory pools in glibc

 --conf spark.executorEnv.MALLOC_ARENA_MAX=2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

memory.md

memory.md

Memory allocation in Native SQL engine

Java memory allocation

Native memory allocation

Turnings to reduce memory footprint

Files

memory.md

Latest commit

History

memory.md

File metadata and controls

Memory allocation in Native SQL engine

Java memory allocation

Native memory allocation

Turnings to reduce memory footprint