Paul Rogers edited this page Jan 23, 2017 · 15 revisions

Memory management in Drill is a complex topic. Here we describe just enough to explain a serious design flaw that impairs our ability to implement effective resource management for memory. We come at the topic from three directions:

  • Explanation of Drill's memory allocator
  • Explanation of Drill batch sizes
  • Explanation of how the above two are in direct conflict.

Memory Manager

Drill is a columnar system, with data for each column stored in a value vector. Each vector is backed by a direct memory allocation. While Java provides robust management of heap memory, each application must invent its own manager for direct memory. Drill's manager is based on that provided by Netty, since Drill uses Netty to send (and thus release) and receive (and thus allocate) value vectors.

Memory management is a very complex topic; this discussion does not attempt to repeat that story. Rather, we focus on a number of critical aspects.

Java provides the Unsafe class that, via devious means, an application can use to allocate blocks of memory directly from the OS. Java provides no management at all: the application is responsible for freeing the memory when it is no longer needed (unlike heap memory, which is automatically garbage-collected).
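The contract described above, manual allocation and manual free with no garbage collection, can be seen in a minimal sketch. (The class name `UnsafeDemo` is ours; the reflection trick to obtain the `theUnsafe` singleton is the standard one, and recent JDKs may print an illegal-reflective-access warning when it runs.)

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafeDemo {
    public static void main(String[] args) throws Exception {
        // Unsafe has no public constructor; grab the singleton via reflection.
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        long address = unsafe.allocateMemory(4096); // raw, unmanaged direct memory
        unsafe.putByte(address, (byte) 42);
        System.out.println(unsafe.getByte(address)); // prints 42
        unsafe.freeMemory(address); // the caller must free; no GC at this level
    }
}
```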

This writer does not know how Java handles memory management below the Unsafe level. Is memory obtained from a call to the C malloc routine? Does Java provide its own memory manager above malloc? Presumably the lower-level allocation is costly, since Netty finds it worthwhile to layer its own memory manager on top of Unsafe.

The Netty manager is implemented in the io.netty.buffer.PooledByteBufAllocator class. In Drill, this class provides a per-thread, lock-free memory allocator: each minor fragment has its own pooled allocator instance. The "pooled" in the name is the key: the allocator maintains a free list of direct memory "chunks". In Drill, each chunk is 16 MB in size.

Allocation proceeds as follows:

  • Round the allocation up to the next power of 2. (A 5K request, say, is rounded up to 8K.)
  • Scan the pool's free list for an available chunk.
  • If a free chunk is available, and has sufficient capacity, carve off the desired allocation.
  • If the request is exactly 16 MB, it consumes an entire chunk. If smaller, the allocation is carved out of a slice of a chunk.
  • If no free chunk is available, or the request is larger than 16 MB, request the memory directly from Unsafe.
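The first step, rounding a request up to the next power of two, can be sketched as follows. (The method name `roundUpToPowerOfTwo` is ours for illustration; Netty's real implementation lives in its arena and size-class logic.)

```java
public class SizeClasses {
    // Round a request up to the next power of two, e.g. a 5 KB request becomes 8 KB.
    static int roundUpToPowerOfTwo(int size) {
        int n = Integer.highestOneBit(size); // largest power of 2 <= size
        return (n == size) ? size : n << 1;  // already a power of 2? keep it; else double
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(5 * 1024)); // prints 8192
        System.out.println(roundUpToPowerOfTwo(8 * 1024)); // prints 8192 (already a power of 2)
    }
}
```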

Freeing proceeds as follows:

  • If the block is 16 MB or larger, release the memory back to the pool in 16 MB chunks.
  • If the block is smaller than 16 MB, find its parent chunk in the memory pool and mark the used portion as now free.

Notice the asymmetry: an allocation can be any size, but every release is carved into 16 MB chunks. Large requests bypass the pool and come straight from Unsafe, yet when those blocks are released they enter the pooled allocator as 16 MB chunks.
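The asymmetry can be illustrated with a toy model of the pool. This is our simplification, tracking only block sizes, not Netty's actual code: allocations larger than a chunk bypass the free list, but every release lands on it in chunk-sized pieces.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ToyPool {
    static final long CHUNK = 16 * 1024 * 1024; // 16 MB chunk size
    // Free list of chunk-sized blocks (we track sizes only, for illustration).
    final Deque<Long> freeChunks = new ArrayDeque<>();

    long allocate(long size) {
        if (size > CHUNK || freeChunks.isEmpty()) {
            return size; // pretend: satisfied directly from Unsafe
        }
        return freeChunks.pop(); // carve the allocation from a pooled chunk
    }

    void release(long size) {
        // Any release, however large, returns to the pool as 16 MB chunks.
        for (long remaining = size; remaining > 0; remaining -= CHUNK) {
            freeChunks.push(CHUNK);
        }
    }

    public static void main(String[] args) {
        ToyPool pool = new ToyPool();
        long big = pool.allocate(64 * 1024 * 1024); // 64 MB: bypasses the pool...
        pool.release(big);                          // ...but comes back as 4 chunks
        System.out.println(pool.freeChunks.size()); // prints 4
    }
}
```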
