Velox memory consumption #9008
-
@spershin @xiaoxmeng @oerling @mbasmanova @pedroerp @czentgr @majetideepak @yingsu00
-
Folks, I had a chat with Orri yesterday about this. Also, when initializing memory, we have two parameters: memory for the memory map and memory for the arbitrator. We also have indications of memory being leaked, in the sense that mapped memory is not being released. @bikramSingh91 will work on these leaks after he is back from PTO.
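For context, here is a rough sketch of how those two parameters map onto Velox's memory initialization. The option and field names below (MemoryManagerOptions, allocatorCapacity, arbitratorCapacity, and the initialize entry point) are written from memory and may differ across Velox versions, so treat this as an illustration rather than the exact API.

```cpp
#include "velox/common/memory/Memory.h"

using namespace facebook::velox;

// Illustrative sketch only: the two capacities discussed above are set
// independently when the worker brings up its memory subsystem.
void initWorkerMemory() {
  memory::MemoryManagerOptions options;
  options.useMmapAllocator = true;         // back large allocations with MmapAllocator
  options.allocatorCapacity = 40UL << 30;  // "memory for memory map": pages the allocator may mmap
  options.arbitratorCapacity = 32UL << 30; // "memory for arbitrator": what queries may be granted
  options.arbitratorKind = "SHARED";       // shared arbitrator that reclaims memory between queries
  memory::MemoryManager::initialize(options);
}
```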
-
@aditi-pandit, per our offline discussion, we could leave headroom for non-Velox-controlled memory usage through the system-gb-memory config, which is our practice at Meta as @spershin mentioned above. We might also need to investigate the non-trivial Parquet reader memory allocated through std::malloc and change it to allocate from a Velox memory pool, which also provides an STL-allocator-compatible interface (see the sketch below).

Velox doesn't have capacity enforcement for the async data cache; the actual Velox memory capacity is enforced by the memory allocator. The memory allocator dynamically adjusts the cache memory usage based on query memory demand (see the Velox memory doc for details). We have built a memory pushback mechanism in Prestissimo to shrink the cache when the server is under memory pressure. Since the actual server memory usage detection is platform specific, the pushback component is not in OSS. But we could consider moving the control logic, such as the connection with the async data cache, to OSS, and different setups can customize the server memory pressure detection logic cc @bikramSingh91 @tanjialiang. I am not sure it is a good idea to detect the server memory condition inside the async data cache; it is better to keep them separate.
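As an illustration of the std::malloc point above, here is a minimal sketch of routing reader scratch memory through a Velox MemoryPool. MemoryPool::allocate/free are existing APIs, but the StlAllocator adapter name and header are written from memory and may differ by version.

```cpp
#include <vector>
#include "velox/common/memory/Memory.h"

using namespace facebook::velox;

// Sketch: allocate reader scratch space from the query's MemoryPool so the
// usage is tracked and enforced by Velox, instead of going through std::malloc.
void decodeColumnChunk(memory::MemoryPool& pool, size_t bytes) {
  // Raw allocation accounted to the pool (replaces std::malloc/std::free).
  void* buffer = pool.allocate(bytes);
  // ... decode into 'buffer' ...
  pool.free(buffer, bytes);

  // STL container whose memory is accounted to the same pool via the
  // STL-allocator-compatible interface mentioned above.
  StlAllocator<char> alloc(pool);
  std::vector<char, StlAllocator<char>> scratch(alloc);
  scratch.resize(bytes);
}
```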
-
Thanks @spershin and @xiaoxmeng. Some follow-up:
-
@xiaoxmeng: We had also spoken about the assumptions around "fair" memory use between queries during arbitration. You said that Velox expects users to over-provision per-query memory and that all queries are of roughly similar shape, so picking the biggest memory consumer is most efficient. Could you please elaborate in case I missed something? We might need to refine this assumption for use at IBM.
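To make that assumption concrete, here is an illustrative-only sketch of "pick the biggest memory consumer" victim selection. It is not taken from Velox's SharedArbitrator; it just paraphrases the policy described above.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Toy model of the reclaim policy described above: when the arbitrator needs
// memory, it takes it from the query pool with the largest current usage.
// This is efficient when queries are over-provisioned and similarly shaped,
// but can look "unfair" when one large query dominates a mixed workload.
struct QueryPoolUsage {
  std::string queryId;
  int64_t usedBytes;
};

const QueryPoolUsage* pickVictim(const std::vector<QueryPoolUsage>& pools) {
  auto it = std::max_element(
      pools.begin(), pools.end(), [](const auto& a, const auto& b) {
        return a.usedBytes < b.usedBytes;
      });
  return it == pools.end() ? nullptr : &*it;
}
```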
-
When running multiple tests I've noticed that the MmapAllocator will eventually unadvise the mapped pages if no allocation has taken place for a while (hours). I looked into the code to find out what the parameters for this are, but neither in the Prestissimo code (which creates the memory manager and runs periodic tasks) nor in the Velox code could I find what is unadvising the pages. E.g. we might start out with a non-zero number of mapped pages after finishing a query, but after a long time of running, the mapped pages are eventually unadvised back to 0 (I didn't collect the evidence but saw it on the console). We talked about forcing the allocator to unadvise all pages after finishing a query; I looked into the code but there is no wrapper for that (a hedged sketch of what such a hook could look like is at the end of this comment).

On the overall investigation:
Tried the Q1 subquery that only does a table scan, since we suspected the ParquetReader to be involved. Next steps:
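Regarding the "force the allocator to unadvise after a query" idea above, here is a hedged sketch of what such a hook might look like if wired into the worker's query-completion path. It assumes a shrink(targetBytes) API on AsyncDataCache (name and signature written from memory, and not necessarily what actually madvise()s pages away), so it is an illustration rather than a tested fix.

```cpp
#include <cstdint>
#include "velox/common/caching/AsyncDataCache.h"

using namespace facebook::velox;

// Hypothetical post-query hook: ask the cache to give back up to
// 'targetBytes' of cached data so the underlying allocator has a chance to
// advise the freed pages away. Whether and when the MmapAllocator actually
// madvise()s them is exactly the open question in this comment.
void releaseCacheMemoryAfterQuery(
    cache::AsyncDataCache* cache,
    uint64_t targetBytes) {
  if (cache == nullptr) {
    return;
  }
  cache->shrink(targetBytes);
}
```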
-
Credits: @czentgr
The "Velox Memory issue" presentation summarizes particular Prestissimo behavior we observe at IBM.
The issue happens when running consecutive TPC-DS 10K queries on a cluster with 4 workers of the following configuration:
The worker was OOM killed after 3 consecutive queries (though each was run one at a time).
As described in the presentation:
The profile of memory after Q2 looks like:
After Q2 there was a sudden spike of memory consumption of almost 850 MB. This spike cannot be explained; the symptoms point to a memory leak. Has anyone else observed such scenarios?
But beyond this specific situation, general observations about Velox memory consumption raise some questions:

1. 3rd-party libraries like proxygen allocate memory that can't be controlled. Meta throttled exchanges for this. At IBM, we observed the thrift library in the Parquet reader allocating memory as well. In general, operators (new connectors, file/table format readers) are likely to allocate memory. Can Velox leave some head-room for these?
2. The AsyncDataCache has greedy behavior and consumes all of system-gb-memory. This memory stays in use even if the system is idle for a while, which comes across as strange. It seems more natural for cache entries to expire and leave space for new queries to consume. Meta observes external memory use and shrinks the cache periodically, but could the AsyncDataCache do this more organically? (A rough sketch of such a periodic shrink is after this list.)
3. The AsyncDataCache using the entire system-gb-memory makes it seem like Velox requires much more memory than Java Presto. Folks often comment that Prestissimo needed 2x the JVM requirements for the same query. IBM uses beefy machines, so this makes memory use look very bloated. Can the AsyncDataCache be bound to a limit?

Would be great to hear more thoughts on this.