
[Memory Management] GSoC 2024 Final Report

imbajin edited this page Nov 1, 2024 · 1 revision

Google Summer of Code '24 Final Report

[HugeGraph] Support Memory Management Module

GSoC 2024 - Apache HugeGraph

Role: Open Source Contributor
Duration: 2024.05 - 2024.10
Project: Apache HugeGraph

Project Overview 🔍

When the JVM performs large amounts of garbage collection, request latency spikes and response times become unpredictable. To reduce request latency and response-time jitter, the hugegraph-server graph query engine already uses off-heap memory in most OLTP algorithms.

However, HugeGraph currently cannot limit memory on a per-Query basis, so a single Query may exhaust the memory of the entire process, causing OOM and even leaving the service unable to respond to other requests. We therefore need a memory management module scoped to a single Query.

Key goals include:

  • Implement a unified memory manager that independently manages JVM off-heap memory and adapts the allocation paths of the various native collections, so that the memory used by algorithms comes from a unified memory pool and is returned to that pool on release.
  • Associate each request with a unified memory manager, so that a request's memory usage can be tracked and bounded.
  • Complete related unit tests (UT) and basic documentation.

Features Implemented 🛠️

Feature 1: Hierarchical Memory Allocation

Design:

*(Figure: hierarchical MemoryPool design)*

As shown in the figure above, we allocate different levels of MemoryPool (referred to as MemoryContext) for various execution components, corresponding to different layers of memory management:

  • Each Query corresponds to a QueryMemoryPool.
  • Each Task derived from a Query (representing each thread) has a corresponding TaskMemoryPool.
  • Each Operator utilized by a Task (such as iterators in HugeGraph) corresponds to an OperatorMemoryPool.
  • The MemoryManager oversees all MemoryPools.
  • Objects with similar lifespans are stored within the same MemoryContext to minimize cross-physical memory access and enable unified memory release.

This structure allows fine-grained tracking of memory usage for each query down to the operator level, enabling optimal control over global memory usage.

These three types of MemoryPools can form a MemoryPool Tree with MemoryManager as the root and OperatorPool as the leaf nodes:

  • Leaf nodes are the actual memory consumers (OperatorPool).
  • Higher-level TaskPool and QueryPool do not directly consume memory; instead, they act as aggregators, consolidating the usage of leaf nodes and managing their lifecycle.

MemoryPool Lifecycle:

  • QueryMemoryPool: Created at the start of each query (e.g., an HTTP query request). It is released, along with its allocated memory, at the query’s conclusion.
  • TaskMemoryPool: Created when a new thread task is submitted (e.g., Executor.submit(() -> doSomething())). It is released, along with its allocated memory, at the end of the thread task.
  • OperatorMemoryPool: Created whenever an iterator is instantiated. It is not explicitly released but is automatically released when its parent TaskMemoryPool is released.
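The pool tree and its cascading lifecycle can be sketched as follows. This is a minimal illustration; the class and method names here are invented for the sketch, not the actual HugeGraph API:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the MemoryPool tree described above.
class MemoryPool {
    final String name;
    final MemoryPool parent;
    final List<MemoryPool> children = new ArrayList<>();
    long usedBytes; // only leaf (Operator) pools consume memory directly

    MemoryPool(String name, MemoryPool parent) {
        this.name = name;
        this.parent = parent;
        if (parent != null) parent.children.add(this);
    }

    // Parents do not hold memory themselves; they aggregate the usage
    // of their subtree, mirroring the aggregator role of Task/Query pools.
    long aggregateUsed() {
        long total = usedBytes;
        for (MemoryPool c : children) total += c.aggregateUsed();
        return total;
    }

    // Releasing a pool releases all of its descendants, which is how an
    // OperatorMemoryPool is freed implicitly with its TaskMemoryPool.
    void release() {
        for (MemoryPool c : children) c.release();
        usedBytes = 0;
        children.clear();
    }
}
```

For example, a QueryMemoryPool with one TaskMemoryPool child and several OperatorMemoryPool leaves reports the sum of leaf usage at the root, and releasing the task pool frees every operator pool under it.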

Effect:

*(Figure: effect of hierarchical memory allocation)*

Feature 2: On-Demand Memory Allocation

Design:

All memory allocation requests originate at the leaf node OperatorPool and are passed upward, bottom-up, to the QueryMemoryPool:

  1. Memory is consumed centrally by the QueryMemoryPool from the MemoryManager.
  2. The QueryMemoryPool has a capacity limit; if it lacks sufficient capacity, it requests more memory from the MemoryManager.
  3. Memory allocation follows a two-step process: logical allocation (requestMemory) followed by physical allocation (allocate). Once the request completes, the statistics in both the MemoryManager and the pool are updated.

If the QueryMemoryPool has enough memory, it returns the allocation result, which propagates top-down through each layer back to the leaf node.

  4. If the QueryMemoryPool lacks sufficient memory, it initiates a memory arbitration request with the root node MemoryManager.

  5. Each memory allocation uses a "reservation" strategy that quantizes request sizes to reduce the frequency of requests:

    • Requests under 16MB are rounded up to the nearest 1MB.
    • Requests under 64MB are rounded up to the nearest 4MB.
    • Requests of 64MB or more are rounded up to the nearest 8MB.
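The rounding rules above can be sketched as follows. The thresholds come from the text; the class and method names are illustrative, not part of the actual implementation:

```java
// Sketch of the "reservation" quantization: round each request up to an
// alignment that grows with the request size, so nearby request sizes
// collapse into the same reservation and fewer upward requests are made.
final class MemoryQuantizer {
    static final long MB = 1L << 20;

    static long quantize(long bytes) {
        long align;
        if (bytes < 16 * MB) {
            align = 1 * MB;   // < 16MB: round up to the nearest 1MB
        } else if (bytes < 64 * MB) {
            align = 4 * MB;   // < 64MB: round up to the nearest 4MB
        } else {
            align = 8 * MB;   // >= 64MB: round up to the nearest 8MB
        }
        return (bytes + align - 1) / align * align; // round up to multiple
    }
}
```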

Each MemoryPool tracks several capacity figures: the pool's maximum capacity, the capacity currently requested, and the capacity reserved from the root pool for all leaf nodes.

  1. For instance, when a leaf pool needs additional memory, it first allocates from its remaining reserved capacity. If that capacity is insufficient, it expands by requesting additional capacity. Should global memory still fall short, memory arbitration is triggered.
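The bottom-up, two-step allocation flow can be sketched as follows. Names are illustrative; the real pools also handle capacity expansion and arbitration, which this sketch reduces to a simple failure:

```java
// Sketch of on-demand allocation: a leaf OperatorPool forwards the request
// to its parent until the root (query-level) pool, which tracks the
// logical reservation; physical allocation happens only after the logical
// request succeeds.
class Pool {
    final Pool parent;
    long capacity;   // capacity reserved at the root (query) level
    long requested;  // logically allocated so far

    Pool(Pool parent, long capacity) {
        this.parent = parent;
        this.capacity = capacity;
    }

    // Step 1: logical allocation (requestMemory), propagated bottom-up.
    boolean requestMemory(long bytes) {
        if (parent != null) return parent.requestMemory(bytes);
        if (requested + bytes > capacity) {
            // In the real design this would trigger capacity expansion
            // and then arbitration; here we simply report failure.
            return false;
        }
        requested += bytes;
        return true;
    }

    // Step 2: physical allocation (allocate), only after step 1 succeeds.
    java.nio.ByteBuffer allocate(int bytes) {
        if (!requestMemory(bytes)) {
            throw new OutOfMemoryError("query memory limit exceeded");
        }
        return java.nio.ByteBuffer.allocateDirect(bytes);
    }
}
```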

The primary role of memory arbitration is to manage the available capacity of all queries and arbitrate across them:

  • It decides which query's or operator's idle capacity should be reclaimed and returned to the requesting query or operator.
  • Alternatively, it may employ spill techniques, offloading capacity and state data held by operators and returning the freed capacity to the query.

A memory arbitration process is triggered by a memory allocation request:

  • When a memory allocation request reaches the QueryPool and finds insufficient memory, the MemoryManager initiates an arbitration process.
  • Essentially, memory arbitration is a memory recovery process: it decides which queries' memory to reclaim and how to reclaim it.
  • Once the recovery target is identified, the recovery request propagates top-down to the leaf node.

  Two strategies are supported:

  1. Local Arbitration

    If OperatorA in QueryA triggers arbitration, the Local strategy only reclaims idle memory within other operators inside QueryA.

    For instance, if OperatorA under QueryA requires memory, all free memory in QueryA except that of OperatorA will be reclaimed and allocated to OperatorA.

  2. Global Arbitration

    If arbitration is triggered by QueryA, the Global strategy attempts to reclaim idle memory from other queries.

    • The process follows the same pattern.
    • It chooses another query, say QueryB, to reclaim.
      • Selection strategy: Queries are sorted by the amount of free memory, with the one having the most free memory chosen for reclamation.
      • Default reclamation strategy: Only idle portions of memory are reclaimed.
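The selection step of global arbitration (sort queries by free memory, reclaim only the idle portion of the one with the most) can be sketched as follows, with illustrative names rather than the actual implementation:

```java
import java.util.Comparator;
import java.util.List;

// Sketch of the default global-arbitration strategy: choose the query
// pool with the most free (reserved-but-unused) memory as the victim and
// reclaim only its idle portion, shrinking its reservation to actual use.
class ArbitrationSketch {
    static class QueryPool {
        final String name;
        long reserved; // capacity reserved for this query
        long used;     // capacity actually in use

        QueryPool(String name, long reserved, long used) {
            this.name = name;
            this.reserved = reserved;
            this.used = used;
        }

        long free() { return reserved - used; }
    }

    // Returns the number of bytes reclaimed from the chosen victim.
    static long reclaimOnce(List<QueryPool> queries) {
        QueryPool victim = queries.stream()
                .max(Comparator.comparingLong(QueryPool::free))
                .orElse(null);
        if (victim == null || victim.free() <= 0) return 0;
        long freed = victim.free();     // default: reclaim idle memory only
        victim.reserved = victim.used;  // reservation shrinks to actual usage
        return freed;
    }
}
```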

Feature 3: Concurrency Safety and OOM Handling Strategy

Design:

If an Operator requests a memory allocation and, even after one round of memory arbitration, memory is still insufficient, a custom-defined OOM (Out of Memory) exception is thrown. This terminates the current query task (self-abort) and releases all of its allocated memory.

Requirement: For the same pool, memory arbitration, memory allocation, and memory release must all be serialized, with no two processes allowed to run concurrently.

  • During memory arbitration, neither allocation nor release of memory is permitted, as arbitration requires the memory state of the pool to remain unchanged. This means Query processes must be “paused” during arbitration (i.e., memory allocation and release are not allowed).
    • This "pause" does not need to be an actual pause; only the memory state needs to remain constant, not the execution state.
    • This is implemented through a flag: after arbitration is triggered, a flag is set.
    • If the pool subsequently attempts to allocate or release memory while the flag is set, it will enter a condition wait (achieving a “paused” state effect for the query task). Once arbitration is complete, a signal will be sent to resume operations.
  • Memory allocation shares a lock with memory arbitration to ensure that memory arbitration and memory allocation are serialized (i.e., non-concurrent).
  • Memory release in the pool must not be concurrent with any other processes (allocation or arbitration).
    • When releasing memory, a closed flag is set, preventing any concurrent actions with the release process.
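The flag-plus-condition scheme above can be sketched with a `ReentrantLock`. This is a minimal illustration of the "paused" effect; the real module coordinates more state (including the closed flag for release) than shown here:

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of serializing arbitration with allocation/release: arbitration
// raises a flag under a lock; allocate/release calls that arrive while the
// flag is up wait on a condition ("paused" query task) and resume when
// arbitration signals completion.
class SerializedPool {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition arbitrationDone = lock.newCondition();
    private boolean arbitrating = false;
    private long used = 0;

    void beginArbitration() {
        lock.lock();
        try {
            arbitrating = true; // memory state must stay frozen from now on
        } finally {
            lock.unlock();
        }
    }

    void endArbitration() {
        lock.lock();
        try {
            arbitrating = false;
            arbitrationDone.signalAll(); // wake any paused allocate/release
        } finally {
            lock.unlock();
        }
    }

    void allocate(long bytes) throws InterruptedException {
        lock.lock();
        try {
            while (arbitrating) {
                arbitrationDone.await(); // condition wait: the "pause"
            }
            used += bytes;
        } finally {
            lock.unlock();
        }
    }

    long used() {
        lock.lock();
        try {
            return used;
        } finally {
            lock.unlock();
        }
    }
}
```

Note that the "pause" only freezes the pool's memory state; the query thread itself blocks only if it actually tries to allocate or release during arbitration.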

Effect:

*(Figure: effect of the concurrency-safety handling)*

Feature 4: Off-Heap Transformation of Core Data Structures

Design:

Inspiration: Analysis of heap dump files showed that over 90% of query memory is consumed by labels, properties, edges, and vertices; in other words, query memory usage is concentrated in a small number of immutable classes with very large instance counts.

Content: Instead of moving the entire data structure off-heap, the objects in data structures remain on-heap, while certain memory-heavy, immutable member variables (such as label, which is essentially an int) are stored off-heap. The memory management framework then manages this portion of off-heap memory.

  • Off-heap management of these immutable member variables facilitates zero-copy read/write operations.

*(Figure: off-heap storage of immutable member variables)*

  1. Identify all objects that are both [memory-intensive] and [immutable], and provide an API for reading them directly from off-heap memory:
  • Object[]: The main memory usage comes from the HugeEdge list, so only HugeEdge needs to be transformed.
  • HugeVertex: Its main memory usage also comes from the HugeEdge list, so only HugeEdge needs to be transformed.
  • HugeEdge:
    • A portion of memory is consumed by id:
      • IdGenerator.Id: transformation completed.
    • Another portion comes from IntObjectMap, which is mutable and therefore cannot be moved off-heap.
  • label: Essentially an Id, so it is covered once the Id transformation is complete.
  • HugeProperty: Essentially a value, which can be stored off-heap.

Implementation: PR Link

  • A Factory was implemented to replace all instances of new for the objects listed above.
  • The Factory interface is consistent with the original new syntax.
  • Within the Factory, depending on the user-defined memory_mode, it constructs either the original on-heap version of the object or the off-heap version.
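The Factory idea can be sketched as follows. All names here, including `IdFactory` and the way `memory_mode` is read, are illustrative and not the actual PR code:

```java
// Sketch of a mode-aware Factory: construction goes through a factory
// method whose call shape mirrors `new`, and a memory-mode switch decides
// whether the on-heap or off-heap variant of the object is built.
final class IdFactory {
    enum MemoryMode { ON_HEAP, OFF_HEAP }

    interface Id {
        long asLong();
    }

    // Stands in for the user-defined memory_mode configuration.
    static MemoryMode memoryMode = MemoryMode.ON_HEAP;

    // Same call shape as `new LongId(value)`, but mode-aware.
    static Id newId(long value) {
        if (memoryMode == MemoryMode.OFF_HEAP) {
            // Store the immutable value off-heap; the wrapper reads it
            // back directly from the direct buffer on each access.
            java.nio.ByteBuffer buf =
                    java.nio.ByteBuffer.allocateDirect(Long.BYTES);
            buf.putLong(0, value);
            return () -> buf.getLong(0);
        }
        final long v = value; // plain on-heap version
        return () -> v;
    }
}
```

Because the factory signature matches the original `new` call sites, switching between modes requires no changes to the calling code.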

Effect: *(Figure: effect of the off-heap transformation)*

Future Work 💪🏻

  1. We have now completed an independent memory management framework and have successfully moved memory-intensive, immutable objects off-heap in the query path of Apache HugeGraph. Notably, this memory management framework is decoupled from the core of Apache HugeGraph, allowing it to function independently.

  2. Next, we need to integrate this memory management framework into the Apache HugeGraph query path. Specifically, QueryMemoryPool, TaskMemoryPool, and OperatorMemoryPool should be allocated at each stage of the query path. For memory allocation points, on-heap objects need to be replaced with off-heap objects.

  3. After adapting the Apache HugeGraph query path, the plan is to use the twitter-2010.txt dataset to test Apache HugeGraph’s query performance, including multi-hop queries, kout queries, etc. Memory usage comparisons will be conducted between the original on-heap mode (Baseline) and the off-heap modified version.

Acknowledgements 👏🏻

I would like to express my heartfelt gratitude to my mentors @Imba Jin and @Jermy for their invaluable support and guidance throughout my Google Summer of Code journey. Their expertise and dedication were essential in navigating the challenges of Apache HugeGraph. Their insights on best practices and deep understanding of the project were instrumental in helping me grow as a developer and contributor. Thanks for this incredible opportunity to learn and collaborate!