From 7cb68a8d9315bd3c3c769e47ee3752867854ee12 Mon Sep 17 00:00:00 2001
From: Yi Wang
Date: Wed, 21 Jun 2017 13:19:40 -0700
Subject: [PATCH 1/7] Add paddle/memory/README.md

---
 paddle/README.md | 141 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 141 insertions(+)
 create mode 100644 paddle/README.md

diff --git a/paddle/README.md b/paddle/README.md
new file mode 100644
index 0000000000000..24af37987e057
--- /dev/null
+++ b/paddle/README.md
@@ -0,0 +1,141 @@
+In my mind, the memory package works like the following:
+
+## Design
+
+### Usage
+
+To allocate 4KB CPU memory:
+
+```cpp
+p = memory::Alloc(platform::CPUPlace(), 4*1024);
+```
+
+To allocate 4KB memory on the 3rd GPU:
+
+```cpp
+p = memory::Alloc(platform::GPUPlace(2), 4*1024);
+```
+
+To free memory and check the amount of memory used so far on a place:
+
+```cpp
+auto pl = platform::GPUPlace(0);
+p = memory::Alloc(pl, 4*1024);
+cout << memory::Used(pl);
+memory::Free(pl, p);
+```
+
+### The API
+
+In `paddle/memory/memory.h` we have:
+
+```cpp
+template <typename Place> void* Alloc(Place, size_t);
+template <typename Place> void Free(Place, void*);
+}
+```
+
+These function templates have specializations on either `platform::CPUPlace` or `platform::GPUPlace`:
+
+```cpp
+template<>
+void Alloc(CPUPlace p, size_t size) {
+  return GetCPUBuddyAllocator()->Alloc(size);
+}
+```
+
+and
+
+```cpp
+template<>
+void Alloc(GPUPlace)(GPUPlace p, size_t size) {
+  return GetGPUBuddyAllocator(p.id)->Alloc(size);
+}
+```
+
+### The Implementation
+
+`GetCPUBuddyAllocator` and `GetGPUBuddyAllocator` are singletons.
+
+```cpp
+BuddyAllocator* GetCPUBuddyAllocator() {
+  static BuddyAllocator* a = NULL;
+  if (a == NULL) {
+    a = new BuddyAllocator(new CPUAllocator /*backup allocator*/, ...);
+  }
+  return a;
+}
+
+BuddyAllocator* GetGPUBuddyAllocator(int gpu_id) {
+  static BuddyAllocator** as = NULL;
+  if (as == NULL) {
+    as = new BuddyAllocator*[platform::NumGPUs()];
+    for (int gpu = 0; gpu < platform::NumGPUs(); gpu++) {
+      as[gpu] = new BuddyAllocator(new GPUAllocator(gpu) /* backup allocator */, ...);
+    }
+  }
+  return as[gpu_id];
+}
+```
+
+#### `BuddyAllocator`
+
+`BuddyAllocator` implements the buddy allocation algorithm. Its constructor takes only parameters related to the algorithm:
+
+```cpp
+BuddyAllocator::BuddyAllocator(initial_pool_size, max_pool_size) {
+  ...
+}
+```
+
+Please be aware that **`BuddyAllocator` always allocates aligned memory**, aligned on 32-byte boundaries, so that each block can hold a `BuddyAllocator::Block` object:
+
+```cpp
+class BuddyAllocator {
+ private:
+  struct Block {
+    size_t size;
+    Blobk* left, right;
+  };
+  ...
+};
+```
+
+#### System Allocators
+
+The `GPUAllocator` and `CPUAllocator` are called *system allocators*. They hold information about the device, including the amount of memory that has been allocated, so that we can call
+
+- `GPUAllocator::Used` and
+- `CPUAllocator::Used`
+
+to get the amount of memory that has been allocated so far.
+
+
+## Why Such a Design
+
+I got inspiration from Majel and Caffe2, though the above design looks different from both.
+
+### Caffe2
+
+In Caffe2, `Tensor::mutable_data()` allocates the memory. In particular, [`Tensor::mutable_data`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L523) calls [`Tensor::raw_mutable_data`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L459), which in turn calls [`Context::New`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L479).
+
+There are two implementations of `Context`:
+
+1.
[`CPUContext`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context.h#L105), whose [`New` method](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context.h#L131) calls [`g_cpu_allocator.get()->New(size_t)`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context.cc#L15) to allocate the memory.
+
+1. [`CUDAContext`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context_gpu.h#L99), which has a data member [`int gpu_id_`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context_gpu.h#L202). This looks very similar to class `majel::GPUPlace`, which also has an `int id_` data member. `CUDAContext::New(size_t)` calls [`g_cub_allocator->DeviceAllocate(&ptr, nbytes)`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context_gpu.cu#L355) to allocate the memory.
+
+### Majel
+
+In Majel, there are basically two allocator types:
+
+1. `cpu::SystemAllocator`, which has similar functionality to `caffe2::CPUContext::New/Delete`.
+1. `gpu::SystemAllocator`, which has similar functionality to `caffe2::CUDAContext::New/Delete`.
+
+However, memory allocation is not done via these two allocators; instead, they are defined in hidden namespaces.
+
+In Majel there are hidden global variables like:
+
+1. `cpu::SystemAllocator g_cpu_allocator`, and
+1. `vector<gpu::SystemAllocator> g_gpu_allocators(NUM_GPUS)`.
+
+Programs allocate memory via a BuddyAllocator, which can take the `g_cpu_allocator` or a `g_gpu_allocators[gpu_id]` as its *fallback allocator*, so that if BuddyAllocator cannot find a block in its memory pool, it extends its memory pool by calling the fallback allocator's `New(size_t)`.

From 0a92908b5ea68daa040155a7088b7f520c16c51d Mon Sep 17 00:00:00 2001
From: Yi Wang
Date: Wed, 21 Jun 2017 17:02:30 -0700
Subject: [PATCH 2/7] Has to auto format networks.py because CI complains about it.
---
 python/paddle/trainer_config_helpers/networks.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/python/paddle/trainer_config_helpers/networks.py b/python/paddle/trainer_config_helpers/networks.py
index 1bf59ed4840ae..67154a8d7d366 100755
--- a/python/paddle/trainer_config_helpers/networks.py
+++ b/python/paddle/trainer_config_helpers/networks.py
@@ -1381,7 +1381,7 @@ def inputs(layers, *args):
     if len(args) != 0:
         layers.extend(args)

-    Inputs(*[l.name for l in layers])
+    Inputs(* [l.name for l in layers])


 def outputs(layers, *args):
@@ -1424,7 +1424,7 @@ def __dfs_travel__(layer,
     assert len(layers) > 0

     if HasInputsSet():  # input already set
-        Outputs(*[l.name for l in layers])
+        Outputs(* [l.name for l in layers])
         return  # just return outputs.

     if len(layers) != 1:

From d3e2db4b4f3efa537a2b85bb88d8d8f3e780f09c Mon Sep 17 00:00:00 2001
From: Yi Wang
Date: Thu, 22 Jun 2017 08:12:10 -0700
Subject: [PATCH 3/7] Revert changes made by misleading errors from Travis CI

---
 python/paddle/trainer_config_helpers/networks.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/python/paddle/trainer_config_helpers/networks.py b/python/paddle/trainer_config_helpers/networks.py
index 67154a8d7d366..1bf59ed4840ae 100755
--- a/python/paddle/trainer_config_helpers/networks.py
+++ b/python/paddle/trainer_config_helpers/networks.py
@@ -1381,7 +1381,7 @@ def inputs(layers, *args):
     if len(args) != 0:
         layers.extend(args)

-    Inputs(* [l.name for l in layers])
+    Inputs(*[l.name for l in layers])


 def outputs(layers, *args):
@@ -1424,7 +1424,7 @@ def __dfs_travel__(layer,
     assert len(layers) > 0

     if HasInputsSet():  # input already set
-        Outputs(* [l.name for l in layers])
+        Outputs(*[l.name for l in layers])
         return  # just return outputs.
     if len(layers) != 1:

From 8cfa48dc88c0c702b30094ca558bf2182e00faba Mon Sep 17 00:00:00 2001
From: Yi Wang
Date: Thu, 22 Jun 2017 10:27:36 -0700
Subject: [PATCH 4/7] Move README.md from paddle/ to paddle/memory/

---
 paddle/{ => memory}/README.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename paddle/{ => memory}/README.md (100%)

diff --git a/paddle/README.md b/paddle/memory/README.md
similarity index 100%
rename from paddle/README.md
rename to paddle/memory/README.md

From c617520776c58791d77d1382eba67ac4264916f0 Mon Sep 17 00:00:00 2001
From: Yi Wang
Date: Thu, 22 Jun 2017 10:35:52 -0700
Subject: [PATCH 5/7] In response to comments from Liao Gang and Yu Yang

---
 paddle/memory/README.md | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/paddle/memory/README.md b/paddle/memory/README.md
index 24af37987e057..b71ca29696513 100644
--- a/paddle/memory/README.md
+++ b/paddle/memory/README.md
@@ -25,14 +25,16 @@ cout << memory::Used(pl);
 memory::Free(pl, p);
 ```

-### The API
+### API

 In `paddle/memory/memory.h` we have:

 ```cpp
-template <typename Place> void* Alloc(Place, size_t);
-template <typename Place> void Free(Place, void*);
-}
+namespace memory {
+template <typename Place> void* Alloc(Place, size_t);
+template <typename Place> void Free(Place, void*);
+template <typename Place> void Used(Place);
+} // namespace memory
 ```

 These function templates have specializations on either `platform::CPUPlace` or `platform::GPUPlace`:
@@ -48,12 +50,14 @@ and

 ```cpp
 template<>
-void Alloc(GPUPlace)(GPUPlace p, size_t size) {
+void Alloc(GPUPlace p, size_t size) {
   return GetGPUBuddyAllocator(p.id)->Alloc(size);
 }
 ```

-### The Implementation
+Similar specializations exist for `Free` and `Used`.
+
+### Implementation

 `GetCPUBuddyAllocator` and `GetGPUBuddyAllocator` are singletons.
@@ -94,7 +98,7 @@ class BuddyAllocator {
  private:
   struct Block {
     size_t size;
-    Blobk* left, right;
+    Block* left, right;
   };
   ...
};
@@ -102,15 +106,15 @@ class BuddyAllocator {

 #### System Allocators

-The `GPUAllocator` and `CPUAllocator` are called *system allocators*. They hold information about the device, including the amount of memory that has been allocated, so that we can call
+The `GPUAllocator` and `CPUAllocator` are called *system allocators*. They work as the fallback allocators of `BuddyAllocator`. A system allocator holds information about a device, including the amount of memory that has been allocated, so we can call

-- `GPUAllocator::Used` and
-- `CPUAllocator::Used`
+- `GPUAllocator::Used()` and
+- `CPUAllocator::Used()`

 to get the amount of memory that has been allocated so far.


-## Why Such a Design
+## Justification

 I got inspiration from Majel and Caffe2, though the above design looks different from both.

From b55df90dfdf6b9720548613885d291ae8769705b Mon Sep 17 00:00:00 2001
From: Yi Wang
Date: Fri, 23 Jun 2017 11:42:48 -0700
Subject: [PATCH 6/7] Remove unnecessary preamble

---
 paddle/memory/README.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/paddle/memory/README.md b/paddle/memory/README.md
index b71ca29696513..fd32d07ef40fc 100644
--- a/paddle/memory/README.md
+++ b/paddle/memory/README.md
@@ -1,5 +1,3 @@
-In my mind, the memory package works like the following:
-
 ## Design

 ### Usage

From ab2550c6400bce5d2596f5bff8629ef67ed195b8 Mon Sep 17 00:00:00 2001
From: Yi Wang
Date: Sun, 25 Jun 2017 15:44:55 -0700
Subject: [PATCH 7/7] Update design

---
 paddle/memory/README.md | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/paddle/memory/README.md b/paddle/memory/README.md
index fd32d07ef40fc..e5f7880e4cad3 100644
--- a/paddle/memory/README.md
+++ b/paddle/memory/README.md
@@ -31,7 +31,7 @@ In `paddle/memory/memory.h` we have:
 namespace memory {
 template <typename Place> void* Alloc(Place, size_t);
 template <typename Place> void Free(Place, void*);
-template <typename Place> void Used(Place);
+template <typename Place> size_t Used(Place);
 } // namespace memory
 ```

These function templates have
specializations on either `platform::CPUPlace` or

 ```cpp
 template<>
-void Alloc(CPUPlace p, size_t size) {
+void* Alloc(CPUPlace p, size_t size) {
   return GetCPUBuddyAllocator()->Alloc(size);
 }
 ```
@@ -102,15 +102,11 @@ class BuddyAllocator {
 };
 ```

-#### System Allocators
-
-The `GPUAllocator` and `CPUAllocator` are called *system allocators*. They work as the fallback allocators of `BuddyAllocator`. A system allocator holds information about a device, including the amount of memory that has been allocated, so we can call
+Because `BuddyAllocator` has the metadata of each block, it can track the used memory -- recording the amount returned by `Alloc` and freed by `Free`. In contrast, `CPUAllocator` and `GPUAllocator` do not know the size of a freed memory block and cannot do such tracking.

-- `GPUAllocator::Used()` and
-- `CPUAllocator::Used()`
-
-to get the amount of memory that has been allocated so far.
+#### System Allocators

+The `GPUAllocator` and `CPUAllocator` are called *system allocators*. They work as the fallback allocators of `BuddyAllocator`.

 ## Justification