Add paddle/memory/README.md #2552
In my mind, the memory package works as follows:

## Design

### Usage

To allocate 4KB of CPU memory:

```cpp
void* p = memory::Alloc(platform::CPUPlace(), 4 * 1024);
```

To allocate 4KB of memory on the third GPU (device ID 2):

```cpp
void* p = memory::Alloc(platform::GPUPlace(2), 4 * 1024);
```

To free memory and check the amount of memory used so far on a place:

```cpp
auto pl = platform::GPUPlace(0);
void* p = memory::Alloc(pl, 4 * 1024);
std::cout << memory::Used(pl);
memory::Free(pl, p);
```

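The calls above can be exercised with a small, CPU-only sketch. It assumes a stand-in `platform::CPUPlace` and backs allocations with `std::malloc`, recording each pointer's size in a hypothetical `detail::CPUSizes()` table so that `Used` reports the bytes currently outstanding. The real package is meant to route these calls through `BuddyAllocator`, and GPU places are omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>

namespace platform {
// Hypothetical stand-in for platform::CPUPlace.
struct CPUPlace {};
}  // namespace platform

namespace memory {

// Primary templates: declared only, so places without a specialization
// cannot be used.
template <typename Place> void* Alloc(Place place, size_t size);
template <typename Place> void Free(Place place, void* p);
template <typename Place> size_t Used(Place place);

namespace detail {
// Hypothetical bookkeeping: size of every live CPU allocation.
inline std::unordered_map<void*, size_t>& CPUSizes() {
  static std::unordered_map<void*, size_t> sizes;
  return sizes;
}
}  // namespace detail

template <>
void* Alloc<platform::CPUPlace>(platform::CPUPlace, size_t size) {
  void* p = std::malloc(size);
  if (p != nullptr) detail::CPUSizes()[p] = size;
  return p;
}

template <>
void Free<platform::CPUPlace>(platform::CPUPlace, void* p) {
  auto& sizes = detail::CPUSizes();
  auto it = sizes.find(p);
  assert(it != sizes.end());  // p must come from Alloc on this place
  if (it == sizes.end()) return;
  sizes.erase(it);
  std::free(p);
}

template <>
size_t Used<platform::CPUPlace>(platform::CPUPlace) {
  size_t total = 0;
  for (const auto& kv : detail::CPUSizes()) total += kv.second;
  return total;
}

}  // namespace memory
```

Freeing a pointer removes its recorded size, so `Used` returns to zero once every block has been handed back.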
### The API

In `paddle/memory/memory.h` we have:

```cpp
template <typename Place> void* Alloc(Place, size_t);
template <typename Place> void Free(Place, void*);
```

These function templates are specialized for `platform::CPUPlace` and `platform::GPUPlace`:

```cpp
template <>
void* Alloc<CPUPlace>(CPUPlace p, size_t size) {
  return GetCPUBuddyAllocator()->Alloc(size);
}
```

and

```cpp
template <>
void* Alloc<GPUPlace>(GPUPlace p, size_t size) {
  return GetGPUBuddyAllocator(p.id)->Alloc(size);
}
```

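The point of specializing on the place type is that dispatch is resolved at compile time. The sketch below uses hypothetical stand-ins for `CPUPlace`, `GPUPlace` (carrying an `id`), and a per-GPU request counter `GPURequests` in place of real device allocators; host memory stands in for device memory. It also adds a `static_assert` to the primary template so that calling `Alloc` with any other place type fails to compile.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <type_traits>

namespace platform {
// Hypothetical stand-ins for the real place types.
struct CPUPlace {};
struct GPUPlace {
  explicit GPUPlace(int gpu_id) : id(gpu_id) {}
  int id;
};
}  // namespace platform

namespace memory {

template <typename T> struct AlwaysFalse : std::false_type {};

// Instantiated only for places without a specialization: compile error.
template <typename Place>
void* Alloc(Place, size_t) {
  static_assert(AlwaysFalse<Place>::value, "Alloc is not specialized for this place");
  return nullptr;
}

// Hypothetical stand-in for per-device allocators: counts requests per GPU.
constexpr int kNumFakeGPUs = 4;
inline int& GPURequests(int gpu_id) {
  static int requests[kNumFakeGPUs] = {};
  assert(gpu_id >= 0 && gpu_id < kNumFakeGPUs);
  return requests[gpu_id];
}

template <>
void* Alloc<platform::CPUPlace>(platform::CPUPlace, size_t size) {
  return std::malloc(size);
}

template <>
void* Alloc<platform::GPUPlace>(platform::GPUPlace p, size_t size) {
  ++GPURequests(p.id);       // p.id selects the per-device allocator
  return std::malloc(size);  // host memory stands in for device memory
}

}  // namespace memory
```

Leaving the primary template declared but undefined, as in the header above, also rejects unsupported places, but only at link time; the `static_assert` (an assumption, not part of the proposed header) moves that error into compilation.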
### The Implementation

`GetCPUBuddyAllocator` and `GetGPUBuddyAllocator` are singletons:

```cpp
BuddyAllocator* GetCPUBuddyAllocator() {
  // Lazily created on first call; not thread-safe as written.
  static BuddyAllocator* a = NULL;
  if (a == NULL) {
    a = new BuddyAllocator(new CPUAllocator /* backup allocator */, ...);
  }
  return a;
}

BuddyAllocator* GetGPUBuddyAllocator(int gpu_id) {
  // One BuddyAllocator per GPU, all created on the first call.
  static BuddyAllocator** as = NULL;
  if (as == NULL) {
    as = new BuddyAllocator*[platform::NumGPUs()];
    for (int gpu = 0; gpu < platform::NumGPUs(); gpu++) {
      as[gpu] = new BuddyAllocator(new GPUAllocator(gpu) /* backup allocator */, ...);
    }
  }
  return as[gpu_id];
}
```

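As written, the lazy `NULL` checks are not thread-safe: two threads calling a getter for the first time can both construct an allocator. One possible alternative relies on C++11's guarantee that a function-local static is initialized exactly once, even under concurrent calls. `BuddyAllocator` and `NumGPUs` here are hypothetical stand-ins so the block compiles on its own; as in the original, the allocators are never deleted.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins so the sketch is self-contained.
struct BuddyAllocator {
  explicit BuddyAllocator(int d) : device(d) {}
  int device;  // -1 for the CPU, otherwise the GPU id
};
inline int NumGPUs() { return 2; }  // assumption: two GPUs

BuddyAllocator* GetCPUBuddyAllocator() {
  // Initialized exactly once, even if several threads race on the first call.
  static BuddyAllocator* a = new BuddyAllocator(-1);
  return a;
}

BuddyAllocator* GetGPUBuddyAllocator(int gpu_id) {
  // Built once on first use; the allocators are intentionally never deleted.
  static std::vector<BuddyAllocator*> as = [] {
    std::vector<BuddyAllocator*> v;
    for (int gpu = 0; gpu < NumGPUs(); gpu++) v.push_back(new BuddyAllocator(gpu));
    return v;
  }();
  assert(gpu_id >= 0 && gpu_id < static_cast<int>(as.size()));
  return as[gpu_id];
}
```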
#### `BuddyAllocator`

`BuddyAllocator` implements the buddy allocation algorithm. Its constructor takes only parameters related to the algorithm:

```cpp
BuddyAllocator::BuddyAllocator(size_t initial_pool_size, size_t max_pool_size) {
  ...
}
```

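The constructor above only names the pool parameters. The buddy algorithm itself can be sketched as follows: free blocks are kept per order (size `min_block << order`), an allocation splits a larger free block in halves until it reaches the requested order, and a free merges a block with its buddy, whose offset differs only in the bit equal to the block size. `SimpleBuddy` is a hypothetical, simplified version that manages one fixed pool; it does not grow toward a `max_pool_size`, store `Block` headers inside the memory, or use a backup allocator, and its alignment is relative to the pool start.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <unordered_map>
#include <vector>

// Hypothetical simplified buddy allocator over one fixed pool.
// pool_size must equal min_block times a power of two.
class SimpleBuddy {
 public:
  SimpleBuddy(size_t pool_size, size_t min_block)
      : pool_(pool_size), min_block_(min_block) {
    size_t top = 0;
    while ((min_block_ << top) < pool_size) ++top;
    assert((min_block_ << top) == pool_size);
    free_.resize(top + 1);
    free_[top].insert(0);  // initially the whole pool is one free block
  }

  void* Alloc(size_t size) {
    // Smallest order whose block can hold `size`.
    size_t order = 0;
    while (order < free_.size() && (min_block_ << order) < size) ++order;
    // Smallest order >= `order` that has a free block.
    size_t k = order;
    while (k < free_.size() && free_[k].empty()) ++k;
    if (k >= free_.size()) return nullptr;
    size_t offset = *free_[k].begin();
    free_[k].erase(free_[k].begin());
    // Split down; each upper half becomes a free buddy one order lower.
    while (k > order) {
      --k;
      free_[k].insert(offset + (min_block_ << k));
    }
    allocated_[offset] = order;
    return pool_.data() + offset;
  }

  void Free(void* p) {
    size_t offset = static_cast<size_t>(static_cast<unsigned char*>(p) - pool_.data());
    auto it = allocated_.find(offset);
    assert(it != allocated_.end());
    if (it == allocated_.end()) return;
    size_t order = it->second;
    allocated_.erase(it);
    // Merge with the buddy (offset XOR block size) while the buddy is free.
    while (order + 1 < free_.size()) {
      size_t buddy = offset ^ (min_block_ << order);
      auto b = free_[order].find(buddy);
      if (b == free_[order].end()) break;
      free_[order].erase(b);
      if (buddy < offset) offset = buddy;
      ++order;
    }
    free_[order].insert(offset);
  }

  size_t FreeBlocks(size_t order) const { return free_[order].size(); }

 private:
  std::vector<unsigned char> pool_;
  size_t min_block_;
  std::vector<std::set<size_t>> free_;            // free offsets per order
  std::unordered_map<size_t, size_t> allocated_;  // offset -> order
};
```

Freeing both neighbouring 128-byte blocks in a 1024-byte pool cascades the merges back up to a single free 1024-byte block, which is what keeps fragmentation bounded in the buddy scheme.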
Please be aware that **`BuddyAllocator` always allocates aligned memory**, aligned to 32 bytes, a unit large enough to hold a `BuddyAllocator::Block` object:

```cpp
class BuddyAllocator {
 private:
  struct Block {
    size_t size;
    Block* left;
    Block* right;
  };
  ...
};
```

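One simple way to honor the 32-byte guarantee is to round every request up to a multiple of 32 and check at compile time that a `Block` fits in one such unit (on a typical 64-bit build `sizeof(Block)` is 24 bytes). `kAlignment` and `AlignUp` are hypothetical names; the bit trick assumes the alignment is a power of two, which the second `static_assert` checks.

```cpp
#include <cassert>
#include <cstddef>

// Same layout as Block above.
struct Block {
  size_t size;
  Block* left;
  Block* right;
};

constexpr size_t kAlignment = 32;  // hypothetical name for the 32-byte unit
static_assert(sizeof(Block) <= kAlignment, "a Block must fit in one aligned unit");
static_assert((kAlignment & (kAlignment - 1)) == 0, "alignment must be a power of two");

// Round a request up to the next multiple of kAlignment.
constexpr size_t AlignUp(size_t size) {
  return (size + kAlignment - 1) & ~(kAlignment - 1);
}
```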
#### System Allocators

The `GPUAllocator` and `CPUAllocator` are called *system allocators*. They hold information about the device, including the amount of memory that has been allocated, so that we can call

- `GPUAllocator::Used` and
- `CPUAllocator::Used`

to get the amount of memory that has been allocated so far.

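A system allocator in this design mainly wraps the platform allocation call and keeps a running byte count. The CPU version below is a hypothetical sketch: it assumes the caller passes the size back to `Free` (as a buddy allocator, which knows each chunk's size, could), so no lookup table is needed. A `GPUAllocator` would presumably have the same shape around the CUDA runtime's `cudaMalloc` and `cudaFree`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical CPU system allocator: wraps malloc/free and counts bytes.
class CPUAllocator {
 public:
  void* Alloc(size_t size) {
    void* p = std::malloc(size);
    if (p != nullptr) used_ += size;
    return p;
  }

  // The caller supplies the size it originally requested.
  void Free(void* p, size_t size) {
    if (p == nullptr) return;
    assert(size <= used_);
    used_ -= size;
    std::free(p);
  }

  size_t Used() const { return used_; }

 private:
  size_t used_ = 0;
};
```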
## Why Such a Design

I got inspiration from Majel and Caffe2, though the above design looks different from both.

### Caffe2

In Caffe2, `Tensor<Context>::mutable_data()` allocates the memory. In particular, [`Tensor<Context>::mutable_data`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L523) calls [`Tensor<Context>::raw_mutable_data`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L459), which in turn calls [`Context::New`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L479).

There are two implementations of `Context`:

1. [`CPUContext`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context.h#L105), whose [`New` method](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context.h#L131) calls [`g_cpu_allocator.get()->New(size_t)`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context.cc#L15) to allocate the memory.

1. [`CUDAContext`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context_gpu.h#L99), which has a data member [`int gpu_id_`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context_gpu.h#L202). This looks very similar to class `majel::GPUPlace`, which also has an `int id_` data member. `CUDAContext::New(size_t)` calls [`g_cub_allocator->DeviceAllocate(&ptr, nbytes)`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/context_gpu.cu#L355) to allocate the memory.

### Majel

In Majel, there are basically two allocator types:

1. `cpu::SystemAllocator`, which has similar functionality to `caffe2::CPUContext::New/Delete`.
1. `gpu::SystemAllocator`, which has similar functionality to `caffe2::CUDAContext::New/Delete`.

However, programs do not allocate memory through these two allocators directly. Instead, the two allocators are defined in hidden namespaces.

In Majel there are hidden global variables like:

1. `cpu::SystemAllocator g_cpu_allocator`, and
1. `vector<gpu::SystemAllocator*> g_gpu_allocators(NUM_GPUS)`.

Programs allocate memory via a BuddyAllocator, which can take the `g_cpu_allocator` or a `g_gpu_allocators[gpu_id]` as its *fallback allocator*, so that if BuddyAllocator cannot find a block in its memory pool, it extends its memory pool by calling the fallback allocator's `New(size_t)`.
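
The fallback idea is independent of the buddy algorithm, so the sketch below isolates it in a fixed-size block pool rather than a buddy allocator. When its free list is empty, `GrowingPool` asks the fallback for another chunk through `New(size_t)` and carves it into blocks. `GrowingPool`, `MallocFallback`, and the `Delete` method used to release chunks on destruction are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical fallback exposing the New(size_t) entry point described above.
struct MallocFallback {
  int calls = 0;
  void* New(size_t size) { ++calls; return std::malloc(size); }
  void Delete(void* p) { std::free(p); }
};

// Fixed-size block pool (not a buddy allocator) that grows by asking the
// fallback allocator for another chunk whenever its free list is empty.
template <typename Fallback>
class GrowingPool {
 public:
  GrowingPool(Fallback* fallback, size_t block_size, size_t blocks_per_chunk)
      : fallback_(fallback), block_size_(block_size),
        blocks_per_chunk_(blocks_per_chunk) {}
  GrowingPool(const GrowingPool&) = delete;
  GrowingPool& operator=(const GrowingPool&) = delete;
  ~GrowingPool() {
    for (void* chunk : chunks_) fallback_->Delete(chunk);
  }

  void* Alloc() {
    if (free_.empty() && !Grow()) return nullptr;
    void* p = free_.back();
    free_.pop_back();
    return p;
  }

  void Free(void* p) { free_.push_back(p); }

 private:
  // Extend the pool with one chunk from the fallback allocator.
  bool Grow() {
    auto* chunk = static_cast<unsigned char*>(
        fallback_->New(block_size_ * blocks_per_chunk_));
    if (chunk == nullptr) return false;
    chunks_.push_back(chunk);
    for (size_t i = 0; i < blocks_per_chunk_; ++i) {
      free_.push_back(chunk + i * block_size_);
    }
    return true;
  }

  Fallback* fallback_;
  size_t block_size_;
  size_t blocks_per_chunk_;
  std::vector<void*> chunks_;
  std::vector<void*> free_;
};
```

Reused blocks are served from the free list, so the fallback is only consulted when the pool has nothing left to hand out.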

Review discussion:

- `typeame` -> `typename`. Since `Place` is a variant, why not directly use `typeid` to statically analyze the place type during compilation?
- Because we want a mismatched-type error to be reported as early as possible, and an error at compile time is earlier than one at run time.
- I got it. Thanks. But a variant also supports reporting the mismatched-type error at compile time.
- Actually, I think if we regulate the way we use `GPUPlace` and `CPUPlace`, as here, with two and only two specializations of `Alloc<Place>`, we might not need `typedef boost::variant<GPUPlace, CPUPlace> Place` at all. This might also help us remove the dependency on boost.
- It's great to think about this. The hard part of removing `boost::variant` is `Dim`.
- Yeah, agreed. And thanks for the reminder about `boost::variant`. Let's start by minimizing the dependencies of each piece of our work.
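
The remark that a variant can also catch a missing place type at compile time can be illustrated with `std::variant`, the C++17 standard counterpart of `boost::variant`. `std::visit` requires the visitor to accept every alternative, so deleting either overload of the hypothetical `DescribePlace` visitor below makes `Describe` fail to compile. The place types here are stand-ins.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical stand-ins for the place types discussed above.
struct CPUPlace {};
struct GPUPlace {
  int id = 0;
};
using Place = std::variant<GPUPlace, CPUPlace>;

// One overload per alternative; removing either one makes the
// std::visit call in Describe fail to compile.
struct DescribePlace {
  std::string operator()(const CPUPlace&) const { return "cpu"; }
  std::string operator()(const GPUPlace& p) const {
    return "gpu:" + std::to_string(p.id);
  }
};

std::string Describe(const Place& place) {
  return std::visit(DescribePlace{}, place);
}
```

The difference from the template specializations is when the place is known: a variant holds its alternative at run time and `std::visit` dispatches on it then, whereas `Alloc<CPUPlace>` and `Alloc<GPUPlace>` are selected entirely during compilation.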