Memory CPU allocator #2596
Conversation
cmake/generic.cmake
Outdated
@@ -78,6 +78,10 @@
 #
 # cc_test(example_test SRCS example_test.cc DEPS example glog gflags)

+if(WITH_GPU)
+  add_definitions(-DPADDLE_WITH_GPU)
Why do we need a new flag PADDLE_WITH_GPU? I think it duplicates an existing one.
Do you mean PADDLE_ONLY_CPU? OK, switched to use it.
@@ -0,0 +1,5 @@
+if(${WITH_GPU})
It's unnecessary to wrap nv_test in if(${WITH_GPU}), because the internal implementation of nv_test already handles this check.
nv_test was designed to handle *.cu files. But here it is a .cc file, because it doesn't contain CUDA code. However, the source code calls cudaMallocHost, which is defined in the CUDA libraries, and we don't have an external/cuda.cmake.
I think a complete solution here should be a single line:
cc_library(cpu_allocator_test SRCS cpu_allocator_test.cc DEPS cuda cudart)
Actually, I tried and succeeded in adding cmake/external/cuda.cmake, which defines two CMake targets, cuda and cudart, but the build of the cpu_allocator_test target complains that it cannot find libpthread.
I will try this.
paddle/memory/detail/cpu_allocator.h
Outdated
public:
  void* Alloc(size_t size) {
    void* p;
    if (cudaMallocHost(&p, size) != cudaSuccess) {
Why use cudaMallocHost(&p, size) here? It should be mlock. cudaMallocHost should be used in class GPUAllocator<true>, because the pointer returned by cudaMallocHost can be accessed directly by the GPU device.
I think cudaMallocHost allocates CPU memory, not GPU memory, so I used it here. Am I wrong?
According to the documentation, cudaMallocHost is more than malloc + mlock: it makes the CUDA driver track OS paging to ensure that cudaMemcpy works efficiently with the allocated memory block.
paddle/memory/memory.h
Outdated
// the CUDA memory space and accessed by the device rapidly. Don't
// allocate too much staging memory; otherwise system performance will
// degrade because the OS cannot find enough swap memory space.
void* AllocStaging(CPUPlace, size_t);
Why do we have to expose the pinned-memory interfaces? I don't think developers should invoke them explicitly; that could easily get out of control.
I am just not sure how the pinned interface would be called, so I exposed it.
What do you think: how could we know whether GPUAllocator::Alloc should return a malloc-ed block or a pinned block?
@@ -97,6 +97,7 @@ class BuddyAllocator {
   struct Block {
     size_t size;
     Block* left, right;
+    size_t index;  // allocator id
Different allocators have different malloc and free methods.
paddle/memory/detail/cpu_allocator.h
Outdated
// between host and device. Allocating too much would reduce the
// amount of memory available to the system for paging. So, by
// default, we should use CPUAllocator<staging=false>.
template <bool staging>
class CPUAllocator {
 public:
  void* Alloc(size_t size);
- void Free(void* p);
+ void Free(void* p, size_t size);
We need the size parameter for munlock.
paddle/memory/detail/gpu_allocator.h
Outdated
    return nullptr;
  }

  void Free(void* p, size_t size) {
The reason why we still keep size here is that we need to count the allocated/released size of pinned memory for performance protection.
paddle/memory/detail/gpu_allocator.h
Outdated
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
This source file is not final, just a first draft. We need to add more features, such as remaining size, total size, and so on.
@gangliao I just noticed that as long as we want to call
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>  // for mlock/munlock

#include <iostream>
#include <memory>

template <bool locking>
void* CPUAlloc(size_t size) {
  std::cout << "malloc" << std::endl;
  void* p = std::malloc(size);
  if (locking && p != nullptr) {
    mlock(p, size);
  }
  return p;
}

void CPUFree(void* p, size_t size, bool locking) {
  std::cout << "free" << std::endl;
  if (locking) {
    munlock(p, size);
  }
  std::free(p);
}

struct CPUDeleter {
  CPUDeleter(void* p, size_t size, bool locking)
      : p_(p), size_(size), locking_(locking) {}

  template <typename T>
  void operator()(T* p) {
    assert(static_cast<T*>(p_) == p);
    std::cout << "Deleter" << std::endl;
    CPUFree(p_, size_, locking_);
  }

  void* p_;
  size_t size_;
  bool locking_;
};

int main() {
  void* p = CPUAlloc<false>(1024);  // cf. GPUAllocator<false>::Alloc
  int* i = static_cast<int*>(p);
  std::unique_ptr<int, CPUDeleter> ptr(i, CPUDeleter(p, 1024, false));
  std::cout << ptr.get() << std::endl;
}
p = Allocator::Alloc(1024);

int* i = static_cast<int*>(p);
std::shared_ptr<int> ptr(i, [](int* p) { Allocator::Free(p, 1024); });
@wangkuiyi We can use this method to replace the deleter.
It is a good idea! But wouldn't a lambda be too lengthy for the callers of Alloc?
Yeah, maybe. But you can name it as follows:
auto deleter = [](int* p) { Allocator::Free(p, 1024); };
int* i = static_cast<int*>(p);
std::shared_ptr<int> ptr(i, deleter);
@wangkuiyi Shall we merge this pull request and start a new one? Because its title is
Sure. It's just that I am the owner of this PR, so I cannot approve and merge it by myself. @gangliao
LGTM.
This PR is a successor of #2552. Please review #2552 before this.