Memory CPU allocator #2596

Merged 23 commits on Jun 28, 2017 (diff shown from the first 18 commits)

Commits
84d1c73
add paddle/memory/detail/cpu_allocator*
wangkuiyi Jun 25, 2017
67481ca
Merge branch 'develop' of https://github.com/paddlepaddle/paddle into…
wangkuiyi Jun 25, 2017
db128c4
Pass cpu_allocator_test
wangkuiyi Jun 26, 2017
ce938ae
FIX: Pinned memory
gangliao Jun 26, 2017
ce70df8
Add gpu_allocator
gangliao Jun 26, 2017
e02859c
Replace {cpu,gpu}_allocator.h and {cpu,gpu}_allocator_test.cc by syst…
Jun 26, 2017
f7530e8
Merge branch 'develop' of https://github.com/paddlepaddle/paddle into…
Jun 27, 2017
6250d10
FIX: clang-format
gangliao Jun 27, 2017
f329454
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
gangliao Jun 27, 2017
f149d18
Add system_allocator
Jun 27, 2017
cd16192
Merge branch 'develop' of https://github.com/paddlepaddle/paddle into…
Jun 27, 2017
e14e687
Resolve conflict
Jun 27, 2017
09d9794
Merge remote-tracking branch 'wangkuiyi/memory_cpu_allocator' into cp…
gangliao Jun 27, 2017
dd08d33
FIX: fix cmake type error
gangliao Jun 27, 2017
dde0da9
ENH: Add cuda.h in platform
gangliao Jun 27, 2017
29c7512
FIX: fix memory.h/cc
gangliao Jun 27, 2017
b22dd12
ENH: Add buddy allocator draft
gangliao Jun 27, 2017
79373da
TEST: Add test for system allocator and deleter
gangliao Jun 27, 2017
b8f5922
Make CPUAllocator and GPUAllocator subclasses of SystemAllocator
Jun 27, 2017
3e087f7
Add buddy_allocator.cc and system_allocator.cc
Jun 27, 2017
55648b4
Merge remote-tracking branch 'wangkuiyi/memory_cpu_allocator' into cp…
gangliao Jun 28, 2017
3e9aa7f
FIX: Pass CI
gangliao Jun 28, 2017
9490d24
ENH: clang-format
gangliao Jun 28, 2017
2 changes: 2 additions & 0 deletions CMakeLists.txt
@@ -27,6 +27,7 @@ if(NOT CMAKE_CROSSCOMPILING)
endif(NOT CMAKE_CROSSCOMPILING)
find_package(Git REQUIRED)
find_package(Threads REQUIRED)
find_package(Boost QUIET)

include(simd)

@@ -109,6 +110,7 @@ include_directories("${PROJ_ROOT}")
include_directories("${PROJ_ROOT}/paddle/cuda/include")
include_directories("${CMAKE_CURRENT_BINARY_DIR}/proto")
include_directories("${CMAKE_CURRENT_BINARY_DIR}/go/pserver/cclient")
include_directories(${Boost_INCLUDE_DIRS})

set(EXTERNAL_LIBS
${GFLAGS_LIBRARIES}
4 changes: 4 additions & 0 deletions cmake/generic.cmake
@@ -78,6 +78,10 @@
#
# cc_test(example_test SRCS example_test.cc DEPS example glog gflags)

if(WITH_GPU)
add_definitions(-DPADDLE_WITH_GPU)
Review (Contributor): Why do we need a new flag PADDLE_WITH_GPU? I think it is a duplicate.

Reply (Author): Do you mean PADDLE_ONLY_CPU? OK, switching to use it.

endif()

if(NOT APPLE)
find_package(Threads REQUIRED)
link_libraries(${CMAKE_THREAD_LIBS_INIT})
9 changes: 1 addition & 8 deletions paddle/CMakeLists.txt
@@ -11,15 +11,8 @@ add_subdirectory(scripts)
add_subdirectory(optimizer)
add_subdirectory(strings)

# Do not build go directory until go cmake is working smoothly.
# if(CMAKE_Go_COMPILER)
# add_subdirectory(go)
# endif()

find_package(Boost QUIET)

if(Boost_FOUND)
include_directories(${Boost_INCLUDE_DIRS})
add_subdirectory(memory)
add_subdirectory(platform)
add_subdirectory(framework)
endif()
5 changes: 5 additions & 0 deletions paddle/memory/.clang-format
@@ -0,0 +1,5 @@
---
Language: Cpp
BasedOnStyle: Google
Standard: Cpp11
...
7 changes: 7 additions & 0 deletions paddle/memory/CMakeLists.txt
@@ -0,0 +1,7 @@
add_subdirectory(detail)

if(${WITH_GPU})
nv_library(memory SRCS memory.cc)
else(${WITH_GPU})
cc_library(memory SRCS memory.cc)
endif(${WITH_GPU})
1 change: 1 addition & 0 deletions paddle/memory/README.md
@@ -97,6 +97,7 @@ class BuddyAllocator {
struct Block {
size_t size;
Block *left, *right;
size_t index; // allocator id
Review (Contributor): Different allocators have different malloc and free methods.

};
...
};
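The reviewer's point is that a freed block must be routed back to the allocator that produced it, which is what the `index` field in the README's `Block` struct enables. A minimal standalone sketch of that idea (hypothetical names, not the PR's implementation) tags each allocation with an allocator id in a header placed just before the payload:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Each block records the index of the allocator that produced it,
// so the free path can dispatch to the matching free method.
struct BlockHeader {
  size_t size;
  size_t index;  // allocator id, as in the README's Block struct
};

// A registry of (alloc, free) method pairs, one per allocator kind.
struct AllocatorVTable {
  void* (*alloc)(size_t);
  void (*free)(void*, size_t);
};

static std::vector<AllocatorVTable> g_allocators = {
    {[](size_t n) { return std::malloc(n); },
     [](void* p, size_t) { std::free(p); }},
};

void* TaggedAlloc(size_t index, size_t size) {
  // Over-allocate so the header sits directly in front of the payload.
  auto* h = static_cast<BlockHeader*>(
      g_allocators[index].alloc(sizeof(BlockHeader) + size));
  if (h == nullptr) return nullptr;
  h->size = size;
  h->index = index;
  return h + 1;  // payload starts right after the header
}

void TaggedFree(void* payload) {
  if (payload == nullptr) return;
  BlockHeader* h = static_cast<BlockHeader*>(payload) - 1;
  // The stored index selects the correct free method.
  g_allocators[h->index].free(h, sizeof(BlockHeader) + h->size);
}
```

With this layout, `Free(void*)` needs no size or allocator argument; both are recovered from the header, which matches the design discussed in the PR.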
5 changes: 5 additions & 0 deletions paddle/memory/detail/CMakeLists.txt
@@ -0,0 +1,5 @@
if(${WITH_GPU})
Review (Contributor): It's unnecessary to wrap nv_test in if(${WITH_GPU}); the internal implementation of nv_test already handles this.

Reply (Author): nv_test was designed to handle *.cu files, but this is a .cc file because it doesn't contain CUDA code. However, the source calls cudaMallocHost, which is defined in the CUDA libraries, and we don't have an external/cuda.cmake.

I think a complete solution here would be a single line:

cc_library(cpu_allocator_test SRCS cpu_allocator_test.cc DEPS cuda cudart)

Actually, I tried and succeeded in adding cmake/external/cuda.cmake, which defines two CMake targets -- cuda and cudart -- but the build of the cpu_allocator_test target then complains that it cannot find libpthread.

Reply (Contributor): I will try this.

nv_test(system_allocator_test SRCS system_allocator_test.cc DEPS gflags glog)
else(${WITH_GPU})
cc_test(system_allocator_test SRCS system_allocator_test.cc DEPS gflags glog)
endif(${WITH_GPU})
79 changes: 79 additions & 0 deletions paddle/memory/detail/buddy_allocator.h
@@ -0,0 +1,79 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#pragma once

#include <mutex>

#include "paddle/memory/detail/system_allocator.h"

namespace paddle {
namespace memory {
namespace detail {

template<typename Allocator>
class BuddyAllocator {
public:
// TODO(gangliao): This is a draft, add Buddy Allocator Algorithm soon
BuddyAllocator() {}
~BuddyAllocator() {}

public:
void* Alloc(size_t size) {
return Allocator::Alloc(size);
}
void Free(void*) {
// All metadata, including the size, is stored in the block
// header, so adding a `size` parameter to `Free(void*)` would
// be redundant.
}
size_t Used();

public:
BuddyAllocator(const BuddyAllocator&) = delete;
BuddyAllocator& operator=(const BuddyAllocator&) = delete;

private:
size_t min_alloc_size_;
size_t max_alloc_size_;

private:
std::mutex mutex_;
};

BuddyAllocator<CPUAllocator>* GetCPUBuddyAllocator() {
static BuddyAllocator<CPUAllocator>* a = nullptr;
if (a == nullptr) {
a = new BuddyAllocator<CPUAllocator>();
}
return a;
}

#ifndef PADDLE_ONLY_CPU // The following code are for CUDA.

BuddyAllocator<GPUAllocator>* GetGPUBuddyAllocator(int gpu_id) {
static BuddyAllocator<GPUAllocator>** as = NULL;
if (as == NULL) {
int gpu_num = platform::GetDeviceCount();
as = new BuddyAllocator<GPUAllocator>*[gpu_num];
for (int gpu = 0; gpu < gpu_num; gpu++) {
as[gpu] = new BuddyAllocator<GPUAllocator>();
}
}
return as[gpu_id];
}

#endif // PADDLE_ONLY_CPU

} // namespace detail
} // namespace memory
} // namespace paddle
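The BuddyAllocator above is explicitly a draft that forwards straight to the underlying system allocator. For reference, the core of a buddy system — rounding requests up to powers of two, splitting larger blocks, and merging freed buddies — can be sketched as follows. This is a hypothetical, self-contained toy over offsets in a fixed arena, not the PR's eventual implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>

// Toy buddy bookkeeping over a fixed-size arena. Offsets (not real
// pointers) keep the example self-contained. Blocks are aligned to
// their own size, so a block's buddy differs in exactly one bit.
class ToyBuddy {
 public:
  explicit ToyBuddy(size_t arena_size) : arena_size_(arena_size) {
    free_[arena_size].insert(0);  // one free block covering the arena
  }

  // Returns the offset of an allocated block, or SIZE_MAX on failure.
  size_t Alloc(size_t size) {
    size_t need = RoundUp(size);
    // Find the smallest free block that fits.
    for (size_t sz = need; sz <= arena_size_; sz *= 2) {
      auto it = free_.find(sz);
      if (it == free_.end() || it->second.empty()) continue;
      size_t off = *it->second.begin();
      it->second.erase(it->second.begin());
      // Split repeatedly, returning each right half to the free lists.
      while (sz > need) {
        sz /= 2;
        free_[sz].insert(off + sz);
      }
      allocated_[off] = need;
      return off;
    }
    return SIZE_MAX;
  }

  void Free(size_t off) {
    size_t sz = allocated_.at(off);
    allocated_.erase(off);
    // Merge with the buddy while it is free, doubling the block.
    while (sz < arena_size_) {
      size_t buddy = off ^ sz;  // buddy offset differs in one bit
      auto it = free_.find(sz);
      if (it == free_.end() || !it->second.count(buddy)) break;
      it->second.erase(buddy);
      off = std::min(off, buddy);
      sz *= 2;
    }
    free_[sz].insert(off);
  }

 private:
  static size_t RoundUp(size_t n) {
    size_t p = 1;
    while (p < n) p *= 2;
    return p;
  }
  size_t arena_size_;
  std::map<size_t, std::set<size_t>> free_;  // block size -> free offsets
  std::map<size_t, size_t> allocated_;       // offset -> rounded size
};
```

After both halves of a split block are freed, they merge back, so the whole arena becomes allocatable again — the invariant a real BuddyAllocator::Free must maintain.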
91 changes: 91 additions & 0 deletions paddle/memory/detail/system_allocator.h
@@ -0,0 +1,91 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#pragma once

#include <stddef.h> // for size_t
#include <sys/mman.h> // for mlock and munlock
#include <cstdlib> // for malloc and free

#include <gflags/gflags.h>
#include "paddle/platform/assert.h"
#include "paddle/platform/cuda.h"

DEFINE_bool(uses_pinned_memory, false,
"If set, allocate cpu/gpu pinned memory.");

namespace paddle {
namespace memory {
namespace detail {

// If uses_pinned_memory is true, CPUAllocator calls mlock, which
// returns pinned and locked memory as a staging area for data
// exchange between host and device. Allocating too much pinned
// memory reduces the amount of memory available to the system for
// paging, so uses_pinned_memory defaults to false.
class CPUAllocator {
public:
static void* Alloc(size_t size) {
void* p = std::malloc(size);
if (p != nullptr && FLAGS_uses_pinned_memory) {
mlock(p, size);
}
return p;
}

static void Free(void* p, size_t size) {
if (p != nullptr && FLAGS_uses_pinned_memory) {
munlock(p, size);
}
std::free(p);
}
};

#ifndef PADDLE_ONLY_CPU // The following code are for CUDA.

// If uses_pinned_memory is true, GPUAllocator calls cudaMallocHost,
// which returns pinned and locked memory as a staging area for data
// exchange between host and device. Allocating too much pinned
// memory reduces the amount of memory available to the system for
// paging, so by default uses_pinned_memory is false.
class GPUAllocator {
public:
static void* Alloc(size_t size) {
void* p = 0;
cudaError_t result = FLAGS_uses_pinned_memory ? cudaMallocHost(&p, size)
: cudaMalloc(&p, size);
if (result != cudaSuccess) {
cudaGetLastError(); // clear error if there is any.
}
return result == cudaSuccess ? p : nullptr;
}

static void Free(void* p, size_t size) {
// Purposefully allow cudaErrorCudartUnloading, because that is
// returned if you ever call cudaFree after the driver has already
// shut down. This happens only while the process is terminating,
// in which case we don't care whether cudaFree succeeds.
cudaError_t err = FLAGS_uses_pinned_memory ? cudaFreeHost(p) : cudaFree(p);
if (err != cudaErrorCudartUnloading) {
platform::throw_on_error(err, "cudaFree{Host} failed");
}
}
};

#endif // PADDLE_ONLY_CPU

} // namespace detail
} // namespace memory
} // namespace paddle
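The CPU pinned-memory path above boils down to malloc followed by mlock, with the mlock result deliberately ignored. A standalone sketch of that pattern (POSIX-only; hypothetical free functions standing in for the PR's static CPUAllocator methods):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sys/mman.h>  // mlock / munlock (POSIX)

// Minimal stand-in for the pinned-memory path of CPUAllocator.
// The mlock result is ignored on purpose: if locking fails (e.g.
// RLIMIT_MEMLOCK is exceeded), the allocation still succeeds; it
// is simply not pinned.
void* PinnedAlloc(size_t size, bool use_pinned) {
  void* p = std::malloc(size);
  if (p != nullptr && use_pinned) {
    mlock(p, size);
  }
  return p;
}

void PinnedFree(void* p, size_t size, bool use_pinned) {
  if (p != nullptr && use_pinned) {
    munlock(p, size);  // must unlock with the same size before freeing
  }
  std::free(p);
}
```

Note that the caller must pass the original size to the free path so munlock covers the same range — which is why the PR's SystemAllocator interface takes a size in Free, unlike plain free().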
60 changes: 60 additions & 0 deletions paddle/memory/detail/system_allocator_test.cc
@@ -0,0 +1,60 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#include "paddle/memory/detail/system_allocator.h"

#include <memory>
#include <vector>

#include "glog/logging.h"
#include "gtest/gtest.h"

template <typename Allocator>
void TestAllocator(void* p) {
p = Allocator::Alloc(1024);

int* i = static_cast<int*>(p);
std::shared_ptr<int> ptr(i, [](int* p) { Allocator::Free(p, 1024); });
Review (Contributor): @wangkuiyi we can use this method to replace the deleter.

Reply (wangkuiyi, Author, Jun 27, 2017): It is a good idea! But wouldn't a lambda be too lengthy for the callers of Alloc?

Reply (Contributor): Yeah, maybe. But you can name it as follows:

auto deleter = [](int* p) { Allocator::Free(p, 1024); };

int* i = static_cast<int*>(p);
std::shared_ptr<int> ptr(i, deleter);

EXPECT_NE(p, nullptr);
}

TEST(CPUAllocator, NoLockMem) {
void* p = nullptr;
FLAGS_uses_pinned_memory = false;
TestAllocator<paddle::memory::detail::CPUAllocator>(p);
EXPECT_EQ(p, nullptr);
}

TEST(CPUAllocator, LockMem) {
void* p = nullptr;
FLAGS_uses_pinned_memory = true;
TestAllocator<paddle::memory::detail::CPUAllocator>(p);
EXPECT_EQ(p, nullptr);
}

#ifndef PADDLE_ONLY_CPU
TEST(GPUAllocator, NoStaging) {
void* p = nullptr;
FLAGS_uses_pinned_memory = false;
TestAllocator<paddle::memory::detail::GPUAllocator>(p);
EXPECT_EQ(p, nullptr);
}
TEST(GPUAllocator, Staging) {
void* p = nullptr;
FLAGS_uses_pinned_memory = true;
TestAllocator<paddle::memory::detail::GPUAllocator>(p);
EXPECT_EQ(p, nullptr);
}
#endif // PADDLE_ONLY_CPU
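The named-deleter pattern from the review thread works with any allocator that exposes the static Alloc/Free shape used in this PR. A self-contained illustration (MallocAllocator and MakeInt are hypothetical names for this sketch):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// Stand-in with the same static Alloc/Free interface as the PR's
// CPUAllocator (illustration only; no pinning).
struct MallocAllocator {
  static void* Alloc(size_t size) { return std::malloc(size); }
  static void Free(void* p, size_t /*size*/) { std::free(p); }
};

template <typename Allocator>
std::shared_ptr<int> MakeInt(size_t size) {
  // Name the lambda, as suggested in the review, so the shared_ptr
  // construction stays short. Capturing `size` lets the deleter pass
  // the allocation size back to Allocator::Free.
  auto deleter = [size](int* p) { Allocator::Free(p, size); };
  int* i = static_cast<int*>(Allocator::Alloc(size));
  return std::shared_ptr<int>(i, deleter);
}
```

When the last shared_ptr copy goes out of scope, the captured deleter runs Allocator::Free with the original size, so ownership and the size bookkeeping travel together.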
59 changes: 59 additions & 0 deletions paddle/memory/memory.cc
@@ -0,0 +1,59 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#include "paddle/memory/memory.h"
#include "paddle/memory/detail/buddy_allocator.h"
#include "paddle/memory/detail/system_allocator.h"
#include "paddle/platform/assert.h"

#include <boost/variant.hpp>

namespace paddle {
namespace memory {

void* Alloc(platform::Place pl, size_t size) {
#ifndef PADDLE_ONLY_CPU
if (paddle::platform::is_gpu_place(pl)) {
size_t gpu_id = boost::get<platform::GPUPlace>(pl).device;
return detail::GetGPUBuddyAllocator(gpu_id)->Alloc(size);
}
#endif // PADDLE_ONLY_CPU
PADDLE_ASSERT(paddle::platform::is_cpu_place(pl));
return detail::GetCPUBuddyAllocator()->Alloc(size);
}

void Free(paddle::platform::Place pl, void* p) {
#ifndef PADDLE_ONLY_CPU
if (paddle::platform::is_gpu_place(pl)) {
size_t gpu_id = boost::get<platform::GPUPlace>(pl).device;
detail::GetGPUBuddyAllocator(gpu_id)->Free(p);
return;  // don't fall through to the CPU branch below
}
#endif // PADDLE_ONLY_CPU
PADDLE_ASSERT(paddle::platform::is_cpu_place(pl));
detail::GetCPUBuddyAllocator()->Free(p);
}

size_t Used(paddle::platform::Place pl) {
#ifndef PADDLE_ONLY_CPU
if (paddle::platform::is_gpu_place(pl)) {
size_t gpu_id = boost::get<platform::GPUPlace>(pl).device;
return detail::GetGPUBuddyAllocator(gpu_id)->Used();
}
#endif // PADDLE_ONLY_CPU
PADDLE_ASSERT(paddle::platform::is_cpu_place(pl));
return detail::GetCPUBuddyAllocator()->Used();
}

} // namespace memory
} // namespace paddle
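The Alloc/Free/Used trio above dispatches on the Place variant: test the alternative with is_gpu_place, then extract the device id with get<> before delegating to the per-device allocator. A minimal standalone model of that dispatch, using std::variant as a stand-in for boost::variant (CpuPlace, GpuPlace, and ToyAlloc are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <variant>

// Toy stand-ins for the PR's platform::CPUPlace / platform::GPUPlace.
struct CpuPlace {};
struct GpuPlace { int device; };
using Place = std::variant<CpuPlace, GpuPlace>;

bool IsGpuPlace(const Place& pl) {
  return std::holds_alternative<GpuPlace>(pl);
}

// Mirrors the shape of memory::Alloc: branch on the place, then
// extract the device id with get<> before delegating.
void* ToyAlloc(const Place& pl, size_t size) {
  if (IsGpuPlace(pl)) {
    int gpu_id = std::get<GpuPlace>(pl).device;
    (void)gpu_id;  // a real implementation picks a per-device allocator
  }
  return std::malloc(size);  // CPU path stands in for both here
}
```

This is why memory.cc includes boost/variant.hpp: get<platform::GPUPlace>(pl) is what recovers the .device field from the type-erased Place.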
27 changes: 27 additions & 0 deletions paddle/memory/memory.h
@@ -0,0 +1,27 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#pragma once

#include "paddle/platform/place.h"

namespace paddle {
namespace memory {

void* Alloc(paddle::platform::Place, size_t);
void Free(paddle::platform::Place, void*);
size_t Used(paddle::platform::Place);

} // namespace memory
} // namespace paddle