-
Notifications
You must be signed in to change notification settings - Fork 8
A fast and highly scalable GPU dynamic memory allocator
License
Unknown, GPL-2.0 licenses found
Licenses found
Unknown
LICENSE.txt
GPL-2.0
COPYING
canonizer/halloc
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Halloc GPU memory allocator, version 0.11 INTRO Halloc is a high-throughput malloc/free-style dynamic memory allocator for NVidia Kepler GPUs. It is based on using bit arrays to represent free blocks and using a hash function to quickly search for free blocks. This idea, combined with clever slab management and performance tuning, enables a really fast allocator. Halloc achieves more than 1.5 bln. mallocs/s (more than 1 bln. malloc/free pairs/s) on K20X and 16-byte allocations, with tens of thousands of GPU threads and more than 100MiB allocated. This is much higher than other state-of-the-art GPU allocators. In addition, halloc's performance is also more stable. This makes halloc suitable for use in GPGPU applications requiring fast dynamic memory management. Halloc is mainly designed for small allocation sizes, and delegates allocations larger than 3KiB to CUDA allocator. REQUIREMENTS Software: CUDA 5.0 or higher (tested with 6.5) Hardware: Compute Capability 2.0 or higher (tested on CC 3.5 devices K20X and K40). Note: libraries and tests are currently not compiled for compute_50/sm_50, i.e. Maxwell. COMPILING To compile halloc library, type (in project's top directory): make To run correctness tests (CAUTION: takes a lot of time!): make test To build correctness tests without running them: make build-corr To build performance tests without running them: make build-perf Performance tests are then located in ./tst/perf/bin directory, and can be invoked individually, e.g. ./tst/perf/bin/throughput ./tst/perf/bin/phase-throughput -f0.95 -F0.05 -e0.91 -g5 -t128 To install, edit PREFIX variable in the makefile to your desired install directory (default ~/usr) and type: make install To uninstall: make uninstall USING HALLOC See samples/ directory for samples using Halloc. Compiling Your Program The GPU application then needs to be compiled with halloc static library using separate device compilation and linking. Assuming that the variable $PREFIX contains the installation prefix, and myprog.cu is the file being compiled, this can be done as follows: nvcc -arch=sm_35 -O3 -I $(PREFIX)/include -dc myprog.cu -o myprog.o nvcc -arch=sm_35 -O3 -L $(PREFIX)/lib -lhalloc -o myprog myprog.o Halloc API The functions defined by halloc are in the halloc.h file, which needs to be included into your code to use halloc: #include <halloc.h> Before using halloc, in device functions it has to be initialized with ha_init() function: void ha_init(halloc_opts_t opts = halloc_opts_t()); It can be given a full halloc_opts_t structure to control fine halloc parameters, such as slab size or fraction of used chunks at which the slab is considered "busy". It can also be called just with specifying amount of memory to allocate, or completely without any parameter list to preserve defaults: ha_init(512 * 1024 * 1024); // pass memory to allocate ha_init(); // use default amount of memory Halloc defines two functions, hamalloc to allocate and hafree to free memory (malloc and free are used by CUDA allocator, therefore halloc has to use other names). These functions can only be called from device code. void *hamalloc(size_t nbytes); void hafree(void *p); Otherwise, these functions have pretty much the same behavior as standard C malloc/free, e.g.: // allocate an array int *p = (int *)hamalloc(8 * sizeof(int)); p[0] = 0; p[1] = threadIdx.x; p[2] = 2; // ... // free the array hafree(p); // allocate a list typedef struct list_ { int element; struct list_ *next; } list; // ... list *l = (list *)hamalloc(sizeof(list)); l->element = 1; l->next = (list *)hamalloc(sizeof(list)); l->next->element = 2; l->next->next = NULL; The functions can be used in pretty much the same way as in C programs. hamalloc accepts the number of bytes to allocate, and returns the pointer to allocated memory, or NULL if memory cannot be allocated. Similarly, hafree accepts either a pointer returned by hamalloc or NULL, and frees the memory previously allocated. Naturally, hamalloc and hafree are thread-safe, and can be called simultaneously by threads of the same or different kernels. hamalloc allocations persist across kernel invocations, and can be used in other kernel calls. Pointers allocated by hamalloc can only be freed by hafree; they cannot be deallocated, e.g., by host/device cudaFree/free. ha_shutdown() is intended to free resources used by halloc, but is currently a no-op. LIMITATIONS There is currently no way to change parameters or allocate more memory after halloc has been initialized. BUGS Though correctness tests pass successfully, this provies nothing, of course. Some bugs are most likely there ;)
About
A fast and highly scalable GPU dynamic memory allocator
Resources
License
Unknown, GPL-2.0 licenses found
Licenses found
Unknown
LICENSE.txt
GPL-2.0
COPYING
Stars
Watchers
Forks
Packages 0
No packages published