Align tensors with 64 bytes #102
Comments
I'm going to note here that this can already be done inside the Nim sources, see https://github.com/nim-lang/Nim/blob/1063085850d9d32e82302854cf3ac64049bf998f/lib/system/mmdisp.nim#L49 If we manually change Maybe we can make a PR to make MemAlign a define in the Nim sources and then compile Arraymancer apps with something like
Following reference semantics (cd21f32), we can reuse the BlasBufferArray structure (https://github.com/mratsim/Arraymancer/blob/b1cda0a6f5f0aefdb5302142374a58bf29a03727/src/tensor/fallback/blas_l3_gemm_data_structure.nim). We shouldn't even need a deepcopy function as Also: I've changed MetadataArray to fit in a cache line (64 bytes) in b1cda0a. The C struct needs __attribute__((aligned(64))) to make sure it's always loaded in a single cache read. Upstream related: nim-lang/Nim#5315, nim-lang/Nim#1930, nim-lang/Nim#6696
The destructors wiki has been updated with a section on custom allocator support, including alignment:

type
  Allocator* {.inheritable.} = ptr object
    # Each callback receives the allocator itself as its first argument.
    alloc*: proc (a: Allocator; size: int; alignment = 8): pointer {.nimcall.}
    dealloc*: proc (a: Allocator; p: pointer; size: int) {.nimcall.}
    realloc*: proc (a: Allocator; p: pointer; oldSize, newSize: int): pointer {.nimcall.}

var
  currentAllocator {.threadvar.}: Allocator

proc getCurrentAllocator*(): Allocator =
  result = currentAllocator

proc setCurrentAllocator*(a: Allocator) =
  currentAllocator = a

proc alloc*(size: int): pointer =
  let a = getCurrentAllocator()
  result = a.alloc(a, size)

proc dealloc*(p: pointer; size: int) =
  let a = getCurrentAllocator()
  # Forward the pointer as well; the callback takes (a, p, size).
  a.dealloc(a, p, size)

proc realloc*(p: pointer; oldSize, newSize: int): pointer =
  let a = getCurrentAllocator()
  # Forward the pointer as well; the callback takes (a, p, oldSize, newSize).
  result = a.realloc(a, p, oldSize, newSize)
Done in Laser.
MKL, and possibly other libraries we may use in the future (NNPack, MKLDNN, ...), can run faster on aligned data (usually 64-byte aligned, see reference). Alignment would also make it possible to write a version of the strided iterator that uses vectorized instructions on aligned data, so it would be good for newly created tensors to be aligned by default.
This could be done with a custom allocator or by offsetting the tensor data on creation. Small tensors, such as a scalar tensor, could be exempted to avoid the excessive memory usage that alignment padding would cause.
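To make the "offsetting tensor data on creation" option concrete, here is a minimal sketch in Nim. It is not Arraymancer's or Laser's actual implementation: `AlignedBuffer`, `allocAligned`, `freeAligned` and `CacheLineSize` are hypothetical names introduced only for illustration. The idea is to over-allocate by `alignment - 1` bytes with the standard `alloc`, round the address up to the next multiple of the alignment, and keep the raw pointer so the block can still be freed.

```nim
const CacheLineSize = 64  # assumed alignment target in bytes (hypothetical name)

type
  AlignedBuffer = object
    raw: pointer   # pointer returned by alloc, required later for dealloc
    data: pointer  # aligned pointer inside the raw block, used for tensor data

proc allocAligned(size: int; alignment = CacheLineSize): AlignedBuffer =
  ## Over-allocates by `alignment - 1` bytes and rounds the address up.
  ## `alignment` must be a power of two.
  doAssert alignment > 0 and (alignment and (alignment - 1)) == 0
  result.raw = alloc(size + alignment - 1)
  let address = cast[uint](result.raw)
  let mask = uint(alignment - 1)
  # Round up to the next multiple of `alignment` (no-op if already aligned).
  result.data = cast[pointer]((address + mask) and not mask)

proc freeAligned(buf: var AlignedBuffer) =
  ## Frees the raw block, never the offset `data` pointer.
  if buf.raw != nil:
    dealloc(buf.raw)
    buf.raw = nil
    buf.data = nil
```

The padding costs up to 63 extra bytes per allocation, which is where the concern about scalar-sized tensors comes from. A size threshold below which the plain `alloc` result is used directly would be one way to limit that overhead.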
Reference:
https://software.intel.com/en-us/mkl-linux-developer-guide-coding-techniques