Align tensors with 64 bytes #102

Closed
edubart opened this issue Oct 9, 2017 · 4 comments

edubart commented Oct 9, 2017

MKL, and possibly other libraries we may use in the future (NNPack, MKLDNN, ...), can work faster on aligned data (usually 64-byte aligned, see reference). Alignment would also make it possible to write a version of the strided iterator that uses vectorized instructions on aligned data. So it would be good to have newly created tensors aligned by default.

This could be done with a custom allocator or by offsetting the tensor data on creation. Small tensors could be exempted, to avoid the excessive memory overhead of aligning something like a scalar tensor.
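
A minimal sketch of the "offset the data on creation" option, not Arraymancer's actual implementation. The names `AlignedBuffer`, `allocAligned` and `freeAligned` are hypothetical; the idea is to over-allocate by up to 63 bytes and round the address up to the next 64-byte boundary, keeping the raw pointer around for deallocation.

```nim
const Alignment = 64  # assumed target boundary (cache line / AVX-512 width)

type AlignedBuffer = object  # hypothetical helper type, for illustration only
  raw: pointer   # pointer returned by alloc, required later by dealloc
  data: pointer  # first 64-byte aligned address inside the allocation

proc allocAligned(size: Natural): AlignedBuffer =
  ## Over-allocate by Alignment - 1 bytes, then round the address up.
  result.raw = alloc(size + Alignment - 1)
  let address = cast[uint](result.raw)
  let mask = uint(Alignment - 1)
  result.data = cast[pointer]((address + mask) and not mask)

proc freeAligned(buf: var AlignedBuffer) =
  ## Free through the raw pointer, never through the aligned one.
  dealloc(buf.raw)
  buf.raw = nil
  buf.data = nil

var buf = allocAligned(1000)
doAssert cast[uint](buf.data) mod uint(Alignment) == 0
freeAligned(buf)
```

The cost is at most 63 wasted bytes per allocation, which is why exempting very small tensors may be worthwhile.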

Reference:
https://software.intel.com/en-us/mkl-linux-developer-guide-coding-techniques

edubart commented Oct 11, 2017

I'm going to note here that this can already be done inside the Nim sources, see https://github.com/nim-lang/Nim/blob/1063085850d9d32e82302854cf3ac64049bf998f/lib/system/mmdisp.nim#L49
So this can be done without changes in Arraymancer and without writing a custom allocator.

If we manually change MemAlign to 64, everything allocated by Nim will be aligned to 64 bytes. Also, since Nim uses a PageSize of 4096, it is very probable that tensors larger than 4096 bytes are, at least at startup, already aligned to 4096 and hence to 64 too. At the moment I don't know whether new tensors created after garbage collection will still be aligned.

Maybe we can make a PR so that MemAlign becomes a define in the Nim sources, and then compile Arraymancer apps with something like -d:memalign=64.
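
As a rough sketch of what such a define could look like on the user side: Nim's `{.intdefine.}` pragma lets a constant be overridden with `-d:name=value`. The constant name `memalign` and the helper `isAligned` below are assumptions for illustration; this does not change Nim's real `MemAlign`, it only shows the mechanism and a way to check whether a given allocation happens to be aligned.

```nim
# Hypothetical define: overridable with -d:memalign=64, defaults to 8.
const memalign {.intdefine.} = 8

static:
  # Alignment must be a positive power of two for the mask trick below.
  doAssert memalign > 0 and (memalign and (memalign - 1)) == 0,
    "memalign must be a power of two"

proc isAligned(p: pointer; alignment: int): bool =
  ## True when the address of `p` is a multiple of `alignment`.
  (cast[uint](p) and uint(alignment - 1)) == 0

let p = alloc(8192)  # larger than one 4096-byte page
echo "aligned to ", memalign, " bytes: ", isAligned(p, memalign)
echo "aligned to 64 bytes: ", isAligned(p, 64)
dealloc(p)
```

Running a check like this after some garbage collection cycles would be one way to answer the open question about whether later allocations stay aligned.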

mratsim commented Nov 27, 2017

Following reference semantics (cd21f32), we can reuse the BlasBufferArray structure (https://github.com/mratsim/Arraymancer/blob/b1cda0a6f5f0aefdb5302142374a58bf29a03727/src/tensor/fallback/blas_l3_gemm_data_structure.nim). We shouldn't even need a deepcopy function, as clone is implemented with map_inline.

Also: I've changed MetadataArray to fit in a cache line (64 bytes) in b1cda0a. The C struct needs __attribute__((aligned(64))) to make sure it's always loaded in a single cache read.
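
To illustrate the size half of that statement, here is a sketch with a hypothetical `Metadata` object (the field layout and `MaxRank = 7` are assumptions, not necessarily the real MetadataArray definition). On a 64-bit target, seven `int` slots plus a length field come to exactly 64 bytes, and a compile-time assertion guards against the type growing past one cache line.

```nim
const
  CacheLineSize = 64  # bytes; typical for x86-64, assumed here
  MaxRank = 7         # hypothetical rank limit chosen to fill one line

type Metadata = object  # illustrative stand-in, not Arraymancer's type
  data: array[MaxRank, int]  # shape or strides, one slot per dimension
  len: int                   # number of dimensions actually in use

static:
  # Fails at compile time if the object no longer fits in a cache line.
  doAssert sizeof(Metadata) <= CacheLineSize

echo "sizeof(Metadata) = ", sizeof(Metadata)
```

Fitting in 64 bytes is not enough on its own: an object that starts in the middle of a cache line still straddles two lines, which is why the alignment attribute on the generated C struct is needed as well.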

Upstream related: nim-lang/Nim#5315, nim-lang/Nim#1930, nim-lang/Nim#6696

mratsim commented May 10, 2018

The destructors wiki has been updated with a section for custom allocators support, including alignment.

type
  Allocator* {.inheritable.} = ptr object
    alloc*: proc (a: Allocator; size: int; alignment = 8): pointer {.nimcall.}
    dealloc*: proc (a: Allocator; p: pointer; size: int) {.nimcall.}
    realloc*: proc (a: Allocator; p: pointer; oldSize, newSize: int): pointer {.nimcall.}

var
  currentAllocator {.threadvar.}: Allocator

proc getCurrentAllocator*(): Allocator =
  result = currentAllocator

proc setCurrentAllocator*(a: Allocator) =
  currentAllocator = a

proc alloc*(size: int): pointer =
  let a = getCurrentAllocator()
  result = a.alloc(a, size)

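proc dealloc*(p: pointer; size: int) =
  let a = getCurrentAllocator()
  # pass the pointer being freed, as the dealloc field's signature requires
  a.dealloc(a, p, size)

proc realloc*(p: pointer; oldSize, newSize: int): pointer =
  let a = getCurrentAllocator()
  # pass the pointer being resized, as the realloc field's signature requires
  result = a.realloc(a, p, oldSize, newSize)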

mratsim commented Oct 25, 2018

Done in Laser

mratsim closed this as completed Oct 25, 2018