Align tensors with 64 bytes #102

edubart · 2017-10-09T12:04:31Z

MKL, and possibly other libraries we may use in the future (NNPack, MKLDNN, ...), can run faster on aligned data (usually 64-byte aligned, see reference). Alignment would also make it possible to write a version of the strided iterator that uses vectorized instructions on aligned data. So it would be good for newly created tensors to be aligned by default.

This could be done with a custom allocator, or by offsetting the tensor data on creation. Some restrictions could apply to small tensors, such as a scalar tensor, to avoid the excess memory usage that aligning them would cause.
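A minimal sketch of the "offset tensor data on creation" option, assuming the buffer has been over-allocated by alignment - 1 bytes: round the start address up to the next 64-byte boundary, and skip the padding entirely for small buffers. The names alignUp and paddingFor and the 64-byte MinAlignedBytes threshold are hypothetical, chosen only for illustration.

```nim
# Hypothetical helper: round `address` up to the next multiple of `alignment`.
# Requires `alignment` to be a power of two so a bit mask can be used.
func alignUp(address, alignment: uint): uint =
  doAssert alignment > 0 and (alignment and (alignment - 1)) == 0
  (address + alignment - 1) and not (alignment - 1)

# Hypothetical policy: buffers smaller than this are left unpadded, so tiny
# tensors (e.g. scalars) don't waste up to alignment - 1 bytes each.
const MinAlignedBytes = 64

# Number of bytes to skip at the start of a buffer of `nbytes` bytes located
# at `address` so that the tensor data starts on an `alignment` boundary.
func paddingFor(address: uint; nbytes: int; alignment: uint = 64): uint =
  if nbytes < MinAlignedBytes: 0'u
  else: alignUp(address, alignment) - address

when isMainModule:
  echo alignUp(100, 64)        # next 64-byte boundary after address 100
  echo paddingFor(100, 4096)   # padding needed for a 4 KiB buffer at 100
  echo paddingFor(100, 8)      # small buffer: no padding
```

The threshold trades a little speed on tiny tensors for not paying up to 63 bytes of padding per scalar.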

Reference:
https://software.intel.com/en-us/mkl-linux-developer-guide-coding-techniques

edubart · 2017-10-11T03:02:26Z

I'm going to note here that this can already be done inside the Nim sources, see https://github.com/nim-lang/Nim/blob/1063085850d9d32e82302854cf3ac64049bf998f/lib/system/mmdisp.nim#L49
So this could be done without changes to Arraymancer and without writing a custom allocator.

If we manually change MemAlign to 64, everything allocated by Nim will already be aligned to 64 bytes. Also, since Nim uses a PageSize of 4096, it is very likely that tensors larger than 4096 bytes are already aligned to 4096 (and hence to 64), at least at startup. At the moment I don't know whether new tensors created after garbage collection will remain aligned.
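Whether allocations stay aligned after garbage collection can be checked empirically. A minimal sketch, assuming a hypothetical isAligned helper that tests the address modulo the alignment; it prints the result for one seq buffer without claiming what the answer will be.

```nim
# Hypothetical helper: true if `p` lies on an `alignment`-byte boundary.
proc isAligned(p: pointer; alignment: int): bool =
  cast[uint](p) mod uint(alignment) == 0

when isMainModule:
  # 2048 float32 values = 8192 bytes, i.e. larger than one 4096-byte page.
  var buf = newSeq[float32](2048)
  echo "64-byte aligned: ", isAligned(buf[0].addr, 64)
  echo "4096-byte aligned: ", isAligned(buf[0].addr, 4096)
```

Running such a check repeatedly, after allocating and freeing many buffers, is one way to see whether alignment survives collection cycles.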

Maybe we can make a PR to turn MemAlign into a define in the Nim sources, and then compile Arraymancer apps with something like -d:memalign=64.
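The proposed define maps naturally onto Nim's {.intdefine.} pragma. A minimal sketch of the idea in user code, assuming a constant named memalign with a default of 16 (both are assumptions, not the values in Nim's mmdisp.nim), overridable with nim c -d:memalign=64 app.nim.

```nim
# Hypothetical constant; override at compile time with -d:memalign=64.
# The default of 16 is an assumption for this sketch.
const memalign {.intdefine.} = 16

# Reject invalid values at compile time: alignment must be a positive
# power of two for mask-based address rounding to work.
static:
  doAssert memalign > 0 and (memalign and (memalign - 1)) == 0

when isMainModule:
  echo "allocations aligned to ", memalign, " bytes"
```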

mratsim · 2017-11-27T20:30:53Z

Now that tensors follow reference semantics (cd21f32), we can reuse the BlasBufferArray structure (https://github.com/mratsim/Arraymancer/blob/b1cda0a6f5f0aefdb5302142374a58bf29a03727/src/tensor/fallback/blas_l3_gemm_data_structure.nim). We shouldn't even need a deepcopy function, since clone is implemented with map_inline.

Also: I've changed MetadataArray to fit in a cache line (64 bytes) in b1cda0a. The C struct needs __attribute__((aligned(64))) to make sure it's always loaded in a single cache read.
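The 64-byte budget can be enforced at compile time. A minimal sketch, assuming a hypothetical layout of up to 7 dimensions plus a length, each stored as an int64 (8 * 8 = 64 bytes); this is not necessarily Arraymancer's actual MetadataArray definition.

```nim
# Hypothetical layout, an assumption for illustration only.
const MaxRank = 7

type
  MetadataArray = object
    data: array[MaxRank, int64]  # one entry per dimension (shape or strides)
    len: int64                   # number of dimensions actually in use

# Compilation fails if the struct grows beyond one 64-byte cache line.
static:
  doAssert sizeof(MetadataArray) == 64
```

A size of exactly 64 bytes is necessary but not sufficient: the struct only occupies a single cache line if its start address is also 64-byte aligned, which is what the aligned(64) attribute provides.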

Upstream related: nim-lang/Nim#5315, nim-lang/Nim#1930, nim-lang/Nim#6696

mratsim · 2018-05-10T11:15:26Z

The destructors wiki has been updated with a section on custom allocator support, including alignment.

type
  Allocator* {.inheritable.} = ptr object
    alloc*: proc (a: Allocator; size: int; alignment = 8): pointer {.nimcall.}
    dealloc*: proc (a: Allocator; p: pointer; size: int) {.nimcall.}
    realloc*: proc (a: Allocator; p: pointer; oldSize, newSize: int): pointer {.nimcall.}

var
  currentAllocator {.threadvar.}: Allocator

proc getCurrentAllocator*(): Allocator =
  result = currentAllocator

proc setCurrentAllocator*(a: Allocator) =
  currentAllocator = a

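proc alloc*(size: int; alignment = 8): pointer =
  # Forward to the current thread's allocator, including the requested alignment.
  let a = getCurrentAllocator()
  result = a.alloc(a, size, alignment)

proc dealloc*(p: pointer; size: int) =
  # The pointer being freed must be passed through as well.
  let a = getCurrentAllocator()
  a.dealloc(a, p, size)

proc realloc*(p: pointer; oldSize, newSize: int): pointer =
  # Same fix: forward the pointer being resized.
  let a = getCurrentAllocator()
  result = a.realloc(a, p, oldSize, newSize)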
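One way an allocator plugged into this interface could honour the alignment argument is to over-allocate and store the raw pointer just before the aligned address. A minimal sketch, assuming a trimmed, hypothetical copy of the wiki's Allocator type (realloc omitted for brevity) built on Nim's system alloc0/dealloc; it is not part of Nim or Arraymancer.

```nim
# Trimmed, hypothetical version of the wiki-style interface (no realloc).
type
  AllocatorObj = object
    alloc: proc (a: Allocator; size, alignment: int): pointer {.nimcall.}
    dealloc: proc (a: Allocator; p: pointer; size: int) {.nimcall.}
  Allocator = ptr AllocatorObj

proc alignedAlloc(a: Allocator; size, alignment: int): pointer {.nimcall.} =
  doAssert alignment > 0 and (alignment and (alignment - 1)) == 0
  let header = sizeof(pointer)
  # Over-allocate: worst-case padding plus one word to remember the raw block.
  let raw = alloc0(size + alignment - 1 + header)
  let start = cast[uint](raw) + uint(header)
  let mask = uint(alignment - 1)
  let aligned = (start + mask) and not mask
  # Store the raw pointer immediately before the aligned address.
  cast[ptr pointer](aligned - uint(header))[] = raw
  result = cast[pointer](aligned)

proc alignedDealloc(a: Allocator; p: pointer; size: int) {.nimcall.} =
  if p != nil:
    # Recover the raw block stored by alignedAlloc and free it.
    let raw = cast[ptr pointer](cast[uint](p) - uint(sizeof(pointer)))[]
    dealloc(raw)

var alignedAllocatorObj = AllocatorObj(alloc: alignedAlloc, dealloc: alignedDealloc)

when isMainModule:
  let a: Allocator = addr alignedAllocatorObj
  let p = a.alloc(a, 1000, 64)
  doAssert cast[uint](p) mod 64 == 0
  a.dealloc(a, p, 1000)
```

The cost is up to alignment - 1 + sizeof(pointer) extra bytes per allocation, which is why small tensors may deserve a different policy.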

mratsim · 2018-10-25T10:16:08Z

Done in Laser

mratsim added the optimization label Oct 9, 2017

mratsim mentioned this issue Aug 25, 2018

design discussion 1: slicing should work same as in D's mir N-D tensor library (and same as numpy) #262

Closed

mratsim added the Laser label Oct 25, 2018

mratsim closed this as completed Oct 25, 2018
