Skip to content

Latest commit

 

History

History
172 lines (133 loc) · 5.03 KB

memory-alignment.asciidoc

File metadata and controls

172 lines (133 loc) · 5.03 KB

Memory Alignment

Table of Contents

Reading and writing memory does not happen on a byte-per-byte basis. Instead, they are read or written in small chunks called memory words. These words are generally 4 bytes long on 32 bit systems and 8 bytes long on 64 bit systems. This text assumes a 4 byte word size.

+---------------+---------------+---------------+---------------+
|     WORD 0    |     WORD 1    |     WORD 2    |     WORD 3    |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

Say you need to read in a 32-bit integer x from RAM. Assume its address &x is 8.

+---------------+---------------+---------------+---------------+
|     WORD 0    |     WORD 1    |     WORD 2    |     WORD 3    |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|                               |       x       |               |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

Reading this x is straightforward: word 2 is read in one go. Consider, however, the case where &x is 6.

+---------------+---------------+---------------+---------------+
|     WORD 0    |     WORD 1    |     WORD 2    |     WORD 3    |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|                       |       x       |                       |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

In order to read x, two memory accesses are necessary:

  1. Word 1 is read.

  2. Word 2 is read.

  3. The redundant bits are cut off.

  4. The remaining bits are shifted around so as to form x.

In this case, reading x is twice as slow, if not more. To avoid this unnecessary speed penalty, 4 byte integers should have an address that is divisible by 4. This way, we are certain they fit inside a single word and a single memory access will suffice.

These rule is applied by optimizing compilers in a more generalized form:

  • 2 byte values are stored on addresses divisible by 2.

  • 4 byte values are stored on addresses divisible by 4.

  • 8 byte values are stored on addresses divisible by 8.

When the address of a N-byte value is stored on an address divisible by N, we way it is memory aligned. Unaligned memory access need to be avoided.

When defining a struct, the compiler respects the order in which you declared the fields, i.e., it will not move them around. However, the compiler will by default inject gaps between them so that each field has an address divisible by its own size, making each field memory aligned.

For example,

struct Foo
{
    uint8_t  a;
    uint16_t b;
    uint8_t  c;
    uint32_t d;
};

Without padding, i.e., not trying to memory align the fields, sizeof(Foo) would be 1+2+1+4 = 8. The memory layout would look as follows:

01234567
abbcdddd

With padding, however, the memory layout becomes

0123456789AB
a.bbc...dddd

where . designate padding, which is basically unused memory. sizeof(Foo) is now 12. Giving up these 4 bytes should make working with Foo objects more CPU-efficient. We are trading in memory for speed.

Arrays

Consider

struct Bar
{
    uint32_t a;
    uint8_t  b;
};

You might think that, even with padding, sizeof(Bar) equals 5: each field is perfectly memory-aligned. However, you’ll find that if you write code that outputs sizeof(Bar), you’ll get 8.

This is due to the fact that the compiler also takes into account the possibility of having multiple Bar`s in a row (i.e., an array.) Take for example `Bar* p = new Bar[2]:

0123456789
aaaabaaaab

Now &p[1].a has address 5 while it has size 4.

To remedy this, the compiler adds further padding at the end of the struct so as to make its size a multiple of the size of the largest member. In Bar's case, a is largest, counting 4 bytes. Without padding, sizeof(Bar) == 5, so an extra 3 bytes are required to make sizeof(Bar) a multiple of 4. Bar's memory layout thus becomes

01234567
aaaab...

Every field is perfectly aligned even in the case of arrays:

0123456789ABCDEF
aaaab...aaaab...

Forcing Memory Alignment

Sometimes we need to prevent the compiler from padding structures. This is done using the #pragma directive:

#pragma pack(push, 1)

struct Foo
{
    uint8_t  a;
    uint16_t b;
    uint8_t  c;
    uint32_t d;
};

#pragma pack(pop)

The 1 indicates that we wish to align fields on multiples of 1, meaning there should be no padding at all.