ARM supports big- and little-endian memory access modes. The representation of
values in registers stays the same, but loads and stores order the bytes
differently depending on the mode. This has to be taken into account in the
cases listed below.
Two m4 macros are provided to handle these special cases in assembly source.
They respectively expand to <if-true> if the target system's endianness is
little-endian or big-endian. Otherwise they expand to <if-false>.
1. ldr/str
Loading and storing 32-bit words will reverse the words' bytes in little-endian
mode. If the handled data is actually a byte sequence or data in network byte
order (big-endian), the loaded word needs to be reversed after load to get it
back into the correct sequence. See the LOAD macro in v6/sha1-compress.asm for
an example.
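
The effect can be modelled in portable C. The following is only an
illustrative sketch, not code taken from the assembly sources: load_le and
load_be are hypothetical helpers that mimic what a 32-bit ldr yields in each
memory mode, and byte_reverse stands in for a byte-reversal instruction such
as rev.

```c
#include <stdint.h>

/* Model of a 32-bit load in little-endian mode: the byte at the lowest
   address ends up in the least significant bits of the register. */
uint32_t load_le(const uint8_t *p)
{
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
       | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Model of a 32-bit load in big-endian mode: the byte at the lowest
   address ends up in the most significant bits of the register. */
uint32_t load_be(const uint8_t *p)
{
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
       | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Reverse the four bytes of a word. */
uint32_t byte_reverse(uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0xff00u)
       | ((x << 8) & 0xff0000u) | (x << 24);
}

/* Read a word stored in network byte order. Only the little-endian
   load needs the extra reversal. */
uint32_t read_network_word(const uint8_t *p, int little_endian)
{
  return little_endian ? byte_reverse(load_le(p)) : load_be(p);
}
```

Both modes deliver the same register value for the same byte sequence; the
little-endian path simply pays for one additional reversal.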
2. shifts
If data is to be processed with bit operations only, endianness can be ignored
because byte-swapping on load and store will cancel each other out. Shifts
however have to be inverted. See arm/memxor.asm for an example.
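
As an illustration of why the shift direction flips, the sketch below combines
two adjacent aligned words into the word found at a byte offset between them.
It is a hedged model in C, not the code from arm/memxor.asm; load_le, load_be,
unaligned_le and unaligned_be are hypothetical names, and the offset is assumed
to be 1, 2 or 3.

```c
#include <stdint.h>

/* Models of a 32-bit load in little- and big-endian mode. */
uint32_t load_le(const uint8_t *p)
{
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
       | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

uint32_t load_be(const uint8_t *p)
{
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
       | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Little-endian: lower addresses sit in the low bits, so skipping
   `offset` bytes means shifting the first word right. */
uint32_t unaligned_le(uint32_t w0, uint32_t w1, unsigned offset)
{
  unsigned s = 8 * offset;          /* offset must be 1..3 */
  return (w0 >> s) | (w1 << (32 - s));
}

/* Big-endian: lower addresses sit in the high bits, so the same
   operation shifts the first word left instead. */
uint32_t unaligned_be(uint32_t w0, uint32_t w1, unsigned offset)
{
  unsigned s = 8 * offset;          /* offset must be 1..3 */
  return (w0 << s) | (w1 >> (32 - s));
}
```

In each mode the combined value equals what a direct load from the unaligned
address would have produced in that mode, which is all that bitwise processing
needs.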
3. vld1.8
NEON's vld instruction can be used to produce endianness-neutral code. vld1.8
will load a byte sequence into a register regardless of memory endianness. This
can be used to process byte sequences. See arm/neon/umac-nh.asm for an example.
4. vldm/vstm
Care has to be taken when using vldm/vstm because they have two non-obvious
peculiarities:
a. vldm/vstm do normal byte-swapping on each value they load. When loading into
d (doubleword) registers, this means that bytes, halfwords and words of the
doubleword get swapped. When the data loaded actually represents e.g.
vectors of 32-bit words this will swap columns.
b. vldm/vstm on q (quadword) registers get translated into vldm/vstm on the
equivalent number of d (doubleword) registers. Instead of a 128-bit load it
does two 64-bit loads. When again handling vectors of 32-bit words this will
still swap adjacent columns but will not reverse all four columns.
        memory adr0: w0 w1 w2 w3
        register q0: w1 w0 w3 w2
See arm/neon/chacha-core-internal.asm for an example.
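
The column swap can be reproduced with a small C model. This is a sketch under
stated assumptions rather than a description of the actual assembly: vldm_q_be
is a hypothetical function that mimics a big-endian vldm into one q register
as two 64-bit loads, with lane 0 taken as the least significant 32 bits of each
doubleword, and swap_adjacent_lanes is a hypothetical fix-up comparable in
effect to reversing the 32-bit elements within each doubleword.

```c
#include <stdint.h>

/* Model of a big-endian vldm into a q register: two independent
   64-bit loads, each interpreting its 8 bytes as one big-endian
   value. Lane 0 of each doubleword is its low 32 bits. */
void vldm_q_be(const uint8_t *mem, uint32_t lanes[4])
{
  for (int d = 0; d < 2; d++) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
      v = (v << 8) | mem[8 * d + i];
    lanes[2 * d]     = (uint32_t)v;         /* second word in memory */
    lanes[2 * d + 1] = (uint32_t)(v >> 32); /* first word in memory */
  }
}

/* Undo the swap of adjacent columns so the lanes match memory order. */
void swap_adjacent_lanes(uint32_t lanes[4])
{
  for (int d = 0; d < 2; d++) {
    uint32_t t = lanes[2 * d];
    lanes[2 * d] = lanes[2 * d + 1];
    lanes[2 * d + 1] = t;
  }
}
```

With memory holding w0 w1 w2 w3, the model yields lanes w1 w0 w3 w2, and a
single swap of adjacent lanes restores the expected order.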
5. simple byte store
Sometimes it is necessary to store remaining single bytes to memory. A simple
logic will store the lowest byte from a register, then do a right shift and
start over until all bytes are stored. Since this constitutes a
least-significant-byte-first store, the data to be stored needs to be reversed
first on a big-endian system. See arm/memxor.asm Lmemxor_leftover for an
example.
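
A minimal C model of this store loop is given below. It is an illustration
only; store_leftover and byte_reverse are hypothetical names, the word is
assumed to hold bytes as a native load in the given mode would deliver them,
and n is assumed to be at most 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the four bytes of a word. */
uint32_t byte_reverse(uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0xff00u)
       | ((x << 8) & 0xff0000u) | (x << 24);
}

/* Store the first n bytes (in memory order) of a natively loaded word.
   The loop always writes the least significant byte first, so on a
   big-endian system the word is reversed beforehand. */
void store_leftover(uint8_t *dst, uint32_t word, size_t n, int big_endian)
{
  if (big_endian)
    word = byte_reverse(word);
  while (n-- > 0) {
    *dst++ = (uint8_t)(word & 0xff);
    word >>= 8;
  }
}
```

For the source bytes aa bb cc dd, a big-endian load yields 0xaabbccdd and a
little-endian load yields 0xddccbbaa; in both cases the loop then writes
aa, bb, cc in that order.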
6. Function parameters/return values
AAPCS requires 64-bit parameters to be passed to and returned from functions
"in two consecutive registers [...] as if the value had been loaded from memory
representation with a single LDM instruction." Since loading a big-endian
doubleword using ldm transposes its words, the same has to be done when e.g.
returning a 64-bit value from an assembler routine. See arm/neon/umac-nh.asm
for an example.
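
The resulting register assignment can be summarised with a short C model. The
names reg_pair and return_u64 are hypothetical and exist only for this sketch;
the model assumes the value is returned in r0 and r1, with r0 receiving the
word that sits at the lower memory address.

```c
#include <stdint.h>

/* The two consecutive registers that carry a 64-bit value. */
struct reg_pair {
  uint32_t r0;
  uint32_t r1;
};

/* Split a 64-bit value the way a single LDM from its memory
   representation would: the word at the lower address goes to r0.
   In little-endian memory that is the low word, in big-endian
   memory it is the high word. */
struct reg_pair return_u64(uint64_t value, int big_endian)
{
  uint32_t lo = (uint32_t)value;
  uint32_t hi = (uint32_t)(value >> 32);
  struct reg_pair p;

  if (big_endian) {
    p.r0 = hi;
    p.r1 = lo;
  } else {
    p.r0 = lo;
    p.r1 = hi;
  }
  return p;
}
```

An assembler routine that computes the low word in r0 and the high word in r1
therefore has to exchange the two registers before returning on a big-endian
system.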
