    • arm: Unify neon asm for big- and little-endian modes · 62dc4ce4
      Michael Weiser authored
      Switch arm neon assembler routines to endianness-agnostic loads and
      stores where possible to avoid modifications to the rest of the code.
      This involves switching to vld1.32 for loading consecutive 32-bit words
      in host endianness as well as vst1.8 for storing back to memory in
      little-endian order as required by the caller. Where necessary, r3 is
      used to store the precalculated offset into the source vector for the
      secondary load operations. vstm is kept for little-endian platforms
      because it is faster than vst1 on most ARM implementations.
      vst1.x (at least on the Allwinner A20 Cortex-A7 implementation) seems to
      interfere with itself on subsequent calls, slowing it down further. So we
      reschedule some instructions to do stores as soon as results become
      available to have some other calculations or loads before the next
      vst1.x. This reliably saves two additional cycles per block on salsa20
      and chacha which would otherwise be incurred.
      vld1.x does not seem to suffer from this or at least not to a level
      where two consecutive vld1.x run slower than an equivalent vldm.
      Rescheduling them similarly did not improve performance beyond that of
      Signed-off-by: Michael Weiser <michael.weiser@gmx.de>
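      The load/store split described in this commit can be illustrated in portable C. This is only a rough analogue under stated assumptions, not the NEON code from the commit: the function names load_host32, store_le32 and store_block_le are hypothetical. Words are read in host byte order (comparable in spirit to vld1.32) and written back byte by byte, least significant byte first (comparable in spirit to vst1.8), so the output layout is little-endian on either kind of host.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read one 32-bit word in host byte order.
   Loosely mirrors what a 32-bit element load gives the arithmetic code. */
uint32_t load_host32(const unsigned char *src)
{
  uint32_t w;
  memcpy(&w, src, sizeof w);
  return w;
}

/* Hypothetical helper: write one 32-bit word least significant byte
   first, independent of the host byte order. */
void store_le32(unsigned char *dst, uint32_t w)
{
  dst[0] = (unsigned char) (w & 0xff);
  dst[1] = (unsigned char) ((w >> 8) & 0xff);
  dst[2] = (unsigned char) ((w >> 16) & 0xff);
  dst[3] = (unsigned char) ((w >> 24) & 0xff);
}

/* Hypothetical helper: store n words as a little-endian byte stream. */
void store_block_le(unsigned char *dst, const uint32_t *words, size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)
    store_le32(dst + 4 * i, words[i]);
}
```

      Because only the store side fixes the byte order, the arithmetic in between can work on host-order words without any conditional byte swapping, which is the property the commit relies on.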
    • ppc: Fix use of __GLIBC_PREREQ in fat-ppc.c. · 49cb4039
      Niels Möller authored
      * fat-ppc.c: Don't use __GLIBC_PREREQ in the same preprocessor
      conditional as defined(__GLIBC_PREREQ), but move to a nested #if
      conditional. Fixes compile error on OpenBSD/powerpc64, reported by
      Jasper Lievisse Adriaanse.
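      A minimal sketch of the nested conditional, for illustration only. The macro SKETCH_HAVE_GLIBC_2_16, the function sketch_have_glibc_2_16 and the version numbers (2, 16) are assumptions made for this example and are not taken from fat-ppc.c. The likely reason the single-line form fails: the preprocessor parses the whole #if expression even when defined(__GLIBC_PREREQ) is false, so on a libc without that macro the identifier is replaced by 0 and the expression becomes 0(2, 16), which is a syntax error. A nested #if sits inside a skipped group and is never evaluated.

```c
#include <stdio.h>  /* on glibc, any libc header pulls in __GLIBC_PREREQ */

/* Problematic form (kept as a comment so this file compiles everywhere):
 *
 *   #if defined(__GLIBC_PREREQ) && __GLIBC_PREREQ(2, 16)
 *
 * Without glibc this can expand to "0 && 0(2, 16)" and fail to parse.
 */

/* Nested form: the inner #if is only evaluated when the macro exists. */
#if defined(__GLIBC_PREREQ)
# if __GLIBC_PREREQ(2, 16)
#  define SKETCH_HAVE_GLIBC_2_16 1  /* hypothetical feature macro */
# endif
#endif

#ifndef SKETCH_HAVE_GLIBC_2_16
# define SKETCH_HAVE_GLIBC_2_16 0
#endif

/* Returns 1 when built against glibc 2.16 or newer, 0 otherwise. */
int sketch_have_glibc_2_16(void)
{
  return SKETCH_HAVE_GLIBC_2_16;
}
```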
