[Arm64] Optimize Chacha20
This patch optimizes Chacha20 for the arm64 architecture, following the approach used in the powerpc implementation.
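For readers unfamiliar with the cipher, the operation that any ChaCha implementation repeats is the quarter-round. The following is a minimal portable C sketch of it, not the code from this patch; the function names `rotl32`, `chacha_quarter_round` and `chacha_quarter_round_self_test` are hypothetical and chosen only for illustration. The self-check uses the quarter-round test vector from RFC 8439, section 2.1.1.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (valid for 0 < n < 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
  return (x << n) | (x >> (32 - n));
}

/* One ChaCha quarter-round on four state words.
   Illustrative helper only; not taken from the patch. */
static void chacha_quarter_round(uint32_t *a, uint32_t *b,
                                 uint32_t *c, uint32_t *d)
{
  *a += *b; *d ^= *a; *d = rotl32(*d, 16);
  *c += *d; *b ^= *c; *b = rotl32(*b, 12);
  *a += *b; *d ^= *a; *d = rotl32(*d, 8);
  *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}

/* Check the quarter-round against the RFC 8439 section 2.1.1 vector. */
static void chacha_quarter_round_self_test(void)
{
  uint32_t a = 0x11111111, b = 0x01020304;
  uint32_t c = 0x9b8d6f43, d = 0x01234567;

  chacha_quarter_round(&a, &b, &c, &d);

  assert(a == 0xea2a92f4);
  assert(b == 0xcb1cf8ce);
  assert(c == 0x4581472e);
  assert(d == 0x5881c4bb);
}
```

Each quarter-round uses only 32-bit additions, XORs and rotations, which is why the cipher lends itself to vector registers.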
The testsuite passes with this patch applied.
Benchmark of chacha encrypt/decrypt using nettle-benchmark on gfarm 117 (roughly a 1.8x speedup over the C implementation):
| C implementation | This patch |
|---|---|
| 197.72 Mbyte/s | 357.58 Mbyte/s |
NOTE: This patch was written with both endianness modes in mind, but it has been tested only on little-endian because no big-endian system was available.
Edited by Maamoun TK