[PowerPC] Implement Poly1305 multi-block update based on radix 2^44
This patch optimizes Poly1305 for the powerpc64 architecture by using the POWER9-specific instruction vmsumudm
for full 64-bit multiplication, processing 4 blocks in parallel in radix 2^44.
The testsuite passes all tests with this patch applied.
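
For reviewers unfamiliar with the radix 2^44 representation, here is a minimal scalar sketch in portable C of one multiplication modulo 2^130 - 5. It is not the code in this patch: the function name `mul_radix44`, the limb layout (44 + 44 + 42 bits), and the use of `unsigned __int128` (a GCC/Clang extension) are assumptions made for illustration. Each accumulator below is a sum of 64x64-bit products, which is the kind of operation vmsumudm performs on a pair of doublewords with a 128-bit addend.

```c
#include <stdint.h>

/* Assumption: the compiler provides unsigned __int128 (GCC/Clang, 64-bit targets). */
typedef unsigned __int128 u128;

#define MASK44 ((UINT64_C(1) << 44) - 1)
#define MASK42 ((UINT64_C(1) << 42) - 1)

/* Hypothetical helper: h = h * r mod 2^130 - 5.
   A value x is stored as x[0] + x[1]*2^44 + x[2]*2^88,
   with x[0], x[1] below 2^44 and x[2] below 2^42 (plus small slack). */
void
mul_radix44 (uint64_t h[3], const uint64_t r[3])
{
  /* 2^132 = 4 * 2^130, and 2^130 is congruent to 5 mod p, so limb
     products of weight 2^132 or 2^176 fold back scaled by 20. */
  uint64_t s1 = r[1] * 20;
  uint64_t s2 = r[2] * 20;

  /* Column sums of weight 2^0, 2^44 and 2^88. */
  u128 d0 = (u128) h[0] * r[0] + (u128) h[1] * s2 + (u128) h[2] * s1;
  u128 d1 = (u128) h[0] * r[1] + (u128) h[1] * r[0] + (u128) h[2] * s2;
  u128 d2 = (u128) h[0] * r[2] + (u128) h[1] * r[1] + (u128) h[2] * r[0];
  uint64_t c;

  /* Partial carry propagation back to 44/44/42-bit limbs. */
  c = (uint64_t) (d0 >> 44); h[0] = (uint64_t) d0 & MASK44;
  d1 += c;
  c = (uint64_t) (d1 >> 44); h[1] = (uint64_t) d1 & MASK44;
  d2 += c;
  c = (uint64_t) (d2 >> 42); h[2] = (uint64_t) d2 & MASK42;

  /* Bits at or above 2^130 wrap around multiplied by 5. */
  h[0] += c * 5;
  c = h[0] >> 44; h[0] &= MASK44;
  h[1] += c;
}
```

With limbs this narrow, three products fit in a 128-bit accumulator without overflow, so carries can be deferred until after the column sums are formed.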
Benchmark of Poly1305 update using nettle-benchmark on POWER9:
C | This patch |
---|---|
472.63 Mbyte/s | 2136.30 Mbyte/s |