[S390x] Optimize Poly1305 based on radix 2^44 with fat build support
This patch optimizes Poly1305 for the s390x architecture using the z14-specific instruction vmslg,
which provides full 64-bit multiplication. It is applied to 4 blocks in parallel, based on radix 2^44.
The testsuite passes all tests with this patch applied.
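
For readers unfamiliar with the radix 2^44 representation, here is a minimal scalar sketch in C. It is not the code from this patch: the helper names (`load_le64`, `block_to_radix44`, `radix44_to_words`) are hypothetical. It only shows how one 16-byte Poly1305 block, plus its 2^128 padding bit, can be split into three limbs of 44, 44 and 42 bits (130 bits in total). With limbs this small, the product of two limbs fits in 88 bits, so several products can presumably be summed in a 128-bit accumulator without overflow.

```c
#include <stdint.h>

#define MASK44 ((UINT64_C(1) << 44) - 1)

/* Read 8 bytes as a little-endian 64-bit integer. */
static uint64_t load_le64(const uint8_t *p)
{
  uint64_t v = 0;
  for (int i = 7; i >= 0; i--)
    v = (v << 8) | p[i];
  return v;
}

/* Split a 16-byte block plus the 2^128 padding bit into radix 2^44 limbs:
   limb[0] = bits 0..43, limb[1] = bits 44..87, limb[2] = bits 88..128. */
static void block_to_radix44(uint64_t limb[3], const uint8_t block[16])
{
  uint64_t lo = load_le64(block);
  uint64_t hi = load_le64(block + 8);

  limb[0] = lo & MASK44;
  limb[1] = ((lo >> 44) | (hi << 20)) & MASK44;
  limb[2] = (hi >> 24) | (UINT64_C(1) << 40); /* padding bit 2^128 = 2^(88+40) */
}

/* Inverse conversion, for checking: rebuild the two 64-bit words and
   return the bit above them (the padding bit). */
static uint64_t radix44_to_words(uint64_t *lo, uint64_t *hi, const uint64_t limb[3])
{
  *lo = limb[0] | (limb[1] << 44);         /* upper bits of limb[1] shift out */
  *hi = (limb[1] >> 20) | (limb[2] << 24); /* padding bit shifts out */
  return limb[2] >> 40;
}
```

Converting a block with `block_to_radix44` and back with `radix44_to_words` should reproduce the original two 64-bit words and return 1 for the padding bit.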
Benchmark of Poly1305 update using nettle-benchmark on z15:
| C | This patch |
|---|---|
| 656.83 Mbyte/s | 5852.57 Mbyte/s |
Edited by Maamoun TK