[x86_64] Implement Poly1305 based on 2^26 using AVX2

Maamoun TK requested to merge mamonet/nettle:poly_avx2 into master

This patch adds an optimized version of Poly1305 based on radix 2^26, using AVX2 instructions and YMM registers. It interleaves four blocks horizontally in each loop iteration.
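To illustrate the idea, here is a scalar C sketch. It is not taken from the patch, which is x86_64 assembly. It shows the radix 2^26 limb layout and one common way to fold four blocks per iteration, by multiplying them with r^4, r^3, r^2 and r and summing the products. All function names are hypothetical, `r` is an arbitrary unclamped test value, and the final addition of the key's second half is omitted. How the patch maps these lanes onto YMM registers is not modeled here.

```c
#include <stdint.h>
#include <string.h>

#define MASK26 0x3ffffffu

/* Read a 32-bit little-endian word. */
static uint32_t le32(const uint8_t *p)
{
  return (uint32_t)p[0] | (uint32_t)p[1] << 8
       | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Split a full 16-byte block into five 26-bit limbs and set the 2^128 pad bit. */
static void block_to_limbs(uint32_t t[5], const uint8_t *m)
{
  t[0] = le32(m) & MASK26;
  t[1] = (le32(m + 3) >> 2) & MASK26;
  t[2] = (le32(m + 6) >> 4) & MASK26;
  t[3] = (le32(m + 9) >> 6) & MASK26;
  t[4] = (le32(m + 12) >> 8) | (1u << 24);
}

/* h = h * r mod 2^130 - 5, partially reduced (limbs may slightly exceed 26 bits). */
static void mul_mod_p(uint32_t h[5], const uint32_t r[5])
{
  uint64_t d[5] = {0}, c = 0;
  for (int i = 0; i < 5; i++)
    for (int j = 0; j < 5; j++) {
      uint64_t prod = (uint64_t)h[i] * r[j];
      /* Limb i+j >= 5 wraps around with a factor 5, since 2^130 = 5 mod p. */
      if (i + j >= 5) d[i + j - 5] += 5 * prod; else d[i + j] += prod;
    }
  for (int i = 0; i < 5; i++) {
    d[i] += c; c = d[i] >> 26; h[i] = (uint32_t)(d[i] & MASK26);
  }
  c = h[0] + 5 * c;
  h[0] = (uint32_t)(c & MASK26);
  h[1] += (uint32_t)(c >> 26);
}

/* Bring h to its unique representative in [0, 2^130 - 5). */
static void canon(uint32_t h[5])
{
  uint32_t g[5], c;
  for (int pass = 0; pass < 2; pass++) {
    c = 0;
    for (int i = 0; i < 5; i++) { h[i] += c; c = h[i] >> 26; h[i] &= MASK26; }
    h[0] += 5 * c;
  }
  c = 5;                       /* g = h + 5; a carry out means h >= p */
  for (int i = 0; i < 5; i++) { g[i] = h[i] + c; c = g[i] >> 26; g[i] &= MASK26; }
  if (c) memcpy(h, g, sizeof g);
}

/* Reference: one block per step, h = (h + m) * r. */
static void serial(uint32_t h[5], const uint8_t *m, size_t blocks, const uint32_t r[5])
{
  uint32_t t[5];
  for (; blocks > 0; blocks--, m += 16) {
    block_to_limbs(t, m);
    for (int i = 0; i < 5; i++) h[i] += t[i];
    mul_mod_p(h, r);
  }
}

/* Four blocks per step: h = (h + m1)*r^4 + m2*r^3 + m3*r^2 + m4*r. */
static void four_way(uint32_t h[5], const uint8_t *m, const uint32_t r[5])
{
  uint32_t pw[4][5], t[5], acc[5] = {0};
  memcpy(pw[0], r, sizeof pw[0]);
  for (int k = 1; k < 4; k++) {          /* pw[k] = r^(k+1) */
    memcpy(pw[k], pw[k - 1], sizeof pw[k]);
    mul_mod_p(pw[k], r);
  }
  for (int k = 0; k < 4; k++) {          /* the four products are independent */
    block_to_limbs(t, m + 16 * k);
    if (k == 0)
      for (int i = 0; i < 5; i++) t[i] += h[i];
    mul_mod_p(t, pw[3 - k]);
    for (int i = 0; i < 5; i++) acc[i] += t[i];
  }
  memcpy(h, acc, sizeof acc);
}

/* Returns 1 if both schedules agree on eight blocks of test data. */
static int four_way_matches_serial(void)
{
  uint8_t m[128];
  uint32_t r[5] = {0x3abcdef, 0x1234567, 0x2fedcba, 0x0badf00, 0x00c0ffe};
  uint32_t a[5] = {0}, b[5] = {0};
  for (int i = 0; i < 128; i++) m[i] = (uint8_t)(37 * i + 11);
  serial(a, m, 8, r);
  four_way(b, m, r);
  four_way(b, m + 64, r);
  canon(a); canon(b);
  return memcmp(a, b, sizeof a) == 0;
}
```

Calling `four_way_matches_serial()` checks that the four-block schedule and the one-block schedule produce the same accumulator. With 26-bit limbs, each limb product fits in a 64-bit lane with headroom for accumulating several terms before carrying, which is what makes this radix a natural fit for 32x32-to-64-bit SIMD multiplies.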

The patch adds a new configure option, --enable-x86-avx2, which enables compilation of the AVX2 files.

The testsuite passes all tests with this patch applied.

Benchmark of poly1305 update on an Intel Core i5-10300H CPU:

| Upstream (standard, radix 2^64) | This patch (AVX2, radix 2^26) |
|---|---|
| 3900.75 Mbyte/s (1.136 cpb) | 6490.70 Mbyte/s (0.691 cpb) |
