[S390x] Optimize SHA3 permute using vector facility
This patch optimizes the SHA3 permute function by taking advantage of the s390x vector facility. On s390x, vectorizing the permute function is a better fit than using the SHA3 hardware accelerator, because it implements only the permutation itself rather than also executing extra procedures that other functions in the Nettle library already handle. A previous patch that applied the SHA3 hardware accelerator yielded a 12% performance boost, whereas this patch increases the performance of the SHA3 functions by about 105%. The optimized core follows the same optimization approach as the SHA3 permute implementation for x86_64.
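For reference, the permutation being vectorized is Keccak-f[1600]: 24 rounds of the theta, rho, pi, chi and iota steps applied to a 5×5 array of 64-bit lanes. The portable C sketch below illustrates only that core step, which is the part this patch replaces. The names `keccak_state`, `keccak_rho`, `keccak_rc` and `keccak_f1600`, and the lane layout `a[x + 5*y]`, are hypothetical choices for illustration. They are not Nettle's own implementation or the s390x assembly.

```c
#include <stdint.h>

/* Hypothetical state type: lane (x, y) is stored at a[x + 5*y]. */
typedef struct {
  uint64_t a[25];
} keccak_state;

/* Rotation offsets for rho, indexed like the state: [x + 5*y]. */
static const unsigned keccak_rho[25] = {
   0,  1, 62, 28, 27,
  36, 44,  6, 55, 20,
   3, 10, 43, 25, 39,
  41, 45, 15, 21,  8,
  18,  2, 61, 56, 14
};

/* Round constants for iota, one per round. */
static const uint64_t keccak_rc[24] = {
  0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808AULL,
  0x8000000080008000ULL, 0x000000000000808BULL, 0x0000000080000001ULL,
  0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008AULL,
  0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000AULL,
  0x000000008000808BULL, 0x800000000000008BULL, 0x8000000000008089ULL,
  0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
  0x000000000000800AULL, 0x800000008000000AULL, 0x8000000080008081ULL,
  0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL
};

static uint64_t rotl64(uint64_t v, unsigned n)
{
  /* The "& 63" keeps the right shift defined when n == 0. */
  return (v << n) | (v >> ((64 - n) & 63));
}

/* Hypothetical name: applies Keccak-f[1600] to the state in place. */
void keccak_f1600(keccak_state *s)
{
  uint64_t c[5], d[5], b[25];

  for (int round = 0; round < 24; round++) {
    /* theta: XOR every lane with the parities of two neighbouring columns */
    for (int x = 0; x < 5; x++)
      c[x] = s->a[x] ^ s->a[x + 5] ^ s->a[x + 10] ^ s->a[x + 15] ^ s->a[x + 20];
    for (int x = 0; x < 5; x++)
      d[x] = c[(x + 4) % 5] ^ rotl64(c[(x + 1) % 5], 1);
    for (int i = 0; i < 25; i++)
      s->a[i] ^= d[i % 5];

    /* rho + pi: rotate lane (x, y) and move it to position (y, 2x + 3y) */
    for (int x = 0; x < 5; x++)
      for (int y = 0; y < 5; y++)
        b[y + 5 * ((2 * x + 3 * y) % 5)] =
          rotl64(s->a[x + 5 * y], keccak_rho[x + 5 * y]);

    /* chi: the only non-linear step, applied along each row */
    for (int y = 0; y < 5; y++)
      for (int x = 0; x < 5; x++)
        s->a[x + 5 * y] = b[x + 5 * y]
          ^ (~b[(x + 1) % 5 + 5 * y] & b[(x + 2) % 5 + 5 * y]);

    /* iota: XOR the round constant into lane (0, 0) */
    s->a[0] ^= keccak_rc[round];
  }
}
```

Every lane update within a step is independent of the others in the same step, which is what makes the permutation a natural target for the vector facility. Absorbing input, padding and squeezing output stay in the generic library code, matching the description above.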
| Algorithm | C (Mbyte/s) | Vectorized (Mbyte/s) |