[x86_64] Use 2-way GHASH pclmul update
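I observed that `pclmulqdq` has a latency of 7 cycles and a reciprocal throughput of 1 cycle on the Comet Lake architecture. With the 1-way GHASH update, each multiplication depends on the result of the previous one, so the multiplier sits idle for most of that latency. The 2-way update issues two independent multiplications per iteration, which nearly doubles throughput on that architecture.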
Tested on an Intel Core i5-10300H:
| 1-way (former) | 2-way |
|---|---|
| 3014.85 MB/s | 6010.53 MB/s |
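The algebra behind the 2-way update can be sketched in portable C. This is not the code from this patch. It replaces `pclmulqdq` with a bit-by-bit GF(2^128) multiply that follows Algorithm 1 of NIST SP 800-38D, and the names `block128`, `gf128_mul`, `ghash_1way` and `ghash_2way` are hypothetical and exist only for illustration. Once H^2 has been precomputed, two blocks fold as Y' = (Y ^ X1)·H^2 ^ X2·H. That is algebraically identical to two sequential 1-way steps, but the two multiplications no longer depend on each other, so hardware can overlap them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 128-bit GHASH block: hi = bytes 0..7, lo = bytes 8..15 (big-endian). */
typedef struct { uint64_t hi, lo; } block128;

static block128 xor128(block128 a, block128 b) {
    return (block128){a.hi ^ b.hi, a.lo ^ b.lo};
}

/* Software GF(2^128) multiply (SP 800-38D, Algorithm 1); stands in for pclmulqdq + reduction. */
static block128 gf128_mul(block128 x, block128 y) {
    block128 z = {0, 0};
    block128 v = y;
    for (int i = 0; i < 128; i++) {
        uint64_t word = (i < 64) ? x.hi : x.lo;
        if ((word >> (63 - (i % 64))) & 1)   /* bit i of x, bit 0 = leftmost */
            z = xor128(z, v);
        int lsb = (int)(v.lo & 1);           /* rightmost bit of v */
        v.lo = (v.lo >> 1) | (v.hi << 63);   /* shift v right by one bit */
        v.hi >>= 1;
        if (lsb)
            v.hi ^= 0xE100000000000000ULL;   /* R = 11100001 || 0^120 */
    }
    return z;
}

/* 1-way: every multiply needs the previous Y, forming one serial chain. */
static block128 ghash_1way(block128 h, const block128 *blocks, size_t n) {
    block128 y = {0, 0};
    for (size_t i = 0; i < n; i++)
        y = gf128_mul(xor128(y, blocks[i]), h);
    return y;
}

/* 2-way: Y' = (Y ^ X1)*H^2 ^ X2*H; the two multiplies are independent. */
static block128 ghash_2way(block128 h, const block128 *blocks, size_t n) {
    block128 h2 = gf128_mul(h, h);           /* precomputed once per key */
    block128 y = {0, 0};
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        block128 a = gf128_mul(xor128(y, blocks[i]), h2);
        block128 b = gf128_mul(blocks[i + 1], h);
        y = xor128(a, b);
    }
    if (i < n)                               /* odd trailing block: 1-way step */
        y = gf128_mul(xor128(y, blocks[i]), h);
    return y;
}
```

Both functions return the same digest for any number of blocks. The software version will not run faster, because its bit loop dominates the cost. The speedup reported above comes only from letting the CPU pipeline two independent `pclmulqdq` chains.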