[AArch64] Optimize GHASH

This patch optimizes GHASH on the AArch64 architecture. The patch checks for little-endian mode to enable the optimized GHASH core. Optimizing GHASH in little-endian mode with the PMULL instruction is somewhat tricky: 64-bit operations on SIMD registers are byte-reversed in little-endian mode, so to get a correct result the input must be byte-reversed in 64-bit units, and in that case the output of the PMULL instruction is byte-reversed across the full 128 bits.
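For reference, here is a minimal portable C sketch of what the optimized core has to compute: the bit-serial GHASH multiply (Algorithm 1 of NIST SP 800-38D). This is not the patch's PMULL code. It only illustrates where the 64-bit byte reversal comes from: GHASH treats each 16-byte block as a big-endian bit string, so on a little-endian host each 64-bit half has to be byte-reversed when loaded. The names `gf128`, `load_block`, `gf128_mul` and `ghash` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: a GHASH field element as two 64-bit halves in the
   big-endian order used by GCM (hi = bytes 0..7, lo = bytes 8..15). */
typedef struct { uint64_t hi, lo; } gf128;

/* Read 8 bytes as a big-endian word. On a little-endian host this is
   equivalent to byte-reversing a native 64-bit load. */
static uint64_t load_be64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

static gf128 load_block(const uint8_t *p) {
    gf128 b = { load_be64(p), load_be64(p + 8) };
    return b;
}

/* Bit-serial multiply in GF(2^128) with GCM's bit-reflected convention:
   bit 0 is the most significant bit of byte 0, and the reduction
   constant R is 0xE1 followed by 120 zero bits. */
static gf128 gf128_mul(gf128 x, gf128 y) {
    gf128 z = { 0, 0 }, v = y;
    for (int i = 0; i < 128; i++) {
        uint64_t word = (i < 64) ? x.hi : x.lo;
        if ((word >> (63 - (i & 63))) & 1) {   /* bit i of x set */
            z.hi ^= v.hi;
            z.lo ^= v.lo;
        }
        uint64_t carry = v.lo & 1;            /* rightmost bit of v */
        v.lo = (v.lo >> 1) | (v.hi << 63);    /* 128-bit right shift */
        v.hi >>= 1;
        if (carry)
            v.hi ^= 0xE100000000000000ULL;    /* reduce by R */
    }
    return z;
}

/* GHASH over nblocks 16-byte blocks: Y = (Y xor X_i) * H. */
static gf128 ghash(const uint8_t h_bytes[16], const uint8_t *data,
                   size_t nblocks) {
    gf128 h = load_block(h_bytes), y = { 0, 0 };
    for (size_t i = 0; i < nblocks; i++) {
        gf128 x = load_block(data + 16 * i);
        y.hi ^= x.hi;
        y.lo ^= x.lo;
        y = gf128_mul(y, h);
    }
    return y;
}
```

Fed the ciphertext block followed by the len(A) || len(C) block from GCM test case 2 (H = 66e94bd4ef8a2c3b884cfa59ca342b2e), `ghash` should return f38cbb1ad69223dcc3457ae5b6b0f885. A PMULL-based core replaces the 128-iteration loop with a few carry-less 64x64-bit multiplies plus a reduction, which is where the speedup comes from.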

GCM benchmark results:

Version            Mbyte/s
C                      208
Optimized GHASH       3255

