[S390x] Optimize memxor
This patch optimizes the memxor function for the s390x architecture. The optimized core uses the xc ("storage-to-storage XOR") instruction to implement a high-performance memxor. Unfortunately, xc processes bytes in left-to-right order, which makes it unsuitable for implementing memxor3. I tried a workaround for that issue, but it was slower than the C implementation, so I dropped it.
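As an illustration only, the C sketch below models how memxor can map onto xc. The xc length field is 8 bits wide and encodes the length minus one, so a single instruction covers 1 to 256 bytes, and a longer buffer becomes a loop of such chunks. The names `xc_model` and `memxor_xc_sketch` are hypothetical, and the byte loop is a portable model of xc's left-to-right semantics, not the assembly in this patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical software model of one xc instruction: XOR `len` bytes
 * (1..256) of src into dst, one byte at a time, left to right. */
static void xc_model(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] ^= src[i];
}

/* Hypothetical memxor sketch: dst[i] ^= src[i] for i in [0, n),
 * issued as a sequence of xc-sized chunks of at most 256 bytes. */
void *memxor_xc_sketch(void *dst_in, const void *src_in, size_t n)
{
    uint8_t *dst = dst_in;
    const uint8_t *src = src_in;

    while (n > 0) {
        /* xc's 8-bit length field encodes 1..256 bytes per instruction. */
        size_t chunk = n < 256 ? n : 256;
        xc_model(dst, src, chunk);
        dst += chunk;
        src += chunk;
        n -= chunk;
    }
    return dst_in; /* memxor returns the destination pointer */
}
```

Because the model walks each chunk from the lowest address upward, it also shows the ordering constraint mentioned above: any operation that needs bytes processed from the high end cannot be expressed directly with this instruction.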
Benchmark of memxor on a z15 running at 5.2 GHz:
| Mode | C implementation (Mbyte/s) | xc-assisted implementation (Mbyte/s) |
|---|---|---|
| aligned | 22552.01 | 32331.91 |
| unaligned | 13152.09 | 32086.29 |