[S390x] Optimize memxor
This patch optimizes the `memxor` function for the s390x architecture. The optimized core uses the `xc` instruction ("Storage-to-storage xor") to implement a high-performance `memxor`. Unfortunately, `xc` processes bytes in left-to-right order, which makes it unsuitable for implementing `memxor3`. I tried a workaround for that issue, but it was slower than the existing C implementation, so I dropped it.
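
To illustrate why byte order matters here, the following is a minimal portable C sketch, not the code from this patch. It assumes that `memxor3` may be called with a destination that overlaps one source at a higher address, which is the case where the processing direction changes the result. The names `xor3_forward`, `xor3_backward`, and `backward_only_is_correct` are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Left-to-right three-operand XOR: the order xc is described as using. */
static void xor3_forward(uint8_t *dst, const uint8_t *a,
                         const uint8_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = a[i] ^ b[i];
}

/* Right-to-left three-operand XOR: safe when dst overlaps a at a
   higher address, because each source byte is read before it is
   overwritten. */
static void xor3_backward(uint8_t *dst, const uint8_t *a,
                          const uint8_t *b, size_t n)
{
  while (n-- > 0)
    dst[n] = a[n] ^ b[n];
}

/* Hypothetical demo: dst = buf + 1 overlaps a = buf.
   Returns 1 if only the backward order gives the expected result. */
static int backward_only_is_correct(void)
{
  const uint8_t key[4] = {0x10, 0x10, 0x10, 0x10};
  const uint8_t expected[5] = {0x01, 0x11, 0x12, 0x13, 0x14};
  uint8_t fwd[5] = {1, 2, 3, 4, 0};
  uint8_t bwd[5] = {1, 2, 3, 4, 0};

  /* Both runs start from identical input. */
  assert(memcmp(fwd, bwd, sizeof fwd) == 0);

  xor3_forward(fwd + 1, fwd, key, 4);   /* reads bytes it already overwrote */
  xor3_backward(bwd + 1, bwd, key, 4);  /* reads each byte before writing it */

  return memcmp(bwd, expected, sizeof expected) == 0
      && memcmp(fwd, expected, sizeof expected) != 0;
}
```

In this sketch the forward loop overwrites source bytes before reading them, so a left-to-right primitive cannot be dropped in directly when the buffers overlap this way.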
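Benchmark of `memxor`, run on a z15 at 5.2 GHz CPU frequency (the xc-assisted version is roughly 1.4x faster on aligned input and 2.4x faster on unaligned input):
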
| mode | C | xc-assisted implementation |
|---|---|---|
| aligned | 22552.01 Mbyte/s | 32331.91 Mbyte/s |
| unaligned | 13152.09 Mbyte/s | 32086.29 Mbyte/s |