nettle merge requests
https://git.lysator.liu.se/nettle/nettle/-/merge_requests
2024-03-20T04:25:25Z

https://git.lysator.liu.se/nettle/nettle/-/merge_requests/25
arm: Use getauxval for feature detection
2024-03-20T04:25:25Z
Richard Henderson

In certain minimal chroots, /proc is not mounted, but the kernel always fills in AT_PLATFORM and AT_HWCAP.
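
As a rough illustration of the idea (not the patch itself, which is C code inside nettle), the auxiliary vector can be read through libc's `getauxval` without touching /proc. The sketch below calls it through Python's `ctypes`; the helper name `read_auxv` is hypothetical, and the numeric values of `AT_PLATFORM` and `AT_HWCAP` are the Linux ones from `<elf.h>`.

```python
import ctypes

# Linux auxiliary-vector keys, as defined in <elf.h>.
AT_PLATFORM = 15
AT_HWCAP = 16


def read_auxv():
    """Return (platform, hwcap) via getauxval, or None if it is unavailable.

    read_auxv is a hypothetical helper used only for this illustration.
    """
    try:
        libc = ctypes.CDLL(None)      # symbols already loaded in this process
        getauxval = libc.getauxval
    except (OSError, AttributeError, TypeError):
        return None                   # not Linux, or libc has no getauxval
    getauxval.restype = ctypes.c_ulong
    getauxval.argtypes = [ctypes.c_ulong]

    hwcap = getauxval(AT_HWCAP)       # bit mask of CPU feature flags
    ptr = getauxval(AT_PLATFORM)      # address of a NUL-terminated string
    platform = ctypes.string_at(ptr).decode() if ptr else None
    return platform, hwcap


if __name__ == "__main__":
    print(read_auxv())
```

The meaning of the individual `AT_HWCAP` bits is architecture specific, so a real detection routine would test them against the masks from the matching kernel header.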
I think using configure to detect <sys/auxv.h> is cleaner than the current practice of glibc version detection.

https://git.lysator.liu.se/nettle/nettle/-/merge_requests/62
Support ML-KEM (Kyber) key encapsulation mechanism
2024-03-05T01:20:11Z
Daiki Ueno

This adds support for the ML-KEM (Kyber) key encapsulation mechanism being
standardized in FIPS 203[1], based also on the explanation in
draft-cfrg-schwabe-kyber[2]. A couple of notes on the implementation:
- While the algorithm itself does not require bignum arithmetic, it is
implemented as part of libhogweed, because polynomials are represented as
an mp_limb_t array on the heap, allocated using GMP allocation functions.
- There is a slight difference between the NIST draft and the round 3
submission on which [2] is based. A KYBER_ROUND3 macro is added to
control the behavior.
1. https://csrc.nist.gov/pubs/fips/203/ipd
2. https://datatracker.ietf.org/doc/draft-cfrg-schwabe-kyber/

https://git.lysator.liu.se/nettle/nettle/-/merge_requests/27
Implement HPKE
2024-02-05T08:22:52Z
Norbert Pócs

Implementing Hybrid Public Key Encryption draft version 9 [0] with all modes.
Tests are included, using the test vectors from the draft.
[0] [https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-hpke-09](https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-hpke-09)

https://git.lysator.liu.se/nettle/nettle/-/merge_requests/20
RSA-OAEP encryption/decryption
2024-01-06T01:15:43Z
Nicolas Mora

This is the implementation of RSAES-OAEP as defined in [RFC 3447](https://tools.ietf.org/html/rfc3447#section-7.1).
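
For orientation, the padding step of RSAES-OAEP (EME-OAEP with MGF1, RFC 3447 section 7.1) can be sketched in a few lines. This is not the code from the merge request: the function names are hypothetical, SHA-256 is an assumed hash choice, the RSA exponentiation itself is left out, and the decoder is not constant time.

```python
import hashlib
import os


def mgf1(seed, length, hash_name="sha256"):
    """MGF1 mask generation function (RFC 3447, appendix B.2.1)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.new(hash_name, seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def oaep_encode(message, k, label=b"", hash_name="sha256", seed=None):
    """EME-OAEP encoding of message for a k-byte RSA modulus."""
    h_len = hashlib.new(hash_name).digest_size
    if len(message) > k - 2 * h_len - 2:
        raise ValueError("message too long")
    l_hash = hashlib.new(hash_name, label).digest()
    padding = b"\x00" * (k - len(message) - 2 * h_len - 2)
    db = l_hash + padding + b"\x01" + message          # data block
    seed = os.urandom(h_len) if seed is None else seed
    masked_db = xor(db, mgf1(seed, k - h_len - 1, hash_name))
    masked_seed = xor(seed, mgf1(masked_db, h_len, hash_name))
    return b"\x00" + masked_seed + masked_db


def oaep_decode(em, label=b"", hash_name="sha256"):
    """Inverse of oaep_encode; raises ValueError on any padding failure."""
    h_len = hashlib.new(hash_name).digest_size
    if len(em) < 2 * h_len + 2:
        raise ValueError("decryption error")
    y, masked_seed, masked_db = em[0], em[1:1 + h_len], em[1 + h_len:]
    seed = xor(masked_seed, mgf1(masked_db, h_len, hash_name))
    db = xor(masked_db, mgf1(seed, len(masked_db), hash_name))
    l_hash = hashlib.new(hash_name, label).digest()
    sep = db.find(b"\x01", h_len)                      # first 0x01 after lHash
    ok = (y == 0 and db[:h_len] == l_hash
          and sep != -1 and not any(db[h_len:sep]))
    if not ok:
        raise ValueError("decryption error")
    return db[sep + 1:]
```

In a full scheme the encoded block would be converted to an integer and raised to the public exponent; decryption reverses that before calling the decoder.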
The added test suite verifies the test vectors provided in [RFC 7516](https://tools.ietf.org/html/rfc7516#appendix-A.1) and in the document "RSAES-OAEP Encryption Scheme Algorithm specification and supporting documentation".

https://git.lysator.liu.se/nettle/nettle/-/merge_requests/58
[x86_64] Use 4-way poly1305 update
2023-03-25T23:50:55Z
Maamoun TK

Using a 4-way Poly1305 block update that takes advantage of AVX2 instructions, based on radix 2^26, yields a significant performance improvement on the Comet Lake architecture.
The AVX2 code has a threshold of 32 blocks, the point at which it starts to make a performance difference. Fat build support is to be added later, once the MR is approved.
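
The algebra behind a 4-way update can be sketched with Python big integers: four Horner steps are folded into one expression using precomputed powers of `r`, which is what lets the four multiplications run in separate vector lanes. The function names below are hypothetical, and the sketch ignores the AVX2 limb layout and the 32-block threshold.

```python
P = (1 << 130) - 5                      # Poly1305 prime
R_CLAMP = 0x0ffffffc0ffffffc0ffffffc0fffffff


def _blocks(msg):
    """Split msg into 16-byte chunks, each extended with a high 0x01 byte."""
    return [int.from_bytes(msg[i:i + 16] + b"\x01", "little")
            for i in range(0, len(msg), 16)]


def _finish(h, s):
    return ((h + s) & ((1 << 128) - 1)).to_bytes(16, "little")


def poly1305_1way(key, msg):
    """Reference evaluation: one block per Horner step."""
    r = int.from_bytes(key[:16], "little") & R_CLAMP
    s = int.from_bytes(key[16:32], "little")
    h = 0
    for c in _blocks(msg):
        h = (h + c) * r % P
    return _finish(h, s)


def poly1305_4way(key, msg):
    """Same tag, consuming four blocks per iteration."""
    r = int.from_bytes(key[:16], "little") & R_CLAMP
    s = int.from_bytes(key[16:32], "little")
    r2, r3, r4 = pow(r, 2, P), pow(r, 3, P), pow(r, 4, P)
    blocks = _blocks(msg)
    full = len(blocks) - len(blocks) % 4
    h = 0
    for i in range(0, full, 4):
        c0, c1, c2, c3 = blocks[i:i + 4]
        # The four products are independent of each other.
        h = ((h + c0) * r4 + c1 * r3 + c2 * r2 + c3 * r) % P
    for c in blocks[full:]:             # leftover blocks, one at a time
        h = (h + c) * r % P
    return _finish(h, s)
```

Both functions return the same tag for any message; only the order of the arithmetic differs.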
Tested on Intel Core i5-10300H
| 1-way (radix 2^64) | 4-way (AVX2, radix 2^26) |
| ------ | ------ |
| 4820.42 Mbyte/s | 6256.05 Mbyte/s |

https://git.lysator.liu.se/nettle/nettle/-/merge_requests/38
[Arm64] Optimize Poly1305 with fat build support
2022-05-09T07:06:40Z
Maamoun TK

This patch optimizes Poly1305 for the arm64 architecture by using 2-way interleaving.
The testsuite passes all tests with this patch.
Benchmark of poly1305 update using nettle-benchmark on gfarm 117
| C | This patch |
| ------ | ------ |
| 650.67 Mbyte/s | 923.22 Mbyte/s |
NOTE: This patch was implemented with both endianness modes in mind, but it has been tested only on the little-endian variant because of a lack of access to big-endian hardware.

https://git.lysator.liu.se/nettle/nettle/-/merge_requests/46
[x86_64] Implement Poly1305 based on 2^26 using AVX2
2022-05-09T07:04:34Z
Maamoun TK

This patch adds an optimized version of Poly1305 based on radix 2^26 using AVX2 instructions and YMM registers; it interleaves four blocks horizontally in each loop iteration.
The patch adds a new configure option, `--enable-x86-avx2`, to compile the AVX2 files.
The testsuite passes all tests with this patch.
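
To make the radix 2^26 representation concrete, here is a minimal sketch of a limb-wise multiplication modulo 2^130 - 5. It is plain Python rather than AVX2 code, and `to_limbs`, `from_limbs` and `mul_mod_p` are hypothetical names. The point of the representation is that with limbs below 2^26, every column sum of partial products fits in a 64-bit accumulator, and products that overflow past 2^130 wrap around with a factor of 5.

```python
P = (1 << 130) - 5
MASK26 = (1 << 26) - 1


def to_limbs(x):
    """Split a value below 2^130 into five 26-bit limbs, least significant first."""
    return [(x >> (26 * i)) & MASK26 for i in range(5)]


def from_limbs(limbs):
    return sum(v << (26 * i) for i, v in enumerate(limbs))


def mul_mod_p(h, r):
    """Multiply two 5-limb values; the result is congruent to h * r modulo P.

    The result is only partially reduced: its value may still exceed P.
    """
    d = [0] * 5
    for i in range(5):
        for j in range(5):
            k = i + j
            if k < 5:
                d[k] += h[i] * r[j]
            else:
                # Weight 2^(26k) = 2^130 * 2^(26(k-5)), and 2^130 = 5 (mod P).
                d[k - 5] += 5 * h[i] * r[j]
    carry = 0
    for k in range(5):                  # propagate carries upwards
        d[k] += carry
        carry = d[k] >> 26
        d[k] &= MASK26
    d[0] += 5 * carry                   # carry out of limb 4 has weight 2^130
    d[1] += d[0] >> 26
    d[0] &= MASK26
    return d
```

For example, `from_limbs(mul_mod_p(to_limbs(a), to_limbs(b))) % P` equals `a * b % P` for any `a` and `b` below 2^130.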
Benchmark of poly1305 update on an Intel Core i5-10300H CPU
| Upstream (standard, radix 2^64) | This patch (AVX2, radix 2^26) |
| ------ | ------ |
| 3900.75 Mbyte/s (1.136 cpb) | 6490.70 Mbyte/s (0.691 cpb) |

https://git.lysator.liu.se/nettle/nettle/-/merge_requests/41
[S390x] Optimize Poly1305 based on radix 2^44 with fat build support
2022-04-30T12:26:54Z
Maamoun TK

This patch optimizes Poly1305 for the s390x architecture by using the z14-specific instruction `vmslg` for full 64-bit multiplication, applied to 4 blocks in parallel, based on radix 2^44.
The testsuite passes all tests with this patch.
Benchmark of poly1305 update using nettle-benchmark on z15
| C | This patch |
| ------ | ------ |
| 656.83 Mbyte/s | 5852.57 Mbyte/s |

https://git.lysator.liu.se/nettle/nettle/-/merge_requests/43
[S390x] Optimize scalar multiply of Curve25519 and Curve448 defined in RFC-7748
2022-01-19T16:31:31Z
Maamoun TK

This patch implements scalar multiplication for Curve25519 and Curve448 as defined in RFC 7748 (also supporting the group functions) on the S390x architecture, using the hardware-accelerated instruction `pcc`.
The testsuite passes all tests with this patch.
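
For readers unfamiliar with the operation being accelerated, the Curve25519 scalar multiplication of RFC 7748 can be sketched as a Montgomery ladder. This is plain Python following the RFC's pseudocode, not the `pcc`-based code in the patch; it is not constant time, and Curve448 is not covered.

```python
P = 2**255 - 19          # field prime of Curve25519
A24 = 121665             # (486662 - 2) / 4, the constant used in RFC 7748


def x25519(scalar, u_point):
    """Multiply the point with u-coordinate u_point by scalar (32 bytes each)."""
    k = bytearray(scalar)
    k[0] &= 248          # clamp the scalar as required by RFC 7748
    k[31] &= 127
    k[31] |= 64
    k = int.from_bytes(k, "little")
    x1 = int.from_bytes(u_point, "little") & ((1 << 255) - 1)

    x2, z2, x3, z3 = 1, 0, x1, 1
    swap = 0
    for t in reversed(range(255)):
        bit = (k >> t) & 1
        swap ^= bit
        if swap:                          # conditional swap of the two points
            x2, x3 = x3, x2
            z2, z3 = z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        aa, bb = a * a % P, b * b % P
        da, cb = d * a % P, c * b % P
        e = (aa - bb) % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3 = x3, x2
        z2, z3 = z3, z2
    # Convert from projective (x2 : z2) to the affine u-coordinate.
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


BASE_POINT = (9).to_bytes(32, "little")   # generator u-coordinate
```

Calling `x25519(k, BASE_POINT)` corresponds to multiplying the generator, presumably the case the `_mul_g` variants in the benchmark handle; calling it with an arbitrary point corresponds to the general `_mul` case.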
Benchmark on z15
| Function | C | This patch |
| ------ | ------ | ------ |
| curve25519_mul | 366 (us) | 17 (us) |
| curve25519_mul_g | 129 (us) | 17 (us) |
| curve448_mul | 1748 (us) | 35 (us) |
| curve448_mul_g | 624 (us) | 35 (us) |

https://git.lysator.liu.se/nettle/nettle/-/merge_requests/42
[S390x] Optimize prime modulo functions of elliptic curves
2022-01-19T16:22:25Z
Maamoun TK

This patch implements the prime modulo functions of elliptic curves on the S390x architecture.
The testsuite passes all tests with this patch.
Benchmark using ecc-benchmark on z15
| Function | C | This patch |
| ------ | ------ | ------ |
| secp192r1_modp | 0.0324 (us) | 0.0112 (us) |
| secp224r1_modp | 0.0873 (us) | 0.0154 (us) |
| curve25519_modp | 0.0285 (us) | 0.0132 (us) |
| secp256r1_redc | 0.0832 (us) | 0.0157 (us) |
| secp384r1_modp | 0.1131 (us) | 0.0276 (us) |
| curve448_modp | 0.0910 (us) | 0.0224 (us) |
| secp521r1_modp | 0.0782 (us) | 0.0207 (us) |

https://git.lysator.liu.se/nettle/nettle/-/merge_requests/19
Implement aes key wrap and key unwrap (RFC 3394)
2021-07-01T13:50:38Z
Nicolas Mora

https://git.lysator.liu.se/nettle/nettle/-/merge_requests/24
Add AES API for CBC mode
2021-05-10T08:51:09Z
Maamoun TK

Add AES API for CBC mode in order to add the optimized CBC cores later.

https://git.lysator.liu.se/nettle/nettle/-/merge_requests/17
[S390x] Optimize AES modes
2021-03-21T19:27:34Z
Maamoun TK

This patch takes advantage of built-in AES functions to optimize AES modes.
Added configurable options:
--enable-s390x-msa (Enable message-security assist on z/Architecture)
--enable-s390x-msa-x4 (Enable message-security-assist extension 4 on z/Architecture)
--enable-s390x-msa-x8 (Enable message-security-assist extension 8 on z/Architecture)
The patch contains fat build support that checks the CPU features at runtime and runs the optimized cores when the corresponding features are enabled.
**Benchmark**:
This benchmark is run on z15 with 5.2 GHz CPU frequency.
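
Cycles-per-byte figures can be turned into throughput using the stated clock frequency. The helper below is a small sketch with hypothetical names; treating one Mbyte as 10^6 bytes is an assumption, since the unit is not defined here.

```python
Z15_FREQ_HZ = 5.2e9      # clock frequency stated for this benchmark


def cpb_to_mbyte_per_s(cpb, freq_hz=Z15_FREQ_HZ, mbyte=10**6):
    """Throughput implied by a cycles-per-byte figure at a given clock."""
    return freq_hz / cpb / mbyte


def speedup(baseline_cpb, optimized_cpb):
    """How many times faster the optimized code is than the baseline."""
    return baseline_cpb / optimized_cpb


if __name__ == "__main__":
    # AES128 Encrypt: 21.7 cpb in C against 0.9 cpb hardware accelerated.
    print(round(cpb_to_mbyte_per_s(0.9)), "Mbyte/s")
    print(round(speedup(21.7, 0.9), 1), "times faster")
```

At 5.2 GHz, a figure of 0.8 cpb works out to roughly 6500 Mbyte/s under these assumptions.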
Benchmark of AES functions, measured in cycles per byte, with message-security-assist extension 8 enabled (KMA-GCM-AES is used to optimize AES-GCM mode).
| Function | C (CPB) | [MSA-X8] Hardware accelerated (CPB) |
| ------ | ------ | ------ |
| AES128 Encrypt | 21.7 | 0.9 |
| AES128 Decrypt | 23.7 | 0.8 |
| AES192 Encrypt | 25.7 | 0.7 |
| AES192 Decrypt | 26.6 | 0.7 |
| AES256 Encrypt | 28.7 | 0.7 |
| AES256 Decrypt | 30.3 | 0.7 |
| CBC-AES128 Encrypt | 27.2 | 1.2 |
| CBC-AES128 Decrypt | 26.6 | 0.8 |
| CBC-AES192 Encrypt | 31.5 | 1.4 |
| CBC-AES192 Decrypt | 29.6 | 0.8 |
| CBC-AES256 Encrypt | 34.7 | 1.6 |
| CBC-AES256 Decrypt | 33.3 | 0.8 |
| CFB-AES128 Encrypt | 28.6 | 1.3 |
| CFB-AES128 Decrypt | 23.5 | 1.3 |
| CFB-AES192 Encrypt | 32.7 | 1.6 |
| CFB-AES192 Decrypt | 28.4 | 1.5 |
| CFB-AES256 Encrypt | 35.8 | 1.7 |
| CFB-AES256 Decrypt | 31.2 | 1.7 |
| CFB8-AES128 Encrypt | 341.6 | 17.3 |
| CFB8-AES128 Decrypt | 328.3 | 17.4 |
| CFB8-AES192 Encrypt | 398.2 | 20.4 |
| CFB8-AES192 Decrypt | 385.0 | 20.4 |
| CFB8-AES256 Encrypt | 453.3 | 23.4 |
| CFB8-AES256 Decrypt | 440.7 | 23.4 |
| CMAC-AES128 Update | 21.9 | 1.0 |
| CMAC-AES256 Update | 28.8 | 1.3 |
| CCM-AES128 Encrypt | 44.3 | 1.8 |
| CCM-AES128 Decrypt | 44.0 | 3.0 |
| CCM-AES128 Update | 21.6 | 1.0 |
| CCM-AES192 Encrypt | 52.0 | 2.0 |
| CCM-AES192 Decrypt | 52.0 | 3.2 |
| CCM-AES192 Update | 25.3 | 1.2 |
| CCM-AES256 Encrypt | 58.6 | 2.2 |
| CCM-AES256 Decrypt | 58.6 | 3.3 |
| CCM-AES256 Update | 28.4 | 1.4 |
| CTR-AES128 Crypt | 22.6 | 0.8 |
| CTR-AES192 Crypt | 26.7 | 0.8 |
| CTR-AES256 Crypt | 29.9 | 0.8 |
| XTS-AES128 Encrypt | 26.5 | 0.8 |
| XTS-AES128 Decrypt | 27.2 | 0.8 |
| XTS-AES256 Encrypt | 33.4 | 0.8 |
| XTS-AES256 Decrypt | 35.9 | 0.8 |
| GCM-AES128 Encrypt | 33.8 | 0.8 |
| GCM-AES128 Decrypt | 34.0 | 0.8 |
| GCM-AES128 Update | 11.6 | 0.5 |
| GCM-AES192 Encrypt | 38.4 | 0.8 |
| GCM-AES192 Decrypt | 39.1 | 0.8 |
| GCM-AES192 Update | 11.6 | 0.5 |
| GCM-AES256 Encrypt | 41.7 | 0.8 |
| GCM-AES256 Decrypt | 41.7 | 0.8 |
| GCM-AES256 Update | 11.5 | 0.5 |
Benchmark of AES-GCM mode functions, measured in cycles per byte, with message-security-assist extension 4 enabled (KM-AES and KIMD-GHASH are used to optimize AES-GCM mode).
| Function | C (CPB) | [MSA-X4] Hardware accelerated (CPB) |
| ------ | ------ | ------ |
| GCM-AES128 Encrypt | 33.8 | 6.8 |
| GCM-AES128 Decrypt | 34.0 | 5.0 |
| GCM-AES128 Update | 11.6 | 0.4 |
| GCM-AES192 Encrypt | 38.4 | 6.8 |
| GCM-AES192 Decrypt | 39.1 | 5.0 |
| GCM-AES192 Update | 11.6 | 0.4 |
| GCM-AES256 Encrypt | 41.7 | 6.6 |
| GCM-AES256 Decrypt | 41.7 | 4.6 |
| GCM-AES256 Update | 11.5 | 0.4 |

https://git.lysator.liu.se/nettle/nettle/-/merge_requests/8
Big-endian ARM CI employing a fat build
2020-07-21T09:44:23Z
Michael Weiser

This change adds a big-endian ARM CI build to the infrastructure. For now it uses external images (built by me) from Docker Hub, not GnuTLS's build-images and the Gitlab registry. That can (and IMO should) be changed, of course.

https://git.lysator.liu.se/nettle/nettle/-/merge_requests/6
WIP: Support GOST R 34.11-2012 (Streebog) hash function
2020-06-14T21:21:57Z
Dmitry Baryshkov

https://git.lysator.liu.se/nettle/nettle/-/merge_requests/7
GOST 28147-89 support
2020-05-22T14:57:23Z
Dmitry Baryshkov