Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | chapoly: Process two Poly1305 blocks in parallel in SSSE3 driver | Martin Willi | 2015-07-12 | 1 | -85/+291 |
| | | | | | | | | | | By using a derived key r^2 we can improve performance, as we can do loop unrolling and slightly better utilize SIMD instructions. Overall ChaCha20-Poly1305 performance increases by ~12%. Converting integers to/from our 5-word representation in SSE does not seem to pay off, so we work on individual words. | ||||
* | chapoly: Process four ChaCha20 blocks in parallel in SSSE3 driver | Martin Willi | 2015-07-12 | 1 | -16/+207 |
| | | | | | As we don't have to shuffle the state in each ChaCha round, overall performance for ChaCha20-Poly1305 increases by ~40%. | ||||
* | chapoly: Add an SSSE3 based driver | Martin Willi | 2015-06-29 | 4 | -1/+514 |
| | | | | | | | | | | | | | We always build the driver on x86/x64, but enable it only if SSSE3 support is detected at runtime. Poly1305 uses multiplications with 32-bit operands yielding a 64-bit result, two of which can be done in parallel in SSE. This is marginally faster than multiplication with 64-bit operands, and also works on 32-bit builds that lack an __int128 result type. On a 32-bit architecture, this is more than twice as fast as the portable driver, and on 64-bit it is ~30% faster. | ||||
* | chapoly: Add a ChaCha20/Poly1305 driver implemented in portable C | Martin Willi | 2015-06-29 | 4 | -0/+488 |
| | |||||
* | chapoly: Provide a generic ChaCha20/Poly1305 AEAD supporting driver backends | Martin Willi | 2015-06-29 | 7 | -0/+672 |
| | |||||
* | test-vectors: Add some initial ChaCha20/Poly1305 AEAD test vector | Martin Willi | 2015-06-29 | 3 | -0/+112 |
| | |||||
* | openssl: Don't refer to EVP_des_ecb() if OpenSSL is built without DES support | Tobias Brunner | 2015-04-17 | 1 | -0/+2 |
| | | | | | | While DES-ECB is not registered by the plugin in this case (so the function will never actually be called), the compiler still warns about the implicitly declared function. | ||||
* | test-vectors: Define test vector symbols as extern | Martin Willi | 2015-04-16 | 1 | -7/+7 |
| | | | | | | We don't actually define a vector, but only prototype the test vector implemented in a different file. GCC uses the correct symbol during testing, but clang correctly complains about duplicated symbols during linking. | ||||
* | aesni: Fix doxygen groups | Martin Willi | 2015-04-15 | 1 | -2/+2 |
| | |||||
* | gcrypt: Explicitly initialize RNG backend to allocate static data | Martin Willi | 2015-04-15 | 1 | -0/+3 |
| | | | | | | The libgcrypt RNG implementation uses static buffer allocation which it does not free. There is no symbol we can catch in leak-detective, hence we explicitly initialize the RNG during the whitelisted gcrypt_plugin_create() function. | ||||
* | gcrypt: Support setting private value and testing of DH backend | Martin Willi | 2015-04-15 | 1 | -0/+19 |
| | |||||
* | openssl: Support setting ECDH private values | Martin Willi | 2015-04-15 | 1 | -0/+44 |
| | |||||
* | openssl: Support setting private Diffie-Hellman values | Martin Willi | 2015-04-15 | 1 | -0/+13 |
| | |||||
* | gmp: Support setting Diffie-Hellman private values | Martin Willi | 2015-04-15 | 1 | -0/+10 |
| | |||||
* | test-vectors: Add DH vectors for Brainpool groups | Martin Willi | 2015-04-15 | 3 | -0/+118 |
| | |||||
* | test-vectors: Add DH vectors for ECDH groups | Martin Willi | 2015-04-15 | 3 | -0/+140 |
| | |||||
* | test-vectors: Add DH vectors for subgroup MODP groups | Martin Willi | 2015-04-15 | 3 | -0/+168 |
| | |||||
* | test-vectors: Add DH vectors for normal MODP groups | Martin Willi | 2015-04-15 | 3 | -0/+741 |
| | |||||
* | test-vectors: Support testing DH groups | Martin Willi | 2015-04-15 | 1 | -1/+16 |
| | |||||
* | aesni: Avoid loading AES/GHASH round keys into local variables | Martin Willi | 2015-04-15 | 6 | -1568/+1244 |
| | | | | | | | | | The performance impact is not measurable, as the compiler loads these variables into xmm registers in unrolled loops anyway. However, we avoid loading these sensitive keys onto the stack. This happens for larger key schedules, where the register count is insufficient. If that key material is not on the stack, we can avoid wiping it explicitly after crypto operations. | ||||
* | aesni: Align all class instances to 16 byte boundaries | Martin Willi | 2015-04-15 | 7 | -14/+14 |
| | | | | | | While the relevant members are aligned within the struct as required, on 32-bit platforms the allocator aligns the structures themselves to 8 bytes only. This results in non-aligned struct members, and invalid memory accesses. | ||||
* | aesni: Calculate GHASH for 4 blocks of associated data in parallel | Martin Willi | 2015-04-15 | 1 | -2/+18 |
| | | | | | While associated data is usually not that large, in some specific cases this can bring a significant performance boost. | ||||
* | aesni: Calculate GHASH for 4 blocks of encryption data in parallel | Martin Willi | 2015-04-15 | 1 | -40/+180 |
| | | | | Increases performance by another ~30%. | ||||
* | aesni: Use 4-way parallel en/decryption in GCM | Martin Willi | 2015-04-15 | 1 | -132/+635 |
| | | | | Increases overall performance by ~25%. | ||||
* | aesni: Use dedicated key size specific en-/decryption functions in GCM | Martin Willi | 2015-04-15 | 1 | -24/+353 |
| | | | | | This yields little more than a ~5% increase in performance, but allows us to improve further. | ||||
* | aesni: Add a GCM AEAD based on the AES-NI key schedule | Martin Willi | 2015-04-15 | 4 | -1/+627 |
| | |||||
* | aesni: Implement CMAC mode to provide a signer/prf | Martin Willi | 2015-04-15 | 4 | -0/+441 |
| | | | | | Compared to the cmac plugin using AESNI-CBC as backend, this improves performance of AES-CMAC by ~45%. | ||||
* | aesni: Implement XCBC mode to provide a signer/prf | Martin Willi | 2015-04-15 | 4 | -0/+436 |
| | | | | | Compared to the xcbc plugin using AESNI-CBC as backend, this improves performance of AES-XCBC by ~45%. | ||||
* | aesni: Partially use separate code paths for different key sizes in CCM | Martin Willi | 2015-04-15 | 1 | -33/+438 |
| | | | | Due to the serial nature of the CBC mac, this brings only a marginal speedup. | ||||
* | aesni: Add a CCM AEAD reusing the key schedule | Martin Willi | 2015-04-15 | 4 | -0/+645 |
| | |||||
* | aesni: Use 4-way parallel AES-NI instructions for CTR en/decryption | Martin Willi | 2015-04-15 | 1 | -115/+354 |
| | | | | | | | CTR can be parallelized, and we do so by queueing instructions to the processor pipeline. While we have enough registers for 128-bit decryption, the register count is insufficient to hold all variables with larger key sizes. Nonetheless, 4-way parallelism is faster, by between ~10% and ~25% depending on key size. | ||||
* | aesni: Use dedicated round count specific encryption functions in CTR mode | Martin Willi | 2015-04-15 | 1 | -23/+243 |
| | | | | | This allows us to unroll loops and hold the key schedule in local (register) variables. This brings an impressive speedup of ~45%. | ||||
* | aesni: Implement a AES-NI based CTR crypter using the key schedule | Martin Willi | 2015-04-15 | 4 | -0/+278 |
| | |||||
* | aesni: Use 4-way parallel AES-NI instructions for CBC decryption | Martin Willi | 2015-04-15 | 1 | -66/+314 |
| | | | | | | | CBC decryption can be parallelized, and we do so by queueing instructions to the processor pipeline. While we have enough registers for 128-bit decryption, the register count is insufficient to hold all variables with larger key sizes. Nonetheless, 4-way parallelism is faster, by ~8%. | ||||
* | aesni: Use separate en-/decryption CBC code paths for different key sizes | Martin Willi | 2015-04-15 | 1 | -22/+290 |
| | | | | | | This allows us to unroll loops, and use local (register) variables for the key schedule. This improves performance slightly for encryption, but a lot for reorderable decryption (>30%). | ||||
* | aesni: Implement a AES-NI based CBC crypter using the key schedule | Martin Willi | 2015-04-15 | 4 | -0/+293 |
| | |||||
* | aesni: Implement 256-bit key schedule | Martin Willi | 2015-04-15 | 1 | -0/+77 |
| | |||||
* | aesni: Implement 192-bit key schedule | Martin Willi | 2015-04-15 | 1 | -0/+81 |
| | |||||
* | aesni: Implement 128-bit key schedule | Martin Willi | 2015-04-15 | 1 | -0/+45 |
| | |||||
* | aesni: Add a common key schedule class for AES | Martin Willi | 2015-04-15 | 3 | -0/+165 |
| | |||||
* | aesni: Provide a plugin stub for AES-NI instruction based crypto primitives | Martin Willi | 2015-04-15 | 3 | -0/+141 |
| | |||||
* | test-vectors: Add some self-made additional AES-GCM test vectors | Martin Willi | 2015-04-15 | 2 | -0/+157 |
| | | | | | We were missing test vectors for 192/256-bit keys with ICV8/12, and should also have some for larger associated data chunks. | ||||
* | test-vectors: Define some additional CCM test vectors | Martin Willi | 2015-04-15 | 2 | -1/+84 |
| | | | | | | We don't have any vectors where plain or associated data is not a multiple of the block size, but bugs are likely to be found here. Also, we are missing some ICV12 test vectors using 128- and 192-bit key sizes. | ||||
* | crypto-tester: Use the plugin feature key size to benchmark crypters/aeads | Martin Willi | 2015-04-15 | 1 | -0/+2 |
| | | | | | | We previously didn't pass the key size during algorithm registration, which resulted in benchmarking with the "default" key size the crypter uses when passing 0 as key size. | ||||
* | utils: Use chunk_equals_const() for all cryptographic purposes | Martin Willi | 2015-04-14 | 4 | -4/+4 |
| | |||||
* | utils: Use memeq_const() for all cryptographic purposes | Martin Willi | 2015-04-14 | 4 | -6/+5 |
| | |||||
* | rdrand: Reuse CPU feature detection to check for RDRAND instructions | Martin Willi | 2015-04-13 | 1 | -51/+4 |
| | |||||
* | padlock: Reuse common CPU feature detection to check for Padlock features | Martin Willi | 2015-04-13 | 1 | -80/+17 |
| | |||||
* | sqlite: Use our locking mechanism also when sqlite3_threadsafe() returns 0 | Martin Willi | 2015-04-13 | 1 | -7/+20 |
| | | | | | | We previously checked only for older library versions without any locking support. But newer libraries can be built in single-threading mode as well, in which case we have to take care of the locking ourselves. | ||||
* | sqlite: Show SQLite library version and thread safety flag during startup | Martin Willi | 2015-04-13 | 1 | -1/+8 |
| |
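
The Poly1305 commit at the top of this log mentions processing two blocks in parallel by using a derived key r^2. The following is a minimal sketch of the arithmetic identity behind that idea, not the SSSE3 driver's code. The function names are hypothetical, blocks are treated as plain integers, and key clamping, the per-block padding bit and the final addition of the second key half are all omitted. Two serial steps `((h + m0) * r + m1) * r` expand to `(h + m0) * r^2 + m1 * r`, so the two products no longer depend on each other.

```python
# Illustrative only: Python integers stand in for the 5-word representation
# and SIMD lanes mentioned in the commit message.
P = (1 << 130) - 5  # the Poly1305 prime, 2^130 - 5


def poly_serial(blocks, r):
    """Reference evaluation: one block per step, h = (h + m) * r mod P."""
    h = 0
    for m in blocks:
        h = (h + m) * r % P
    return h


def poly_two_way(blocks, r):
    """Evaluate the same polynomial two blocks per step using r^2."""
    r2 = r * r % P  # derived key, computed once
    h = 0
    i = 0
    while i + 1 < len(blocks):
        # (h + m0) * r^2 and m1 * r are independent products, which is
        # what allows loop unrolling and parallel multiplication.
        h = ((h + blocks[i]) * r2 + blocks[i + 1] * r) % P
        i += 2
    if i < len(blocks):
        # An odd trailing block falls back to the serial step.
        h = (h + blocks[i]) * r % P
    return h
```

Both functions return the same accumulator for any block list, so the two-way variant can be checked against the serial one with arbitrary inputs.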