tteras/strongswan - tteras' strongSwan tree

	Commit message	Author	Age	Files	Lines
*	chapoly: Process two Poly1305 blocks in parallel in SSSE3 driver	Martin Willi	2015-07-12	1	-85/+291
\|	By using a derived key r^2, we can unroll the loop and make slightly better use of SIMD instructions. Overall ChaCha20-Poly1305 performance increases by ~12%. Converting integers to/from our 5-word representation in SSE does not seem to pay off, so we work on individual words.
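To see why a derived key r^2 allows two blocks per loop iteration, consider the usual Poly1305 accumulator update. The notation below is ours rather than the commit's: h_i is the accumulator, m_i the padded message blocks, and p = 2^130 - 5.

```latex
\begin{align*}
h_{i+1} &= (h_i + m_{i+1}) \cdot r \bmod p \\
h_{i+2} &= \bigl((h_i + m_{i+1}) \cdot r + m_{i+2}\bigr) \cdot r \bmod p \\
        &= \bigl((h_i + m_{i+1}) \cdot r^2 + m_{i+2} \cdot r\bigr) \bmod p
\end{align*}
```

The two products in the last line do not depend on each other, so they can be computed side by side and reduced once.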
*	chapoly: Process four ChaCha20 blocks in parallel in SSSE3 driver	Martin Willi	2015-07-12	1	-16/+207
\|	As we don't have to shuffle the state in each ChaCha round, overall performance for ChaCha20-Poly1305 increases by ~40%.
*	chapoly: Add an SSSE3 based driver	Martin Willi	2015-06-29	4	-1/+514
\|	We always build the driver on x86/x64, but enable it only if SSSE3 support is detected at runtime. Poly1305 uses multiplications with 32-bit operands yielding a 64-bit result, two of which can be done in parallel in SSE. This is minimally faster than multiplication with 64-bit operands, and also works on 32-bit builds that lack an __int128 result type. On a 32-bit architecture, this is more than twice as fast as the portable driver, and on 64-bit it is ~30% faster.
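The paired 32-bit multiplication mentioned above can be sketched with the SSE2 intrinsic _mm_mul_epu32 (the PMULUDQ instruction). This is a minimal illustration for x86/x64 only; the helper name mul32x2 is hypothetical and not taken from the strongSwan source.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics; x86/x64 only */
#include <stdint.h>

/* Hypothetical helper, not from the strongSwan source: computes
 * out[0] = a0 * b0 and out[1] = a1 * b1 as full 64-bit products. */
void mul32x2(uint32_t a0, uint32_t a1, uint32_t b0, uint32_t b1,
             uint64_t out[2])
{
    assert(out != NULL);
    /* Place each 32-bit operand in the low half of a 64-bit lane. */
    __m128i a = _mm_set_epi32(0, (int)a1, 0, (int)a0);
    __m128i b = _mm_set_epi32(0, (int)b1, 0, (int)b0);
    /* PMULUDQ multiplies the low 32 bits of both 64-bit lanes at once. */
    __m128i prod = _mm_mul_epu32(a, b);
    _mm_storeu_si128((__m128i *)out, prod);
}
```

Because both products come out of a single instruction, no 128-bit integer type is needed, which is why the approach also works on 32-bit builds.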
*	chapoly: Add a ChaCha20/Poly1305 driver implemented in portable C	Martin Willi	2015-06-29	4	-0/+488
\|
*	chapoly: Provide a generic ChaCha20/Poly1305 AEAD supporting driver backends	Martin Willi	2015-06-29	7	-0/+672
\|
*	test-vectors: Add some initial ChaCha20/Poly1305 AEAD test vector	Martin Willi	2015-06-29	3	-0/+112
\|
*	openssl: Don't refer to EVP_des_ecb() if OpenSSL is built without DES support	Tobias Brunner	2015-04-17	1	-0/+2
\|	While DES-ECB is not registered by the plugin in this case (so the function will never actually be called), the compiler still warns about the implicitly declared function.
*	test-vectors: Define test vector symbols as extern	Martin Willi	2015-04-16	1	-7/+7
\|	We don't actually define a vector, but only declare the test vector implemented in a different file. GCC uses the correct symbol during testing, but clang correctly complains about duplicated symbols during linking.
*	aesni: Fix doxygen groups	Martin Willi	2015-04-15	1	-2/+2
\|
*	gcrypt: Explicitly initialize RNG backend to allocate static data	Martin Willi	2015-04-15	1	-0/+3
\|	The libgcrypt RNG implementation allocates a static buffer that it never frees. There is no symbol we can catch in leak-detective, hence we explicitly initialize the RNG within the whitelisted gcrypt_plugin_create() function.
*	gcrypt: Support setting private value and testing of DH backend	Martin Willi	2015-04-15	1	-0/+19
\|
*	openssl: Support setting ECDH private values	Martin Willi	2015-04-15	1	-0/+44
\|
*	openssl: Support setting private Diffie-Hellman values	Martin Willi	2015-04-15	1	-0/+13
\|
*	gmp: Support setting Diffie-Hellman private values	Martin Willi	2015-04-15	1	-0/+10
\|
*	test-vectors: Add DH vectors for Brainpool groups	Martin Willi	2015-04-15	3	-0/+118
\|
*	test-vectors: Add DH vectors for ECDH groups	Martin Willi	2015-04-15	3	-0/+140
\|
*	test-vectors: Add DH vectors for subgroup MODP groups	Martin Willi	2015-04-15	3	-0/+168
\|
*	test-vectors: Add DH vectors for normal MODP groups	Martin Willi	2015-04-15	3	-0/+741
\|
*	test-vectors: Support testing DH groups	Martin Willi	2015-04-15	1	-1/+16
\|
*	aesni: Avoid loading AES/GHASH round keys into local variables	Martin Willi	2015-04-15	6	-1568/+1244
\|	The performance impact is not measurable, as the compiler loads these variables into xmm registers in unrolled loops anyway. However, we avoid loading these sensitive keys onto the stack. This happens for larger key schedules, where the register count is insufficient. If that key material is not on the stack, we do not need to wipe it explicitly after crypto operations.
*	aesni: Align all class instances to 16 byte boundaries	Martin Willi	2015-04-15	7	-14/+14
\|	While the required members are aligned in the struct as required, on 32-bit platforms the allocator aligns the structures themselves to only 8 bytes. This results in non-aligned struct members, and invalid memory accesses.
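The alignment issue can be illustrated with a small sketch. It assumes a C11 toolchain that provides aligned_alloc(); the struct demo_ctx_t and the constructor demo_ctx_create() are hypothetical stand-ins, and the actual commit may use a different allocation helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct standing in for a class instance with a member
 * that aligned SSE loads/stores require on a 16-byte boundary. */
typedef struct {
    _Alignas(16) uint8_t round_key[16];
    int rounds;
} demo_ctx_t;

/* aligned_alloc() expects the size to be a multiple of the alignment. */
static_assert(sizeof(demo_ctx_t) % 16 == 0,
              "size must be a multiple of the alignment");

/* Allocate a zeroed demo_ctx_t on a 16-byte boundary; release with free(). */
demo_ctx_t *demo_ctx_create(void)
{
    demo_ctx_t *ctx = aligned_alloc(16, sizeof(demo_ctx_t));

    if (ctx)
    {
        memset(ctx, 0, sizeof(*ctx));
    }
    return ctx;
}
```

The _Alignas on the member only fixes its offset within the struct; the member is actually aligned in memory only if the allocation itself starts on a 16-byte boundary.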
*	aesni: Calculate GHASH for 4 blocks of associated data in parallel	Martin Willi	2015-04-15	1	-2/+18
\|	While associated data is usually not that large, in some specific cases this can bring a significant performance boost.
*	aesni: Calculate GHASH for 4 blocks of encryption data in parallel	Martin Willi	2015-04-15	1	-40/+180
\|	Increases performance by another ~30%.
*	aesni: Use 4-way parallel en/decryption in GCM	Martin Willi	2015-04-15	1	-132/+635
\|	Increases overall performance by ~25%.
*	aesni: Use dedicated key size specific en-/decryption functions in GCM	Martin Willi	2015-04-15	1	-24/+353
\|	This yields little more than a ~5% increase in performance, but allows further improvements.
*	aesni: Add a GCM AEAD based on the AES-NI key schedule	Martin Willi	2015-04-15	4	-1/+627
\|
*	aesni: Implement CMAC mode to provide a signer/prf	Martin Willi	2015-04-15	4	-0/+441
\|	Compared to the cmac plugin using AESNI-CBC as backend, this improves performance of AES-CMAC by ~45%.
*	aesni: Implement XCBC mode to provide a signer/prf	Martin Willi	2015-04-15	4	-0/+436
\|	Compared to the xcbc plugin using AESNI-CBC as backend, this improves performance of AES-XCBC by ~45%.
*	aesni: Partially use separate code paths for different key sizes in CCM	Martin Willi	2015-04-15	1	-33/+438
\|	Due to the serial nature of the CBC-MAC, this brings only a marginal speedup.
*	aesni: Add a CCM AEAD reusing the key schedule	Martin Willi	2015-04-15	4	-0/+645
\|
*	aesni: Use 4-way parallel AES-NI instructions for CTR en/decryption	Martin Willi	2015-04-15	1	-115/+354
\|	CTR can be parallelized, and we do so by queueing instructions to the processor pipeline. While we have enough registers for 128-bit decryption, the register count is insufficient to hold all variables with larger key sizes. Nonetheless, 4-way parallelism is faster, by between ~10% and ~25% depending on key size.
*	aesni: Use dedicated round count specific encryption functions in CTR mode	Martin Willi	2015-04-15	1	-23/+243
\|	This allows us to unroll loops and hold the key schedule in local (register) variables. This brings an impressive speedup of ~45%.
*	aesni: Implement an AES-NI based CTR crypter using the key schedule	Martin Willi	2015-04-15	4	-0/+278
\|
*	aesni: Use 4-way parallel AES-NI instructions for CBC decryption	Martin Willi	2015-04-15	1	-66/+314
\|	CBC decryption can be parallelized, and we do so by queueing instructions to the processor pipeline. While we have enough registers for 128-bit decryption, the register count is insufficient to hold all variables with larger key sizes. Nonetheless, 4-way parallelism is faster, by ~8%.
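Why CBC decryption parallelizes while CBC encryption does not can be sketched in portable C. The block function below is a toy invertible transform, NOT AES and not secure; it and all function names are hypothetical and only stand in for one AES-NI block operation. The point is the data flow: in decryption, the four block operations per iteration do not depend on each other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy invertible 64-bit block function. NOT a real cipher and NOT AES;
 * it only stands in for a single block encryption/decryption. */
static uint64_t rotl64(uint64_t x, unsigned n) { return (x << n) | (x >> (64 - n)); }
static uint64_t rotr64(uint64_t x, unsigned n) { return (x >> n) | (x << (64 - n)); }
static uint64_t toy_enc(uint64_t x, uint64_t k) { return rotl64(x ^ k, 13) + k; }
static uint64_t toy_dec(uint64_t x, uint64_t k) { return rotr64(x - k, 13) ^ k; }

/* CBC encryption is serial: each block needs the previous ciphertext. */
void cbc_encrypt(uint64_t k, uint64_t iv, const uint64_t *in,
                 uint64_t *out, size_t n)
{
    uint64_t prev = iv;

    assert(in && out);
    for (size_t i = 0; i < n; i++)
    {
        prev = out[i] = toy_enc(in[i] ^ prev, k);
    }
}

/* CBC decryption, four blocks per iteration: the toy_dec() calls are
 * independent, only the final XORs use the neighbouring ciphertexts. */
void cbc_decrypt4(uint64_t k, uint64_t iv, const uint64_t *in,
                  uint64_t *out, size_t n)
{
    uint64_t prev = iv;
    size_t i = 0;

    assert(in && out);
    for (; i + 4 <= n; i += 4)
    {
        /* Load ciphertexts first so in-place operation stays correct. */
        uint64_t c0 = in[i], c1 = in[i + 1], c2 = in[i + 2], c3 = in[i + 3];
        uint64_t d0 = toy_dec(c0, k), d1 = toy_dec(c1, k);
        uint64_t d2 = toy_dec(c2, k), d3 = toy_dec(c3, k);

        out[i] = d0 ^ prev;
        out[i + 1] = d1 ^ c0;
        out[i + 2] = d2 ^ c1;
        out[i + 3] = d3 ^ c2;
        prev = c3;
    }
    for (; i < n; i++)
    {   /* remaining blocks, one at a time */
        uint64_t c = in[i];

        out[i] = toy_dec(c, k) ^ prev;
        prev = c;
    }
}
```

With real AES-NI instructions, the four independent operations can overlap in the processor pipeline, which is presumably where the measured gain comes from.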
*	aesni: Use separate en-/decryption CBC code paths for different key sizes	Martin Willi	2015-04-15	1	-22/+290
\|	This allows us to unroll loops, and use local (register) variables for the key schedule. This improves performance slightly for encryption, but a lot for reorderable decryption (>30%).
*	aesni: Implement an AES-NI based CBC crypter using the key schedule	Martin Willi	2015-04-15	4	-0/+293
\|
*	aesni: Implement 256-bit key schedule	Martin Willi	2015-04-15	1	-0/+77
\|
*	aesni: Implement 192-bit key schedule	Martin Willi	2015-04-15	1	-0/+81
\|
*	aesni: Implement 128-bit key schedule	Martin Willi	2015-04-15	1	-0/+45
\|
*	aesni: Add a common key schedule class for AES	Martin Willi	2015-04-15	3	-0/+165
\|
*	aesni: Provide a plugin stub for AES-NI instruction based crypto primitives	Martin Willi	2015-04-15	3	-0/+141
\|
*	test-vectors: Add some self-made additional AES-GCM test vectors	Martin Willi	2015-04-15	2	-0/+157
\|	We lacked test vectors with 192/256-bit keys for ICV8/12, and should also have some for larger associated data chunks.
*	test-vectors: Define some additional CCM test vectors	Martin Willi	2015-04-15	2	-1/+84
\|	We don't have any vectors where the plaintext or associated data is not a multiple of the block size, yet bugs are likely to be found here. We also lack some ICV12 test vectors using 128- and 192-bit key sizes.
*	crypto-tester: Use the plugin feature key size to benchmark crypters/aeads	Martin Willi	2015-04-15	1	-0/+2
\|	We previously didn't pass the key size during algorithm registration, which resulted in benchmarking with the "default" key size the crypter uses when passing 0 as key size.
*	utils: Use chunk_equals_const() for all cryptographic purposes	Martin Willi	2015-04-14	4	-4/+4
\|
*	utils: Use memeq_const() for all cryptographic purposes	Martin Willi	2015-04-14	4	-6/+5
\|
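The idea behind a constant-time comparison for cryptographic purposes can be sketched as follows. This is not the strongSwan implementation of memeq_const(); the function name ct_memeq is hypothetical and the code only illustrates the general technique of never exiting early on the first differing byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: compares len bytes without an early exit, so the
 * running time does not reveal the position of the first mismatch. */
bool ct_memeq(const void *x, const void *y, size_t len)
{
    /* volatile discourages the compiler from short-circuiting the loop */
    const volatile unsigned char *a = x;
    const volatile unsigned char *b = y;
    unsigned char diff = 0;

    assert(len == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < len; i++)
    {
        /* accumulate all differences instead of returning early */
        diff |= (unsigned char)(a[i] ^ b[i]);
    }
    return diff == 0;
}
```

A plain memcmp() may return as soon as it finds a mismatch, which can leak how many leading bytes of a MAC or ICV were guessed correctly.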
*	rdrand: Reuse CPU feature detection to check for RDRAND instructions	Martin Willi	2015-04-13	1	-51/+4
\|
*	padlock: Reuse common CPU feature detection to check for Padlock features	Martin Willi	2015-04-13	1	-80/+17
\|
*	sqlite: Use our locking mechanism also when sqlite3_threadsafe() returns 0	Martin Willi	2015-04-13	1	-7/+20
\|	We previously checked for older library versions without locking support at all. But newer libraries can be built in single-threading mode as well, where we have to take care of the locking ourselves.
*	sqlite: Show SQLite library version and thread safety flag during startup	Martin Willi	2015-04-13	1	-1/+8
\|