| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some tokens/libraries seem to prefix all numbers with zero bytes even
if not necessary (e.g. the default exponent 0x010001). If we don't fix
that, the fingerprints calculated based on the retrieved values will be
incorrect.
Even if the pkcs1 plugin can properly handle numbers that are not in
two's complement since a81bd670b086 ("Added PUBKEY_RSA_MODULUS
encoding type") we prefix them with zero if necessary as other encoders
might expect them in two's complement.
Fixes #1012.
|
| |
|
|
|
|
|
|
| |
This generalization allows the ring dimension n to be different
from the current n = 512 and allows kappa to be > 56. Also the
hash octets are consumed in a more consistent manner.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The c_indices derived from the SHA-512 random oracle consist of
nine bits (0..511). The leftmost 8 bits of each index are taken
on an octet-by-octet basis from the 56 leftmost octets of the
SHA-512 hash. The 9th bit needed for the LSB is taken from the
extra_bits 64 bit unsigned integer which consists of the 8 rightmost
octets of the SHA-512 hash (in network order). If more than 56
indices must be derived then additional rounds of the random oracle
are executed until all kappa c_indices have been determined.
The bug fix shifts the extra_bits value by one bit in each loop
iteration so that the LSB of each index is random. Also iterate
through the hash array using the loop variable j not the c_indices
variable i.
|
|
|
|
|
|
|
|
|
|
| |
By using a derived key r^2 we can improve performance, as we can do loop
unrolling and slightly better utilize SIMD instructions.
Overall ChaCha20-Poly1305 performance increases by ~12%.
Converting integers to/from our 5-word representation in SSE does not seem
to pay off, so we work on individual words.
|
|
|
|
|
| |
As we don't have to shuffle the state in each ChaCha round, overall performance
for ChaCha20-Poly1305 increases by ~40%.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We always build the driver on x86/x64, but enable it only if SSSE3 support
is detected during runtime.
Poly1305 uses parallel 32-bit multiplication operands yielding a 64-bit result,
for which two can be done in parallel in SSE. This is minimally faster than
multiplication with 64-bit operands, and also works on 32-bit builds not having
a __int128 result type.
On a 32-bit architecture, this is more than twice as fast as the portable
driver, and on 64-bit it is ~30% faster.
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
While DES-ECB is not registered by the plugin in this case (so the
function will never actually be called), the compiler still warns
about the implicitly declared function.
|
|
|
|
|
|
| |
We don't actually define a vector, but only prototype the test vector
implemented in a different file. GCC uses the correct symbol during testing,
but clang correctly complains about duplicated symbols during linking.
|
| |
|
|
|
|
|
|
| |
The libgcrypt RNG implementation uses static buffer allocation which it does
not free. There is no symbol we can catch in leak-detective, hence we explicitly
initialize the RNG during the whitelisted gcrypt_plugin_create() function.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The performance impact is not measurable, as the compiler loads these variables
in xmm registers in unrolled loops anyway.
However, we avoid loading these sensitive keys onto the stack. This happens for
larger key schedules, where the register count is insufficient. If that key
material is not on the stack, we can avoid to wipe it explicitly after
crypto operations.
|
|
|
|
|
|
| |
While the required members are aligned in the struct as required, on 32-bit
platforms the allocator aligns the structures itself to 8 bytes only. This
results in non-aligned struct members, and invalid memory accesses.
|
|
|
|
|
| |
While associated data is usually not that large, in some specific cases
this can bring a significant performance boost.
|
|
|
|
| |
Increases performance by another ~30%.
|
|
|
|
| |
Increases overall performance by ~25%.
|
|
|
|
|
| |
This gives not much more than ~5% increase in performance, but allows us to
improve further.
|
| |
|
|
|
|
|
| |
Compared to the cmac plugin using AESNI-CBC as backend, this improves
performance of AES-CMAC by ~45%.
|
|
|
|
|
| |
Compared to the xcbc plugin using AESNI-CBC as backend, this improves
performance of AES-XCBC by ~45%.
|
|
|
|
| |
Due to the serial nature of the CBC mac, this brings only a marginal speedup.
|
| |
|
|
|
|
|
|
|
| |
CTR can be parallelized, and we do so by queueing instructions to the processor
pipeline. While we have enough registers for 128-bit decryption, the register
count is insufficient to hold all variables with larger key sizes. Nonetheless
is 4-way parallelism faster, depending on key size between ~10% and ~25%.
|
|
|
|
|
| |
This allows us to unroll loops and hold the key schedule in local (register)
variables. This brings an impressive speedup of ~45%.
|
| |
|
|
|
|
|
|
|
| |
CBC decryption can be parallelized, and we do so by queueing instructions
to the processor pipeline. While we have enough registers for 128-bit
decryption, the register count is insufficient to hold all variables with
larger key sizes. Nonetheless is 4-way parallelism faster, roughly by ~8%.
|
|
|
|
|
|
| |
This allows us to unroll loops, and use local (register) variables for the
key schedule. This improves performance slightly for encryption, but a lot
for reorderable decryption (>30%).
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
We missed test vectors for 192/256-bit key vectors for ICV8/12, and should
also have some for larger associated data chunk.
|
|
|
|
|
|
| |
We don't have any where plain or associated data is not a multiple of the block
size, but it is likely to find bugs here. Also, we miss some ICV12 test vectors
using 128- and 192-bit key sizes.
|
|
|
|
|
|
| |
We previously didn't pass the key size during algorithm registration, but this
resulted in benchmarking with the "default" key size the crypter uses when
passing 0 as key size.
|
| |
|