| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| |
|
| |
|
| |
|
|
|
|
| |
Some build bots running make check seem to take longer for the DH testing.
|
|
|
|
|
|
| |
We don't actually define a vector, but only prototype the test vector
implemented in a different file. GCC uses the correct symbol during testing,
but clang correctly complains about duplicated symbols during linking.
|
| |
|
|
|
|
|
|
| |
We already see all plugin startup messages during suite configuration, where
initialization is called once to query plugin features. There is no need to be
verbose and show these messages once again in the first test.
|
| |
|
|
|
|
|
|
| |
This allows us to show which transform from which plugin failed. Also, we use
the new cleanup handler functionality that allows proper deinitialization on
failure or timeout.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
If a test times out or fails, longjmp() is used to restore the thread context
and handle the test failure. However, there might be unreleased resources,
namely locks, which prevent the library from cleaning up properly after
finishing the test.
By using thread cleanup handlers, we can release any test subject internal or
test specific external resources on test failure. We do so by calling all
registered cleanup handlers.
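The mechanism described above can be sketched roughly as follows. This is a
simplified, single-threaded illustration only: cleanup_push(), cleanup_pop(),
test_fail() and run_test() are hypothetical names chosen for this sketch, not
the project's actual thread API, and a real implementation would keep the
handler stack per thread.

```c
#include <setjmp.h>
#include <stddef.h>

/* All names below are hypothetical and only illustrate the idea. */
typedef void (*cleanup_fn)(void *arg);

#define MAX_CLEANUP 16

static struct {
	cleanup_fn fn;
	void *arg;
} handlers[MAX_CLEANUP];
static size_t handler_count;
static jmp_buf failure_ctx;

/* Register a handler that releases a resource if the test aborts. */
static int cleanup_push(cleanup_fn fn, void *arg)
{
	if (handler_count == MAX_CLEANUP)
	{
		return 0;
	}
	handlers[handler_count].fn = fn;
	handlers[handler_count].arg = arg;
	handler_count++;
	return 1;
}

/* Remove the most recently registered handler, optionally running it. */
static void cleanup_pop(int execute)
{
	if (handler_count == 0)
	{
		return;
	}
	handler_count--;
	if (execute)
	{
		handlers[handler_count].fn(handlers[handler_count].arg);
	}
}

/* Abort the running test by jumping back into run_test(). */
static void test_fail(void)
{
	longjmp(failure_ctx, 1);
}

/* Run a test function; returns 1 on success, 0 on failure. */
static int run_test(void (*test)(void))
{
	if (setjmp(failure_ctx) == 0)
	{
		test();
		return 1;
	}
	/* The test aborted: run all pending handlers, newest first. */
	while (handler_count > 0)
	{
		cleanup_pop(1);
	}
	return 0;
}

/* Example: a fake lock that is still held when the test fails. */
static int lock_held;

static void release_lock(void *arg)
{
	*(int*)arg = 0;
}

static void failing_test(void)
{
	lock_held = 1;
	cleanup_push(release_lock, &lock_held);
	test_fail();
	cleanup_pop(1); /* not reached on failure */
}
```

After run_test(failing_test) returns, the fake lock has been released by the
handler even though the regular cleanup_pop() call was skipped by longjmp().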
|
| |
|
|
|
|
|
| |
This is called only by the thread for its own thread_t, and does not need
synchronization.
|
|
|
|
|
|
| |
The libgcrypt RNG implementation uses static buffer allocation which it does
not free. There is no symbol we can catch in leak-detective, hence we explicitly
initialize the RNG during the whitelisted gcrypt_plugin_create() function.
|
|
|
|
|
|
| |
gcry_check_version() does not free statically allocated resources. However,
we can't whitelist it in some versions, as it is not a resolvable symbol name.
Instead, whitelist our own plugin constructor function.
|
|
|
|
| |
This is often more convenient than specifying plugins in a configuration file.
|
|
|
|
|
| |
As we test DH calculations this now takes more time. If multiple DH backends
are enabled, we likely hit the default test timeout.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
This allows us to work with deterministic values for testing purposes.
|
|
|
|
|
|
|
|
|
|
| |
The performance impact is not measurable, as the compiler loads these variables
into xmm registers in unrolled loops anyway.
However, we avoid loading these sensitive keys onto the stack. This happens for
larger key schedules, where the register count is insufficient. If that key
material is not on the stack, we can avoid wiping it explicitly after
crypto operations.
|
|
|
|
|
|
| |
While the required members are aligned in the struct as required, on 32-bit
platforms the allocator aligns the structures itself to 8 bytes only. This
results in non-aligned struct members, and invalid memory accesses.
|
| |
|
|
|
|
|
|
| |
If the assertion contains a modulo (%) operation, test_fail_msg() handles
this as a printf() format specifier. Instead, pass the assertion string as an
argument for an explicit "%s" in the format string.
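A minimal sketch of the pattern, assuming a printf()-style reporting function.
format_failure() and ASSERT_TEXT() are hypothetical stand-ins for this
illustration and do not reflect the real test_fail_msg() signature.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical printf()-style reporter writing the message into buf. */
static int format_failure(char *buf, size_t len, const char *fmt, ...)
{
	va_list args;
	int written;

	va_start(args, fmt);
	written = vsnprintf(buf, len, fmt, args);
	va_end(args);
	return written;
}

/*
 * Hypothetical assertion helper: the stringified expression is passed as an
 * argument for "%s", never as the format string itself, so a '%' inside the
 * expression is printed literally.
 */
#define ASSERT_TEXT(buf, len, expr) \
	format_failure(buf, len, "%s", #expr)
```

Passing #expr directly as the format string would let vsnprintf() interpret
the "% 2" in an expression such as a % 2 == 0 as a conversion specification.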
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While we could use posix_memalign(3), that is not fully portable. Further, it
might be difficult on some platforms to properly catch it in leak-detective,
which results in invalid free()s when releasing such memory.
We instead use a simple wrapper, which allocates larger data, and saves the
padding size in the allocated header. This requires that memory is released
using a dedicated function.
To reduce the risk of invalid free() when working on corrupted data, we fill up
all the padding with the padding length, and verify it during free_align().
|
|
|
|
|
| |
While associated data is usually not that large, in some specific cases
this can bring a significant performance boost.
|
|
|
|
| |
Increases performance by another ~30%.
|
|
|
|
| |
Increases overall performance by ~25%.
|
|
|
|
|
| |
This gives little more than a ~5% increase in performance, but allows us to
improve further.
|
| |
|
|
|
|
|
| |
Compared to the cmac plugin using AESNI-CBC as backend, this improves
performance of AES-CMAC by ~45%.
|
|
|
|
|
| |
Compared to the xcbc plugin using AESNI-CBC as backend, this improves
performance of AES-XCBC by ~45%.
|
|
|
|
| |
Due to the serial nature of the CBC-MAC, this brings only a marginal speedup.
|
| |
|
|
|
|
|
|
|
| |
CTR can be parallelized, and we do so by queueing instructions to the processor
pipeline. While we have enough registers for 128-bit decryption, the register
count is insufficient to hold all variables with larger key sizes. Nonetheless,
4-way parallelism is faster, by between ~10% and ~25% depending on key size.
|
|
|
|
|
| |
This allows us to unroll loops and hold the key schedule in local (register)
variables. This brings an impressive speedup of ~45%.
|
| |
|
|
|
|
|
|
|
| |
CBC decryption can be parallelized, and we do so by queueing instructions
to the processor pipeline. While we have enough registers for 128-bit
decryption, the register count is insufficient to hold all variables with
larger key sizes. Nonetheless, 4-way parallelism is faster, by roughly 8%.
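Why CBC decryption can be reordered while encryption cannot is perhaps easiest
to see in a small sketch. The code below uses a toy, insecure 64-bit block
function as a placeholder for AES and plain C instead of AES-NI intrinsics; all
names are hypothetical. Each plaintext block depends only on two ciphertext
blocks, so four block decryptions per iteration are independent of each other.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy invertible block function, a placeholder for AES. NOT secure. */
static uint64_t toy_encrypt(uint64_t block, uint64_t key)
{
	uint64_t x = block ^ key;

	return ((x << 13) | (x >> 51)) + key;
}

static uint64_t toy_decrypt(uint64_t block, uint64_t key)
{
	uint64_t x = block - key;

	return ((x >> 13) | (x << 51)) ^ key;
}

/* CBC encryption is serial: each block needs the previous ciphertext. */
static void cbc_encrypt(const uint64_t *in, uint64_t *out, size_t blocks,
						uint64_t iv, uint64_t key)
{
	uint64_t prev = iv;
	size_t i;

	for (i = 0; i < blocks; i++)
	{
		prev = out[i] = toy_encrypt(in[i] ^ prev, key);
	}
}

/* CBC decryption, four blocks per iteration. */
static void cbc_decrypt4(const uint64_t *in, uint64_t *out, size_t blocks,
						 uint64_t iv, uint64_t key)
{
	uint64_t prev = iv;
	size_t i = 0;

	for (; i + 4 <= blocks; i += 4)
	{
		/* load ciphertext first, so in-place operation stays correct */
		uint64_t c0 = in[i], c1 = in[i + 1], c2 = in[i + 2], c3 = in[i + 3];
		/* these four calls have no data dependency on each other */
		uint64_t t0 = toy_decrypt(c0, key);
		uint64_t t1 = toy_decrypt(c1, key);
		uint64_t t2 = toy_decrypt(c2, key);
		uint64_t t3 = toy_decrypt(c3, key);

		out[i] = t0 ^ prev;
		out[i + 1] = t1 ^ c0;
		out[i + 2] = t2 ^ c1;
		out[i + 3] = t3 ^ c2;
		prev = c3;
	}
	/* remaining blocks are handled one at a time */
	for (; i < blocks; i++)
	{
		uint64_t c = in[i];

		out[i] = toy_decrypt(c, key) ^ prev;
		prev = c;
	}
}
```

With real AES-NI code the four independent decryptions would be interleaved
round by round, which is where the pipeline benefit comes from; the sketch only
shows the data dependencies that make this possible.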
|
|
|
|
|
|
| |
This allows us to unroll loops, and use local (register) variables for the
key schedule. This improves performance slightly for encryption, but a lot
for reorderable decryption (>30%).
|
| |
|