|
Generating new RSA blinding factors turns out to be relatively
expensive, but we can amortize that cost by maintaining a small cache
and simply mutating old values after each use with a cheaper
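operation. Squaring works, pretty much by definition: if (r^e, r^-1)
mod n is a valid blinding pair, then squaring both halves gives the
valid pair for r^2.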
Blinding factors are only sort-of-sensitive: we don't want them to
leak out of the HSM, but they're only based on the public modulus, not
the private key components, and we're only using them to foil side
channel attacks, so the risk involved in caching them seems small.
For the moment, the cache is very small, since we only care about this
for bulk signature operations. Tune this later if it becomes an issue.
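The cache-and-square idea can be sketched in Python. This is an illustration under assumptions, not libhal's C code: the `BlindingCache` class, its two-entry size and its round-robin reuse are hypothetical, and `pow(x, -1, n)` needs Python 3.8 or later.

```python
import math
import secrets


class BlindingCache:
    """Hypothetical cache of (r^e mod n, r^-1 mod n) blinding pairs for one key."""

    def __init__(self, n, e, size=2):
        self.n, self.e = n, e
        self.pairs = [self._fresh_pair() for _ in range(size)]
        self.next = 0

    def _fresh_pair(self):
        # Expensive path: pick a random r coprime to n, exponentiate and invert.
        while True:
            r = secrets.randbelow(self.n - 2) + 2
            if math.gcd(r, self.n) == 1:
                return pow(r, self.e, self.n), pow(r, -1, self.n)

    def take(self):
        i = self.next
        self.next = (i + 1) % len(self.pairs)
        bf, bf_inv = self.pairs[i]
        # Cheap refresh: squaring both halves gives the pair for r^2.
        self.pairs[i] = (bf * bf % self.n, bf_inv * bf_inv % self.n)
        return bf, bf_inv


def blinded_sign(m, d, n, cache):
    bf, bf_inv = cache.take()
    s = pow(m * bf % n, d, n)  # (m * r^e)^d = m^d * r (mod n)
    return s * bf_inv % n      # multiply by r^-1 to recover m^d
```

Only the constructor pays for random generation and modular inversion; each later `take()` costs two modular squarings.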
|
|
|
|
hal_ks_fetch() was written as lock-at-the-top, unlock-at-the-bottom to
keep it as simple as possible, but this turns out to have bad
performance implications when unwrapping the key is slow. So now we
grab the wrapped key, release the lock, then unwrap, which should be
safe enough given that hal_ks_fetch() is read-only. This lets us make
better use of multiple AES cores to unwrap in parallel when we have
multiple active clients.
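A minimal Python sketch of the lock-then-unwrap pattern, assuming a hypothetical `Keystore` class holding wrapped blobs in a dict and a caller-supplied `unwrap` function standing in for the slow AES key unwrap:

```python
import threading


class Keystore:
    """Hypothetical read-mostly keystore holding wrapped keys."""

    def __init__(self, unwrap):
        self._lock = threading.Lock()
        self._blobs = {}
        self._unwrap = unwrap  # stand-in for the slow key unwrap

    def store(self, name, wrapped):
        with self._lock:
            self._blobs[name] = bytes(wrapped)

    def fetch(self, name):
        # Hold the lock only long enough to look up the wrapped key...
        with self._lock:
            wrapped = self._blobs[name]
        # ...then unwrap without it, so other clients can fetch in parallel.
        return self._unwrap(wrapped)
```

Because the stored blob is immutable, releasing the lock before unwrapping is safe for a read-only fetch; a concurrent writer can only affect later fetches.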
|
|
Copy ContextManagedUnpacker from latest version of libhal.py so that
this script won't depend on the current development code.
|
|
At the moment this only handles RSA keys, and can only handle one size
of key at a time. More bells and whistles will follow eventually,
now that the basic asynchronous API to our RPC protocol works.
|
|
Failing to clear the temporary buffer used to transfer bits from the
TRNG into a bignum was a real leak of something very close to keying
material, albeit only onto the local stack where it was almost certain
to have been overwritten by subsequent operations (generation of other
key components, wrap and PKCS #8 encoding) before pkey_generate_rsa()
ever returned to its caller. Still, bad coder, no biscuit.
Failing to clear the remainders array was probably harmless, but
doctrine says clear it anyway.
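The clear-on-every-exit-path doctrine looks roughly like this in Python. `bignum_from_source` and its `fill` callback (standing in for the TRNG read) are hypothetical, and Python cannot promise that no other copies of the bytes survive, so this only shows the shape of the fix:

```python
def bignum_from_source(nbytes, fill):
    """Read nbytes of randomness into a temporary buffer and return them as an int.
    'fill' is a hypothetical callback that writes random octets into a bytearray."""
    buf = bytearray(nbytes)
    try:
        fill(buf)
        return int.from_bytes(buf, "big")
    finally:
        # Wipe the temporary buffer on every exit path, including exceptions.
        for i in range(len(buf)):
            buf[i] = 0
```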
|
|
contextlib is cute, but incompatible with other coroutine schemes like
Tornado, so just write our own context manager for xdrlib.Unpacker.
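A hand-written context manager needs only `__enter__` and `__exit__`, with no generator involved. Since `xdrlib` was removed in Python 3.13, this sketch uses a minimal hypothetical stand-in that unpacks only 32-bit unsigned integers; like `xdrlib.Unpacker.done()`, its `done()` complains about leftover data, though here it raises `ValueError` rather than `xdrlib.Error`:

```python
import struct


class ContextManagedUnpacker:
    """Hypothetical minimal XDR unpacker usable in a 'with' statement."""

    def __init__(self, data):
        self._buf = data
        self._pos = 0

    def unpack_uint(self):
        # XDR unsigned ints are 4 octets, big-endian.
        (value,) = struct.unpack_from(">I", self._buf, self._pos)
        self._pos += 4
        return value

    def done(self):
        if self._pos < len(self._buf):
            raise ValueError("unextracted data remains")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Only insist that the buffer was fully consumed if the body succeeded.
        if exc_type is None:
            self.done()
        return False  # never swallow exceptions from the body
```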
|
|
Uncoordinated attempts to allocate two modexpa7 cores lead to deadlock if
multiple clients try to do concurrent RSA signing operations.
The simplest solution (back off and retry) could theoretically lead to
resource starvation, but we haven't seen it in actual testing.
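The back-off-and-retry approach, sketched with two `threading.Lock` objects standing in for the two cores. The function names and the random back-off interval are assumptions:

```python
import random
import threading
import time


def acquire_pair(first, second, max_backoff=0.005):
    """Acquire two locks without deadlock: if the second is busy, drop the
    first, back off for a random interval and retry."""
    while True:
        first.acquire()
        if second.acquire(blocking=False):
            return
        first.release()
        # Randomized back-off so competing clients don't retry in lockstep.
        time.sleep(random.uniform(0, max_backoff))


def release_pair(first, second):
    second.release()
    first.release()
```

Holding one core while blocking on the other is what creates the wait cycle; giving up the first core when the second is busy removes the cycle, at the cost of the theoretical starvation noted above.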
|
|
This branch was sitting for long enough that master had been through a
cleanup pass, so beware of accidental reversions.
|
|
values.
|
|
Snapshot of mostly but not entirely working code to include the extra
ModExpA7 key components in the keystore. Need to investigate whether
a more compact representation is practical for these components, as
the current one bloats the key object so much that a bare 4096-bit key
won't fit in a single hash block, and there may not be enough room for
PKCS #11 attributes even for smaller keys.
If a more compact representation is not possible or not sufficient,
the other option is to double the size of a keystore object, making it
two flash subsectors for a total of 8192 octets. That would, of course,
halve the number of keys we can store and require a bunch of little
tweaks all through the ks code (particularly flash erase), so it is
definitely worth trying for a more compact representation first.
|
|
Work in progress. Probably won't even compile, much less run.
Requires corresponding new core/math/modexpa7 core.
No support (yet) for ASN.1 encoding of speedup factors or storage of
same in keystore.
No support (yet) for running CRT algorithm in parallel cores.
Minor cleanup of ancient bus I/O code, including EIM and I2C bus code
we'll probably never use again.
|
|
structure.
When running multiple concurrent unit tests, I observed multiple failures
in the hmac tests, which I ultimately tracked down to different clients
sharing the same hal_hmac_state struct.
hal_hash_initialize is called twice in hal_hmac_initialize (once to get
the state structure, then again if the supplied key is too long), and
once more in hal_hmac_finalize to hash the digest with the supplied key. In
these subsequent cases, the caller supplies the state structure, which
hal_hash_initialize zeroes, but it doesn't set the allocated flag. This
marks an in-use struct as available, so it gets reassigned and
reinitialized, and Bad Things Happen for both clients that are trying to
use it.
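A sketch of the fix, modelling the static state pool in Python. The `HashState` class, the pool size and `hash_initialize` are hypothetical; the point is that after wiping a caller-supplied structure (the equivalent of `memset()`), the initializer must mark it as allocated again:

```python
class HashState:
    def __init__(self):
        self.reset()

    def reset(self):
        # Equivalent of memset(state, 0, sizeof(*state)): clears every field,
        # including the allocated flag.
        self.allocated = False
        self.buffer = bytearray()


POOL = [HashState() for _ in range(4)]


def hash_initialize(state=None):
    if state is None:
        # No state supplied: take the first free slot from the static pool.
        state = next((s for s in POOL if not s.allocated), None)
        if state is None:
            raise RuntimeError("no free hash state")
    state.reset()
    # The fix: re-mark the slot as in use on every path, including the
    # caller-supplied one, so it cannot be handed to another client.
    state.allocated = True
    return state
```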
|
|
At least for now, the speed tradeoff between software ModExp and our
Verilog ModExp core differs significantly between signature and key
generation. We don't really know why, but since key generation does
not need to be constant time, we split out control over whether to use
the software or FPGA implementation, so that we can use the FPGA for
signature while using software for key generation.
Revisit this if and when we figure out what the bottleneck is, as well
as any time that the FPGA core itself changes significantly.
|
|
Trying to make RSA key generation run in constant time is probably
both futile and unnecessary, so we can speed it up a bit by switching
the ModExpA7 core to use "fast" mode rather than "constant time" mode.
Sadly, while this change produces a measurable improvement, it
doesn't bring FPGA ModExp anywhere near the speed of the software
equivalent in this case. Don't really know why.
|
|
|
|
Initial version, very basic, RSA-only. Gussy up later.
|
|
|
|
Algorithm suggested by a note in Handbook of Applied Cryptography,
motivated by profiling of libtfm fp_isprime() function showing
something close to 50% of CPU time spent running Montgomery reductions
in the small primes test, before we even get to Miller-Rabin.
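One way to read that note: reduce the candidate modulo each small prime once, then, as the search steps through odd candidates, update each remainder with single-word arithmetic instead of dividing the bignum again. The prime list and the `sieved_candidates` generator are illustrative assumptions; candidates that survive would still go on to Miller-Rabin.

```python
SMALL_PRIMES = (3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47)


def sieved_candidates(start):
    """Yield odd n >= start that no prime in SMALL_PRIMES divides."""
    n = start | 1
    # The only full-width (bignum) reductions happen here, once.
    rem = [n % p for p in SMALL_PRIMES]
    while True:
        if all(rem):
            yield n
        n += 2
        # Stepping n by 2 only needs single-word updates of each remainder.
        rem = [(r + 2) % p for r, p in zip(rem, SMALL_PRIMES)]
```

A candidate equal to one of the small primes is also rejected, which is harmless at RSA key sizes.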
|
|
Except for torture tests, we never really used the hideously complex
multi-block capabilities of the ksng version of the flash keystore,
among other reasons because the only keys large enough to trigger the
multi-block code were slow enough to constitute torture on their own.
So we can preserve backwards compatibility simply by including the
former *chunk fields (renamed legacy* here) in the CRC and checking
for the expected single-block key values. We probably want to include
everything in the CRC in any case except when there's an explicit
reason to omit something, so this is cheap, just a bit obscure.
At some point in the future we can phase out support for the backwards
compatible values, but there's no particular hurry about it unless we
want to reuse those fields for some other purpose.
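A sketch of the idea with a hypothetical block layout (one type byte, two legacy bytes, a big-endian CRC-32, then the payload). The real keystore header and the exact expected legacy values may differ; the values 1 and 0 below are assumptions:

```python
import struct
import zlib

HEAD = struct.Struct(">BBB")   # block type, legacy chunk count, legacy chunk index
CRC = struct.Struct(">I")
LEGACY_TOTAL, LEGACY_INDEX = 1, 0   # assumed "single-block key" values


def _crc(head, payload):
    # The legacy bytes are part of 'head', so they are covered by the CRC.
    return zlib.crc32(payload, zlib.crc32(head))


def make_block(block_type, payload):
    head = HEAD.pack(block_type, LEGACY_TOTAL, LEGACY_INDEX)
    return head + CRC.pack(_crc(head, payload)) + payload


def check_block(block):
    head = block[:HEAD.size]
    (crc,) = CRC.unpack_from(block, HEAD.size)
    payload = block[HEAD.size + CRC.size:]
    _, legacy_total, legacy_index = HEAD.unpack(head)
    if (legacy_total, legacy_index) != (LEGACY_TOTAL, LEGACY_INDEX):
        return False  # an old multi-block key: not supported
    return crc == _crc(head, payload)
```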
|
|
cryptech_backup is designed to help the user transfer keys from one
Cryptech HSM to another, but what is a user who has no second HSM
supposed to do for backup? The --soft-backup option enables a mode in
which cryptech_backup generates its own KEKEK instead of getting one
from the (nonexistent) target HSM. We make a best-effort attempt to
keep this soft KEKEK secure, by wrapping it with a symmetric key
derived from a passphrase, using AESKeyWrapWithPadding and PBKDF2,
but there's a limit to what a software-only solution can do here.
The --soft-backup code depends (heavily) on PyCrypto.
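The passphrase-to-key step can be shown with the standard library's `hashlib.pbkdf2_hmac`. The hash, iteration count, salt size, key length and the `derive_wrapping_key` name are assumptions, and the subsequent AES Key Wrap with Padding of the soft KEKEK is not shown:

```python
import hashlib
import os


def derive_wrapping_key(passphrase, salt=None, iterations=200_000):
    """Derive a 256-bit AES key from a passphrase with PBKDF2-HMAC-SHA256.
    Returns (key, salt); the salt must be stored alongside the wrapped KEKEK."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt,
                              iterations, dklen=32)
    return key, salt
```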
|
|
We were XORing the low 32 bits of R[0] instead of the full 64 bits.
Makes no difference for small values of n, so we never detected it.
|
|
The HSM itself should be detecting carrier drop on its RPC port, but I
haven't figured out where the DCD bit is hiding in the STM32 UART API,
and the muxd has to be involved in this in any case, since only the
muxd knows when an individual client connection has dropped. So, for
the moment, we handle all of this in the muxd.
|
|
Most keystore methods already followed this rule, but hal_ks_*_init()
and hal_ks_*_logout() were confused, in different ways.
|
|
|
|
The internal keystore API has changed enough since the "logout"
branch forked that a plain merge would have no prayer of compiling,
much less running. So this merge goes well beyond manual conflict
resolution: it salvages the useful code from the "logout" branch, with
additional code as needed to reimplement the functionality. Sorry.
|
|
Cosmetic cleanup of pkey_slot along the way.
|
|
|