Age | Commit message (Collapse) | Author |
|
key export/import (kekek = none, kek_len = 0), rather than separate RPCs.
|
|
Exported keys are wrapped with the MKM KEK, not a transit KEK, and can
only be imported back to the same HSM.
The idea is to support operators who have more keys than will fit on the
HSM, so they will cycle keys into and out of the HSM as needed.
NOTE that hashsig is, as always, special. The hashsig key has an internal
index that is updated on every signature. To prevent a hashsig key from
being re-imported with an old index (which would compromise the security
of the key), the hashsig key is disabled on export, and must be deleted
from the HSM before being re-imported.
|
|
Move lm[ot]s_algorithm_t definitions to hal.h, prefix all public symbols with 'hal_'.
Remove some unused functions.
Wrap hal_pkey_slot_t initializers in an extra set of curly braces.
Remove an unused-argument kludge (x=x;) because gcc doesn't care, and clang complains.
Make timersub a proper macro.
Add some casts to printf arguments, because !@#$ printf formats.
|
|
|
|
Various fixes extracted from the abandoned(-for-now?) reuse-cores
branch, principally:
* Change hal_core_alloc*() to support core reuse and to pick the
least-recently-used core of a particular type otherwise;
* Replace assert() and printf() calls with hal_assert() and hal_log(),
respectively. assert() is particularly useless on the HSM, since it
sends its error message into hyperspace then hangs the HSM.
|
|
Generating new RSA blinding factors turns out to be relatively
expensive, but we can amortize that cost by maintaining a small cache
and simply mutating old values after each use with a cheaper
operation. Squaring works, pretty much by definition.
Blinding factors are only sort-of-sensitive: we don't want them to
leak out of the HSM, but they're only based on the public modulus, not
the private key components, and we're only using them to foil side
channel attacks, so the risk involved in caching them seems small.
For the moment, the cache is very small, since we only care about this
for bulk signature operations. Tune this later if it becomes an issue.
|
|
|
|
|
|
|
|
|
|
|
|
Snapshot of mostly but not entirely working code to include the extra
ModExpA7 key components in the keystore. Need to investigate whether
a more compact representation is practical for these components, as
the current one bloats the key object so much that a bare 4096-bit key
won't fit in a single hash block, and there may not be enough room for
PKCS #11 attributes even for smaller keys.
If more compact representation not possible or insufficient, the other
option is to double the size of a keystore object, making it two flash
subsectors for a total of 8192 octets. Which would of course halve
the number of keys we can store and require a bunch of little tweaks
all through the ks code (particularly flash erase), so definitely
worth trying for a more compact representation first.
|
|
|
|
The internal keystore API has changed enough since where the "logout"
branch forked that a plain merge would have no prayer of compiling,
must less running. So this merge goes well beyond manual conflict
resolution: it salvages the useful code from the "logout" branch, with
additional code as needed to reimplement the functionality. Sorry.
|
|
Cosmetic cleanup of pkey_slot along the way.
|
|
Need to refactor init sequence slightly (again), this time to humor
the bootloader, which has its own special read-only view of the PIN
block in the token keystore.
|
|
Still not yet expected to compile, much less run, but getting closer.
|
|
|
|
|
|
The Novena-era mmap()-based keystore is far enough out of date that
it's not worth maintaining (and we haven't been doing so): if we ever
need one again, it would be easier to rewrite it from scratch.
|
|
Support for variable-length keystore objects significantly complicates
the keystore implementation, including serious some serious code bloat
and a complex recovery algorithm to deal with crashes or loss of power
at exactly the wrong time. Perhaps we don't really need this?
So this is an experiment to see whether we can replace variable-length
keystore objects with fixed-length, perhaps with a compile time option
to let us make the fixed object length be 8192 bytes instead of 4096
bytes when needed to hold things like large RSA keys.
First pass on this is just throwing away nearly 1,000 lines of
excessively complex code. The result probably won't even compile yet,
but it's already significantly easier to read.
|
|
|
|
Consistent user complaints about HSM login taking too long.
Underlying issue has both superficial and fundamental causes.
Superficial: Our PBKDF2 implementation is slow. We could almost
certainly make it faster by taking advantage of partial
pre-calculation (see notes in code) and by reenabling use of FPGA hash
cores when when checking passwords (which mgiht require linking the
bootloader against a separate libhal build to avoid chicken-and-egg
problem of needing FPGA to log into console to configure FPGA).
Fundamental: The PBKDF2 iteration counts we used to use (10,000
minimum, 20,000 default) are in line with current NIST
recommendations. The new, faster values (1,000 and 2,000,
respectively) are not, or, rather, they're in line with what NIST
recommended a decade ago. Well, OK, maybe the Coretex M4 is so slow
that it's living in the past, but still. The fundamental issue is
that anybody who can capture the encoded PIN can mount an offline
dictionary attack on it, so we'd like to make that expensive.
But the users are unhappy with the current behavior, so this change
falls back to the ancient technique of adding a delay (currently five
seconds, configurable at compile time) after a bad PIN, which makes it
painful to use the login function as an oracle but does nothing about
the offline dictionary attack problem.
Feh.
Note that users can still choose a higher iteration count, by setting
the iteration count via the console. It's just not the default out of
the box anymore.
|
|
|
|
What I get for writing code while build and test environment is tied
up with a multi-day run testing something else.
|
|
|
|
|
|
Static code analysis (Doxygen call graph) detected a low-probability
race condition which could have triggered a deadlock on the keystore
mutex if the mkmif code returns with an error like HAL_ERROR_CORE_BUSY
when we're trying to fetch the KEK.
This is a knock-on effect of the awful kludge of backing up the KEK in
the keystore flash as an alternative to powering the MKM with a
battery as called for in the design. This code path should not exist
at all, but, for now, we avoid the deadlock by making it the caller's
responsibility to grab the keystore mutex before looking up the KEK.
|
|
|
|
pkey_open() now looks in both keystores rather than requiring the user
to know. The chance of collision with randomly-generated UUID is low
enough that we really ought to be able to present a single namespace.
So now we do.
pkey_match() now takes a couple of extra arguments which allow a
single search to cover both keystores, as well as matching for
specific key flags. The former interface was pretty much useless for
anything involving flags, and required the user to issue a separate
call for each keystore.
User wheel is now exempt from the per-session key lookup constraints,
Whether this is a good idea or not is an interesting question, but the
whole PKCS #11 derived per-session key thing is weird to begin with,
and having keystore listings on the console deliberately ignore
session keys was just too confusing.
|
|
Now that we use PKCS #8 format for private keys, all key formats we
use include ASN.1 AlgorithmIdentifier field describing the key, so
specifying key type and curve as arguments to hal_rpc_pkey_load() is
neither necessary nor particularly useful.
|
|
Still missing Python script to drive backup process, and need to do
something about setting the EXPORTABLE key flag for this to be useful.
|
|
|
|
|
|
Renamed multiplexer to cryptech_muxd, since it now handles both RPC and CTY.
Added new program cryptech_console to act as client for CTY multiplexer.
Might want to add console logging capability eventually, not today.
Probably want to incorporate UART probing (what cryptech_probe does
now) eventually, also not today.
|
|
|
|
|
|
PKCS #11 supports zero-length attributes (eg, CKA_LABEL) so hack of
using zero length attribute as NIL value won't work, instead we use a
slightly more portable version of the hack PKCS #11 uses (PKCS #11
stuffs -1 into a CK_ULONG, we stuff 0xFFFFFFFF into a uint32_t).
ks_attribute.c code was trying too hard and tripping over its own
socks. Instead of trying to maintain attributes[] in place during
modification, we now perform the minimum necessary change then re-scan
the block. This is (very slightly) slower but more robust, both
because the scan code has better error checking and because it's the
scan code that we want to be sure is happy before committing a change.
Rename hal_rpc_pkey_attribute_t to hal_pkey_attribute_t.
|
|
|
|
Calling hal_rpc_pkey_get_attributes() with attribute_buffer_len = 0
now changes the return behavior so that it reports the lengths of
attributes listed in the query, with a length of zero for attributes
not present at all. This is mostly to support C_GetAttributeValue()
in PKCS #11, but we also use it to make the Python interface a bit
kinder to the user.
|
|
Wiping the keystore flash requires reinitializing the keystore, but we
don't want to allocate new static memory when we do this.
|
|
hal_rpc_pkey_list() was a simplistic solution that worked when the
keystore only supported a handful of keys and we needed a quick
temporary solution in time for a workshop. It doesn't handle large
numbers of keys well, and while we could fix that, all of its
functionality is now available via more robust API functions, so
simplifying the API by deleting it seems best.
Since this change required mucking with dispatch vectors yet again, it
converts them to use C99 "designated initializer" syntax.
|
|
|
|
This check made sense when attribute deletion was a separate
operation, but now that it has been folded into set_attributes(), this
check was worse than useless.
|
|
pkey attribute API is now just set_attributes() and get_attributes().
|
|
Passes minimal unit-testing and the same minimal tests report that it
does deliver the desired performance speed-up. More testing and much
cleanup still needed.
Attribute API not quite stable yet, we're probably going to want to
remove all the singleton attribute operations from the RPC protocol,
and it turns out that ks_delete_attributes() has enough code in common
with ks_set_attributes() that it makes more sense to handle the former
as a special case of the latter.
|
|
This is not yet complete, only the ks_volatile driver supports it,
ks_flash will be a bit more complicated and isn't written yet.
At the moment, this adds a complete duplicate set of
{set,get,delete}_attributes() functions in parallel to the earlier
{set,get,delete}_attribute() functions. We will almost certainly want
to get rid of the duplicates, probably (but not necessarily) the
entire single-attribute suite. At the moment, though, we want both
sets so we can compare execution speeds of the two sets of functions.
|
|
Incidental minor refactoring of hal_rpc_server_dispatch().
|
|
The debugging code was for tracking down what turned out to be a race
condition in the Alpha's flash driver code (see sw/stm32); much of
this was temporary, and will be removed in a (near) future commit, but
some of the techniques were useful and belong in the repository in
case we need to pull them back for something similar in the future.
hal_ks_index_fsck() attempts to diagnose all the things I found wrong
in the ks_flash index after one long series of errors. As presently
written, it doesn't attempt to fix anything, just diagnose errors: the
intent is that we can call this, before and after every modification
if necessary, to poinpoint exactly which calls introduce errors. Once
things stablize a bit, we may want to crank down the number of calls
to this (it's a bit expensive, since it checks the entire index), and
perhaps add the ability to clean up whatever errors it might find; the
latter might be a good candidate for a CLI command.
|
|
In retrospect it's obvious that this never needed to be an
input/output argument, as its value will always be the same as the
last value in the returned array. Doh. So simplify the RPC and call
sequence slightly by removing the unnecessary output value.
|