summaryrefslogtreecommitdiff
path: root/raw-wiki-dump/GitRepositories%2Fsw%2Flibhal
blob: 29132dca37d8cf0e874755bfccade75033af0259 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
{{{
#!htmlcomment

This page is maintained automatically by a script.  Don't modify this page by hand,
your changes will just be overwritten the next time the script runs.  Talk to your
Friendly Neighborhood Repository Maintainer if you need to change something here.

}}}

{{{
#!html
<h1>libhal</h1>

<h2>Overview</h2>

<p>This library combines a set of low-level API functions which talk to
the Cryptech FPGA cores with a set of higher-level functions providing
various cryptographic services.</p>

<p>There's some overlap between the low-level code here and the low-level
code in core/platform/novena, which will need sorting out some day,
but at the time this library forked that code, the
core/platform/novena code was all written to support a test harness
rather than a higher-level API.</p>

<p>Current contents of the library:</p>

<ul>
<li><p>Low-level I/O code (FMC, EIM, and I2C).</p></li>
<li><p>An implementation of AES Key Wrap using the Cryptech AES core.</p></li>
<li><p>An interface to the Cryptech CSPRNG.</p></li>
<li><p>An interface to the Cryptech hash cores, including HMAC.</p></li>
<li><p>An implementation of PBKDF2.</p></li>
<li><p>An implementation of RSA, optionally using the Cryptech ModExp core.</p></li>
<li><p>An implementation of ECDSA, optionally using the Cryptech ECDSA base
point multiplier cores.</p></li>
<li><p>An implementation of HSS/LMS hash-based signatures.</p></li>
<li><p>An interface to the Master Key Memory interface core on the Cryptech
Alpha platform.</p></li>
<li><p>A simple keystore implementation with drivers for RAM and flash
storage on the Cryptech Alpha platform.</p></li>
<li><p>A remote procedure call (RPC) interface.</p></li>
<li><p>(Just enough) ASN.1 code to support a uniform interface to public
(SubjectPublicKeyInformation (SPKI)) and private (PKCS #8) keys.</p></li>
<li><p>A simple key backup mechanism, including a Python script to drive it
from the client side.</p></li>
<li><p>An RPC multiplexer to allow multiple clients (indepedent processes)
to talk to the Cryptech Alpha at once.</p></li>
<li><p>Client implenetations of the RPC mechanism in both C and Python.</p></li>
<li><p>Test code for all of the above.</p></li>
</ul>

<p>Most of these are fairly well self-contained, although the PBKDF2
implementation uses the hash-core-based HMAC implementation with
fallback to a software implementation if the cores aren't available.</p>

<p>The major exceptions are the RSA and ECDSA implementations, which uses
an external bignum implementation (libtfm) to handle a lot of the
arithmetic.  In the long run, much or all of this may end up being
implemented in Verilog, but for the moment all of the RSA math except
for modular exponentiation is happening in software, as is all of the
math for ECDSA verification; ECDSA math for key generation and signing
on the P-256 and P-384 curves is done in the ECDSA base point
multiplier cores when those are available.</p>

<h2>RSA</h2>

<p>The RSA implementation includes a compile-time option to bypass the
ModExp core and do everything in software, because the ModExp core is
a tad slow at the moment (others are hard at work fixing this).</p>

<p>The RSA implementation includes optional blinding (enabled by default).</p>

<h2>ECDSA</h2>

<p>The ECDSA implementation is specific to the NIST prime curves P-256,
P-384, and P-521.</p>

<p>The ECDSA implementation includes a compile-time option to allow test
code to bypass the CSPRNG in order to test against known test vectors.
Do <strong>NOT</strong> enable this in production builds, as ECDSA depends on good
random numbers not just for private keys but for individual
signatures, and an attacker who knows the random number used for a
particular signature can use this to recover the private key.
Arguably, this option should be removed from the code entirely.</p>

<p>The ECDSA software implementation attempts to be constant-time, to
reduce the risk of timing channel attacks.  The algorithms chosen for
the point arithmetic are a tradeoff between speed and code complexity,
and can probably be improved upon even in software; reimplementing at
least the field arithmetic in hardware would probably also help.
Signing and key generation performance is significantly better when
the ECDSA base point multiplier cores are available.</p>

<p>The point addition and point doubling algorithms in the current ECDSA
software implementation come from the <a href="http://www.hyperelliptic.org/EFD/g1p/auto-shortw-jacobian-3.html">EFD</a>.  At least at the
moment, we're only interested in ECDSA with the NIST prime curves, so
we use algorithms optimized for a=-3.</p>

<p>The point multiplication algorithm is a straightforward double-and-add
loop, which is not the fastest possible algorithm, but is relatively
easy to confirm by inspection as being constant-time within the limits
imposed by the NIST curves.  Point multiplication could probably be
made faster by using a non-adjacent form (NAF) representation for the
scalar, but the author doesn't understand that well enough to
implement it as a constant-time algorithm.  In theory, changing to a
NAF representation could be done without any change to the public API.</p>

<p>Points stored in keys and curve parameters are in affine format, but
point arithmetic is performed in Jacobian projective coordinates, with
the coordinates themselves in Montgomery form; final mapping back to
affine coordinates also handles the final Montgomery reduction.</p>

<h2>Hash-Based Signatures</h2>

<p>A hashsig private key is a Merkle tree of one-time signing keys, which can
be used to sign a finite number of messages. Since they don't rely on
"hard math" for security, hashsig schemes are quantum-resistant.</p>

<p>This hashsig code is a clean-room implementation of draft-mcgrew-hash-sigs.
It has been shown to interoperate with the Cisco reference code (each can
verify the other's signatures).</p>

<p>Following the recommendations of the draft, we only store the topmost hash
tree (the "root tree") in the token keystore; lower-level trees are stored
in the volatile keystore, and are regenerated upon a system restart.</p>

<p>This implementation has limitations on the number of keys, size of OTS
keys, and size of signatures, because of the design of the keystore and of
the RPC mechanism:</p>

<ol>
<li>The token keystore is a fairly small flash, partitioned into 2048
8096-byte blocks. Therefore, we can't support LMS algorithm types &gt;
lms_sha256_n32_h10 (a.k.a. h=10, or 1024 keys per tree). In this case,
keygen will return HAL_ERROR_NO_KEY_INDEX_SLOTS.</li>
</ol>

<p>Additionally, the 8KB key storage size means that we can't support LM-OTS
algorithm type lmots_sha256_n32_w1, which has an OTS key size of 8504
bytes. In this case, keygen will return HAL_ERROR_UNSUPPORTED_KEY.</p>

<ol>
<li><p>The volatile keystore is currently limited to 1280 keys, so only 2
levels at h=10, but more levels at h=5. One could easily increase the size
of the volatile keystore, but L=2/h=10 gives us a key that can sign 1M
messages, which is sufficient for development and testing purposes.</p></li>
<li><p>The RPC mechanism currently limits request and response messages to
16KB, so we can't generate or verify signatures greater than that size.
In this case, keygen will return HAL_ERROR_UNSUPPORTED_KEY.</p></li>
</ol>

<p>Because the hashsig private key consists of a large number of one-time
signing keys, and because only the root tree is stored in flash, it can
take several minutes to reconstruct the full tree on system restart.
During this time, attempts to generate a hashsig key, delete a hashsig
key, or sign with a hashsig key will return HAL_ERROR_NOT_READY.</p>

<p>A hashsig private key can sign at most 2^(L*h) messages. (System restarts
will cause the lower-level trees to be regenerated, which will need to be
signed with by the root tree, so frequent restarts will rapidly exhaust
the root tree.) When a hashsig key is exhausted, any attempt to use it for
signing will return HAL_ERROR_HASHSIG_KEY_EXHAUSTED.</p>

<h2>Keystore</h2>

<p>The keystore is basically a light-weight database intended to be run
directly over some kind of block-access device, with an internal
low-level driver interface so that we can use the same API for
multiple keystore devices (eg, flash for "token objects" and RAM for
"session objects", in the PKCS #11 senses of those terms).</p>

<p>The available storage is divided up into "blocks" of a fixed size; for
simplicity, the block size is a multiple of the subsector size of the
flash chip on the Alpha platform, since that's the minimum erasable
unit.  All state stored in the keystore itself follows the conventions
needed for flash devices, whether the device in question is flash or
not.  The basic rule here is that one can only clear bits, never set
them: the only way to set a bit is to erase the whole block and start
over.  So blocks progress from an initial state ("erased") where all
bits are set to one, through several states where the block contains
useful data, and ending in a state where all bits are set to zero
("zeroed"), because that's the way that flash hardware works.</p>

<p>The keystore implementation also applies a light-weight form of wear
leveling to all keystore devices, whether they're flash devices or
not.  The wear-leveling mechanism is not particularly sophisticated,
but should suffice.  The wear-leveling code treats the entirety of a
particular keystore device as a ring buffer of blocks, and keeps track
of which blocks have been used recently by zeroing blocks upon freeing
them rather than erasing them immediately, while also always keeping
the block at the current head of the free list in the erased state.
Taken together, this is enough to recover location of the block at the
head of the free list after a reboot, which is sufficient for a
round-robin wear leveling strategy.</p>

<p>The block format includes a field for a CRC-32 checksum, which covers
the entire block except for a few specific fields which need to be
left out.  On reboot, blocks with bad CRC-32 values are considered
candidates for reuse, but are placed at the end of the free list,
preserve their contents for as long as possible in case the real
problem is a buggy firmware update.</p>

<p>At the moment, the decision about whether to use the CRC-32 mechanism
is up to the individual driver: the flash driver uses it, the RAM
driver (which never stores anything across reboots anyway) does not.</p>

<p>Since the flash-like semantics do not allow setting bits, updates to a
block always consist of allocating a new block and copying the
modified data.  The keystore code uses a trivial lock-step protocol
for this: first:</p>

<ol>
<li>The old block is marked as a "tombstone";</li>
<li>The new block (with modified data) is written;</li>
<li>The old block is erased.</li>
</ol>

<p>This protocol is deliberately as simple as possible, so that there is
always a simple recovery path on reboot.</p>

<p>Active blocks within a keystore are named by UUIDs.  With one
exception, these are always type-4 (random) UUIDs, generated directly
from output of the TRNG.  The one exception is the current PIN block,
which always uses the reserved all-zeros UUID, which cannot possibly
conflict with a type-4 UUID (by definition).</p>

<p>The core of the keystore mechanism is the <code>ks-&gt;index[]</code> array, which
contains nothing but a list of block numbers.  This array is divided
into two parts: the first part is the index of active blocks, which is
kept sorted (by UUID); the second part is the round-robin free list.
Everything else in the keystore is indexed by these block numbers,
which means that the index array is the only data structure which the
keystore code needs to sort or rotate when adding, removing, or
updating a block.  Because the block numbers themselves are small
integers, the index array itself is small enough that shuffling data
within it using <code>memmove()</code> is a relatively cheap operation, which in
turn avoids a lot of complexity that would be involved in managing
more sophisticated data structures.</p>

<p>The keystore code includes both caching of recently used keystore
blocks (to avoid unnecessary flash reads) and caching of the location
of the block corresponding to a particular UUID (to avoid unnecessary
index searches).  Aside from whatever direct performance benefits this
might bring, this also frees the pkey layer that sits directly on top
of the keystore code from needing to keep a lot of active state on
particular keystore objects, which is important given that this whole
thing sits under an RPC protocol driven by a client program which can
impose arbitrary delays between any two operations at the pkey layer.</p>

<h2>Key backup</h2>

<p>The key backup mechanism is a straightforward three-step process,
mediated by a Python script which uses the Python client
implementation of the RPC mechanism.  Steps:</p>

<ol>
<li><p>Destination HSM (target of key transfer) generates an RSA keypair,
exports the public key (the "key encryption key encryption key" or
"KEKEK").</p></li>
<li><p>Source HSM (origin of the key transfer) wraps keys to be backed up
using AES keywrap with key encryption keys (KEKs) generated by the
TRNG; these key encryption keys are in turn encrypted with RSA
public key (KEKEK) generated by the receipient HSM.</p></li>
<li><p>Destination HSM receives wrapped keys, unwraps the KEKs using the
KEKEK then unwraps the wrapped private keys.</p></li>
</ol>

<p>Transfer of the wrapped keys between the two HSMs can be by any
convenient mechanism; for simplicity, <code>cryptech_backup</code> script bundles
everything up in a text file using JSON and Base64 encoding.</p>

<h2>Multiplexer daemon</h2>

<p>While the C client library can be built to talk directly to the
Cryptech Alpha board, in most cases it is more convenient to use the
<code>cryptech_muxd</code> multiplexer daemon, which is now the default.  Client
code talks to <code>cryptech_muxd</code> via a <code>PF_UNIX</code> socket; <code>cryptech_muxd</code>
handles interleaving of messages between multiple clients, and also
manages access to the Alpha's console port.</p>

<p>The multiplexer requires two external Python libraries, Tornado
(version 4.0 or later) and PySerial (version 3.0 or later).</p>

<p>In the long run, the RPC mechanism will need to be wrapped in some
kind of secure channel protocol, but we're not there yet.</p>

<h2>API</h2>

<p>Yeah, we ought to document the API, Real Soon Now, perhaps using
<a href="http://www.doxygen.org/">Doxygen</a>.  For the moment, see the function prototypes in hal.h,
the Python definitions in cryptech.libhal, and and comments in the
code.</p>
}}}

[[RepositoryIndex(format=table,glob=sw/libhal)]]

|| Clone `https://git.cryptech.is/sw/libhal.git` ||