author Pavel V. Shatov (Meister) <meisterpaul1@yandex.ru> 2017-08-12 00:53:33 +0300
committer Pavel V. Shatov (Meister) <meisterpaul1@yandex.ru> 2017-08-12 00:53:33 +0300
commit ae65d3f7941716854352cee9a4aec6f71c67105f
tree e72102650bf5dcabb416c368db737e62ac2c6625
parent fc1c4fcdc95bf85b71f778a941e631fc573db0c3
Added some info to the README file.
-rw-r--r-- README.md 35
-rw-r--r-- src/tb/tb_systolic_multiplier.v 2
2 files changed, 32 insertions, 5 deletions
diff --git a/README.md b/README.md
index 8abc6bc..35532d7 100644
--- a/README.md
+++ b/README.md
@@ -10,12 +10,17 @@ The core has two synthesis-time parameters:
* **OPERAND_ADDR_WIDTH** - Sets the _largest supported_ operand width. This affects the amount of block memory that is reserved for operand storage. Largest operand width in bits, that the core can handle is 32 * (2 ** OPERAND_ADDR_WIDTH). If the largest possible modulus is 1024 bits long, set OPERAND_ADDR_WIDTH = 5. For 2048-bit moduli support set OPERAND_ADDR_WIDTH = 6, for 4096-bit capable core set OPERAND_ADDR_WIDTH = 7 and so on.
- * **SYSTOLIC_ARRAY_POWER** - Determines the number of processing elements in the internal systolic array, total number of elements is 2 ** SYSTOLIC_ARRAY_POWER. This affects the number of DSP slices dedicated to parallelized multiplication. Allowed values are 1..OPERAND_ADDR_WIDTH-1, higher values produce higher performance core at the cost of higher device utilization.
+ * **SYSTOLIC_ARRAY_POWER** - Determines the number of processing elements in the internal systolic array; the total number of elements is 2 ** SYSTOLIC_ARRAY_POWER. This affects the number of DSP slices dedicated to parallelized multiplication. Allowed values are 1..OPERAND_ADDR_WIDTH-1; higher values produce a higher-performance core at the cost of higher device utilization. The number of DSP slices used is NUM_DSP = 10 + 2 * (2 + 7 * (2 ** SYSTOLIC_ARRAY_POWER)). Here's a quick reference table:
----
-TODO: Give device utilization numbers for different values of SYSTOLIC_ARRAY_POWER.
-
----
+| SYSTOLIC_ARRAY_POWER | NUM_DSP |
+|----------------------|---------|
+| 1 | 42 |
+| 2 | 70 |
+| 3 | 126 |
+| 4 | 238 |
+| 5 | 462 |
+
+Given that the Alpha board FPGA has 740 DSP slices, SYSTOLIC_ARRAY_POWER=5 is the largest possible setting. Note that if two cores are needed (e.g. to do the two easier CRT exponentiations simultaneously), this parameter should be reduced to 4 to fit both cores into the device.
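
The table and the fitting argument above can be reproduced with a few lines of Python. This is only a sketch: the function names and the `ALPHA_DSP_SLICES` constant are hypothetical, the formula and the 740-slice figure are taken from the README text, and the additional limit SYSTOLIC_ARRAY_POWER <= OPERAND_ADDR_WIDTH-1 is deliberately not modelled.

```python
# Hypothetical helper names; only the formula and the 740-slice figure
# come from the README text.
ALPHA_DSP_SLICES = 740


def num_dsp(systolic_array_power: int) -> int:
    """DSP slices used by one core: 10 + 2 * (2 + 7 * 2**SYSTOLIC_ARRAY_POWER)."""
    return 10 + 2 * (2 + 7 * (2 ** systolic_array_power))


def largest_power(cores: int, budget: int = ALPHA_DSP_SLICES) -> int:
    """Largest SYSTOLIC_ARRAY_POWER such that `cores` identical cores fit.

    Returns 0 if not even SYSTOLIC_ARRAY_POWER = 1 fits in the budget.
    """
    power = 0
    while cores * num_dsp(power + 1) <= budget:
        power += 1
    return power


if __name__ == "__main__":
    for power in range(1, 6):
        print(power, num_dsp(power))
    print("one core :", largest_power(1))
    print("two cores:", largest_power(2))
```

Under these assumptions a single core tops out at SYSTOLIC_ARRAY_POWER=5 and two cores at 4, which matches the statement above.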
## API Specification
@@ -56,6 +61,26 @@ Read-only register bits:
[0] "ready" control bit
The "valid" status bit is cleared as soon as the core starts exponentiation, and gets set after the operation is complete. The "ready" status bit is cleared when the core starts precomputation and is set after the speed-up coefficient is precalculated.
+ * **MODE**
+Mode register bits:
+[31:2] Don't care, always read as 0
+[1] "CRT enable" control bit
+[0] Don't care, always read as 0
+The "CRT enable" control bit allows the core to take advantage of the Chinese remainder theorem to speed up RSA operations. When the CRT mode is disabled (MODE[1] = 0), the message (base) is assumed to be as wide as the modulus. When the CRT mode is enabled (MODE[1] = 1), the message is assumed to be twice as wide as the modulus and the core will reduce it before starting the exponentiation. Note that if the core was compiled for e.g. 4096-bit operands (OPERAND_ADDR_WIDTH=7), it can only handle up to 2048-bit moduli in CRT mode. When signing with e.g. a 4096-bit key without CRT, the modulus length must be set to 4096. When signing with the same 4096-bit key with CRT, the modulus length must be set to 2048.
+
+* **MODULUS_BITS**
+Length of modulus in bits, must be a multiple of 32. Smallest allowed value is 64, largest allowed value is 32 * (2 ** OPERAND_ADDR_WIDTH). If the modulus is e.g. 1000 bits wide, it must be padded with 24 leading zero bits so that it occupies an integer number of 32-bit words.
+
+* **EXPONENT_BITS**
+Length of exponent in bits. Smallest allowed value is 2, largest allowed value is 32 * (2 ** OPERAND_ADDR_WIDTH).
+
+* **BUFFER_BITS**
+Length of operand buffer in bits. This read-only parameter returns the length of the internal operand buffer and allows the largest supported operand width to be determined at run time.
+
+* **ARRAY_BITS**
+Length of systolic array in bits. This read-only parameter returns the length of the internal systolic multiplier array, which allows the SYSTOLIC_ARRAY_POWER compile-time setting to be determined at run time.
+
+
The second part of the address space contains four operand banks.
Length of each bank (BANK_LENGTH) depends on the largest supported operand width: 0x80 bytes for 1024-bit core (OPERAND_ADDR_WIDTH = 5), 0x100 bytes for 2048-bit core (OPERAND_ADDR_WIDTH = 6), 0x200 bytes for 4096-bit core (OPERAND_ADDR_WIDTH = 7) and so on.
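
The size rules described in the README text above can be checked with a short Python sketch. All function names here are hypothetical. The bank length is computed as the largest operand width divided by 8, which is an inference from the three examples given (0x80, 0x100, 0x200 bytes) rather than a formula stated in the README, and the halved limit in CRT mode follows the 4096-bit/2048-bit example.

```python
# Hypothetical helpers; none of these names exist in the core's sources.

def max_operand_bits(operand_addr_width: int) -> int:
    """Largest supported operand width: 32 * (2 ** OPERAND_ADDR_WIDTH)."""
    return 32 * (2 ** operand_addr_width)


def bank_length_bytes(operand_addr_width: int) -> int:
    """Assumed BANK_LENGTH: one full-width operand, expressed in bytes."""
    return max_operand_bits(operand_addr_width) // 8


def padded_modulus_bits(modulus_bits: int) -> int:
    """Round a modulus length up to a whole number of 32-bit words."""
    return -(-modulus_bits // 32) * 32


def modulus_bits_ok(modulus_bits: int, operand_addr_width: int,
                    crt: bool = False) -> bool:
    """Check a MODULUS_BITS value against the documented constraints."""
    limit = max_operand_bits(operand_addr_width)
    if crt:
        # In CRT mode the message is twice as wide as the modulus,
        # so the modulus may only use half of the operand width.
        limit //= 2
    return modulus_bits % 32 == 0 and 64 <= modulus_bits <= limit


if __name__ == "__main__":
    for width in (5, 6, 7):
        print(width, max_operand_bits(width), hex(bank_length_bytes(width)))
    print(padded_modulus_bits(1000))
    print(modulus_bits_ok(4096, 7), modulus_bits_ok(4096, 7, crt=True))
```

For a core built with OPERAND_ADDR_WIDTH=7, this sketch accepts a 4096-bit modulus without CRT and a 2048-bit modulus with CRT, in line with the MODE description.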
diff --git a/src/tb/tb_systolic_multiplier.v b/src/tb/tb_systolic_multiplier.v
index 96e76d5..f476aa4 100644
--- a/src/tb/tb_systolic_multiplier.v
+++ b/src/tb/tb_systolic_multiplier.v
@@ -162,6 +162,8 @@ module tb_systolic_multiplier;
.ena (ena),
.rdy (rdy),
+ .reduce_only (1'b0),
+
.a_bram_addr (core_a_addr),
.b_bram_addr (core_b_addr),
.n_bram_addr (core_n_addr),