Age | Commit message (Collapse) | Author |
|
|
|
|
|
|
|
outputs were going directry into a LUT-based ternary adder which was causing
timing problems. Added a layer of flip-flops, so instead of BRAM -> LUT -> FF
we have BRAM -> FF -> LUT -> FF. This increases core latency by
(number_of_supporting_modular_multiplications + number_of_exponent_bits) ticks.
|
|
|
|
|
|
- added core wrapper
- fixed module resets across entire core (all the resets are now consistently
active-low)
- continued refactoring
|
|
Moved micro-operations handler into a separate module file, this way we don't
have any synthesized stuff in the top-level module, just instantiations. This
is more consistent from the design partitioning point of view. Btw, Xilinx
claims their tools work better that way too, but who knows...
Added optional simulation-only code to assist debugging. Un-comment the
ENABLE_DEBUG `define in 'rtl/modexpng_parameters.vh' to use, but don't ever
try to synthesize the core with debugging enabled.
|
|
is basically
a block memory data mover, but it can also do some supporting operations required for the
Garner's formula part of the exponentiation.
|
|
there's
only one instance of input/output values, while storage manager has dual storage space
for P and Q multipliers).
Started working on microcoded layer, added input operation and modular multiplication.
|
|
have eight 4kbit entries and occupy one 36K BRAM tile.
|
|
|