(modular subtraction was split into three micro-operations instead of one).
The FSM previously had four states encoded using two bits, so the next-state
logic didn't need a default case: all the possible encodings were used.
Adding the fifth state required one more state bit, so the FSM now uses
five out of eight possible encodings, and a default case is thus necessary.
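To make the encoding argument concrete, here is a small Python sketch (a model, not RTL) that counts the state bits and unused encodings, and mimics a next-state case statement whose default branch catches the illegal codes. The state names and the choice of IDLE as the recovery state are assumptions for illustration only.

```python
# Sketch only: the state names below are hypothetical placeholders,
# not the identifiers used in the core's RTL.

def state_bits(n_states: int) -> int:
    """Smallest number of bits b such that 2**b >= n_states."""
    return max(1, (n_states - 1).bit_length())


def unused_encodings(n_states: int) -> int:
    """Encodings left over in a binary state register of state_bits(n_states) bits."""
    return 2 ** state_bits(n_states) - n_states


IDLE, S1, S2, S3, S4 = range(5)  # five states -> three state bits
_NEXT = {IDLE: S1, S1: S2, S2: S3, S3: S4, S4: IDLE}


def next_state(state: int) -> int:
    """Model of a case statement on a 3-bit state register.

    Listed states follow the transition table; the three unused encodings
    (5, 6, 7) take the default branch and return to IDLE.
    """
    return _NEXT.get(state & 0b111, IDLE)


print(state_bits(4), unused_encodings(4))  # 2 0
print(state_bits(5), unused_encodings(5))  # 3 3
print([next_state(s) for s in (5, 6, 7)])  # [0, 0, 0]
```

With four states every 2-bit code is a real state, so there is nothing for a default branch to catch; with five states three codes are unused and need explicit handling.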
and DECODE. Apparently one clock cycle is not enough to entirely decode an
instruction, so decoding now takes two clock cycles (DECODE_1 and DECODE_2).
This seems to solve the problem. If we run into more timing violations here, we
can add an extra DECODE_3 cycle and register the currently combinatorial
uop_opcode_* flags at DECODE_2. This fix increases the core's latency by 59
clock cycles in CRT mode (32 in non-CRT mode) plus two extra clock cycles for
each bit of the exponent.
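The added latency is a fixed term plus a per-bit term. A minimal Python sketch of that arithmetic, taking the numbers in the message at face value; the function name and the example exponent lengths are hypothetical.

```python
def extra_latency(exponent_bits: int, crt: bool) -> int:
    """Extra clock cycles from splitting DECODE into DECODE_1 and DECODE_2.

    Uses the quoted figures: a fixed overhead of 59 cycles in CRT mode or
    32 cycles in non-CRT mode, plus 2 cycles per exponent bit. How
    exponent_bits should be counted in CRT mode is an assumption here.
    """
    fixed = 59 if crt else 32
    return fixed + 2 * exponent_bits


print(extra_latency(1024, crt=True))   # 2107
print(extra_latency(2048, crt=False))  # 4128
```

For long exponents the per-bit term dominates, so the fixed 59/32-cycle overhead is small by comparison.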
- added core wrapper
- fixed module resets across the entire core (all the resets are now
consistently active-low)
- continued refactoring
Moved the micro-operations handler into a separate module file, so that we don't
have any synthesized logic in the top-level module, just instantiations. This
is more consistent from the design partitioning point of view. Btw, Xilinx
claims their tools work better that way too, but who knows...
Added optional simulation-only code to assist debugging. To use it, uncomment
the ENABLE_DEBUG `define in 'rtl/modexpng_parameters.vh', but never try to
synthesize the core with debugging enabled.