AgeCommit messageAuthor
2021-10-26Experimental Makefile for VivadoHEADmasterPavel V. Shatov (Meister)
Tested with 2018.3 and 2019.2
2021-10-26XDC equivalent of UCF timing constraints for ISE.Pavel V. Shatov (Meister)
Note that Vivado needs an extra constraint to bypass the combinatorial loop DRC check; otherwise bitstream generation will fail.
2020-03-25Prune target list to match projects currently in core.cfg.Paul Selkirk
2020-02-26Merge branch 'js_keywrap' to 'master'Paul Selkirk
2020-02-26Add a build target for hsm_ng, but cut the ModExpNG clock back to 90MHz.Paul Selkirk
Builds fail to meet timing at 180MHz, so I need to cut it back until it can be made to work for everybody.
2020-02-11Turned off resource sharing during synthesis.Pavel V. Shatov (Meister)
2020-01-23Out of curiosity I tried compiling the bitstream with Vivado. These constraints may come in handy if you're brave enough to try this at home.Pavel V. Shatov (Meister)
2020-01-23Changed FMC I/O frequency to 45 MHz in the UCF file to match hardware and also updated offset constraint values accordingly.Pavel V. Shatov (Meister)
Another important change is that the core selector is now multicycle. The I/O logic runs at 45 MHz, while all the cores and the core selector run at 90 MHz. The job of the core selector is to distribute the write data (STM32 -> FPGA) to the particular core being addressed and to select the read data (FPGA -> STM32) from the core being addressed.
Timing issues arising from the distribution of write commands are mitigated by replicating data and control signals. Each core has its own set of chip select, address and write data registers. These can be placed somewhere in between the core itself and the selector, so the critical timing paths become half as long.
Readback logic has a huge multiplexor that selects the read data from the core being addressed and then forwards it to the FMC bus arbiter. Since the FMC arbiter operates at 45 MHz (half the speed of the readback multiplexor) and waits one 45 MHz clock cycle before sampling the readback data, it makes sense to give the multiplexor two 90 MHz clock cycles to select the value. This is achieved by applying a FROM-TO constraint.
Note the two gotchas here (took me some time to figure out):
1) You attach the TNM or TNM_NET... well, okay, there's a third potential gotcha here, since only one of the two works with I/O pads (read the CDG for more details), but luckily that's not our case, phew. So, you attach a TNM_NET to a net, and ISE will follow the net and attach the TNM to **the next register being driven by this net**. This is somewhat non-obvious: say you have a flip-flop called 'something_dout<0>' and its output net is naturally also called 'something_dout<0>'. Attaching TNM_NET to this net doesn't apply the TNM to the flip-flop; instead ISE follows the net and attaches the TNM to all the load flip-flops on the net. For FROM-TO to work as expected we have to apply TNM_NET to all the output "read data" nets of all the cores, so that they can be traced into the multiplexor. Note that for a multicycle multiplexor you actually need two FROM-TO constraints: one for the data and one for the control signal.
2) Applying FROM-TO affects both the setup and the hold check in ISE. This is different from Vivado, where you have to specify the setup and the hold checks individually.
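The two-cycle budget for the readback path can be checked with a little arithmetic. The sketch below is only an illustration: the clock frequencies are taken from the commit message, while the helper and variable names are made up for this example and do not appear in the UCF file.

```python
# Timing budget for the multicycle readback path.
# Frequencies come from the commit message; all names are illustrative only.

def period_ns(freq_mhz: float) -> float:
    """Return the clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz

IO_CLK_MHZ = 45.0    # FMC I/O logic and bus arbiter
SYS_CLK_MHZ = 90.0   # cores and core selector
MULTICYCLE_FACTOR = 2  # the multiplexor is given two sys_clk cycles

io_period = period_ns(IO_CLK_MHZ)
sys_period = period_ns(SYS_CLK_MHZ)
multicycle_budget = MULTICYCLE_FACTOR * sys_period

print(f"io_clk period:     {io_period:.3f} ns")
print(f"sys_clk period:    {sys_period:.3f} ns")
print(f"readback budget:   {multicycle_budget:.3f} ns")
```

Two 90 MHz cycles add up to exactly one 45 MHz cycle, which is why relaxing the multiplexor path to two cycles costs nothing on the arbiter side.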
2020-01-22This commit turns off the "equivalent_register_removal" setting for XST.Pavel V. Shatov (Meister)
Okay, here's the story. The Xilinx synthesis tool ("XST") is smart in the sense that it detects all registers with equivalent behaviour, removes all but one of them, and connects all the loads to that one flip-flop. This works fine most of the time and usually even saves some resources, but for our particular design it was starting to cause just too many problems. The reason is that the ModExp* cores exploit the parallel nature of an FPGA; for example, ModExpNG instantiates four copies of the modular multiplier internally. Those multipliers all operate the same way (but on different data, of course), so all their internal signals such as, say, clock enables and word counters are the same. XST happily throws away all the internals of three multipliers, leaves only one instance of the control signals, and then the map and place&route tools struggle for hours fusing this all together.
Turning off equivalent register removal entirely leads to excessive resource consumption, so the optimal solution would be to turn it off selectively, only for those tricky places where several copies of control signals are actually required to meet timing. The problem is that according to Xilinx's docs (UG687 v14.5, p. 363) the "equivalent_register_removal = no" inline constraint can be applied to entire modules, not only to individual registers, but I was unable to get this to work; XST seems to just ignore it. This may have been fixed in Vivado though, haven't tried yet. Another potential solution is to prepend this constraint to every register declaration inside the modular multiplier, but that would look just ugly. One trick I've seen somewhere is to `define a new 'keep_equivalent_reg' "keyword" to be '"equivalent_register_removal = no" reg' and tweak register declarations accordingly; that seems to look somewhat less ugly, don't know.
Yet another way around might be to use the "max_fanout" constraint instead. Say there are eight DSP slices per multiplier (thirty-two DSP slices total, since there are four multiplier instances). In theory we can constrain their clock enable fanout to not exceed 8. The problem is that XST will first throw away three of the clock enables and then gradually add them back to limit each clock enable's fanout to 8. This way there's no guarantee that the first clock enable will be routed to all eight DSP slices in the first multiplier; it can be routed to DSP slices in the three remaining multipliers as well, since XST will just try to limit the fanout. It's difficult to predict how the place&route tools will handle this.
Anyways, the current slice consumption with 2x ModExpA7 and 1x ModExpNG is ~40%, and the timing situation is very good (the very first phase of place and route already has zero setup time violations, yay!). With global equivalent register removal turned on, utilization drops to ~35%, but timing is impossible to meet even on the highest map and place&route effort setting. I believe the best way forward is to just keep global removal disabled for now. We may revisit this in the future, say, if we decide to generate a custom dedicated RSA-only signer bitstream with as many core instances as possible. Then every register will count, but I suspect we won't get away with just re-enabling global equivalent register removal; likely some floorplanning will be required too, at least.
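The max_fanout concern can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of how XST actually replicates registers: it only shows that a fanout limit of 8 fixes the number of clock enable copies but not which loads each copy drives. The multiplier and DSP counts follow the example in the commit message; the function names and the interleaved load order are hypothetical.

```python
import math

MULTIPLIERS = 4          # ModExpNG instantiates four modular multipliers
DSP_PER_MULTIPLIER = 8   # example figure from the commit message
MAX_FANOUT = 8           # the fanout limit under discussion

# Every load is identified as (multiplier index, DSP index).
loads = [(m, d) for m in range(MULTIPLIERS) for d in range(DSP_PER_MULTIPLIER)]

# A fanout limit only determines how many copies of the clock enable exist.
copies_needed = math.ceil(len(loads) / MAX_FANOUT)

def split_by_fanout(ordered_loads, max_fanout):
    """Cut the loads into consecutive groups of at most max_fanout entries."""
    return [ordered_loads[i:i + max_fanout]
            for i in range(0, len(ordered_loads), max_fanout)]

def multipliers_touched(group):
    """Return the sorted multiplier indices driven by one clock enable copy."""
    return sorted({m for m, _ in group})

# Desired outcome: each copy drives exactly one multiplier.
ideal = split_by_fanout(loads, MAX_FANOUT)

# Hypothetical unlucky outcome: the loads happen to be visited interleaved.
interleaved = sorted(loads, key=lambda load: (load[1], load[0]))
unlucky = split_by_fanout(interleaved, MAX_FANOUT)

print("copies needed:", copies_needed)
print("ideal:  ", [multipliers_touched(g) for g in ideal])
print("unlucky:", [multipliers_touched(g) for g in unlucky])
```

Both groupings satisfy the fanout limit, yet in the second one every clock enable copy spans all four multipliers, which is the situation that makes placement hard to predict.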
2020-01-21Tweak the Makefile to match the new Alpha platform.Pavel V. Shatov (Meister)
2020-01-21Testbench for the new clock manager.Pavel V. Shatov (Meister)
2020-01-21Bumped version number.Pavel V. Shatov (Meister)
2020-01-21New Alpha platform with three clocks:Pavel V. Shatov (Meister)
* 45 MHz (aka "io_clk") is the I/O clock for the FMC bus
* 90 MHz (aka "sys_clk") is the system clock for all the cores
* 180 MHz (aka "core_clk") is the high-speed clock for high-performance cores
2019-04-09Rebase branch 'js_keywrap' from masterjs_keywrapPaul Selkirk
2019-04-09Collapse build targets into one rule, because that's exactly what $@ is designed for.Paul Selkirk
2019-04-09correct fpga part number, add keywrap build targetPaul Selkirk
2019-04-05Byte-swap in hardware, so we can do memcpy from software.Paul Selkirk
2019-04-03Merge branch 'fmc_clk_60mhz' to 'master'Paul Selkirk
2019-01-23Generate detailed timing report when PAR fails.Rob Austein
The original version of this file appears to have been attempting to do this, but got the grotty details wrong.
2019-01-23Comment smartguide out of Makefile, not just out of shell script.Rob Austein
2019-01-23Remove `-global_opt off` per discussion with Joachim and Pavel.Rob Austein
2019-01-23Add explicit check for timing failure, per Pavel.Rob Austein
2019-01-231. Disabled SmartGuide as it can thwart reproducible implementation.Pavel V. Shatov (Meister)
2. Enabled multi-threading for MAP and PAR, the corresponding switch is -mt. MAP supports -mt off|2, PAR supports -mt off|2|3|4. Please revert back to -mt off if the build system has only two cores.
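A build script could pick the -mt value from the number of cores on the build machine. The following is a minimal sketch: the accepted values per tool are the ones listed in the commit message, while the function name and the policy of choosing the largest supported thread count that fits are assumptions made for illustration, not part of the Makefile.

```python
# Values accepted by -mt for each tool, as listed in the commit message.
MT_CHOICES = {
    "map": ("off", "2"),
    "par": ("off", "2", "3", "4"),
}

def mt_value(tool: str, cores: int) -> str:
    """Pick an -mt value for `tool` on a machine with `cores` CPU cores.

    Assumed policy: use "off" with two cores or fewer (as the commit
    message recommends), otherwise the largest supported thread count
    that does not exceed the number of cores.
    """
    if cores <= 2:
        return "off"
    thread_counts = [int(c) for c in MT_CHOICES[tool] if c != "off"]
    usable = [n for n in thread_counts if n <= cores]
    return str(max(usable)) if usable else "off"

for tool in ("map", "par"):
    for cores in (2, 3, 8):
        print(f"{tool} on {cores} cores: -mt {mt_value(tool, cores)}")
```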
2019-01-23Use default synthesis options.Pavel V. Shatov (Meister)
2019-01-22Generate detailed timing report when PAR fails.Rob Austein
The original version of this file appears to have been attempting to do this, but got the grotty details wrong.
2019-01-22Comment smartguide out of Makefile, not just out of shell script.Rob Austein
2019-01-22Remove `-global_opt off` per discussion with Joachim and Pavel.Rob Austein
2019-01-22Add explicit check for timing failure, per Pavel.Rob Austein
2019-01-221. Disabled SmartGuide as it can thwart reproducible implementation.Pavel V. Shatov (Meister)
2. Enabled multi-threading for MAP and PAR, the corresponding switch is -mt. MAP supports -mt off|2, PAR supports -mt off|2|3|4. Please revert back to -mt off if the build system has only two cores.
2019-01-22Use default synthesis options.Pavel V. Shatov (Meister)
2019-01-22Upon reflection, I prefer the way Pavel handled include paths in 8cd28d0Paul Selkirk
(which he only committed on fmc_clk, and I was only looking at master). But I moved the curly brackets from Makefile to xilinx.mk, because a) Makefile shouldn't need to know the picky details of xst option syntax, and b) xst will throw an uninformative error if called with '-vlgincdir ' versus '-vlgincdir {}', if vlgincdir isn't defined in Makefile.
2019-01-22Cherry-pick 8cd28d0/fe3d53c: Added `include directories to Makefile.Paul Selkirk
2019-01-22Upon reflection, I prefer the way Pavel handled include paths in 8cd28d0Paul Selkirk
(which he only committed on fmc_clk, and I was only looking at master). But I moved the curly brackets from Makefile to xilinx.mk, because a) Makefile shouldn't need to know the picky details of xst option syntax, and b) xst will throw an uninformative error if called with '-vlgincdir ' versus '-vlgincdir {}', if vlgincdir isn't defined in Makefile.
2019-01-22Corrected target device.Pavel V. Shatov (Meister)
2019-01-14Add include directives for Pavel's .vh files.Paul Selkirk
2019-01-14Add include directives for Pavel's .vh files.Paul Selkirk
2018-12-04Collapse build targets into one rule, because that's exactly what $@ is designed for.Paul Selkirk
2018-12-04Collapse build targets into one rule, because that's exactly what $@ is designed for.Paul Selkirk
2018-09-06Constraints for 60 MHz FMC_CLK.Pavel V. Shatov (Meister)
2018-08-27correct fpga part number, add keywrap build targetPaul Selkirk
2018-08-27Generate detailed timing report when PAR fails.Rob Austein
The original version of this file appears to have been attempting to do this, but got the grotty details wrong.
2018-08-27Comment smartguide out of Makefile, not just out of shell script.Rob Austein
2018-08-27Remove `-global_opt off` per discussion with Joachim and Pavel.Rob Austein
2018-08-21Add explicit check for timing failure, per Pavel.Rob Austein
2018-08-181. Disabled SmartGuide as it can thwart reproducible implementation.Pavel V. Shatov (Meister)
2. Enabled multi-threading for MAP and PAR, the corresponding switch is -mt. MAP supports -mt off|2, PAR supports -mt off|2|3|4. Please revert back to -mt off if the build system has only two cores.
2018-08-18Use default synthesis options.Pavel V. Shatov (Meister)
2018-08-18Corrected target device.Pavel V. Shatov (Meister)
2018-07-14Adjust Makefile to track source changes.Rob Austein
2018-07-05Changed top module to accommodate changes to the clock manager.Pavel V. Shatov (Meister)
2018-07-05Updated clock manager.Pavel V. Shatov (Meister)