This reference model was written to help debug the Verilog code. It mimics how an FPGA would perform modular exponentiation using a systolic Montgomery multiplier. Note that the model may at times do things that look odd from a CPU point of view. Also note that while the FPGA modules are written to run in a truly constant-time manner, the model itself takes no active measures to keep its run time constant. Do NOT use it in production as-is!
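For orientation, here is a minimal Python sketch of modular exponentiation built on Montgomery multiplication. It is not taken from the model: it works on whole integers instead of 32-bit words, uses a plain left-to-right square-and-multiply loop, and the names `mont_mul` and `mod_exp` are hypothetical. Like the model, it makes no attempt to run in constant time.

```python
def mont_mul(a: int, b: int, n: int, r_bits: int, n_prime: int) -> int:
    """Return a * b * R^-1 mod n with R = 2**r_bits (Montgomery reduction).

    Assumes 0 <= a, b < n < R, so the intermediate result stays below 2n.
    """
    r_mask = (1 << r_bits) - 1
    t = a * b
    # m is chosen so that t + m*n is divisible by R
    m = ((t & r_mask) * n_prime) & r_mask
    u = (t + m * n) >> r_bits
    # One conditional subtraction brings the result into [0, n)
    # (data-dependent here, unlike in constant-time hardware)
    return u - n if u >= n else u


def mod_exp(base: int, exp: int, n: int) -> int:
    """Compute base**exp mod n for an odd modulus n > 1."""
    assert n > 1 and n % 2 == 1, "Montgomery reduction needs an odd modulus"
    r_bits = n.bit_length()
    r = 1 << r_bits
    # n_prime = -n^-1 mod R (requires Python 3.8+ for pow(x, -1, m))
    n_prime = (-pow(n, -1, r)) % r
    # Move the base into the Montgomery domain: x -> x*R mod n
    x_m = (base % n) * r % n
    acc = r % n  # Montgomery form of 1
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, r_bits, n_prime)    # square
        if bit == "1":
            acc = mont_mul(acc, x_m, n, r_bits, n_prime)  # multiply
    # Leave the Montgomery domain by multiplying with a plain 1
    return mont_mul(acc, 1, n, r_bits, n_prime)
```

For example, `mod_exp(4, 13, 497)` returns 445, the same value as Python's built-in `pow(4, 13, 497)`.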
The model is split into low-level primitives (a 32-bit adder, a 32-bit subtractor and a 32x32-bit multiplier with pre-adder) and higher-level arithmetic routines (a Montgomery multiplier and an exponentiator).
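As an illustration of how a word-level Montgomery multiplier can be layered on 32-bit primitives, here is a hypothetical Python sketch. `add32`, `sub32`, `mac32` and `mont_mul_words` are illustrative names, not the model's. It assumes the multiplier computes `a*b + t + c`, which is a guess at how the pre-adder is used; even in the worst case, (2^32-1)^2 + 2(2^32-1) = 2^64-1, the result fits in 64 bits. The word ordering follows the CIOS (coarsely integrated operand scanning) form rather than a systolic schedule.

```python
from __future__ import annotations

MASK32 = 0xFFFFFFFF


def add32(a: int, b: int, c_in: int = 0) -> tuple[int, int]:
    """32-bit adder: returns (sum, carry_out)."""
    s = a + b + c_in
    return s & MASK32, s >> 32


def sub32(a: int, b: int, b_in: int = 0) -> tuple[int, int]:
    """32-bit subtractor: returns (difference, borrow_out)."""
    d = a - b - b_in
    return d & MASK32, 1 if d < 0 else 0


def mac32(a: int, b: int, t: int, c: int) -> tuple[int, int]:
    """Assumed multiplier behaviour: a*b + t + c as (low word, high word)."""
    p = a * b + t + c
    return p & MASK32, p >> 32


def mont_mul_words(a: list[int], b: list[int], n: list[int]) -> list[int]:
    """Return a*b*R^-1 mod n, R = 2**(32*s); little-endian 32-bit words.

    Assumes n is odd and a, b < n, all given as lists of s words.
    """
    s = len(n)
    # -n^-1 mod 2^32; real hardware would precompute this constant
    n0_inv = (-pow(n[0], -1, 1 << 32)) & MASK32
    t = [0] * (s + 2)
    for i in range(s):
        # t += a * b[i]
        c = 0
        for j in range(s):
            t[j], c = mac32(a[j], b[i], t[j], c)
        t[s], t[s + 1] = add32(t[s], c)
        # t = (t + m*n) / 2^32, with m chosen so the low word cancels
        m = (t[0] * n0_inv) & MASK32
        _, c = mac32(m, n[0], t[0], 0)  # low word is zero by construction
        for j in range(1, s):
            t[j - 1], c = mac32(m, n[j], t[j], c)
        t[s - 1], c = add32(t[s], c)
        t[s] = t[s + 1] + c
    # Final conditional subtraction: t < 2n, so at most one n is removed
    diff = [0] * s
    borrow = 0
    for j in range(s):
        diff[j], borrow = sub32(t[j], n[j], borrow)
    # t >= n exactly when the top word absorbs the final borrow
    return diff if t[s] >= borrow else t[:s]


def to_words(x: int, s: int) -> list[int]:
    """Split an integer into s little-endian 32-bit words."""
    return [(x >> (32 * k)) & MASK32 for k in range(s)]


def from_words(w: list[int]) -> int:
    """Join little-endian 32-bit words back into an integer."""
    return sum(v << (32 * k) for k, v in enumerate(w))
```

The result can be checked against plain integer arithmetic: for a 96-bit odd modulus `n` (three words), `from_words(mont_mul_words(to_words(a, 3), to_words(b, 3), to_words(n, 3)))` should equal `a * b * pow(2**96, -1, n) % n`.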
This model uses tips and tricks from the following sources:
1. High-Speed RSA Implementation
2. Handbook of Applied Cryptography
3. Montgomery Modular Multiplication on Reconfigurable Hardware: Systolic versus Multiplexed Implementation