Hardware implementation of the SHA-256 cryptographic hash function. The
implementation is written in Verilog 2001 compliant code. The
implementation includes a core and a wrapper that provides a 32-bit
interface for simple integration. There is also an alternative wrapper
that implements a Wishbone compliant interface.
This is a low area implementation that iterates over the rounds but
there is no sharing of operations such as adders.
The hardware implementation is complemented by a functional model
written in Python.
The sha256 is divided into the following sections.
- src/rtl - RTL source files
- src/tb - Testbenches for the RTL files
- src/model/python - Functional model written in python
- doc - documentation (currently not done.)
- toolruns - Where tools are supposed to be run. Includes a Makefile for
building and simulating the design using Icarus Verilog
The actual core consists of the following files:
- sha256_core.v - The core itself with wide interfaces.
- sha256_w_mem.v - W message block memort and expansion logic.
- sha256_k_constants.v - K constants ROM memory.
The top level entity is called sha256_core. This entity has wide
interfaces (512 bit block input, 256 bit digest). In order to make it
usable you probably want to wrap the core with a bus interface.
Unless you want to provide your own interface you therefore also need to
select one top level wrapper. There are two wrappers provided:
- sha256.v - A wrapper with a 32-bit memory like interface.
- wb_sha256.v - A wrapper that implements a Wishbone interface.
Do not include both wrappers in the same project.
The core (sha256_core) will sample all data inputs when given the init
or next signal. the wrappers provided contains additional data
registers. This allows you to load a new block while the core is
processing the previous block.
The W-memory scheduler is based on 16 32-bit registers. Thee registers
are loaded with the current block. After 16 rounds the contents of the
registers slide through the registers r5..r0 while the new W word is
inserted at r15 as well as being returned to the core.
Implementation results using Altera Quartus-II 13.1.
Cyclone IV E
- EP4CE6F17C6
- 3882 LEs
- 1813 registers
- 74 MHz
- 66 cycles latency
Cyclone IV GX
- EP4CGX22CF19C6
- 3773 LEs
- 1813 registers
- 76 MHz
- 66 cycles latency
Cyclone V
- 5CGXFC7C7F23C8
- 1469 ALMs
- 1813 registers
- 79 MHz
- 66 cycles latency
- Extensive verification in physical device.
- Complete documentation.
(2013-02-23)
Cleanup, more results etc. Move all wmem update logic to a separate
process for a cleaner code.
(2014-02-22)
Redesigned the W-memory into a sliding window solution. This not only
removed 48 32-registers but also several muxes and address decoders.
The old implementation resources and performance:
- 9587 LEs
- 3349 registers
- 73 MHz
- 66 cycles latency
The new implementation resources and performance:
- 3765 LEs
- 1813 registers
- 76 MHz
- 66 cycles latency
(2014-02-19)
- The core has been added to the Cryptech repo. The core comes from
https://github.com/secworks/sha256