|
Barretenberg
The ZK-SNARK library at the core of Aztec
|
Circuit-friendly implementation of the SHA-256 compression function using lookup tables and sparse form arithmetic.
Applies the SHA-256 compression function to a single 512-bit message block.
Parameters:
h_init: The 8-word (256-bit) hash state. For the first block, the standard SHA-256 IV. For subsequent blocks, the output of the previous compression.input: The 16-word (512-bit) message block to compress.Output: The updated 8-word hash state.
Note: Only the compression function is implemented since this is all that is needed to support Sha256Compression constraints in DSL.
Standard SHA-256 compression function (FIPS 180-4):
Extends the 16 input words to 64 words:
64 rounds updating 8 working variables (a, b, c, d, e, f, g, h) initialized from h_init:
XOR and bitwise operations are expensive in circuits. We use "sparse form" to convert them to additions:
Two sparse bases are used across three contexts:
7*rotation_sum + (e + 2f + 3g) per digit4*rotation_sum + (a + b + c) per digit4*σ₀_digit + σ₁_digit per digitThe multipliers (4 and 7) separate the two XOR results or the rotation result from the Boolean function encoding within each digit.
In sparse form, rotating a value by r bits is equivalent to multiplying by B^r (where B is the base). However, 32-bit values are decomposed into 3-4 limbs by the input lookup tables, and this creates complications:
Simple case (contiguous after rotation): When a limb's bits remain contiguous and don't land at bit position 0, the rotation is handled by multiplying that limb by the appropriate power of B. These are called "rotation multipliers" or "rotation coefficients" (e.g., get_choose_rotation_multipliers()).
Complex cases (handled by lookup tables):
The input lookup tables (C3 column) return the combined contribution from all complex cases, while the code applies rotation multipliers to handle the simple cases. Correction factors reconcile any coefficient mismatches between limbs.
See the appendix "Detailed Example: Choose + Σ₁" for a worked example showing how Σ₁'s three rotations (6, 11, 25) are split between multipliers and lookup tables.
Six lookup tables handle the sparse form conversions:
| Operation | Input Table | Output Table |
|---|---|---|
| Message extension (σ₀ + σ₁) | SHA256_WITNESS_INPUT | SHA256_WITNESS_OUTPUT |
| Choose + Σ₁ | SHA256_CH_INPUT | SHA256_CH_OUTPUT |
| Majority + Σ₀ | SHA256_MAJ_INPUT | SHA256_MAJ_OUTPUT |
Input tables decompose 32-bit values into sparse limbs. Output tables normalize the sparse sum back to a 32-bit result.
See plookup_tables/sha256.hpp for detailed table documentation.
| Algorithm | Implementation |
|---|---|
| σ₀ + σ₁ | extend_witness() |
| Σ₁ + Ch | choose_with_sigma1() |
| Σ₀ + Maj | majority_with_sigma0() |
| mod 2³² addition | add_normalize() |
Correctness is verified through a layered approach:
The native (non-circuit) SHA-256 implementation (full hash algorithm with padding) is tested against official NIST test vectors, verifying the core algorithm is correct.
The circuit tests use NIST vectors with manual padding - sha256.test.cpp tests sha256_block directly against NIST vectors ("abc" and 56-byte messages), manually constructing padded blocks to verify the compression function produces correct output.
The fuzzer generates random inputs and compares circuit output against native output for the compression function. This provides broad coverage without requiring exhaustive test vectors.
This section walks through how choose_with_sigma1() (in sha256.cpp) computes Σ₁(e) + Ch(e,f,g) using sparse form.
Per bit position i:
Σ₁(e)_i = e[i-6] ⊕ e[i-11] ⊕ e[i-25] (3-way XOR of rotated bits)Ch(e,f,g)_i = (e_i ∧ f_i) ⊕ (¬e_i ∧ g_i) (if e=1, take f; else take g)The Choose function can be computed from the weighted sum e + 2f + 3g:
| e | f | g | e + 2f + 3g | Ch(e,f,g) |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 2 | 0 |
| 0 | 0 | 1 | 3 | 1 |
| 1 | 1 | 0 | 3 | 1 |
| 1 | 0 | 1 | 4 | 0 |
| 0 | 1 | 1 | 5 | 1 |
| 1 | 1 | 1 | 6 | 1 |
The weighted sum uniquely determines Ch(e,f,g) (note: 3 maps to 1 in both cases where it occurs).
This encoding is implemented in get_choose_output_table() (plookup_tables/sha256.hpp).
We encode both Σ₁ and Ch in a single base-28 digit:
Where:
rotation_sum ∈ {0,1,2,3} (sum of 3 rotated bits)e + 2f + 3g ∈ {0,1,2,3,4,5,6}The constant 7 is SPARSE_MULT in choose_with_sigma1().
Computing Σ₁(e) requires rotating e by 6, 11, and 25 bits. In sparse form, rotation by r bits is multiplication by B^r (where B=28). But there's a complication: the 32-bit value is split into three limbs by the input table:
See get_choose_input_table() in plookup_tables/sha256.hpp for limb structure.
When a rotation is applied, each limb's bits either:
B^(new_position).For Σ₁'s three rotations (6, 11, 25), each limb behaves differently:
| Limb | Bits | Rot 6 | Rot 11 | Rot 25 |
|---|---|---|---|---|
| L0 | 0-10 | splits | contiguous | contiguous |
| L1 | 11-21 | contiguous | lands at 0 | contiguous |
| L2 | 22-31 | contiguous | contiguous | splits |
The rotation coefficients c0, c1, c2 (from get_choose_rotation_multipliers()) combine the contiguous contributions for each limb. The lookup table (C3) handles the splits and lands-at-0 cases.
For efficiency, we want to compute e_sparse + SPARSE_MULT * Σ₁(e_sparse) in one operation via multiplication of the full value by a single coefficient. Recall e_sparse = L0 + B^11*L1 + B^22*L2. The coefficient c0 is designed for L0, so we compute e_sparse * c0. But this gives incorrect coefficients for L1 and L2:
c0 (correct)B^11 * c0 (incorrect - should be c1)B^22 * c0 (incorrect - should be c2)Two correction factors fix this:
δ1 = c1 - B^11 * c0, baked into C3 via limb1_table_correction in get_choose_input_table()δ2 = c2 - B^22 * c0, returned by get_choose_rotation_multipliers() and applied explicitly in choose_with_sigma1()SHA256_CH_INPUT via get_choose_input_table()):e_sparseS2 (for L2 correction)choose_with_sigma1()): c0 * SPARSE_MULT + 1 term computes both SPARSE_MULT * Σ₁(e) and e from the same multiplication.SHA256_CH_OUTPUT via get_choose_output_table()): Each base-28 digit maps to Σ₁_bit + Ch_bitThe lookup tables encode all the bit-level logic, so the circuit only performs field additions and table lookups.