Cryptographic integrity verification: merkle proofs, root progression, proof stability, and tamper detection.
Overview
This example focuses exclusively on merkql's integrity layer—the merkle tree that backs every partition. No consumers are used; instead, the example works directly with the partition API to generate and verify proofs.
For any record at a given offset, merkql can generate an inclusion proof: the leaf hash plus the sibling hashes along the path to the root. A verifier recomputes the root from the leaf and confirms it matches, which proves the record is part of the log committed to by that root and has not been modified.
The merkle root changes with every append. Capturing the root at a point in time creates a cryptographic commitment to the entire log up to that point. If any record is later modified, recomputing the root yields a different value.
merkql uses an append-only binary carry chain for its merkle tree. Appending new records never modifies existing tree nodes, so proofs generated earlier remain valid after growth.
Every object in the pack file is stored under its SHA-256 hash. If the underlying bytes are modified on disk, re-reading and re-hashing produces a different hash—an immediate integrity violation.
Walkthrough
Open a broker with defaults (no compression, no retention) to preserve the full history—critical for audit scenarios. Produce 50 events with structured JSON payloads containing user, action, resource, and timestamp fields.
```rust
let broker = Broker::open(BrokerConfig::new(dir.path())).unwrap();
let producer = Broker::producer(&broker);
let users = ["alice", "bob", "charlie", "diana", "eve"];
let actions = ["CREATE", "READ", "UPDATE", "DELETE", "LOGIN"];
// `resources` (a list of resource names) and `AuditEvent` are defined
// elsewhere in the example and are not shown in this excerpt.
for i in 0..50 {
    let event = AuditEvent {
        user: users[i % users.len()].to_string(),
        action: actions[i % actions.len()].to_string(),
        resource: resources[(i * 3) % resources.len()].to_string(),
        ts: 1700000000 + i as u64,
    };
    producer.send(&ProducerRecord::new(
        "audit", Some(event.user.clone()),
        serde_json::to_string(&event).unwrap(),
    )).unwrap();
}
```
Access the partition directly through broker.topic().partition(). For each of 5 sampled offsets, call partition.proof(offset) to generate a Proof struct containing the leaf hash, sibling hashes, and the root. Then verify with MerkleTree::verify_proof().
```rust
let topic = broker.topic("audit").unwrap();
let part_arc = topic.partition(0).unwrap();
let partition = part_arc.read().unwrap();
let sample_offsets = [0, 10, 25, 37, 49];
for &offset in &sample_offsets {
    let proof = partition.proof(offset).unwrap().unwrap();
    let valid = MerkleTree::verify_proof(
        &proof, partition.store()
    ).unwrap();
    assert!(valid);
}
```
The proof depth reflects the tree height at that offset. Offset 49 (the last leaf of 50) has depth 3 because the tree's right edge is shorter than the interior.
```text
Offset 0:  leaf=ed0b5b61ff5a084a... depth=6 valid=true
Offset 10: leaf=cf2a783ab91ecc1b... depth=6 valid=true
Offset 25: leaf=f0597ca3b071b2b3... depth=6 valid=true
Offset 37: leaf=75a8f5069a0171cf... depth=6 valid=true
Offset 49: leaf=7e26e538033f0c75... depth=3 valid=true
```
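The depths in this output are consistent with a tree made of perfect subtrees ("peaks"), one per set bit of the record count (50 = 32 + 16 + 2), folded into a single root from the right. That layout is an assumption inferred from the numbers shown, not a description of merkql's internals, and `proof_depth` is a hypothetical helper. Under that assumption, a minimal sketch that reproduces the depth column:

```rust
/// Number of sibling hashes in an inclusion proof for `offset` in a log of `n`
/// records. ASSUMPTION: the tree is a row of perfect subtrees ("peaks"), one
/// per set bit of `n` (largest on the left), folded into one root from the right.
fn proof_depth(n: u64, offset: u64) -> Option<u32> {
    if offset >= n {
        return None;
    }
    // Peak heights, leftmost first: 50 = 32 + 16 + 2 gives [5, 4, 1].
    let heights: Vec<u32> = (0..64u32).rev().filter(|&b| (n >> b) & 1 == 1).collect();
    let last = heights.len() - 1;
    let mut start = 0u64;
    for (i, &h) in heights.iter().enumerate() {
        let size = 1u64 << h;
        if offset < start + size {
            // `h` siblings inside the peak, plus one per fold step above it.
            // The rightmost peak shares its first fold with its left neighbor.
            let folds = if i == last { last } else { i + 1 };
            return Some(h + folds as u32);
        }
        start += size;
    }
    None
}

fn main() {
    for offset in [0u64, 10, 25, 37, 49] {
        println!("Offset {}: depth={:?}", offset, proof_depth(50, offset));
    }
}
```

Offsets 0 through 47 sit in the 32-leaf or 16-leaf peak and need 6 siblings; offsets 48 and 49 sit in the 2-leaf peak and need only 3.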
Capture the merkle root, append 10 more events, and capture it again. The root changes because each append creates new branch nodes that propagate up to a new root.
```rust
let root_before = partition.merkle_root().unwrap().unwrap();
// Produce 10 more events...
let root_after = partition.merkle_root().unwrap().unwrap();
assert_ne!(root_before, root_after);
```
In practice, you would publish or escrow the root hash at regular intervals. Any future claim that “the log contained X at time T” can be verified against the escrowed root.
```text
Root before: a67c8df3b074b586...
Root after:  398fdbbf5afde7fe...
Root changed as expected.
```
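The escrow practice mentioned above can be pictured as a small list of checkpoints that an auditor consults. The sketch below is illustrative only: `Checkpoint`, `matching_checkpoint`, and the placeholder root strings are hypothetical and not part of merkql.

```rust
/// Hypothetical escrow entry: the root published when the log held
/// `record_count` records.
#[derive(Debug, Clone, PartialEq)]
struct Checkpoint {
    record_count: u64,
    root_hex: String,
}

/// Find the escrowed checkpoint whose root equals the root carried by a proof.
/// A proof whose root matches no escrowed checkpoint cannot be tied to a
/// published commitment.
fn matching_checkpoint<'a>(
    escrow: &'a [Checkpoint],
    proof_root_hex: &str,
) -> Option<&'a Checkpoint> {
    escrow.iter().find(|c| c.root_hex == proof_root_hex)
}

/// Placeholder data; real entries would hold full hex-encoded root hashes.
fn sample_escrow() -> Vec<Checkpoint> {
    vec![
        Checkpoint { record_count: 50, root_hex: "root-after-50-records".to_string() },
        Checkpoint { record_count: 60, root_hex: "root-after-60-records".to_string() },
    ]
}

fn main() {
    let escrow = sample_escrow();
    match matching_checkpoint(&escrow, "root-after-50-records") {
        Some(c) => println!("proof is anchored at {} records", c.record_count),
        None => println!("proof root was never escrowed"),
    }
}
```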
The 5 proofs generated before appending are re-verified after the 10 new records are added. All still pass. This is a key property of merkql's merkle tree: new leaves are added to the right, and the existing tree nodes are immutable in the content-addressed object store.
```rust
for proof in &earlier_proofs {
    let valid = MerkleTree::verify_proof(
        proof, partition.store()
    ).unwrap();
    assert!(valid);
}
```
Note: each proof was captured with the root hash at the time of generation. The proof verifies against that root, not the current one. Since the underlying tree nodes haven't changed, the proof remains valid even though the tree has grown.
```text
Proof for offset 0:  still valid = true
Proof for offset 10: still valid = true
Proof for offset 25: still valid = true
Proof for offset 37: still valid = true
Proof for offset 49: still valid = true
```
This step demonstrates what happens when data is modified on disk. The example reads a record at offset 10, computes its SHA-256 hash, then directly corrupts the pack file by flipping bytes at the entry's storage location. When the data is re-read through the object store and re-hashed, the hash no longer matches.
```rust
// Read the record and compute its expected hash
let record = partition.read(tamper_offset).unwrap().unwrap();
let serialized = record.serialize();
let expected_hash = Hash::digest(&serialized);
// Corrupt the pack file on disk (flip bytes at the entry)...
tamper_pack_file(&pack_path, &expected_hash);
// Re-read through the store: the entry is still found under its
// original hash, but the bytes it returns are now different
let corrupted_data = partition.store().get(&expected_hash).unwrap();
let corrupted_hash = Hash::digest(&corrupted_data);
assert_ne!(expected_hash, corrupted_hash);
```
The tamper function walks the pack file's entry format ([4B length][32B hash][data]) to find and corrupt the target entry. In production, you wouldn't need this—the hash comparison alone detects any modification.
```text
Record at offset 10: user=alice, action=CREATE
Expected hash: 36f0a3094a59beb0...
Tampered with pack file data for offset 10.
Re-read hash: c24733d31c33d848...
Integrity check: hashes match = false (tamper DETECTED)
```
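The entry walk that the tamper function performs can be sketched over an in-memory buffer. This is not the example's `tamper_pack_file`; `walk_pack`, `push_entry`, and `tamper` are hypothetical names, and the sketch assumes the 4-byte length is a little-endian `u32` that counts only the data bytes, which the `[4B length][32B hash][data]` description does not specify.

```rust
/// One entry found in a pack buffer: its 32-byte key and where its data lives.
struct Entry {
    hash: [u8; 32],
    data: std::ops::Range<usize>,
}

/// Walk `[4B length][32B hash][data]` entries.
/// ASSUMPTION: length is a little-endian u32 covering the data bytes only.
fn walk_pack(buf: &[u8]) -> Vec<Entry> {
    let mut entries = Vec::new();
    let mut pos = 0;
    while pos + 36 <= buf.len() {
        let len = u32::from_le_bytes([buf[pos], buf[pos + 1], buf[pos + 2], buf[pos + 3]]) as usize;
        let mut hash = [0u8; 32];
        hash.copy_from_slice(&buf[pos + 4..pos + 36]);
        let start = pos + 36;
        let end = start + len;
        if end > buf.len() {
            break; // truncated trailing entry
        }
        entries.push(Entry { hash, data: start..end });
        pos = end;
    }
    entries
}

/// Append one entry in the assumed layout (used to build test buffers).
fn push_entry(buf: &mut Vec<u8>, hash: [u8; 32], data: &[u8]) {
    buf.extend_from_slice(&(data.len() as u32).to_le_bytes());
    buf.extend_from_slice(&hash);
    buf.extend_from_slice(data);
}

/// Flip the first data byte of the entry keyed by `target`.
/// Returns false if no such entry exists or its data is empty.
fn tamper(buf: &mut [u8], target: &[u8; 32]) -> bool {
    let found = walk_pack(buf).into_iter().find(|e| &e.hash == target);
    match found {
        Some(e) if !e.data.is_empty() => {
            buf[e.data.start] ^= 0xFF;
            true
        }
        _ => false,
    }
}

fn main() {
    let mut buf = Vec::new();
    push_entry(&mut buf, [1u8; 32], b"alpha");
    push_entry(&mut buf, [2u8; 32], b"beta");
    println!("entries: {}", walk_pack(&buf).len());
    println!("tampered: {}", tamper(&mut buf, &[2u8; 32]));
}
```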
Concepts
merkql uses an incremental binary carry chain (similar to binary addition) for tree construction. When two entries at the same height exist, they merge into a branch one level up. This makes appends O(log n) and avoids rebuilding the tree from scratch.
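The carry behaviour can be illustrated with a small in-memory model. This is a sketch, not merkql's implementation: `CarryChain` is a hypothetical type, std's `DefaultHasher` stands in for SHA-256 so the code runs without dependencies (it is not cryptographically secure), and folding the pending entries from the right to form the root is an assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

type H = u64; // stand-in digest; the real tree uses SHA-256

fn leaf_hash(data: &[u8]) -> H {
    let mut h = DefaultHasher::new();
    h.write(&[0]); // domain tag for leaves
    h.write(data);
    h.finish()
}

fn branch_hash(left: H, right: H) -> H {
    let mut h = DefaultHasher::new();
    h.write(&[1]); // domain tag for branches
    h.write(&left.to_be_bytes());
    h.write(&right.to_be_bytes());
    h.finish()
}

struct CarryChain {
    pending: Vec<(u32, H)>, // (height, hash); heights strictly decrease
    count: u64,
}

impl CarryChain {
    fn new() -> Self {
        CarryChain { pending: Vec::new(), count: 0 }
    }

    /// Append a leaf, merging equal-height entries like carries in binary addition.
    fn append(&mut self, data: &[u8]) {
        let mut node = (0u32, leaf_hash(data));
        while let Some(&(height, left)) = self.pending.last() {
            if height != node.0 {
                break;
            }
            self.pending.pop();
            node = (height + 1, branch_hash(left, node.1));
        }
        self.pending.push(node);
        self.count += 1;
    }

    /// ASSUMPTION: the root folds the pending entries together from the right.
    fn root(&self) -> Option<H> {
        let mut iter = self.pending.iter().rev();
        let mut acc = iter.next()?.1;
        for &(_, left) in iter {
            acc = branch_hash(left, acc);
        }
        Some(acc)
    }
}

fn main() {
    let mut chain = CarryChain::new();
    for i in 0..50 {
        chain.append(format!("event-{}", i).as_bytes());
    }
    let heights: Vec<u32> = chain.pending.iter().map(|p| p.0).collect();
    println!("count={} pending heights={:?} root={:?}", chain.count, heights, chain.root());
}
```

After 50 appends the pending entries have heights 5, 4, and 1, matching the binary form of 50 (32 + 16 + 2). Each append performs at most one merge per set low bit of the count, which is where the logarithmic bound comes from.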
Every object—records, leaf nodes, branch nodes—is stored in a pack file keyed by its SHA-256 hash. This means identical content is never stored twice, and any modification to stored bytes is detectable by re-hashing.
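Both properties (deduplication and detection by re-hashing) fall out of keying objects by their digest. The following is a minimal in-memory sketch under stated assumptions: `Store`, `StoreError`, and `get_verified` are hypothetical names, and `DefaultHasher` is a non-cryptographic stand-in for SHA-256 used only to keep the code dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;

/// Stand-in digest; a real store would use SHA-256.
fn digest(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(data);
    h.finish()
}

#[derive(Debug, PartialEq)]
enum StoreError {
    Missing,
    Corrupt,
}

#[derive(Default)]
struct Store {
    objects: HashMap<u64, Vec<u8>>,
}

impl Store {
    /// Store bytes under their digest; identical content maps to the same key.
    fn put(&mut self, data: &[u8]) -> u64 {
        let key = digest(data);
        self.objects.entry(key).or_insert_with(|| data.to_vec());
        key
    }

    /// Fetch bytes and re-hash them; a mismatch means the stored bytes changed.
    fn get_verified(&self, key: u64) -> Result<&[u8], StoreError> {
        let bytes = self.objects.get(&key).ok_or(StoreError::Missing)?;
        if digest(bytes) != key {
            return Err(StoreError::Corrupt);
        }
        Ok(bytes.as_slice())
    }
}

fn main() {
    let mut store = Store::default();
    let a = store.put(b"audit event");
    let b = store.put(b"audit event");
    println!("same key: {}, objects stored: {}", a == b, store.objects.len());
    // Simulate on-disk tampering by flipping one stored byte.
    store.objects.get_mut(&a).unwrap()[0] ^= 0x01;
    println!("after tamper: {:?}", store.get_verified(a));
}
```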
A Proof contains the leaf hash (the record's hash), a list of sibling hashes with their side (Left or Right), and the root hash. Verification walks from leaf to root, hashing pairs at each level.
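The leaf-to-root walk can be sketched as follows. This mirrors the shape described above but is not merkql's `Proof` type or `MerkleTree::verify_proof` (which takes the object store as an argument); the struct layout, the `verify` function, and the `DefaultHasher` stand-in for SHA-256 are all assumptions made to keep the example self-contained.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

type H = u64; // stand-in digest; the real proof carries SHA-256 hashes

fn leaf_hash(data: &[u8]) -> H {
    let mut h = DefaultHasher::new();
    h.write(&[0]);
    h.write(data);
    h.finish()
}

fn branch_hash(left: H, right: H) -> H {
    let mut h = DefaultHasher::new();
    h.write(&[1]);
    h.write(&left.to_be_bytes());
    h.write(&right.to_be_bytes());
    h.finish()
}

/// Which side of the running hash the sibling sits on.
enum Side {
    Left,
    Right,
}

struct Proof {
    leaf: H,
    siblings: Vec<(Side, H)>, // ordered from the leaf upward
    root: H,
}

/// Recompute the root from the leaf and compare it with the committed root.
fn verify(proof: &Proof) -> bool {
    let mut acc = proof.leaf;
    for (side, sibling) in &proof.siblings {
        acc = match side {
            Side::Left => branch_hash(*sibling, acc),
            Side::Right => branch_hash(acc, *sibling),
        };
    }
    acc == proof.root
}

/// Build a 4-leaf tree by hand and return the proof for leaf index 2.
fn demo_proof() -> Proof {
    let l: Vec<H> = [&b"a"[..], &b"b"[..], &b"c"[..], &b"d"[..]]
        .iter()
        .map(|d| leaf_hash(d))
        .collect();
    let n01 = branch_hash(l[0], l[1]);
    let n23 = branch_hash(l[2], l[3]);
    Proof {
        leaf: l[2],
        siblings: vec![(Side::Right, l[3]), (Side::Left, n01)],
        root: branch_hash(n01, n23),
    }
}

fn main() {
    let mut proof = demo_proof();
    println!("valid = {}", verify(&proof));
    proof.leaf = leaf_hash(b"tampered");
    println!("valid after changing the leaf = {}", verify(&proof));
}
```

Recording the side with each sibling matters because branch hashing is order-sensitive: hashing (sibling, node) and (node, sibling) give different results.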
The tree's state (pending entries and count) is persisted to tree.snapshot after every write using atomic temp+fsync+rename. On reopen, the tree resumes without replaying the log.
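The temp+fsync+rename pattern can be written with the standard library alone. This is a generic sketch of the technique, not merkql's snapshot code; `write_atomic`, the `.tmp` suffix, and the snapshot contents are hypothetical.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Replace `path` with `bytes` so that readers see either the old file or the
/// complete new one, never a partial write.
fn write_atomic(path: &Path, bytes: &[u8]) -> io::Result<()> {
    // Write to a sibling temp file so the rename stays on one filesystem.
    let tmp = path.with_extension("tmp");
    {
        let mut file = File::create(&tmp)?;
        file.write_all(bytes)?;
        // Flush file contents and metadata to disk before the rename.
        file.sync_all()?;
    }
    // Swap the new file into place, replacing any existing destination.
    fs::rename(&tmp, path)?;
    // Note: for full durability on Unix, the parent directory can also be
    // fsynced after the rename; that step is omitted here for portability.
    Ok(())
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("merkql_sketch_tree.snapshot");
    write_atomic(&path, b"count=50")?;
    println!("snapshot: {:?}", fs::read(&path)?);
    fs::remove_file(&path)
}
```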
For SOX, HIPAA, PCI-DSS, or GDPR audit requirements: periodically capture and escrow the merkle root, then hand auditors individual proofs. They can independently verify any record's inclusion without access to the full dataset.
Reference
| API | Purpose |
|---|---|
| broker.topic(name) | Get a topic by name |
| topic.partition(id) | Get a partition (returns Arc<RwLock>) |
| partition.proof(offset) | Generate a merkle inclusion proof |
| MerkleTree::verify_proof() | Verify a proof against the object store |
| partition.merkle_root() | Get the current root hash |
| partition.read(offset) | Read a single record by offset |
| partition.store() | Access the content-addressed pack file store |
| store.get(hash) | Retrieve object bytes by SHA-256 hash |
| Hash::digest(data) | Compute SHA-256 hash |
| hash.to_hex() | Convert hash to hex string for display |
| record.serialize() | Serialize record to bytes (for hashing) |

```shell
cargo run -p merkql-audit-trail
```