🛟 Consistency & Recovery
This page explains HestiaStore’s crash safety model and commit semantics. There is no WAL‑based crash recovery or multi‑operation transactions. Durability is driven by explicit flushes and by the fact that all data files are written via temporary files and atomically renamed on commit.
📜 Scope and Guarantees
- No automatic recovery: the system does not replay a WAL or roll back partial groups of operations after a crash.
- Durability boundary: calling
flush()or closing the index persists all writes that happened before the call. Writes that are only in memory (index‑level buffer) and not flushed are not durable. - Atomic file replacement: data files are written to
*.tmpand made visible viarenameonly after the writer is closed and the transaction is committed. A crash cannot produce partially written visible files.
đź’ľ Where Writes Become Durable
- Index‑level buffer → disk:
SegmentIndex.flush()drains the in‑memory unique buffer into segment delta cache files. On close, the index also flushes. - Segment merge/compaction: when a segment compacts, the new main SST, sparse index, and Bloom filter are built via transactional writers; on commit they atomically replace the old ones.
- Key→segment map (
index.map): persisted via a transactional sorted data writer during flush or when updated.
Relevant code: segmentindex/SegmentIndexImpl#flush(), segmentindex/CompactSupport, segmentindex/KeySegmentCache#optionalyFlush().
✍️ Transactional Write Primitives
All main data files follow the same pattern: write to a temporary file, then atomically rename on commit().
- Guarded transactions:
GuardedWriteTransactionrequires the resource to be closed beforecommit()and prevents double‑commit. - Single‑call helper:
WriteTransaction.execute(writer -> { … })does open → write → close → commit.
Key classes:
- unsorteddatafile/UnsortedDataFileWriterTx → rename(temp, final) on commit
- sorteddatafile/SortedDataFileWriterTx → rename(temp, final) on commit
- datablockfile/DataBlockWriterTx → used by chunk store writers
- chunkstore/ChunkStoreWriterTx and chunkentryfile/ChunkEntryFileWriterTx → layered over DataBlockWriterTx
- bloomfilter/BloomFilterWriterTx → writes new filter and swaps it in on commit
🗂️ File Types and Commit Paths
- Segment delta cache files
- Writer:
segment/SegmentDeltaCacheWriter - Mechanism:
SortedDataFileWriterTx.execute(…) -
Naming: property counter assigns
segmentId-delta-XXX.cachebefore write; if a crash happens before commit, the reader treats missing files as empty, so boot remains safe. -
Main SST (chunked) + sparse index ("scarce index")
- Writers:
segment/SegmentFullWriterTxandsegment/SegmentFullWriter - Internals:
ChunkEntryFileWriterTxfor SST,ScarceIndexWriterTxfor the sparse index -
Bloom filter:
BloomFilterWriterTxbuilds a new filter and commits (rename) before the SST and sparse index are committed. This ordering avoids false negatives on restart. -
Bloom filter
-
Writes to a temporary file via
BloomFilterWriterTx.open()and commits withrename; also updates the in‑memory hash snapshot on commit. -
Key→segment map (
index.map) - Writer:
SortedDataFileWriterTx.execute(…)insideKeySegmentCache.optionalyFlush() - Ensures the map is replaced atomically.
đźš« What Is Not Transactional
- Segment properties (counts and delta‑file numbering) are persisted via an overwrite (
Directory.Access.OVERWRITE). They are updated after data files are committed, and are not critical to data correctness. If a crash corrupts or desynchronizes this metadata, the reader logic remains safe (e.g., missing delta file names yield empty reads) and you can re‑establish consistency via the checker below.
Code: properties/PropertyStoreimpl and SegmentPropertiesManager.
đź’Ą Failure Model (Examples)
- Crash while writing a delta file before commit: only
*.tmpexists; it is ignored on boot; prior state remains valid. - Crash after committing a Bloom filter but before committing SST/sparse index: Bloom filter is ahead of data, which is safe (may increase positives but never produce false negatives).
- Crash after committing SST/sparse index but before properties update: data is fully committed; metadata may lag but does not affect correctness.
đź§° Consistency Check and Repair
- Run
SegmentIndex.checkAndRepairConsistency()after an unexpected shutdown to verify that segments are well‑formed and sorted and that the key→segment map is coherent. This walks all segments, checks ordering and basic invariants, and raises an error if it finds non‑recoverable issues.
Key classes: segmentindex/IndexConsistencyChecker, segment/SegmentConsistencyChecker.
👩‍💻 Developer Notes: open()/commit() and *.tmp
open()returns a writer bound to a temporary file (typically with a.tmpsuffix). You must close the writer before callingcommit().commit()performs an atomicrename(temp, final)so either the old file or the new file is visible on disk.- Prefer
execute(writer -> {…})to ensure the correct order: open → write → close → commit.
Examples in code:
- sorteddatafile/SortedDataFileWriterTx#open() → commit() renames temp to final
- unsorteddatafile/UnsortedDataFileWriterTx#open() → commit() renames temp to final
- datablockfile/DataBlockWriterTx#open() → commit() renames temp to final
- bloomfilter/BloomFilterWriterTx#open() → commit() renames temp to final and swaps hash
đź§ Practical Guidance
- Call
flush()on periodic boundaries and always before shutdown to persist in‑memory writes. - After a crash, reopen the index and run
checkAndRepairConsistency(); optionally trigger acompact()to collapse delta caches. - Remember there is no WAL: durability is guaranteed at the
flush()/close boundaries and via atomic file replacement for all data files.