📖 Read Path
This page explains how reads resolve values with low latency and predictable I/O. It walks through point lookups, range iteration, and the interplay of caches, the Bloom filter, and the sparse index, mapped to concrete classes in the codebase.
🧭 High‑Level Flow (Point Lookup)
- API call: `SegmentIndex.get(key)`
- Check the index‑level unique buffer (latest in‑process writes)
- Locate the target segment using the key→segment map
- Inside the segment: consult the delta cache → Bloom filter → sparse index → local scan
Lookups are read‑after‑write consistent thanks to the in‑memory buffers.
🚪 Entry Point and First‑Level Cache
`segmentindex/SegmentIndexImpl#get(K)` does:
- Check the index‑level `UniqueCache` (holds latest writes prior to flush)
- On a miss, find the `SegmentId` via `KeySegmentCache.findSegmentId(key)`
- Delegate to `Segment.get(key)`
Key classes: `segmentindex/SegmentIndexImpl.java`, `segmentindex/KeySegmentCache.java`, `cache/UniqueCache.java`.
Behavior
- Cache hit and non‑tombstone → return value
- Cache hit and tombstone → treat as absent
- Otherwise fall back to the segment path below
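The lookup above can be condensed into a short sketch. This is illustrative code only: `UniqueCache`, `KeySegmentCache`, and the per‑segment path are modeled with plain maps so the order of checks is visible; it is not the real `SegmentIndexImpl` implementation.

```java
import java.util.Map;
import java.util.NavigableMap;

// Illustrative sketch of the index-level lookup. The real code delegates to
// UniqueCache, KeySegmentCache and Segment; here they are modeled with plain
// maps purely to show the order of checks described on this page.
public final class PointLookupSketch<K, V> {

    private final V tombstone;                                  // sentinel marking a delete
    private final Map<K, V> indexLevelCache;                    // stands in for UniqueCache
    private final NavigableMap<K, Map<K, V>> segmentsByMaxKey;  // key routing + segment data

    public PointLookupSketch(V tombstone, Map<K, V> indexLevelCache,
            NavigableMap<K, Map<K, V>> segmentsByMaxKey) {
        this.tombstone = tombstone;
        this.indexLevelCache = indexLevelCache;
        this.segmentsByMaxKey = segmentsByMaxKey;
    }

    public V get(K key) {
        // 1. Index-level cache: latest writes that have not been flushed yet.
        V cached = indexLevelCache.get(key);
        if (cached != null) {
            return tombstone.equals(cached) ? null : cached;
        }
        // 2. Route the key to the segment that owns its key range
        //    (segments are keyed here by their maximum key).
        Map.Entry<K, Map<K, V>> segment = segmentsByMaxKey.ceilingEntry(key);
        if (segment == null) {
            return null; // no segment can contain this key
        }
        // 3. The per-segment path (delta cache, Bloom filter, sparse index,
        //    local scan) is collapsed to a plain map lookup in this sketch.
        V stored = segment.getValue().get(key);
        return (stored == null || tombstone.equals(stored)) ? null : stored;
    }
}
```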
🧩 Per‑Segment Read Path
`segment/SegmentImpl#get(key)` uses `SegmentSearcher` with lazily loaded segment data:
- Delta cache probe: in‑memory map of the segment’s pending updates (merged from delta files). If hit and value not a tombstone → return; tombstone → absent.
- Bloom filter: `bloomFilter.isNotStored(key)` guards the on‑disk path. If "not stored" → absent.
- Sparse index ("scarce index"): returns a chunk start position for keys ≥ the query.
- Local scan: within at most N keys (`maxNumberOfKeysInIndexPage`) starting at that chunk, compare keys in ascending order and stop as soon as the target is found or passed.
- If the sparse index pointed us into the file but no exact key was found, mark a false positive on the Bloom filter for metrics and return absent.
Key classes: `segment/SegmentSearcher.java`, `segment/SegmentIndexSearcher.java`, `scarceindex/ScarceSegmentIndex.java`, `bloomfilter/BloomFilter.java`.
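The per‑segment chain can be sketched as a single method. The stand‑ins below (the delta cache as a map, the Bloom filter as a predicate, the sparse index as a position function, and a `ChunkReader` interface) are assumptions made for illustration; the real classes are the ones listed above.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.ToLongFunction;

// Illustrative sketch of the per-segment decision chain: delta cache, then
// Bloom filter, then sparse index, then a bounded local scan. The collaborators
// are simplified stand-ins, not the library's actual API.
public final class SegmentSearchSketch<K, V> {

    public interface Entry<K, V> {
        K key();
        V value();
        boolean isTombstone();
    }

    public interface ChunkReader<K, V> {
        /** Reads at most maxKeys entries starting at the given chunk position. */
        List<Entry<K, V>> read(long chunkPosition, int maxKeys);
    }

    public V search(K key,
            Map<K, Entry<K, V>> deltaCache,          // pending per-segment updates
            Predicate<K> isNotStoredInBloomFilter,   // stands in for bloomFilter.isNotStored
            ToLongFunction<K> sparseIndexPosition,   // negative value means "no chunk"
            ChunkReader<K, V> chunkReader,
            Comparator<K> comparator,
            int maxNumberOfKeysInIndexPage) {

        // 1. Delta cache: in-memory updates merged from delta files.
        Entry<K, V> pending = deltaCache.get(key);
        if (pending != null) {
            return pending.isTombstone() ? null : pending.value();
        }
        // 2. Bloom filter: a definite "not stored" avoids any disk access.
        if (isNotStoredInBloomFilter.test(key)) {
            return null;
        }
        // 3. Sparse index: candidate chunk start position for the key.
        long chunkPosition = sparseIndexPosition.applyAsLong(key);
        if (chunkPosition < 0) {
            return null;
        }
        // 4. Local scan: walk at most one chunk window of ascending keys.
        for (Entry<K, V> entry : chunkReader.read(chunkPosition, maxNumberOfKeysInIndexPage)) {
            int cmp = comparator.compare(entry.key(), key);
            if (cmp == 0) {
                return entry.isTombstone() ? null : entry.value();
            }
            if (cmp > 0) {
                break; // keys are sorted ascending; the target cannot appear later
            }
        }
        // The Bloom filter answered "maybe stored" but the key is absent; the
        // real code records this as a false positive for metrics.
        return null;
    }
}
```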
🔁 Range Scans and Full Iteration
`SegmentIndex.getStream()` and `SegmentIndex.openSegmentIterator(...)` produce iterators over all data:
- `segmentindex/SegmentsIterator` chains `Segment.openIterator()` across all segments in order.
- `segment/SegmentImpl.openIterator()` merges the on‑disk main SST with the segment's delta cache via `MergeDeltaCacheWithIndexIterator`, skipping tombstones.
- The per‑segment iterator is wrapped with `EntryIteratorWithLock` using an `OptimisticLock`. If a write changes the segment version mid‑scan, the iterator stops gracefully (no partial records).
- At the top level, `EntryIteratorRefreshedFromCache` overlays the index‑level unique buffer so that the iterator sees the latest writes even before they are flushed to disk.
Key classes: `segmentindex/SegmentsIterator.java`, `segment/MergeDeltaCacheWithIndexIterator.java`, `segmentindex/EntryIteratorRefreshedFromCache.java`, `EntryIteratorWithLock.java`, `OptimisticLock.java`.
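To illustrate the optimistic locking idea, here is a minimal sketch that assumes the segment version is a counter bumped by writers; the actual `EntryIteratorWithLock` and `OptimisticLock` classes may track versions differently.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicLong;

// Conceptual sketch of an optimistically locked iterator: capture the segment
// version when the scan opens and stop cleanly if a writer changes it mid-scan.
// This is not the real EntryIteratorWithLock implementation.
public final class OptimisticIteratorSketch<E> implements Iterator<E> {

    private final Iterator<E> delegate;
    private final AtomicLong segmentVersion; // assumed to be bumped by writers
    private final long versionAtOpen;

    public OptimisticIteratorSketch(Iterator<E> delegate, AtomicLong segmentVersion) {
        this.delegate = delegate;
        this.segmentVersion = segmentVersion;
        this.versionAtOpen = segmentVersion.get();
    }

    @Override
    public boolean hasNext() {
        // A concurrent write invalidates the snapshot: end the iteration
        // gracefully instead of serving entries from a changed segment.
        if (segmentVersion.get() != versionAtOpen) {
            return false;
        }
        return delegate.hasNext();
    }

    @Override
    public E next() {
        if (!hasNext()) {
            throw new NoSuchElementException("segment changed during iteration");
        }
        return delegate.next();
    }
}
```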
🔄 Read‑After‑Write Semantics
Two layers provide immediate visibility of recent writes:
- Index‑level `UniqueCache` (pre‑flush) is checked first by `SegmentIndex.get` and overlaid on iterators.
- Segment delta cache (post‑flush) is kept in memory when loaded; writes to a new delta file also update the in‑memory delta cache when present.
Deletes are represented as tombstones by the value type descriptor. The read path treats a tombstone as “not found”.
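As a small illustration of the tombstone rule, the hypothetical descriptor below exposes a sentinel delete marker; the project's actual value type descriptor may be shaped differently.

```java
// Hypothetical sketch of tombstone semantics on the read path; the real value
// type descriptor in the codebase may model deletes differently.
public final class TombstoneSketch {

    /** Minimal stand-in for a value type descriptor that knows its tombstone. */
    public interface ValueTypeDescriptor<V> {
        V getTombstone();
    }

    /** Maps a stored value to the user-visible result: a tombstone is "not found". */
    public static <V> V resolve(V storedValue, ValueTypeDescriptor<V> descriptor) {
        if (storedValue == null || descriptor.getTombstone().equals(storedValue)) {
            return null;
        }
        return storedValue;
    }
}
```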
🧮 Complexity and I/O Characteristics
- Index‑level cache probe: O(1) hash map
- Segment delta cache probe: O(1) hash map
- Bloom filter probe: O(k) where k is number of hash functions; no I/O
- Sparse index probe: in‑memory list search over a small sample set (fast, cache‑friendly)
- Local scan: sequential read within one chunk window of up to `maxNumberOfKeysInIndexPage` entries
- Iterators: sequential over chunks; minimal seeks due to chunked layout
These choices keep random access bounded and predictable, with sequential I/O for scans.
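For intuition about the sparse index probe, the sketch below binary-searches a small in-memory sample of keys, assuming each sampled key is the last key of its chunk; the real `ScarceSegmentIndex` may organize its sample differently.

```java
import java.util.Comparator;
import java.util.List;

// Minimal sketch of a sparse-index probe: binary-search a small, sorted sample
// of keys and return the start position of the only chunk that can contain the
// query key. Assumes each sampled key is the last key of its chunk; this is an
// illustration, not the ScarceSegmentIndex implementation.
public final class SparseIndexProbeSketch {

    public static <K> long findChunkPosition(List<K> sampledKeys,
            List<Long> chunkStartPositions, Comparator<K> comparator, K key) {
        int lo = 0;
        int hi = sampledKeys.size() - 1;
        long candidate = -1L; // -1 signals "no chunk can contain the key"
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (comparator.compare(sampledKeys.get(mid), key) >= 0) {
                candidate = chunkStartPositions.get(mid); // first sampled key >= query
                hi = mid - 1;
            } else {
                lo = mid + 1;
            }
        }
        return candidate;
    }
}
```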
⚙️ Configuration Knobs Affecting Reads
- `maxNumberOfKeysInSegmentChunk`: upper bound of keys per chunk; also the window size for a local scan from the sparse index pointer
- Bloom filter parameters: `numberOfHashFunctions`, `indexSizeInBytes`, `falsePositiveProbability`
- `diskIoBufferSize`: affects chunk and data block I/O buffering
- Encoding/decoding filters: enable CRC32, magic number, and optional Snappy compression on the read/write paths
See: `segmentindex/IndexConfiguration` and `segment/SegmentConf`.
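Purely for orientation, the record below restates these knobs as one value object; it is not the real `IndexConfiguration` or `SegmentConf` API, and the field names simply mirror the parameters listed above.

```java
// Plain holder that restates the read-related knobs in one place. This is NOT
// the real IndexConfiguration/SegmentConf API; field names mirror the
// parameters documented above.
public record ReadTuningSketch(
        int maxNumberOfKeysInSegmentChunk, // keys per chunk; local-scan window size
        int numberOfHashFunctions,         // Bloom filter hash count
        long indexSizeInBytes,             // Bloom filter bitmap size
        double falsePositiveProbability,   // target Bloom filter false-positive rate
        int diskIoBufferSize) {            // chunk and data block I/O buffer size
}
```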
🛡️ Integrity on the Read Path
Decoding applies the inverse of the write pipeline when reading chunks:
- Validate magic number
- Verify CRC32
- Decompress (if Snappy was enabled)
Errors surface as exceptions; partial reads do not corrupt state.
Key classes: `chunkstore/ChunkStoreReaderImpl`, `chunkstore/ChunkFilterMagicNumberValidation`, `chunkstore/ChunkFilterCrc32Validation`, `chunkstore/ChunkFilterSnappyDecompress`.
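A compact sketch of the decode order follows. The on‑disk layout assumed here (4‑byte magic number, 8‑byte CRC32, then the payload), the magic value, and the use of the `org.xerial.snappy` binding are all assumptions for illustration; the real filters are the chunkstore classes listed above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

import org.xerial.snappy.Snappy;

// Illustrative decode pipeline: the inverse of the write-side filters.
// The layout (4-byte magic number, 8-byte CRC32, then payload) and the magic
// value are assumptions for this sketch, not the project's actual format.
public final class ChunkDecodeSketch {

    private static final int MAGIC_NUMBER = 0xCAFEBABE; // placeholder value

    public static byte[] decode(byte[] raw, boolean snappyEnabled) throws IOException {
        ByteBuffer buffer = ByteBuffer.wrap(raw);

        // 1. Validate the magic number: rejects chunks of the wrong type.
        if (buffer.getInt() != MAGIC_NUMBER) {
            throw new IOException("Invalid magic number");
        }

        // 2. Verify CRC32: detects corrupted or truncated chunks.
        long expectedCrc = buffer.getLong();
        byte[] payload = new byte[buffer.remaining()];
        buffer.get(payload);
        CRC32 crc = new CRC32();
        crc.update(payload);
        if (crc.getValue() != expectedCrc) {
            throw new IOException("CRC32 mismatch");
        }

        // 3. Decompress, if Snappy compression was enabled on the write path.
        return snappyEnabled ? Snappy.uncompress(payload) : payload;
    }
}
```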
🧩 Where to Look in the Code
- Point lookup orchestration: `src/main/java/org/hestiastore/index/segmentindex/SegmentIndexImpl.java`
- Segment search path: `src/main/java/org/hestiastore/index/segment/SegmentSearcher.java`
- Sparse index: `src/main/java/org/hestiastore/index/scarceindex/*`
- Iteration and merging: `src/main/java/org/hestiastore/index/segment/MergeDeltaCacheWithIndexIterator.java`
- Iterator safety: `src/main/java/org/hestiastore/index/EntryIteratorWithLock.java`