Caching Strategy
HestiaStore uses a few focused caches to deliver read-after-write visibility and predictable read latency while keeping memory bounded. This page outlines each layer, how it is populated or evicted, and which configuration knobs control sizing.
For segment internals (delta, Bloom, sparse/scarce structures), see Segment Architecture. This page describes cache roles at the SegmentIndex integration level.
Goals
- Read-after-write consistency without synchronous disk I/O
- Bound the working set in memory via LRU at the segment layer
- Keep read I/O predictable: avoid random seeks with Bloom filter + sparse index
- Make flush and compaction operations deterministic and safe
Layers Overview
- Segment write cache: per-segment in-memory mutable layer
  - Classes: segment/SegmentCache, segment/SegmentWritePath
  - Owner: segment/SegmentImpl
  - Purpose: absorb writes and provide immediate visibility before flush
- Segment delta cache: per-segment overlay of recent writes
  - Classes: segment/SegmentDeltaCache, segment/SegmentDeltaCacheWriter, segment/SegmentDeltaCacheController
  - Purpose: hold sorted updates for a segment between compactions; also backs reads
- Segment data LRU: cache of heavyweight per-segment objects
  - Classes: segmentindex/SegmentDataCache (LRU), values are segment/SegmentData (lazy container)
  - Contents: delta cache, Bloom filter, sparse index (scarce index)
- Bloom filter: per-segment probabilistic set for negative checks
  - Classes: bloomfilter/*; created by segment/SegmentDataSupplier
- Sparse index ("scarce index"): per-segment in-memory snapshot of pointers
  - Classes: scarceindex/ScarceIndex, ScarceIndexSnapshot
- Key→segment map: max-key to SegmentId mapping
  - Class: segmentindex/routemap/SegmentRouteMap (TreeMap, persisted to index.map)
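The max-key to SegmentId mapping can be illustrated with a plain TreeMap. The following is a minimal sketch, not the SegmentRouteMap implementation: the class name RouteMapSketch, the String keys, the integer segment ids, and the rule that keys above every max-key fall into the last segment are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a max-key to segment-id route map. */
final class RouteMapSketch {
    // Each entry maps the largest key stored in a segment to that segment's id.
    private final TreeMap<String, Integer> maxKeyToSegment = new TreeMap<>();

    void register(String maxKey, int segmentId) {
        maxKeyToSegment.put(maxKey, segmentId);
    }

    /** Returns the id of the segment whose key range covers the given key. */
    int route(String key) {
        if (maxKeyToSegment.isEmpty()) {
            throw new IllegalStateException("no segments registered");
        }
        // The smallest max-key that is >= key identifies the covering segment.
        Map.Entry<String, Integer> hit = maxKeyToSegment.ceilingEntry(key);
        if (hit != null) {
            return hit.getValue();
        }
        // Assumption: keys beyond every max-key belong to the last segment.
        return maxKeyToSegment.lastEntry().getValue();
    }

    public static void main(String[] args) {
        RouteMapSketch map = new RouteMapSketch();
        map.register("g", 0);
        map.register("p", 1);
        map.register("z", 2);
        for (String key : new String[] {"apple", "h", "zzz"}) {
            System.out.println(key + " -> segment " + map.route(key));
        }
    }
}
```

A sorted map keeps routing at O(log n) in the number of segments, which is why a single ceiling lookup is enough to pick the one segment a point operation touches.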
Write‑Time Caches
Segment write cache
- On SegmentIndex.put/delete, the write is routed to one segment.
- Segment.put(...) stores the latest value in the segment write cache.
- Deletes are represented as tombstones.
- Flush persists frozen write-cache snapshots to delta cache files; compaction later merges them into the main SST.
Code: segmentindex/core/execution/PointOperationCoordinator, segmentindex/core/routing/MappedSegmentLeaseService, segment/SegmentWritePath, segment/SegmentMaintenanceService.
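To make the tombstone and frozen-snapshot behavior concrete, here is a small sketch of a write cache with one active map and one frozen map. It is an illustration under stated assumptions, not the SegmentCache code: the class WriteCacheSketch, its State enum, the use of Optional.empty() as the tombstone marker, and the single-frozen-snapshot restriction are all hypothetical. It is also not thread-safe.

```java
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical write cache: an active map plus at most one frozen snapshot. */
final class WriteCacheSketch {
    enum State { MISS, DELETED, FOUND }

    // Optional.empty() marks a tombstone; a missing entry means "not cached here".
    private TreeMap<String, Optional<String>> active = new TreeMap<>();
    private TreeMap<String, Optional<String>> frozen = new TreeMap<>();

    void put(String key, String value) { active.put(key, Optional.of(value)); }

    void delete(String key) { active.put(key, Optional.empty()); }

    /** Freezes the active map for flushing and starts a fresh active map. */
    TreeMap<String, Optional<String>> freeze() {
        if (!frozen.isEmpty()) {
            throw new IllegalStateException("previous snapshot not retired yet");
        }
        frozen = active;
        active = new TreeMap<>();
        return frozen;
    }

    /** Drops the frozen snapshot once it has been persisted elsewhere. */
    void retireFrozen() { frozen = new TreeMap<>(); }

    State stateOf(String key) {
        Optional<String> entry = find(key);
        if (entry == null) return State.MISS;
        return entry.isPresent() ? State.FOUND : State.DELETED;
    }

    /** Returns the cached value, or null on a miss or a tombstone. */
    String valueOf(String key) {
        Optional<String> entry = find(key);
        return entry == null ? null : entry.orElse(null);
    }

    // Newer (active) entries shadow the frozen snapshot.
    private Optional<String> find(String key) {
        Optional<String> entry = active.get(key);
        return entry != null ? entry : frozen.get(key);
    }
}
```

Distinguishing MISS from DELETED matters: a miss lets the read fall through to older layers, while a tombstone must stop the lookup so a stale on-disk value is not returned.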
Segment delta cache
- Flush writes become per-segment delta files via SegmentDeltaCacheWriter (transactional temp file + rename).
- If the segment's data is currently loaded in memory, the in-memory delta cache is updated immediately to keep reads fresh.
- Compaction (SegmentCompacter) rewrites the segment, then SegmentDeltaCacheController.clear() evicts the in-memory delta cache and deletes the delta files.
Code: segment/SegmentDeltaCacheWriter, segment/SegmentDeltaCacheController, segment/SegmentCompacter, segment/SegmentFullWriterTx#doCommit.
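The "temp file + rename" pattern mentioned above can be sketched with java.nio.file. This is a generic illustration, not SegmentDeltaCacheWriter itself: the class name, the ".tmp" suffix, the line-oriented payload, and the file name used in roundTrip are assumptions, and the sketch omits the fsync calls a durable writer would likely add.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

/** Hypothetical writer that publishes a file only after it is fully written. */
final class AtomicDeltaWriteSketch {

    /** Writes the lines to a sibling temp file, then renames it to the target. */
    static void writeAtomically(Path target, List<String> lines) {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try {
            Files.write(tmp, lines, StandardCharsets.UTF_8);
            // With an atomic rename, readers should see either no target file
            // or the complete one. Whether an existing target is replaced is
            // implementation specific for ATOMIC_MOVE.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Round-trips a tiny payload through a fresh temporary directory. */
    static List<String> roundTrip(List<String> lines) {
        try {
            Path dir = Files.createTempDirectory("delta-sketch");
            Path target = dir.resolve("segment-00001.delta"); // hypothetical name
            writeAtomically(target, lines);
            return Files.readAllLines(target, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(List.of("k1=v1", "k2=<tombstone>")));
    }
}
```

If the process dies before the rename, only the temp file is left behind, so a restart never observes a half-written delta file under its final name.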
Read‑Time Caches
- Top-level routing: SegmentIndex.get(k) resolves the route and reads one segment directly.
- Segment write cache: Segment.get(k) checks the active or frozen write cache before the delta cache and on-disk structures.
- Segment delta cache is consulted before the Bloom filter + sparse index path. If it returns a tombstone, the key is absent.
- Heavy objects (Bloom filter, scarce index, delta cache) are obtained via a provider backed by LRU:
  - segmentindex/SegmentDataCache holds segment/SegmentData instances with an LRU limit; eviction calls close() on the container
  - providers: segment/SegmentDataProvider implementations
Code: segmentindex/core/execution/PointOperationCoordinator, segmentindex/core/routing/MappedSegmentLeaseService, segment/SegmentImpl#get, segment/SegmentSearcher, segment/SegmentCache, segmentindex/SegmentDataCache.
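The lookup order described in this section (write cache, then delta cache, then Bloom filter, then the on-disk path) can be summarized in a few lines. The sketch below is a simplification under assumptions: ReadPathSketch and its fields are hypothetical, an exact Set stands in for the Bloom filter (so it never reports false positives), and a Map stands in for the sparse-index-guided disk read.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical layered point lookup for a single segment. */
final class ReadPathSketch {
    // Optional.empty() is a tombstone; a missing entry means the layer has no opinion.
    final Map<String, Optional<String>> writeCache = new HashMap<>();
    final Map<String, Optional<String>> deltaCache = new HashMap<>();
    private final Set<String> bloomStandIn;     // exact set used in place of a Bloom filter
    private final Map<String, String> mainFile; // stand-in for sparse index + disk scan
    int diskReads = 0;

    ReadPathSketch(Map<String, String> mainFile) {
        this.mainFile = mainFile;
        this.bloomStandIn = mainFile.keySet();
    }

    /** Returns the visible value, or null when the key is absent or deleted. */
    String get(String key) {
        // 1. Newest data first: the write cache, then the delta cache.
        Optional<String> cached = writeCache.get(key);
        if (cached == null) {
            cached = deltaCache.get(key);
        }
        if (cached != null) {
            return cached.orElse(null); // a tombstone means the key is absent
        }
        // 2. The negative check skips the disk entirely for unknown keys.
        if (!bloomStandIn.contains(key)) {
            return null;
        }
        // 3. Only now pay for the on-disk lookup.
        diskReads++;
        return mainFile.get(key);
    }
}
```

The ordering means a recent write or delete is answered from memory, and a key that was never stored in the segment usually costs no disk read at all.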
Eviction and Lifecycle
- SegmentDataCache (LRU of SegmentData): evicts least-recently-used segment; eviction closes Bloom filter and clears delta cache via close() hook.
- SegmentDeltaCache: cleared and files removed after compaction via SegmentDeltaCacheController.clear(); rebuilt on demand from delta files.
- Segment write cache: frozen snapshots are flushed to delta cache files and then retired by segment maintenance.
- SegmentRouteMap: persisted via flushIfDirty() when updated; survives process restarts by reading index.map.
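An LRU that closes evicted values can be sketched on top of java.util.LinkedHashMap in access-order mode. This is only an illustration of the eviction hook idea; the project ships its own CacheLru implementation, and the names ClosingLruSketch and TrackedData below are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical value type that records whether it was closed. */
final class TrackedData implements AutoCloseable {
    boolean closed = false;

    @Override
    public void close() { closed = true; }
}

/** Hypothetical bounded LRU that closes the value it evicts. */
final class ClosingLruSketch<K, V extends AutoCloseable> {
    private final LinkedHashMap<K, V> map;

    ClosingLruSketch(int limit) {
        // accessOrder = true moves an entry to the tail on every get().
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() <= limit) {
                    return false;
                }
                closeQuietly(eldest.getValue()); // release the evicted value
                return true;
            }
        };
    }

    /** Returns the cached value, loading and caching it on a miss. */
    V get(K key, Supplier<V> loader) {
        V value = map.get(key); // refreshes recency on a hit
        if (value == null) {
            value = loader.get();
            map.put(key, value); // may evict and close the eldest entry
        }
        return value;
    }

    private static void closeQuietly(AutoCloseable resource) {
        try {
            resource.close();
        } catch (Exception e) {
            throw new IllegalStateException("close failed", e);
        }
    }
}
```

Running the close hook inside the eviction callback ties the lifetime of heavy per-segment objects to cache residency, so memory is released as soon as a segment leaves the working set.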
Configuration Knobs
Index‑level:
- IndexConfiguration.segment().cachedSegmentLimit(): LRU size for SegmentDataCache
Per‑segment (via SegmentConf, derived from index configuration):
- IndexConfiguration.segment().cacheKeyLimit(): target size for a single delta cache
- IndexConfiguration.writePath().segmentWriteCacheKeyLimit(): segment write-cache threshold
- IndexConfiguration.writePath().maintenanceWriteCacheKeyLimit(): per-segment maintenance/write ceiling
- IndexConfiguration.writePath().indexBufferedWriteKeyLimit(): index-wide budget exposed in runtime tuning and metrics
- IndexConfiguration.segment().chunkKeyLimit(): sparse index sampling cadence (affects read scan window)
Bloom filter sizing:
- bloomFilter().indexSizeBytes() and bloomFilter().hashFunctions()
- bloomFilter().falsePositiveProbability()
I/O buffering:
- io().diskBufferSizeBytes(): affects memory used by readers and writers across files
See: segmentindex/IndexConfiguration, segment/SegmentConf.
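When choosing values for the Bloom filter knobs, the textbook sizing formulas give a useful starting point: for n expected keys and target false-positive probability p, the bit count is m = -n ln(p) / (ln 2)^2 and the hash count is k = (m / n) ln 2. The sketch below only evaluates those formulas; whether HestiaStore derives its defaults this way is not stated here, and the class BloomSizingSketch is hypothetical.

```java
/** Hypothetical helper that evaluates the standard Bloom filter sizing formulas. */
final class BloomSizingSketch {

    /** Bits needed for the given key count and false-positive probability. */
    static long optimalBits(long expectedKeys, double falsePositiveProbability) {
        if (expectedKeys <= 0
                || falsePositiveProbability <= 0.0
                || falsePositiveProbability >= 1.0) {
            throw new IllegalArgumentException("need keys > 0 and 0 < p < 1");
        }
        double ln2 = Math.log(2);
        double bits = -expectedKeys * Math.log(falsePositiveProbability) / (ln2 * ln2);
        return (long) Math.ceil(bits);
    }

    /** Hash function count that minimizes false positives for the given size. */
    static int optimalHashFunctions(long bits, long expectedKeys) {
        long rounded = Math.round((double) bits / expectedKeys * Math.log(2));
        return (int) Math.max(1, rounded);
    }

    /** Rounds a bit count up to whole bytes. */
    static long toBytes(long bits) {
        return (bits + 7) / 8;
    }

    public static void main(String[] args) {
        long keys = 1_000_000L;
        long bits = optimalBits(keys, 0.01);
        System.out.println("bytes: " + toBytes(bits));
        System.out.println("hash functions: " + optimalHashFunctions(bits, keys));
    }
}
```

For one million keys at a 1% target, this works out to roughly 1.2 MB and 7 hash functions per segment filter, which helps when weighing Bloom filter memory against cachedSegmentLimit().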
Warm‑Up Strategies
- Point warm-up: issue representative get(key) calls; this loads the target segments' Bloom filter and sparse index into the LRU.
- Segment warm-up: iterate a small range to prime chunk readers and caches.
- Global warm-up: a bounded index.getStream(SegmentWindow.limit(N)) over initial segments to seed the LRU without scanning the full dataset.
Observability
- Bloom filter effectiveness and false-positive rate: bloomfilter/BloomFilterStats, accessible via BloomFilter.getStatistics().
- SegmentIndex operation counters (coarse): segmentindex/Stats increments on get, put, and delete.
Tuning Guidance
- Throughput-oriented writes: tune the segment write-cache and maintenance limits (writePath().segmentWriteCacheKeyLimit(), writePath().maintenanceWriteCacheKeyLimit(), writePath().indexBufferedWriteKeyLimit()); monitor memory and segment maintenance latency.
- Read-heavy workloads touching few segments: increase segment().cachedSegmentLimit() so the working set of segments (Bloom + scarce + delta) stays resident.
- Space-sensitive deployments: reduce Bloom filter size (may increase false positives and extra reads) or disable compression filters to trade CPU for I/O.
- Latency-sensitive point lookups: ensure Bloom filter is sized adequately; keep segments' working set in the LRU; consider slightly smaller segment().chunkKeyLimit() to narrow the local scan window.
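The effect of the chunk key limit on the local scan window can be seen in a toy sparse index. The sketch below is an assumption-laden illustration, not the ScarceIndex code: it samples the first key of every chunk (the real index may sample differently), keeps the "file" as an in-memory sorted list, and the class SparseIndexSketch is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sparse index over a sorted key list standing in for a file. */
final class SparseIndexSketch {
    private final List<String> sortedKeys;
    private final TreeMap<String, Integer> samples = new TreeMap<>();
    private final int chunkKeyLimit;
    int lastScanLength = 0; // entries examined by the most recent lookup

    SparseIndexSketch(List<String> sortedKeys, int chunkKeyLimit) {
        if (chunkKeyLimit <= 0) {
            throw new IllegalArgumentException("chunkKeyLimit must be positive");
        }
        this.sortedKeys = sortedKeys;
        this.chunkKeyLimit = chunkKeyLimit;
        // Sample one pointer per chunk: the first key and its position.
        for (int i = 0; i < sortedKeys.size(); i += chunkKeyLimit) {
            samples.put(sortedKeys.get(i), i);
        }
    }

    boolean contains(String key) {
        lastScanLength = 0;
        // Jump to the last sampled key that is <= the search key.
        Map.Entry<String, Integer> start = samples.floorEntry(key);
        if (start == null) {
            return false; // smaller than every stored key
        }
        int from = start.getValue();
        int to = Math.min(from + chunkKeyLimit, sortedKeys.size());
        // Sequentially scan at most one chunk.
        for (int i = from; i < to; i++) {
            lastScanLength++;
            int cmp = sortedKeys.get(i).compareTo(key);
            if (cmp == 0) return true;
            if (cmp > 0) return false; // passed the slot where the key would be
        }
        return false;
    }
}
```

A smaller chunk limit means more sampled pointers held in memory but fewer entries scanned per lookup, which is the trade-off behind the latency guidance above.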
Code Pointers
- Routed direct writes: src/main/java/org/hestiastore/index/segmentindex/core/execution/PointOperationCoordinator.java, src/main/java/org/hestiastore/index/segmentindex/core/routing/MappedSegmentLeaseService.java
- Routed direct reads: src/main/java/org/hestiastore/index/segmentindex/core/execution/PointOperationCoordinator.java, src/main/java/org/hestiastore/index/segmentindex/core/routing/MappedSegmentLeaseService.java
- Segment caches and providers: src/main/java/org/hestiastore/index/segmentindex/*SegmentData*, src/main/java/org/hestiastore/index/segment/SegmentData*
- LRU cache: src/main/java/org/hestiastore/index/cache/CacheLru.java, src/main/java/org/hestiastore/index/cache/CacheLruImpl.java
- Key→segment map: src/main/java/org/hestiastore/index/segmentindex/routemap/SegmentRouteMap.java