Limitations & Trade‑offs

This page lists the most important constraints and design trade‑offs so you can plan deployments and avoid surprises. Each item links to the code or related docs for verification.

Durability and Recovery

  • WAL is optional and disabled by default (Wal.EMPTY). With WAL disabled, the durability boundaries are the explicit flushAndWait() and close() calls. With WAL enabled, writes are appended before apply, and startup can replay the WAL with safe-tail truncation. See Recovery and Operations/WAL docs.
  • Per‑file atomicity only: Writers use temp files + atomic rename; groups of files (e.g., SST + scarce index + bloom) commit in a safe order but not as a single atomic unit. Readers remain consistent because old files stay in place until each rename. See SegmentFullWriterTx, BloomFilterWriterTx.
  • Filesystem requirement: Crash safety relies on same‑directory atomic rename. Use local filesystems; be cautious with network filesystems that may not guarantee strict atomicity.
  • Stale lock files: A crash can leave .lock behind, preventing the index from being opened until the file is removed. See directory/FsFileLock.java and IndexState*. Remove the file only when you are certain no process is still using the directory.
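The per-file atomicity pattern described above (write to a temp file in the same directory, then atomically rename it into place) can be sketched with standard java.nio APIs. This is an illustration of the general technique, not HestiaStore's actual writer code:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicWriteSketch {

    // Write bytes to a sibling temp file first, then atomically rename to
    // the final name. Readers opening the final path see either the old
    // complete file or the new complete file, never a partial write.
    static void atomicWrite(Path target, byte[] data) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, data);
        // ATOMIC_MOVE requires source and target on the same filesystem;
        // that is why the temp file lives in the same directory, and why
        // network filesystems without strict rename atomicity are risky.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE,
                StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("atomic-demo");
        Path sst = dir.resolve("demo.sst");
        atomicWrite(sst, "old".getBytes(StandardCharsets.UTF_8));
        atomicWrite(sst, "new".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(sst),
                StandardCharsets.UTF_8)); // "new"
    }
}
```

Groups of files (SST, scarce index, bloom) each commit this way individually, in a safe order, which is why the group is crash-consistent but not a single atomic unit.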

Concurrency

  • SegmentIndex is thread‑safe but not globally serialized; heavy concurrent writes can increase contention in shared caches and per-segment state machines.
  • No serialized adapter is provided; enforce strict ordering externally if needed.
  • Optimistic iteration: Segment iterators may stop early if a write bumps the version during a scan (by design). Re‑open the iterator to continue. See EntryIteratorWithLock.
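One simple way to enforce strict ordering externally, as suggested above, is to funnel all mutations through a single-threaded executor. The index type below is a stand-in map, since the exact HestiaStore write API is not shown on this page:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class SerializedWrites {

    // All mutations run on one thread, so they apply in submission order
    // even when many caller threads are active.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    // Stand-in for the real index; replace with the actual index handle.
    private final Map<String, String> index = new ConcurrentHashMap<>();

    public Future<?> put(String key, String value) {
        return writer.submit(() -> index.put(key, value));
    }

    public String get(String key) {
        return index.get(key);
    }

    public void close() throws InterruptedException {
        writer.shutdown();
        writer.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        SerializedWrites s = new SerializedWrites();
        for (int i = 0; i < 100; i++) {
            s.put("k", "v" + i); // submitted from one thread here, but safe from many
        }
        s.close();
        System.out.println(s.get("k")); // last submitted write wins: v99
    }
}
```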

Size and Addressing Limits

  • Per‑segment SST size bounded by 32‑bit positions: Sparse index stores an Integer position and readers cast to int (ChunkEntryFile#openIteratorAtPosition((int)position)). Keep a single vNN-index.sst file below ~2 GiB. Use multiple segments to scale. Code: chunkentryfile/ChunkEntryFile.java, scarceindex/*.
  • Data‑block and cell sizing constraints: diskIoBufferSize must be divisible by 1024; chunk cell size is fixed at 16 bytes. Payloads pad to whole cells (space overhead). Code: Vldtn#requireIoBufferSize, chunkstore/CellPosition.java.
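To make the sizing rules concrete: padding payloads up to whole 16-byte cells has a quantifiable space overhead, and the buffer-size check reduces to a divisibility test. The constants mirror the ones quoted above; the method names here are illustrative, not the library's:

```java
public class SizingRules {

    static final int CELL_SIZE = 16;      // fixed chunk cell size
    static final int BUFFER_ALIGN = 1024; // diskIoBufferSize must be divisible by this

    // Bytes actually consumed on disk after padding up to a whole cell.
    static int paddedSize(int payloadBytes) {
        int cells = (payloadBytes + CELL_SIZE - 1) / CELL_SIZE;
        return cells * CELL_SIZE;
    }

    // Mirrors the shape of the validation described above.
    static void requireIoBufferSize(int size) {
        if (size <= 0 || size % BUFFER_ALIGN != 0) {
            throw new IllegalArgumentException(
                    "diskIoBufferSize must be a positive multiple of " + BUFFER_ALIGN);
        }
    }

    public static void main(String[] args) {
        System.out.println(paddedSize(1));  // 16: a 1-byte payload still occupies a full cell
        System.out.println(paddedSize(17)); // 32: one byte past a cell boundary costs a whole extra cell
        requireIoBufferSize(8 * 1024);      // accepted
    }
}
```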

Configuration Immutability

Once an index is created, several properties cannot be changed when the index is reopened with a different configuration:

  • Type descriptors (key/value serialization)
  • Sparse index cadence (maxNumberOfKeysInSegmentChunk)
  • Segment sizing limits (e.g., maxNumberOfKeysInSegment)
  • Bloom filter sizing and hash functions
  • Encoding/decoding filter lists (order and membership)

Attempts to change these raise a validation error in IndexConfigurationManager. To change them, create a new index and bulk-copy data (read + write) or export/import. See segmentindex/IndexConfigurationManager.java.
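The bulk-copy migration path can be sketched as a read of every entry in ascending key order followed by a write into the freshly created index. TreeMap stands in for both indexes here because the real open/create API is not part of this page:

```java
import java.util.Map;
import java.util.TreeMap;

public class MigrationSketch {

    // Copy every entry from the old index into one created with the new
    // configuration. Iterating a TreeMap yields ascending key order, which
    // matches the strictly-increasing key requirement for bulk writes.
    static Map<String, String> migrate(Map<String, String> oldIndex) {
        Map<String, String> newIndex = new TreeMap<>(); // created with the new config
        new TreeMap<>(oldIndex).forEach(newIndex::put);
        return newIndex;
    }

    public static void main(String[] args) {
        Map<String, String> old = new TreeMap<>();
        old.put("b", "2");
        old.put("a", "1");
        System.out.println(migrate(old)); // {a=1, b=2}
    }
}
```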

Data Model and Semantics

  • Strictly increasing keys: All on‑disk structures assume ascending key order; compaction and consistency checks enforce it. An incorrect comparator or inconsistent key ordering will fail these checks. Code: segment/SegmentConsistencyChecker.java.
  • Tombstones: Deletes are tombstones until compaction merges and drops them. Heavy delete workloads without compaction may grow delta files and increase read work.
  • No multi‑key transactions: Writes are per‑key; there is no cross‑key atomic batch. Use application‑level coordination if needed.
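The tombstone behavior can be illustrated with a toy merge: a delete is recorded as a marker that lookups must still examine, and a compaction pass physically drops keys whose newest entry is a tombstone. This is a conceptual sketch, not the real compaction code:

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

public class TombstoneSketch {

    // A deleted key keeps an explicit marker (Optional.empty) in the delta
    // until compaction; it still occupies space and read work, which is why
    // heavy delete workloads without compaction grow delta files.
    static final Map<String, Optional<String>> delta = new TreeMap<>();

    static void put(String k, String v) { delta.put(k, Optional.of(v)); }
    static void delete(String k)        { delta.put(k, Optional.empty()); }

    // Compaction merges entries and drops tombstoned keys entirely.
    static Map<String, String> compact() {
        Map<String, String> merged = new TreeMap<>();
        delta.forEach((k, v) -> v.ifPresent(val -> merged.put(k, val)));
        return merged;
    }

    public static void main(String[] args) {
        put("a", "1");
        put("b", "2");
        delete("a");
        System.out.println(delta.size());     // 2: the tombstone for "a" still takes space
        System.out.println(compact().size()); // 1: only "b" survives compaction
    }
}
```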

Security Posture

  • XOR filter is not encryption: ChunkFilterXorEncrypt provides reversible obfuscation only; do not use as security. For encryption at rest, place HestiaStore on an encrypted volume or add a real crypto layer above. See chunkstore/ChunkFilterXor*.
  • No authentication/authorization: HestiaStore is an embedded library and relies on your process/container isolation.
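Why XOR is obfuscation rather than encryption is easy to demonstrate: applying the same keyed XOR twice restores the plaintext, and a single known plaintext/ciphertext pair leaks the key byte-for-byte. This is a generic sketch of the weakness, not the ChunkFilterXorEncrypt code:

```java
import java.nio.charset.StandardCharsets;

public class XorDemo {

    // XOR each byte with a repeating key. The operation is its own inverse,
    // and ciphertext XOR plaintext reveals the key directly.
    static byte[] xor(byte[] data, byte[] key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] key = {0x5A, 0x3C};
        byte[] plain = "secret payload".getBytes(StandardCharsets.UTF_8);
        byte[] obfuscated = xor(plain, key);
        byte[] restored = xor(obfuscated, key);
        System.out.println(new String(restored, StandardCharsets.UTF_8)); // "secret payload"
        // Key recovery from one known plaintext byte:
        System.out.println((plain[0] ^ obfuscated[0]) == key[0]); // true
    }
}
```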

Workloads That Fit Well

  • High‑throughput append/update workloads where read‑after‑write visibility matters and periodic flush/compaction is acceptable.
  • Point lookups and ordered scans with predictable latency (Bloom + sparse index bound I/O).

Anti‑patterns

  • Expecting WAL recovery semantics without enabling WAL via withWal(...).
  • Expecting ASYNC WAL mode to preserve every acknowledged write after OS/power loss.
  • Very large single segments (>2 GiB vNN-index.sst); split into more segments.
  • Heavy mixed concurrent reads/writes with strict tail‑latency requirements in synchronized mode (coarse locking).

Mitigations and Best Practices

  • Plan periodic flushAndWait() and compact() windows; after a crash, run the consistency check and optionally compact.
  • Size Bloom filters for your negative‑lookup rate; monitor BloomFilterStats.
  • Tune maxNumberOfKeysInSegmentChunk to balance read scan length vs. sparse index size.
  • Use multiple segments to stay under per‑segment limits and to improve compaction parallelism (future).
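The maxNumberOfKeysInSegmentChunk trade-off is simple to quantify: with N keys in a segment and a chunk cadence of k, the sparse index holds roughly N/k entries, while a point lookup scans at most k entries after the indexed seek. Illustrative arithmetic, not library code:

```java
public class CadenceTradeoff {

    // Approximate sparse-index entry count: one entry per chunk of k keys.
    static long sparseIndexEntries(long keysInSegment, int keysPerChunk) {
        return (keysInSegment + keysPerChunk - 1) / keysPerChunk;
    }

    // Worst-case entries scanned linearly after the sparse-index seek.
    static int worstCaseScan(int keysPerChunk) {
        return keysPerChunk;
    }

    public static void main(String[] args) {
        long keys = 1_000_000;
        // Smaller cadence: bigger sparse index, shorter scans.
        System.out.println(sparseIndexEntries(keys, 64));   // 15625 entries, scans <= 64
        // Larger cadence: smaller sparse index, longer scans.
        System.out.println(sparseIndexEntries(keys, 1024)); // 977 entries, scans <= 1024
    }
}
```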