Segment API: Flush, Compact, and Split
Scope
This document explains how the public Segment API behaves during maintenance
and how segment-level rules interact with index-level route splitting.
The public Segment interface exposes flush() and compact(). Split is
owned by the segment-index layer through
SplitPolicyScheduler, SplitTaskCoordinator, and
RouteSplitPlanner. Route drain and parent segment acquisition are owned
by MappedSegmentLeaseService.
For the current state machine and iterator rules, use Segment Concurrency.
Current Model
- Segment.put(...) writes into the segment write cache.
- Segment.get(...) reads write cache first, then delta cache, then stable files.
- flush() freezes the current write-cache snapshot and persists it into delta cache files.
- compact() rewrites the stable segment view from main SST + delta cache and clears obsolete delta files.
- SegmentIndex routes writes directly to a stable segment; there is no index-level ingest overlay anymore.
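The layering above can be illustrated with a minimal, single-threaded sketch. It is not the real Segment implementation: the class ReadPathSketch and its three in-memory maps are assumptions that stand in for the write cache, the delta cache files, and the stable files.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: three in-memory maps stand in for the write cache,
// the delta cache, and the stable files. No locking or state machine here.
final class ReadPathSketch {
    private Map<String, String> writeCache = new HashMap<>();
    private final Map<String, String> deltaCache = new HashMap<>();
    private final Map<String, String> stable = new HashMap<>();

    // put(...) only touches the write cache.
    void put(String key, String value) {
        writeCache.put(key, value);
    }

    // Newest layer wins: write cache, then delta cache, then stable data.
    Optional<String> get(String key) {
        String value = writeCache.get(key);
        if (value == null) {
            value = deltaCache.get(key);
        }
        if (value == null) {
            value = stable.get(key);
        }
        return Optional.ofNullable(value);
    }

    // flush(): move the current write-cache snapshot into the delta layer.
    void flush() {
        Map<String, String> frozen = writeCache;
        writeCache = new HashMap<>();
        deltaCache.putAll(frozen);
    }

    // compact(): fold delta entries into the stable view, then clear them.
    void compact() {
        stable.putAll(deltaCache);
        deltaCache.clear();
    }
}
```

In this sketch a value written after a compaction shadows the compacted value, both before and after the next flush, because reads always consult the newer layers first.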
Flush
Current behavior:
- under FREEZE, the segment swaps out the current write cache
- in MAINTENANCE_RUNNING, the frozen snapshot is written to delta cache files
- under FREEZE, metadata is published and iterators are invalidated
- new writes continue in a fresh write cache while the frozen snapshot is being persisted
Practical consequences:
- flush() is a no-op when the write cache is empty
- concurrent flush() requests return BUSY
- fail-fast iterators can terminate early after the publish step
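The three flush phases and their consequences can be sketched as follows. Everything here is an assumption made for illustration: the class FlushSketch, the Result enum, the freezeLock monitor that stands in for FREEZE, the AtomicBoolean that stands in for MAINTENANCE_RUNNING, and the persistIo hook that stands in for writing delta cache files.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the flush phases; not the real Segment code.
final class FlushSketch {
    enum Result { OK, BUSY }

    private final Object freezeLock = new Object();            // stands in for FREEZE
    private final AtomicBoolean maintenanceRunning = new AtomicBoolean(false);
    private Map<String, String> writeCache = new HashMap<>();
    private Map<String, String> frozen = new HashMap<>();      // snapshot being persisted
    private final Map<String, String> deltaCache = new HashMap<>();
    private long version = 0;                                  // bumped on every publish

    void put(String key, String value) {
        synchronized (freezeLock) {
            writeCache.put(key, value);
        }
    }

    String get(String key) {
        synchronized (freezeLock) {
            String value = writeCache.get(key);
            if (value == null) value = frozen.get(key);
            if (value == null) value = deltaCache.get(key);
            return value;
        }
    }

    long version() {
        synchronized (freezeLock) {
            return version;
        }
    }

    // persistIo stands in for the slow file write; it runs outside the lock.
    Result flush(Runnable persistIo) {
        if (!maintenanceRunning.compareAndSet(false, true)) {
            return Result.BUSY;                                // another flush is in flight
        }
        try {
            synchronized (freezeLock) {                        // phase 1: swap the write cache
                if (writeCache.isEmpty()) {
                    return Result.OK;                          // empty cache: no-op
                }
                frozen = writeCache;
                writeCache = new HashMap<>();
            }
            persistIo.run();                                   // phase 2: persist, writes continue
            synchronized (freezeLock) {                        // phase 3: publish
                deltaCache.putAll(frozen);
                frozen = new HashMap<>();
                version++;                                     // invalidates fail-fast iterators
            }
            return Result.OK;
        } finally {
            maintenanceRunning.set(false);
        }
    }
}
```

Because the lock is released during the persist step, a put(...) issued from another thread (or from the hook itself) lands in the fresh write cache, while a second flush(...) issued at that moment returns BUSY.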
Compact
Current behavior:
- compaction captures the current stable view plus in-memory deltas
- background I/O builds a new SST, sparse index, and Bloom filter
- publish swaps the new stable files atomically
- obsolete delta files are removed after a successful publish
Practical consequences:
- concurrent compact() requests return BUSY
- fail-fast iterators can terminate early after publish
- get() remains thread-safe; it either sees the old view or the new view
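One common way to get the "old view or new view" property is to keep the readable state in a single immutable object and swap the reference atomically. The sketch below assumes that technique; the source does not say how the segment implements its publish step, and the names CompactPublishSketch and View are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: atomic publish of a compacted view via one reference swap.
final class CompactPublishSketch {
    // Immutable view: stable entries plus delta entries layered on top.
    static final class View {
        final Map<String, String> stable;
        final Map<String, String> delta;

        View(Map<String, String> stable, Map<String, String> delta) {
            this.stable = Collections.unmodifiableMap(new TreeMap<>(stable));
            this.delta = Collections.unmodifiableMap(new TreeMap<>(delta));
        }
    }

    private final AtomicReference<View> current;

    CompactPublishSketch(Map<String, String> stable, Map<String, String> delta) {
        this.current = new AtomicReference<>(new View(stable, delta));
    }

    // Each read resolves against exactly one view, old or new, never a mix.
    String get(String key) {
        View view = current.get();
        String fromDelta = view.delta.get(key);
        return fromDelta != null ? fromDelta : view.stable.get(key);
    }

    // Build the merged view off to the side, then publish it with one swap.
    // Returns false if a different view was published in the meantime.
    boolean compact() {
        View old = current.get();
        Map<String, String> merged = new TreeMap<>(old.stable);
        merged.putAll(old.delta);                   // delta entries override stable ones
        View next = new View(merged, Collections.emptyMap());
        return current.compareAndSet(old, next);
    }

    int deltaEntries() {
        return current.get().delta.size();
    }
}
```

The sketch leaves out file handling, the BUSY status, and iterator invalidation; it only shows why readers never observe a half-published state.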
Split
Current index-level split behavior:
- SplitPolicyScheduler decides when a routed segment should be split
- SplitTaskCoordinator acquires a RouteSplitLease for the parent route before materialization
- RouteSplitPlanner computes the split boundary from a parent segment snapshot
- child stable segments are materialized before route-map publish
- RouteSplitLease drains the parent route before child materialization, and publish updates SegmentRouteMap
- writes to the affected route may see transient internal BUSY and are retried with segment-access retry settings
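The retry on transient BUSY can be pictured as a bounded loop with backoff. This is only a sketch: the helper retryWhileBusy, its maxAttempts and backoffMillis parameters, and the local Status enum are hypothetical and do not reflect the actual segment-access retry settings.

```java
import java.util.function.Supplier;

// Hypothetical sketch of a bounded retry loop for transient BUSY results.
final class BusyRetrySketch {
    enum Status { OK, BUSY, CLOSED }

    // Runs op until it returns something other than BUSY or attempts run out.
    // The last observed status is returned either way.
    static Status retryWhileBusy(Supplier<Status> op, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Status last = Status.BUSY;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            last = op.get();
            if (last != Status.BUSY) {
                return last;                        // OK and CLOSED are both final
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMillis * attempt);   // linear backoff
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();      // keep the interrupt flag
                    return last;
                }
            }
        }
        return last;
    }
}
```

A caller would wrap a single write attempt in the supplier. Non-BUSY results, including CLOSED, are returned immediately so that only the transient condition is retried.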
Important boundary:
- Segment itself does not own route-map changes
- Segment only provides the stable snapshot and local read-after-write guarantee used by the split coordinator
Backpressure And Overload
There is no index-level ingest overlay to absorb writes anymore. Backpressure comes from:
- segment-local write cache and delta-cache maintenance limits
- transient BUSY while a segment is in FREEZE or maintenance
- transient BUSY while a route is scheduled for split
- WAL retention pressure when WAL is enabled
The Java configuration API uses grouped writePath(...) and segment(...)
sections. Historical partition-named properties may still appear in persisted
metadata and compatibility metrics, but they now map to routed segment
write-cache, maintenance backlog, and split thresholds rather than a separate
partition runtime.
Parallel Calls
Same segment:
- flush(), compact(), and FULL_ISOLATION iteration are exclusive
- writes can continue during MAINTENANCE_RUNNING when the segment state allows it
Different segments:
- maintenance can run in parallel
- split work on one route does not stop unrelated routes
Reads And Iterators
- multiple get() calls can run concurrently
- FAIL_FAST iteration is optimistic and can stop early after a publish
- FULL_ISOLATION iteration holds exclusive access for its lifetime and blocks writes, flush, compact, and split materialization on that segment
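The optimistic behaviour of FAIL_FAST iteration can be sketched with a version counter that each iterator captures when it is opened. This is an assumed mechanism shown for illustration; the class FailFastIterationSketch and its methods are hypothetical and are not the VersionController API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: an iterator that ends early once a publish bumps the version.
final class FailFastIterationSketch {
    private final AtomicLong version = new AtomicLong();
    private final List<String> keys;

    FailFastIterationSketch(List<String> keys) {
        this.keys = new ArrayList<>(keys);          // private copy, never mutated
    }

    // Stands in for the publish step of flush() or compact().
    void publish() {
        version.incrementAndGet();
    }

    Iterator<String> failFastIterator() {
        final long openedAt = version.get();        // version seen when opened
        final Iterator<String> inner = keys.iterator();
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                // Stop early, without an error, once the version has moved on.
                return version.get() == openedAt && inner.hasNext();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return inner.next();
            }
        };
    }
}
```

An iterator opened before publish() reports no further elements afterwards, while an iterator opened after the publish sees the full key list again. Callers that need a complete scan therefore have to reopen the iterator or use FULL_ISOLATION.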
Corner Cases
- always close FULL_ISOLATION iterators; otherwise writers and split materialization can stall
- calls on a closed segment return SegmentResultStatus.CLOSED
- version overflow still fails fast in VersionController
- stale references to a retired parent segment must not be reused after split
- Run maintenance on a background executor and prioritize older snapshots.
Parallel calls
- One maintenance task per segment at a time; other requests coalesce or wait.
- Flush requests during split should attach to replay or run after split.
- Maintenance can still run in parallel across different segments.
Reads, get(), and iterators
- get() should read from current caches and files without blocking on long maintenance tasks.
- Fail-fast iterators continue to invalidate on version changes.
- Prefer snapshot-based iterators for long scans instead of holding write holds for FULL_ISOLATION.
Corner cases
- Version mismatch at swap time should trigger retry or fallback.
- Large post-freeze write backlog should trigger backpressure or replay caps.
- Segment replacement must be atomic in the registry to avoid stale reads.
- Iterators opened before a swap should either complete on their snapshot or fail fast.
Coordination summary
- Per-segment FREEZE for short cache/file swaps.
- Per-segment maintenance state to serialize flush/compact/split work.
- Optimistic version counter for iterator invalidation.
- Registry lock for file replacement if added in the future.