Segment Index Concurrency & Lifecycle
Glossary
- Segment index: top-level API that routes operations to segments.
- Key-segment mapping: map of max key -> SegmentId (KeyToSegmentMap).
- Mapping version: monotonically increasing counter for optimistic mapping checks.
- Segment registry: cache of Segment instances plus the maintenance executor.
- Background split coordinator: decides split scheduling after writes.
- Split: replaces one segment with one or two new segments and updates the mapping.
Core Rules
- SegmentIndex is thread-safe by contract; calls may be concurrent.
- Target: highly concurrent SegmentIndex API; avoid global synchronization and only protect minimal shared structures (mapping updates, split swaps).
- Index operations are not globally serialized; concurrency is bounded by shared caches, mapping updates, and per-segment state machines.
- Segment maintenance IO runs on the segment maintenance executor.
- The maintenance executor is always created by SegmentRegistry from IndexConfiguration.numberOfStableSegmentMaintenanceThreads (default 10).
- Automatic post-write flush/compact is optional and enabled by default.
- Segment BUSY is treated as transient and retried internally; callers do not see BUSY.
- Mapping changes are applied atomically and validated by version checks.
Thread Safety Mechanisms
- IndexInternalConcurrent executes sync operations on caller threads without a global executor.
- SegmentRegistrySynchronized serializes access to the segment instance map and registry mutations.
- KeyToSegmentMap uses snapshot reads plus a mapping version; updates take a write lock and increment the version.
- Segment implementations are thread-safe; read/write operations proceed in parallel when the segment state allows it.
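The snapshot-plus-version scheme described for KeyToSegmentMap can be sketched roughly as follows. This is a minimal illustration, not the actual class: VersionedRouteMapSketch, Snapshot, segmentFor, and isCurrent are hypothetical names, keys are simplified to String, and records require Java 16 or later.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for KeyToSegmentMap: max key -> segment id. */
final class VersionedRouteMapSketch {

    /** Immutable view of the mapping plus the version it was published at. */
    record Snapshot(long version, NavigableMap<String, Integer> routes) {
        /** Picks the segment whose max key is the smallest one >= key. */
        Integer segmentFor(String key) {
            Map.Entry<String, Integer> entry = routes.ceilingEntry(key);
            return entry == null ? null : entry.getValue();
        }
    }

    private final ReentrantLock writeLock = new ReentrantLock();
    private final AtomicLong version = new AtomicLong();
    private volatile Snapshot current = new Snapshot(0L, new TreeMap<>());

    /** Lock-free read: returns the latest published snapshot. */
    Snapshot snapshot() {
        return current;
    }

    /** True when no mapping update has started since the snapshot was published. */
    boolean isCurrent(Snapshot snapshot) {
        return snapshot.version() == version.get();
    }

    /** Copy-on-write update under the write lock; increments the version. */
    void put(String maxKey, int segmentId) {
        writeLock.lock();
        try {
            NavigableMap<String, Integer> copy = new TreeMap<>(current.routes());
            copy.put(maxKey, segmentId);
            // Bump first so stale readers fail isCurrent() and retry.
            long next = version.incrementAndGet();
            current = new Snapshot(next, copy);
        } finally {
            writeLock.unlock();
        }
    }
}
```

A caller would take a snapshot, resolve the segment, perform the operation, and retry with a fresh snapshot if isCurrent returns false afterwards.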
API Behavior
- put/get/delete: retry on per-segment BUSY using IndexRetryPolicy (indexBusyBackoffMillis + indexBusyTimeoutMillis); mapping version mismatch triggers a retry with a fresh snapshot. Timeouts throw IndexException.
- putAsync/getAsync/deleteAsync: submit the synchronous operation to the
dedicated index-worker executor owned by SegmentIndexImpl and return a
CompletionStage. IndexAsyncAdapter preserves the async-facing API but does
not own a separate executor.
- flush/compact: start maintenance on each segment and return once accepted; do not wait for IO completion; BUSY retries follow IndexRetryPolicy.
- flushAndWait/compactAndWait: wait for each segment to return to
READY (or CLOSED); do not call from a segment maintenance executor thread.
- getStream: captures a snapshot of segment ids and iterates them using the default segment iterator isolation (FAIL_FAST). An overload allows FULL_ISOLATION for per-segment exclusivity; the stream must be closed to release the segment lock.
- Segment close (async): once close starts, the segment drains in-flight work
and rejects/blocks new operations until CLOSED. The registry should not
reopen a closing segment; attempts should retry until the close completes.
The per-segment .lock file enforces single-open at the directory level.
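The BUSY handling described for put/get/delete amounts to a bounded retry loop with a fixed backoff. The sketch below illustrates that shape only; BusyRetrySketch, Status, IndexTimeoutException, and retryWhileBusy are hypothetical names, and the two long parameters stand in for indexBusyBackoffMillis and indexBusyTimeoutMillis.

```java
import java.util.function.Supplier;

final class BusyRetrySketch {

    /** Hypothetical status values mirroring the OK/BUSY/CLOSED/ERROR wrapper. */
    enum Status { OK, BUSY, CLOSED, ERROR }

    /** Hypothetical stand-in for the IndexException thrown on timeout. */
    static final class IndexTimeoutException extends RuntimeException {
        IndexTimeoutException(String message) {
            super(message);
        }
    }

    /**
     * Repeats a single-attempt operation while it reports BUSY, sleeping
     * backoffMillis between attempts and giving up after timeoutMillis.
     */
    static Status retryWhileBusy(Supplier<Status> attempt,
                                 long backoffMillis,
                                 long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            Status status = attempt.get();
            if (status != Status.BUSY) {
                return status; // OK, CLOSED, and ERROR are handled by the caller
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new IndexTimeoutException(
                        "operation still BUSY after " + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(backoffMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                throw new IndexTimeoutException("interrupted while waiting on BUSY");
            }
        }
    }
}
```

In this shape the caller never observes BUSY: the loop either returns a non-BUSY status or throws once the timeout elapses.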
Maintenance & Splits
- SegmentIndexImpl evaluates overlay thresholds after each write and triggers drain/flush follow-up only when backgroundMaintenanceAutoEnabled is true.
- Splits are scheduled by BackgroundSplitCoordinator on the shared split executor; only one split per segment id can be in flight.
- PartitionStableSplitCoordinator retries BUSY using IndexRetryPolicy; timeouts throw.
- Split materialization reads parent stable data with FAIL_FAST iteration, publishes child stable segments, then performs a short exclusive apply phase for route-map swap + overlay reassignment.
- After a split, KeyToSegmentMap updates the mapping and flushes it to disk; any in-flight write with a stale mapping version retries.
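One way to enforce "only one split per segment id in flight" is an atomic in-flight map keyed by segment id. The following is a sketch under that assumption, not the BackgroundSplitCoordinator implementation; SplitSchedulerSketch, trySchedule, and isSplitting are hypothetical names.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executor;

final class SplitSchedulerSketch {

    // Segment id -> future of the split currently in flight for that segment.
    private final ConcurrentMap<Integer, CompletableFuture<Void>> inFlight =
            new ConcurrentHashMap<>();
    private final Executor splitExecutor;

    SplitSchedulerSketch(Executor splitExecutor) {
        this.splitExecutor = splitExecutor;
    }

    /** Schedules a split unless one is already in flight for this segment. */
    Optional<CompletableFuture<Void>> trySchedule(int segmentId, Runnable splitTask) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        if (inFlight.putIfAbsent(segmentId, future) != null) {
            return Optional.empty(); // another split owns this segment id
        }
        try {
            splitExecutor.execute(() -> {
                Throwable failure = null;
                try {
                    splitTask.run();
                } catch (Throwable t) {
                    failure = t;
                }
                // Free the slot before completing so joiners can reschedule.
                inFlight.remove(segmentId, future);
                if (failure == null) {
                    future.complete(null);
                } else {
                    future.completeExceptionally(failure);
                }
            });
        } catch (RuntimeException rejected) {
            inFlight.remove(segmentId, future); // executor refused the task
            throw rejected;
        }
        return Optional.of(future);
    }

    boolean isSplitting(int segmentId) {
        return inFlight.containsKey(segmentId);
    }
}
```

Because putIfAbsent is atomic, two writers that cross the split threshold at the same time cannot both schedule a split for the same segment.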
Index State Machine
States:
- OPENING: index bootstrap/consistency checks (and lock acquisition) in
progress; operations are rejected.
- READY: operations allowed.
- CLOSING: close() is in progress; new API operations are rejected, the
directory lock is still held, and shutdown may still wait for split/drain/WAL
durability boundaries to settle.
- ERROR: unrecoverable failure; operations are rejected.
- CLOSED: shutdown completed, resources were released, and operations are
rejected.
Transitions:
- OPENING -> READY: after initialization and consistency checks complete.
- READY -> CLOSING: close() starts and begins shutdown coordination.
- CLOSING -> CLOSED: shutdown completes; file lock released.
- any -> ERROR: unrecoverable failure (e.g., OOM, disk full, failed split/file
swap, or consistency check failure).
Notes:
- Only one index instance may hold the directory lock at a time.
- The lock is held through CLOSING and released only when the instance
reaches CLOSED (or during terminal ERROR cleanup).
- flushAndWait() and compactAndWait() remain explicit maintenance
boundaries; close() uses the same settlement model before finalizing
shutdown.
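The states and transitions listed above can be encoded as a small compare-and-set state holder. This is an illustrative sketch only; IndexStateSketch and its methods are hypothetical names, and lock handling is left out.

```java
import java.util.concurrent.atomic.AtomicReference;

final class IndexStateSketch {

    enum State { OPENING, READY, CLOSING, ERROR, CLOSED }

    private final AtomicReference<State> state =
            new AtomicReference<>(State.OPENING);

    /** Encodes the transition list: the linear lifecycle plus any -> ERROR. */
    static boolean isAllowed(State from, State to) {
        if (to == State.ERROR) {
            return true; // any -> ERROR
        }
        switch (from) {
            case OPENING:
                return to == State.READY;
            case READY:
                return to == State.CLOSING;
            case CLOSING:
                return to == State.CLOSED;
            default:
                return false; // ERROR and CLOSED have no other exits
        }
    }

    /** Atomically moves from -> to; false if illegal or the state has moved on. */
    boolean transition(State from, State to) {
        return isAllowed(from, to) && state.compareAndSet(from, to);
    }

    /** Operations are accepted only in READY. */
    boolean acceptsOperations() {
        return state.get() == State.READY;
    }

    State current() {
        return state.get();
    }
}
```

Using compareAndSet means that two threads racing to call close() cannot both win the READY -> CLOSING transition.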
Failure Handling
- SegmentResultStatus.ERROR from any segment results in IndexException.
- Maintenance failures move the segment to ERROR; flushAndWait/compactAndWait propagate the failure as IndexException.
- Split failures surface through the split future and are rethrown when joined.
- When entering ERROR, the index stops accepting operations and requires manual intervention (recovery/repair or restore from backups).
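As an illustration of how a split failure can surface "through the split future", the sketch below joins a CompletableFuture and unwraps the failure. It assumes the split future is a CompletableFuture, which the text does not state; SplitFailureSketch, IndexFailure, and joinSplit are hypothetical names, with IndexFailure standing in for IndexException.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

final class SplitFailureSketch {

    /** Hypothetical stand-in for the document's IndexException. */
    static final class IndexFailure extends RuntimeException {
        IndexFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Joins a split future and rethrows any failure as an index-level error. */
    static void joinSplit(CompletableFuture<Void> splitFuture) {
        try {
            splitFuture.join();
        } catch (CompletionException e) {
            // join() wraps the original failure; unwrap it for the caller.
            throw new IndexFailure("split failed", e.getCause());
        }
    }
}
```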
Components
- SegmentIndex (public API): thread-safe entry point.
- SegmentIndexImpl: retries BUSY, routes operations to segments, and manages maintenance, including the dedicated async API executor.
- IndexAsyncAdapter: thin facade that forwards async API calls to the wrapped index.
- StableSegmentGateway: single-attempt mapping + stable-segment selection.
- IndexRetryPolicy: backoff + timeout for BUSY retries.
- IndexResult/IndexResultStatus: internal OK/BUSY/CLOSED/ERROR wrapper.
- KeyToSegmentMap: mapping, snapshot versioning, and persistence of segment ids.
- SegmentRegistry(Synchronized): caches Segment instances and supplies the maintenance executor.
- BackgroundSplitCoordinator: post-write split scheduling decisions.
- PartitionStableSplitCoordinator: split execution.
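The relationship between the synchronous API and the async methods can be pictured as below: the async methods submit the synchronous call to one dedicated executor and return a CompletionStage. This is a simplified sketch, not SegmentIndexImpl; AsyncFacadeSketch, the in-memory map, and the pool size of 2 are assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class AsyncFacadeSketch implements AutoCloseable {

    // In-memory stand-in for the real segment-backed storage.
    private final Map<String, String> store = new ConcurrentHashMap<>();

    // Dedicated worker pool for async calls; the size is an arbitrary choice.
    private final ExecutorService indexWorkers = Executors.newFixedThreadPool(2);

    /** Synchronous operations run on the caller's thread. */
    void put(String key, String value) {
        store.put(key, value);
    }

    String get(String key) {
        return store.get(key);
    }

    /** Async variants submit the synchronous call to the worker pool. */
    CompletionStage<Void> putAsync(String key, String value) {
        return CompletableFuture.runAsync(() -> put(key, value), indexWorkers);
    }

    CompletionStage<String> getAsync(String key) {
        return CompletableFuture.supplyAsync(() -> get(key), indexWorkers);
    }

    @Override
    public void close() {
        indexWorkers.shutdown(); // lets queued tasks finish, accepts no new ones
    }
}
```

A thin adapter in the style of IndexAsyncAdapter would simply delegate to putAsync and getAsync without creating an executor of its own.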
Iterator Isolation
- FAIL_FAST: iteration is optimistic; any mutation can invalidate the iterator and terminate the stream early.
- FULL_ISOLATION: holds exclusive access per segment while its iterator is open; writers, flush/compact, and split on that segment block until the iterator (or stream) is closed.
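The difference between the two modes can be illustrated with a toy segment. This is a sketch under simplifying assumptions, not the real segment code: IsolatedSegmentSketch, tryAdd, and SegmentCursor are hypothetical names, and a writer that finds the segment exclusively held returns false (standing in for BUSY) rather than blocking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

final class IsolatedSegmentSketch {

    enum Isolation { FAIL_FAST, FULL_ISOLATION }

    // Non-reentrant permit: held by a writer or a FULL_ISOLATION cursor.
    private final Semaphore exclusive = new Semaphore(1);
    private final List<String> values = new ArrayList<>(); // guarded by "this"
    private long modCount;                                 // guarded by "this"

    /** Single-attempt write; false stands in for a BUSY result. */
    boolean tryAdd(String value) {
        if (!exclusive.tryAcquire()) {
            return false;
        }
        try {
            synchronized (this) {
                values.add(value);
                modCount++;
            }
            return true;
        } finally {
            exclusive.release();
        }
    }

    /** FULL_ISOLATION waits for exclusive access; FAIL_FAST takes no lock. */
    SegmentCursor open(Isolation isolation) {
        if (isolation == Isolation.FULL_ISOLATION) {
            exclusive.acquireUninterruptibly();
        }
        return new SegmentCursor(isolation);
    }

    final class SegmentCursor implements AutoCloseable {
        private final Isolation isolation;
        private final long expectedModCount;
        private int position;
        private boolean closed;

        private SegmentCursor(Isolation isolation) {
            this.isolation = isolation;
            synchronized (IsolatedSegmentSketch.this) {
                this.expectedModCount = modCount;
            }
        }

        /** Next value, or null when exhausted, closed, or invalidated. */
        String next() {
            synchronized (IsolatedSegmentSketch.this) {
                if (closed || modCount != expectedModCount
                        || position >= values.size()) {
                    return null; // a mutation ends a FAIL_FAST cursor early
                }
                return values.get(position++);
            }
        }

        @Override
        public void close() {
            synchronized (IsolatedSegmentSketch.this) {
                if (closed) {
                    return;
                }
                closed = true;
            }
            if (isolation == Isolation.FULL_ISOLATION) {
                exclusive.release(); // writers may proceed again
            }
        }
    }
}
```

Forgetting to close a FULL_ISOLATION cursor in this sketch leaves the permit held forever, which mirrors the requirement that the stream be closed to release the segment lock.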
Implementation Mapping
- Index implementation: IndexInternalConcurrent (caller-thread execution).
- Mapping version: KeyToSegmentMap.version (AtomicLong).
- Maintenance executor: SegmentRegistry.getMaintenanceExecutor() backed by IndexConfiguration.numberOfStableSegmentMaintenanceThreads (default 10).
- Split isolation: SegmentIteratorIsolation.FAIL_FAST for materialization reads of parent stable data, followed by a short exclusive apply phase.
- Retry policy: IndexConfiguration.indexBusyBackoffMillis and IndexConfiguration.indexBusyTimeoutMillis.