Segment Index Concurrency & Lifecycle

Glossary

  • Segment index: top-level API that routes operations to segments.
  • Key-segment mapping: map of max key -> SegmentId (KeyToSegmentMap).
  • Mapping version: monotonically increasing counter for optimistic mapping checks.
  • Segment registry: cache of Segment instances plus the maintenance executor.
  • Background split coordinator: decides split scheduling after writes.
  • Split: replace one segment with a new segment (or two) and update the mapping.

Core Rules

  • SegmentIndex is thread-safe by contract; calls may be concurrent.
  • Design goal: keep the SegmentIndex API highly concurrent by avoiding global synchronization and protecting only the minimal shared structures (mapping updates, split swaps).
  • Index operations are not globally serialized; concurrency is bounded by shared caches, mapping updates, and per-segment state machines.
  • Segment maintenance IO runs on the segment maintenance executor.
  • The maintenance executor is always created by SegmentRegistry from IndexConfiguration.numberOfStableSegmentMaintenanceThreads (default 10).
  • Automatic post-write flush/compact is optional and enabled by default.
  • Segment BUSY is treated as transient and retried internally; callers do not see BUSY.
  • Mapping changes are applied atomically and validated by version checks.

Thread Safety Mechanisms

  • IndexInternalConcurrent executes sync operations on caller threads without a global executor.
  • SegmentRegistrySynchronized serializes access to the segment instance map and registry mutations.
  • KeyToSegmentMap uses snapshot reads plus a mapping version; updates take a write lock and increment the version.
  • Segment implementations are thread-safe; read/write operations proceed in parallel when the segment state allows it.
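
The snapshot-plus-version scheme described for KeyToSegmentMap can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the class name VersionedRouteMap, its methods, and the Integer keys and segment ids are all hypothetical, and the sketch publishes one immutable snapshot behind a plain monitor instead of using an AtomicLong with a separate write lock.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Consumer;

/** Hypothetical stand-in for KeyToSegmentMap: max key -> segment id. */
final class VersionedRouteMap {

    /** Immutable view of the routes plus the version it was taken at. */
    static final class Snapshot {
        private final long version;
        private final NavigableMap<Integer, Integer> routes;

        Snapshot(long version, NavigableMap<Integer, Integer> routes) {
            this.version = version;
            this.routes = Collections.unmodifiableNavigableMap(routes);
        }

        long version() {
            return version;
        }

        /** Segment whose max key is the smallest one >= key, or null if none. */
        Integer segmentFor(int key) {
            Map.Entry<Integer, Integer> entry = routes.ceilingEntry(key);
            return entry == null ? null : entry.getValue();
        }
    }

    private final Object writeLock = new Object();
    private volatile Snapshot current = new Snapshot(0L, new TreeMap<>());

    /** Lock-free read: callers route against this view and validate it later. */
    Snapshot snapshot() {
        return current;
    }

    /** Optimistic check: true if no mapping update happened since the snapshot. */
    boolean isCurrent(Snapshot snapshot) {
        return snapshot.version() == current.version();
    }

    /** Writers are serialized; each update copies the map and bumps the version. */
    void update(Consumer<NavigableMap<Integer, Integer>> change) {
        synchronized (writeLock) {
            NavigableMap<Integer, Integer> copy = new TreeMap<>(current.routes);
            change.accept(copy);
            current = new Snapshot(current.version + 1, copy);
        }
    }
}
```

A caller would take a snapshot, pick the segment, perform the operation, and retry with a fresh snapshot if isCurrent returns false.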

API Behavior

  • put/get/delete: retry on per-segment BUSY using IndexRetryPolicy (indexBusyBackoffMillis + indexBusyTimeoutMillis); mapping version mismatch triggers a retry with a fresh snapshot. Timeouts throw IndexException.
  • putAsync/getAsync/deleteAsync: submit the synchronous operation to the dedicated index-worker executor owned by SegmentIndexImpl and return a CompletionStage. IndexAsyncAdapter preserves the async-facing API but does not own a separate executor.
  • flush/compact: start maintenance on each segment and return once accepted; do not wait for IO completion; BUSY retries follow IndexRetryPolicy.
  • flushAndWait/compactAndWait: wait for each segment to return to READY (or CLOSED); do not call from a segment maintenance executor thread.
  • getStream: captures a snapshot of segment ids and iterates them using the default segment iterator isolation (FAIL_FAST). An overload allows FULL_ISOLATION for per-segment exclusivity; the stream must be closed to release the segment lock.
  • Segment close (async): once close starts, the segment drains in-flight work and rejects/blocks new operations until CLOSED. The registry should not reopen a closing segment; attempts should retry until the close completes. The per-segment .lock file enforces single-open at the directory level.
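
The BUSY handling described for put/get/delete can be sketched as a loop with a fixed backoff and an overall deadline. This is only a sketch: BusyRetry and AttemptStatus are hypothetical names, the two constructor arguments stand in for indexBusyBackoffMillis and indexBusyTimeoutMillis, and IllegalStateException is used in place of the project's IndexException.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical status values mirroring the OK/BUSY/CLOSED/ERROR wrapper. */
enum AttemptStatus { OK, BUSY, CLOSED, ERROR }

/** Hypothetical stand-in for IndexRetryPolicy: fixed backoff plus a timeout. */
final class BusyRetry {
    private final long backoffMillis;
    private final long timeoutMillis;

    BusyRetry(long backoffMillis, long timeoutMillis) {
        this.backoffMillis = backoffMillis;
        this.timeoutMillis = timeoutMillis;
    }

    /** Repeats the attempt while it reports BUSY; throws once the timeout elapses. */
    AttemptStatus run(Supplier<AttemptStatus> attempt) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            AttemptStatus status = attempt.get();
            if (status != AttemptStatus.BUSY) {
                return status; // OK, CLOSED and ERROR are handled by the caller
            }
            if (System.nanoTime() - deadline >= 0) {
                // Assumption: the real code throws IndexException here.
                throw new IllegalStateException(
                        "segment still BUSY after " + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(backoffMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting on BUSY", e);
            }
        }
    }
}
```

Because the loop absorbs BUSY, the caller only ever sees a final status or a timeout, which matches the rule that callers do not see BUSY.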

Maintenance & Splits

  • SegmentIndexImpl evaluates overlay thresholds after each write and triggers drain/flush follow-up only when backgroundMaintenanceAutoEnabled is true.
  • Splits are scheduled by BackgroundSplitCoordinator on the shared split executor; only one split per segment id can be in flight.
  • PartitionStableSplitCoordinator retries BUSY using IndexRetryPolicy; timeouts throw.
  • Split materialization reads parent stable data with FAIL_FAST iteration, publishes child stable segments, then performs a short exclusive apply phase for route-map swap + overlay reassignment.
  • After a split, KeyToSegmentMap updates the mapping and flushes it to disk; any in-flight write with a stale mapping version retries.
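
The rule that only one split per segment id can be in flight can be sketched with a concurrent set of ids that is claimed before scheduling and released when the split finishes. The class SingleFlightSplits and its trySchedule method are hypothetical; the actual BackgroundSplitCoordinator may track in-flight splits differently.

```java
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

/** Hypothetical sketch of per-segment single-flight split scheduling. */
final class SingleFlightSplits {
    private final Set<Integer> inFlight = ConcurrentHashMap.newKeySet();
    private final Executor splitExecutor;

    SingleFlightSplits(Executor splitExecutor) {
        this.splitExecutor = splitExecutor;
    }

    /**
     * Schedules the split unless one is already running for this segment id.
     * Returns an empty Optional when the request was skipped.
     */
    Optional<CompletableFuture<Void>> trySchedule(int segmentId, Runnable split) {
        if (!inFlight.add(segmentId)) {
            return Optional.empty(); // another split already owns this id
        }
        try {
            return Optional.of(CompletableFuture
                    .runAsync(split, splitExecutor)
                    // Release the id on success and on failure alike.
                    .whenComplete((result, error) -> inFlight.remove(segmentId)));
        } catch (RuntimeException e) {
            inFlight.remove(segmentId); // executor rejected the task
            throw e;
        }
    }
}
```

Set.add is atomic on the concurrent key set, so two writers that race to schedule a split for the same segment cannot both win.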

Index State Machine

[Figure: SegmentIndex lifecycle state machine]

States:

  • OPENING: index bootstrap/consistency checks (and lock acquisition) in progress; operations are rejected.
  • READY: operations allowed.
  • CLOSING: close() is in progress; new API operations are rejected, the directory lock is still held, and shutdown may still wait for split/drain/WAL durability boundaries to settle.
  • ERROR: unrecoverable failure; operations are rejected.
  • CLOSED: shutdown completed, resources were released, and operations are rejected.

Transitions:

  • OPENING -> READY: after initialization and consistency checks complete.
  • READY -> CLOSING: close() starts and begins shutdown coordination.
  • CLOSING -> CLOSED: shutdown completes; file lock released.
  • any -> ERROR: unrecoverable failure (e.g., OOM, disk full, failed split/file swap, or consistency check failure).

Notes:

  • Only one index instance may hold the directory lock at a time.
  • The lock is held through CLOSING and released only when the instance reaches CLOSED (or during terminal ERROR cleanup).
  • flushAndWait() and compactAndWait() remain explicit maintenance boundaries; close() uses the same settlement model before finalizing shutdown.
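
The lifecycle above can be sketched as an enum with an explicit transition table and a compare-and-set guard. The names IndexState and IndexLifecycle are hypothetical. The sketch treats both CLOSED and ERROR as terminal, which is an assumption: the text allows ERROR to be entered from any state.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical lifecycle states with their allowed successors. */
enum IndexState {
    OPENING, READY, CLOSING, CLOSED, ERROR;

    Set<IndexState> next() {
        switch (this) {
            case OPENING: return EnumSet.of(READY, ERROR);
            case READY:   return EnumSet.of(CLOSING, ERROR);
            case CLOSING: return EnumSet.of(CLOSED, ERROR);
            default:      return EnumSet.noneOf(IndexState.class); // assumed terminal
        }
    }
}

/** Hypothetical holder that applies transitions atomically. */
final class IndexLifecycle {
    private final AtomicReference<IndexState> state =
            new AtomicReference<>(IndexState.OPENING);

    IndexState current() {
        return state.get();
    }

    /** Returns false if the move is not allowed or the state changed concurrently. */
    boolean transition(IndexState from, IndexState to) {
        return from.next().contains(to) && state.compareAndSet(from, to);
    }

    /** Guard for API calls: only READY accepts operations. */
    void ensureOperational() {
        IndexState now = state.get();
        if (now != IndexState.READY) {
            throw new IllegalStateException("index is " + now);
        }
    }
}
```

The compare-and-set means two threads that both call close() cannot both move READY to CLOSING; the loser sees false and can wait for the winner instead.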

Failure Handling

  • SegmentResultStatus.ERROR from any segment results in IndexException.
  • Maintenance failures move the segment to ERROR; flushAndWait/compactAndWait propagate the failure as IndexException.
  • Split failures surface through the split future and are rethrown when joined.
  • When entering ERROR, the index stops accepting operations and requires manual intervention (recovery/repair or restore from backups).
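
The way a split failure resurfaces when its future is joined can be sketched with the standard CompletableFuture behavior: join() wraps the original failure in a CompletionException, so the joining side unwraps the cause before rethrowing. The helper SplitJoin and its IndexFailure exception are hypothetical stand-ins; the project's IndexException is not reproduced here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

/** Hypothetical helper that joins a split future and rethrows its failure. */
final class SplitJoin {

    /** Stand-in for the project's IndexException (assumption). */
    static final class IndexFailure extends RuntimeException {
        IndexFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private SplitJoin() {
    }

    /** Blocks until the split completes; a failed split is rethrown with its cause. */
    static <T> T joinOrThrow(CompletableFuture<T> splitFuture) {
        try {
            return splitFuture.join();
        } catch (CompletionException e) {
            // join() wraps the original exception; expose it as the cause.
            Throwable cause = e.getCause() != null ? e.getCause() : e;
            throw new IndexFailure("split failed", cause);
        }
    }
}
```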

Components

  • SegmentIndex (public API): thread-safe entry point.
  • SegmentIndexImpl: retries BUSY, routes operations to segments, and manages maintenance, including the dedicated async API executor.
  • IndexAsyncAdapter: thin facade that forwards async API calls to the wrapped index.
  • StableSegmentGateway: single-attempt mapping + stable-segment selection.
  • IndexRetryPolicy: backoff + timeout for BUSY retries.
  • IndexResult/IndexResultStatus: internal OK/BUSY/CLOSED/ERROR wrapper.
  • KeyToSegmentMap: mapping, snapshot versioning, and persistence of segment ids.
  • SegmentRegistry(Synchronized): caches Segment instances and supplies the maintenance executor.
  • BackgroundSplitCoordinator: post-write split scheduling decisions.
  • PartitionStableSplitCoordinator: split execution.

Iterator Isolation

  • FAIL_FAST: iteration is optimistic; any mutation can invalidate the iterator and terminate the stream early.
  • FULL_ISOLATION: holds exclusive access per segment while its iterator is open; writers, flush/compact, and split on that segment block until the iterator (or stream) is closed.
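
The contrast between the two modes can be sketched with a toy, single-threaded segment. Everything here is hypothetical (ToySegment, Cursor, tryAdd), and it simplifies the behavior in one important way: where a real writer would block behind a FULL_ISOLATION iterator, tryAdd returns false so the effect is observable without extra threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.Semaphore;

/** Toy segment used only to contrast the two isolation modes. */
final class ToySegment {
    enum Isolation { FAIL_FAST, FULL_ISOLATION }

    private final List<Integer> values = new ArrayList<>();
    private final Semaphore exclusive = new Semaphore(1);
    private long modifications;

    /** Returns false where a real writer would block behind an isolated iterator. */
    boolean tryAdd(int value) {
        if (!exclusive.tryAcquire()) {
            return false;
        }
        try {
            values.add(value);
            modifications++;
            return true;
        } finally {
            exclusive.release();
        }
    }

    Cursor open(Isolation isolation) {
        if (isolation == Isolation.FULL_ISOLATION) {
            exclusive.acquireUninterruptibly(); // held until the cursor is closed
        }
        return new Cursor(isolation);
    }

    final class Cursor implements AutoCloseable {
        private final Isolation isolation;
        private final long expected = modifications;
        private int position;
        private boolean closed;

        Cursor(Isolation isolation) {
            this.isolation = isolation;
        }

        /** Any write since open() ends iteration early (only possible under FAIL_FAST). */
        boolean hasNext() {
            return !closed && modifications == expected && position < values.size();
        }

        int next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return values.get(position++);
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                if (isolation == Isolation.FULL_ISOLATION) {
                    exclusive.release();
                }
            }
        }
    }
}
```

Opening the cursor in a try-with-resources block guarantees the release, which is why a FULL_ISOLATION stream must always be closed.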

Implementation Mapping

  • Index implementation: IndexInternalConcurrent (caller-thread execution).
  • Mapping version: KeyToSegmentMap.version (AtomicLong).
  • Maintenance executor: SegmentRegistry.getMaintenanceExecutor() backed by IndexConfiguration.numberOfStableSegmentMaintenanceThreads (default 10).
  • Split isolation: SegmentIteratorIsolation.FULL_ISOLATION.
  • Retry policy: IndexConfiguration.indexBusyBackoffMillis and IndexConfiguration.indexBusyTimeoutMillis.