Refactor backlog

Active

[ ] 59.1 Concurrency: remove lock-order inversion in core ops (Risk: HIGH) - SegmentIndexCore.get/put: avoid holding key-map read lock while calling SegmentRegistry.getSegment or touching segments. - Use key-map snapshot + version re-check on retry/BUSY paths. - Tests: IntegrationSegmentIndexConcurrencyTest + new split/put stress.
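
A minimal sketch of the "snapshot + version re-check" idea from 59.1, assuming the key map can be published as an immutable snapshot. All names here (`VersionedKeyMap`, `SnapshotRead`, `SegmentLookup`) are hypothetical and are not the project's actual classes. The point is that the segment call runs with no key-map lock held, and the version is re-checked afterwards.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical key map that publishes immutable, versioned snapshots. */
final class VersionedKeyMap {
    static final class Snapshot {
        final long version;
        final NavigableMap<String, Integer> maxKeyToSegmentId;

        Snapshot(long version, NavigableMap<String, Integer> mapping) {
            this.version = version;
            this.maxKeyToSegmentId = Collections.unmodifiableNavigableMap(mapping);
        }

        /** Returns the id of the segment whose max key is >= key, or null. */
        Integer findSegmentId(String key) {
            Map.Entry<String, Integer> e = maxKeyToSegmentId.ceilingEntry(key);
            return e == null ? null : e.getValue();
        }
    }

    private volatile Snapshot current = new Snapshot(0, new TreeMap<>());

    /** Lock-free read of the latest published snapshot. */
    Snapshot snapshot() {
        return current;
    }

    boolean isCurrent(Snapshot s) {
        return current.version == s.version;
    }

    /** Writers (for example split apply) publish a new snapshot and bump the version. */
    synchronized void replaceMapping(NavigableMap<String, Integer> newMapping) {
        current = new Snapshot(current.version + 1, new TreeMap<>(newMapping));
    }
}

final class SnapshotRead {
    /** Stand-in for segment access; hypothetical interface. */
    interface SegmentLookup {
        String get(int segmentId, String key);
    }

    static String get(VersionedKeyMap keyMap, SegmentLookup segments,
            String key, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            VersionedKeyMap.Snapshot snap = keyMap.snapshot();
            Integer segmentId = snap.findSegmentId(key);
            // The segment is touched while no key-map lock is held.
            String value = segmentId == null ? null : segments.get(segmentId, key);
            if (keyMap.isCurrent(snap)) {
                return value;
            }
            // The mapping changed while reading (for example a split): retry.
        }
        throw new IllegalStateException("get did not stabilize after " + maxAttempts + " attempts");
    }
}
```

A real implementation would presumably map exhausted attempts to the existing BUSY handling rather than throw.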

Planned

High

[ ] 78 Monitoring/Management platform rollout (Risk: HIGH) - Goal: evolve from in-process counters to multi-JVM monitoring and control without forcing Micrometer/Prometheus dependencies into core. - Delivery model: phase-gated rollout where each phase is releasable and backward compatible. - Constraints: - Core package must not depend on Micrometer, Prometheus, servlet stacks, or UI classes. - Runtime control endpoints must be explicit allowlist operations only (no generic "execute command" style endpoint). - All mutating management operations must be auditable.

[ ] 78.1 Define source/module boundaries and package contracts (Risk: HIGH) - Target logical modules/packages: - org.hestiastore.index.* (core) - org.hestiastore.monitoring.* (metrics model + exporter adapters) - org.hestiastore.management.api.* (shared DTOs/contracts) - org.hestiastore.management.agent.* (node-local REST API in index JVM) - org.hestiastore.console.* (web UI / control plane) - Start in single-module codebase with strict package boundaries to keep later physical split low risk. - Add architecture doc with allowed dependency direction: core <- monitoring <- management.agent <- console and management.api shared by agent/console. - Acceptance: - No core imports from monitoring/agent/console packages. - Checkstyle/ArchUnit (or similar) rule blocks forbidden imports.

[ ] 78.2 Add stable core metrics snapshot API (Risk: HIGH) - Introduce immutable public snapshot types in core for index/segment metrics (e.g. op counters, bloom stats, segment counts, state). - Add SegmentIndex.metricsSnapshot() (or equivalent read-only API). - Keep existing behavior intact while wiring current counters into snapshot. - Make counters thread-safe (LongAdder/AtomicLong) where currently not. - Define compatibility policy: - new fields may be added, - existing field names/semantics cannot silently change. - Acceptance: - Unit/integration tests for snapshot consistency under concurrent load. - Docs page with metric field definitions and semantics.
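
One possible shape for the snapshot API in 78.2, sketched under assumptions: the type names `IndexMetricsSnapshot` and `IndexOpCounters` and the three counter fields are illustrative only, not the project's real metric set. It shows `LongAdder` counters on the hot path and an immutable value object handed to readers.

```java
import java.util.concurrent.atomic.LongAdder;

/** Immutable point-in-time view of operation counters (hypothetical type). */
final class IndexMetricsSnapshot {
    final long getCount;
    final long putCount;
    final long deleteCount;

    IndexMetricsSnapshot(long getCount, long putCount, long deleteCount) {
        this.getCount = getCount;
        this.putCount = putCount;
        this.deleteCount = deleteCount;
    }
}

/** Thread-safe counters that hot paths can increment without locking. */
final class IndexOpCounters {
    private final LongAdder gets = new LongAdder();
    private final LongAdder puts = new LongAdder();
    private final LongAdder deletes = new LongAdder();

    void recordGet() {
        gets.increment();
    }

    void recordPut() {
        puts.increment();
    }

    void recordDelete() {
        deletes.increment();
    }

    /**
     * Copies the current sums into an immutable snapshot. Each sum is read
     * independently, so under concurrent updates the fields are not captured
     * at one single instant.
     */
    IndexMetricsSnapshot metricsSnapshot() {
        return new IndexMetricsSnapshot(gets.sum(), puts.sum(), deletes.sum());
    }
}
```

Because the fields are read one by one, the "snapshot consistency under concurrent load" acceptance test would need to define which cross-field guarantees, if any, are promised.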

[ ] 78.3 Build monitoring bridge layer (Micrometer/Prometheus/JMX) (Risk: HIGH) - Implement monitoring adapters in org.hestiastore.monitoring.*: - Micrometer binder reading from core snapshot API. - Prometheus exposition support (via Micrometer registry or direct bridge). - Optional JMX MBean exporter mapped from the same snapshot model. - Ensure adapters can be created/removed without restarting index (where runtime allows). - Define metric naming/tag conventions (hestiastore_*, stable tag set). - Acceptance: - Prometheus scrape returns expected metrics and labels. - Zero adapter overhead when monitoring package is not used.

[ ] 78.4 Add management API contracts and versioning (Risk: HIGH) - Create org.hestiastore.management.api.* DTOs: - NodeStateResponse, MetricsResponse, ActionRequest/Response, ConfigPatchRequest, ErrorResponse. - Version endpoints from start (/api/v1/...) and define deprecation rules. - Include idempotency and safety semantics for actions: - flush, compact, selected config patch operations. - Acceptance: - OpenAPI (or equivalent) published with examples. - Contract tests verify backward-compatible serialization.

[ ] 78.5 Implement node-local management agent (Risk: HIGH) - Add lightweight REST server integration for index JVM process: - GET /api/v1/state - GET /api/v1/metrics - POST /api/v1/actions/flush - POST /api/v1/actions/compact - PATCH /api/v1/config (allowlist runtime-safe keys only) - Include health and readiness endpoints for deployment integration. - Add per-request audit logging for mutating endpoints. - Acceptance: - End-to-end test: invoke actions and verify effect on index state. - Negative tests for forbidden config keys and invalid state transitions.
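
The allowlist check behind `PATCH /api/v1/config` could look roughly like the sketch below. `ConfigPatchAllowlist` is a hypothetical helper, the patch is modeled as a plain string map, and which keys count as runtime-safe is an assumption left to the caller.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical validator for runtime config patches. */
final class ConfigPatchAllowlist {
    private final Set<String> allowedKeys;

    ConfigPatchAllowlist(Set<String> allowedKeys) {
        this.allowedKeys = Set.copyOf(allowedKeys);
    }

    /** Returns the patch keys that are not allowlisted, sorted for stable messages. */
    List<String> forbiddenKeys(Map<String, String> patch) {
        List<String> rejected = new ArrayList<>();
        for (String key : patch.keySet()) {
            if (!allowedKeys.contains(key)) {
                rejected.add(key);
            }
        }
        Collections.sort(rejected);
        return rejected;
    }

    /** Rejects the whole patch if any key is forbidden, so nothing is partially applied. */
    void requireAllowed(Map<String, String> patch) {
        List<String> forbidden = forbiddenKeys(patch);
        if (!forbidden.isEmpty()) {
            throw new IllegalArgumentException(
                    "Config keys not patchable at runtime: " + forbidden);
        }
    }
}
```

Rejecting the entire request keeps the negative tests for forbidden keys simple: either every key is applied or none is.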

[ ] 78.6 Implement central console web application (Risk: HIGH) - Build org.hestiastore.console.* with capabilities: - register/manage multiple index JVM nodes, - poll agent APIs and display key read/write/latency/segment metrics, - trigger safe operations (flush/compact) with confirmation UX, - show recent audit/event log entries. - Keep UI read-first: write controls separated and permission-gated. - Define minimal dashboard first; defer advanced analytics to later items. - Acceptance: - Multi-node dashboard works for at least 3 registered nodes. - Action execution shows pending/success/failure lifecycle.

[ ] 78.7 Secure transport, authz, and audit trail (Risk: HIGH) - Agent <-> console transport: - enforce TLS (prefer mTLS in production profiles), - token- or cert-based authn, - role-based authz (read, operate, admin). - Add immutable audit records for mutating calls: actor, target node, endpoint, payload digest, result, timestamp. - Add rate limits and retry/backoff policy for control operations. - Acceptance: - Security integration tests for unauthorized/forbidden scenarios. - Audit log verification tests for all mutating endpoints.

[ ] 78.8 Packaging, release strategy, and migration path (Risk: HIGH) - Release artifacts initially from same repo: - hestiastore (core) - hestiastore-monitoring (bridges/exporters) - hestiastore-management-agent - hestiastore-console - Keep aligned versions per release line (for example 0.2.x for all). - Document migration from single-module to multi-module build: move packages with no API break using prior boundary rules from 78.1. - Acceptance: - Build produces separate jars and integration tests across artifacts pass. - Release docs include compatibility matrix and upgrade notes.

[ ] 78.9 Rollout stages with explicit quality gates (Risk: HIGH) - Stage A: core snapshot API only; no external exporters. - Stage B: monitoring bridge with Prometheus scrape + docs. - Stage C: node agent endpoints (read-only first, then mutating). - Stage D: console UI for multi-node visibility, then controlled actions. - Required gates per stage: - load/perf regression budget defined and met, - concurrency tests for stats correctness, - failure-mode tests (node down, timeout, partial responses), - operational docs/runbook updated. - Acceptance: - Each stage releasable independently. - Rollback procedure documented and tested.

Medium

[ ] 54 Dedicated executor for index async ops (Risk: MEDIUM) - Use a dedicated, bounded executor for SegmentIndexImpl.runAsyncTracked (no common pool). - Define rejection policy: map saturation to BUSY/error with clear message. - Ensure close waits for in‑flight async work or cancels safely. - Tests: saturation/backpressure, close ordering, no caller‑thread IO.
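
A rough sketch of item 54 using only `java.util.concurrent`: a fixed pool with a bounded queue and `AbortPolicy`, where rejection is translated into a BUSY status instead of running work on the caller thread. `BoundedIndexExecutor` and `SubmitStatus` are made-up names for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical dedicated executor for index async operations. */
final class BoundedIndexExecutor {
    enum SubmitStatus { ACCEPTED, BUSY }

    private final ThreadPoolExecutor executor;

    BoundedIndexExecutor(int threads, int queueCapacity) {
        // AbortPolicy throws on saturation instead of running on the caller thread.
        this.executor = new ThreadPoolExecutor(threads, threads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
    }

    /** Maps queue saturation (and submission after close) to BUSY. */
    SubmitStatus trySubmit(Runnable task) {
        try {
            executor.execute(task);
            return SubmitStatus.ACCEPTED;
        } catch (RejectedExecutionException e) {
            return SubmitStatus.BUSY;
        }
    }

    /** Stops admissions and waits for in-flight work; false on timeout or interrupt. */
    boolean close(long timeout, TimeUnit unit) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With one thread and a queue of one, the third submission while the first task is still running comes back as BUSY, which is the saturation path the item's tests would exercise. A real implementation would probably distinguish "closed" from "saturated".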

[ ] 55 Replace busy spin loops with retry + jitter (Risk: MEDIUM) - Replace Thread.onSpinWait/busy loops in split iterator open and other retry paths with IndexRetryPolicy + jitter. - Make timeouts explicit and surface IndexException with operation name. - Tests: BUSY retry exits on READY, timeout path, interrupt handling.
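
To illustrate item 55, here is a small retry helper with jitter, an explicit timeout, and interrupt handling. It does not use the project's `IndexRetryPolicy` or `IndexException`, whose signatures are not shown in this backlog; `JitteredRetry` is a hypothetical stand-in and `IllegalStateException` substitutes for `IndexException`.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical replacement for a busy-spin wait loop. */
final class JitteredRetry {
    enum Status { READY, BUSY }

    /**
     * Polls until the probe reports READY. Between attempts it sleeps for the
     * base delay plus a random jitter in [0, baseDelayMillis], so competing
     * callers do not retry in lockstep.
     */
    static void awaitReady(String operation, Supplier<Status> probe,
            long baseDelayMillis, long timeoutMillis) {
        if (baseDelayMillis < 0 || timeoutMillis < 0) {
            throw new IllegalArgumentException("delays must be non-negative");
        }
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (probe.get() != Status.READY) {
            if (System.nanoTime() - deadline >= 0) {
                // Stand-in for IndexException carrying the operation name.
                throw new IllegalStateException(
                        "Operation '" + operation + "' timed out after " + timeoutMillis + " ms");
            }
            long jitter = ThreadLocalRandom.current().nextLong(baseDelayMillis + 1);
            try {
                Thread.sleep(baseDelayMillis + jitter);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                throw new IllegalStateException(
                        "Operation '" + operation + "' was interrupted", e);
            }
        }
    }
}
```

The three test cases named in the item (exit on READY, timeout, interrupt) map directly onto the three exits of this loop.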

[ ] 56 Key‑to‑segment map read contention reduction (Risk: MEDIUM) - Evaluate snapshot‑based reads or StampedLock for high‑read workloads. - Keep version validation semantics intact for split/extend paths. - Tests: concurrent get/put under splits, no missing mappings, no deadlocks.
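
For the `StampedLock` option in item 56, one possible read path is sketched below. `StampedKeyMap` is a hypothetical class. The sketch assumes writers replace an immutable map rather than mutating it in place, because an optimistic read must never traverse a structure that a writer may be changing.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.locks.StampedLock;

/** Hypothetical key-to-segment map with optimistic reads. */
final class StampedKeyMap {
    static final class Lookup {
        final Integer segmentId; // null when no segment covers the key
        final long version;      // map version the lookup was resolved against

        Lookup(Integer segmentId, long version) {
            this.segmentId = segmentId;
            this.version = version;
        }
    }

    private final StampedLock lock = new StampedLock();
    // Both fields are guarded by the lock; the map is never mutated after publication.
    private NavigableMap<String, Integer> maxKeyToSegmentId = new TreeMap<>();
    private long version;

    Lookup find(String key) {
        long stamp = lock.tryOptimisticRead();
        NavigableMap<String, Integer> view = maxKeyToSegmentId;
        long seenVersion = version;
        if (!lock.validate(stamp)) {
            // A writer intervened: fall back to a regular read lock.
            stamp = lock.readLock();
            try {
                view = maxKeyToSegmentId;
                seenVersion = version;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        // The map is only dereferenced after validation or under the read lock.
        Map.Entry<String, Integer> e = view.ceilingEntry(key);
        return new Lookup(e == null ? null : e.getValue(), seenVersion);
    }

    void put(String maxKey, int segmentId) {
        long stamp = lock.writeLock();
        try {
            NavigableMap<String, Integer> copy = new TreeMap<>(maxKeyToSegmentId);
            copy.put(maxKey, segmentId);
            maxKeyToSegmentId = copy;
            version++;
        } finally {
            lock.unlockWrite(stamp);
        }
    }
}
```

The returned version lets callers keep the existing version validation on split/extend paths. If only the map reference had to be read, a plain `volatile` snapshot would be simpler; the lock earns its keep here by reading map and version as a consistent pair.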

[ ] 57 Streaming iterators without full materialization (Risk: MEDIUM) - Replace list materialization in getStream/FULL_ISOLATION with streaming merge iterators over write/delta caches and segment files. - Ensure iterator close releases resources and does not leak locks. - Tests: large data set memory profile, iterator isolation correctness.

[ ] 5 Stop materializing merged cache lists on read (Risk: MEDIUM) - Problem: SegmentReadPath.openIterator calls getAsSortedList, building full merged lists for each iterator. - Fix: provide streaming merge iterator over delta/write caches without full list materialization. - Options: - Option A (recommended): switch UniqueCache to TreeMap / ConcurrentSkipListMap, add a sorted iterator API, and merge cache iterators (write/frozen/delta) with MergedEntryIterator in the FULL_ISOLATION path. - Option B: keep HashMap / ConcurrentHashMap for get/put and maintain a sorted key index (TreeSet / ConcurrentSkipListSet) for iteration; expose a sorted iterator over keys + map values and merge like Option A.

[ ] 6 Stream compaction without full cache snapshot (Risk: MEDIUM) - Problem: compaction snapshots the full cache list in memory. - Fix: stream from iterators or chunk snapshot to bounded buffers.

[ ] 7 Stream split without full cache snapshot (Risk: MEDIUM) - Problem: split uses FULL_ISOLATION iterator backed by full list snapshot. - Fix: use streaming iterator or chunked splitting to cap memory.

[ ] 8 Avoid full materialization in IndexInternalConcurrent.getStream (Risk: MEDIUM) - Problem: method loads all entries into a list before returning a stream. - Fix: return a streaming spliterator tied to iterator close.

[ ] 9 Add eviction for heavy segment resources (Risk: MEDIUM) - Problem: SegmentResourcesImpl caches bloom/scarce forever. - Fix: tie resource lifetime to segment eviction or add per-resource LRU; ensure invalidate/close releases memory.
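
The "per-resource LRU" variant of item 9 might be approximated with an access-ordered `LinkedHashMap` that closes whatever it evicts. `ResourceLruCache` is a hypothetical name, and modeling resources as `AutoCloseable` is an assumption about how bloom/scarce resources release memory.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical size-bounded LRU that closes resources on eviction. */
final class ResourceLruCache<K, V extends AutoCloseable> {
    private final LinkedHashMap<K, V> entries;

    ResourceLruCache(int capacity) {
        // accessOrder=true makes iteration order least-recently-used first.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > capacity) {
                    closeQuietly(eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    synchronized V get(K key) {
        return entries.get(key);
    }

    synchronized void put(K key, V value) {
        V previous = entries.put(key, value);
        if (previous != null && previous != value) {
            closeQuietly(previous);
        }
    }

    /** Explicit invalidation also releases the resource. */
    synchronized void invalidate(K key) {
        V removed = entries.remove(key);
        if (removed != null) {
            closeQuietly(removed);
        }
    }

    synchronized int size() {
        return entries.size();
    }

    private static void closeQuietly(AutoCloseable resource) {
        try {
            resource.close();
        } catch (Exception e) {
            // A real implementation would log this; the sketch ignores it.
        }
    }
}
```

This sketch closes a resource even if a reader still holds it, so a real version would need reference counting or to tie eviction to the segment lifecycle, as the item's first option suggests.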

Low

[ ] 10 Allow cache shrink after peaks (Risk: LOW) - Problem: UniqueCache.clear() keeps underlying HashMap capacity. - Fix: rebuild map on clear when size exceeds a threshold; add tests.
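
The fix in item 10 can be sketched as follows. `ShrinkingCache` is a simplified, single-threaded stand-in for `UniqueCache`, and the threshold value is whatever the caller chooses. `HashMap.clear()` keeps the internal table at its grown capacity, so the sketch swaps in a fresh map when the cache was large.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache that gives memory back after a peak. */
final class ShrinkingCache<K, V> {
    private final int shrinkThreshold;
    private Map<K, V> map = new HashMap<>();
    private int rebuildCount;

    ShrinkingCache(int shrinkThreshold) {
        this.shrinkThreshold = shrinkThreshold;
    }

    void put(K key, V value) {
        map.put(key, value);
    }

    V get(K key) {
        return map.get(key);
    }

    int size() {
        return map.size();
    }

    /** Number of times clear() replaced the map; exposed only for tests. */
    int rebuildCount() {
        return rebuildCount;
    }

    void clear() {
        if (map.size() > shrinkThreshold) {
            // Drop the oversized table so it can be garbage collected.
            map = new HashMap<>();
            rebuildCount++;
        } else {
            // Small maps keep their table to avoid reallocation churn.
            map.clear();
        }
    }
}
```

Checking the size at clear time follows the item's wording; tracking the peak size since the last clear would also catch caches that grew and were then drained by removals.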

Other refactors (non-OOM)

[ ] 13 Implement a real registry lock (Risk: MEDIUM) - Add an explicit lock around registry mutations + file ops. - Replace/rename executeWithRegistryLock to actually serialize callers. - Add tests for split/compact interleaving and segment visibility.

[ ] 14 Replace common-pool async with dedicated executor + backpressure (Risk: MEDIUM) - Add/configure a dedicated executor for async API calls. - Track in-flight tasks and wait on close; add queue/backpressure limits. - Add tests for saturation, cancellation, and close ordering.

[ ] 15 Define IndexAsyncAdapter.close() behavior (Risk: MEDIUM) - Decide on wait vs non-blocking close and document it. - Add tests that match the chosen contract.

[ ] 16 Replace busy-spin loops with retry+backoff+timeout (Risk: MEDIUM) - Use IndexRetryPolicy in SegmentsIterator and split iterator open. - Add interrupt handling and timeout paths with clear error messaging. - Add tests for BUSY loops and timeout behavior.

[ ] 17 Stop returning null on CLOSED in SegmentIndexImpl.get (Risk: MEDIUM) - Decide API surface (exception vs status/Optional). - Update callers and docs to distinguish "missing" vs "closed". - Add tests for CLOSED/ERROR paths.

[ ] 19 Propagate MDC context to async ops and stream consumption (Risk: LOW) - Capture MDC context on submit and reapply in async tasks. - Wrap stream/iterator consumption with MDC scope; clear on close. - Add tests asserting index.name appears in async logs.

[ ] 41 Unify async execution for segment index (Risk: MEDIUM) - Route SegmentIndexImpl.runAsyncTracked and IndexAsyncAdapter.runAsyncTracked through a shared, dedicated executor (no common pool). - Decide whether to keep both async layers or make one delegate to the other. - Align async close behavior and document rejection/backpressure outcomes.

[ ] 42 Revisit SegmentAsyncExecutor rejection policy (Risk: MEDIUM) - Ensure maintenance IO never runs on caller threads. - Choose AbortPolicy + BUSY/error mapping or custom handler. - Update docs and metrics if behavior changes.

[ ] 43 Replace registry close polling with completion signal (Risk: MEDIUM) - Add a close completion handle or signal in Segment. - Update SegmentRegistry.closeSegmentIfNeeded to wait on completion rather than polling getState(). - Ensure close-from-maintenance thread does not deadlock.

[ ] 44 Normalize split close/eviction flow (Risk: MEDIUM) - Centralize segment close/eviction in SegmentRegistry. - Remove direct segment.close() calls from split coordinator. - Ensure split outcome updates mapping, eviction, and close are ordered.

[ ] 45 Replace spin-wait in SegmentConcurrencyGate.awaitNoInFlight (Risk: LOW) - Use wait/notify or ManagedBlocker with timeout. - Preserve FREEZE semantics and early exit on state change. - Add tests for drain behavior under load.

[ ] 46 Align iterator isolation naming and semantics (Risk: LOW) - Choose between FAIL_FAST/FULL_ISOLATION and the legacy INTERRUPT_FAST/STOP_FAST terminology. - Update docs, comments, and any mapping code consistently.

[ ] 47 Consolidate BUSY/CLOSED retry loops (Risk: LOW) - Extract shared retry helper for segmentindex operations. - Replace ad-hoc loops in SegmentRegistry, SegmentSplitCoordinator, and SegmentIndexImpl. - Keep backoff/timeout semantics and error messages consistent.

Testing/Quality

[ ] 48 Test executor saturation and backpressure paths (Risk: MEDIUM) - Add tests for SegmentAsyncExecutor queue saturation and rejection handling. - Add tests for SplitAsyncExecutor rejection and in-flight cleanup. - Verify maintenance IO never runs on caller threads.

[ ] 49 Test close path interactions (Risk: MEDIUM) - Close while segment is MAINTENANCE_RUNNING and ensure backoff/timeout works. - Close during async operations should fail fast with clear error. - Assert no deadlock when waiting for segment READY/CLOSED.

[ ] 50 Test split failure cleanup (Risk: MEDIUM) - Force exceptions in split steps and assert splitsInFlight clears. - Validate directory swap and key-to-segment map remain consistent. - Ensure resources/locks are released on failure.

[ ] 51 Test maintenance failure transitions (Risk: MEDIUM) - Inject failures in maintenance IO and publish phases. - Assert segment moves to ERROR and callers see ERROR status. - Verify rejection handling does not leave the segment in FREEZE.

Ready

  • (move items here when they are scoped and ready to execute)

Deferred (segment scope, do not touch now)

Maintenance tasks

[ ] M37 Audit segment package for unused or test-only code (Risk: LOW) - Limit class, method, and variable visibility. - Identify unused classes/methods/fields. - Remove code only referenced by tests or move test helpers into test scope. - Ensure public API docs and tests remain consistent after cleanup.

[ ] M38 Review segment package for test and Javadoc coverage (Risk: LOW) - Ensure each class has a JUnit test or document why coverage is excluded. - Ensure each public class/method has Javadoc; add missing docs.

[ ] M39 Audit segmentindex package for unused or test-only code (Risk: LOW) - Limit class, method, and variable visibility. - Identify unused classes/methods/fields. - Remove code only referenced by tests or move test helpers into test scope. - Ensure public API docs and tests remain consistent after cleanup.

[ ] M40 Review segmentindex package for test and Javadoc coverage (Risk: LOW) - Ensure each class has a JUnit test or document why coverage is excluded. - Ensure each public class/method has Javadoc; add missing docs.

[ ] M41 Audit segmentregistry package for unused or test-only code (Risk: LOW) - Limit class, method, and variable visibility. - Identify unused classes/methods/fields. - Remove code only referenced by tests or move test helpers into test scope. - Ensure public API docs and tests remain consistent after cleanup.

[ ] M42 Review segmentregistry package for test and Javadoc coverage (Risk: LOW) - Ensure each class has a JUnit test or document why coverage is excluded. - Ensure each public class/method has Javadoc; add missing docs. - See docs/development/segmentregistry-audit.md for audit notes.

Done (Archive)

  • (keep completed items here; do not delete)

[x] 61.1 Wire SegmentHandler into key-to-segment map usage (Risk: HIGH) - Replace direct segment references in key-to-segment map paths with SegmentHandler usage. - Ensure handlers are used consistently for segment access in index flows.

[x] 61.2 Refactor split algorithm around handler locks (Risk: HIGH) - When a segment is eligible for split: acquire handler lock, re-check eligibility under lock, then either unlock or proceed with split. - Split apply ordering: update map on disk first, then in-memory map, then close old segment, delete files, and finally unlock. - Ensure failures unlock the handler and clean up temporary segments. - Update docs/architecture/registry/registry.md to reflect handler-based locking.

[x] 61.3 Simplify SegmentHandler lock API (Risk: MEDIUM) - Keep internal handler state as READY/LOCKED. - lock() returns SegmentHandlerLockStatus with OK or BUSY. - Replace token-based lock/unlock usage across registry + split flows. - Update handler-related tests to match the new API.

[x] 60 Move registry implementation to segmentregistry package (Risk: MEDIUM) - Move SegmentRegistryImpl, SegmentRegystryState, SegmentRegistryCache, SegmentRegistryState, and SegmentRegistryResult to org.hestiastore.index.segmentregistry. - Update imports/usages in segmentindex and tests. - Keep public API surface the same; verify no package-private access leaks.

[x] M41 Audit segmentregistry package for unused or test-only code (Risk: LOW) - Limit class, method, and variable visibility. - Identify unused classes/methods/fields. - Remove code only referenced by tests or move test helpers into test scope. - Ensure public API docs and tests remain consistent after cleanup.

[x] M42 Review segmentregistry package for test and Javadoc coverage (Risk: LOW) - Ensure each class has a JUnit test or document why coverage is excluded. - Ensure each public class/method has Javadoc; add missing docs.

[x] 59 Introduce SegmentHandler lock gate in segmentindex (Risk: HIGH) - Add SegmentHandler with getSegment() returning SegmentHandlerResult: OK (segment), LOCKED, and handler states READY/LOCKED. - lock() returns a privileged handle/token that allows access to the underlying segment while handler state is LOCKED. - getSegment() must return LOCKED while locked for all non-privileged callers (no segment exposure during lock). - Wire split flow to lock via handler before opening FULL_ISOLATION iterator, then unlock after apply/cleanup. - Add tests: LOCKED is returned during lock; lock holder can operate; unlock restores OK.

[x] 59.2 Concurrency: reduce redundant key-map read locks (Risk: MEDIUM) - Make KeyToSegmentMapSynchronizedAdapter.snapshot() lock-free (volatile snapshot + AtomicLong version). - Keep read locks only for map-only operations; do not wrap segment calls. - Tests: snapshot consistency + existing KeyToSegmentMapTest.

[x] 59.3 Concurrency: limit registry FREEZE to split apply (Risk: MEDIUM) - Remove FreezeGuard usage from SegmentRegistryImpl.getSegment create/ eviction path; keep cache lock for LRU safety. - Reserve registry FREEZE for split apply only. - Tests: split + eviction concurrency (SegmentRegistryCacheTest, SegmentSplitCoordinatorConcurrencyTest, integration stress).

[x] 52 Remove automatic compaction from segmentindex (Risk: MEDIUM) - Drop pre-split compaction in SegmentSplitCoordinator and remove SegmentSplitterPolicy.shouldBeCompactedBeforeSplitting + related retry logic. - Simplify split planning to use estimated key counts directly (remove compaction/tombstone hints from SegmentSplitterPolicy or replace with a minimal estimate helper). - Keep SegmentIndex.compact / compactAndWait as the only segmentindex-triggered compaction entry point; update Javadocs to reflect compaction being handled inside the segment package otherwise. - Update tests that construct SegmentSplitterPolicy and add coverage that split does not call Segment.compact while user-invoked compaction still does.

[x] 1 Rename maxNumberOfKeysInSegmentWriteCacheDuringFlush to maxNumberOfKeysInSegmentWriteCacheDuringMaintenance everywhere, including all configurations, setters, getters, and all other usages.

[x] 2 When the write cache reaches maxNumberOfKeysInSegmentWriteCacheDuringMaintenance keys, respond to put with BUSY.

[x] 3 UniqueCache should not use a reentrant read/write lock; thread safety is a property of the concurrent hash map.

[x] 4 Enforce maxNumberOfSegmentsInCache in SegmentRegistry (Risk: MEDIUM) - Problem: segments are cached unbounded; memory grows as segments grow. - Fix: implement LRU or size-bounded cache; evict + close segments and invalidate resources on eviction.

[x] 18 Provide index-level FULL_ISOLATION streaming (Risk: MEDIUM) - Add overload or option to request FULL_ISOLATION on index iterators. - Implement iterator that holds exclusivity across segments safely. - Add tests for long-running scans during maintenance.

[x] 23 Refactor Segment.close() to async fire-and-forget with READY-only entry (Risk: MEDIUM) - Change Segment to drop CloseableResource and return SegmentResult<Void> from close(). - Close starts only in READY: transition to FREEZE, drain, optionally flush write cache, then run close work on maintenance thread. - Completion marks CLOSED, releases locks/resources, and stops admissions. - Move close-state tracking into segment index (avoid Segment.wasClosed()). - Update state machine/gate/docs/tests to match the new close lifecycle.

[x] 24 Add integration test: in-memory segment lock prevents double-open (Risk: LOW) - Create an integration test that opens a segment in a directory and asserts a second open in the same directory fails (lock enforcement).

[x] 25 Simplify Segment.flush()/compact() to return status only (Risk: MEDIUM) - Remove CompletionStage return values from flush() and compact(). - Operation completion is observable when segment state returns to READY. - Update callers, docs, and tests that wait on completion stages.

[x] 25 Create directory API and layout helpers (Risk: HIGH) - Add Directory.openSubDirectory(String) + AsyncDirectory.openSubDirectory(String) and lifecycle helpers Directory.mkdir(String) / Directory.rmdir(String). - Implement in FsDirectory, AsyncDirectoryAdapter, and in-memory MemDirectory equivalents; define semantics for non-empty rmdir. - Add SegmentDirectoryLayout (or similar) that builds names for: index, scarce, bloom, delta, properties, and lock files. - Add tests for directory creation and layout mapping.

[x] 26 Introduce segment-rooted SegmentFiles (Risk: HIGH) - Add a SegmentFiles constructor that accepts a segment root AsyncDirectory (instead of a flat base directory + id). - Keep legacy flat layout working (auto-detect existing files, or flag in SegmentBuilder). - Update SegmentBuilder to create/use the segment root directory. - Add tests that both layouts open the same data correctly.

[x] 27 Add per-segment .lock file (Risk: MEDIUM) - Add segment.lock (or .lock) inside the segment directory. - Acquire lock on segment open; release on close. Fail fast on lock held. - Add stale-lock recovery policy (manual delete or metadata timestamp). - Add tests for lock contention and cleanup.

[x] 28 Shared properties file structure (Risk: MEDIUM) - Introduce a common property schema used by segment + segmentindex packages (e.g. IndexPropertiesSchema). - Store schema version and required keys; add migration helpers. - Update SegmentPropertiesManager and IndexConfiguratonStorage to use the shared schema.

[x] 29 Compact flow for directory layout (publish protocol) (Risk: HIGH) - IO phase (MAINTENANCE_RUNNING): - Create a new directory, e.g. segment-00001.next/ or versioned segment-00001/v2/. - Write new index/scarce/bloom/cache files there. - Write properties with state PREPARED + metadata. - Publish phase (short FREEZE): - Mark new directory as ACTIVE in properties (or update a pointer file segment-00001.active). - Reload SegmentFiles/SegmentResources to the new root. - Bump version and return to READY. - Cleanup: - Delete old directory only after publish and resource reload. - Add startup recovery for PREPARED without ACTIVE. - Align with items 11/12 (atomic swaps + map updates).

[x] 30 Split + replace updates (Risk: HIGH) - Update split/rename logic to use directory swaps or pointer updates. - Ensure registry + segmentindex metadata remain consistent. - Add tests for crash recovery and partial swaps.

[x] 31 Segment layout uses versioned file names in a single directory (Risk: HIGH) - Name index/scarce/bloom/delta as vNN-* (for example v01-index.sst, v01-scarce.sst, v01-bloom-filter.bin, v01-delta-0000.cache). - Store the active version and counters in manifest.txt (no .active pointer). - Use zero-padded 2-digit versions and 4-digit delta ids.

[x] 32 Builder/files treat the provided directory as the segment home (Risk: HIGH) - Require Segment.builder(AsyncDirectory) for construction. - Lock + properties live inside the segment directory. - Resolve active version from properties or detected index files.

[x] 33 Compaction/flush publish is memory-only (Risk: HIGH) - IO phase writes versioned files and property updates. - Publish swaps in-memory version/resources and bumps iterator version. - Cleanup old version files asynchronously.

[x] 34 Registry/tests align with single-directory versioning (Risk: MEDIUM) - Registry passes segment directories; no active-directory switching. - Update tests to accept versioned names and per-segment directories.

[x] 35 Remove unused close monitor in SegmentConcurrencyGate (Risk: LOW) - Remove closeMonitor and signalCloseMonitor since nothing waits on it. - Keep drain behavior in awaitNoInFlight() unchanged.

[x] 36 Consolidate in-flight read/write counters in SegmentConcurrencyGate (Risk: LOW) - Replace inFlightReads/inFlightWrites with a single counter. - Keep admission rules and drain behavior unchanged. - Update any stats or tests that rely on read/write split (if introduced).

[x] 11 Remove segmentState from segment properties schema (Risk: MEDIUM) - Remove SegmentKeys.SEGMENT_STATE from IndexPropertiesSchema. - Update SegmentPropertiesManager to drop getState/setState usage. - Decide migration behavior for existing properties files.

[x] 12 Add getMaxNumberOfDeltaCacheFiles() to Segment (Risk: LOW) - Implement in SegmentImpl. - Update any callers/tests that need the accessor.

[x] 13 Add maxNumberOfDeltaCacheFiles to IndexConfiguration + builder (Risk: MEDIUM) - Add config property, validation, defaults, and persistence. - Plumb through SegmentBuilder/SegmentConf as needed.

[x] 14 Wire delta cache file cap into SegmentMaintenancePolicyThreshold (Risk: MEDIUM) - Add the max file count to policy constructor/state. - Pass the value from configuration.

[x] 15 Enforce delta cache file cap in policy (Risk: MEDIUM) - In SegmentMaintenancePolicyThreshold (~line 44), trigger maintenance when delta cache file count exceeds the cap.

[x] 16 Enforce segment lock test on open (Risk: MEDIUM) - Add a test that opening a segment with an existing .lock fails. - Cover both in-memory and filesystem-backed directories.

[x] 17 Document locked-directory behavior in SegmentBuilder (Risk: LOW) - Clarify how builder reacts when the segment directory is already locked.

[x] 18 Acquire segment lock before prepareBuildContext() (Risk: MEDIUM)

[x] 19 Add SegmentRegistryResult + status + adapters (Risk: MEDIUM) - Define result/status types and adapters to/from SegmentResult. - Unit tests only; no wiring.

[x] 20 Add registry state enum + gate (Risk: MEDIUM) - Define SegmentRegistryState and a small gate/state holder. - Unit tests only; no integration.

[x] 21 Introduce SegmentRegistry interface + SegmentRegistryImpl (Risk: MEDIUM) - Keep interface minimal and keep SegmentResult returns for now. - Rename existing class to impl and update call sites in same step.

[x] 22 Add SegmentRegistrySyncAdapter with BUSY retry (Risk: MEDIUM) - Wrap SegmentRegistry and retry BUSY (use IndexRetryPolicy).

[x] 23 Wire state gate into impl (Risk: HIGH) - BUSY only from registry state; FREEZE only around map changes. - Keep SegmentResult API to avoid broad changes.

[x] 24 Switch registry API to SegmentRegistryResult (Risk: HIGH) - Introduce SegmentRegistryLegacyAdapter to keep old callers working. - Migrate call sites/tests, then remove legacy adapter.

[x] 53.1 Split “apply” DTO (Risk: LOW) - Introduce a small DTO for split apply (oldId, lowerId, upperId, min/max keys, status). - Unit tests for DTO invariants.

[x] 53.2 Split worker extraction (Risk: MEDIUM) - Refactor split execution to: open FULL_ISOLATION iterator, run split on maintenance executor, return DTO without touching registry or map. - Ensure iterator is closed in all paths. - Unit tests for result wiring.

[x] 53.3 Registry apply entry point (Risk: MEDIUM) - Add registry apply method that (a) FREEZE, (b) update cache (remove old, add new ids), (c) exit FREEZE. - Keep key‑map lock separate. - Unit tests for cache mutation under FREEZE.

[x] 53.4 Key‑map persistence (Risk: MEDIUM) - Update key‑to‑segment map using its own lock/adapter. - Persist map file after in‑memory registry apply. - Tests that map persistence order is enforced.

[x] 53.5 Old segment deletion (Risk: MEDIUM) - Delete old segment directory only after map persistence and after iterator/segment locks are released. - Tests that deletion never happens before map persistence.

[x] 53.6 Lock order contract (Risk: LOW) - Enforce lock order (segment → registry → map; release map → registry → segment) and document in code. - Add a small test or assertion helper to catch order violations.

[x] 53.7 Split concurrency scenarios (Risk: HIGH) - Tests: - split does not run under registry FREEZE (short window) - split returns BUSY on lock conflict and retries safely - concurrent get/put during split never sees missing segment mapping

[x] 58.1 Split: keep split IO outside registry freeze (Risk: HIGH) - SegmentSplitCoordinator.split(...): ensure all IO (iterator open, writes) happens before any registry FREEZE. - SegmentSplitStepOpenIterator: keep FULL_ISOLATION acquisition once per split. - SegmentSplitCoordinator.hasLiveEntries(...): now uses FAIL_FAST to avoid a second FULL_ISOLATION lock. - Tests may fail if ordering assumptions change; fix after step 58.4.

[x] 58.2 Split: invert lock order for apply phase (Risk: HIGH) - SegmentSplitCoordinator.applySplitPlan(...): remove outer keyToSegmentMap.withWriteLock(...). - SegmentRegistryImpl.applySplitPlan(...): acquire registry freeze first, then call onApplied which acquires key-map write lock. - Update lock-order enforcement flags to match registry -> key-map.

[x] 58.3 Split: propagate lock-order flags into key-map adapter (Risk: MEDIUM) - KeyToSegmentMapSynchronizedAdapter: set/clear keyMapLockHeld around write-lock acquisition when enforcement is enabled. - Ensure registry checks validate registryLockHeld before key-map lock.

[x] 58.4 Split: finalize apply/cleanup ordering (Risk: MEDIUM) - Ensure apply evicts old segment instance and closes it via SegmentRegistryImpl.closeSegmentInstance(...). - Keep key-map flush outside registry freeze: keyToSegmentMap.optionalyFlush() only after apply OK. - Delete old segment files only after apply succeeds and locks released.

[x] 58.5 Split: test alignment (Risk: MEDIUM) - Add/update tests to assert no directory swap in split flow. - Add tests for enforced lock order (registry -> key-map). - Add tests for split failure cleanup of new segments.

[x] 63 SegmentIdAllocator in segmentregistry (Risk: MEDIUM) - Add SegmentIdAllocator interface and directory-backed implementation. - Scan AsyncDirectory.getFileNamesAsync() for segment directories named segment-00001 (prefix segment- + 5 digits) and initialize next id to max+1 (or 1 when none found). - Allocate ids with thread-safe counter.
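
The scan described in item 63 can be sketched as below. `DirectoryScanIdAllocator` is a hypothetical class, not the project's `SegmentIdAllocator`. Because the signature of `AsyncDirectory.getFileNamesAsync()` is not shown in this backlog, the sketch takes the already-resolved names as a plain collection.

```java
import java.util.Collection;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical directory-backed segment id allocator. */
final class DirectoryScanIdAllocator {
    // Matches exactly "segment-" followed by 5 digits, for example segment-00001.
    private static final Pattern SEGMENT_DIR = Pattern.compile("segment-(\\d{5})");

    private final AtomicInteger nextId;

    DirectoryScanIdAllocator(Collection<String> directoryEntryNames) {
        int max = 0;
        for (String name : directoryEntryNames) {
            Matcher m = SEGMENT_DIR.matcher(name);
            if (m.matches()) {
                max = Math.max(max, Integer.parseInt(m.group(1)));
            }
        }
        // max + 1, which is 1 when no segment directory was found.
        this.nextId = new AtomicInteger(max + 1);
    }

    /** Thread-safe allocation of the next unused id. */
    int nextId() {
        return nextId.getAndIncrement();
    }

    static String directoryName(int segmentId) {
        return String.format("segment-%05d", segmentId);
    }
}
```

Names that do not match the pattern exactly (other files, shorter or longer digit runs) are ignored, which matches the "prefix segment- + 5 digits" rule stated in the item.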

[x] 64 Include directories in Directory.getFileNames() (Risk: LOW) - Ensure Directory.getFileNames() returns subdirectory names as well. - Update MemDirectory to include subdirectory names in its stream. - Verify no tests rely on file-only behavior.

[x] 65 Remove id allocation from key-to-segment map (Risk: MEDIUM) - Remove nextSegmentId and findNewSegmentId() from KeyToSegmentMap and its synchronized adapter. - Remove updates to nextSegmentId in tryExtendMaxKey/updateMaxKey.

[x] 66 Wire allocator into registry + index (Risk: MEDIUM) - Update SegmentRegistryImpl to use SegmentIdAllocator instead of supplier. - Update SegmentIndexImpl wiring and split coordinator to use registry allocation only. - Update tests to stub allocator or use directory-backed allocator.

[x] 67 Tests + docs for allocator move (Risk: LOW) - Add allocator tests (empty dir, max id, thread-safety). - Update docs/architecture/registry/registry.md to reflect registry allocator.

[x] 62 Add SegmentRegistryBuilder modeled after Segment.builder(...) (Risk: MEDIUM) - Add SegmentRegistryBuilder in segmentregistry with required inputs (directory, type descriptors, config, maintenance executor). - Provide optional setters for SegmentIdAllocator and SegmentFactory. - Add static factory SegmentRegistry.builder(...) (or on impl) to return builder. - Move default wiring (factory + allocator creation) into builder. - Keep SegmentRegistryImpl constructor with full DI for tests. - Update SegmentIndexImpl (and other callers) to use the builder. - Add unit tests for missing required fields and default wiring.

[x] 68 Align split apply with registry FREEZE + lock-order enforcement (Risk: MEDIUM) - Expose registry FREEZE in SegmentRegistryAccess (or equivalent) so split apply can run under FREEZE while holding handler + key-map locks. - While FREEZE is active, set hestiastore.registryLockHeld=true so key-map lock order enforcement can be enabled safely. - Wrap key-map apply + cache eviction inside the FREEZE window.

[x] 69 Separate cache eviction from file deletion in split apply (Risk: MEDIUM) - Add registry operation to evict a specific segment from cache while the handler lock is held (no file deletion). - After apply: evict old segment under handler+FREEZE, release iterator, unlock handler, then delete old segment files via registry helper. - Keep deleteSegment behavior for general callers unchanged.

[x] 70 Apply-failure should mark registry ERROR (Risk: LOW) - When split apply fails mid-update, set registry gate to ERROR and surface the failure (avoid silent BUSY loops). - Add tests for apply-failure transitions.

[x] 71 SegmentRegistry: expose NOT_FOUND for missing segments (Risk: LOW) - Add NOT_FOUND to SegmentRegistryResultStatus + factory method. - Return NOT_FOUND when getSegment targets a missing directory. - Keep createSegment creating new segments even when others exist. - Tests: missing-segment lookup, status plumbing.

[x] 72 SegmentRegistryBuilder: configure only via with* methods (Risk: LOW) - Remove constructor parameters from SegmentRegistryBuilder. - Ensure all required inputs are set via with... methods. - Update call sites and tests to use the builder setters.
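
The builder shape described in task 72 (no constructor parameters, everything set through `with...` methods, required inputs checked at build time) might look roughly like the sketch below. All names (`RegistryBuilderSketch`, `withDirectory`, `withMaxSegmentsInCache`, `Built`) are hypothetical stand-ins and do not reflect the real `SegmentRegistryBuilder` signature.

```java
import java.util.Objects;

/** Hypothetical builder sketch; not the real SegmentRegistryBuilder. */
final class RegistryBuilderSketch {

    /** Stand-in for the object the builder produces. */
    static final class Built {
        final String directory;
        final int maxSegmentsInCache;

        Built(final String directory, final int maxSegmentsInCache) {
            this.directory = directory;
            this.maxSegmentsInCache = maxSegmentsInCache;
        }
    }

    private String directory;            // required
    private int maxSegmentsInCache = 16; // optional; default is an assumption

    private RegistryBuilderSketch() {
        // No constructor parameters: configuration goes through with* only.
    }

    static RegistryBuilderSketch builder() {
        return new RegistryBuilderSketch();
    }

    RegistryBuilderSketch withDirectory(final String value) {
        this.directory = Objects.requireNonNull(value, "directory");
        return this;
    }

    RegistryBuilderSketch withMaxSegmentsInCache(final int value) {
        this.maxSegmentsInCache = value;
        return this;
    }

    /** Fails fast when a required input was never supplied. */
    Built build() {
        if (directory == null) {
            throw new IllegalStateException(
                    "directory must be set via withDirectory(...)");
        }
        return new Built(directory, maxSegmentsInCache);
    }
}
```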

[x] 73 SegmentRegistry handler-backed cache (Risk: MEDIUM) - Make SegmentRegistryCache store SegmentHandler per SegmentId (segment + lock state as one entry). - Keep SegmentRegistry.getSegment returning SegmentRegistryResult to signal registry state; map LOCKED to BUSY. - Add internal accessors for handler-only flows (split/evict) without exposing handler in the public registry API. - Update eviction logic to skip LOCKED handlers and keep cache/handler in sync. - Tests: locked entry not evicted, handler/segment consistency, BUSY returned when handler locked.

[x] 74 RegistryAccess: lock via SegmentHandler (Risk: MEDIUM) - Add internal accessor that returns the SegmentHandler for a segmentId + expected segment instance (BUSY/ERROR when mismatch). - Remove lockSegmentHandler/unlockSegmentHandler from SegmentRegistryLocking and SegmentRegistryAccess. - Update SegmentRegistryAccessAdapter to expose handler instead of lock/unlock methods.

[x] 75 Split flow: use handler lock directly (Risk: MEDIUM) - In SegmentSplitCoordinator, acquire handler via registry access and call handler.lock()/handler.unlock() directly. - Keep BUSY mapping when handler is locked. - Ensure eviction path still validates handler instance + state.

[x] 76 Tests + cleanup for handler locking (Risk: LOW) - Update tests that currently call registry lock/unlock to use handler locking instead. - Remove unused lock methods from SegmentRegistryImpl. - Verify eviction skips locked handlers and BUSY is returned when locked.

[x] 77 SegmentRegistry target-state rollout from docs/architecture/registry/registry.md (Risk: HIGH) - Goal: make implementation fully match the documented registry model (state gate + per-key Entry state machine + single-flight load + bounded cache eviction + unload semantics). - Global rule: every step in 77.x must preserve behavioral parity with docs/architecture/registry/registry.md. If behavior must change, update registry.md and diagrams first in the same PR before code changes. - Hard constraints: - no global lock in get hot path - unrelated keys must not block each other - per-key wait only on the same Entry - LOADING waits, UNLOADING maps to BUSY - load/open failures are exception-driven - Exit criteria: - behavior parity with docs/architecture/registry/registry.md and docs/architecture/images/registry-seq*.plantuml - all new/updated tests green - no flakiness in repeated concurrency runs

[x] 77.1 Freeze target contract and remove ambiguity (Risk: HIGH) - Pin docs/architecture/registry/registry.md + diagrams as source of truth. - Explicitly list non-negotiable runtime rules in code comments/Javadocs: - state gate mapping: READY normal, FREEZE -> BUSY, CLOSED -> CLOSED, ERROR -> ERROR - cache state mapping: LOADING wait, UNLOADING -> BUSY - failed unload leaves UNLOADING (documented behavior) - Acceptance: - no contradictory comments/Javadocs in segmentregistry package - docs and code contracts use same method names
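
The state gate mapping listed in 77.1 is small enough to express as one total function. The sketch below is illustrative only: the enum and method names are hypothetical, and an empty `Optional` is used here to mean "gate is READY, proceed normally".

```java
import java.util.Optional;

/** Hypothetical sketch of the gate-to-status mapping from 77.1. */
final class GateMappingSketch {

    enum GateState { READY, FREEZE, CLOSED, ERROR }

    enum ResultStatus { BUSY, CLOSED, ERROR }

    /**
     * Returns the status to hand back immediately, or empty when the gate
     * is READY and the operation may continue.
     */
    static Optional<ResultStatus> shortCircuit(final GateState gate) {
        switch (gate) {
            case READY:
                return Optional.empty();
            case FREEZE:
                return Optional.of(ResultStatus.BUSY);
            case CLOSED:
                return Optional.of(ResultStatus.CLOSED);
            case ERROR:
                return Optional.of(ResultStatus.ERROR);
            default:
                throw new IllegalStateException("Unknown gate state: " + gate);
        }
    }
}
```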

[x] 77.2 Implement/align per-key Entry API contract (Risk: HIGH) - Ensure SegmentRegistryCache.Entry exposes and follows: - tryStartLoad() - waitWhileLoading(currentAccessCx) - finishLoad(value) - fail(exception) - tryStartUnload() - finishUnload() - getEvictionOrder() - Ensure lock/condition is strictly per-entry (no cross-key monitor). - Acceptance: - transitions only: MISSING->LOADING->READY->UNLOADING->MISSING - invalid transitions return fast/fail predictably
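
A per-entry state machine along the lines of 77.2 could be sketched as below. It follows the listed transitions (MISSING -> LOADING -> READY -> UNLOADING -> MISSING) with one lock and condition per entry. Assumptions of this sketch: the class name `EntrySketch` is hypothetical, the access counter argument of `waitWhileLoading` and `getEvictionOrder()` are omitted, a failed load returns the entry to MISSING, and waiting is uninterruptible for brevity.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical per-key entry; lock and condition are never shared. */
final class EntrySketch<V> {

    enum State { MISSING, LOADING, READY, UNLOADING }

    private final ReentrantLock lock = new ReentrantLock();
    private final Condition stateChanged = lock.newCondition();
    private State state = State.MISSING;
    private V value;
    private RuntimeException failure;

    boolean tryStartLoad() {
        return transition(State.MISSING, State.LOADING, null, null);
    }

    void finishLoad(final V loaded) {
        require(transition(State.LOADING, State.READY, loaded, null));
    }

    /** Assumption: a failed load returns the entry to MISSING. */
    void fail(final RuntimeException cause) {
        require(transition(State.LOADING, State.MISSING, null, cause));
    }

    boolean tryStartUnload() {
        return transition(State.READY, State.UNLOADING, value(), null);
    }

    void finishUnload() {
        require(transition(State.UNLOADING, State.MISSING, null, null));
    }

    /** Blocks only while this entry is LOADING; null if not READY after. */
    V waitWhileLoading() {
        lock.lock();
        try {
            while (state == State.LOADING) {
                stateChanged.awaitUninterruptibly();
            }
            if (failure != null) {
                throw failure;
            }
            return state == State.READY ? value : null;
        } finally {
            lock.unlock();
        }
    }

    State state() {
        lock.lock();
        try {
            return state;
        } finally {
            lock.unlock();
        }
    }

    private V value() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }

    /** Applies from -> to atomically; returns false on any other state. */
    private boolean transition(final State from, final State to,
            final V newValue, final RuntimeException newFailure) {
        lock.lock();
        try {
            if (state != from) {
                return false;
            }
            state = to;
            value = newValue;
            failure = newFailure;
            stateChanged.signalAll();
            return true;
        } finally {
            lock.unlock();
        }
    }

    private static void require(final boolean applied) {
        if (!applied) {
            throw new IllegalStateException("Invalid entry transition");
        }
    }
}
```

The `try*` methods return `false` on an invalid transition while the `finish*`/`fail` methods throw, which is one way to read "invalid transitions return fast/fail predictably".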

[x] 77.3 Align get(key) miss path to single-flight semantics (Risk: HIGH) - Use putIfAbsent race semantics correctly: - winner: entryInMap == null then load - loser: wait on the existing entry from map - Ensure wait target is the entry stored in map, not a local temporary. - Ensure load failure path calls fail(exception), wakes waiters, and removes the expected entry from map. - Acceptance: - exactly one loader execution per key under high contention - all losers observe winner result or propagated exception
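
The winner/loser split on `putIfAbsent` from 77.3 can be illustrated with a small sketch. To keep it self-contained, a `CompletableFuture` stands in for the registry's per-key Entry; that substitution, the class name `SingleFlightSketch`, and the `Function` loader are assumptions of this example, not the actual cache design.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical single-flight cache; CompletableFuture replaces Entry. */
final class SingleFlightSketch<K, V> {

    private final ConcurrentHashMap<K, CompletableFuture<V>> map =
            new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    SingleFlightSketch(final Function<K, V> loader) {
        this.loader = loader;
    }

    V get(final K key) {
        final CompletableFuture<V> mine = new CompletableFuture<>();
        final CompletableFuture<V> inMap = map.putIfAbsent(key, mine);
        if (inMap != null) {
            // Loser: wait on the entry stored in the map, never on `mine`.
            return await(inMap);
        }
        // Winner: putIfAbsent returned null, so this thread loads.
        try {
            final V value = loader.apply(key);
            mine.complete(value);
            return value;
        } catch (final RuntimeException e) {
            // Publish the failure (wakes waiters), then remove only the
            // expected entry so a later get can retry the load.
            mine.completeExceptionally(e);
            map.remove(key, mine);
            throw e;
        }
    }

    private static <T> T await(final CompletableFuture<T> entry) {
        try {
            return entry.join();
        } catch (final CompletionException e) {
            if (e.getCause() instanceof RuntimeException) {
                throw (RuntimeException) e.getCause();
            }
            throw e;
        }
    }
}
```

Using the two-argument `map.remove(key, mine)` matters: it removes the mapping only if it still points at the failed entry, so a newer entry installed by another thread is left alone.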

[x] 77.4 Align get(key) hit path semantics (Risk: HIGH) - READY: immediate return + recency update. - LOADING: block only on same entry until READY/failure. - UNLOADING: do not wait; return BUSY to caller. - Acceptance: - no waiting on keys in UNLOADING - no blocking between unrelated keys

[x] 77.5 Implement bounded eviction flow per docs (Risk: HIGH) - Keep capacity enforcement in cache layer. - Candidate selection: - LRU by accessCx - exclude requested key in removeLastRecentUsedSegment(exceptSegmentId) - only READY candidates can move to UNLOADING - Start close asynchronously, remove only after close success. - Acceptance: - eviction never unloads exceptSegmentId - failed tryStartUnload retries candidate selection without global stall
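
The candidate-selection rules in 77.5 (least recently used by `accessCx`, never the requested key, only READY entries) can be sketched as a pure function. The types below are hypothetical simplifications: segment ids are plain integers and each entry carries only a state and an access counter.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of eviction candidate selection. */
final class EvictionCandidateSketch {

    enum State { LOADING, READY, UNLOADING }

    static final class Entry {
        final State state;
        final long accessCx; // lower value = accessed longer ago

        Entry(final State state, final long accessCx) {
            this.state = state;
            this.accessCx = accessCx;
        }
    }

    /**
     * Picks the READY entry with the smallest accessCx, skipping
     * exceptSegmentId. Empty when no entry qualifies.
     */
    static Optional<Integer> pick(final Map<Integer, Entry> cache,
            final int exceptSegmentId) {
        Integer best = null;
        long bestCx = 0L;
        for (final Map.Entry<Integer, Entry> e : cache.entrySet()) {
            final int id = e.getKey();
            final Entry entry = e.getValue();
            if (id == exceptSegmentId || entry.state != State.READY) {
                continue;
            }
            if (best == null || entry.accessCx < bestCx) {
                best = id;
                bestCx = entry.accessCx;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

In a concurrent cache the picked candidate can change state before `tryStartUnload()` runs, which is why the task calls for retrying selection instead of stalling when that call fails.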

[x] 77.6 Lifecycle executor behavior and failure handling (Risk: HIGH) - Verify load/open and close/unload execution contexts follow design: - load for seq03 scenario in caller thread - close/unload on lifecycle executor thread - Define exact reaction to close failure: - keep entry UNLOADING - subsequent get returns BUSY - do not remove cache entry - Acceptance: - no caller-thread close IO - failed close path is deterministic and test-covered

[x] 77.7 Registry gate lifecycle alignment (Risk: MEDIUM) - Ensure startup: FREEZE -> READY. - Ensure close flow: READY -> FREEZE -> CLOSED. - Ensure idempotent close and terminal ERROR semantics. - Acceptance: - gate transitions are atomic and race-safe under concurrent calls - status mapping is consistent for all operations
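
One way to get the atomic, race-safe gate transitions of 77.7 is compare-and-set on a single state reference. The sketch below is an assumption-laden illustration: `RegistryGateSketch` and its method names are hypothetical, a separate flag prevents `finishStartup()` from reopening a gate that is in its closing FREEZE, and `fail()` leaves an already CLOSED gate untouched (the backlog does not specify that case).

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical gate sketch: FREEZE -> READY -> FREEZE -> CLOSED. */
final class RegistryGateSketch {

    enum State { FREEZE, READY, CLOSED, ERROR }

    private final AtomicReference<State> state =
            new AtomicReference<>(State.FREEZE);
    // Distinguishes the startup FREEZE from the FREEZE used while closing.
    private final AtomicBoolean started = new AtomicBoolean(false);

    /** Startup: FREEZE -> READY, at most once. */
    boolean finishStartup() {
        return started.compareAndSet(false, true)
                && state.compareAndSet(State.FREEZE, State.READY);
    }

    /**
     * Close: READY -> FREEZE -> CLOSED. Returns false without side
     * effects when the gate is not READY, so repeated calls are harmless.
     */
    boolean close() {
        if (!state.compareAndSet(State.READY, State.FREEZE)) {
            return false;
        }
        // Real code would drain and close segments inside this window.
        return state.compareAndSet(State.FREEZE, State.CLOSED);
    }

    /** ERROR is terminal: no transition above ever leaves it. */
    void fail() {
        state.updateAndGet(s -> s == State.CLOSED ? s : State.ERROR);
    }

    State state() {
        return state.get();
    }
}
```

If `fail()` lands between the two compare-and-set calls in `close()`, the second one fails and the gate stays in ERROR, which keeps ERROR terminal under concurrent calls.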

[x] 77.8 API/status cleanup to match exception-driven load policy (Risk: MEDIUM) - Preserve SegmentRegistryAccess for status-oriented flows. - Keep load/open failure as propagated runtime exception from registry load paths (per docs). - Remove or deprecate status branches that conflict with this policy. - Acceptance: - no mixed behavior where same failure is sometimes status, sometimes throw

[x] 77.9 Unit tests for Entry/cache state machine (Risk: HIGH) - Extend SegmentRegistryCacheTest with deterministic tests: - single-flight: same key, many threads -> loader called once - wait-on-loading: loser threads block and then return same value - load failure wakeup: all waiters receive same failure - unloading maps to BUSY (no waiting) - eviction excludes exceptSegmentId - close failure leaves UNLOADING - Use CountDownLatch/CyclicBarrier to force races. - Add @Timeout to every concurrency-sensitive test.
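
The "force races with CountDownLatch" technique from 77.9 can be sketched without any test framework. In this hypothetical harness, `ConcurrentHashMap.computeIfAbsent` stands in for the cache under test (it applies the mapping function at most once per key), and the bounded `await` calls play the role that `@Timeout` would play in a JUnit test.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical race harness; the map is a stand-in for the real cache. */
final class RaceHarnessSketch {

    /** Releases all threads at once and reports how often the loader ran. */
    static int loaderCallsUnderContention(final int threads) {
        final ConcurrentHashMap<String, String> cache =
                new ConcurrentHashMap<>();
        final AtomicInteger loaderCalls = new AtomicInteger();
        final CountDownLatch ready = new CountDownLatch(threads);
        final CountDownLatch start = new CountDownLatch(1);
        final CountDownLatch done = new CountDownLatch(threads);
        final ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int i = 0; i < threads; i++) {
                pool.execute(() -> {
                    ready.countDown();
                    try {
                        start.await(); // all workers park here first
                        cache.computeIfAbsent("segment-00001", k -> {
                            loaderCalls.incrementAndGet();
                            return "loaded";
                        });
                    } catch (final InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                });
            }
            if (!ready.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not start");
            }
            start.countDown(); // release every worker at the same moment
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers timed out");
            }
            return loaderCalls.get();
        } catch (final InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A test built on this would assert that the returned count is exactly 1 for any thread count, and would repeat the call many times to surface flakes.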

[x] 77.10 Registry-level behavior tests (Risk: HIGH) - Update/add tests in: - SegmentRegistryImplTest - SegmentRegistryStateMachineTest - SegmentRegistryAccessImplTest - Verify: - gate mapping (FREEZE/BUSY, CLOSED/CLOSED, ERROR/ERROR) - startup transition (FREEZE->READY) - getSegment behavior across READY/LOADING/UNLOADING - exception propagation on load/open failure

[x] 77.11 High-concurrency integration verification (Risk: HIGH) - Extend/execute: - IntegrationSegmentIndexConcurrencyTest - SegmentIndexImplConcurrencyTest - SegmentSplitCoordinatorConcurrencyTest - Add focused registry stress tests (new class): - many threads on same key (single-flight proof) - many threads on different keys (independence proof) - eviction + concurrent gets + split coordinator interaction - Run repeated stress cycles to catch flakes. - Completed: - Added and executed src/test/java/org/hestiastore/index/segmentindex/SegmentRegistryConcurrencyStressTest.java. - Passed: mvn -q -Dtest=IntegrationSegmentIndexConcurrencyTest,SegmentIndexImplConcurrencyTest,SegmentSplitCoordinatorConcurrencyTest,SegmentRegistryConcurrencyStressTest test - Flake gate passed: 20/20 repeated runs with 0 failures.

[x] 77.12 Quality gates and release checklist (Risk: HIGH) - Mandatory local gates before merge: - targeted unit tests: mvn -q -Dtest=SegmentRegistryCacheTest,SegmentRegistryImplTest,SegmentRegistryStateMachineTest test - concurrency/integration tests: mvn -q -Dtest=IntegrationSegmentIndexConcurrencyTest,SegmentIndexImplConcurrencyTest,SegmentSplitCoordinatorConcurrencyTest test - full verification: mvn verify - Flake gate: - rerun concurrency suite N times (recommended N=20) and require 0 flakes. - Code quality gate: - no TODO/FIXME left in touched files - Javadocs reflect final behavior - diagrams and registry.md updated if behavior changed - Completed: - Passed targeted unit tests: mvn -q -Dtest=SegmentRegistryCacheTest,SegmentRegistryImplTest,SegmentRegistryStateMachineTest test - Passed concurrency/integration tests: mvn -q -Dtest=IntegrationSegmentIndexConcurrencyTest,SegmentIndexImplConcurrencyTest,SegmentSplitCoordinatorConcurrencyTest,SegmentRegistryConcurrencyStressTest test - Passed full verification: mvn verify - TODO/FIXME scan on touched files: none found.

[x] 77.13 Rollout and fallback plan (Risk: MEDIUM) - Deliver in small PRs matching 77.1-77.12 order. - After each PR: - run targeted regression suite - update docs/architecture/registry/registry.md if contract changed - Keep a temporary feature flag only if needed for safe migration. - Remove fallback/compatibility code when final parity is reached. - Completed: - Work delivered incrementally following 77.1 -> 77.12 sequence. - Regression suites executed after key steps and before final merge gate. - No temporary feature flag required for this rollout.