Refactor backlog
Active
Planned
High
[ ] 79 Replace live-segment write path with range-partitioned ingest (Risk: HIGH)
- Replace direct Segment.put()/flush()/split write admission with
active mutable -> immutable queue -> background drain -> stable publish.
- Keep put()/get()/delete() immediate visibility semantics and WAL-based
crash recovery.
- Reuse KeyToSegmentMap / index.map as the only persisted routing
metadata in v1; route tables and partition queues stay runtime-only and
are rebuilt from index.map + WAL replay on open.
- Keep segment as the stable storage/publish backend in v1; stop routing
user writes directly into live segments.
- Allow breaking cleanup in IndexConfiguration, runtime tuning keys,
metrics docs, and benchmark/test expectations.
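The admission pipeline above (active mutable layer -> immutable queue -> background drain -> stable publish) can be sketched roughly as follows. `PartitionSketch`, `ACTIVE_LIMIT`, and the method names are illustrative assumptions for this backlog, not the real HestiaStore API:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;

// Illustrative single-partition write admission: active mutable layer rotates
// into immutable runs that a background drain would publish to stable storage.
final class PartitionSketch {
    private static final int ACTIVE_LIMIT = 4; // rotation bound (illustrative)

    private TreeMap<String, String> active = new TreeMap<>();
    private final Deque<Map<String, String>> immutableRuns = new ArrayDeque<>();

    void put(String key, String value) {
        active.put(key, value);             // immediate read-after-write visibility
        if (active.size() >= ACTIVE_LIMIT) {
            immutableRuns.addFirst(active); // rotate: newest run goes first
            active = new TreeMap<>();
        }
    }

    String get(String key) {
        String v = active.get(key);         // 1) active layer
        if (v != null) return v;
        for (Map<String, String> run : immutableRuns) { // 2) immutable runs, newest-first
            v = run.get(key);
            if (v != null) return v;
        }
        return null;                        // 3) would fall through to stable segments
    }

    /** A background drain would publish the oldest immutable run to stable storage. */
    Map<String, String> drainOldest() {
        return immutableRuns.pollLast();
    }
}
```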
[ ] 79.1 Freeze architecture, docs, and migration contract (Risk: HIGH)
- Rewrite range-partitioned-ingest.md as architecture-only; remove class/
package-level implementation detail from that page.
- Add one implementation/migration doc that defines:
immediate read-after-write, delete visibility, flush/compact semantics,
split/drain lifecycle, crash-recovery rules, and config migration rules.
- Document that unpublished partition state is transient and recovered only
through WAL replay; no new partition manifest in this epic.
- Acceptance:
- architecture docs match the final behavior,
- upgrade notes describe old config key removal and new key mapping.
[ ] 79.2 Introduce partition runtime and routing layer (Risk: HIGH)
- Add internal partition runtime under segmentindex.partition with:
range partition, mutable layer, immutable run, read/write route table,
drain scheduler, and local/global backpressure controller.
- Build runtime route tables from KeyToSegmentMap.snapshot() and existing
stable segments at open time.
- Keep partition state in memory only; do not persist mutable/immutable
runs or a separate manifest in v1.
- Acceptance:
- unit tests cover single-partition put/get/delete,
- rotation to immutable is bounded,
- local and global backpressure are deterministic.
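The deterministic local/global backpressure requirement above can be sketched as plain budget accounting; the limits and names here are assumptions for illustration, not real configuration keys:

```java
// Illustrative local/global backpressure accounting for partition writes.
final class BackpressureController {
    private final int localLimit;   // per-partition buffered-entry budget
    private final int globalLimit;  // whole-index buffered-entry budget
    private int globalBuffered;

    BackpressureController(int localLimit, int globalLimit) {
        this.localLimit = localLimit;
        this.globalLimit = globalLimit;
    }

    /** Deterministic admission check: both local and global budgets must hold. */
    boolean admit(int partitionBuffered) {
        return partitionBuffered < localLimit && globalBuffered < globalLimit;
    }

    void onBuffered() {
        globalBuffered++;
    }

    void onDrained(int entries) {
        globalBuffered = Math.max(0, globalBuffered - entries);
    }
}
```

Because admission depends only on the two counters, throttling one hot partition (its local budget) never blocks writes to other partitions until the global budget is also exhausted.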
[ ] 79.3 Switch SegmentIndexImpl read/write/delete paths to partitions (Risk: HIGH)
- Change put() and delete() to: WAL append -> resolve write route ->
mutate active partition -> rotate when full -> local/global throttle when
bounds are exceeded.
- Change get() to read only inside the resolved read route in this order:
active layer, immutable runs newest-first, then stable segment sources.
- Change index iterators/streaming to merge partition overlays with stable
segment reads; FULL_ISOLATION must open over a route snapshot and
immutable copies of active layers without freezing live writers.
- Remove hot-path dependence on SegmentIndexCore calling Segment.put()
and retire or replace the current maintenance coordinator accordingly.
- Acceptance:
- read-after-write always holds before any drain completes,
- unrelated partitions keep working when one partition is throttled.
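The layered read order above also has to keep deletes visible before any drain completes. A minimal sketch, assuming a tombstone sentinel (the `TOMBSTONE` marker and layer shapes are illustrative, not the real types):

```java
import java.util.List;
import java.util.Map;

// Sketch of delete visibility during layered reads: a tombstone found in a newer
// layer shadows values in every older layer, including stable segments.
final class OverlayReadSketch {
    static final String TOMBSTONE = "\u0000DEL"; // illustrative sentinel

    /** Resolves a key in order: active layer, immutable runs newest-first, stable. */
    static String resolve(Map<String, String> active,
                          List<Map<String, String>> runsNewestFirst,
                          Map<String, String> stable,
                          String key) {
        String v = active.get(key);
        if (v == null) {
            for (Map<String, String> run : runsNewestFirst) {
                v = run.get(key);
                if (v != null) break;   // newest match wins; stop searching
            }
        }
        if (v == null) v = stable.get(key);
        return TOMBSTONE.equals(v) ? null : v;  // tombstone shadows older layers
    }
}
```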
[ ] 79.4 Implement drain, publish, flush, close, and WAL recovery (Risk: HIGH)
- Drain immutable runs in background by sorting/partitioning keys and
publishing them to stable segment storage using existing writer/publish
utilities; user threads must not write into live segments.
- Make flush()/flushAndWait() drain all partition queues and checkpoint
WAL; make compact()/compactAndWait() first drain overlays, then compact
stable segment storage only.
- On open, rebuild routes from KeyToSegmentMap, clean unpublished temp
drain/split artifacts, and replay WAL into fresh active partitions.
- Acceptance:
- crash during drain or publish loses no acknowledged write,
- close/reopen and crash/reopen produce the same final visible data.
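The open-time WAL replay step above can be sketched as a fold over acknowledged records into a fresh active partition; the record shape and op names are assumptions for illustration:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of WAL replay on open: acknowledged writes after the last checkpoint
// are replayed in order into a fresh active partition, last write wins.
final class WalReplaySketch {
    record WalRecord(String op, String key, String value) {}

    static Map<String, String> replay(List<WalRecord> wal) {
        Map<String, String> active = new TreeMap<>();
        for (WalRecord r : wal) {
            if ("PUT".equals(r.op())) {
                active.put(r.key(), r.value());
            } else if ("DELETE".equals(r.op())) {
                active.remove(r.key()); // a real overlay would keep a tombstone instead
            }
        }
        return active;
    }
}
```

Replaying in append order is what makes crash/reopen and close/reopen converge on the same visible data: the WAL is the single ordered source of unpublished state.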
[x] 79.5 Replace live-segment split with partition draining split (Risk: HIGH)
- Remove the current split model that opens a FULL_ISOLATION iterator on
a live segment and makes writers wait.
- New split flow:
route new writes to child partitions immediately,
mark parent partition DRAINING,
keep parent stable sources readable during transition,
publish child stable segments,
atomically update KeyToSegmentMap,
retire the parent route.
- Delete or retire the current segmentindex.split pipeline and its live-
segment lock/freeze tests once partition split parity is reached.
- Acceptance:
- the reproduced hot-range write scenario no longer stalls or nears deadlock,
- split of one range does not stop writes to other ranges.
- Delivered:
- Removed legacy live-segment split pipeline classes from
segmentindex.split and deleted their lock/freeze-oriented tests.
- Kept only route-first split primitives
(PartitionStableSplitCoordinator, split policy/plan/result DTOs).
- Converted post-drain split triggering to background hinting:
drain now only enqueues per-partition split hints and split candidate
evaluation runs exclusively inside the autonomous background split loop.
- Added integration regression
SegmentIndexConcurrentIT.hotRangeRandomPutsWithAutonomousSplitDoNotStall
to validate 20 hot-range writers + concurrent cold-range writes under
autonomous split policy.
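The route-first split flow above hinges on an atomic route-table update: readers keep their old snapshot while the split publishes the child routes. A minimal sketch, with illustrative class/method names (the real runtime resolves routes from KeyToSegmentMap-derived tables):

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of atomic route publishing for a route-first split: the table is an
// immutable snapshot swapped via compare-and-set, so no reader or writer stalls.
final class RouteTableSketch {
    private final AtomicReference<NavigableMap<String, String>> routes;

    RouteTableSketch(NavigableMap<String, String> initial) {
        this.routes = new AtomicReference<>(new TreeMap<>(initial));
    }

    /** Resolves the partition owning the range that contains the key. */
    String route(String key) {
        Map.Entry<String, String> e = routes.get().floorEntry(key);
        return e == null ? null : e.getValue();
    }

    /** Publishes a split atomically; in-flight readers keep their old snapshot. */
    void applySplit(String parentStart, String childStart,
                    String leftId, String rightId) {
        NavigableMap<String, String> old, next;
        do {
            old = routes.get();
            next = new TreeMap<>(old);
            next.put(parentStart, leftId);
            next.put(childStart, rightId);
        } while (!routes.compareAndSet(old, next));
    }
}
```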
[ ] 79.6 Clean up config, metrics, control-plane tuning, and obsolete code (Risk: HIGH)
- Replace old segment write-cache config keys with the new partition budget
keys and migrate persisted/opened configs once at load time.
- Update runtime tuning allowlist, snapshots, and docs to use partition
terminology and metrics.
- Remove dead or unreachable code paths tied only to live-segment write
admission, including old retry/split glue that no longer participates in
the new runtime.
- Acceptance:
- existing indexes open with migrated config,
- no old segment-write config key remains in public docs or runtime APIs.
- Delivered (partial):
- Removed public legacy write-cache alias constants from
IndexPropertiesSchema.IndexConfigurationKeys; kept migration fallback
internal to schema/default resolution during load.
- Updated configuration migration tests
(IndexConfigurationStorageTest) to assert fallback behavior without
exposing removed aliases in public API.
- Removed obsolete runtimeConfigAllowlist constructor parameter from
ManagementAgentServer; runtime-tunable key allowlist is now derived
only from RuntimeSettingKey mapping (supportedRuntimeConfigKeys()).
- Normalized runtime-limit naming in write-admission code so partition
semantics are explicit (activePartition/partitionBuffer) in
SegmentFactory and SegmentIndexImpl instead of legacy
segmentWriteCache naming.
- Added drainLatencyP95Micros metric end-to-end:
engine snapshot production, REST report payload, Micrometer/Prometheus
export, monitoring console parsing, and architecture/operations docs.
- Refreshed architecture docs and glossary to reference current
partition-overlay + route-first split runtime instead of deleted
live-segment split pipeline classes.
- Renamed config flag segmentMaintenanceAutoEnabled to
backgroundMaintenanceAutoEnabled across builder/config/storage/docs;
kept load-time migration fallback for legacy manifests.
- Migrated SegmentIndex JMH benchmark sources to the new builder/config
terminology (backgroundMaintenanceAutoEnabled) so benchmark code no
longer depends on removed public alias methods.
- Renamed internal split admission/scheduling runtime from
SegmentMaintenanceCoordinator to BackgroundSplitCoordinator so the
core implementation reflects the current partition-first split model
instead of the retired generic segment-maintenance wording.
- Renamed SegmentIndexImpl stable-storage aggregation and
IndexExecutorRegistry stable-segment executor internals so the core
code clearly separates partition overlay runtime from stable segment
backend maintenance while keeping the public metrics snapshot contract
unchanged.
- Renamed route-first split DTO/policy/plan types from SegmentSplit* /
SegmentSplitter* to PartitionSplit* across
segmentindex.split, partition, mapping, and their tests; also
aligned the related method names
(applyPartitionSplitPlan, reassignOverlayAfterPartitionSplit) so
the partition-first split flow no longer mixes segment-era type names
into the runtime overlay layer.
- Cleaned residual engine-only helper/scenario naming around partition
defaults and stable-segment test fixtures:
IndexPropertiesSchema now derives partition buffer defaults from
activePartitionKeyLimit wording, schema tests assert
expectedActivePartition / expectedPartitionBuffer, and selected
engine/registry tests plus concurrency scenario labels no longer use
stale write-cache-* or ambiguous segment-maintenance pool naming for
partition-first ingest cases.
- Delivered (additional):
- Finished the public config cleanup around
numberOfStableSegmentMaintenanceThreads: storage/tests now verify
round-trip persistence under the new key, schema/storage load still
migrate legacy segmentIndexMaintenanceThreads manifests, and save
removes the legacy alias from rewritten manifests.
- Refreshed architecture/development docs to reference
BackgroundSplitCoordinator / PartitionSplitApplyPlan and the
current split-vs-drain responsibilities, so the documentation no
longer points at deleted segment-era coordinator classes after the
rename sweep.
- Removed the last public doc examples that still enumerated legacy
partition-migration aliases (segmentMaintenanceAutoEnabled and
segment-write fallback keys); docs now describe compatibility only as
load-time manifest migration plus canonical rewrite-on-save.
- Updated the segment overview diagram and historical segment API notes
so they reflect the current public Segment contract and no longer
point at deleted SegmentSplitter.Result / split-wrapper APIs.
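The load-time key migration described in this item (fallback when only the legacy key exists, canonical rewrite-on-save) can be sketched as a one-shot rewrite. The legacy/canonical pairs below are taken from the delivered notes above; the helper itself is an assumption:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of one-shot load-time config migration: legacy aliases are dropped,
// and their values are used only when the canonical key is absent.
final class ConfigMigrationSketch {
    private static final Map<String, String> LEGACY_TO_CANONICAL = Map.of(
            "segmentMaintenanceAutoEnabled", "backgroundMaintenanceAutoEnabled",
            "segmentIndexMaintenanceThreads", "numberOfStableSegmentMaintenanceThreads");

    static Map<String, String> migrate(Map<String, String> loaded) {
        Map<String, String> out = new HashMap<>(loaded);
        LEGACY_TO_CANONICAL.forEach((legacy, canonical) -> {
            String value = out.remove(legacy);          // alias never survives rewrite-on-save
            if (value != null && !out.containsKey(canonical)) {
                out.put(canonical, value);              // fallback only when canonical absent
            }
        });
        return out;
    }
}
```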
[ ] 79.7 Refresh unit tests, integration tests, and JMH gates (Risk: HIGH)
- Add a dedicated unit suite for partition runtime, route snapshots, drain
publish ordering, backpressure, iterator snapshots, and crash cleanup.
- Update or replace current concurrency/split suites, including:
SegmentIndexImplConcurrencyTest,
SegmentIndexConcurrentIT,
SegmentIndexConcurrencyStressIT,
and the current split coordinator tests.
- Add JMH benchmarks for:
hot-partition concurrent put(),
mixed put()+get() during drain,
and update SegmentIndexGetBenchmark to measure reads with overlay data
present.
- Acceptance:
- 20-writer random-put reproduction completes without timeout/stall,
- read-after-write and reopen consistency tests pass,
- pre/post JMH baselines are captured under benchmarks/results.
- Delivered (partial):
- Expanded PartitionRuntimeTest with deterministic unit coverage for:
global index-buffer throttling across partitions, duplicate-key updates
staying within the active-layer budget, drain schedule monotonicity,
overlay precedence across older/newer immutable runs plus active data,
and overlay reassignment for COMPACTED split apply plans.
- Expanded SegmentIndexImplConcurrencyTest with deterministic
core-level coverage for immediate put/get/delete visibility while a
background split remaps routes from a parent partition to child
routes.
- Expanded SegmentIndexAsyncMaintenanceTest with blocked-drain
maintenance boundary coverage proving flushAndWait() and
compactAndWait() wait for in-flight partition drain publish to
stable segments.
- Expanded IntegrationSegmentIndexIteratorTest with deterministic
FULL_ISOLATION coverage for:
active-overlay merge over stable segments with tombstone filtering,
open-stream snapshot stability while later put()/delete() calls
immediately change live get() visibility,
route-remap consistency while split apply reassigns buffered overlay
entries from a parent partition to child partitions,
and FAIL_FAST prefix behavior for streams opened before split remap.
- Expanded IndexInternalConcurrentTest with deterministic FAIL_FAST
prefix behavior coverage across maintenance boundaries
(flushAndWait() and compactAndWait()).
- Expanded IntegrationSegmentIndexWalRecoveryTest with a crash-style
reopen snapshot covering:
split-apply durability, persisted child-route recovery from
index.map, WAL replay of unpublished overlay writes above the
post-split stable layout, and cleanup of unpublished split-child
segment artifacts when restart still routes reads/writes through the
original parent segment.
- Expanded IntegrationSegmentIndexMetricsSnapshotTest with
maintenance-boundary metrics coverage for flushAndWait() and
compactAndWait() after scheduled split backlog, asserting buffered
overlay metrics settle to zero while split evidence and visible data
remain correct.
- Added maintenance-boundary recovery coverage to
IntegrationSegmentIndexWalRecoveryTest for:
flushAndWait(), compactAndWait(), and close()/reopen() over a
scheduled split backlog, asserting consistent final visibility and
persisted child-route layout after reopen.
- Added hot-range concurrency regression
SegmentIndexConcurrentIT.hotRangeRandomPutsWithAutonomousSplitDoNotStall
(20 hot writers + concurrent cold-range writers).
- Added autonomous split lifecycle regression
SegmentIndexConcurrentIT.hotRangeAutonomousSplitSurvivesCloseReopenRotations
to cover hot-range write pressure plus repeated flush/close/open
rotations while preserving cold-range correctness after reopen.
- Expanded SegmentIndexConcurrencyStressIT with repeated lifecycle
stress under autonomous split pressure:
hot-range biased writes, background split enabled, and repeated
close/open rotations with explicit split-evidence assertion.
- Exposed split scheduling/in-flight counters in monitoring exporters
(monitoring-micrometer, monitoring-prometheus) and covered with
module-level exact-value tests, including live refresh across
successive scrape/snapshot reads.
- Added monitoring-console-web parser coverage so /api/v1/report
partition/split/drain metrics are verified through the UI backend
client path, not only at exporter/source level.
- Extended canonical JMH profiles
(segment-index-pr-smoke, segment-index-nightly) with
SegmentIndexHotPartitionPutBenchmark.
- Improved benchmark compare tooling:
compare_jmh_profile.py now reports new/removed metrics when
profile coverage differs between baseline and candidate.
- Improved benchmark workflow fallback compatibility:
benchmark-compare.yml now validates all benchmark classes referenced
by the selected profile instead of one hardcoded class path.
- Added BenchmarkProfileContractTest plus benchmark-workflow preflight
gating so canonical profiles are checked for:
required SegmentIndex scenarios, duplicate labels, resolvable
benchmark classes, and accidental reintroduction of removed public
config names in benchmark sources.
- Added script-level smoke coverage for
compare_jmh_profile.py, publish_jmh_history.py, and
resolve_jmh_history_baseline.py by executing the real Python CLIs
from JUnit against temporary benchmark-history fixtures.
- Ran the broader engine-only regression sweep for the renamed
partition split vocabulary across split/remap/recovery/concurrency
suites. The only issue found was a flaky 10s timeout in
IntegrationSegmentIndexIteratorTest while waiting for split remap
under full-suite load; the fix was to widen the test-only split
remapping wait window to 30s. No production runtime change was needed.
- Ran a cross-module monitoring/control-plane regression sweep after the
config/runtime rename cleanup:
ManagementApiContractCompatibilityTest,
ManagementAgentServerTest,
ManagementAgentServerSecurityTest,
ConsoleBackendClientTest,
HestiaStoreMicrometerBinderTest,
HestiaStorePrometheusExporterTest,
plus test-compile for monitoring modules and benchmarks. All passed;
only expected JDK/Mockito/SLF4J warnings were emitted.
[ ] 78 Monitoring/Management platform rollout (Risk: HIGH)
- Goal: evolve from in-process counters to multi-JVM monitoring and control
without forcing Micrometer/Prometheus dependencies into core.
- Delivery model: phase-gated rollout where each phase is releasable and
backward compatible.
- Constraints:
- Core package must not depend on Micrometer, Prometheus, servlet stacks,
or UI classes.
- Runtime control endpoints must be explicit allowlist operations only
(no generic "execute command" style endpoint).
- All mutating management operations must be auditable.
[x] 78.1 Define source/module boundaries and package contracts (Risk: HIGH)
- Target logical modules/packages:
- org.hestiastore.index.* (core)
- org.hestiastore.monitoring.json.api.* (metrics contracts/shared abstractions)
- org.hestiastore.monitoring.micrometer.* (Micrometer bridge)
- org.hestiastore.monitoring.prometheus.* (Prometheus bridge)
- org.hestiastore.management.restjson.* (node-local REST API in index JVM)
- org.hestiastore.console.* (web UI / control plane)
- Start in single-module codebase with strict package boundaries to keep
later physical split low risk.
- Add architecture doc with allowed dependency direction:
core <- monitoring <- management.agent <- console and
management.api shared by agent/console.
- Acceptance:
- No core imports from monitoring/agent/console packages.
- Checkstyle/ArchUnit (or similar) rule blocks forbidden imports.
- Delivered:
- Added architecture page: docs/architecture/package-boundaries.md.
- Added automated guard: PackageDependencyBoundaryTest.
[x] 78.2 Add stable core metrics snapshot API (Risk: HIGH)
- Introduce immutable public snapshot types in core for index/segment
metrics (e.g. op counters, bloom stats, segment counts, state).
- Add SegmentIndex.metricsSnapshot() (or equivalent read-only API).
- Keep existing behavior intact while wiring current counters into snapshot.
- Make counters thread-safe (LongAdder/AtomicLong) where currently not.
- Define compatibility policy:
- new fields may be added,
- existing field names/semantics cannot silently change.
- Acceptance:
- Unit/integration tests for snapshot consistency under concurrent load.
- Docs page with metric field definitions and semantics.
- Delivered:
- Added immutable SegmentIndexMetricsSnapshot and
SegmentIndex.metricsSnapshot().
- Wired snapshot delegation through SegmentIndex adapters.
- Added concurrent consistency test:
IntegrationSegmentIndexMetricsSnapshotConcurrencyTest.
- Added metrics contract docs:
docs/architecture/segmentindex/metrics-snapshot.md.
[x] 78.3 Build monitoring bridge layer (Micrometer/Prometheus) (Risk: HIGH)
- Implement monitoring adapters:
- org.hestiastore.monitoring.json.api.*
- org.hestiastore.monitoring.micrometer.*
- org.hestiastore.monitoring.prometheus.*
- Micrometer binder reading from core snapshot API.
- Prometheus exposition support (via Micrometer registry or direct bridge).
- Ensure adapters can be created/removed without restarting index
(where runtime allows).
- Define metric naming/tag conventions (hestiastore_*, stable tag set).
- Acceptance:
- Prometheus scrape returns expected metrics and labels.
- Zero adapter overhead when monitoring package is not used.
- Delivered:
- Introduced dedicated modules:
monitoring-rest-json-api, monitoring-micrometer, monitoring-prometheus.
- Added HestiaStoreMicrometerBinder in
org.hestiastore.monitoring.micrometer.*.
- Added HestiaStorePrometheusExporter in
org.hestiastore.monitoring.prometheus.*.
- Added Prometheus scrape test:
HestiaStorePrometheusExporterTest (checks names and index label).
- Kept core engine module free of Micrometer/Prometheus dependencies.
[x] 78.4 Add management API contracts and versioning (Risk: HIGH)
- Create org.hestiastore.monitoring.json.api.* DTOs:
- NodeReportResponse (JvmMetricsResponse + per-index sections),
ActionRequest/Response,
ConfigPatchRequest, ErrorResponse.
- Version endpoints from start (/api/v1/...) and define deprecation rules.
- Include idempotency and safety semantics for actions:
- flush, compact, selected config patch operations.
- Acceptance:
- OpenAPI (or equivalent) published with examples.
- Contract tests verify backward-compatible serialization.
- Delivered:
- Added shared DTO/contracts in monitoring-rest-json-api module:
NodeReportResponse, IndexReportResponse, JvmMetricsResponse,
ActionRequest, ActionResponse, ConfigPatchRequest,
ErrorResponse.
- Added versioned path contract constants in ManagementApiPaths
(/api/v1/...).
- Added compatibility contract tests:
ManagementApiContractCompatibilityTest.
- Added API contract documentation with example payloads:
docs/architecture/monitoring/monitoring-api.md.
[x] 78.5 Implement node-local management agent (Risk: HIGH)
- Add lightweight REST server integration for index JVM process:
- GET /api/v1/report
- POST /api/v1/actions/flush
- POST /api/v1/actions/compact
- PATCH /api/v1/config (allowlist runtime-safe keys only)
- Include health and readiness endpoints for deployment integration.
- Add per-request audit logging for mutating endpoints.
- Acceptance:
- End-to-end test: invoke actions and verify effect on index state.
- Negative tests for forbidden config keys and invalid state transitions.
- Delivered:
- Implemented ManagementAgentServer in monitoring-rest-json module with
versioned management endpoints, plus /health and /ready.
- ManagementAgentServer now supports M:N mapping between one agent and
multiple indexes via addIndex/removeIndex.
- GET /api/v1/report returns one node-wide JVM section and per-index
metrics sections.
- Added allowlist enforcement for PATCH /api/v1/config.
- Added per-request audit log entries for mutating endpoints including
actor, endpoint, status, outcome and payload digest (SHA-256).
- Added tests in ManagementAgentServerTest for:
report reads, targeted/all-index actions, forbidden config key, invalid
state transition after close, and readiness transitions.
[x] 78.6 Implement central console web application (Risk: HIGH)
- Build org.hestiastore.console.* with capabilities:
- register/manage multiple index JVM nodes,
- poll agent APIs and display key read/write/latency/segment metrics,
- trigger safe operations (flush/compact) with confirmation UX,
- show recent audit/event log entries.
- Keep UI read-first: write controls separated and permission-gated.
- Define minimal dashboard first; defer advanced analytics to later items.
- Acceptance:
- Multi-node dashboard works for at least 3 registered nodes.
- Action execution shows pending/success/failure lifecycle.
- Delivered:
- Added monitoring console runtime in monitoring-console-web module.
- Implemented APIs for:
- node registration/list/removal (/console/v1/nodes),
- aggregated dashboard polling (/console/v1/dashboard),
- action submit/status (/console/v1/actions/flush,
/console/v1/actions/compact, /console/v1/actions/{id}),
- recent event timeline (/console/v1/events).
- Added write-token gate (X-Hestia-Console-Token) for mutating
endpoints and explicit action confirmation requirement.
- Added integration tests in MonitoringConsoleServerTest verifying:
- 3-node dashboard polling,
- action lifecycle transitions (PENDING -> SUCCESS/FAILED),
- write-endpoint access control.
[x] 78.7 Secure transport, authz, and audit trail (Risk: HIGH)
- Agent <-> console transport:
- enforce TLS (prefer mTLS in production profiles),
- token- or cert-based authn,
- role-based authz (read, operate, admin).
- Add immutable audit records for mutating calls:
actor, target node, endpoint, payload digest, result, timestamp.
- Add rate limits and retry/backoff policy for control operations.
- Acceptance:
- Security integration tests for unauthorized/forbidden scenarios.
- Audit log verification tests for all mutating endpoints.
- Delivered:
- Added agent security policy and role model:
ManagementAgentSecurityPolicy + AgentRole (READ, OPERATE,
ADMIN).
- Enforced token authn + role-based authz in ManagementAgentServer
for all endpoints (read vs mutating vs admin config patch).
- Added optional TLS-only enforcement mode in ManagementAgentServer
and HTTPS-only node registration mode in MonitoringConsoleServer.
- Added per-actor mutating endpoint rate limiter in agent (429 on
exceed) via FixedWindowRateLimiter.
- Added immutable in-memory audit trail records for mutating calls in
agent (auditTrailSnapshot) with actor, endpoint, payload digest,
result, status code and timestamp.
- Added console retry/backoff policy for mutating control operations and
per-node bearer-token forwarding for agent calls.
- Added security integration tests:
ManagementAgentServerSecurityTest verifies unauthorized/forbidden
flows, role permissions, rate limiting, and audit coverage for all
mutating endpoints.
[x] 78.8 Packaging, release strategy, and migration path (Risk: HIGH)
- Release artifacts initially from same repo:
- engine (core)
- monitoring-rest-json-api
- monitoring-micrometer
- monitoring-prometheus
- monitoring-rest-json
- monitoring-console-web
- Keep aligned versions per release line (for example 0.2.x for all).
- Document migration from single-module to multi-module build:
move packages with no API break using prior boundary rules from 78.1.
- Acceptance:
- Build produces separate jars and integration tests across artifacts pass.
- Release docs include compatibility matrix and upgrade notes.
- Delivered:
- Added module target-state documentation:
docs/architecture/package-boundaries.md.
- Added release compatibility matrix:
docs/development/compatibility-matrix.md.
- Added multi-module upgrade notes:
docs/development/upgrade-notes-multimodule.md.
- Updated release process docs with module verification commands and
corrected artifact coordinates:
docs/development/release.md.
- Added documentation navigation entries in mkdocs.yml and
docs/development/index.md.
- Verified packaging and cross-module tests:
- mvn -pl engine,monitoring-rest-json-api,monitoring-micrometer,monitoring-prometheus,monitoring-rest-json,monitoring-console-web -DskipTests package
- mvn -pl monitoring-rest-json,monitoring-console-web,monitoring-prometheus test
[x] 78.9 Rollout stages with explicit quality gates (Risk: HIGH)
- Stage A: core snapshot API only; no external exporters.
- Stage B: monitoring bridge with Prometheus scrape + docs.
- Stage C: node agent endpoints (read-only first, then mutating).
- Stage D: console UI for multi-node visibility, then controlled actions.
- Required gates per stage:
- load/perf regression budget defined and met,
- concurrency tests for stats correctness,
- failure-mode tests (node down, timeout, partial responses),
- operational docs/runbook updated.
- Acceptance:
- Each stage releasable independently.
- Rollback procedure documented and tested.
- Delivered:
- Added staged rollout quality-gate documentation:
docs/development/rollout-quality-gates.md.
- Added rollback runbook with stage-specific rollback procedure:
docs/development/rollout-rollback-runbook.md.
- Extended console failure-mode tests in
MonitoringConsoleServerTest to cover:
- node down (UNAVAILABLE),
- partial responses (state ok, metrics 500),
- timeout behavior.
- Updated release docs to include rollout gate execution:
docs/development/release.md.
- Added docs navigation entries in mkdocs.yml and
docs/development/index.md.
- Verified stage gates end-to-end:
explicit Maven gate commands (Stages A-D all passing).
Medium
[ ] 54 Dedicated executor for index async ops (Risk: MEDIUM)
- Use a dedicated, bounded executor for SegmentIndexImpl.runAsyncTracked
(no common pool).
- Define rejection policy: map saturation to BUSY/error with clear message.
- Ensure close waits for in-flight async work or cancels safely.
- Tests: saturation/backpressure, close ordering, no caller-thread IO.
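A minimal sketch of such a dedicated bounded executor; pool sizes, the queue bound, and the BUSY message are illustrative assumptions, not the real configuration:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Dedicated bounded executor for async index operations: no common pool,
// saturation maps to a clear BUSY-style failure instead of caller-thread IO.
final class DedicatedAsyncExecutorSketch {
    private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
            2, 2, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(64),           // bounded queue
            new ThreadPoolExecutor.AbortPolicy());  // reject on saturation

    CompletableFuture<Void> submit(Runnable task) {
        try {
            return CompletableFuture.runAsync(task, pool);
        } catch (RejectedExecutionException e) {
            return CompletableFuture.failedFuture(
                    new IllegalStateException("BUSY: async queue is full", e));
        }
    }

    /** Close waits for in-flight work instead of abandoning it. */
    void close() {
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```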
[ ] 55 Replace busy spin loops with retry + jitter (Risk: MEDIUM)
- Replace Thread.onSpinWait/busy loops in split iterator open and other
retry paths with IndexRetryPolicy + jitter.
- Make timeouts explicit and surface IndexException with operation name.
- Tests: BUSY retry exits on READY, timeout path, interrupt handling.
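The retry-with-jitter shape can be sketched as below; the backoff bounds are illustrative, not IndexRetryPolicy's real parameters:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.BooleanSupplier;

// Capped exponential backoff plus jitter as a replacement for
// Thread.onSpinWait busy loops; never spins hot, times out explicitly.
final class RetryWithJitterSketch {
    /** Waits until ready() holds or the timeout elapses. */
    static boolean awaitReady(BooleanSupplier ready, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        long backoffMillis = 1;
        while (!ready.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                return false;                        // explicit timeout path
            }
            long jitter = ThreadLocalRandom.current().nextLong(backoffMillis + 1);
            try {
                Thread.sleep(backoffMillis + jitter);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // surface interruption to caller
                return false;
            }
            backoffMillis = Math.min(backoffMillis * 2, 50); // capped backoff
        }
        return true;
    }
}
```

On the false return, the caller would raise IndexException with the operation name, per the item above.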
[ ] 56 Key-to-segment map read contention reduction (Risk: MEDIUM)
- Evaluate snapshot-based reads or StampedLock for high-read workloads.
- Keep version validation semantics intact for split/extend paths.
- Tests: concurrent get/put under splits, no missing mappings, no deadlocks.
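The StampedLock option can be sketched as below. Note this is only an outline: the real map would also need its version validation for split/extend, and optimistically traversing a mutable TreeMap can misbehave under concurrent writes even when validate() later fails, so a production version may prefer snapshot-based reads instead:

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.locks.StampedLock;

// Sketch of optimistic reads for a key-to-segment map: uncontended lookups
// never block behind writers; a raced read falls back to the read lock.
final class KeyToSegmentMapSketch {
    private final StampedLock lock = new StampedLock();
    private final NavigableMap<String, String> map = new TreeMap<>();

    String segmentFor(String key) {
        long stamp = lock.tryOptimisticRead();
        Map.Entry<String, String> e = map.floorEntry(key);
        if (!lock.validate(stamp)) {       // a concurrent split/extend raced us
            stamp = lock.readLock();
            try {
                e = map.floorEntry(key);   // re-read under the read lock
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return e == null ? null : e.getValue();
    }

    void assign(String startKey, String segmentId) {
        long stamp = lock.writeLock();
        try {
            map.put(startKey, segmentId);
        } finally {
            lock.unlockWrite(stamp);
        }
    }
}
```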
[ ] 57 Streaming iterators without full materialization (Risk: MEDIUM)
- Replace list materialization in getStream/FULL_ISOLATION with streaming
merge iterators over write/delta caches and segment files.
- Ensure iterator close releases resources and does not leak locks.
- Tests: large data set memory profile, iterator isolation correctness.
[ ] 5 Stop materializing merged cache lists on read (Risk: MEDIUM)
- Problem: SegmentReadPath.openIterator calls getAsSortedList, building
full merged lists for each iterator.
- Fix: provide streaming merge iterator over delta/write caches without
full list materialization.
- Options:
- Option A (recommended): switch UniqueCache to TreeMap /
ConcurrentSkipListMap, add a sorted iterator API, and merge cache
iterators (write/frozen/delta) with MergedEntryIterator in the
FULL_ISOLATION path.
- Option B: keep HashMap / ConcurrentHashMap for get/put and maintain
a sorted key index (TreeSet / ConcurrentSkipListSet) for iteration;
expose a sorted iterator over keys + map values and merge like Option A.
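Option A's merge step can be sketched as a two-way sorted merge where the newer source shadows the older one on duplicate keys; a real MergedEntryIterator would merge more sources (write/frozen/delta), and the cache types would be ConcurrentSkipListMap rather than the plain SortedMap used here:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.SortedMap;

// Two-way sorted merge over cache iterators with no full-list materialization;
// on duplicate keys the newer entry wins and the older one is skipped.
final class MergedEntryIteratorSketch implements Iterator<Map.Entry<String, String>> {
    private final Iterator<Map.Entry<String, String>> newer;
    private final Iterator<Map.Entry<String, String>> older;
    private Map.Entry<String, String> nextNewer;
    private Map.Entry<String, String> nextOlder;

    MergedEntryIteratorSketch(SortedMap<String, String> newerCache,
                              SortedMap<String, String> olderCache) {
        this.newer = newerCache.entrySet().iterator();
        this.older = olderCache.entrySet().iterator();
        this.nextNewer = advance(newer);
        this.nextOlder = advance(older);
    }

    private static Map.Entry<String, String> advance(
            Iterator<Map.Entry<String, String>> it) {
        return it.hasNext() ? it.next() : null;
    }

    @Override
    public boolean hasNext() {
        return nextNewer != null || nextOlder != null;
    }

    @Override
    public Map.Entry<String, String> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Map.Entry<String, String> out;
        if (nextOlder == null || (nextNewer != null
                && nextNewer.getKey().compareTo(nextOlder.getKey()) <= 0)) {
            out = nextNewer;
            if (nextOlder != null && nextOlder.getKey().equals(nextNewer.getKey())) {
                nextOlder = advance(older);  // newer entry shadows the older one
            }
            nextNewer = advance(newer);
        } else {
            out = nextOlder;
            nextOlder = advance(older);
        }
        return out;
    }
}
```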
[ ] 6 Stream compaction without full cache snapshot (Risk: MEDIUM)
- Problem: compaction snapshots the full cache list in memory.
- Fix: stream from iterators or chunk snapshot to bounded buffers.
[ ] 7 Stream split without full cache snapshot (Risk: MEDIUM)
- Problem: split uses FULL_ISOLATION iterator backed by full list snapshot.
- Fix: use streaming iterator or chunked splitting to cap memory.
[ ] 8 Avoid full materialization in IndexInternalConcurrent.getStream (Risk: MEDIUM)
- Problem: method loads all entries into a list before returning a stream.
- Fix: return a streaming spliterator tied to iterator close.
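The fix can be sketched with the standard spliterator wrapper; the `closeAction` stands in for releasing segment resources and locks:

```java
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Lazy stream over an index iterator: entries are pulled on demand, and
// Stream.close() runs the resource-release action exactly once.
final class StreamingGetStreamSketch {
    static <T> Stream<T> stream(Iterator<T> iterator, Runnable closeAction) {
        Spliterator<T> split =
                Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED);
        return StreamSupport.stream(split, false).onClose(closeAction);
    }
}
```

Callers should consume the stream in try-with-resources so the close action actually runs.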
[ ] 9 Add eviction for heavy segment resources (Risk: MEDIUM)
- Problem: SegmentResourcesImpl caches bloom/scarce forever.
- Fix: tie resource lifetime to segment eviction or add per-resource LRU;
ensure invalidate/close releases memory.
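A per-resource LRU along these lines is one option; the access-order LinkedHashMap and eviction-closes-entry behavior here are a sketch, not the real SegmentResourcesImpl design:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal access-order LRU for heavy per-segment resources (bloom filters,
// scarce indexes); evicting an entry closes it so memory is actually released.
final class ResourceLru<K, V extends AutoCloseable> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    ResourceLru(int maxEntries) {
        super(16, 0.75f, true);   // access-order: gets refresh recency
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() > maxEntries) {
            try {
                eldest.getValue().close();  // release the heavy resource
            } catch (Exception e) {
                // a real implementation would log and continue
            }
            return true;
        }
        return false;
    }
}
```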
Low
[ ] 10 Allow cache shrink after peaks (Risk: LOW)
- Problem: UniqueCache.clear() keeps underlying HashMap capacity.
- Fix: rebuild map on clear when size exceeds a threshold; add tests.
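The rebuild-on-clear fix can be sketched as follows; the threshold value is an illustrative assumption:

```java
import java.util.HashMap;
import java.util.Map;

// Cache whose clear() rebuilds the backing map after peaks, so HashMap
// capacity is released back to the allocator rather than retained forever.
final class ShrinkableCache<K, V> {
    private static final int SHRINK_THRESHOLD = 1 << 16; // illustrative

    private Map<K, V> map = new HashMap<>();

    void put(K key, V value) {
        map.put(key, value);
    }

    V get(K key) {
        return map.get(key);
    }

    void clear() {
        if (map.size() > SHRINK_THRESHOLD) {
            map = new HashMap<>();  // drop the oversized table entirely
        } else {
            map.clear();            // small map: clearing in place is cheaper
        }
    }

    int size() {
        return map.size();
    }
}
```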
Other refactors (non-OOM)
[ ] 13 Implement a real registry lock (Risk: MEDIUM)
- Add an explicit lock around registry mutations + file ops.
- Replace/rename executeWithRegistryLock to actually serialize callers.
- Add tests for split/compact interleaving and segment visibility.
[ ] 14 Replace common-pool async with dedicated executor + backpressure (Risk: MEDIUM)
- Add/configure a dedicated executor for async API calls.
- Track in-flight tasks and wait on close; add queue/backpressure limits.
- Add tests for saturation, cancellation, and close ordering.
[ ] 15 Define IndexAsyncAdapter.close() behavior (Risk: MEDIUM)
- Decide on wait vs non-blocking close and document it.
- Add tests that match the chosen contract.
[ ] 16 Replace busy-spin loops with retry+backoff+timeout (Risk: MEDIUM)
- Use IndexRetryPolicy in SegmentsIterator and split iterator open.
- Add interrupt handling and timeout paths with clear error messaging.
- Add tests for BUSY loops and timeout behavior.
[ ] 17 Stop returning null on CLOSED in SegmentIndexImpl.get (Risk: MEDIUM)
- Decide API surface (exception vs status/Optional).
- Update callers and docs to distinguish "missing" vs "closed".
- Add tests for CLOSED/ERROR paths.
[ ] 19 Propagate MDC context to async ops and stream consumption (Risk: LOW)
- Capture MDC context on submit and reapply in async tasks.
- Wrap stream/iterator consumption with MDC scope; clear on close.
- Add tests asserting index.name appears in async logs.
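The capture-and-reapply pattern can be sketched without an SLF4J dependency; the ThreadLocal map below stands in for MDC, and the wrapper shape is an assumption:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of MDC-style context propagation: capture the submitter's context at
// submit time, reapply it inside the async task, and restore on exit.
final class ContextPropagationSketch {
    // Stand-in for SLF4J MDC: a per-thread context map (illustrative).
    static final ThreadLocal<Map<String, String>> CTX =
            ThreadLocal.withInitial(HashMap::new);

    static Runnable wrap(Runnable task) {
        Map<String, String> captured = new HashMap<>(CTX.get()); // capture on submit
        return () -> {
            Map<String, String> previous = CTX.get();
            CTX.set(captured);          // reapply in the executing thread
            try {
                task.run();
            } finally {
                CTX.set(previous);      // restore/clear on exit
            }
        };
    }
}
```

With real MDC the capture would be `MDC.getCopyOfContextMap()` and the reapply `MDC.setContextMap(...)`; the lifecycle is the same.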
[ ] 41 Unify async execution for segment index (Risk: MEDIUM)
- Route SegmentIndexImpl.runAsyncTracked and IndexAsyncAdapter.runAsyncTracked
through a shared, dedicated executor (no common pool).
- Decide whether to keep both async layers or make one delegate to the other.
- Align async close behavior and document rejection/backpressure outcomes.
[ ] 42 Revisit SegmentAsyncExecutor rejection policy (Risk: MEDIUM)
- Ensure maintenance IO never runs on caller threads.
- Choose AbortPolicy + BUSY/error mapping or custom handler.
- Update docs and metrics if behavior changes.
[ ] 43 Replace registry close polling with completion signal (Risk: MEDIUM)
- Add a close completion handle or signal in Segment.
- Update SegmentRegistry.closeSegmentIfNeeded to wait on completion rather
than polling getState().
- Ensure close-from-maintenance thread does not deadlock.
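One possible shape for such a completion handle is sketched below; the method names (signalClosed, awaitClosed) are illustrative, not the actual Segment API. The registry would block on the handle with a timeout instead of polling getState().

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a one-shot close-completion signal replacing
// the getState() polling loop in the registry close path.
final class CloseSignal {

    private final CompletableFuture<Void> closed = new CompletableFuture<>();

    /** Called once by the close worker when close work finishes. */
    void signalClosed() {
        closed.complete(null);
    }

    /** Called on close failure so waiters observe the error, not a hang. */
    void signalCloseFailed(final Throwable cause) {
        closed.completeExceptionally(cause);
    }

    /**
     * Waits for close completion; the timeout keeps a stuck close from
     * blocking the maintenance thread forever (deadlock guard).
     */
    void awaitClosed(final long timeoutMillis) throws Exception {
        closed.get(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```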
[ ] 44 Normalize split close/eviction flow (Risk: MEDIUM)
- Centralize segment close/eviction in SegmentRegistry.
- Remove direct segment.close() calls from split coordinator.
- Ensure split outcome updates mapping, eviction, and close are ordered.
[ ] 46 Align iterator isolation naming and semantics (Risk: LOW)
- Choose between FAIL_FAST/FULL_ISOLATION and the legacy
INTERRUPT_FAST/STOP_FAST terminology.
- Update docs, comments, and any mapping code consistently.
[ ] 47 Consolidate BUSY/CLOSED retry loops (Risk: LOW)
- Extract shared retry helper for segmentindex operations.
- Replace ad-hoc loops in SegmentRegistry, SegmentSplitCoordinator,
and SegmentIndexImpl.
- Keep backoff/timeout semantics and error messages consistent.
Testing/Quality
[ ] 48 Test executor saturation and backpressure paths (Risk: MEDIUM)
- Add tests for SegmentAsyncExecutor queue saturation and rejection handling.
- Add tests for SplitAsyncExecutor rejection and in-flight cleanup.
- Verify maintenance IO never runs on caller threads.
[ ] 49 Test close path interactions (Risk: MEDIUM)
- Close while segment is MAINTENANCE_RUNNING and ensure backoff/timeout works.
- Close during async operations should fail fast with clear error.
- Assert no deadlock when waiting for segment READY/CLOSED.
[ ] 50 Test split failure cleanup (Risk: MEDIUM)
- Force exceptions in split steps and assert splitsInFlight clears.
- Validate directory swap and key-to-segment map remain consistent.
- Ensure resources/locks are released on failure.
[ ] 51 Test maintenance failure transitions (Risk: MEDIUM)
- Inject failures in maintenance IO and publish phases.
- Assert segment moves to ERROR and callers see ERROR status.
- Verify rejection handling does not leave the segment in FREEZE.
Ready
- (move items here when they are scoped and ready to execute)
Deferred (segment scope, do not touch now)
Maintenance tasks
[ ] M37 Audit segment package for unused or test-only code (Risk: LOW)
- Limit class, method, and variable visibility.
- Identify unused classes/methods/fields.
- Remove code only referenced by tests or move test helpers into test scope.
- Ensure public API docs and tests remain consistent after cleanup.
[ ] M38 Review segment package for test and Javadoc coverage (Risk: LOW)
- Ensure each class has a JUnit test or document why coverage is excluded.
- Ensure each public class/method has Javadoc; add missing docs.
[ ] M39 Audit segmentindex package for unused or test-only code (Risk: LOW)
- Limit class, method, and variable visibility.
- Identify unused classes/methods/fields.
- Remove code only referenced by tests or move test helpers into test scope.
- Ensure public API docs and tests remain consistent after cleanup.
[ ] M40 Review segmentindex package for test and Javadoc coverage (Risk: LOW)
- Ensure each class has a JUnit test or document why coverage is excluded.
- Ensure each public class/method has Javadoc; add missing docs.
[ ] M41 Audit segmentregistry package for unused or test-only code (Risk: LOW)
- Limit class, method, and variable visibility.
- Identify unused classes/methods/fields.
- Remove code only referenced by tests or move test helpers into test scope.
- Ensure public API docs and tests remain consistent after cleanup.
[ ] M42 Review segmentregistry package for test and Javadoc coverage (Risk: LOW)
- Ensure each class has a JUnit test or document why coverage is excluded.
- Ensure each public class/method has Javadoc; add missing docs.
- See docs/development/segmentregistry-audit.md for audit notes.
Done (Archive)
- (keep completed items here; do not delete)
[x] 61.1 Wire SegmentHandler into key-to-segment map usage (Risk: HIGH)
- Replace direct segment references in key-to-segment map paths with
SegmentHandler usage.
- Ensure handlers are used consistently for segment access in index flows.
[x] 61.2 Refactor split algorithm around handler locks (Risk: HIGH)
- When a segment is eligible for split: acquire handler lock, re-check
eligibility under lock, then either unlock or proceed with split.
- Split apply ordering: update map on disk first, then in-memory map,
then close old segment, delete files, and finally unlock.
- Ensure failures unlock the handler and clean up temporary segments.
- Update docs/architecture/registry/registry.md to reflect handler-based locking.
[x] 61.3 Simplify SegmentHandler lock API (Risk: MEDIUM)
- Keep internal handler state as READY/LOCKED.
- lock() returns SegmentHandlerLockStatus with OK or BUSY.
- Replace token-based lock/unlock usage across registry + split flows.
- Update handler-related tests to match the new API.
[x] 60 Move registry implementation to segmentregistry package (Risk: MEDIUM)
- Move SegmentRegistryImpl, SegmentRegistryState, SegmentRegistryCache,
and SegmentRegistryResult
to org.hestiastore.index.segmentregistry.
- Update imports/usages in segmentindex and tests.
- Keep public API surface the same; verify no package-private access leaks.
[x] M41 Audit segmentregistry package for unused or test-only code (Risk: LOW)
- Limit class, method, and variable visibility.
- Identify unused classes/methods/fields.
- Remove code only referenced by tests or move test helpers into test scope.
- Ensure public API docs and tests remain consistent after cleanup.
[x] M42 Review segmentregistry package for test and Javadoc coverage (Risk: LOW)
- Ensure each class has a JUnit test or document why coverage is excluded.
- Ensure each public class/method has Javadoc; add missing docs.
[x] 59 Introduce SegmentHandler lock gate in segmentindex (Risk: HIGH)
- Add SegmentHandler with getSegment() returning SegmentHandlerResult:
OK (segment), LOCKED, and handler states READY/LOCKED.
- lock() returns a privileged handle/token that allows access to the
underlying segment while handler state is LOCKED.
- getSegment() must return LOCKED while locked for all non-privileged
callers (no segment exposure during lock).
- Wire split flow to lock via handler before opening FULL_ISOLATION
iterator, then unlock after apply/cleanup.
- Add tests: LOCKED is returned during lock; lock holder can operate;
unlock restores OK.
[x] 59.2 Concurrency: reduce redundant key-map read locks (Risk: MEDIUM)
- Make KeyToSegmentMapSynchronizedAdapter.snapshot() lock-free
(volatile snapshot + AtomicLong version).
- Keep read locks only for map-only operations; do not wrap segment calls.
- Tests: snapshot consistency + existing KeyToSegmentMapTest.
[x] 59.3 Concurrency: limit registry FREEZE to split apply (Risk: MEDIUM)
- Remove FreezeGuard usage from SegmentRegistryImpl.getSegment create/
eviction path; keep cache lock for LRU safety.
- Reserve registry FREEZE for split apply only.
- Tests: split + eviction concurrency (SegmentRegistryCacheTest,
SegmentSplitCoordinatorConcurrencyTest, integration stress).
[x] 59.1 Concurrency: remove lock-order inversion in core ops (Risk: HIGH)
- Replaced direct stable-read routing through SegmentIndexCore with
snapshot-based stable gateway reads that do not hold key-map read locks
while loading or touching segments.
- Stable reads now validate the captured topology version after
SegmentRegistry.getSegment() + segment.get(...), so route remaps
surface as BUSY instead of returning stale data.
- Tests: PartitionReadCoordinatorTest,
StableSegmentGatewayTest,
IntegrationSegmentIndexConcurrencyTest,
SegmentIndexConcurrentIT.hotRangeRandomPutsWithAutonomousSplitDoNotStall.
[x] 45 Replace spin-wait in SegmentConcurrencyGate.awaitNoInFlight (Risk: LOW)
- Replaced the old spin/yield drain loop with a short progressive
LockSupport.parkNanos(...) wait strategy, keeping FREEZE-only exit
semantics without burning CPU during close/maintenance drains.
- Removed remaining production Thread.onSpinWait() calls from the engine
runtime and moved consistency-check BUSY retries onto IndexRetryPolicy
with bounded jittered backoff.
- Tests: SegmentConcurrencyGateTest, IndexConsistencyCheckerTest,
IndexRetryPolicyTest, plus full mvn clean verify.
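The progressive park strategy described in item 45 can be sketched as follows; the step and cap values are illustrative, not the values the engine actually uses.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: progressively longer LockSupport.parkNanos waits
// in place of a spin/yield drain loop.
final class ProgressiveWait {

    /** Waits until the condition holds, parking progressively longer. */
    static void awaitCondition(final BooleanSupplier condition) {
        long parkNanos = TimeUnit.MICROSECONDS.toNanos(1);
        final long maxParkNanos = TimeUnit.MILLISECONDS.toNanos(1);
        while (!condition.getAsBoolean()) {
            // Parking burns no CPU during long close/maintenance drains,
            // yet short initial parks keep reaction latency low.
            LockSupport.parkNanos(parkNanos);
            parkNanos = Math.min(parkNanos * 2, maxParkNanos);
        }
    }
}
```

Because parkNanos tolerates spurious wakeups, the loop re-checks the condition on every iteration, preserving the FREEZE-only exit semantics mentioned above.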
[x] 52 Remove automatic compaction from segmentindex (Risk: MEDIUM)
- Drop pre-split compaction in SegmentSplitCoordinator and remove
SegmentSplitterPolicy.shouldBeCompactedBeforeSplitting + related retry
logic.
- Simplify split planning to use estimated key counts directly (remove
compaction/tombstone hints from SegmentSplitterPolicy or replace with a
minimal estimate helper).
- Keep SegmentIndex.compact / compactAndWait as the only
segmentindex-triggered compaction entry point; update Javadocs to reflect
compaction being handled inside the segment package otherwise.
- Update tests that construct SegmentSplitterPolicy and add coverage that
split does not call Segment.compact while user-invoked compaction still
does.
[x] 1 Rename maxNumberOfKeysInSegmentWriteCacheDuringFlush to maxNumberOfKeysInSegmentWriteCacheDuringMaintenance everywhere, including all configuration setters, getters, and all other usages.
[x] 2 When the write cache reaches maxNumberOfKeysInSegmentWriteCacheDuringMaintenance, respond to put with BUSY.
[x] 3 UniqueCache should not use a read/write reentrant lock; thread safety is already a property of the underlying ConcurrentHashMap.
[x] 4 Enforce maxNumberOfSegmentsInCache in SegmentRegistry (Risk: MEDIUM)
- Problem: segments are cached unbounded; memory grows as segments grow.
- Fix: implement LRU or size-bounded cache; evict + close segments and
invalidate resources on eviction.
[x] 18 Provide index-level FULL_ISOLATION streaming (Risk: MEDIUM)
- Add overload or option to request FULL_ISOLATION on index iterators.
- Implement iterator that holds exclusivity across segments safely.
- Add tests for long-running scans during maintenance.
[x] 23 Refactor Segment.close() to async fire-and-forget with READY-only entry (Risk: MEDIUM)
- Change Segment to drop CloseableResource and return
SegmentResult<Void> from close().
- Close starts only in READY: transition to FREEZE, drain, optionally
flush write cache, then run close work on maintenance thread.
- Completion marks CLOSED, releases locks/resources, and stops admissions.
- Move close-state tracking into segment index (avoid Segment.wasClosed()).
- Update state machine/gate/docs/tests to match the new close lifecycle.
[x] 24 Add integration test: in-memory segment lock prevents double-open (Risk: LOW)
- Create an integration test that opens a segment in a directory and
asserts a second open in the same directory fails (lock enforcement).
[x] 25 Simplify Segment.flush()/compact() to return status only (Risk: MEDIUM)
- Remove CompletionStage return values from flush() and compact().
- Operation completion is observable when segment state returns to READY.
- Update callers, docs, and tests that wait on completion stages.
[x] 25 Create directory API and layout helpers (Risk: HIGH)
- Add Directory.openSubDirectory(String)
and lifecycle helpers Directory.mkdir(String) / Directory.rmdir(String).
- Implement in FsDirectory and in-memory
MemDirectory equivalents; define semantics for non-empty rmdir.
- Add SegmentDirectoryLayout (or similar) that builds names for:
index, scarce, bloom, delta, properties, and lock files.
- Add tests for directory creation and layout mapping.
[x] 26 Introduce segment-rooted SegmentFiles (Risk: HIGH)
- Add a SegmentFiles constructor that accepts a segment root
Directory (instead of a flat base directory + id).
- Keep legacy flat layout working (auto-detect existing files, or flag in
SegmentBuilder).
- Update SegmentBuilder to create/use the segment root directory.
- Add tests that both layouts open the same data correctly.
[x] 27 Add per-segment .lock file (Risk: MEDIUM)
- Add segment.lock (or .lock) inside the segment directory.
- Acquire lock on segment open; release on close. Fail fast on lock held.
- Add stale-lock recovery policy (manual delete or metadata timestamp).
- Add tests for lock contention and cleanup.
[x] 28 Shared properties file structure (Risk: MEDIUM)
- Introduce a common property schema used by segment + segmentindex
packages (e.g. IndexPropertiesSchema).
- Store schema version and required keys; add migration helpers.
- Update SegmentPropertiesManager and IndexConfigurationStorage
to use the shared schema.
[x] 29 Compact flow for directory layout (publish protocol) (Risk: HIGH)
- IO phase (MAINTENANCE_RUNNING):
- Create a new directory, e.g. segment-00001.next/ or versioned
segment-00001/v2/.
- Write new index/scarce/bloom/cache files there.
- Write properties with state PREPARED + metadata.
- Publish phase (short FREEZE):
- Mark new directory as ACTIVE in properties (or update a pointer
file segment-00001.active).
- Reload SegmentFiles/SegmentResources to the new root.
- Bump version and return to READY.
- Cleanup:
- Delete old directory only after publish and resource reload.
- Add startup recovery for PREPARED without ACTIVE.
- Align with items 11/12 (atomic swaps + map updates).
[x] 30 Split + replace updates (Risk: HIGH)
- Update split/rename logic to use directory swaps or pointer updates.
- Ensure registry + segmentindex metadata remain consistent.
- Add tests for crash recovery and partial swaps.
[x] 31 Segment layout uses versioned file names in a single directory (Risk: HIGH)
- Name index/scarce/bloom/delta as vNN-* (for example v01-index.sst,
v01-scarce.sst, v01-bloom-filter.bin, v01-delta-0000.cache).
- Store the active version and counters in manifest.txt (no .active
pointer).
- Use zero-padded 2-digit versions and 4-digit delta ids.
[x] 32 Builder/files treat the provided directory as the segment home (Risk: HIGH)
- Require Segment.builder(Directory) for construction.
- Lock + properties live inside the segment directory.
- Resolve active version from properties or detected index files.
[x] 33 Compaction/flush publish is memory-only (Risk: HIGH)
- IO phase writes versioned files and property updates.
- Publish swaps in-memory version/resources and bumps iterator version.
- Cleanup old version files asynchronously.
[x] 34 Registry/tests align with single-directory versioning (Risk: MEDIUM)
- Registry passes segment directories; no active-directory switching.
- Update tests to accept versioned names and per-segment directories.
[x] 35 Remove unused close monitor in SegmentConcurrencyGate (Risk: LOW)
- Remove closeMonitor and signalCloseMonitor since nothing waits on them.
- Keep drain behavior in awaitNoInFlight() unchanged.
[x] 36 Consolidate in-flight read/write counters in SegmentConcurrencyGate (Risk: LOW)
- Replace inFlightReads/inFlightWrites with a single counter.
- Keep admission rules and drain behavior unchanged.
- Update any stats or tests that rely on read/write split (if introduced).
[x] 11 Remove segmentState from segment properties schema (Risk: MEDIUM)
- Remove SegmentKeys.SEGMENT_STATE from IndexPropertiesSchema.
- Update SegmentPropertiesManager to drop getState/setState usage.
- Decide migration behavior for existing properties files.
[x] 12 Add getMaxNumberOfDeltaCacheFiles() to Segment (Risk: LOW)
- Implement in SegmentImpl.
- Update any callers/tests that need the accessor.
[x] 13 Add maxNumberOfDeltaCacheFiles to IndexConfiguration + builder (Risk: MEDIUM)
- Add config property, validation, defaults, and persistence.
- Plumb through SegmentBuilder/SegmentConf as needed.
[x] 14 Wire delta cache file cap into SegmentMaintenancePolicyThreshold (Risk: MEDIUM)
- Add the max file count to policy constructor/state.
- Pass the value from configuration.
[x] 15 Enforce delta cache file cap in policy (Risk: MEDIUM)
- In SegmentMaintenancePolicyThreshold (~line 44), trigger maintenance
when delta cache file count exceeds the cap.
[x] 16 Enforce segment lock test on open (Risk: MEDIUM)
- Add a test that opening a segment with an existing .lock fails.
- Cover both in-memory and filesystem-backed directories.
[x] 17 Document locked-directory behavior in SegmentBuilder (Risk: LOW)
- Clarify how builder reacts when the segment directory is already locked.
[x] 18 Acquire segment lock before prepareBuildContext() (Risk: MEDIUM)
[x] 19 Add SegmentRegistryResult + status + adapters (Risk: MEDIUM)
- Define result/status types and adapters to/from SegmentResult.
- Unit tests only; no wiring.
[x] 20 Add registry state enum + gate (Risk: MEDIUM)
- Define SegmentRegistryState and a small gate/state holder.
- Unit tests only; no integration.
[x] 21 Introduce SegmentRegistry interface + SegmentRegistryImpl (Risk: MEDIUM)
- Keep interface minimal and keep SegmentResult returns for now.
- Rename existing class to impl and update call sites in same step.
[x] 22 Add SegmentRegistrySyncAdapter with BUSY retry (Risk: MEDIUM)
- Wrap SegmentRegistry and retry BUSY (use IndexRetryPolicy).
[x] 23 Wire state gate into impl (Risk: HIGH)
- BUSY only from registry state; FREEZE only around map changes.
- Keep SegmentResult API to avoid broad changes.
[x] 24 Switch registry API to SegmentRegistryResult (Risk: HIGH)
- Introduce SegmentRegistryLegacyAdapter to keep old callers working.
- Migrate call sites/tests, then remove legacy adapter.
[x] 53.1 Split "apply" DTO (Risk: LOW)
- Introduce a small DTO for split apply (oldId, lowerId, upperId,
min/max keys, status).
- Unit tests for DTO invariants.
[x] 53.2 Split worker extraction (Risk: MEDIUM)
- Refactor split execution to: open FULL_ISOLATION iterator, run split on
maintenance executor, return DTO without touching registry or map.
- Ensure iterator is closed in all paths.
- Unit tests for result wiring.
[x] 53.3 Registry apply entry point (Risk: MEDIUM)
- Add registry apply method that (a) FREEZE, (b) update cache
(remove old, add new ids), (c) exit FREEZE.
- Keep key-map lock separate.
- Unit tests for cache mutation under FREEZE.
[x] 53.4 Key-map persistence (Risk: MEDIUM)
- Update key-to-segment map using its own lock/adapter.
- Persist map file after in-memory registry apply.
- Tests that map persistence order is enforced.
[x] 53.5 Old segment deletion (Risk: MEDIUM)
- Delete old segment directory only after map persistence and after
iterator/segment locks are released.
- Tests that deletion never happens before map persistence.
[x] 53.6 Lock order contract (Risk: LOW)
- Enforce lock order (segment -> registry -> map; release map -> registry
-> segment) and document in code.
- Add a small test or assertion helper to catch order violations.
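One possible shape for that assertion helper is sketched below; the rank constants and class name are hypothetical. Each lock level gets a rank matching the documented segment -> registry -> map order, and acquiring a rank at or below the highest rank already held fails fast.

```java
// Hypothetical sketch: ThreadLocal rank tracking that catches
// out-of-order lock acquisition (segment -> registry -> map).
final class LockOrderGuard {

    static final int SEGMENT = 1;
    static final int REGISTRY = 2;
    static final int MAP = 3;

    private static final ThreadLocal<Integer> HIGHEST_HELD =
            ThreadLocal.withInitial(() -> 0);

    /** Call before acquiring; fails fast on an out-of-order acquire. */
    static void checkAcquire(final int rank) {
        if (rank <= HIGHEST_HELD.get()) {
            throw new IllegalStateException("Lock order violation: "
                    + "acquiring rank " + rank + " while holding rank "
                    + HIGHEST_HELD.get());
        }
        HIGHEST_HELD.set(rank);
    }

    /** Call after release, passing the rank held before the acquire. */
    static void released(final int previousRank) {
        HIGHEST_HELD.set(previousRank);
    }
}
```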
[x] 53.7 Split concurrency scenarios (Risk: HIGH)
- Tests:
- split does not run under registry FREEZE (short window)
- split returns BUSY on lock conflict and retries safely
- concurrent get/put during split never sees missing segment mapping
[x] 58.1 Split: keep split IO outside registry freeze (Risk: HIGH)
- SegmentSplitCoordinator.split(...): ensure all IO (iterator open, writes)
happens before any registry FREEZE.
- SegmentSplitStepOpenIterator: keep FULL_ISOLATION acquisition once per split.
- SegmentSplitCoordinator.hasLiveEntries(...): now uses FAIL_FAST to
avoid a second FULL_ISOLATION lock.
- Tests may fail if ordering assumptions change; fix after step 58.4.
[x] 58.2 Split: invert lock order for apply phase (Risk: HIGH)
- SegmentSplitCoordinator.applySplitPlan(...): remove outer
keyToSegmentMap.withWriteLock(...).
- SegmentRegistryImpl.applySplitPlan(...): acquire registry freeze first,
then call onApplied which acquires key-map write lock.
- Update lock-order enforcement to match registry -> key-map.
[x] 58.3 Split: propagate lock-order flags into key-map adapter (Risk: MEDIUM)
- KeyToSegmentMapSynchronizedAdapter: mark key-map write-lock ownership
during enforcement checks.
- Ensure registry checks validate registry ownership before key-map lock.
[x] 58.4 Split: finalize apply/cleanup ordering (Risk: MEDIUM)
- Ensure apply evicts old segment instance and closes it via
SegmentRegistryImpl.closeSegmentInstance(...).
- Keep key-map flush outside registry freeze:
keyToSegmentMap.optionalyFlush() only after apply OK.
- Delete old segment files only after apply succeeds and locks released.
[x] 58.5 Split: test alignment (Risk: MEDIUM)
- Add/update tests to assert no directory swap in split flow.
- Add tests for enforced lock order (registry -> key-map).
- Add tests for split failure cleanup of new segments.
[x] 63 SegmentIdAllocator in segmentregistry (Risk: MEDIUM)
- Add SegmentIdAllocator interface and directory-backed implementation.
- Scan Directory.getFileNames() for segment directories named
segment-00001 (prefix segment- + 5 digits) and initialize next id
to max+1 (or 1 when none found).
- Allocate ids with thread-safe counter.
[x] 64 Include directories in Directory.getFileNames() (Risk: LOW)
- Ensure Directory.getFileNames() returns subdirectory names as well.
- Update MemDirectory to include subdirectory names in its stream.
- Verify no tests rely on file-only behavior.
[x] 65 Remove id allocation from key-to-segment map (Risk: MEDIUM)
- Remove nextSegmentId and findNewSegmentId() from KeyToSegmentMap
and its synchronized adapter.
- Remove updates to nextSegmentId in tryExtendMaxKey/updateMaxKey.
[x] 66 Wire allocator into registry + index (Risk: MEDIUM)
- Update SegmentRegistryImpl to use SegmentIdAllocator instead of
supplier.
- Update SegmentIndexImpl wiring and split coordinator to use registry
allocation only.
- Update tests to stub allocator or use directory-backed allocator.
[x] 67 Tests + docs for allocator move (Risk: LOW)
- Add allocator tests (empty dir, max id, thread-safety).
- Update docs/architecture/registry/registry.md to reflect registry allocator.
[x] 62 Add SegmentRegistryBuilder modeled after Segment.builder(...) (Risk: MEDIUM)
- Add SegmentRegistryBuilder in segmentregistry with required inputs
(directory, type descriptors, config, maintenance executor).
- Provide optional setters for SegmentIdAllocator and SegmentFactory.
- Add static factory SegmentRegistry.builder(...) (or on impl) to return builder.
- Move default wiring (factory + allocator creation) into builder.
- Keep SegmentRegistryImpl constructor with full DI for tests.
- Update SegmentIndexImpl (and other callers) to use the builder.
- Add unit tests for missing required fields and default wiring.
[x] 68 Align split apply with registry FREEZE + lock-order enforcement (Risk: MEDIUM)
- Expose registry FREEZE in SegmentRegistryAccess (or equivalent) so
split apply can run under FREEZE while holding handler + key-map locks.
- While FREEZE is active, expose registry lock ownership so key-map lock
order enforcement can be enabled safely.
- Wrap key-map apply + cache eviction inside the FREEZE window.
[x] 69 Separate cache eviction from file deletion in split apply (Risk: MEDIUM)
- Add registry operation to evict a specific segment from cache while the
handler lock is held (no file deletion).
- After apply: evict old segment under handler+FREEZE, release iterator,
unlock handler, then delete old segment files via registry helper.
- Keep deleteSegment behavior for general callers unchanged.
[x] 70 Apply-failure should mark registry ERROR (Risk: LOW)
- When split apply fails mid-update, set the registry gate to ERROR and
surface the failure (avoid silent BUSY loops).
- Add tests for apply-failure transitions.
[x] 71 SegmentRegistry: expose NOT_FOUND for missing segments (Risk: LOW)
- Add NOT_FOUND to SegmentRegistryResultStatus + factory method.
- Return NOT_FOUND when getSegment targets a missing directory.
- Keep createSegment creating new segments even when others exist.
- Tests: missing-segment lookup, status plumbing.
[x] 72 SegmentRegistryBuilder: configure only via with* methods (Risk: LOW)
- Remove constructor parameters from SegmentRegistryBuilder.
- Ensure all required inputs are set via with... methods.
- Update call sites and tests to use the builder setters.
[x] 73 SegmentRegistry handler-backed cache (Risk: MEDIUM)
- Make SegmentRegistryCache store SegmentHandler per SegmentId
(segment + lock state as one entry).
- Keep SegmentRegistry.getSegment returning SegmentRegistryResult
to signal registry state; map LOCKED to BUSY.
- Add internal accessors for handler-only flows (split/evict) without
exposing handler in the public registry API.
- Update eviction logic to skip LOCKED handlers and keep cache/handler
in sync.
- Tests: locked entry not evicted, handler/segment consistency, BUSY
returned when handler locked.
[x] 74 RegistryAccess: lock via SegmentHandler (Risk: MEDIUM)
- Add internal accessor that returns the SegmentHandler for a
segmentId + expected segment instance (BUSY/ERROR when mismatch).
- Remove lockSegmentHandler/unlockSegmentHandler from
SegmentRegistryLocking and SegmentRegistryAccess.
- Update SegmentRegistryAccessAdapter to expose handler instead of
lock/unlock methods.
[x] 75 Split flow: use handler lock directly (Risk: MEDIUM)
- In SegmentSplitCoordinator, acquire handler via registry access and
call handler.lock()/handler.unlock() directly.
- Keep BUSY mapping when handler is locked.
- Ensure eviction path still validates handler instance + state.
[x] 76 Tests + cleanup for handler locking (Risk: LOW)
- Update tests that currently call registry lock/unlock to use handler
locking instead.
- Remove unused lock methods from SegmentRegistryImpl.
- Verify eviction skips locked handlers and BUSY is returned when locked.
[x] 77 SegmentRegistry target-state rollout from docs/architecture/registry/registry.md (Risk: HIGH)
- Goal: make implementation fully match the documented registry model
(state gate + per-key Entry state machine + single-flight load +
bounded cache eviction + unload semantics).
- Global rule: every step in 77.x must preserve behavioral parity with
docs/architecture/registry/registry.md. If behavior must change, update
registry.md and diagrams first in the same PR before code changes.
- Hard constraints:
- no global lock in get hot path
- unrelated keys must not block each other
- per-key wait only on the same Entry
- LOADING waits, UNLOADING maps to BUSY
- load/open failures are exception-driven
- Exit criteria:
- behavior parity with docs/architecture/registry/registry.md and
docs/architecture/images/registry-seq*.plantuml
- all new/updated tests green
- no flakiness in repeated concurrency runs
[x] 77.1 Freeze target contract and remove ambiguity (Risk: HIGH)
- Pin docs/architecture/registry/registry.md + diagrams as source of truth.
- Explicitly list non-negotiable runtime rules in code comments/Javadocs:
- state gate mapping: READY normal, FREEZE -> BUSY,
CLOSED -> CLOSED, ERROR -> ERROR
- cache state mapping: LOADING wait, UNLOADING -> BUSY
- failed unload leaves UNLOADING (documented behavior)
- Acceptance:
- no contradictory comments/Javadocs in segmentregistry package
- docs and code contracts use same method names
[x] 77.2 Implement/align per-key Entry API contract (Risk: HIGH)
- Ensure SegmentRegistryCache.Entry exposes and follows:
- tryStartLoad()
- waitWhileLoading(currentAccessCx)
- finishLoad(value)
- fail(exception)
- tryStartUnload()
- finishUnload()
- getEvictionOrder()
- Ensure lock/condition is strictly per-entry (no cross-key monitor).
- Acceptance:
- transitions only: MISSING->LOADING->READY->UNLOADING->MISSING
- invalid transitions return fast/fail predictably
[x] 77.3 Align get(key) miss path to single-flight semantics (Risk: HIGH)
- Use putIfAbsent race semantics correctly:
- winner: entryInMap == null then load
- loser: wait on the existing entry from map
- Ensure wait target is the entry stored in map, not a local temporary.
- Ensure load failure path calls fail(exception), wakes waiters, and
removes the expected entry from map.
- Acceptance:
- exactly one loader execution per key under high contention
- all losers observe winner result or propagated exception
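The miss path above can be sketched with a deliberately simplified Entry; SegmentRegistryCache.Entry has more states (UNLOADING, eviction order) than this stand-in, so treat it as an illustration of the putIfAbsent race only.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical sketch: single-flight load. The putIfAbsent winner
// loads; losers wait on the entry stored in the map; failures wake
// waiters and remove the entry so a later get can retry.
final class SingleFlightCache<K, V> {

    private static final class Entry<V> {
        private V value;
        private RuntimeException failure;
        private boolean done;

        synchronized V await() {
            while (!done) {
                try {
                    wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted", e);
                }
            }
            if (failure != null) {
                throw failure; // losers observe the winner's exception
            }
            return value;
        }

        synchronized void finishLoad(final V v) {
            value = v;
            done = true;
            notifyAll();
        }

        synchronized void fail(final RuntimeException e) {
            failure = e;
            done = true;
            notifyAll();
        }
    }

    private final ConcurrentMap<K, Entry<V>> map = new ConcurrentHashMap<>();

    V get(final K key, final Function<K, V> loader) {
        final Entry<V> fresh = new Entry<>();
        final Entry<V> existing = map.putIfAbsent(key, fresh);
        if (existing != null) {
            // Loser: wait on the entry in the map, never a local temporary.
            return existing.await();
        }
        try {
            final V value = loader.apply(key);
            fresh.finishLoad(value);
            return value;
        } catch (RuntimeException e) {
            fresh.fail(e);
            map.remove(key, fresh); // remove only the expected entry
            throw e;
        }
    }
}
```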
[x] 77.4 Align get(key) hit path semantics (Risk: HIGH)
- READY: immediate return + recency update.
- LOADING: block only on same entry until READY/failure.
- UNLOADING: do not wait; return BUSY to caller.
- Acceptance:
- no waiting on keys in UNLOADING
- no blocking between unrelated keys
[x] 77.5 Implement bounded eviction flow per docs (Risk: HIGH)
- Keep capacity enforcement in cache layer.
- Candidate selection:
- LRU by accessCx
- exclude requested key in removeLastRecentUsedSegment(exceptSegmentId)
- only READY candidates can move to UNLOADING
- Start close asynchronously, remove only after close success.
- Acceptance:
- eviction never unloads exceptSegmentId
- failed tryStartUnload retries candidate selection without global stall
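The candidate-selection rules above can be sketched as follows; the CacheEntry type and method name are simplified stand-ins for the registry cache internals, kept only to show the READY-only, LRU-by-accessCx, except-key filtering.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: pick the least-recently-used READY entry for
// eviction, never the excepted (currently requested) segment id.
final class EvictionPolicy {

    enum State { LOADING, READY, UNLOADING }

    static final class CacheEntry {
        final String segmentId;
        final State state;
        final long accessCx; // recency counter; lower means older

        CacheEntry(final String segmentId, final State state,
                final long accessCx) {
            this.segmentId = segmentId;
            this.state = state;
            this.accessCx = accessCx;
        }
    }

    /** LRU by accessCx, READY candidates only, excluding exceptId. */
    static Optional<String> selectCandidate(
            final Map<String, CacheEntry> cache, final String exceptId) {
        return cache.values().stream()
                .filter(e -> e.state == State.READY)
                .filter(e -> !e.segmentId.equals(exceptId))
                .min(Comparator.comparingLong(e -> e.accessCx))
                .map(e -> e.segmentId);
    }
}
```

If tryStartUnload on the chosen candidate fails, selection simply runs again over the remaining entries, which is what keeps a failed unload from stalling the whole cache.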
[x] 77.6 Lifecycle executor behavior and failure handling (Risk: HIGH)
- Verify load/open and close/unload execution contexts follow design:
- load for seq03 scenario in caller thread
- close/unload on lifecycle executor thread
- Define exact reaction to close failure:
- keep entry UNLOADING
- subsequent get returns BUSY
- do not remove cache entry
- Acceptance:
- no caller-thread close IO
- failed close path is deterministic and test-covered
[x] 77.7 Registry gate lifecycle alignment (Risk: MEDIUM)
- Ensure startup: FREEZE -> READY.
- Ensure close flow: READY -> FREEZE -> CLOSED.
- Ensure idempotent close and terminal ERROR semantics.
- Acceptance:
- gate transitions are atomic and race-safe under concurrent calls
- status mapping is consistent for all operations
[x] 77.8 API/status cleanup to match exception-driven load policy (Risk: MEDIUM)
- Preserve SegmentRegistryAccess for status-oriented flows.
- Keep load/open failure as propagated runtime exception from registry
load paths (per docs).
- Remove or deprecate status branches that conflict with this policy.
- Acceptance:
- no mixed behavior where same failure is sometimes status, sometimes throw
[x] 77.9 Unit tests for Entry/cache state machine (Risk: HIGH)
- Extend SegmentRegistryCacheTest with deterministic tests:
- single-flight: same key, many threads -> loader called once
- wait-on-loading: loser threads block and then return same value
- load failure wakeup: all waiters receive same failure
- unloading maps to BUSY (no waiting)
- eviction excludes exceptSegmentId
- close failure leaves UNLOADING
- Use CountDownLatch/CyclicBarrier to force races.
- Add @Timeout to every concurrency-sensitive test.
[x] 77.10 Registry-level behavior tests (Risk: HIGH)
- Update/add tests in:
- SegmentRegistryImplTest
- SegmentRegistryStateMachineTest
- SegmentRegistryAccessImplTest
- Verify:
- gate mapping (FREEZE/BUSY, CLOSED/CLOSED, ERROR/ERROR)
- startup transition (FREEZE->READY)
- getSegment behavior across READY/LOADING/UNLOADING
- exception propagation on load/open failure
[x] 77.11 High-concurrency integration verification (Risk: HIGH)
- Extend/execute:
- IntegrationSegmentIndexConcurrencyTest
- SegmentIndexImplConcurrencyTest
- SegmentSplitCoordinatorConcurrencyTest
- Add focused registry stress tests (new class):
- many threads on same key (single-flight proof)
- many threads on different keys (independence proof)
- eviction + concurrent gets + split coordinator interaction
- Run repeated stress cycles to catch flakes.
- Completed:
- Added and executed
src/test/java/org/hestiastore/index/segmentindex/SegmentRegistryConcurrencyStressTest.java.
- Passed:
mvn -q -Dtest=IntegrationSegmentIndexConcurrencyTest,SegmentIndexImplConcurrencyTest,SegmentSplitCoordinatorConcurrencyTest,SegmentRegistryConcurrencyStressTest test
- Flake gate passed: 20/20 repeated runs with 0 failures.
[x] 77.12 Quality gates and release checklist (Risk: HIGH)
- Mandatory local gates before merge:
- targeted unit tests:
mvn -q -Dtest=SegmentRegistryCacheTest,SegmentRegistryImplTest,SegmentRegistryStateMachineTest test
- concurrency/integration tests:
mvn -q -Dtest=IntegrationSegmentIndexConcurrencyTest,SegmentIndexImplConcurrencyTest,SegmentSplitCoordinatorConcurrencyTest test
- full verification:
mvn verify
- Flake gate:
- rerun concurrency suite N times (recommended N=20) and require 0 flakes.
- Code quality gate:
- no TODO/FIXME left in touched files
- Javadocs reflect final behavior
- diagrams and registry.md updated if behavior changed
- Completed:
- Passed targeted unit tests:
mvn -q -Dtest=SegmentRegistryCacheTest,SegmentRegistryImplTest,SegmentRegistryStateMachineTest test
- Passed concurrency/integration tests:
mvn -q -Dtest=IntegrationSegmentIndexConcurrencyTest,SegmentIndexImplConcurrencyTest,SegmentSplitCoordinatorConcurrencyTest,SegmentRegistryConcurrencyStressTest test
- Passed full verification:
mvn verify
- TODO/FIXME scan on touched files: none found.
[x] 77.13 Rollout and fallback plan (Risk: MEDIUM)
- Deliver in small PRs matching 77.1-77.12 order.
- After each PR:
- run targeted regression suite
- update docs/architecture/registry/registry.md if contract changed
- Keep a temporary feature flag only if needed for safe migration.
- Remove fallback/compatibility code when final parity is reached.
- Completed:
- Work delivered incrementally following 77.1 -> 77.12 sequence.
- Regression suites executed after key steps and before final merge gate.
- No temporary feature flag required for this rollout.