
🧱 Segment implementation

A segment is a core part of the index. It represents one sorted string table (SST) file with:

  • Partial consistency - an iterator either returns consistent data or stops working
  • Support for writing changes into delta files
  • A Bloom filter for quickly checking whether a key is in the index
  • A scarce index for faster lookups in the main index
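
To illustrate how the last two items can speed up lookups, here is a minimal sketch in Python. It is not the project's implementation: the sampling interval `STEP`, the `BloomFilter` class and the `get` function are all assumptions made for this example. The Bloom filter rejects most absent keys without touching the data, and the scarce index narrows the search to one small block of the sorted main index.

```python
import bisect
import hashlib

# Assumed data: the sorted entries of a main index file.
main_index = [("a", 20), ("b", 30), ("c", 40), ("d", 50), ("e", 60), ("f", 70)]
STEP = 3  # assumed sampling interval of the scarce index


class BloomFilter:
    """Tiny illustrative Bloom filter (hypothetical, not the project's class)."""

    def __init__(self, size_bits=64, hash_count=3):
        self.size_bits = size_bits
        self.hash_count = hash_count
        self.bits = 0

    def _positions(self, key):
        # Derive several bit positions from seeded SHA-256 digests.
        for seed in range(self.hash_count):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for position in self._positions(key):
            self.bits |= 1 << position

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> position) & 1 for position in self._positions(key))


bloom = BloomFilter()
for key, _ in main_index:
    bloom.add(key)

# Scarce index: the largest key of each block and the block's start position.
scarce_keys = []
scarce_positions = []
for start in range(0, len(main_index), STEP):
    block = main_index[start:start + STEP]
    scarce_keys.append(block[-1][0])
    scarce_positions.append(start)


def get(key):
    if not bloom.might_contain(key):
        return None  # skip the index entirely
    # Find the first block whose largest key is >= the searched key.
    i = bisect.bisect_left(scarce_keys, key)
    if i == len(scarce_keys):
        return None
    start = scarce_positions[i]
    for k, v in main_index[start:start + STEP]:
        if k == key:
            return v
    return None
```

With this layout a lookup scans at most `STEP` entries instead of the whole file, at the cost of keeping one key per block in memory.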

🔄 Segment put/get and iterate consistency

Operations like write and get should always be consistent: what is written is what is read. Iteration behaves differently. Rather than provide stale data, the iterator stops providing any data.

Suppose the main index contains the following key-value entries:

<a, 20>
<b, 30>
<c, 40>

The segment cache contains the following entries:

<a, 25>
<e, 28>
<b, tombstone>

When a user iterates through the segment data, the following cases can occur:

Case 1 - Read data

iterator.read() --> <a, 25>
iterator.read() --> <c, 40>
iterator.read() --> <e, 28>
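
The result of Case 1 follows from merging the main index with the segment cache: cached values override main index values, and tombstones hide deleted keys. A minimal Python sketch of that merge, assuming a hypothetical `TOMBSTONE` marker and a helper named `merged_entries` (neither name comes from the project):

```python
TOMBSTONE = object()  # assumed marker for a deleted key


def merged_entries(main_index, segment_cache):
    """Yield entries in key order, with cache entries overriding the main index."""
    merged = dict(main_index)
    merged.update(segment_cache)  # newer cached values win
    for key in sorted(merged):
        value = merged[key]
        if value is not TOMBSTONE:  # deleted keys are skipped
            yield key, value


main_index = [("a", 20), ("b", 30), ("c", 40)]
segment_cache = [("a", 25), ("e", 28), ("b", TOMBSTONE)]

result = list(merged_entries(main_index, segment_cache))
# result == [("a", 25), ("c", 40), ("e", 28)]
```

This sketch builds the merged view in memory for brevity; a real implementation would more likely merge two sorted streams.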

Case 2 - Change data

iterator.read() --> <a, 25>
segment.write(c, 10)
iterator.read() --> null

Any segment write operation breaks the segment iterator. This is the simpler way to guarantee segment consistency.
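
One common way to get this behavior is a version counter: every write increments it, and an iterator stops as soon as the counter no longer matches the value it saw when it was opened. The Python sketch below reproduces Case 2 under that assumption. The classes and the counter are illustrative only and may differ from how the project detects writes.

```python
class Segment:
    """Hypothetical segment holding already merged entries."""

    def __init__(self, entries):
        self._entries = dict(entries)
        self._version = 0  # incremented on every write

    def write(self, key, value):
        self._entries[key] = value
        self._version += 1  # invalidates all open iterators

    def iterator(self):
        return SegmentIterator(self)


class SegmentIterator:
    def __init__(self, segment):
        self._segment = segment
        self._version = segment._version  # version seen at open time
        self._snapshot = sorted(segment._entries.items())
        self._position = 0

    def read(self):
        # Stop providing data once the segment has changed.
        if self._segment._version != self._version:
            return None
        if self._position >= len(self._snapshot):
            return None
        entry = self._snapshot[self._position]
        self._position += 1
        return entry


segment = Segment([("a", 25), ("c", 40), ("e", 28)])
iterator = segment.iterator()
first = iterator.read()   # ("a", 25)
segment.write("c", 10)
second = iterator.read()  # None, the write broke the iterator
```

A new iterator opened after the write sees the updated value for `c`.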

🗄️ Caching of segment data

The following objects are cached in a segment:

  • SegmentDeltaCache - contains changed key-value entries from the segment
  • BloomFilter - bloom filter data
  • ScarceIndex - scarce index data

A few classes provide lazy loading of segment data and the flexibility to cache it. Segment data are managed by the following classes:

Sequence of calls when cached data are required

A SegmentData object can contain SegmentDeltaCache, BloomFilter and ScarceIndex objects. All of them are lazily loaded by SegmentDataSupplier. For a closer description of the classes, look at the source code.
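
The lazy loading described above can be sketched in Python as follows. Only the class names `SegmentData` and `SegmentDataSupplier` come from the text. The method names, the `load_log` list and the returned stand-in values are assumptions for illustration, so the real API is likely to differ.

```python
from functools import cached_property


class SegmentDataSupplier:
    """Hypothetical supplier; real loading would read the segment files."""

    def __init__(self):
        self.load_log = []  # records which parts were actually loaded

    def load_delta_cache(self):
        self.load_log.append("delta_cache")
        return {"a": 25, "e": 28}

    def load_bloom_filter(self):
        self.load_log.append("bloom_filter")
        return {"a", "b", "c", "e"}  # stand-in for real Bloom filter data

    def load_scarce_index(self):
        self.load_log.append("scarce_index")
        return [("c", 0)]  # stand-in for real scarce index data


class SegmentData:
    """Each part is loaded on first access and then kept in memory."""

    def __init__(self, supplier):
        self._supplier = supplier

    @cached_property
    def delta_cache(self):
        return self._supplier.load_delta_cache()

    @cached_property
    def bloom_filter(self):
        return self._supplier.load_bloom_filter()

    @cached_property
    def scarce_index(self):
        return self._supplier.load_scarce_index()


supplier = SegmentDataSupplier()
data = SegmentData(supplier)
loaded_before = list(supplier.load_log)  # nothing is loaded up front
data.delta_cache
data.delta_cache                         # second access reuses the loaded object
loaded_after = list(supplier.load_log)
```

Only the parts that are actually requested are loaded, so a lookup that needs just the Bloom filter never pays for reading the delta cache.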

The following image shows that SegmentDatafactory can be referenced from SegmentDataProviderSimple, which is the simplest implementation that merely holds segment data from the factory. The class SegmentDataProviderFromMainCache interacts with the main index cache where the segment data is stored. Data may be evicted from the cache without any notification.
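
The difference between the two providers can be sketched in Python. The provider class names follow the text, and the factory is named `SegmentDataFactory` in this sketch. The `create` and `get` methods, the dictionary used as the main cache and the `segment_id` parameter are assumptions for illustration. Because cached data may disappear without notification, the cache-backed provider has to be ready to rebuild it on every access.

```python
class SegmentDataFactory:
    """Hypothetical factory; counts how often segment data is built."""

    def __init__(self):
        self.created = 0

    def create(self):
        self.created += 1
        return {"delta_cache": {}, "bloom_filter": set(), "scarce_index": []}


class SegmentDataProviderSimple:
    """Holds the segment data from the factory for its whole lifetime."""

    def __init__(self, factory):
        self._factory = factory
        self._data = None

    def get(self):
        if self._data is None:
            self._data = self._factory.create()
        return self._data


class SegmentDataProviderFromMainCache:
    """Looks the data up in a shared cache and rebuilds it after eviction."""

    def __init__(self, segment_id, main_cache, factory):
        self._segment_id = segment_id
        self._main_cache = main_cache  # a plain dict stands in for the cache
        self._factory = factory

    def get(self):
        data = self._main_cache.get(self._segment_id)
        if data is None:  # never loaded, or silently evicted
            data = self._factory.create()
            self._main_cache[self._segment_id] = data
        return data


factory = SegmentDataFactory()
main_cache = {}
provider = SegmentDataProviderFromMainCache("segment-1", main_cache, factory)

provider.get()
provider.get()                 # served from the cache
created_before_eviction = factory.created
main_cache.pop("segment-1")    # simulate an eviction without notification
provider.get()                 # data is rebuilt transparently
created_after_eviction = factory.created
```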

Cache-related object relations

✍️ Writing to segment

Opening a segment writer immediately closes all segment readers. When a write operation adds a key that is in the index but not in the cache, the updated value will not be returned.

The sequence for putting a new entry into a segment is shown here:

Segment writing sequence diagram