⚙️ Configuration

Don't be afraid to experiment: if a configuration is missing or invalid, the SegmentIndex will fail fast, helping you catch issues early.

The index is configured using the IndexConfiguration class. All essential index properties are configurable through the builder. See the example below:

// tdi: a type descriptor for Integer, created beforehand
IndexConfiguration<Integer, Integer> conf = IndexConfiguration
    .<Integer, Integer>builder()
    .withKeyClass(Integer.class)
    .withValueClass(Integer.class)
    .withKeyTypeDescriptor(tdi)
    .withValueTypeDescriptor(tdi)
    .withMaxNumberOfKeysInSegment(4)
    .withMaxNumberOfKeysInSegmentCache(10L)
    .withMaxNumberOfKeysInSegmentCacheDuringFlushing(12L)
    .withMaxNumberOfKeysInSegmentIndexPage(2)
    .withMaxNumberOfKeysInCache(3)
    .withBloomFilterIndexSizeInBytes(0) // 0 disables the Bloom filter
    .withBloomFilterNumberOfHashFunctions(4)
    .withContextLoggingEnabled(false)
    .withName("test_index")
    .build();

SegmentIndex<Integer, Integer> index = SegmentIndex.<Integer, Integer>create(directory, conf);

Now let's look at the individual parameters.

📁 SegmentIndex Directory

The directory is the place where all index data is stored. Two implementations are provided:

🧠 In Memory

All data are stored in memory. It's created like this:

Directory directory = new MemDirectory();

It's useful for testing purposes.

💾 File system

Its main purpose is to store index data in the file system. Create a file-system-based directory like this:

Directory directory = new FsDirectory(new File("my directory"));

🧾 Properties of IndexConfiguration class

The properties have the following meanings. Unless a default value is noted, a property must be set explicitly:

🔑 Key class - withKeyClass()

A Class object that represents the type of keys used in the index. Only instances of this class may be inserted. While any Java class is technically supported, it's recommended to use simple, compact types for performance reasons. Predefined classes are:

  • Integer
  • Long
  • String
  • Byte

If a different class is used, the key type descriptor must be set using the withKeyTypeDescriptor() method from the builder. If you use a custom class, you must implement the com.hestiastore.index.datatype.TypeDescriptor interface to describe how the type is serialized and compared.

🧲 Value class - withValueClass()

Required. Specifies the Java class used for values. The same rules that apply to the key class also apply to the value class.

🏷️ SegmentIndex name - withName()

Required. Assigns a logical name to the index. This can be useful in diagnostics and logging.

🧩 Key type descriptor - withKeyTypeDescriptor()

Type descriptor for the key class. Required for non-default types.

🧩 Value type descriptor - withValueTypeDescriptor()

Type descriptor for the value class. Required for non-default types.

🗃️ Max number of keys in cache - withMaxNumberOfKeysInCache()

Sets the maximum number of key-value entries allowed in the in-memory cache before flushing.
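As a rough illustration of what a key limit "before flushing" means, the following self-contained sketch buffers writes in a sorted map and flushes once the limit is exceeded. It is not HestiaStore code: BoundedWriteCache and its methods are hypothetical names, and the exact trigger (here, size greater than the limit) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch only; HestiaStore's actual cache and flush logic may differ.
final class BoundedWriteCache<K extends Comparable<K>, V> {
    private final int maxKeys;                                 // plays the role of maxNumberOfKeysInCache
    private final TreeMap<K, V> cache = new TreeMap<>();       // pending writes, sorted by key
    private final List<Map<K, V>> flushed = new ArrayList<>(); // stands in for data written to segments

    BoundedWriteCache(int maxKeys) {
        this.maxKeys = maxKeys;
    }

    void put(K key, V value) {
        cache.put(key, value);
        // Assumption: a flush is triggered as soon as the cache holds more keys than allowed.
        if (cache.size() > maxKeys) {
            flush();
        }
    }

    void flush() {
        if (cache.isEmpty()) {
            return;
        }
        flushed.add(new TreeMap<>(cache)); // copy the batch, then empty the cache
        cache.clear();
    }

    int cachedKeys() {
        return cache.size();
    }

    int flushCount() {
        return flushed.size();
    }

    public static void main(String[] args) {
        BoundedWriteCache<Integer, String> cache = new BoundedWriteCache<>(3);
        for (int i = 1; i <= 7; i++) {
            cache.put(i, "value-" + i);
        }
        // prints: flushes=1, still cached=3
        System.out.println("flushes=" + cache.flushCount() + ", still cached=" + cache.cachedKeys());
    }
}
```

A smaller limit flushes more often and keeps memory usage low. A larger limit batches more writes per flush.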

🧱 Max number of segments in cache - withMaxNumberOfSegmentsInCache()

Limits the number of segments stored in memory. Useful for controlling memory usage.

🔒 Thread safe - withThreadSafe()

Controls whether the index instance is safe for concurrent access by multiple threads. When set to 'true', access to the index is synchronized between threads.

Default value is 'false'.

🗒️ Context logging enabled - withContextLoggingEnabled()

Controls whether the index wraps operations with MDC context propagation so that log statements can include the index name. When set to 'true', the index name is available to log messages under the MDC key 'index.name', as in the following Log4j2 appender configuration:

<Console name="indexAppender" target="SYSTEM_OUT">
    <PatternLayout
        pattern="%d{ISO8601} %-5level [%t] index='%X{index.name}' %-C{1.mv}: %msg%n%throwable" />
</Console>

Default value is 'true'.

Note that in highly intensive applications, enabling this option can consume up to 40% of CPU time.

📏 Max number of keys in segment - withMaxNumberOfKeysInSegment()

Sets the maximum number of keys allowed in a single segment. Exceeding this splits the segment.
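To make the splitting behavior concrete, here is a minimal sketch that models a segment as a sorted map and splits it once it holds more keys than allowed. SegmentSplitSketch and splitIfNeeded are hypothetical names, and splitting at the median key is an assumption; HestiaStore may choose the split point differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Conceptual sketch only; not the HestiaStore implementation.
final class SegmentSplitSketch {

    /** Returns the segment unchanged, or two halves if it holds more than maxKeys entries. */
    static <K extends Comparable<K>, V> List<TreeMap<K, V>> splitIfNeeded(
            TreeMap<K, V> segment, int maxKeys) {
        List<TreeMap<K, V>> result = new ArrayList<>();
        if (segment.size() <= maxKeys) {
            result.add(segment);
            return result;
        }
        // Assumption: split at the median key so both halves stay sorted and similar in size.
        K median = new ArrayList<>(segment.keySet()).get(segment.size() / 2);
        result.add(new TreeMap<>(segment.headMap(median, false))); // keys below the median
        result.add(new TreeMap<>(segment.tailMap(median, true)));  // median key and above
        return result;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> segment = new TreeMap<>();
        for (int i = 1; i <= 5; i++) {
            segment.put(i, "v" + i);
        }
        // With a limit of 4 keys, a 5-key segment is split into two segments.
        for (TreeMap<Integer, String> part : splitIfNeeded(segment, 4)) {
            System.out.println(part.keySet());
        }
    }
}
```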

🗃️ Max number of keys in segment cache - withMaxNumberOfKeysInSegmentCache()

Defines how many keys can be cached from a segment during regular operation.

🚿 Max number of keys in segment cache during flushing - withMaxNumberOfKeysInSegmentCacheDuringFlushing()

Specifies the maximum number of keys that can be temporarily cached from a segment during flushing.

📑 Max number of keys in segment index page - withMaxNumberOfKeysInSegmentIndexPage()

Defines the number of keys in the index page for a segment. This impacts lookup efficiency.

🌸 Bloom filter configuration

A Bloom filter is a probabilistic data structure that efficiently tests whether an element is part of a set. You can find a detailed explanation on Wikipedia. In this context, each segment has its own Bloom filter.

To disable the Bloom filter completely, set:

 .withBloomFilterIndexSizeInBytes(0)

The settings for the Bloom filter can be adjusted using the following methods:

📦 Bloom filter size - withBloomFilterIndexSizeInBytes()

Sets the size of the Bloom filter in bytes. A value of 0 disables the use of the Bloom filter.

🔢 Number of hash functions - withBloomFilterNumberOfHashFunctions()

Sets the number of hash functions used in the Bloom filter.

📈 Probability of false positive - withBloomFilterProbabilityOfFalsePositive()

Sets the target probability of false positives, a value between 0 and 1. When get(someKey) is called on a segment, the Bloom filter is checked first. It can answer with certainty that the key is not in the segment, or it can indicate that the key might be there. If the filter indicates that the key might be in the segment but the key is not found, that is a false positive.

Usually, it's not necessary to adjust the Bloom filter settings.
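For readers who do want to reason about these three settings, the standard Bloom filter formulas relate filter size, number of hash functions, and false-positive probability. The sketch below applies them in plain Java. It is not taken from HestiaStore: BloomFilterMath and its method names are hypothetical, the formulas assume ideal independent hash functions, and the key count would be the number of keys per segment, since each segment has its own filter.

```java
// Standard Bloom filter sizing formulas; an illustration, not HestiaStore's implementation.
final class BloomFilterMath {

    /** Expected false-positive probability: (1 - e^(-k*n/m))^k, with m in bits. */
    static double falsePositiveProbability(long sizeInBytes, int hashFunctions, long keys) {
        double m = sizeInBytes * 8.0;
        double k = hashFunctions;
        return Math.pow(1.0 - Math.exp(-k * keys / m), k);
    }

    /** Size in bytes needed to reach probability p for n keys: m = -n * ln(p) / (ln 2)^2 bits. */
    static long requiredSizeInBytes(long keys, double p) {
        double bits = -keys * Math.log(p) / (Math.log(2) * Math.log(2));
        return (long) Math.ceil(bits / 8.0);
    }

    /** Number of hash functions that minimizes false positives: k = (m / n) * ln 2. */
    static int optimalHashFunctions(long sizeInBytes, long keys) {
        double m = sizeInBytes * 8.0;
        return Math.max(1, (int) Math.round(m / keys * Math.log(2)));
    }

    public static void main(String[] args) {
        long keysPerSegment = 1_000_000; // hypothetical segment size
        long size = requiredSizeInBytes(keysPerSegment, 0.01);
        int k = optimalHashFunctions(size, keysPerSegment);
        double p = falsePositiveProbability(size, k, keysPerSegment);
        System.out.printf("size=%d bytes, hash functions=%d, p=%.4f%n", size, k, p);
    }
}
```

For one million keys and a 1% target, this works out to roughly 1.2 MB and 7 hash functions.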

✏️ Changing SegmentIndex properties

Some parameters can be redefined when the index is opened.

SegmentIndex<String, String> index = SegmentIndex.<String, String>open(directory, conf);

This call accepts an IndexConfiguration object, which makes it possible to change configuration parameters. The following table shows which parameters can be changed.

| Name | Meaning | Can be changed | Applies to |
|---|---|---|---|
| indexName | Logical name of the index | 🟩 | index |
| keyClass | Key class | 🟥 | index |
| valueClass | Value class | 🟥 | index |
| keyTypeDescriptor | Key class type descriptor | 🟥 | index |
| valueTypeDescriptor | Value class type descriptor | 🟥 | index |
| maxNumberOfKeysInSegmentIndexPage | Maximum keys in segment index page | 🟥 | segment |
| maxNumberOfKeysInSegmentCache | Maximum number of keys in segment cache | 🟩 | segment |
| maxNumberOfKeysInSegmentCacheDuringFlushing | Maximum keys in cache during flushing | 🟩 | segment |
| maxNumberOfKeysInCache | Maximum keys in the index cache | 🟩 | index |
| maxNumberOfKeysInSegment | Maximum keys in a segment | 🟥 | segment |
| maxNumberOfSegmentsInCache | Maximum number of segments in cache | 🟩 | index |
| bloomFilterNumberOfHashFunctions | Bloom filter - number of hash functions used | 🟥 | segment bloom filter |
| bloomFilterIndexSizeInBytes | Bloom filter - index size in bytes | 🟥 | segment bloom filter |
| bloomFilterProbabilityOfFalsePositive | Bloom filter - probability of false positives | 🟥 | segment bloom filter |
| diskIoBufferSize | Size of the disk I/O buffer | 🟩 | Disk IO |
| threadSafe | If index is thread-safe | 🟩 | index |
| contextLoggingEnabled | If MDC-based context logging is enabled | 🟩 | index |

➕ Add custom data type

HestiaStore has to know how to work with a new data type. First, create an implementation of com.hestiastore.index.datatype.TypeDescriptor. Then, when creating the index, register your implementation with withKeyTypeDescriptor() (or withValueTypeDescriptor() for values). That's all.
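The following sketch shows the kind of logic such a descriptor has to provide: serialization to bytes, deserialization, and ordering. It deliberately does not implement the real TypeDescriptor interface, whose exact methods are not listed here. SimpleTypeDescriptor, Point, and PointDescriptor are hypothetical names used only for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;

// Hypothetical, simplified stand-in for illustration only.
// The real com.hestiastore.index.datatype.TypeDescriptor may declare different methods.
interface SimpleTypeDescriptor<T> {
    byte[] toBytes(T value);

    T fromBytes(byte[] bytes);

    Comparator<T> comparator();
}

// Example custom key type: a 2D point.
record Point(int x, int y) {
}

final class PointDescriptor implements SimpleTypeDescriptor<Point> {

    @Override
    public byte[] toBytes(Point p) {
        // Fixed-width encoding: 4 bytes for x followed by 4 bytes for y.
        return ByteBuffer.allocate(2 * Integer.BYTES).putInt(p.x()).putInt(p.y()).array();
    }

    @Override
    public Point fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new Point(buf.getInt(), buf.getInt()); // read x, then y
    }

    @Override
    public Comparator<Point> comparator() {
        // Keys are ordered by x first, then by y.
        return Comparator.comparingInt(Point::x).thenComparingInt(Point::y);
    }

    public static void main(String[] args) {
        PointDescriptor descriptor = new PointDescriptor();
        Point original = new Point(3, -7);
        Point restored = descriptor.fromBytes(descriptor.toBytes(original));
        System.out.println(original.equals(restored)); // prints: true
    }
}
```

A fixed-width encoding keeps keys compact, which matches the recommendation above to prefer simple, compact types.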