Skip pointer: A Trailblazing Search Algorithm for Time-series Databases

Savan Nahar
6 min read · Jan 13, 2024



I wrote this blog a while ago; the repo and Slack channel are now private so the database can be offered to customers 💵 💰

Introduction

In a world where data generation and consumption are skyrocketing, building performant and scalable time-series databases has moved from being a desire to a necessity.

In this blog post, we shed light on the inner workings of our search algorithm, named Optimus, which was inspired by the idea of Faster Postings List Intersection via Skip Pointers from Stanford's Introduction to Information Retrieval.

In simple terms, what is this blog about?

Before Optimization: In the initial framework, searching for terms like “Machine Learning” and “Python” involved sifting through every message ID in the database and comparing each one against the search terms — an exhaustive and time-consuming process. When multiple terms were involved, the process was even slower, since the full postings list for each term had to be walked and then intersected.

After Optimization: With the implemented Faster Postings List Intersection via Skip Pointers concept, the search operation became dramatically faster. The revised search algorithm now “skips” over irrelevant segments in the postings list, reducing the total comparisons. For multiple term queries, our nested postings list design further economizes the intersection operation. The overall result is a tenfold performance improvement, accelerating searches drastically and boosting user experience with the system.

Understanding TSDB’s Structure

Index and Segments

At the highest level of our data structure, we have an index that contains multiple segments, stored in a DashMap<u32, Segment>. This DashMap allows concurrent access to different segments in the structure. Each segment contains three parts: terms, inverted_map, and forward_map.

The terms: DashMap<String, u32> is a mapping from the term (a string) to a unique integer. This ensures that every unique key (term) is only stored once, which aids efficient storage.

Next, there’s inverted_map: DashMap<u32, PostingsList>. An inverted map is a reference structure that holds the documents (or postings) in which a term appears. This design allows us to quickly retrieve all postings pertinent to a specific term, thus facilitating an optimised search.

Finally, we have the forward_map: DashMap<u32, LogMessage>. This is a mapping from a unique integer to the actual message or data that we want to retrieve. The purpose of having this map is to quickly fetch the desired data/document during the retrieval phase.

Posting List and Blocks

The postings lists are designed for optimized storage and retrieval. They have a two-tier structure: postings_list_compressed and last_block.

The postings_list_compressed: Vec<PostingsBlockCompressed> is primarily used to save space. The postings are grouped into blocks and then compressed using a compression algorithm, giving an optimized approach to space usage.

The last_block: PostingsBlock is an uncompressed version of a postings block. It is the most recently written block, kept uncompressed for quick appends and retrieval.

Lastly, there is initial_values: Vec<u32>. This vector holds a set of integers that act as skip pointers. These pointers allow the search algorithm to ‘skip’ over certain blocks, allowing much faster traversal of the postings list.

Index:
- segments: DashMap<u32, Segment>

Segment:
- terms: DashMap<String, u32>
- inverted_map: DashMap<u32, PostingsList>
- forward_map: DashMap<u32, LogMessage>

PostingsList:
- postings_list_compressed: Vec<PostingsBlockCompressed>
- last_block: PostingsBlock
- initial_values: Vec<u32>

PostingsBlock:
- message_ids: Vec<u32>

A Deep Look at the Algorithm

Inspired by Stanford's skip-pointers mechanism, this high-performance design is implemented in the get_matching_doc_ids() function (code link at the bottom), involving operations such as intersection, comparison, and skipping. Let's examine this core function in detail:

Part 1: Preparing the Accumulator

As the first step, the get_matching_doc_ids() function prepares an accumulator, a Vec<u32>, drawing from the shortest postings list.

(Code snippet: loading the accumulator.)

If the initial population fails to provide any postings, the accumulator is loaded from the uncompressed last_block of the shortest list. If the accumulator is still empty, the function short-circuits and concludes that there are no matching postings to discover.

Part 2: Traversing the Postings Lists

The function proceeds to traverse through the postings lists and the respective initial values (skip pointers). Within this outer loop, there are three main steps entailed in intersecting the accumulated postings with respective postings blocks.

2.1: Skipping to Reach Posting Block

Firstly, the accumulator iterates until its current value equals or exceeds the initial_value:

If the accumulator value exceeds the initial_value, the function checks whether it lies between the current and the next initial_value, or whether it is greater than the last initial_value. If either condition is met, the matching postings block is decompressed to be compared with the accumulator. Decompressing only the required blocks is what makes the algorithm efficient.

2.2: Intersection of Posting Block and Accumulator

After that, each posting block is compared with the accumulator, and if a match is found, it is pushed into a temporary result_set:

2.3: Further Skipping Postings

Lastly, every time an integer is added to the temp_result_set, the function checks whether it can skip the remaining elements of the postings block. If so, it breaks out of the loop iterating over the postings block and moves on to the next initial_value.

Finally, the accumulator is set to the temp_result_set. This process continues iteratively until all the postings lists have been traversed and the intersection has been found.

Part 3: Clean-up and Fetch Log Messages

Following the completion of the main algorithm logic, the accumulator may contain duplicate document IDs. Consecutive duplicates are removed with results_accumulator.dedup().

The IDs collected in the accumulator are then used to fetch the corresponding log messages via the get_log_messages_from_ids function.

This algorithm delivers an optimized search through skip pointers, ultimately enhancing the database's performance on large-scale time-series datasets.

Conclusion

In conclusion, optimizing the search algorithm of our time-series database, TSDB, has proven to be exceedingly beneficial. Leveraging the concept of Faster Postings List Intersection via Skip Pointers from Stanford's Introduction to Information Retrieval improved query performance by 10x. This is a tangible enhancement that directly impacts the utility and efficiency of our system in meaningful ways.

For those interested in the code, this is the initial PR (private now), and this (private now) is the refactored version of the same code.

Check out TSDB (private now), give it a star, and join the Slack channel (private now) 😄
