Abstract
Search trees, such as B+-trees, are often used as index structures in data systems to improve query performance at the cost of index construction and maintenance. For state-of-the-art B+-tree designs used in commercial data systems, this cost is negligible if the data arrives fully sorted on the index attribute. Further, production systems employ a fast-path ingestion technique for B+-trees that appends incoming entries directly to the tail leaf when the data is fully sorted, drastically reducing the index construction cost. However, this fast path is effective only if the incoming data arrives fully sorted or with an extremely small number of out-of-order entries. In addition, the state-of-the-art sortedness-aware design (SWARE) navigates a tradeoff between reads and writes by buffering incoming data to absorb near-sortedness, which comes at the cost of slower query performance and increased overall design complexity.
To address these challenges, we present Quick Insertion Tree (QuIT), a sortedness-aware indexing data structure that improves ingestion performance with minimal design complexity and no read overhead. QuIT maintains in memory a pointer to the predicted ordered-leaf (pole), which provides a sortedness-aware fast-path optimization and facilitates faster index ingestion. The key benefit comes from accurately predicting the pole throughout data ingestion. Further, QuIT achieves high memory utilization by keeping leaf nodes tightly packed when the ingested data arrives near-sorted. This, in turn, helps improve performance during range lookups. Overall, we demonstrate that QuIT outperforms the B+-tree by up to 3× and SWARE by up to 2× for ingestion, while matching the point lookup performance of the B+-tree and being up to 1.23× faster than SWARE. QuIT also accesses up to 2× fewer leaf nodes than the B+-tree during range lookups.
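To make the limitation of the tail-leaf fast path concrete, the following is a minimal Python sketch of a leaf-level toy model. It is not QuIT and does not implement pole prediction. The class `TailFastPathIndex`, the helper `count_paths`, the fast-path condition (key at least the smallest key in the tail leaf), the 50/50 split, and the leaf capacity are all assumptions made for illustration. Internal nodes are replaced by a binary search over the leaves' minimum keys.

```python
import bisect


class TailFastPathIndex:
    """Toy leaf-level model of a B+-tree with a tail-leaf fast path.

    Hypothetical simplification: internal nodes are replaced by a binary
    search over the minimum key of each leaf.
    """

    def __init__(self, leaf_capacity=4):
        self.leaf_capacity = leaf_capacity
        self.leaves = [[]]      # sorted leaves; the last one is the tail leaf
        self.fast_inserts = 0   # inserts that went straight to the tail leaf
        self.slow_inserts = 0   # inserts that needed a (simulated) traversal

    def insert(self, key):
        tail = self.leaves[-1]
        if not tail or key >= tail[0]:
            # Assumed fast-path condition: the key belongs to the tail leaf.
            self.fast_inserts += 1
            idx = len(self.leaves) - 1
        else:
            # Slow path: stands in for a root-to-leaf traversal.
            self.slow_inserts += 1
            mins = [leaf[0] for leaf in self.leaves]
            idx = max(bisect.bisect_right(mins, key) - 1, 0)
        leaf = self.leaves[idx]
        bisect.insort(leaf, key)
        if len(leaf) > self.leaf_capacity:
            # Assumed 50/50 split of an overflowing leaf.
            mid = len(leaf) // 2
            self.leaves[idx:idx + 1] = [leaf[:mid], leaf[mid:]]

    def keys(self):
        return [k for leaf in self.leaves for k in leaf]


def count_paths(keys, leaf_capacity=4):
    """Insert all keys and return (fast_inserts, slow_inserts)."""
    index = TailFastPathIndex(leaf_capacity)
    for key in keys:
        index.insert(key)
    return index.fast_inserts, index.slow_inserts


# Fully sorted input: every insert takes the fast path.
sorted_keys = list(range(100))
print(count_paths(sorted_keys))      # (100, 0)

# Near-sorted input: five far-ahead keys arrive early and take over the
# tail leaf, so every later in-order key falls back to the slow path.
near_sorted_keys = [0, 1, 2, 3, 1000, 1001, 1002, 1003, 1004] + list(range(4, 100))
print(count_paths(near_sorted_keys))  # (9, 96)
```

In this toy model, a handful of early, far-ahead keys is enough to fill the tail leaf with outliers, after which in-order entries no longer qualify for the fast path. This is the kind of behavior that a fast path tied to a predicted leaf, rather than to the tail leaf, is meant to avoid.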