rongma7 commented on May 13, 2025

This PR implements a more efficient design for the history index table in our multi-version storage system. The current system uses two tables: an index table that tracks which versions contain changes for each key, and a change table that stores the actual values. This redesign focuses on the index table to improve access patterns and reduce storage overhead.

Key Improvements

  1. Version Range Optimization: Instead of storing each <key, version> pair individually, the new design groups versions into ranges for each key. This significantly reduces the number of records in the index table.

  2. Latest Version Optimization: A special LATEST marker is used to quickly identify and access the most recent versions, which are accessed most frequently.

  3. Space-efficient Encoding: The OffsetBasedVersionRange structure provides a flexible encoding scheme that adapts to the density and magnitude of version numbers:

    • Uses bitmap encoding for dense, small-offset ranges
    • Uses u16/u32 vector encoding for sparse ranges
    • Special handling for extremely large offsets
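A minimal sketch of how an offset-based encoder might choose among these representations. It assumes offsets are measured from the range's start version and that the encoder simply picks whichever form needs the fewest bytes. RangeEncoding, encode, and the fixed ONE_RANGE_BYTES value of 64 are illustrative assumptions, not the PR's actual OffsetBasedVersionRange API, and the real selection thresholds may differ.

```rust
/// Hypothetical encodings mirroring the cases listed above.
#[derive(Debug, Clone, PartialEq)]
pub enum RangeEncoding {
    /// Bit i set means version `start + i` is present.
    Bitmap(Vec<u8>),
    /// Offsets that fit in u16.
    U16(Vec<u16>),
    /// Offsets that fit in u32.
    U32(Vec<u32>),
    /// Fallback for extremely large offsets.
    Large(Vec<u64>),
}

/// Assumed bitmap capacity in bytes (the review shows ONE_RANGE_BYTES is 64 or 128).
pub const ONE_RANGE_BYTES: usize = 64;

/// Encodes `versions` (sorted ascending, deduplicated, all >= `start`) as offsets
/// from `start`, picking the representation that needs the fewest bytes.
pub fn encode(start: u64, versions: &[u64]) -> RangeEncoding {
    let offsets: Vec<u64> = versions.iter().map(|&v| v - start).collect();
    let max = offsets.last().copied().unwrap_or(0);
    let n = offsets.len();

    // Byte cost of each candidate; None means it cannot hold the largest offset.
    let bitmap = (max < (ONE_RANGE_BYTES * 8) as u64).then(|| max as usize / 8 + 1);
    let vec16 = (max <= u16::MAX as u64).then(|| 2 * n);
    let vec32 = (max <= u32::MAX as u64).then(|| 4 * n);
    let large = Some(8 * n);

    // Smallest cost wins; ties go to the earlier candidate in the list.
    let choice = [bitmap, vec16, vec32, large]
        .iter()
        .enumerate()
        .filter_map(|(i, cost)| cost.map(|c| (c, i)))
        .min()
        .map(|(_, i)| i)
        .unwrap_or(3);

    match choice {
        0 => {
            let mut bits = vec![0u8; max as usize / 8 + 1];
            for &o in &offsets {
                bits[o as usize / 8] |= 1u8 << (o % 8);
            }
            RangeEncoding::Bitmap(bits)
        }
        1 => RangeEncoding::U16(offsets.iter().map(|&o| o as u16).collect()),
        2 => RangeEncoding::U32(offsets.iter().map(|&o| o as u32).collect()),
        _ => RangeEncoding::Large(offsets),
    }
}
```

Under this rule, offsets 0, 1 and 3 fit in a single bitmap byte, while offsets 0 and 500 would cost 63 bytes as a bitmap but only 4 bytes as a u16 vector.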

Implementation Details

The new index table uses two types of keys:

  • <key, LATEST> for the latest (mutable) record
  • <key, end_version_number> for previous (immutable) records
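The sketch below models the index table as a BTreeMap ordered by (key, version). It assumes LATEST is encoded as u64::MAX, so the latest record sorts after every real end version, and that each previous record's end_version_number is the largest version it contains. Under those assumptions, finding the last version at or below a query needs one forward seek plus at most one step back. IndexTable, the encoding of LATEST, and this last_le signature are hypothetical, and record values are plain version lists rather than the real range encoding.

```rust
use std::collections::BTreeMap;

/// Assumed encoding of the LATEST marker: sorts after every real end version.
pub const LATEST: u64 = u64::MAX;

/// Toy index table: (user key, end version or LATEST) -> sorted versions in that record.
pub type IndexTable = BTreeMap<(Vec<u8>, u64), Vec<u64>>;

/// Returns the most recent version of `key` that is <= `v`, if any.
pub fn last_le(table: &IndexTable, key: &[u8], v: u64) -> Option<u64> {
    let k = key.to_vec();
    // Seek to the first record of `key` whose end version is >= v.
    // The LATEST record, if present, always qualifies because it sorts last.
    if let Some((_, versions)) = table.range((k.clone(), v)..=(k.clone(), LATEST)).next() {
        if let Some(&found) = versions.iter().rev().find(|&&x| x <= v) {
            return Some(found);
        }
    }
    // Every version in that record is newer than v, so the answer is the end version
    // (that is, the key) of the preceding record for the same user key, if one exists.
    table
        .range((k.clone(), 0)..(k, v))
        .next_back()
        .map(|((_, end), _)| *end)
}
```

The step back works because a previous record is keyed by its own largest version: when the seek lands on a record whose versions are all newer than the query, the answer is simply the preceding record's key, with no need to decode its value.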

The values in the index table are represented by the HistoryIndices enum, which has two variants:

  • Latest(start_version_number, range_encoding, latest_value) for the most recent versions
  • Previous(range_encoding) for previous versions
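A sketch of the two variants with a deliberately simplified range encoding (a plain list of offsets). Because Previous carries no start version of its own, this sketch assumes its offsets count backwards from the end_version_number stored in the record's key, while Latest offsets count forwards from start_version_number. Offsets, versions, and latest_value are hypothetical names, and the real encoding may anchor offsets differently.

```rust
/// Hypothetical stand-in for OffsetBasedVersionRange: sorted ascending offsets.
#[derive(Debug, Clone, PartialEq)]
pub struct Offsets(pub Vec<u64>);

/// Value stored in the index table, shaped after the two variants described above.
#[derive(Debug, Clone, PartialEq)]
pub enum HistoryIndices<V> {
    /// Stored under <key, LATEST>: (start_version_number, range_encoding, latest_value).
    /// Offsets are assumed to count forwards from the start version.
    Latest(u64, Offsets, V),
    /// Stored under <key, end_version_number>: offsets are assumed to count
    /// backwards from the end version held in the key.
    Previous(Offsets),
}

impl<V> HistoryIndices<V> {
    /// Decodes the absolute version numbers in ascending order.
    /// `end_version` is the version component of the record's key; Latest ignores it.
    pub fn versions(&self, end_version: u64) -> Vec<u64> {
        match self {
            HistoryIndices::Latest(start, range, _) => {
                range.0.iter().map(|o| start + o).collect()
            }
            // Ascending offsets from the end give descending versions, so walk them in reverse.
            HistoryIndices::Previous(range) => {
                range.0.iter().rev().map(|o| end_version - o).collect()
            }
        }
    }

    /// The cached latest value, available only on the Latest record.
    pub fn latest_value(&self) -> Option<&V> {
        match self {
            HistoryIndices::Latest(_, _, value) => Some(value),
            HistoryIndices::Previous(_) => None,
        }
    }
}
```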

Core Functionality

This PR implements the HistoryIndices structure as a standalone component with interfaces needed for integration with the storage system:

  1. get_latest_value: Directly retrieves the latest value from the index table
  2. last_le: Finds the most recent version number less than or equal to a given version
  3. collect_versions_le: Collects all version numbers less than or equal to a given version
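Within a single bitmap-encoded record, last_le and collect_versions_le reduce to scanning the set bits at or below the query's offset. The sketch below assumes bit i marks version start + i; BitmapRange and its methods are hypothetical and cover only the bitmap case, not the u16/u32 vector encodings or the lookup across records.

```rust
/// Hypothetical bitmap-encoded range: bit i of `bits` set means version `start + i` exists.
pub struct BitmapRange {
    pub start: u64,
    pub bits: Vec<u8>,
}

impl BitmapRange {
    /// True if bit `i` is set; bits past the end of the bitmap count as unset.
    fn has_offset(&self, i: u64) -> bool {
        self.bits
            .get((i / 8) as usize)
            .map_or(false, |byte| byte & (1u8 << (i % 8)) != 0)
    }

    /// Highest bit index worth inspecting for a query at version `v`, if any.
    fn offset_limit(&self, v: u64) -> Option<u64> {
        if v < self.start || self.bits.is_empty() {
            return None;
        }
        let highest = self.bits.len() as u64 * 8 - 1;
        Some((v - self.start).min(highest))
    }

    /// Most recent version <= v: scan downwards from the query offset to the first set bit.
    pub fn last_le(&self, v: u64) -> Option<u64> {
        let limit = self.offset_limit(v)?;
        (0..=limit).rev().find(|&i| self.has_offset(i)).map(|i| self.start + i)
    }

    /// All versions <= v, in ascending order.
    pub fn collect_versions_le(&self, v: u64) -> Vec<u64> {
        match self.offset_limit(v) {
            Some(limit) => (0..=limit)
                .filter(|&i| self.has_offset(i))
                .map(|i| self.start + i)
                .collect(),
            None => Vec::new(),
        }
    }
}
```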

All errors caused by a corrupted index table are now reported uniformly as StorageError::CorruptedHistoryIndices.

Next Steps

This PR focuses on the data structure implementation. The integration with the actual storage system will be addressed in a subsequent PR.


This change is Reviewable

rongma7 requested a review from ChenxingLi on May 13, 2025 03:58

ChenxingLi left a comment

Add unit tests for OneRange and HistoryIndices.

Reviewable status: 0 of 4 files reviewed, 3 unresolved discussions


src/middlewares/versioned_flat_key_value/history_indices/mod.rs line 13 at r1 (raw file):

pub const ONE_RANGE_BYTES: usize = 1 << ONE_RANGE_BYTES_LOG;

const _: () = assert!(ONE_RANGE_BYTES == 64 || ONE_RANGE_BYTES == 128);

Use const_assert! here.


src/middlewares/versioned_flat_key_value/history_indices/one_range.rs line 34 at r1 (raw file):

///     - There are more than `(ONE_RANGE_BYTES / 2)` bits.
#[derive(Debug, Clone, PartialEq)]
pub enum OneRange {

Consider better names for OneRange, Four, and Two.


src/middlewares/versioned_flat_key_value/history_indices/one_range.rs line 225 at r1 (raw file):

}

pub trait Max {

Don't define a separate trait for each function. Instead, define one trait that bundles everything required of an item in the integer list, e.g., trait MyTrait: Ord + Copy + Into<u64>, with a single function saturating_from that combines min(T::MAX.into()) and from_u64_unchecked.
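A sketch of the single-trait design suggested in this comment, with VersionOffset standing in for the placeholder name MyTrait. The helpers narrow and max_offset are hypothetical and only show that one bound covers both directions: narrowing from u64 via saturating_from, and widening back via the Into<u64> supertrait.

```rust
/// One trait bundling what an integer-list item needs; `VersionOffset` stands in
/// for the placeholder name `MyTrait` from the review comment.
pub trait VersionOffset: Ord + Copy + Into<u64> {
    /// Converts from u64, clamping to the type's maximum instead of truncating.
    fn saturating_from(value: u64) -> Self;
}

impl VersionOffset for u16 {
    fn saturating_from(value: u64) -> Self {
        // After min(u16::MAX.into()) the cast cannot lose bits, so it plays
        // the role of the unchecked conversion.
        value.min(u16::MAX.into()) as u16
    }
}

impl VersionOffset for u32 {
    fn saturating_from(value: u64) -> Self {
        value.min(u32::MAX.into()) as u32
    }
}

/// Narrows u64 offsets into any list item type using the single bound.
pub fn narrow<T: VersionOffset>(offsets: &[u64]) -> Vec<T> {
    offsets.iter().map(|&o| T::saturating_from(o)).collect()
}

/// Widens the largest item back to u64 through the `Into<u64>` supertrait.
pub fn max_offset<T: VersionOffset>(items: &[T]) -> Option<u64> {
    items.iter().copied().max().map(|m| m.into())
}
```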

rongma7 force-pushed the history_indices branch from e7275f4 to 0422f5d on May 13, 2025 10:29
…ask calculation by using right shift instead of left shift
rongma7 force-pushed the history_indices branch 4 times, most recently from 3989cd2 to 84fdbd7 on May 15, 2025 10:20
…VersionRange to cover Bitmap cases with varying maximum bit indices.
rongma7 force-pushed the history_indices branch from 84fdbd7 to 08812ab on May 15, 2025 11:02

rongma7 left a comment

Reviewable status: 0 of 5 files reviewed, 3 unresolved discussions (waiting on @ChenxingLi)


src/middlewares/versioned_flat_key_value/history_indices/mod.rs line 13 at r1 (raw file):

Previously, ChenxingLi (Chenxing Li) wrote…

Use const_assert! here.

Done.


src/middlewares/versioned_flat_key_value/history_indices/one_range.rs line 34 at r1 (raw file):

Previously, ChenxingLi (Chenxing Li) wrote…

Consider better names for OneRange, Four, and Two.

Done.


src/middlewares/versioned_flat_key_value/history_indices/one_range.rs line 225 at r1 (raw file):

Previously, ChenxingLi (Chenxing Li) wrote…

Don't define a separate trait for each function. Instead, define one trait that bundles everything required of an item in the integer list, e.g., trait MyTrait: Ord + Copy + Into<u64>, with a single function saturating_from that combines min(T::MAX.into()) and from_u64_unchecked.

Done.
